Reading PDF files on a portable device

This is a holy grail for people who want to carry their libraries in their pockets and read without glasses or special lighting. The problem is that a lot of the people who distribute ebooks seem to think that using a page-description language like PDF is a suitable distribution method. But actually reading something formatted for an 8″x6″ page on a 2″x3.5″ screen is difficult.

I have it figured out. I’m not sure if this is something that’s changed recently in the software, or if it was always like this and I was too stupid to figure it out.

I’ve asked on several mailing lists, and they seem to have been too stupid to figure it out, too. The best suggestion was from Peter Flynn, who posted a LaTeX file that resizes the PDF’s pages to be the size of your device, and trims the margins. But on my Nokia N810, I need reading glasses to read the resulting page. You can make a case that I should get better reading glasses, but I don’t think I’d enjoy even good ones.

My new method is as follows:

  • Run:
    pdftohtml -stdout pdf-filename > html-filename
  • Open the html-filename in emacs.
  • There are two major problems you want to fix here:
    • Every line in the original ends with a break. The original line length of the paper book is unlikely to be useful on your portable device, so what you need, and what I’ve always before failed to find, is a way to distinguish the breaks that are actually new paragraphs from the ones that are just line breaks. For the PDF files I’ve looked at since yesterday, the ones that are just line breaks end in &nbsp;<br>, and the ones that are new paragraphs end in <br>. So what I’m doing these days is replacing the &nbsp;<br> with just a space. I may decide at some point to replace the <br>’s with <p>’s, but so far what I’m doing now looks pretty good.
    • There is junk like page numbers between the pages. This varies by book, but for the book I’m reading at the moment, there was a file url at the bottom of every page and an anchor tag of a line like “dummy 2” at the top of every page. It would be real programming to write something that would continue a paragraph across a page break like this, so I’m putting up with new pages translating to new paragraphs even when they obviously shouldn’t. But I’m using emacs to remove anything that writes some distracting non-text. In this case, that’s removing the file url and the “dummy 2” text. Be careful about the file url — it might not be on a line by itself.
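
The two cleanup steps above could also be scripted instead of done by hand in emacs. What follows is only a sketch under assumptions: the function names are made up for illustration, the soft-break pattern is the one described above and may differ for other PDFs, and the junk patterns vary by book, so they are passed in rather than hard-coded.

```python
import re
from typing import Iterable

# Soft line breaks as described above: "&nbsp;<br>" plus the newline
# (or other whitespace) that follows it in the pdftohtml output.
SOFT_BREAK = re.compile(r"&nbsp;<br>\s*")


def unwrap_lines(html: str) -> str:
    """Replace each soft line break with a single space.

    Bare <br> tags (the paragraph breaks) are left untouched.
    """
    return SOFT_BREAK.sub(" ", html)


def strip_junk(html: str, patterns: Iterable[str]) -> str:
    """Delete every match of each regex in `patterns` (book-specific)."""
    for pattern in patterns:
        html = re.sub(pattern, "", html)
    return html


if __name__ == "__main__":
    sample = (
        "first line&nbsp;<br>\n"
        "second line<br>\n"
        "dummy 2<br>\n"
        "new paragraph<br>\n"
    )
    # The pattern below is hypothetical; look at your own file first.
    print(strip_junk(unwrap_lines(sample), [r"dummy \d+<br>\n"]))
```

As in the manual method, a paragraph that runs across a page break still comes out as two paragraphs; the sketch only joins lines and deletes whatever matches the patterns you give it.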

If you’re using FBReader on a device with a sizeable memory card, you’re done. Just put this html file on your device.

Otherwise, do whatever you normally do to sizeable html files (zipping is probably a good idea) and put that on your device.

Marty Sasaki, RIP

Marty from post to his high school facebook page

Marty’s death apparently happened about six months ago.
He stopped posting to his blog
on August 13. His recorder teacher, who told me about it, had
seen him at her student recital (which may have been the one on
September 12) two days before he died.

Marty from fellow photographer’s tripod page

We shared a cubicle in 1981-2, when we were both programmers in
the Radiology Department of the Brigham and Women’s Hospital.
Although it was at that point one of the better jobs I’ve ever had
in my life, we both found some of the political aspects of it
frustrating. We would occasionally both get into his car and go
to a hill in Brookline and fly kites.

One of Marty’s kites

He was at that point not long out of MIT, and in much better
touch with the cutting edge of programming than I was, so I
learned a lot from him. He was the first person I ever saw using
emacs, and it was his copy of The TeXbook that
introduced me to Donald Knuth and TeX.

When he left that job for another job in the Harvard Medical
Area, he was the first person I ever kept in touch with by email
and a “talk” program that ran on the Vax.

We eventually fell out of touch, but then when I was just
starting to be the Administrator of the Boston Recorder
Society, I got an email from him (in my capacity as
administrator; we’d neither of us particularly identified as
recorder players when we knew each other). He was thinking about
picking up the recorder again, and wondered if what the BRS was
doing would help. He must have decided that it wouldn’t, because
I don’t think he ever came to one of our meetings, but he did get
involved in other recorder-related activities in the Boston area,
and I occasionally saw him there.

The most recent real conversation we had was when he came as
part of the group that spelled the Cantabile Band at the Walk for
Hunger last year. He was looking quite a bit thinner than when
I’d most recently seen him, and seeming more mobile. We talked
about how much more energy blogging takes than you would expect,
and about the process of winding up the affairs of a dead person.
He was talking to me instead of playing because he’d gotten
frustrated by the playing — most of the other players in the
group were a lot more experienced than he was. But I had a bit
of the same sense of returning peace that I remembered from flying
kites on the hill in Brookline.

He will be remembered at a recital on Saturday.
I won’t be able to go, because there’s a memorial service for
another friend at the same time. Having conflicting memorial
services makes me feel old, but that’s another post.

Cryptonomicon

I assumed when I started reading this book that it was a sequel
to The Baroque Cycle, but it turns out that it was
actually published four years before Quicksilver,
the first volume of the Cycle.

Stephenson says about the project:

The series will incorporate many characters and
stories, tied together by a few common threads. For example,
certain family names keep popping up. Crypto, money, and
computers seem to find their way into all of the
storylines.

I was sure I enjoyed Cryptonomicon more for
having read it after Baroque Cycle, but then I
reread the first chapter of Quicksilver because it
was provided free at the end of the Cryptonomicon
ebook, and I realized that I’d probably have enjoyed it more if
I’d already met Enoch Root and Daniel Waterhouse’s descendants,
too.

So if you want to read long novels with topics to do with
history and science and technology, start wherever you like.
Probably the best guide is which period you’re more interested
in the history of: the 17th and 18th centuries or the 20th
century.

I was amused that a book about the wonders of modern
cryptography would have the boilerplate DRM at the end:

By payment of the required fees, you have been granted
the non-exclusive, non-transferable right to access and read the
text of this e-book on-screen. No part of this text may be
reproduced, transmitted, down-loaded, decompiled, reverse
engineered, or stored in or introduced into any information
storage and retrieval system, in any form or by any means,
whether electronic or mechanical, now known or hereinafter
invented, without the express written permission of
PerfectBound.

I’m running late today, so I’ll reserve the right to discuss
this book more later, but consider this a recommendation.

The Elizabethans were doing ASCII sorting

One of the oddities of Elizabethan publishing, which I have
retained in my transcriptions of Elizabethan music, is that they
write roman numerals differently from the way your clock does.

Specifically, your clock writes “4” as “IV”, that is, one
subtracted from 5. The Elizabethans didn’t do that — they
wrote “IIII”, and similarly “VIIII” for “9” and “XVIIII” for
“19”.

There turns out to be a major advantage to this for computer
sorting: as long as you stay below 50, an ASCII sort of the roman
numerals ends up in numeric order. (At 50 it breaks, because “L”
sorts between “I” and “V”.) If you were to sort the numerals
on a clock in ASCII, you would end up with “IX” coming before
“VIII”, but in the Elizabethan coding, “VIII”, “VIIII”, and “X”
come out in the right order (unlike “8”, “9”, and “10”).

I was thinking I might have to write some code to get the
pieces in the right order, but a typical Elizabethan music book
has 20 or 21 pieces in it, so using their roman numerals, I can
just tell MySQL to “order by” and everything just works!
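
The sorting claim is easy to check with a few lines of Python. This is a sketch that only covers 1 through 49; the function name `additive_roman` is made up for illustration, and it relies on Python comparing strings by code point, which matches ASCII order for these letters.

```python
def additive_roman(n: int) -> str:
    """Elizabethan-style numeral with no subtractive forms (4 is IIII)."""
    if not 1 <= n <= 49:
        raise ValueError("this sketch only covers 1 through 49")
    tens, rest = divmod(n, 10)
    fives, ones = divmod(rest, 5)
    return "X" * tens + "V" * fives + "I" * ones


numerals = [additive_roman(n) for n in range(1, 50)]

# A plain string sort leaves the additive numerals in numeric order.
assert sorted(numerals) == numerals

# Clock-style numerals do not survive the same sort: IX lands before VIII.
assert sorted(["VIII", "IX", "X"]) == ["IX", "VIII", "X"]

# At 50 the trick stops working: L sorts between I and V.
assert sorted(["IIII", "V", "L"]) == ["IIII", "L", "V"]
```

It works because “I” < “V” < “X” in ASCII and the additive form always writes the larger symbols first, so comparing two numerals character by character agrees with comparing their values.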

What ereader device do I recommend

You would expect a one-a-day blog project like this to run into
trouble in December, and this year is worse than most for that.
In addition to the party (and attendant cleaning and
cooking) and shopping and spending several days in Fall River, I
also lost a day on Tuesday officiating at a special election and
I have the December 17 concert and I’m trying to wrap up Bonnie’s
estate by the end of the year.

One of the tricks I’ve learned for writing a blog post when you
don’t have time to write a blog post is to take something out of
an email you wrote someone. This morning someone asked me to
tell them what ebook reader I recommend, and this is what I
wrote:

I don’t recommend any of the special-purpose e-readers, but the Sony
is probably better than the Kindle if you want to get locked into a
single-purpose, black and white device that doesn’t fit in your
pocket. Everybody who’s actually seen a Barnes and Noble Nook seems
to hate it, and apparently you won’t be able to get one until January
at the earliest.

What I use is a Nokia N810 Internet Tablet. They aren’t making them
any more, but you might still be able to buy one somewhere. They’ve
been replaced by the N900, which is also a phone, but not a phone
anyone in this country would want to actually use.

The iPod Touch is another one to think about. It’s a bit smaller and
lower resolution than what I have, but of course you can use all those
thousands of apps in the app store. John really likes one that has a
candle on the screen and you can blow on it and snuff it out.

Or of course, if they already have iPhones they should try the reading
applications on that. Stanza seems to be the one a lot of people
like.

If you don’t insist on putting it in your pocket, some of the netbooks
are good deals, and give you a lot more functionality for less money
than the Kindle or the Sony.

The best website for reading long discussions about this is
teleread.org.

I later added:

The other thing wrong with the e-ink devices (Sony and Kindle and
Nook) is that you need a reading light to read in bed.

And I should have added that some of them have fairly limited
support for using larger fonts, which is strange since being
able to read at your preferred font size is one of the major
advantages of ebooks over dead tree ones.

Corporate bureaucracy

A dog park friend works as a web developer at some corporation
that believes a computer is a computer is a computer.

So the computer they buy for everyone works quite well for the
people who use one browser and a word processor and maybe
a spreadsheet and maybe a mail client.

But if you do web development, you have to test on lots of
browsers, and often have lots of windows open in each, and an
editor with numerous windows open.

So the way he tells the story, he had requested numerous times
that he get more memory on his computer. And then one day he
realized that he could double the amount of memory for $10, and
he had that much in his pocket, so he did it. And then they
yelled at him for not following the proper procedure.

Borrowed another ebook

This one’s even worse than the first one from a usability
standpoint.

The problem is that this one’s a PDF file, but instead of
reading it with one of the many excellent PDF readers in the
world (including Adobe’s), I still have to read it with Adobe
Digital Editions.

Adobe Digital Editions, instead of having menus across the top
with helpful items like “rotate screen”, and “go to full screen for
the text”, has buttons scattered around the part of the screen
that isn’t text. With the epub format, two of the buttons
enlarged and reduced the font size, but PDFs don’t reflow,
so all you can do is change the size of the text window. The
largest size I managed to get on my 14-inch laptop is readable,
but if I had an “enlarge font” button, I would still push it.
Especially if I were trying to read in bed, which I haven’t
bothered to do with this one.

On reading the epub book last week, I found myself wishing I
had a netbook, but with this one, I doubt that I would be able
to get a readable size of text, so this book would probably be
even less readable with a netbook.

It isn’t clear what the rationale is for having some books in
epub format and some in PDF, but they seem to be about half and half,
so if there are only 108 books and half of them are unreadable,
that gives me even less incentive to buy another gadget.

I should mention that my eyes are a lot better than those of
most people my age. When I was younger I was unusually good at
reading fine print. Until I turned 40, I could read the
condensed Oxford English Dictionary without the magnifying
glass. Now I still don’t carry reading glasses
around with me, although in my home, I usually do have a pair
within reach. So if I can’t get a good font, there are a
lot of people in the world who can’t read the book even by squinting.

I think this is our tax dollars at work. It’s sad that people
whose job is to serve the public have so little concept of
how to implement technology to do that.