Reading PDF files on a portable device

This is a holy grail for people who want to carry their libraries in their pockets and read without glasses or special lighting. The problem is that a lot of the people who distribute ebooks seem to think that using a page-description language like PDF is a suitable distribution method. But actually reading something formatted for an 8″x6″ page on a 2″x3.5″ screen is difficult.

I have it figured out. I’m not sure if this is something that’s changed recently in the software, or if it was always like this and I was too stupid to figure it out.

I’ve asked on several mailing lists, and they seem to have been too stupid to figure it out, too. The best suggestion was from Peter, who posted a LaTeX file that resizes the PDF’s pages to be the size of your device, and trims the margins. But on my Nokia N810, I need reading glasses to read the resulting page. You can make a case that I should get better reading glasses, but I don’t think I’d enjoy even good ones.

My new method is as follows:

  • Run:
    pdftohtml -stdout pdf-filename > html-filename
  • Open the html-filename in emacs.
  • There are two major problems you want to fix here:
    • Every line in the original ends with a break. The original line-length of the paper book is unlikely to be useful on your portable device, so what you need, and what I’ve always before failed to find, is a way to distinguish the breaks that are actually new paragraphs from the ones that are just line breaks. For the PDF files I’ve looked at since yesterday, the ones that are just line breaks end in &nbsp;<br>, and the ones that are new paragraphs end in <br>. So what I’m doing these days is replacing the &nbsp;<br> with just a space. I may decide at some point to replace the <br>’s with <p>’s, but so far what I’m doing now looks pretty good.
    • There is junk like page numbers between the pages. This varies by book, but for the book I’m reading at the moment, there was a file url at the bottom of every page and an anchor tag of a line like “dummy 2” at the top of every page. It would be real programming to write something that would continue a paragraph across a page break like this, so I’m putting up with new pages translating to new paragraphs even when they obviously shouldn’t. But I’m using emacs to remove anything that writes some distracting non-text. In this case, that’s removing the file url and the “dummy 2” text. Be careful about the file url — it might not be on a line by itself.
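
For anyone who would rather script the two clean-up steps than do them by hand in emacs, here is a minimal sketch in Python. It is not the method I actually used, and it assumes the pdftohtml output really does mark line-wrap breaks as &nbsp;<br> (some versions may write &#160;<br/> instead, so the pattern accepts both). The function names and the junk pattern are hypothetical; look at your own file to decide what counts as junk.

```python
import re

# Assumption: line-wrap breaks look like '&nbsp;<br>' (or '&#160;<br/>'),
# while paragraph breaks are a bare '<br>'.
WRAP_BREAK = re.compile(r"(?:&nbsp;|&#160;)<br\s*/?>\s*", re.IGNORECASE)


def join_wrapped_lines(html: str) -> str:
    """Turn each line-wrap break (and the newline after it) into one space."""
    return WRAP_BREAK.sub(" ", html)


def drop_junk(html: str, junk_patterns: list[str]) -> str:
    """Delete every match of each regex, wherever it falls within a line."""
    for pattern in junk_patterns:
        html = re.sub(pattern, "", html)
    return html


# A made-up two-line sample with a file url that is not on a line by itself.
sample = "It was a dark and&nbsp;<br>\nstormy night. file:///tmp/book.html<br>\n"
cleaned = drop_junk(join_wrapped_lines(sample), [r"file://[^<\s]+"])
print(cleaned)
```

Because the junk is removed by pattern rather than by whole line, a file url that shares a line with real text is cut out without taking the text with it.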

If you’re using FBReader on a device with a sizeable memory card, you’re done. Just put this html file on your device.

Otherwise, do whatever you normally do to sizeable html files (zipping is probably a good idea) and put that on your device.

Marty Sasaki, RIP

Marty from post to his high school facebook page

Marty’s death apparently happened about six months ago.
He stopped posting to his blog
on August 13. His recorder teacher, who told me about it, had
seen him at her student recital (which may have been the one on
September 12) two days before he died.

Marty from fellow photographer’s tripod page

We shared a cubicle in 1981-2, when we were both programmers in
the Radiology Department of the Brigham and Women’s Hospital.
Although it was at that point one of the better jobs I’ve ever had
in my life, we both found some of the political aspects of it
frustrating. We would occasionally both get into his car and go
to a hill in Brookline and fly kites.

One of Marty’s kites

He was at that point not long out of MIT, and in much better
touch with the cutting edge of programming than I was, so I
learned a lot from him. He was the first person I ever saw using
emacs, and it was his copy of The TeXbook that
introduced me to Donald Knuth and TeX.

When he left that job for another job in the Harvard Medical
Area, he was the first person I ever kept in touch with by email
and a “talk” program that ran on the Vax.

We eventually fell out of touch, but then when I was just
starting to be the Administrator of the Boston Recorder
Society, I got an email from him (in my capacity as
administrator; we’d neither of us particularly identified as
recorder players when we knew each other). He was thinking about
picking up the recorder again, and wondered if what the BRS was
doing would help. He must have decided that it wouldn’t, because
I don’t think he ever came to one of our meetings, but he did get
involved in other recorder-related activities in the Boston area,
and I occasionally saw him there.

The most recent real conversation we had was when he came as
part of the group that spelled the Cantabile Band at the Walk for
Hunger last year. He was looking quite a bit thinner than when
I’d most recently seen him, and seeming more mobile. We talked
about how much more energy blogging takes than you would expect,
and about the process of winding up the affairs of a dead person.
He was talking to me instead of playing because he’d gotten
frustrated by the playing — most of the other players in the
group were a lot more experienced than he was. But I had a bit
the same sense of returning peace that I remembered from flying
kites on the hill in Brookline.

He will be remembered at a recital on Saturday.
I won’t be able to go, because there’s a memorial service for
another friend at the same time. Having conflicting memorial
services makes me feel old, but that’s another post.


I assumed when I started reading this
that it was a sequel to The Baroque Cycle, but it turns out that it was
actually published four years before Quicksilver,
the first volume of the Cycle.

Stephenson says about the project:

The series will incorporate many characters and
stories, tied together by a few common threads. For example,
certain family names keep popping up. Crypto, money, and
computers seem to find their way into all of the

I was sure I enjoyed Cryptonomicon more for
having read it after Baroque Cycle, but then I
reread the first chapter of Quicksilver because it
was provided free at the end of the Cryptonomicon
ebook, and I realized that I’d probably have enjoyed it more if
I’d already met Enoch Root and Daniel Waterhouse’s descendants,

So if you want to read long novels with topics to do with
history and science and technology, start wherever you like.
Probably the best guide is which period you’re more interested
in the history of: the 17th and 18th centuries or the 20th

I was amused that a book about the wonders of modern
cryptography would have the boilerplate DRM at the end:

By payment of the required fees, you have been granted
the non-exclusive, non-transferable right to access and read the
text of this e-book on-screen. No part of this text may be
reproduced, transmitted, down-loaded, decompiled, reverse
engineered, or stored in or introduced into any information
storage and retrieval system, in any form or by any means,
whether electronic or mechanical, now known or hereinafter
invented, without the express written permission of

I’m running late today, so I’ll reserve the right to discuss
this book more later, but consider this a recommendation.

The Elizabethans were doing ASCII sorting

One of the oddities of Elizabethan publishing, which I have
retained in my transcriptions of Elizabethan music, is that they
write roman numerals differently from the way your clock does.

Specifically, your clock writes “4” as “IV”, that is, one
subtracted from 5. The Elizabethans didn’t do that — they
wrote “IIII”, and similarly “VIIII” for “9” and “XVIIII” for “19”.

There turns out to be a major advantage to this for computer
sorting: as long as you stay below 50, the ascii roman numeral
sort ends up in numeric order. If you were to sort the digits
on a clock in ascii, you would end up with “IX” coming before
“VIII”, but in the Elizabethan coding, “VIII”, “VIIII”, and “X”
come out in the right order (unlike “8”, “9”, and “10”).

I was thinking I might have to write some code to get the
pieces in the right order, but a typical Elizabethan music book
has 20 or 21 pieces in it, so using their roman numerals, I can
just tell mysql to “order by” and everything just works!
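
Here is a small Python sketch that checks the claim. It is an illustration rather than anything from my transcription code, and the function name is made up. It builds the additive numerals for 1 through 49 and confirms that a plain string sort leaves them in numeric order. The reason it works is that the letters happen to fall in the order I, L, V, X in ascii, which is also why “L” for 50 would break it by sorting before “V”.

```python
def additive_roman(n: int) -> str:
    """Roman numeral for n with no subtractive forms (4 is IIII, 9 is VIIII)."""
    parts = []
    for value, letter in ((50, "L"), (10, "X"), (5, "V"), (1, "I")):
        count, n = divmod(n, value)
        parts.append(letter * count)
    return "".join(parts)


# Numerals for 1..49, generated in numeric order.
numerals = [additive_roman(n) for n in range(1, 50)]

# A plain string sort leaves them in exactly the same order.
print(sorted(numerals) == numerals)

# Clock-style numerals and decimal strings both sort out of order.
print(sorted(["VIII", "IX"]))
print(sorted(["8", "9", "10"]))
```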

What ereader device do I recommend

You would expect a one-a-day blog project like this to run into
trouble in December, and this year is worse than most for that.
In addition to the party (and attendant cleaning and
cooking) and shopping and spending several days in Fall River, I
also lost a day on Tuesday officiating at a special election and
I have the December 17 concert and I’m trying to wrap up Bonnie’s
estate by the end of the year.

One of the tricks I’ve learned for writing a blog post when you
don’t have time to write a blog post is to take something out of
an email you wrote someone. This morning someone asked me to
tell them what ebook reader I recommend, and this is what I

I don’t recommend any of the special-purpose e-readers, but the Sony
is probably better than the Kindle if you want to get locked into a
single-purpose, black and white device that doesn’t fit in your
pocket. Everybody who’s actually seen a Barnes and Noble Nook seems
to hate it, and apparently you won’t be able to get one until January
at the earliest.

What I use is a Nokia N810 Internet Tablet. They aren’t making them
any more, but you might still be able to buy one somewhere. They’ve
been replaced by the N900, which is also a phone, but not a phone
anyone in this country would want to actually use.

The iPod Touch is another one to think about. It’s a bit smaller and
lower resolution than what I have, but of course you can use all those
thousands of apps in the app store. John really likes one that has a
candle on the screen and you can blow on it and snuff it out.

Or of course, if they already have iPhones they should try the reading
applications on that. Stanza seems to be the one a lot of people

If you don’t insist on putting it in your pocket, some of the netbooks
are good deals, and give you a lot more functionality for less money
than the Kindle or the Sony.

The best website for reading long discussions about this is

I later added:

The other thing wrong with the e-ink devices (Sony and Kindle and
Nook) is that you need a reading light to read in bed.

And I should have added that some of them have fairly limited
support for using larger fonts, which is strange since being
able to read at your preferred font size is one of the major
advantages of ebooks over dead tree ones.

Corporate bureaucracy

A dog park friend works as a web developer at some corporation
that believes a computer is a computer is a computer.

So the computer they buy for everyone works quite well for the
people who use one browser and a wordprocessor and maybe
a spreadsheet and maybe a mail client.

But if you do web development, you have to test on lots of
browsers, and often have lots of windows open in each, and an
editor with numerous windows open.

So the way he tells the story, he had requested numerous times
that he get more memory on his computer. And then one day he
realized that he could double the amount of memory for $10, and
he had that much in his pocket, so he did it. And then they
yelled at him for not following the proper procedure.

Borrowed another ebook

This one’s even worse than the first one
from a usability standpoint.

The problem is that this one’s a PDF file, but instead of
reading it with one of the many excellent PDF readers in the
world (including Adobe’s), I still have to read it with Adobe
Digital Editions.

Adobe Digital Editions, instead of having menus across the top
with helpful items like “rotate screen”, and “go to full screen for
the text”, has buttons scattered around the part of the screen
that isn’t text. With the epub format, two of the buttons
enlarged and reduced the font size, but the PDFs don’t reflow,
so all you can do is change the size of the text window. The
largest size I managed to get on my 14 inch laptop is readable,
but if I had an “enlarge font” button, I would still push it.
Especially if I were trying to read in bed, which I haven’t
bothered to do with this one.

On reading the epub book last week, I found myself wishing I
had a netbook, but with this one, I doubt that I would be able
to get a readable size of text, so this book would probably be
even less readable with a netbook.

It isn’t clear what the rationale is for having some books in epub
format and some in PDF, but they seem to be about half and half,
so if there are only 108 books and half of them are unreadable,
that gives me even less incentive to buy another gadget.

I should mention that my eyes are a lot better than those of
most people my age. When I was younger I was unusually good at
reading fine print. Until I turned 40, I could read the
condensed Oxford English Dictionary without the magnifying
glass. Now I still don’t carry reading glasses
around with me, although in my home, I usually do have a pair
within reach. So if I can’t get a good font, there are a
lot of people in the world who can’t read the book even by squinting.

I think this is our tax dollars at work. It’s sad that people
whose job is to serve the public have so little concept of
how to implement technology to do that.

Bought some electronics

I had a conversation at the dog park a couple of weeks ago with someone who’s more
expert than I am about broadcast television and maybe some other
kinds of consumer electronics. I asked him what he has for an
audio setup in his living room.

He said he went with a cheap surround sound setup and is
replacing things as they break or he gets disgusted with

He currently has a good center speaker (because the original
cheap one broke), which makes TV and
movies sound pretty good. He’s thinking about upgrading the
front speakers, because when he plays music, it all goes through
those, and they don’t sound as good as the center channel.

We decided that I could just buy the center channel and a
receiver, and use my current speakers and subwoofer, plus the
rear speakers from my computer set which I never really bothered
to wire to the rear of the computer room.

In fact, when I went to order, the receiver I ended up with
came with a free pair of rear speakers, so I’ll be able to use the
computer rear speakers for whatever the 6th and 7th speakers
in a 7.1 channel setup are.

And I broke down and bought the cheapest blu-ray disk player
that connects to netflix. When I’m tired, turning the computer
on and booting windows and firing up the Internet Explorer
browser to watch my Netflix Watch Now stuff is too hard, and I
end up watching junk on the TV set.


I also wrote the local linux users mailing list for advice
about external drives that would stand up well to being put in
a backpack and taken to my mother’s once a month or so, so that
I’d have off-site backup.

The consensus was that you should buy an aluminum external
enclosure and a recognized brand of SATA internal drive with a
good warranty.

So what’s coming is 2 of those, and two Western Digital 500 GB
drives, for about $100.

Other stuff

And while I was at it I bought a long USB extension, because I
don’t seem to be able to keep wireless keyboards and mice
working on the living room computer.

And some speaker wire with connectors, in case what I have
isn’t the right stuff for connecting my old stuff to the new

This should all come at the end of next week. I’m sure I’ll have
things to tell you about it then.

I’m back, and what’s next

I seem to have returned to the land of the living — I woke up
this morning wanting to get out of bed and walk the dog. I then
did a reasonable imitation of my usual morning routine, and still
don’t feel like it’s quite time to go back to bed.

As far as what the diagnosis is, since it’s getting better and
not worse, I don’t see any need to burden the medical care system
with this problem, so you’re going to have to put up with my lay
diagnosis. I was running a fever for a good bit of Saturday and
most of Sunday, so I would normally call it flu, not a cold.

Because people have been worrying about flu lately, I’ve been
just saying it’s a cold. I’m not someone who’s ever had the kind
of cold a lot of people get where it slows them down for a week or
even longer, but they never run a fever or get into a state where
they should clearly be in bed. I suspect that this isn’t because
I’m immune to those viruses; I suspect it’s because the virus that
gives some people a stuffed up head but not much else for a week
gives me a fever and a stuffed up head for a couple of days.

But if it is flu, I had the regular flu vaccine 2 weeks ago.
So it’s either a regular flu virus that got in under the wire
before my immunity took hold (or even got a little bit of help
from the virus in the vaccine), or a flu strain that isn’t in the
regular vaccine. In which case, it’s entirely possible that it’s
H1N1. But if so, I don’t seem to be one of the people that H1N1

What I would have been doing if I hadn’t been in bed

I have to move the site from the old ISP (hostrocket) to the
new ISP (dreamhost). Note
that this isn’t in any way a criticism of hostrocket as a host if
it meets your needs. I acquired the dreamhost account when I
desperately needed a way to move a bunch of mailman
mailing lists to a new place. They’d been hosted on my home
machine when I had my internet connection from speakeasy, and this wasn’t
going to work when I started connecting with comcast.

Hostrocket doesn’t offer mailman, and while I could probably
have managed to move the mailman lists to what they offer instead,
the non-technical people who’ve been administering some of the
mailman lists would have had a lot of trouble, and I thought that
even for my purposes, mailman was better. So I found a coupon
code that gave me the first year of dreamhost hosting for very
little money. Last Spring I moved the music publishing part of
the site to dreamhost, and now I’m moving the rest of it, before
I owe hostrocket for another year.

Just moving the existing site to a place on dreamhost and
pointing the laymusic dns to the new place would be easy, but what
I’m trying to do is to move the pieces that should be on this site
and that I want to maintain
into the laymusic wordpress installation, and then I’ll just have
a pointer to the old stuff for historical reasons.

The job is a bit less tedious than it might be because of the
program that adds files to the wordpress media library. I may
write a version of that that creates a post from the part of a file between
certain markers. But mostly it’s tedious because it involves
doing minimal updating of a lot of stuff that could use major
rewriting, but that would be major thinking, and that isn’t going
to happen before October 15.

I have a cold

It was coming on yesterday, which is why after trying to make a
post come out through the masses of wool in my head all morning, I
gave up and tagged the post for the West Gallery Quire
as my post for October 2.

This isn’t quite as much cheating as when I use the posts I do
anyway for the Cantabile Band or
, but it’s pretty close, since there wasn’t any
actual writing involved.

I knew I should stop even trying to work later in the
afternoon, when I managed to break both the CSS and the DNS for
this site, and spent a fairly long time before managing to fix

So today I’m going to take the day off. This means I can’t
write you about how much fun the New England Sacred Harp
was going to be, or about what I cooked to take
there, since I’m not going.

I hope this cold clears up by tomorrow so that I can go and
write about those things then.