Reading PDF files on a portable device

This is a holy grail for people who want to carry their libraries in their pockets and read without glasses or special lighting. The problem is that a lot of the people who distribute ebooks seem to think that a page-description language like PDF is a suitable distribution format. But actually reading something formatted for an 8″x6″ page on a 2″x3.5″ screen is difficult.

I have it figured out. I’m not sure if this is something that’s changed recently in the software, or if it was always like this and I was too stupid to figure it out.

I’ve asked on several mailing lists, and they seem to have been too stupid to figure it out, too. The best suggestion was from Peter Flynn, who posted a LaTeX file that resizes the PDF’s pages to be the size of your device, and trims the margins. But on my Nokia N810, I need reading glasses to read the resulting page. You can make a case that I should get better reading glasses, but I don’t think I’d enjoy even good ones.

My new method is as follows:

  • Run:
    pdftohtml -stdout pdf-filename > html-filename
  • Open html-filename in emacs.
  • There are two major problems you want to fix here:
    • Every line in the original ends with a break. The line length of the paper book is unlikely to be useful on your portable device, so what you need, and what I’ve always failed to find before, is a way to distinguish the breaks that start new paragraphs from the ones that are just line breaks. For the PDF files I’ve looked at since yesterday, the ones that are just line breaks end in &nbsp;<br>, and the ones that are new paragraphs end in <br>. So what I’m doing these days is replacing the &nbsp;<br> with just a space. I may decide at some point to replace the <br>s with <p>s, but so far what I’m doing now looks pretty good.
    • There is junk like page numbers between the pages. This varies by book, but for the book I’m reading at the moment, there was a file url at the bottom of every page and an anchor tag with a line like “dummy 2” at the top of every page. It would be real programming to write something that would continue a paragraph across a page break like this, so I’m putting up with new pages translating to new paragraphs even when they obviously shouldn’t. But I’m using emacs to remove anything that shows up as distracting non-text. In this case, that’s the file url and the “dummy 2” text. Be careful about the file url: it might not be on a line by itself.
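
If you’d rather script the two fixes than do them by hand in emacs, here is a minimal Python sketch. It assumes the pdftohtml output marks soft line breaks exactly as &nbsp;<br>, as described above; other versions or other PDFs may use a different marker. The function names are made up for illustration, and the junk patterns are something you have to work out per book by looking at the file.

```python
import re

# Marker for a "just a line break" as seen in the files described above.
# This is an assumption about the pdftohtml output; check your own file.
SOFT_BREAK = "&nbsp;<br>"


def join_soft_breaks(html):
    """Replace each soft line break with a single space."""
    return html.replace(SOFT_BREAK, " ")


def strip_junk(html, junk_patterns):
    """Delete every match of each regular expression in junk_patterns.

    Only the matched text is removed, not the whole line, because the
    junk (for example a file url) may share a line with real text.
    """
    for pattern in junk_patterns:
        html = re.sub(pattern, "", html)
    return html


def clean(html, junk_patterns=()):
    """Strip the junk first, then join the soft line breaks."""
    return join_soft_breaks(strip_junk(html, junk_patterns))


def clean_file(src, dst, junk_patterns=()):
    """Read src, clean it, and write the result to dst (UTF-8 assumed)."""
    with open(src, encoding="utf-8") as infile:
        html = infile.read()
    with open(dst, "w", encoding="utf-8") as outfile:
        outfile.write(clean(html, junk_patterns))


# Example call with hypothetical file names and patterns:
# clean_file("book.html", "book-clean.html", [r"dummy \d+", r"file://[^<\s]+"])
```

Like the emacs approach, this leaves the real <br> paragraph breaks alone and does not try to continue a paragraph across a page break.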

If you’re using FBReader on a device with a sizeable memory card, you’re done. Just put this html file on your device.

Otherwise, do whatever you normally do to sizeable html files (zipping is probably a good idea) and put that on your device.
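
For the zipping step, a minimal sketch using Python’s standard zipfile module follows. The function name and the file names are hypothetical, and whether your reader software opens a zipped html file directly is something to check on your own device.

```python
import zipfile
from pathlib import Path


def zip_html(html_path, zip_path=None):
    """Compress one html file into a zip archive and return the archive path.

    If zip_path is not given, the archive is written next to the html file
    with the same name and a .zip extension.
    """
    html_path = Path(html_path)
    zip_path = Path(zip_path) if zip_path else html_path.with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store only the file name, not the directories leading to it.
        archive.write(html_path, arcname=html_path.name)
    return zip_path


# Example call with a hypothetical file name:
# zip_html("book-clean.html")
```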
