Reading PDF files on a portable device

This is a holy grail for people who want to carry their libraries in their pockets and read without glasses or special lighting. The problem is that a lot of the people who distribute ebooks seem to think that using a page-description language like PDF is a suitable distribution method. But actually reading something formatted for an 8″x6″ page on a 2″x3.5″ screen is difficult.

I have it figured out. I’m not sure if this is something that’s changed recently in the software, or if it was always like this and I was too stupid to figure it out.

I’ve asked on several mailing lists, and they seem to have been too stupid to figure it out, too. The best suggestion was from Peter, who posted a LaTeX file that resizes the PDF’s pages to be the size of your device, and trims the margins. But on my Nokia N810, I need reading glasses to read the resulting page. You can make a case that I should get better reading glasses, but I don’t think I’d enjoy even good ones.

My new method is as follows:

  • Run:
    pdftohtml -stdout pdf-filename > html-filename
  • open the html-filename in emacs.
  • There are two major problems you want to fix here:
    • Every line in the original ends with a break. The original line-length of the paper book is unlikely to be useful on your portable device, so what you need, and what I’ve always before failed to find, is a way to distinguish the breaks that are actually new paragraphs from the ones that are just line breaks. For the PDF files I’ve looked at since yesterday, the ones that are just line breaks end in &nbsp;<br>, and the ones that are new paragraphs end in <br>. So what I’m doing these days is replacing the &nbsp;<br> with just a space. I may decide at some point to replace the <br>s with <p>s, but so far what I’m doing now looks pretty good.
    • There is junk like page numbers between the pages. This varies by book, but for the book I’m reading at the moment, there was a file url at the bottom of every page and an anchor tag of a line like “dummy 2” at the top of every page. It would be real programming to write something that would continue a paragraph across a page break like this, so I’m putting up with new pages translating to new paragraphs even when they obviously shouldn’t. But I’m using emacs to remove anything that writes some distracting non-text. In this case, that’s removing the file url and the “dummy 2” text. Be careful about the file url — it might not be on a line by itself.
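
The two emacs fixes above could also be scripted. What follows is only a minimal sketch in Python, not the method actually used here. It assumes the HTML came from pdftohtml -stdout as in the first step, that soft line breaks look like &nbsp;<br> as described (matching &#160; as well is a guess), and that the junk can be described with regular expressions. The function names and the sample pattern are hypothetical, and every book will need its own patterns.

```python
import re
from typing import Iterable

# Soft line break as described above: a non-breaking space followed by <br>.
# Also accepting "&#160;" is an assumption; check what your file contains.
SOFT_BREAK = re.compile(r"(?:&nbsp;|&#160;)<br\s*/?>\s*", re.IGNORECASE)


def unwrap_soft_breaks(html: str) -> str:
    """Replace each soft line break with one space; bare <br> is left alone."""
    return SOFT_BREAK.sub(" ", html)


def drop_junk(html: str, junk_patterns: Iterable[str]) -> str:
    """Delete every match of each regular expression in junk_patterns."""
    for pattern in junk_patterns:
        html = re.sub(pattern, "", html)
    return html


def clean(html: str, junk_patterns: Iterable[str]) -> str:
    """Remove the junk first, then join the soft-wrapped lines."""
    return unwrap_soft_breaks(drop_junk(html, junk_patterns))
```

For example, clean(text, [r"file://[^\s<]+"]) would strip a file url wherever it appears, including in the middle of a line, and then unwrap the paragraphs. Paragraphs that are split across a page break still come out as two paragraphs, just as with the manual method.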

If you’re using FBReader on a device with a sizeable memory card, you’re done. Just put this html file on your device.

Otherwise, do whatever you normally do to sizeable html files (zipping is probably a good idea) and put that on your device.

Amazon and Macmillan

So far, the best comment I’ve read on the current war between
Amazon and Macmillan, which has caused a lot of books people
would be buying and reading to disappear from the Amazon
shelves, is this by Cory Doctorow on Boing Boing.

He points out how ridiculous both sides look — both Amazon masquerading as a defender of
consumer rights by demanding low prices for ebooks and Macmillan
masquerading as a friend of the book industry for demanding that
ebooks sell at the price of hardcovers.

He says:

If true, Macmillan demanding a $15 pricetag for its ebooks is just plain farcical. Although there are sunk costs in book production, including the considerable cost of talented editors, copy-editors, typesetters, PR people, marketers, and designers, the incremental cost of selling an ebook is zero. And audiences have noticed this. $15 is comparable to the discounted price for a new hardcover in a chain bookstore, and it costs more than zero to sell that book. Demanding parity pricing suggests that paper, logistics, warehousing, printing, returns and inventory control cost nothing. This is untrue on its face, and readers are aware of this fact.

If true, Amazon draping itself in the consumer-rights flag in demanding a fair price is even more farcical. Though Amazon’s physical-goods sales business is the best in the world when it comes to giving buyers a fair shake, this is materially untrue when it comes to electronic book sales, a sector that it dominates. As mentioned above, Amazon’s DRM and license terms on its Kindle (as well as on its Audible audiobooks division, which controls the major share of the world’s audiobook sales) are markedly unfair to readers. Amazon’s ebooks are locked (by contract and by DRM) to the Kindle (this is even true of the “DRM-free” Kindle books, which still have license terms that prohibit moving the books). This is not due to rightsholder-demands, either: as I discovered when I approached Amazon about selling my books without DRM and without a bad license agreement for Kindle and Audible, they will not allow copyright owners to modify their terms, nor to include text in the body of the work releasing readers from those terms.

…[lots of good stuff about the bad effect of DRM on the marketplace, LEC]

If Macmillan wants to flex its muscle on an issue of substance and moment, an issue that will make it the hero of readers and writers and booksellers everywhere, it can demand that Amazon, Apple, B&N, and all the other ebook readers allow for interoperability and remove contracts that undo centuries’ worth of book-ownership norms.

And if Amazon wants to throw its toys out of the pram over a consumer rights issue, let it announce that it will offer a fair deal for any book that publishers and writers will allow a fair deal — no DRM, no abusive EULA, just “This book is governed by 17USC, the United States Copyright Law. Do not violate that law.” Let Amazon label the books that are a bad deal for readers with warnings: “At the publisher’s request, this book is licensed under terms that prohibit reading it on other devices, selling it used, or giving it to your children.” And let them put a gleaming seal of approval on the books that offer fair terms and a fair shake.

And trust readers to make up their minds.

In combination with the Apple announcement that the new Apple
bookstore for the iPad will have a different proprietary
format for the books it sells, this has been a bad week for
readers of ebooks. I haven’t been buying DRM that can’t be
broken — maybe I should go back to not buying DRM that can’t be
*legally* broken.

I’m currently reading:

  • A hardcover from the library for my
    bedtime book (and dealing with the light and the reading glasses
    when I want to stop).
  • A DRM’d ebook from the library on my laptop for
    my reading downstairs.
  • A Project Gutenberg ebook on my Nokia
    for when I’m out of the house and don’t want to carry anything
    as heavy as either the dead tree book or the laptop.

It would really be nice if the publishers of the hardcover and
the library ebook would sell me what I want to buy and put their
books out in a format I can enjoy on my device of choice. I’m
not the only person who wants this, and there are publishers (Baen for instance) who seem to
stay in business selling it to me and others like me. But it’s
not looking like either the big publishers or the retailers are
getting the message.

Publishing on the web

I’ve been sending a lot of email lately to people who
transcribe music the way I do and are wondering whether and how to put it on
the web.

Putting other people’s transcriptions on my site is addressed briefly in the
, but of course there are lots more details than a two
paragraph answer can deal with.

One person who’s also a student of my recorder teacher
transcribes in Sibelius. She gives printouts to anyone who
asks, but seems to have decided putting it on the web is
impossibly complicated. My teacher has been really excited
about being able to point workshop students to the music he’s
going to be using on the web, so that they can look at it
beforehand, and has been encouraging her to get hers up, too. She discussed it with the Sibelius
support people, but her eyes glazed over when they said “install
a PDF writer”. Apparently she has an old wreck of a computer
that breaks when you install pretty much anything. So if she
hadn’t figured out how to do it in 2003, that computer is never
going to be able to do it, and she doesn’t like computers enough
to want to spend $200 on a better one.

Another person is doing transcriptions from Petrucci’s
Odhecaton. He’s quite capable of putting his own site up, and
had decided to use a WordPress blog
for his transcriptions. We have an ongoing
conversation about how to provide the kinds of transcriptions
various kinds of players want. I thought about the blog
solution when I was setting up the Serpent Publications
site, but was having too much trouble using the WordPress
media stuff, and I already had the database set up. I suspect
that when he has a few dozen transcriptions, he’ll find the blog
solution clumsy, but it should work fine until then. I would
probably have used if it had been available when I was starting

A third person has essentially transcribed all of Dowland’s
part songs, including the lute tablature, and converted the lute
tablature to notation suitable for guitar players. This would
actually be a really good supplement to the Dowland
that’s on my site, and I’d be happy to have it, but he hasn’t yet
done any thinking about licensing, so I pointed him to some
reading matter, and haven’t heard from him since. There is a
lot of stuff to think about. I also suggested if what he really
wants to do is sell his work.

It’s quite exciting to be in touch with so many people
doing the kind of thing I do. I hope they all get what they
want out of doing it.

The Elizabethans were doing ASCII sorting

One of the oddities of Elizabethan publishing, which I have
retained in my transcriptions of Elizabethan music, is that they
write roman numerals differently from the way your clock does.

Specifically, your clock writes “4” as “IV”, that is, one
subtracted from 5. The Elizabethans didn’t do that — they
wrote “IIII”, and similarly “VIIII” for “9” and “XVIIII” for
“19”.

There turns out to be a major advantage to this for computer
sorting: as long as you stay below 50, where “L” would come in,
the ascii roman numeral sort ends up in numeric order, because
“I” sorts before “V” and “V” before “X”. If you were to sort the
digits on a clock in ascii, you would end up with “IX” coming
before “VIII”, but in the Elizabethan coding, “VIII”, “VIIII”,
and “X” come out in the right order (unlike “8”, “9”, and “10”).

I was thinking I might have to write some code to get the
pieces in the right order, but a typical Elizabethan music book
has 20 or 21 pieces in it, so using their roman numerals, I can
just tell mysql to “order by” and everything just works!
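
The sorting claim is easy to check. Here is a minimal sketch in Python; the function names are made up for illustration, and it assumes the database compares “I”, “V”, and “X” in the same relative order that ascii does, which is what Python's own string comparison uses. It builds the Elizabethan numerals for 1 to 49 and the clock-style numerals for 1 to 12, then compares each list with its sorted self.

```python
def elizabethan_roman(n: int) -> str:
    """Write n (1..49) additively: 4 -> IIII, 9 -> VIIII, 19 -> XVIIII."""
    if not 1 <= n <= 49:
        raise ValueError("this sketch only handles 1..49")
    tens, rest = divmod(n, 10)
    fives, ones = divmod(rest, 5)
    return "X" * tens + "V" * fives + "I" * ones


def clock_roman(n: int) -> str:
    """Write n (1..12) with the subtractive forms IV and IX."""
    if not 1 <= n <= 12:
        raise ValueError("this sketch only handles 1..12")
    result = ""
    for value, symbol in ((10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")):
        while n >= value:
            result += symbol
            n -= value
    return result


elizabethan = [elizabethan_roman(n) for n in range(1, 50)]
clock = [clock_roman(n) for n in range(1, 13)]

# A plain string sort leaves the additive numerals in numeric order...
print(sorted(elizabethan) == elizabethan)
# ...but shuffles the clock-style ones (IX lands before V).
print(sorted(clock) == clock)
```

The first comparison comes out true and the second false. Adding “L” for 50 breaks the first one as well, because “L” sorts after “IIII” and before “V”.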


Last October, there was a ruling from the FTC that proposed
hefty fines for bloggers who fail to disclose “compensation” for
their reviews. It’s described in indignant detail on the
Teleread blog.

I’ve been ignoring that ruling. It certainly doesn’t apply
directly to me. I’m not organized enough to ask for free copies
of the books and DVDs I review, and I certainly don’t have any
other direct compensation for what I do on this blog.

I don’t know how to explain my relationship with Google Adsense
in terms that won’t violate my agreement with them, but I found a
place to point that does.

When I link to products on, I allegedly get a cut if
you order them. But I really don’t think I’ve been telling you to
order the products unless I really like them, and I certainly
haven’t been telling anyone to get books or movies there unless
that’s really the way they like buying books and movies.

It’s been a long time since I actually got a check from either
of these programs.

I personally get almost all the movies I watch from Netflix. When I buy ebooks,
I get them from Fictionwise, which has a
much more enlightened policy on DRM than does Amazon, and a
discount structure that allows you to at least pretend you’re
getting a lot of free books. I get most of my hard copy books
from the library; if I think I really want to own a dead tree
copy, I either go to a bricks and mortar bookstore or order
online, often used.

Lots of the other things I buy I haven’t gotten from Amazon,
either, even if I pointed to the picture that Amazon keeps
online for us.

We are a Schwerpunkt

A couple of days ago, I added the Counterize WordPress plugin
to this blog, so I’ve been wasting
time looking at all the information they pull out of the logs
for me.

It turns out that one of the top sources of referrals is a site
which says:

Von Laura Conrad transkribierte Stücke, Schwerpunkt Renaissance.

My German comes mostly from singing Schubert, but this might
mean something like “From Laura Conrad, transcribed pieces,
Focal Point for Renaissance music.” Instead of “Focal Point”, you
could say “Center of Gravity”, too. There might be some more
idiomatic translation, but I like both of these too well to go
looking for it.

I had to look up “Schwerpunkt”, since Schubert and his
publishers don’t use the word. And the first definitions I
found were from the military use of the word as one of the
possible tactics in a blitzkrieg. So I wasn’t sure it
was much of a compliment, but I decided they probably meant that
this was a good site to go to for transcriptions of Renaissance
music, which is a good thing to have people say.

I did write them and point out that the music transcriptions
have been moved to,
so they might want to change their link.

Three Quarters Done, and Happy Thanksgiving

Yesterday was the three quarter mark on this year of blogging
every day. I’ve been meditating on how I’ll blog when I don’t
have to do it every day.

There will be fewer junk posts because it’s almost lunch time
and I have to write something.

There will also be no posts at all on days I don’t have time to
write one.

But I believe I read books and watch movies with more
concentration because I know I’ll want to write about it later,
and I’ll keep doing that. And I’ll keep writing about the toys
I want to complain about.

This is a short, easy one because I still have housecleaning to
do before I make the turkey and stuffing and cranberry sauce and
organize the drinks and appetizers. My guests are making most
of the side dishes, but there’s still a lot of work.

So Happy Thanksgiving, if you’re one of the people who
celebrates it today.

Agent to the Stars, by John Scalzi

I was reminded that I hadn’t told you about Agent to the
Stars, which I read a few weeks ago, when I read John
Scalzi’s blog entry this morning.

The blog says that if you’re going to self-publish, please
don’t pay anyone to do it. You can get it online for free, and if
you need hardcopy, he recommends, as do I.

As someone to get advice about self-publishing from, he’s one
of the obvious successes. He wrote Agent to the Stars
when he was a struggling young writer, and published
it online, suggesting that people send him a dollar if they liked
it. He wasn’t expecting to make much money that way, but he
stopped counting when he had $4,000, and the interest in the free
online book made it easier for him to sell his subsequent books to
“real” publishers.

Anyway, if you’re looking for a funny, lightweight science
fiction novel to read, I recommend this one. I can’t tell you
much about the plot, since there would be spoilers, but it’s very
well done. And you can download it for free if you have a way to
read books in html form that you enjoy.

Julie and Julia (the book)

So far I’ve only read the book;
I’ll probably tell you more when the movie comes out on DVD and I
see it next month or so.

I enjoyed it. When I realized how big a pain reading the PDF
from the library was, I decided that if it wasn’t finished
by the time it expired, with just reading it on the laptop at
lunchtime, I would take the hardcover out of the library. But
then I saw that Fictionwise had a 100%
rebate on it, so I bought it from there.

100% rebates aren’t quite the same thing as getting something
for free. It’s their way of getting people to sometimes send
them money even if they’re mostly shopping on micropay rebates.
So you shouldn’t get the 100% rebate if you aren’t going to use
it to buy something you really want, but if there are several books on your wishlist that
you’re intending to give them money for, you might as well give
them money for something else, and then get the books you really
want for free. So I finished Julie and Julia in the comfort of my normal
reading device.

I discussed it with a friend who
said she’d enjoyed it, but she had several friends who hadn’t
because of the liberal use of the f-word. This could be another
post, but the conclusion of the other post would be that I don’t believe in judging people because of
their use of that diction, but I don’t use it because I’m aware
that there are a lot of people who do.

In any case, it was fun to read about someone tackling all
those recipes hardly anyone does these days. She finishes with
the Pâté de Canard en Croûte,
where you bone the duck and stuff it with pâté and
then bake it inside of a pastry shell. Most food writers
wouldn’t describe their hysterical weeping fits when the pastry
went straight from a too-dry heap to a buttery puddle.

The other impressive thing was actually doing it at all. I’ve
been feeling heroic for just getting a blog entry out there
every day, when I don’t even have a job or a commute. She not
only did a blog post in the morning before work, but put
together a shopping list, then shopped on the way home and
cooked after that. She got some help on the shopping and
cooking from her husband and friends, but really it was a pretty
heroic effort.

I thought that the book was a little long for the
material, but of course that may well make it a better

Borrowed another ebook

This one’s even worse than the first one from a usability
standpoint.

The problem is that this one’s a PDF file, but instead of
reading it with one of the many excellent PDF readers in the
world (including Adobe’s), I still have to read it with Adobe
Digital Editions.

Adobe Digital Editions, instead of having menus across the top
with helpful items like “rotate screen”, and “go to full screen for
the text”, has buttons scattered around the part of the screen
that isn’t text. With the epub format, two of the buttons
enlarged and reduced the font size, but the PDFs don’t reflow,
so all you can do is change the size of the text window. The
largest size I managed to get on my 14 inch laptop is readable,
but if I had an “enlarge font” button, I would still push it.
Especially if I were trying to read in bed, which I haven’t
bothered to do with this one.

On reading the epub book last week, I found myself wishing I
had a netbook, but with this one, I doubt that I would be able
to get a readable size of text, so this book would probably be
even less readable with a netbook.

It isn’t clear what the rationale is for having some books in epub
format and some in PDF, but they seem to be about half and half,
so if there are only 108 books and half of them are unreadable,
that gives me even less incentive to buy another gadget.

I should mention that my eyes are a lot better than those of
most people my age. When I was younger I was unusually good at
reading fine print. Until I turned 40, I could read the
condensed Oxford English Dictionary without the magnifying
glass. Now I still don’t carry reading glasses
around with me, although in my home, I usually do have a pair
within reach. So if I can’t get a good font, there are a
lot of people in the world who can’t read the book even by squinting.

I think this is our tax dollars at work. It’s sad that people
whose job is to serve the public have so little concept of
how to implement technology to do that.