
Digitizing my paperback books

Part one: what do I want?

I hinted at this topic in a previous post. I have a big collection of (mostly) paperback Science Fiction books – some hardcover books too. I used to read a lot more in the pre-Internet days; nowadays it’s only during my holidays that I find enough time to read whole books in one go. Many of those old paperbacks are 20-30 years old and have yellowed.

In this digital age it would be appropriate to have digital versions of my books and save them from crumbling to dust. I am looking forward to Sony’s new e-reader, the PRS-T1, which I want to buy once it is out.

This is a very nice device. It is also a lot cheaper than the previous-generation Sony e-reader (the PRS-650) while adding wireless connectivity. Once I have it in my possession, this device will need content.

A lot of the “newer” books, and those written by contemporary authors, can be purchased online, or downloaded from fan sites where people scan their own collections into EPUB or MOBI e-books. That is all well and good, but on my bookshelves I have many dozens of good books that will probably never see a new life as an e-book. That is very unfortunate… I had a lot of fun reading them and do not want to see them fall into oblivion.

I decided to do something about this. I am going to try to describe (and hopefully implement) how I will digitize my book library. Note: at the moment these are all just ideas, “dreams” if you wish, although I have already found quite a bit of information on the Internet that I will be sharing with you. I want it to be more than just a dream.

What does one need to get a paper book converted into an e-book?

  1. the book’s pages need to be scanned
  2. the scanned bitmaps may have to be cleaned up digitally (enhancing the contrast between characters and background, de-skewing or rotating the text blocks, …)
  3. I need an Optical Character Recognition (OCR) program to convert the bitmap images into character text
  4. I need an e-book editor to lay out the bare text that I got from the OCR program – the e-book has to look largely like the original paper version.
  5. I want to use a library program to make my book catalogue available, to myself of course, to my e-reader device, and possibly to other interested parties.
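No tools have been chosen yet at this point, so what follows is only a sketch under my own assumptions: it uses SANE’s scanimage, ImageMagick’s convert and the Tesseract OCR engine as plausible Open Source candidates for steps 1 to 3. The script does not drive the scanner itself; it only prints the commands for each page so they can be inspected first. The page-NNNN naming scheme and the 300 dpi, 40% deskew and 60% threshold values are arbitrary starting points, not tested settings, and --resolution is a backend-specific scanimage option that most (not all) scanners offer.

```shell
#!/bin/bash
# Hypothetical sketch: print the commands for steps 1-3 (scan, clean up, OCR)
# of one book page. Tool choice (scanimage, convert, tesseract) is an assumption.

page_commands() {
  local page="$1"   # base name for this page's files, e.g. page-0001

  # 1. Scan at 300 dpi; scanimage writes the image to stdout.
  echo "scanimage --format=tiff --resolution 300 > ${page}-raw.tif"

  # 2. Clean up: straighten skewed text, then force black text on white.
  echo "convert ${page}-raw.tif -deskew 40% -threshold 60% ${page}.tif"

  # 3. OCR: tesseract appends .txt to the output base name (${page}.txt).
  echo "tesseract ${page}.tif ${page}"
}

# Print the commands for the first three pages, zero-padded: page-0001, ...
for n in 1 2 3; do
  page_commands "$(printf 'page-%04d' "$n")"
done
```

Piping the output into sh would execute everything at once, but since each page has to be turned by hand between scans, running the commands one page at a time is more realistic.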

And I want this to be as low-cost as possible. Any software that I am going to use for this should preferably be Open Source and run on Slackware.

Those are the main topics I will write about. Each of these topics deserves its own separate article. Why is that?

I can already see how this project will confront me with interesting challenges. I am going to write a multi-post story with interlinked articles (this being the first) in order to preserve this hobby project of mine for posterity. Having separate topic articles allows me to split up your feedback as well (heh… I hope I do get some feedback!), so that discussions about, say, scanning techniques will not interfere with talk about what is the best OCR program for Linux.

The articles are not going to be “static” per se. I value your feedback and important new insights will find their way back into the main text.

Let’s see where this ends. It is probably going to take days, or weeks, to write. It depends a bit on Slackware development – if that picks up speed again, I will have less time for this e-book side show. But for the moment, there is silence in the ChangeLog.txt and I have time to spare.

Eric

E-book management on Slackware

Managing your e-book collection

In an earlier post, I hinted at a Slackware package I was trying to create for Calibre. The reason: I bought my wife an e-reader, a Sony PRS-650 with a touch screen that uses infrared instead of a touch-sensitive layer, and Pearl e-ink technology. Both of those features make for an extremely pleasant reading experience.

However, the Sony Reader software that accompanies the device is, first, a Windows-only application (of course…) and second, not all that much of an application either. Even on Windows, the usual advice is to install Calibre for managing your e-books, including uploading them to your device.

So, I needed to have Calibre available on my Slackware computers. In fact, I used to have a package for Calibre already! At some point the Calibre developer raised the minimum required version of the Python interpreter to 2.7.1. And since Slackware ships Python 2.6.6 to this day, I was no longer able to compile updated packages (I got stuck at 0.7.23, but I guess it would have been possible to keep compiling Calibre as far as 0.7.45).

I still wanted a recent version of Calibre; the software gets updates about once a week! So I spent quite a lot of time researching how I could embed a Python interpreter plus several supporting Python modules in the Calibre package.
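The post does not show how the packaging handles this, so here is a hypothetical helper (not taken from calibre.SlackBuild) that illustrates the reasoning: compare the system Python’s version against the 2.7.1 minimum and report whether the package could use it or has to bundle its own interpreter. It assumes GNU sort with its -V (version sort) option; the function names are made up for this example.

```shell
#!/bin/sh
# Hypothetical sketch: is the system Python new enough for Calibre,
# or does the package need its own embedded interpreter?

REQUIRED_PY="2.7.1"   # minimum Python version Calibre demands

# version_ge A B: exit status 0 when version A >= version B.
# Sorts both strings with GNU sort -V; if B comes first, then A >= B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# python_plan VERSION: print "system" or "embedded".
python_plan() {
  if version_ge "$1" "$REQUIRED_PY"; then
    echo "system"
  else
    echo "embedded"
  fi
}

# Query the installed interpreter (empty if no "python" is on the PATH).
SYS_PY=$(python -c 'import sys; print("%d.%d.%d" % tuple(sys.version_info[:3]))' 2>/dev/null)
echo "System Python: ${SYS_PY:-none} -> use $(python_plan "${SYS_PY:-0}") interpreter"
```

On a stock Slackware 13.37 box with Python 2.6.6 this would report "embedded", which is exactly the situation described here.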

And I think I succeeded. Yesterday I uploaded Slackware packages for calibre-0.8.6 to my repository (for Slackware 13.1 as well as Slackware 13.37). During the period when I did not actually have an e-reader at my disposal (it arrived at the house only a few days ago), I relied on the testing genius of my pal mrgoblin, who happened to have an e-reader device in his possession. His beta tests made me realize that I was missing the dbus-python module, which Calibre needs in order to recognize when a device is plugged into the computer.

I must say, using Calibre is a lot of fun! I have a small collection of e-books and after installing the Slackware package, I was able to transfer my books to the Sony device and read them there. Then, I managed to almost brick the device by ripping out the USB cable before selecting “Eject Device” in Calibre… let that be a warning for prospective users! It took a lot of reading about soft and hard resets before I had a working e-reader again. I had to reset the device to factory defaults, which means you lose all the books that were already present on the device. It was a good learning experience with only minor inconvenience (because I had transferred only two books to the e-reader at the time), but I kept feeling my wife’s prying eyes on the back of my neck… she was not too pleased to see her birthday present getting bricked only 15 minutes after unpacking it!

Calibre will also be very useful for everyone who owns a Kindle (Amazon’s own e-reader). The Kindle only accepts Amazon’s own MOBI format and refuses the “open” EPUB format (which is the most commonly used e-book format outside the US). Using Calibre, you can easily convert your EPUB collection to MOBI format: when you select an EPUB file and send it to a Kindle, Calibre shows a dialog offering to convert it automatically to the Kindle’s format. Perfect!
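The same conversion can be scripted with ebook-convert, the command-line converter that ships with Calibre. The sketch below is my own illustration, not something from this post: it converts every .epub file in a directory into a .mobi file next to it and skips books that were already converted. It only recognizes lower-case .epub extensions, and the function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch: batch-convert EPUB files to MOBI with Calibre's
# ebook-convert tool, skipping books that already have a .mobi copy.

# mobi_name FILE: derive the output name, e.g. "Dune.epub" -> "Dune.mobi".
mobi_name() {
  printf '%s\n' "${1%.epub}.mobi"
}

# convert_dir [DIR]: convert all *.epub files in DIR (default: current dir).
convert_dir() {
  local dir="${1:-.}" epub mobi
  if ! command -v ebook-convert >/dev/null 2>&1; then
    echo "ebook-convert not found; is Calibre installed?" >&2
    return 1
  fi
  for epub in "$dir"/*.epub; do
    [ -e "$epub" ] || continue      # no match: the glob stayed literal
    mobi=$(mobi_name "$epub")
    [ -e "$mobi" ] && continue      # already converted earlier
    ebook-convert "$epub" "$mobi"
  done
}
```

After sourcing the script, running convert_dir on your e-book directory would leave a MOBI copy beside each EPUB, ready to be copied to the Kindle.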

OK, enough talk. Get a package and/or SlackBuild script at:

and don’t forget to also install the icu4c and podofo packages; these two are the only dependencies now. If you want to build the package yourself, be warned if you are running Slackware-current: there is a bug in the “file” utility in Slackware-current which prevents it from recognizing a ZIP file as such, and this bug will cause the SlackBuild script to fail. Thanks go to Francesco Allertsen, who first ran into this issue and reported it to me. A quick fix is to change line 235 in the calibre.SlackBuild script from:

if $(file ${SOURCE[$i]} | grep -qi ": zip"); then

to:

if $(echo "${SOURCE[$i]}" | grep -qi "\.zip$"); then

I hope to see a fixed “file” soon. A bugfix has been applied to the file repository already, so file-5.08 should detect ZIP files correctly when it gets released.
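Another way to avoid depending on either “file” or the file name is to look at the archive’s first four bytes: ordinary ZIP archives start with the local file header signature PK\x03\x04 (hex 50 4b 03 04). This is my own suggestion, not what calibre.SlackBuild does, and is_zip is a made-up helper name. An empty ZIP archive starts with a different signature and would not be detected, and the sketch assumes a head that supports -c (GNU and BSD versions do).

```shell
#!/bin/sh
# Hypothetical sketch: detect a ZIP archive by its magic bytes instead of
# relying on the "file" utility or on the ".zip" file name extension.

# is_zip FILE: exit status 0 when FILE begins with "PK\x03\x04".
is_zip() {
  # od prints the first 4 bytes as hex (" 50 4b 03 04"); tr strips blanks.
  [ "$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')" = "504b0304" ]
}
```

In the SlackBuild, the test on line 235 could then become: if is_zip "${SOURCE[$i]}"; then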

Have fun! Eric