Digitizing my paperback books
Part one: what do I want?
I hinted at this topic in a previous post. I have a big collection of (mostly) paperback Science Fiction books – some hardcover books too. I used to read a lot more in the pre-Internet days; nowadays it is only during my holidays that I find enough time to read whole books. As a result, many of those old paperbacks are 20-30 years old and yellowed.
In this digital age it would be appropriate to have digital versions of my books and save them from crumbling to dust. I am looking forward to Sony’s new e-reader, the PRS-T1, which I want to buy once it is out.
This is a very nice device. It is also a lot cheaper than the previous generation Sony e-reader (the PRS-650) while at the same time adding wireless connectivity. Once I have it in my possession, the device will need content.
A lot of the “newer” books, and those written by contemporary authors, can be purchased online, or downloaded from fan sites where people scan their own collections into EPUB or MOBI e-books. That is all well and good, but on my bookshelves I have many dozens of good books that will probably never see a new life as an e-book. That is very unfortunate… I had a lot of fun reading them and do not want to see them sink into oblivion.
I decided to do something about this. I am going to try and describe (and hopefully implement) how I am going to digitize my book library. Note: at the moment this is all just ideas, “dreams” if you wish, although I have already found quite a bit of information on the Internet that I will be sharing with you. I want it to be more than just a dream.
What does one need to get a paper book converted into an e-book?
- the book’s pages need to be scanned
- the scanned bitmaps may have to be cleaned up digitally (enhancing the contrast between characters and background, de-skewing or rotating the text blocks, …)
- I need an Optical Character Recognition (OCR) program to convert the bitmap images into character text
- I need an e-book editor to lay out the bare text that I got from the OCR program – the e-book has to look largely like the original paper version.
- I want to use a library program to make my book catalogue available, to myself of course, to my e-reader device, and possibly to other interested parties.
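To make the clean-up step in the list above a bit more tangible, here is a minimal sketch of what “enhancing the contrast between characters and background” could mean. It is only an illustration, not the tool I will end up using: the page is assumed to be a small grayscale bitmap held in plain Python lists (0 is black, 255 is white), and the names `Bitmap`, `mean_threshold` and `binarize` are made up for this example. The cut-off is simply the average gray level of the page, which is about the crudest choice possible.

```python
from typing import List

# Assumption for this sketch: a page is a list of rows of gray levels,
# where 0 is black ink and 255 is white paper.
Bitmap = List[List[int]]


def mean_threshold(page: Bitmap) -> float:
    """Return the average gray level of the page, used as a crude cut-off."""
    pixels = [p for row in page for p in row]
    return sum(pixels) / len(pixels)


def binarize(page: Bitmap) -> Bitmap:
    """Push every pixel to pure black or pure white around the mean level."""
    cutoff = mean_threshold(page)
    return [[0 if p < cutoff else 255 for p in row] for row in page]


# A tiny made-up "scan": yellowed paper (around 200) with faded ink (around 90).
scan = [
    [205, 198, 92, 201],
    [199, 88, 95, 203],
    [202, 197, 90, 200],
]
clean = binarize(scan)

for row in clean:
    print(row)
```

After this step the yellowed background has become plain white and the faded ink plain black, which is the kind of input an OCR program is likely to handle better. A real scan would need a smarter threshold than the page average, plus the de-skewing mentioned in the list.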
And I want this to be as low-cost as possible. Any software that I am going to use for this should preferably be Open Source and run on Slackware.
Those are the main topics I will write about. Each of these topics deserves its own separate article. Why is that?
I can already see how this project will confront me with interesting challenges. I am going to write a multi-post story with interlinked articles (this being the first) in order to preserve this hobby project of mine for posterity. Having separate topic articles allows me to split up your feedback as well (heh… I hope I do get some feedback!), so that discussions about, say, scanning techniques will not interfere with talk about what is the best OCR program for Linux.
The articles are not going to be “static” per se. I value your feedback and important new insights will find their way back into the main text.
Let’s see where this ends. It is probably going to take days, or weeks, to write. It depends a bit on Slackware development – if that picks up speed again, I will have less time for this e-book side show. But for the moment, there is silence in the ChangeLog.txt and I have time to spare.