Lifehacker had a nice feature this morning on an online PDF-to-TXT tool. I don’t know about you, but I find myself hip deep in PDFs that I’m always having to OCR and turn into something else.
PDFTextOnline doesn’t require registration and lets you edit the text after conversion.
I wanted to give this a good test, so I tried a few things. One was a copy of Spinoza’s The Ethics, which had a nice presentation in PDF and was all text. It converted it perfectly, and the result was actually rather nice for a text file.
You can see the original here, and the text it produced here.
Pretty nice job, and better than you’d get copying and pasting.
So, let’s try something really hard...
A Google Books scan of William James’s Psychology, which is, to say the least, rather rough. Unfortunately, this didn’t work: it was a bit over the 10 MB limit. So, to try again, I grabbed his “The Meaning of Truth”, which was a bit smaller but equally rough.
The file, which was about 300 pages and 4 MB, took roughly a minute to convert, MUCH faster than any desktop OCR program I’ve ever used.
Of course, I found out why a moment later. The only page it captured was the Google attribution page; it just created blank pages for the rest. So, while it may be based on the “best PDF content extraction money can buy”, it really only works on PDFs that already contain text (either created that way or already run through OCR elsewhere).
This is a useful tool, since it saves you the tedious cleanup that copying and pasting would leave behind, but don’t expect it to do any heavy lifting.
Lifehacker - PDFs: Pull and Format Text from PDFs with PDFTextOnline