Extracting text from PDF

Helge Blischke h.blischke at srz.de
Wed Jun 27 07:25:30 PDT 2007

Rolf wrote:
> I'm trying to extract text automatically from PDFs with pdftotext to make them searchable. This usually works well, except with PDFs generated by cups-pdf. There is no text output at all. What is the reason for this and is there a way to change this?
> regards, Rolf

cups-pdf uses Ghostscript to generate the PDF. Your problem is
most likely due to a Ghostscript version that does not write
Unicode maps (ToUnicode CMaps) for the fonts used in the PDF,
a feature pdftotext depends on.

If that is the case, I'd suggest upgrading Ghostscript.
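
A rough way to check whether a given PDF lacks these maps is to scan
the raw file for font dictionaries and /ToUnicode entries. The sketch
below is only a heuristic, not how pdftotext itself works: the function
names are made up for illustration, and it assumes the font dictionaries
are stored uncompressed (PDF 1.5+ files may keep them in compressed
object streams, which a raw byte scan cannot see).

```python
import re

# "/Type /Font" with optional whitespace; the lookahead skips
# longer names such as "/Type /FontDescriptor".
FONT_RE = re.compile(rb"/Type\s*/Font(?![A-Za-z])")
TOUNICODE_RE = re.compile(rb"/ToUnicode(?![A-Za-z])")


def scan_pdf_bytes(data: bytes) -> dict:
    """Count font dictionaries and /ToUnicode entries in raw PDF bytes."""
    return {
        "fonts": len(FONT_RE.findall(data)),
        "tounicode": len(TOUNICODE_RE.findall(data)),
    }


def likely_missing_unicode_maps(data: bytes) -> bool:
    """True if fonts are present but none of them has a /ToUnicode entry."""
    counts = scan_pdf_bytes(data)
    return counts["fonts"] > 0 and counts["tounicode"] == 0
```

To try it, read the file in binary mode, e.g.
likely_missing_unicode_maps(open("job.pdf", "rb").read()), where
"job.pdf" is a placeholder name. The pdffonts tool shipped alongside
pdftotext should give a more reliable answer: its "uni" column shows
whether each font carries a Unicode map.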


Helge Blischke

H.Blischke at acm.org

