Extracting text from PDF

Helge Blischke h.blischke at srz.de
Wed Jun 27 07:25:30 PDT 2007


Rolf wrote:
> I'm trying to extract text automatically from PDFs with pdftotext to make them searchable. This usually works well, except with PDFs generated by cups-pdf. There is no text output at all. What is the reason for this and is there a way to change this?
> 
> regards, Rolf

cups-pdf uses Ghostscript to generate the PDF. Your problem is
most likely due to a Ghostscript version that does not write
ToUnicode maps for the fonts used in the PDF; pdftotext depends
on these maps to translate character codes back into text.

If that is the case, I'd suggest upgrading Ghostscript.
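
One rough way to check whether a given PDF carries such maps is to
look for /ToUnicode keys in the file. The Python sketch below is only
an illustration: the function names are made up, and it assumes the
font dictionaries are stored uncompressed in the file body. If the
PDF keeps its objects in compressed object streams, the count can be
zero even though the maps exist.

```python
import re
import sys

# A font dictionary that has a Unicode map carries a /ToUnicode key.
TOUNICODE_KEY = re.compile(rb"/ToUnicode\b")
# Font dictionaries are marked /Type /Font; \b keeps /FontDescriptor out.
FONT_DICT = re.compile(rb"/Type\s*/Font\b")


def count_tounicode_refs(pdf_bytes: bytes) -> int:
    """Count /ToUnicode keys visible in the raw (uncompressed) PDF bytes."""
    return len(TOUNICODE_KEY.findall(pdf_bytes))


def count_font_dicts(pdf_bytes: bytes) -> int:
    """Count font dictionaries visible in the raw (uncompressed) PDF bytes."""
    return len(FONT_DICT.findall(pdf_bytes))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_tounicode.py file.pdf
    with open(sys.argv[1], "rb") as handle:
        data = handle.read()
    print("font dictionaries:", count_font_dicts(data))
    print("/ToUnicode keys:  ", count_tounicode_refs(data))
```

If the script reports fonts but no /ToUnicode keys, that is
consistent with the explanation above, though it does not prove it.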

Helge

-- 
Helge Blischke
Softwareentwicklung

H.Blischke at acm.org




More information about the cups mailing list