Extracting text from PDF

Wed Jun 27 07:25:30 PDT 2007

Rolf wrote:
> I'm trying to extract text automatically from PDFs with pdftotext to make them searchable. This usually works well, except with PDFs generated by cups-pdf. There is no text output at all. What is the reason for this and is there a way to change this?
> 
> regards, Rolf

cups-pdf uses Ghostscript to generate the PDF. Your problem is
most likely due to a Ghostscript version that does not write
Unicode maps (ToUnicode CMaps) for the fonts used in the PDF;
pdftotext depends on these to map glyphs back to characters.

If that is the case, I'd suggest upgrading Ghostscript.
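
One rough way to check whether a given file is affected is to look
for the /ToUnicode key in the raw PDF data. The Python sketch below
is only a heuristic, not a PDF parser: the function names are made
up for illustration, and it assumes the font dictionaries are stored
uncompressed. If they sit inside compressed object streams, the key
will not be visible this way, so a negative result is a hint rather
than proof.

```python
# Heuristic sketch: scan the raw bytes of a PDF for the /ToUnicode key.
# Assumption: font dictionaries are not packed into compressed object
# streams; if they are, this check can report a false negative.


def has_tounicode_key(pdf_bytes: bytes) -> bool:
    """Return True if the raw PDF data mentions a /ToUnicode entry."""
    return b"/ToUnicode" in pdf_bytes


def file_has_tounicode_key(path: str) -> bool:
    """Read the file at `path` in binary mode and apply the same check."""
    with open(path, "rb") as handle:
        return has_tounicode_key(handle.read())
```

Calling file_has_tounicode_key() on a file written by cups-pdf and on
one that pdftotext handles well should show whether the two differ in
this respect.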

Helge

-- 
Helge Blischke
Softwareentwicklung

H.Blischke at acm.org