Extracting text from PDF
Helge Blischke
h.blischke at srz.de
Wed Jun 27 07:25:30 PDT 2007
Rolf wrote:
> I'm trying to extract text automatically from PDFs with pdftotext to make them searchable. This usually works well, except with PDFs generated by cups-pdf. There is no text output at all. What is the reason for this and is there a way to change this?
>
> regards, Rolf
cups-pdf uses Ghostscript to generate the PDF. Your problem is
most likely due to a Ghostscript version that does not
generate Unicode maps (ToUnicode CMaps) for the fonts used in
the PDF, a feature pdftotext depends on.
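One way to check whether a particular cups-pdf output is affected is the `pdffonts` tool. It ships alongside pdftotext in xpdf and poppler and prints a `uni` column that says whether each font has a ToUnicode map. The Python sketch below is a hypothetical helper: the names `fonts_without_unicode_map` and `check_pdf` are made up for illustration. It assumes `pdffonts` is on the PATH and that each font row ends with the `emb sub uni` flags followed by the object number and generation, as in current poppler and xpdf output. A `no` in that column is a strong hint rather than proof, because pdftotext can sometimes fall back on a font's standard encoding.

```python
"""Hypothetical helper: list fonts in a PDF that lack a ToUnicode map.

Assumes the `pdffonts` tool (xpdf/poppler) is installed and on PATH, and
that each font row ends with: emb sub uni <object number> <generation>.
"""
import subprocess


def fonts_without_unicode_map(pdffonts_output: str) -> list[str]:
    """Return the names of fonts whose 'uni' column is 'no'."""
    missing = []
    for line in pdffonts_output.splitlines():
        tokens = line.split()
        # A font row has at least: name, one type token, emb, sub, uni, obj, gen.
        if len(tokens) < 7:
            continue
        emb, sub, uni, obj, gen = tokens[-5:]
        # Skip the header, the dashed separator and any unexpected lines:
        # real font rows end with three yes/no flags and a numeric object ID.
        if not (obj.isdigit() and gen.isdigit()
                and {emb, sub, uni} <= {"yes", "no"}):
            continue
        if uni == "no":
            # No ToUnicode map: pdftotext may be unable to extract this text.
            missing.append(tokens[0])
    return missing


def check_pdf(path: str) -> list[str]:
    """Run pdffonts on `path` and return fonts lacking a ToUnicode map."""
    result = subprocess.run(
        ["pdffonts", path], capture_output=True, text=True, check=True
    )
    return fonts_without_unicode_map(result.stdout)
```

For example, `print(check_pdf("printout.pdf"))` (a hypothetical file name) prints the affected font names; an empty list means every font reports a Unicode map. Parsing the flags from the right keeps the helper independent of font types such as `Type 1C` that contain spaces.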
If that is the case, I'd suggest upgrading Ghostscript.
Helge
--
Helge Blischke
Softwareentwicklung
H.Blischke at acm.org
More information about the cups mailing list