Extracting text from PDF

Kurt Pfeifle kurt.pfeifle at infotec.com
Thu Jun 28 11:04:19 PDT 2007

> > > Most likely, your PDF contains what text you see on screen (or on
> > > paper, when printed) only in the form of bitmaps, not proper fonts...
> >
> > How can I check this?

I forgot one more method I wanted to mention: the command line. Try this:

   pdffonts /path/to/your/pdf

If you don't have that command, install the package "xpdf-utils" or "poppler-utils" (depending on your distribution).
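
To read the result: pdffonts prints two header lines (column names and a row of dashes), then one row per font. If nothing follows the header, the PDF contains no fonts, and the text you see is most likely bitmaps. A rough sketch of that check is below. The function name count_pdf_fonts is made up for illustration, and the two-line header is an assumption about your pdffonts version:

```shell
# Hypothetical helper (not part of xpdf or poppler): reads pdffonts
# output on stdin and prints the number of font rows it contains.
# Assumption: the first two lines are the header (names + dashes).
count_pdf_fonts() {
    awk 'NR > 2 && NF > 0 { n++ } END { print n + 0 }'
}

# Usage, once pdffonts is installed:
#   pdffonts /path/to/your/pdf | count_pdf_fonts
```

A result of 0 points to the bitmap case. If fonts are listed, look at the "type" column too: entries of "Type 3" are often (though not always) bitmap fonts.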

> In acroread or in kpdf look for the menu entry where you can look at
> the document properties. There you should see a tab which allows you
> to check for the fonts.
> See if the fonts are there, and what kind of names they have.
> That said, this problem ("a bitmap font was used") usually does not
> appear with Firefox. Helge's guess about the root of the problem may
> be a much better one.
> If you use your firefox to "print to file" your job, please upload
> the resulting PostScript. I/we can then try to convert with a
> CUPS/pstops + Ghostscript commandline chain (using different versions
> of Ghostscript and parameter variations) to see if we find one which
> does not show your problem....

Kurt Pfeifle
System & Network Printing Consultant --- Linux/Unix/Windows/Samba/CUPS
Infotec Deutschland GmbH - A RICOH Company ......... Stuttgart/Germany
