Extracting text from PDF

Kurt Pfeifle k1pfeifle at gmx.net
Wed Jul 4 08:32:01 PDT 2007


Helge Blischke wrote:
> Rolf Kutz wrote:
>> Helge Blischke wrote:
>>
>>
>>> The glyph naming scheme used here is quite proprietary, thus
>>> the pstotext utilities cannot cope with it.
>>
>>
>> Is there something I can do about it?
>>
>> - Rolf
> 
> As for the PostScript file, no (at least not without heavy
> programming). But your pdffonts list from the PDF file
> created by gs 8.57 shows that the glyph names are understood
> by gs 8.57. Be happy with it.


Here are some additional details:

   "10. Known Problems.

    [....] Ghostscript has been writing incorrect ToUnicode CMap
    without CMapName into the PDF since version 8.10 (rev. 3611).
    This bug is fixed in version 8.54 (rev. 6201). We recommend
    to re-generate PDF files created by the affected Ghostscript
    versions. Since version 8.54 (rev. 6590) Ghostscript can read
    the incorrect PDF files."

(Quote from http://ghostscript.com/doc/current/Ps2pdf.htm#Problems)
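
To make the version range in that note concrete, here is a minimal sketch of a check that decides whether a PDF should be regenerated. It assumes the PDF's Producer string has already been read by some other means and that it contains "Ghostscript <major>.<minor>" (for example "GPL Ghostscript 8.57"); that string format, and the function names, are my own assumptions for illustration, not something Ghostscript guarantees. Producer strings in another form would need separate handling.

```python
import re

# Range taken from the quoted Ps2pdf note: the bug is present
# from 8.10 up to, but not including, 8.54.
FIRST_AFFECTED = (8, 10)
FIRST_FIXED = (8, 54)


def ghostscript_version(producer):
    """Return (major, minor) parsed from a Producer string, or None.

    Assumes the hypothetical form "... Ghostscript <major>.<minor>".
    """
    match = re.search(r"Ghostscript\s+(\d+)\.(\d+)", producer)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def needs_regeneration(producer):
    """True if the Producer string names an affected Ghostscript version."""
    version = ghostscript_version(producer)
    if version is None:
        # Not recognizably Ghostscript; this check has nothing to say.
        return False
    return FIRST_AFFECTED <= version < FIRST_FIXED


if __name__ == "__main__":
    for producer in ("GPL Ghostscript 8.15", "GPL Ghostscript 8.57"):
        print(producer, "->", needs_regeneration(producer))
```

A file flagged by this check would be a candidate for re-running through ps2pdf with a fixed Ghostscript, as the note recommends.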

-- 
Kurt Pfeifle
System & Network Printing Consultant ---- Linux/Unix/Windows/Samba/CUPS
Infotec Deutschland GmbH  .....................  Hedelfinger Strasse 58
A RICOH Company  ...........................  D-70327 Stuttgart/Germany



