Extracting text from PDF
Kurt Pfeifle
k1pfeifle at gmx.net
Wed Jul 4 08:32:01 PDT 2007
Helge Blischke wrote:
> Rolf Kutz wrote:
>> Helge Blischke wrote:
>>
>>
>>> The glyph naming scheme used here is quite proprietary, thus
>>> the pstotext utilities cannot cope with it.
>>
>>
>> Is there something I can do about it?
>>
>> - Rolf
>
> As for the PostScript file, no (at least not without heavy
> programming). But your pdffonts list from the PDF file
> created by gs 8.57 shows that the glyph names are understood
> by gs 8.57. Be happy with it.
Here are some additional details:
"10. Known Problems.
[....] Ghostscript has been writing incorrect ToUnicode CMap
without CMapName into the PDF since version 8.10 (rev. 3611).
This bug is fixed in version 8.54 (rev. 6201). We recommend
to re-generate PDF files created by the affected Ghostscript
versions. Since version 8.54 (rev. 6590) Ghostscript can read
the incorrect PDF files."
(Quote from http://ghostscript.com/doc/current/Ps2pdf.htm#Problems)
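
To make the version range in that note concrete, here is a minimal
sketch in Python. It is not taken from the Ghostscript documentation.
It assumes the PDF's Producer string has the form
"<flavour> Ghostscript N.NN" (for example "GPL Ghostscript 8.57");
other producers or other version formats are simply reported as not
affected. The function and constant names are hypothetical.

```python
import re

# Range taken from the quoted "Known Problems" note:
# broken ToUnicode CMaps since 8.10, fixed in 8.54.
FIRST_AFFECTED = (8, 10)
FIRST_FIXED = (8, 54)


def ghostscript_version(producer):
    """Return (major, minor) parsed from a Producer string, or None.

    Assumption: the string contains "Ghostscript N.NN".
    """
    match = re.search(r"Ghostscript\s+(\d+)\.(\d+)", producer)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def needs_regeneration(producer):
    """True if the Producer names a Ghostscript in the affected range."""
    version = ghostscript_version(producer)
    if version is None:
        # Not recognisably Ghostscript: the note does not apply.
        return False
    return FIRST_AFFECTED <= version < FIRST_FIXED


if __name__ == "__main__":
    for producer in ("AFPL Ghostscript 8.14", "GPL Ghostscript 8.57"):
        print(producer, "->", needs_regeneration(producer))
```

The Producer string itself can be read from the "Producer:" line that
pdfinfo prints for a PDF file. A file flagged here is only a candidate
for re-generation; the check looks at the version number, not at the
ToUnicode CMap itself.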
--
Kurt Pfeifle
System & Network Printing Consultant ---- Linux/Unix/Windows/Samba/CUPS
Infotec Deutschland GmbH ..................... Hedelfinger Strasse 58
A RICOH Company ........................... D-70327 Stuttgart/Germany