Ghostscript / CUPS-PDF default font problem
Helge Blischke
h.blischke at acm.org
Mon Dec 21 12:31:34 PST 2009
Chris wrote:
>> Chris wrote:
>>
>> > UPDATE:
>> > I've gotten a little closer to what my issue may be...
>> > It definitely has to do with the way the embedded fonts in the PDF are
>> > encoded.
>> >
>> > I was able to get the PDF encoding to change by switching from the
>> > default "CUPS-PDF Postscript driver" to the "Generic Postscript driver"
>> > in Ubuntu.
>> >
>> > For some reason when using the "CUPS-PDF driver" the fonts are
>> > [TrueType (CID), Type-H encoded] when embedded into the PDF.
>> > When I copy the word "test" from a pdf and paste it I get:
>> >
>> > When using the "Generic Postscript driver" the fonts are
>> > [TrueType, Custom encoded] when embedded into the PDF.
>> > When I copy the word "test" from a pdf and paste it I get: WHVW
>> >
>> > I think I need to find a way to embed the font as [TrueType, ANSI
>> > encoded]
>> >
>> > Does anyone have any recommendations? I'm running out of ideas...
>> >
>> > Thanks,
>> > Chris
>>
>> Please post (a URL to) a sample PDF file and tell us which Ghostscript
>> version you are using.
>>
>> Without looking into the PDF file it is hardly possible to give you any
>> reasonable hints.
>>
>> Helge
>>
>
> I'm using ghostscript version 8.70
>
> PDF encoded with Generic PS Driver:
> https://home.comcast.net/~thwiang/EncodedUsingGenericPSDriver.pdf
>
> PDF encoded with CUPS-PDF Driver:
> https://home.comcast.net/~thwiang/EncodedUsingCUPS-PDFDriver.pdf
>
> Thanks
Well, both PDF creators you used (Ghostscript 8.70 in the generic case and
pdftopdf in the CUPS-PDF case) embed the TrueType font(s) you used as
subsets (which is mandatory with TT fonts, as they tend to be quite large).
By design, the character encoding in these cases depends on the input
sequence of characters, i.e. the first character used gets the code 0, the
second the code 1, and so on; the only guarantee is that the same glyph of
the same font gets the same unique code throughout this process.
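To illustrate the "first come, first served" code assignment described
above, here is a minimal Python sketch. It is not what Ghostscript or
pdftopdf actually run; the function names assign_subset_codes and encode
are made up for this example, and a real subsetter works on glyphs of a
font rather than on Python characters.

```python
def assign_subset_codes(text):
    """Give each distinct character the next free code, in order of first use.

    Hypothetical helper: models only the ordering rule, not a real subsetter.
    """
    codes = {}
    for ch in text:
        if ch not in codes:
            codes[ch] = len(codes)  # first new character -> 0, next -> 1, ...
    return codes


def encode(text, codes):
    """Replace every character by the ad hoc code assigned to it."""
    return [codes[ch] for ch in text]


codes = assign_subset_codes("test")
print(codes)                  # {'t': 0, 'e': 1, 's': 2}
print(encode("test", codes))  # [0, 1, 2, 0]
```

Note that both occurrences of "t" get the same code, which is the only
guarantee mentioned above; the codes themselves have nothing to do with
ASCII or ANSI values.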
To get human-readable results from copy and paste (or any other sort of
text extraction), the PDF spec requires a PDF object (a ToUnicode CMap,
which is logically a table) that establishes a correspondence between the
mentioned "ad hoc" character codes and the Unicode numbers of the
respective glyphs. This table can only be generated if the font contains
appropriate information, and obviously the font you used (FreeMono) lacks
this information.
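The role of that table can be sketched in a few lines of Python. This is
only a model of the idea: the function extract_text, the code list and the
table contents are invented for illustration, and what a real viewer does
when the table is missing varies from viewer to viewer (here the raw codes
are simply taken as character numbers).

```python
def extract_text(page_codes, to_unicode=None):
    """Turn ad hoc character codes back into text.

    to_unicode is a dict {code: Unicode number}, standing in for the
    table described above. Without it, this sketch falls back to using
    the raw codes as character numbers (an assumption; viewers differ).
    """
    if to_unicode is None:
        return "".join(chr(code) for code in page_codes)
    return "".join(chr(to_unicode[code]) for code in page_codes)


# Hypothetical codes for the word "test" in a subset font.
page_codes = [0, 1, 2, 0]

# Hypothetical table: code 0 -> "t", code 1 -> "e", code 2 -> "s".
to_unicode = {0: 0x74, 1: 0x65, 2: 0x73}

print(extract_text(page_codes, to_unicode))  # test
print(repr(extract_text(page_codes)))        # '\x00\x01\x02\x00'
```

With the table the pasted text is readable; without it you only get
whatever characters happen to sit at the raw code positions.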
Helge