Ghostscript / CUPS-PDF default font problem

Mon Dec 21 12:31:34 PST 2009

Chris wrote:

>> Chris wrote:
>>
>> > UPDATE:
>> > I've gotten a little closer to what my issue may be...
>> > It definitely has to do with the way the embedded fonts in the PDF are
>> > encoded.
>> >
>> > I was able to get the PDF encoding to change by switching from the
>> > default "CUPS-PDF Postscript driver" to the "Generic Postscript driver"
>> > in Ubuntu.
>> >
>> > For some reason when using the "CUPS-PDF driver" the fonts are
>> > [TrueType (CID), Type-H encoded] when embedded into the PDF.
>> > When I copy the word "test" from a pdf and paste it I get: 􀁗􀁈􀁖􀁗
>> >
>> > When using the "Generic Postscript driver" the fonts are
>> > [TrueType, Custom encoded] when embedded into the PDF.
>> > When I copy the word "test" from a pdf and paste it I get: WHVW
>> >
>> > I think I need to find a way to embed the font as [TrueType, ANSI
>> > encoded]
>> >
>> > Does anyone have any recommendations? I'm running out of ideas...
>> >
>> > Thanks,
>> > Chris
>>
>> Please post (a URL to) a sample PDF file and tell us which Ghostscript
>> version you are using.
>>
>> Without looking into the PDF file it is hardly possible to give you any
>> reasonable hints.
>>
>> Helge
>>
> 
> I'm using ghostscript version 8.70
> 
> PDF encoded with Generic PS Driver:
> https://home.comcast.net/~thwiang/EncodedUsingGenericPSDriver.pdf
> 
> PDF encoded with CUPS-PDF Driver:
> https://home.comcast.net/~thwiang/EncodedUsingCUPS-PDFDriver.pdf
> 
> Thanks

Well, both PDF creators you used (Ghostscript 8.70 in the generic case and
pdftopdf in the CUPS-PDF case) embed the TrueType font(s) you used as
subsets (which is mandatory with TT fonts, as they tend to be quite large).
By design, the character encoding in these cases depends on the input
sequence of characters, i.e. the first character used gets the code 0, the
second the code 1 and so on; the only guarantee is that the same glyph of
the same font gets the same unique code throughout this process.
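
As a rough illustration of that numbering scheme, here is a minimal Python
sketch. It is not what Ghostscript or pdftopdf actually execute; the function
names are made up for this example, and it only mimics the "first character
used gets code 0" rule described above.

```python
def assign_subset_codes(text):
    """Give each distinct character a code in order of first appearance.

    Hypothetical helper: it only models the numbering rule, not real
    font subsetting.
    """
    codes = {}
    for ch in text:
        if ch not in codes:
            # The next unused code is simply the number of characters seen so far.
            codes[ch] = len(codes)
    return codes


def encode(text, codes):
    """Translate text into the ad hoc character codes stored in the PDF."""
    return [codes[ch] for ch in text]


codes = assign_subset_codes("test")
encoded = encode("test", codes)
print(codes)
print(encoded)
```

The second "t" reuses the code of the first one, which is the only guarantee
mentioned above; the codes themselves say nothing about which letter they
stand for.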

To get human-readable results from copy and paste (or any sort of text
extraction), the PDF spec requires a PDF object (which is logically a table)
that establishes a correspondence between the mentioned "ad hoc" character
codes and the Unicode code points of the respective glyphs. This table can
only be generated if the font contains appropriate information, and obviously
the font used (FreeMono) lacks this information.
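
To make the role of that table concrete, here is a small Python sketch of text
extraction with and without it. In PDF terms the table is the font's ToUnicode
CMap; everything else here is an assumption for illustration. The function
name is hypothetical, and the fallback of treating a code as if it were a
Unicode value is only one thing a viewer might do. Real viewers differ.

```python
def extract_text(char_codes, to_unicode=None):
    """Turn ad hoc character codes back into text.

    to_unicode: dict mapping character code -> Unicode string, standing in
    for the table described above. Hypothetical helper, not a viewer's API.
    """
    if to_unicode is None:
        # Assumed fallback: with no table, the code itself is treated as
        # a Unicode value, which yields unrelated characters.
        return "".join(chr(code) for code in char_codes)
    # With a table, every code is looked up; unknown codes become U+FFFD.
    return "".join(to_unicode.get(code, "\ufffd") for code in char_codes)


# Codes for "test" when numbered in order of first use: t=0, e=1, s=2.
page_codes = [0, 1, 2, 0]
table = {0: "t", 1: "e", 2: "s"}

with_table = extract_text(page_codes, table)
without_table = extract_text(page_codes)
print(repr(with_table))
print(repr(without_table))
```

With the table the codes map back to the original word; without it the same
codes come out as unrelated characters, which is the kind of result reported
when pasting from the two sample PDFs.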

Helge