Extracting text from PDF

Kurt Pfeifle k1pfeifle at gmx.net
Tue Jul 3 11:57:41 PDT 2007


Rolf Kutz wrote:
> Kurt Pfeifle wrote:
>> If you use your firefox to "print to file" your job, please upload the 
>> resulting PostScript. I/we can then try to convert with a CUPS/pstops 
>> + Ghostscript commandline chain (using different versions of Ghostscript
>> and parameter variations) to see if we find one which does not show your
>> problem....
> 
> Here is a link to the Postscript:
> 
> http://www.technology-forum.com/tmp/1004.ps
> 
> Regards, Rolf


Sooo... most likely, Helge was right about his suspicion.

Evidence:

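(1)  I used ESP Ghostscript 8.15.3 to convert the PS (running the
     unmodified "ps2pdf" shell script that ships with it). The result is
     similar to what you describe. File size: 22,994 bytes. Note the
     "no" in the "uni" column: none of the fonts has a ToUnicode map,
     which would explain why the text cannot be searched or extracted.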

     "pdffonts 1004.pdf" output:

     name                                  type         emb sub uni object ID
     ------------------------------------- ------------ --- --- --- ---------
     RCISND+Nimbus_Sans_L.Bold.0.0.Set0    Type 1C      yes yes no      14  0
     AJKQFS+DejaVu_Serif.Book.0.0.Set0     Type 1C      yes yes no       9  0
     VKJNGT+Nimbus_Sans_L.Regular.0.0.Set0 Type 1C      yes yes no      12  0


(2)  Then I ran GPL Ghostscript 8.57 (self-compiled some weeks ago, with
     hardly any tweaking -- I just wanted to see whether it builds and
     now includes the "cups" device). Result: fonts are properly
     embedded; the PDF is searchable. File size: 25,876 bytes.

     "pdffonts 1004.pdf" output:

     name                                  type         emb sub uni object ID
     ------------------------------------- ------------ --- --- --- ---------
     RCISND+Nimbus_Sans_L.Bold.0.0.Set0    Type 1C      yes yes yes     13  0
     AJKQFS+DejaVu_Serif.Book.0.0.Set0     Type 1C      yes yes yes      8  0
     VKJNGT+Nimbus_Sans_L.Regular.0.0.Set0 Type 1C      yes yes yes     11  0


I'm not sure whether a PDF attachment would make it to the list, so
I'll send it to you by private mail.

So Helge was right with his advice to upgrade Ghostscript to solve this
problem.
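
If you want to run the same check on other converted files, here is a
rough sketch in Python. It takes the text printed by "pdffonts" and
lists the fonts whose "uni" column says "no", i.e. fonts without a
ToUnicode map. The function name and the two sample variables are made
up for illustration, and the parsing assumes that font names contain no
spaces and that every data row ends with the two object ID numbers, as
in the listings above.

```python
def fonts_missing_unicode(pdffonts_output):
    """Return the names of fonts whose 'uni' column is 'no'.

    Assumption: each data row ends with '<uni> <obj> <gen>', so the
    'uni' flag is the third field counted from the right.
    """
    missing = []
    for line in pdffonts_output.strip().splitlines():
        fields = line.split()
        # Header and dashed separator do not end in two integers; skip them.
        if len(fields) < 6:
            continue
        if not (fields[-1].isdigit() and fields[-2].isdigit()):
            continue
        if fields[-3] == "no":
            missing.append(fields[0])
    return missing


# The two listings from this mail, copied verbatim.
ESP_8_15_3 = """
name                                  type         emb sub uni object ID
------------------------------------- ------------ --- --- --- ---------
RCISND+Nimbus_Sans_L.Bold.0.0.Set0    Type 1C      yes yes no      14  0
AJKQFS+DejaVu_Serif.Book.0.0.Set0     Type 1C      yes yes no       9  0
VKJNGT+Nimbus_Sans_L.Regular.0.0.Set0 Type 1C      yes yes no      12  0
"""

GPL_8_57 = """
name                                  type         emb sub uni object ID
------------------------------------- ------------ --- --- --- ---------
RCISND+Nimbus_Sans_L.Bold.0.0.Set0    Type 1C      yes yes yes     13  0
AJKQFS+DejaVu_Serif.Book.0.0.Set0     Type 1C      yes yes yes      8  0
VKJNGT+Nimbus_Sans_L.Regular.0.0.Set0 Type 1C      yes yes yes     11  0
"""

if __name__ == "__main__":
    print(fonts_missing_unicode(ESP_8_15_3))  # all three font names
    print(fonts_missing_unicode(GPL_8_57))    # []
```

An empty list for a file suggests that every font carries a ToUnicode
map, which is what the 8.57 result shows.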

-- 
Kurt Pfeifle
System & Network Printing Consultant ---- Linux/Unix/Windows/Samba/CUPS
Infotec Deutschland GmbH  .....................  Hedelfinger Strasse 58
A RICOH Company  ...........................  D-70327 Stuttgart/Germany



