Extracting text from PDF
Kurt Pfeifle
k1pfeifle at gmx.net
Tue Jul 3 11:52:41 PDT 2007
Rolf Kutz wrote:
> Kurt Pfeifle schrieb:
>>>>> Most likely, your PDF contains what text you see on screen (or on
>>>>> paper, when printed) only in the form of bitmaps, not proper fonts...
>>>> How can I check this?
>>
>> I forgot one more method I wanted to mention: a command line. Try this:
>>
>> pdffonts /path/to/your/pdf
>
> rk at hydra:~$ pdffonts PDF/1004.pdf
> name type emb sub uni object ID
> ------------------------------------ ------------ --- --- --- ---------
> RCISND+Nimbus_Sans_L.Bold.0.0.Set0 Type 1C yes yes no 14 0
> QPHYDB+Nimbus_Roman_No9_L.Regular.0.0.Set0 Type 1C yes yes no
> 9 0
> VKJNGT+Nimbus_Sans_L.Regular.0.0.Set0 Type 1C yes yes no 12 0
>
> Same result as above. I hope you can see something from this.
Yes. After some re-formatting... :-)
name                                       type         emb sub uni object ID
------------------------------------------ ------------ --- --- --- ---------
RCISND+Nimbus_Sans_L.Bold.0.0.Set0         Type 1C      yes yes no      14  0
QPHYDB+Nimbus_Roman_No9_L.Regular.0.0.Set0 Type 1C      yes yes no       9  0
VKJNGT+Nimbus_Sans_L.Regular.0.0.Set0      Type 1C      yes yes no      12  0
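The "uni" column shows "no" for every font, meaning none of the fonts in
this PDF file comes with a "ToUnicode" map. Without that map, a text
extractor has no reliable way to translate the glyph codes used in the file
back into Unicode characters, so extraction may return garbage or nothing.
AFAIR, this is what Helge pointed out too.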
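
If you need to run this check on many files, it can be scripted. The
following is only a minimal Python sketch, assuming the `pdffonts` tool is on
the PATH and that every row ends with the emb, sub, uni and object ID
columns, as in the listing above. It reads those columns from the right, so
long font names that push the other columns out of alignment do not matter,
and it re-joins rows that were wrapped onto two lines (as in Rolf's quoted
output). It also assumes font names contain no spaces. `FontInfo`,
`parse_pdffonts` and `fonts_without_tounicode` are hypothetical helper names,
not part of any library.

```python
import subprocess
from typing import List, NamedTuple


class FontInfo(NamedTuple):
    name: str
    embedded: bool
    subset: bool
    has_tounicode: bool


def _is_complete(tokens: List[str]) -> bool:
    # A complete row ends with: emb sub uni <object number> <generation number>,
    # preceded by at least one name token and one type token.
    return (
        len(tokens) >= 7
        and tokens[-1].isdigit()
        and tokens[-2].isdigit()
        and all(t in ("yes", "no") for t in tokens[-5:-2])
    )


def parse_pdffonts(output: str) -> List[FontInfo]:
    """Parse pdffonts output, reading the yes/no flag columns from the right."""
    fonts: List[FontInfo] = []
    in_table = False
    pending = ""
    for line in output.splitlines():
        if line.startswith("---"):
            in_table = True  # data rows follow the dashed separator line
            continue
        if not in_table:
            continue  # skip the header line
        combined = (pending + " " + line).strip()
        if not combined:
            continue
        tokens = combined.split()
        if not _is_complete(tokens):
            pending = combined  # wrapped row: join it with the next line
            continue
        pending = ""
        emb, sub, uni = tokens[-5:-2]
        fonts.append(FontInfo(tokens[0], emb == "yes", sub == "yes", uni == "yes"))
    return fonts


def fonts_without_tounicode(pdf_path: str) -> List[str]:
    """Run pdffonts on a file and return the fonts whose "uni" column is "no"."""
    result = subprocess.run(
        ["pdffonts", pdf_path], capture_output=True, text=True, check=True
    )
    return [f.name for f in parse_pdffonts(result.stdout) if not f.has_tounicode]
```

Going by the listing above, calling `fonts_without_tounicode("PDF/1004.pdf")`
on the file from this thread should report all three Nimbus fonts.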
--
Kurt Pfeifle
System & Network Printing Consultant ---- Linux/Unix/Windows/Samba/CUPS
Infotec Deutschland GmbH ..................... Hedelfinger Strasse 58
A RICOH Company ........................... D-70327 Stuttgart/Germany