[cups.general] Alternative way to find no. of pages inside PDF?

Jerome Alet alet at librelogiciel.com
Sat Jun 14 08:00:48 PDT 2008


On Sat, Jun 14, 2008 at 07:21:08AM -0700, Kurt Pfeifle wrote:
> I know I can use pdfinfo from the xpdf package like this
> 
>    pdfinfo some.pdf|grep Pages:|awk '{print $2}'
> 
> in order to find out how many pages are in a given PDF, and get 
> the result very fast. But what if I do not have a "pdfinfo" utility 
> available (Ghostscript is available...) 
> 
> Does anybody of you know a different (fast enough) method to know 
> the number of pages inside a PDF? 
> 
> ...
> Which other way than running the above command, grepping for that 
> line and awk-printing the final number (which is 4-5 times slower 
> than using pdfinfo) can you think of? 
> 

What about using pkpgcounter from www.pykota.com?

It's MUCH slower than pdfinfo (see the timings below), but it is not
limited to PDF:

--- CUT ---
jerome at nordine:~/PDL/PCL$ time -p pkpgcounter PCLXL_ref20r22.pdf
265
real 1.26
user 1.16
sys 0.06
jerome at nordine:~/PDL/PCL$ time -p pdfinfo PCLXL_ref20r22.pdf
Title:          Volume 2
Author:         Steve Claiborne
Creator:        Microsoft Word 9.0
Producer:       Acrobat Distiller 4.05 for Windows
CreationDate:   Wed Oct 25 14:24:29 2000
ModDate:        Wed Oct 25 14:25:48 2000
Tagged:         no
Pages:          265
Encrypted:      no
Page size:      612 x 792 pts (letter)
File size:      1134477 bytes
Optimized:      yes
PDF version:    1.2
real 0.05
user 0.02
sys 0.03
jerome at nordine:~/PDL/PCL$ 
--- CUT ---

This speed comparison was done on a dual Pentium III 1 GHz machine.
The test document is the PCLXL (aka PCL6) technical reference,
consisting of 265 pages of text and graphics, and around 1 MByte
of PDF content.
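
For scripting, the grep/awk pipeline from the original question can also
be written in Python. This is only a sketch: the function names are
made up for illustration, and it assumes pdfinfo prints a "Pages:" line
in the format shown in the transcript above.

```python
import re
import subprocess


def pages_from_pdfinfo(output):
    """Return the page count found in pdfinfo's text output, or None."""
    # Look for a line such as "Pages:          265".
    match = re.search(r"^Pages:\s+(\d+)\s*$", output, re.MULTILINE)
    return int(match.group(1)) if match else None


def count_pages_with_pdfinfo(path):
    """Run pdfinfo on `path` (pdfinfo must be installed and on the PATH)."""
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    )
    return pages_from_pdfinfo(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it possible
to check the parsing on captured output without pdfinfo being installed.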

pkpgcounter's PDF parsing is done in 50 lines of Python, without the
help of any external software like gs. The code is really not optimized,
and it is simple enough to be ported easily to any other language of
your choice, provided that language supports regular expressions.
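
To give an idea of what a regular-expression approach can look like,
here is a minimal sketch. It is NOT pkpgcounter's actual code, and the
names are hypothetical. It simply counts "/Type /Page" entries in the raw
file, so it assumes page objects are stored uncompressed; it will
undercount PDFs that keep their objects in compressed object streams,
and it may overcount files that contain old revisions of page objects.

```python
import re

# Matches a page object's "/Type /Page" entry, but not "/Type /Pages"
# (the page-tree node) or any other name that merely starts with "Page".
PAGE_RE = re.compile(rb"/Type\s*/Page(?![A-Za-z0-9])")


def count_pdf_pages(data):
    """Count page objects in raw PDF bytes (naive: no decompression)."""
    return len(PAGE_RE.findall(data))


def count_pdf_pages_in_file(path):
    """Read the whole file in binary mode and count its page objects."""
    with open(path, "rb") as pdf_file:
        return count_pdf_pages(pdf_file.read())
```

Reading the file as bytes avoids any text-decoding errors, since a PDF
mixes text-like structure with binary stream data.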

NB: currently there's a limitation with PDF documents that contain
several other PDF documents. In such a file the PDF objects' major and
minor numbers are reused, which causes pkpgcounter to stop after the
first sub-document. In practice you shouldn't encounter this bug except
in very rare circumstances (as in some PCL-related documentation from HP).

hth

Jerome Alet




