[cups.general] Alternative way to find no. of pages inside PDF?

Sun Jun 15 08:02:08 PDT 2008

On Sun, Jun 15, 2008 at 07:12:05AM -0700, Kurt Pfeifle wrote:
> > 
> > What about using pkpgcounter from www.pykota.com ?
> 
> Unfortunately this is ruled out... (for the same reason xpdf/pdfinfo
> cannot be used: space limitations on an old Solaris box; also, AFAIU,
> pkpgcounter would require a nearly full-blown installation of a rather
> current Python package).

You can probably rewrite this in Perl, Bash (grep ?) or whatever
if you want. The code is available at
http://trac.pykota.com/browser/pkpgcounter/trunk/pkpgpdls/pdf.py
in lines 50 to 107.

The code first splits the PDF document into small PDF objects,
but this part is not really needed if you only want the number
of pages. I had something else in mind when writing this, which
I never took the time to finish, so this part can probably be
removed easily.

What you are interested in are lines 100-107, especially line 100
which contains the (Python) regular expression :

        r"(/Type)\s?(/Page)[/>\s]"

You simply have to count how many times this regular expression
occurs in the PDF file, minus the number of times you find the string
"<</Type /Page>>" (empty pages which are not rendered).
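
To see what the expression does and does not match, here is a small
sketch. The sample strings are hand-written for illustration and are
not taken from a real PDF file. The point is that the trailing
character class makes "/Type /Pages" (the page tree node) fail to
match, while the empty page still matches and so has to be subtracted
separately.

```python
import re

# Same expression as on line 100 of pdf.py
newpageregexp = re.compile(r"(/Type)\s?(/Page)[/>\s]", re.I)

# Hypothetical object headers, written by hand for illustration
samples = [
    "<< /Type /Page /Parent 1 0 R >>",  # ordinary page object
    "<</Type/Page/Parent 1 0 R>>",      # same, without whitespace
    "<< /Type /Pages /Count 2 >>",      # page tree node, not a page
    "<</Type /Page>>",                  # empty page, subtracted later
]

# True where the expression finds a page marker in the sample
matches = [bool(newpageregexp.search(s)) for s in samples]
print(matches)
```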

A smaller Python program which counts PDF pages read from stdin is below :

--- CUT ---
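#! /usr/bin/env python
import sys
import re

# Matches "/Type /Page" followed by "/", ">" or whitespace, so "/Type /Pages" is skipped
newpageregexp = re.compile(r"(/Type)\s?(/Page)[/>\s]", re.I)
# Read the whole PDF document from standard input
datas = sys.stdin.read()
# Number of matches, minus the empty pages which are not rendered
print len(newpageregexp.findall(datas)) - datas.count("<</Type /Page>>")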
--- CUT ---

It's even 4 times faster than pkpgcounter, and probably works with
all releases of Python since 1.5.x

Feel free to rewrite this in any language already present on your Sun
box, and this should solve your problem.

NB : maybe some checks should be added for safety, like ensuring 
size is >= 1 and the like... 
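
As a rough idea of what such checks could look like, here is a sketch
for a current Python 3, working on bytes since PDF files are binary.
Two things in it are assumptions of mine and not part of pkpgcounter:
the "%PDF" header test, and reading "size >= 1" as "never report fewer
than one page". The function name count_pdf_pages is made up for this
example.

```python
import re

# Same expression as above, compiled as a bytes pattern
NEW_PAGE_RE = re.compile(rb"(/Type)\s?(/Page)[/>\s]", re.I)
EMPTY_PAGE = b"<</Type /Page>>"


def count_pdf_pages(data):
    """Return an estimated page count for the raw bytes of a PDF file."""
    # Assumed check: a PDF file is expected to begin with a "%PDF" header
    if not data.startswith(b"%PDF"):
        raise ValueError("input does not look like a PDF file")
    count = len(NEW_PAGE_RE.findall(data)) - data.count(EMPTY_PAGE)
    # Assumed reading of "size >= 1": never report fewer than one page
    return max(count, 1)
```

To use it as a filter like the program above, import sys and print the
result of count_pdf_pages(sys.stdin.buffer.read()).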

bye

Jerome Alet