[cups.development]Simple accounting filter/backend (With Perl or another script language)

Wed Oct 27 05:57:45 PDT 2004

On Wed, Oct 27, 2004 at 08:48:24AM +0000, Nayco wrote:
> 
> So, That's the method you used in Pykota ?

Yes, so you can see what is done by downloading the CVS tree.

I prefer NOT to launch the gs + bbox thing when counting %%Page:
comments produces a plausible result. This is less safe, but
saves a lot of CPU.
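
To illustrate the cheap path, here is a minimal sketch of counting
%%Page: comments. It is not the PyKota code: the function name and the
sample data are made up for the example, and it assumes a DSC
compliant file where each page starts with a %%Page: comment at the
beginning of a line.

```python
import re

# "%%Page:" at the start of the data or right after a line ending.
# "%%Pages:" (the header comment) does not match because of the "s".
PAGE_RE = re.compile(rb"(?:^|[\r\n])%%Page:")

def count_dsc_pages(data: bytes) -> int:
    """Count %%Page: comments in raw PostScript bytes (hypothetical helper)."""
    return len(PAGE_RE.findall(data))

# Tiny hand-written sample, not a real print job.
sample = (b"%!PS-Adobe-3.0\n%%Pages: 2\n"
          b"%%Page: 1 1\nshowpage\n"
          b"%%Page: 2 2\nshowpage\n%%EOF\n")
print(count_dsc_pages(sample))
```

A file without these comments yields 0, which is the signal to fall
back on something more expensive.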

Anyway, no method is really foolproof, as you easily demonstrate
below...

> Anyway, lets imagine users are non-malicious: Your method should 
> work with all (most) of the "normal" printjobs ? 

Correct.

> Ok, I tried manually with "grep" with a document printed to file 
> (2 pages, and I asked 37 copies...): 
> 
> $ grep -i "1 dict dup " toto.ps
> 1051 dict dup begin
> 1 dict dup /NumCopies 37 put setpagedevice
> 1 dict dup /NumCopies 37 put setpagedevice
> 
> Sounds good :) !!! So, I can assume here that NumCopies shows the 
> right value, and appears 2 times because of the 2 pages length of 
> the document... I'm right ? Ok, this document comes from Mozilla 
> Firefox. I'll try with others. 

You're 100% right.
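
The same check can be scripted instead of using grep. This is only a
sketch under assumptions: the function name is hypothetical, the
pattern matches just the "/NumCopies N put setpagedevice" form shown
in your output, and taking the first value found is an arbitrary
choice for files where every page repeats the same number.

```python
import re

# Matches the form seen in the grep output above:
#   1 dict dup /NumCopies 37 put setpagedevice
NUMCOPIES_RE = re.compile(rb"/NumCopies\s+(\d+)\s+put\s+setpagedevice")

def copies_from_setpagedevice(data: bytes) -> int:
    """Return the first NumCopies value found, or 1 if there is none."""
    match = NUMCOPIES_RE.search(data)
    return int(match.group(1)) if match else 1

# The three lines grep printed for toto.ps.
sample = (b"1051 dict dup begin\n"
          b"1 dict dup /NumCopies 37 put setpagedevice\n"
          b"1 dict dup /NumCopies 37 put setpagedevice\n")
print(copies_from_setpagedevice(sample))
```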

> But, something I can't explain:
> 
> $ grep -i "Bounding" toto.ps
> %%BoundingBox: 0 0 612 792   #Only one Bounding box ?
> 
> Although:
> $ /usr/bin/gs -sDEVICE=bbox -dNOPAUSE -c save pop -f toto.ps -c quit 2>&1
> %%BoundingBox: 0 0 612 792
> %%HiResBoundingBox: 0.036000 0.324000 611.891981 791.981976
> %%BoundingBox: 0 0 612 792
> %%HiResBoundingBox: 0.036000 0.324000 611.891981 791.981976

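These are different things: in the latter case you grep THE OUTPUT
of ghostscript, NOT ITS INPUT. The bbox device prints one
%%BoundingBox line per page it renders, whereas your file contains
a single %%BoundingBox comment.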

> Ok, two pages, but no 37 copies :)

Exactly what I noticed too. That's why I said this method only worked
"most of the time" and not always.
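
As a rough sketch of the expensive path: run the gs command you
quoted, count the %%BoundingBox lines it prints, and multiply by a
number of copies obtained some other way. The function names are
hypothetical, and multiplying pages by copies is my assumption about
how the two figures combine.

```python
import subprocess

def run_gs_bbox(path: str, gs: str = "/usr/bin/gs") -> str:
    """Run the gs + bbox command from the thread, return its output."""
    result = subprocess.run(
        [gs, "-sDEVICE=bbox", "-dNOPAUSE", "-c", "save", "pop",
         "-f", path, "-c", "quit"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same effect as 2>&1
        text=True,
        check=False,
    )
    return result.stdout

def pages_from_bbox_output(output: str) -> int:
    """One %%BoundingBox line per rendered page; HiRes lines are skipped."""
    return sum(1 for line in output.splitlines()
               if line.startswith("%%BoundingBox:"))

# The output quoted above for toto.ps.
sample_output = """%%BoundingBox: 0 0 612 792
%%HiResBoundingBox: 0.036000 0.324000 611.891981 791.981976
%%BoundingBox: 0 0 612 792
%%HiResBoundingBox: 0.036000 0.324000 611.891981 791.981976
"""
pages = pages_from_bbox_output(sample_output)
copies = 37  # has to come from elsewhere, bbox does not report it
print(pages, pages * copies)
```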

> Lastly:
> $ grep -i "BeginNonPPDF" toto.ps
> <nothing>

Normal. I didn't say (even if it was unclear) that the problematic files
contained BOTH of the two snippets I submitted. Only that I noticed
two different types of PostScript documents which weren't correctly
accounted for using the bbox thing: one type contains the first
form of code, the other type contains the other form.

BTW I search for the %%BeginNonPPDFeature comment and extract
the number of copies from it because it's EASIER than extracting
the REAL number of copies from the PostScript code which actually
follows this comment.

This means that some ingenious "student" could fool the program
by modifying the comment and putting "only 1 copy" there while
keeping the PostScript code which contains "50 copies, please"
completely intact.
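
A small sketch of that weakness. The exact comment layout
("%%BeginNonPPDFeature: NumCopies N") is an assumption here, the
function name is hypothetical, and the PostScript line in the sample
is only an illustrative stand-in for the real code.

```python
import re

# Assumed layout of the comment: "%%BeginNonPPDFeature: NumCopies <n>"
NONPPD_RE = re.compile(rb"%%BeginNonPPDFeature:\s*NumCopies\s+(\d+)")

def copies_from_comment(data: bytes):
    """Return the copy count claimed by the comment, or None if absent."""
    match = NONPPD_RE.search(data)
    return int(match.group(1)) if match else None

# Tampered job: the comment claims 1 copy, the code still asks for 50.
tampered = (b"%%BeginNonPPDFeature: NumCopies 1\n"
            b"<< /NumCopies 50 >> setpagedevice\n"
            b"%%EndNonPPDFeature\n")
print(copies_from_comment(tampered))  # trusts the comment, not the code
```

Only the comment is read, so the count that gets accounted is the
one the comment claims.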

> So, what I need now is to make the algorithm... I need to sort 
> these methods from most accurate to least accurate, so as to correctly 
> nest my "IFs" ;) What do you think about it ? 

Nothing special. If you want to use the algorithm used in PyKota
you're free to do so because it's free software. 

If you want you can even use the file in question (pdlanalyzer.py) 
completely independently of PyKota (no need to install anything 
besides Python and this file, although Python-Psyco is recommended 
for performance reasons). This will give you instant support
for PostScript (binary and DSC compliant), PDF, PCL3, PCL4, PCL5,
PCLXL (aka PCL6), and ESC/P2. So instead of launching 
gs + bbox, just launch "python pdlanalyzer.py file1.ps ..."
(or use a pipe)

Just keep in mind that it's NOT perfect, and that some day you'll
find a PostScript file which fools your program, intentionally
or not.

bye

Jerome Alet