Excessive number of processes for the same printer
adamh at densi.com
Fri Aug 5 10:32:00 PDT 2005
We have a document processing program written in Python which outputs PDFs to about 30 potential printers (mostly Windows printers connected with Samba, but also some LPD and JetDirect printers; all are LaserJets). The Python program generates PDFs and ships them off to CUPS using "lpr" on the command line -- usually each Python script invocation ships all generated PDFs to the same printer.
We have three serious problems with this setup:
1. When several hundred PDFs are directed to the same printer (in our most recent case, a Windows shared printer), we end up with 5 processes per job (as described by "ps ax"):
- a "/usr/bin/perl -w /usr/lib/cups/filter/pdftops"
- a process named after the printer (e.g., "pc26_hp1200")
- "/usr/bin/perl /usr/lib/cups/filter/foomatic-rip"
- a process named after the printer's URL (e.g., "smb://domain.com/pc26/hp1200")
With 200 jobs, that makes 200 * 5 = 1000 processes -- all running at once. We end up with horrible thrashing, and after an hour or two the kernel starts killing random processes when it runs out of memory and swap space.
Is there a way to throttle this? If each print job were run in serial (per printer) we wouldn't have a problem. If that's not possible, would there be a straightforward way to have our Python script do all that foomatic-rip processing -- keeping in mind that we use different printer drivers for different printers? I couldn't find any arguments to "lpr" which would do all the processing *before* placing documents in the queue, but that would suit us just fine.
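To make the "serial per printer" idea concrete, here is a rough sketch of how the throttling could be done on the Python side instead of inside CUPS. It assumes that "lpstat -o PRINTER" prints one line per outstanding job, with the job ID as the first word, and that polling every few seconds is acceptable. The function names, the polling interval and the max_in_flight parameter are made up for illustration; they are not part of CUPS or of our existing script.

```python
import subprocess
import time


def queued_jobs(printer, run=subprocess.check_output):
    """Return the job IDs that `lpstat -o PRINTER` lists as not yet completed."""
    out = run(["lpstat", "-o", printer])
    if isinstance(out, bytes):
        out = out.decode("utf-8", "replace")
    # Assumption: each non-empty line starts with "<printer>-<jobnumber>".
    return [line.split()[0] for line in out.splitlines() if line.strip()]


def submit_throttled(printer, pdf_paths, max_in_flight=1, poll_seconds=5,
                     run=subprocess.check_output, call=subprocess.check_call,
                     sleep=time.sleep):
    """Send PDFs to one printer in order, never leaving more than
    max_in_flight jobs outstanding in that printer's queue."""
    for path in pdf_paths:
        # Wait until the queue for this printer has room for another job.
        while len(queued_jobs(printer, run)) >= max_in_flight:
            sleep(poll_seconds)
        call(["lpr", "-P", printer, path])
```

With max_in_flight=1 only one job's worth of filter processes should exist per printer at any time, and jobs reach the queue strictly in submission order. The obvious drawbacks: the script blocks for as long as the printer is unavailable, and jobs queued by anyone else on the same printer count against the limit.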
The closest solution I can find is the FilterLimit directive in cupsd.conf... which leads to our second problem.
2. When a printer is unavailable (for instance, a Windows machine is turned off), all those foomatic processes continue to run until the printer becomes available again. The delay could easily be several months, or even forever.
Would it be a good idea to run a cron job which inspects queued jobs and emails us if a printer has been unavailable for a prolonged period? Is there a way to configure CUPS to automatically cancel jobs which haven't printed in a week? Or even better, can CUPS automatically re-queue those jobs somehow, so we don't have so many processes running permanently on our print server?
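The cron job idea might look something like the sketch below. It sidesteps parsing the date column of "lpstat -o" (whose format I would not want to rely on) by recording when each job ID was first seen in a small state file, and reporting any job that has been in the queue for a week or more. The state file path, the threshold constant and the function names are all hypothetical, and the sketch assumes cron is set up to mail whatever the script prints.

```python
import json
import subprocess
import time

STATE_FILE = "/var/tmp/cups_first_seen.json"   # hypothetical location
MAX_AGE_SECONDS = 7 * 24 * 3600                # one week


def find_stale(lpstat_output, first_seen, now, max_age=MAX_AGE_SECONDS):
    """Return (stale_job_ids, updated_first_seen) for one `lpstat -o` listing."""
    # Assumption: the first word of each non-empty line is the job ID.
    current = [line.split()[0] for line in lpstat_output.splitlines() if line.strip()]
    # Remember when each job was first noticed; forget jobs that left the queue.
    updated = {job: first_seen.get(job, now) for job in current}
    stale = sorted(job for job, seen in updated.items() if now - seen >= max_age)
    return stale, updated


def main():
    try:
        with open(STATE_FILE) as fh:
            first_seen = json.load(fh)
    except (IOError, ValueError):
        first_seen = {}   # first run, or unreadable state file
    output = subprocess.check_output(["lpstat", "-o"]).decode("utf-8", "replace")
    stale, updated = find_stale(output, first_seen, time.time())
    with open(STATE_FILE, "w") as fh:
        json.dump(updated, fh)
    for job in stale:
        # Anything printed here should end up in the mail cron sends.
        print("queued for a week or more: %s" % job)
```

Saved as a script with a final call to main() and run once a day from cron, this would only report. Passing each stale ID to the "cancel" command would be the automatic-kill variant, but I have not tried that.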
This problem kind of blends in with Problem #1 above: the simplest form of throttling I can think of is to set a FilterLimit. But if a printer is turned off, then after very few jobs have been sent to that printer (and thus started their foomatic-rip processes), the FilterLimit will be reached and CUPS won't start printing anything until those processes finish -- which they won't. Am I correct in this assumption? Or will CUPS be able to detect that the stagnant foomatic-rip processes aren't consuming CPU time?
3. When we do a large number of near-simultaneous prints, jobs don't come out of the Windows printer in the order they were queued by our Python script. I'm (wildly) guessing this is because the foomatic-rip processes are finishing in a different order than they started, which again makes me wish they'd run in series. Is my diagnosis correct? Has anybody had a similar issue? And if so, was it solved? How?
We are running Debian (Sarge).