High Volume, Many Printer setup

Wed Jun 1 15:02:12 PDT 2005

Michael Sweet wrote:
> Kelly Sauke wrote:
> 
>> ...
>> 1).  If one of the cups machines goes down, when the client hits the
>> BrowseTimeout it starts removing the printers from that host.  While
>> the client is removing all the printers, all other printing commands
>> (lp, lpstat, etc) lock up and wait for this process to get done.
> 
>> ...
> 
> That shouldn't happen.  Printer timeouts will be staggered over the
> browse interval period, and they take essentially no time at all to
> process.  Certainly no time for the high-end systems you are using!
> 
> The general rule-of-thumb is to set the BrowseInterval to be greater
> than the number of printers, and the BrowseTimeout to be 3 times
> the interval.

There are 364 printers defined on each of the two servers, and I've set the
BrowseInterval on those servers to 728.  When the client hits the
BrowseTimeout on the first printer, the rest aren't far behind, and
eventually the client cupsd can't keep up with the timeouts and becomes very
unresponsive to other cups requests.  This seems to start after the first 70
printers or so have been removed; from then on, commands such as lpstat and
lp take anywhere from 20 seconds to more than a minute to complete.  This is
on the dual 3 GHz machines with 3 GB of RAM doing nothing but running cupsd.

It doesn't appear that the BrowseTimeouts are as staggered as they should
be.  Even with the BrowseInterval at 2*num_printers, the timeouts seem to hit
at about 2 per second, when one would expect 1 every 2 seconds.  2 per second
seems to be faster than cups can remove them.
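
To make the arithmetic above concrete, here is a small sketch.  It assumes
the timeouts are spread evenly over one BrowseInterval, which is my reading
of "staggered"; the function name is made up for illustration, and the
BrowseTimeout value is simply the 3x rule of thumb applied to 728, not a
value taken from the actual configuration.

```python
def expected_timeout_spacing(num_printers, browse_interval):
    """Seconds between timeouts if they are spread evenly over the interval.

    Assumption: "staggered" means evenly spaced across one BrowseInterval.
    """
    return browse_interval / num_printers


printers_per_server = 364      # from the post
browse_interval = 728          # seconds, 2 * num_printers, from the post

spacing = expected_timeout_spacing(printers_per_server, browse_interval)
expected_rate = 1 / spacing    # timeouts per second if evenly staggered
observed_rate = 2.0            # roughly what the client appears to see
ratio = observed_rate / expected_rate

# Rule of thumb quoted above: BrowseTimeout = 3 * BrowseInterval.
browse_timeout = 3 * browse_interval

print(f"expected: one timeout every {spacing:.1f} s ({expected_rate:.2f}/s)")
print(f"observed: about {observed_rate:.1f}/s, {ratio:.0f}x the expected rate")
print(f"rule-of-thumb BrowseTimeout: {browse_timeout} s")
```

So the observed rate is about four times what even staggering would give.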

It also appears that this state is only entered when the client has created
an implicit class.  If the client only has printers from one host and no
implicit class gets created, then when those printers hit their
BrowseTimeout, cupsd handles it just fine and goes along its merry way.

I've changed the LogLevel to debug and didn't get any more information than
'Remote destination "printer@host" has timed out; deleting it...'
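
One way to check how tightly the timeouts are bunched is to count those
messages per second in the error_log.  The sketch below assumes each log
line carries a bracketed timestamp with one-second resolution before the
message; the sample lines, printer names, and level letter are invented for
illustration and are not taken from the real logs.

```python
import re
from collections import Counter

# Capture the bracketed timestamp of any line carrying the timeout message.
TIMEOUT_RE = re.compile(r'\[([^\]]+)\].*has timed out; deleting it')


def timeouts_per_second(lines):
    """Return a Counter mapping each timestamp to its number of timeouts."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented sample lines (hypothetical printer and host names).
sample = [
    'I [01/Jun/2005:15:02:12 -0700] Remote destination "lp1@cups01" has timed out; deleting it...',
    'I [01/Jun/2005:15:02:12 -0700] Remote destination "lp2@cups01" has timed out; deleting it...',
    'I [01/Jun/2005:15:02:13 -0700] Remote destination "lp3@cups01" has timed out; deleting it...',
    'D [01/Jun/2005:15:02:13 -0700] some unrelated debug message',
]

counts = timeouts_per_second(sample)
for stamp, n in sorted(counts.items()):
    print(stamp, n)
```

To run it against a real log, pass an open file object instead of the
sample list.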

> 
>> 2).  Say we bump the BrowseTimeout up so that printers don't get
>> removed right away if one of the servers goes down.  Now say cups02
>> is down but the printer definitions still exist on the client
>> (because it hasn't hit the BrowseTimeout yet).   I queue multiple
>> jobs to the same printer.  The first job will get sent to cups01.
>> Say that job takes a long time to print.  The next job queued to the
>> same printer gets sent to the down cups02.  After about 5 seconds,
>> this job is canceled by cups instead of staying in the queue to go to
>> cups02 when it comes back or going to cups01 when it has finished
>> printing.  Shouldn't this job stay queued while there is a valid
>> printer definition?  I can post the logs from the client showing the
>> job cancellation if that would be helpful.
> 
> 
> That is strange - when the client's ipp backend cannot connect to
> the remote printer, it should immediately stop trying if the job
> is queued on an implicit class and retry the job on the next printer
> in the class.
> 
> If you can post a snippet of the error_log with the LogLevel set to
> debug, that might be useful...

This issue is fixed in 1.1.23; when I upgraded, the behavior went away.