[cups] CUPS takes 100% CPU/file descriptor leak

Wed Jul 26 12:02:55 PDT 2017

Yeah, I know, yet another of these.

Running CUPS 2.1.3 on Ubuntu 16.  Seeing this particular problem on a 
server that is fed by other CUPS servers, and which then speaks to 
printers.  That is, the only inbound print jobs to this server are 
coming from CUPS elsewhere.  I don't know if this matters and am not in 
a position to find out, sadly.

At the end of *receiving* a print job, it doesn't appear to quite 
finish.  The job gets turned around and prints successfully, but the 
socket from the upstream server is never closed out.  Digging in with 
strace I'm seeing epoll_wait return a bunch of results, then a whole 
cycle of recvfrom and poll for each fd epoll_wait returns.  Turning on 
debug logging, I'm seeing "Read: status=100" many, many times/second.  
Turning on debug2 logging, I see:

d [26/Jul/2017:13:25:41 -0400] [Client 1007] cupsdReadClient: error=0, 
used=0, state=HTTP_STATE_POST_RECV, data_encoding=HTTP_ENCODING_LENGTH, 
data_remaining=0, request=(nil)(), file=19

for each fd.

So I found in the code for cupsdReadClient() that if httpGetReady 
returns 0 and if there is nothing further in the receive buffer (as 
determined with recv using MSG_PEEK) the connection is closed and it 
cleans up the connection.  But that recv is coming back with at least 1 
byte still to receive, so it's not done.  But the job has already been 
passed along to the printer by this point and is on paper, so that byte 
(or more) really can't be very important, can it?  Or is it possible I'm 
hitting a bug in the recvfrom system call such that it's reporting data 
in the buffer when there actually isn't?

Problem is that since it's not going in to the "clean up the connection" 
code, we're leaking file descriptors with every print job as they're 
just accumulating in the CLOSE_WAIT state waiting on cupsd to actually 
call close() on them.  Eating a CPU isn't a problem, I've got more than 
one.  But over the course of a week I run out of file descriptors and 
the whole thing just falls over. A daily cron job to restart cupsd gets 
the job done, but that isn't really a solution to the underlying problem.

Any ideas out there?

-- 
Joel Lord
Sr. Systems Architect
Advanced Information Management