[cups.general] Jbase and cups and a stuck queue, hopefully someone recognizes this problem.

Mark Krenz mark.krenz at cookmedical.com
Thu Jul 12 06:06:54 PDT 2007


Kurt Pfeifle wrote:
> Mark Krenz wrote:
>   
>> Kurt Pfeifle wrote:
>> It submits a job to cups by running lpr and piping the data to it.  The
>> lpr process sits there forever (we've seen times as high as 2-3 hours). 
>>     
>
> If so, your lpr may be waiting for the piped-in data. Can you rule out
> that the db process supposed to pipe data to lpr does fail to do so?
>   
Its hard to tell because we have been stracing the processes after this 
problem starts and so all we see are the lpr processes stuck at recv(3, 
<unfinished ...>  and just waiting there.   I tried running cat - | 
strace lpr  and it actually stopped on read(0,  instead of recv(3, 


>> So it doesn't really timeout.
>>     
>
> I assume the lpr process is indeed CUPS's own lpr client command (which
> talks IPP to the CUPS server).
>   
Yes it is.

> This indicates more that your lpr process never gets any data to pass
> on to the cupsd, than it indicates anything to be wrong with CUPS....
>
> My suggestion to you is to change the "piping" thingie. Instead, let
> the DB write its job to a temporary file; and let then lpr print that
> file (and delete it when done; see the "-r" parameter for lpr in the
> man page).
>
> This way you can better discover if the DB fails to write its jobfiles
> sometimes.
>   

I agree, but it might not be possible to change that behavior as the 
jspprint processes that act as spoolers for jbase are compiled 
programs.  And I doubt we have access to the source code.  I can check 
though.

When this problem occurs, I actually can run print jobs through lpr from 
the command line in the way you describe (from a file).   Those go 
through fine.

  Could it be that we are sending jobs to cups that cups doesn't like?  
Are there any characters to look out for?  How is the jspprint process 
supposed to tell lpr that its done sending it data?  I seem to remember 
that there is some signal that is sent, but don't remember what.

> Your lpr process does not even open such a connection yet. It is still
> waiting for its own data... (You saw it in your access_log too: CUPS
> does not see a connection coming in from your hanging lpr...)
>   
Actually, there was one print job that took more than 2600 seconds to 
run.  We did an strace of it from jspprint and use -ff to follow the 
children.  This is a section of the log that was for the lpr process 
that was spawned and it tried to make its first attempt to send the job 
to cups:

time(NULL)                              = 1184145385
send(3, "!\0\10page-top\0\4\0\0\0$", 17, 0) = 17
time(NULL)                              = 1184145385
send(3, "\3", 1, 0)                     = 1
fstat64(5, {st_mode=S_IFREG|0600, st_size=52181, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 
0) = 0xb73ed000
_llseek(5, 0, [0], SEEK_SET)            = 0
read(5, "\f\n ----> P I C K L I S T <----              Page # 1\n 
\n\33(0Y\33(s0p12.0h0v0s0b0T\n *"..., 32768) = 32768
time(NULL)                              = 1184145385
send(3, "\f\n ----> P I C K L I S T <----              Page # 1\n 
\n\33(0Y\33(s0p12.0h0v0s0b0T\n *"..., 32768, 0) = 32768
read(5, "________\n\f\n ----> P I C K L I S T <----              Page # 
1\n \n\33(0Y\33(s0p12.0h0v"..., 32768) = 19413
read(5, "", 12288)                      = 0
time(NULL)                              = 1184145385
send(3, "________\n\f\n ----> P I C K L I S T <----              Page # 
1\n \n\33(0Y\33(s0p12.0h0v"..., 19413, 0) = 19413
read(5, "", 32768)                      = 0
recv(3, "", 2048, 0)                    = 0
close(3)                                = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(631), 
sin_addr=inet_addr("127.0.0.1")}, 16) = 0
send(3, "POST /printers/INVCMI HTTP/1.1\r\n", 32, 0) = 32
send(3, "Content-Length: 52553\r\n", 23, 0) = 23
send(3, "Content-Type: application/ipp\r\n", 31, 0) = 31
send(3, "Host: localhost\r\n", 17, 0)   = 17
send(3, "\r\n", 2, 0)                   = 2
time(NULL)                              = 1184145686
send(3, "\1\1\0\2\0\0\0\1", 8, 0)       = 8
time(NULL)                              = 1184145686


It did this every 300 seconds (thus the question about the timeout).  
Eventually after several tries, the job went through:

send(3, "!\0\10page-top\0\4\0\0\0$", 17, 0) = 17
time(NULL)                              = 1184147793
send(3, "\3", 1, 0)                     = 1
_llseek(5, 0, [0], SEEK_SET)            = 0
read(5, "\f\n ----> P I C K L I S T <----              Page # 1\n 
\n\33(0Y\33(s0p12.0h0v0s0b0T\n *"..., 32768) = 32768
time(NULL)                              = 1184147793
send(3, "\f\n ----> P I C K L I S T <----              Page # 1\n 
\n\33(0Y\33(s0p12.0h0v0s0b0T\n *"..., 32768, 0) = 32768
read(5, "________\n\f\n ----> P I C K L I S T <----              Page # 
1\n \n\33(0Y\33(s0p12.0h0v"..., 32768) = 19413
read(5, "", 12288)                      = 0
time(NULL)                              = 1184147793
send(3, "________\n\f\n ----> P I C K L I S T <----              Page # 
1\n \n\33(0Y\33(s0p12.0h0v"..., 19413, 0) = 19413
read(5, "", 32768)                      = 0
recv(3, "HTTP/1.1 200 OK\r\n", 2048, 0) = 17
time(NULL)                              = 1184148008
recv(3, "Date: Wed, 11 Jul 2007 10:00:08 GMT\r\nServer: 
CUPS/1.1\r\nConnection: Keep-Alive\r\nK"..., 2048, 0) = 210
time(NULL)                              = 1184148008
time(NULL)                              = 1184148008
time(NULL)                              = 1184148008
time(NULL)                              = 1184148008
time(NULL)                              = 1184148008
time(NULL)                              = 1184148008
time(NULL)                              = 1184148008
time(NULL)                              = 1184148008
time(NULL)                              = 1184148008
recv(3, 
"\1G\0\22attributes-charset\0\niso-8859-1H\0\33attributes-natural-language\0\5en-us\2E\0\7job-u"..., 
181, 0) = 181
close(5)                                = 0
munmap(0xb73ed000, 4096)                = 0
unlink("/var/spool/cups/tmp/46949fe9b4c8e") = 0
exit_group(0)                           = ?



The strange thing is, usually the jobs don't go through.  At least not 
in an hour or two.   So from looking at the strace, it does seem as if 
lpr is waiting on cups for some reason.




-- 
Mark S. Krenz
Linux System Administrator
Cook Incorporated
mark.krenz at cookmedical.com
phone: (812)339-2235 ext.4224
Cell: 322-8117
alt. cell: 322-4449





More information about the cups mailing list