[cups.general] Jbase and cups and a stuck queue, hopefully someone recognizes this problem.

Mark Krenz mark.krenz at cookmedical.com
Mon Jul 23 06:27:00 PDT 2007


Kurt Pfeifle wrote:
> Not sure what it exactly is that pipes use, maybe "ctrl+d". I'm pretty
> sure that lpr does not use anything special. Just the standard code that
> indicates "Over!"...
>
> To rule out any "jspprint <--> lpr" issues, I'd make jspprint print to
> file. And lpr print from file.
>
> If a CUPS problem *then* occurs, you'll know for sure where to continue
> digging.
>   
Please, if anyone else wants to jump in and help, I'd be grateful for it.

Ok, this problem just keeps getting weirder.  I've now had a consultant 
from Red Hat come on site for 3 days, look at the problem thoroughly and 
be baffled.  Nothing was wrong with my cups config or system config as 
far as he could tell.  We also had a Linux guru from jBase login and 
check the config.  Both of them also investigated the problem while it 
was happening and found nothing wrong.

On this past Saturday however, I tried a new investigation tactic 
suggested by my wife.  tcpdump the localhost interface.  I finally found 
something which I think can prove that lpr <--> cups communications is 
failing.  And it almost looks like packets aren't making it through the 
localhost interface.  But I can't be sure.  I'm not saying cups is the 
trigger for this problem, but that is definitely around where the 
failure is occuring.  Maybe its even within the kernel.

Below are the relevant parts of the straces of lpr, cups and the tcpdump 
of the localhost interface.  Follow the nanosecond timestamps to keep 
track of what happens:

lpr strace:

1185021297.243723 close(4)              = 0
1185021297.243766 munmap(0xb73ef000, 4096) = 0
1185021297.243814 open("/etc/passwd", O_RDONLY) = 4
1185021297.243869 fcntl64(4, F_GETFD)   = 0
1185021297.243912 fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
1185021297.243954 fstat64(4, {st_mode=S_IFREG|0644, st_size=1463, ...}) = 0
1185021297.244042 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb73ef000
1185021297.244091 _llseek(4, 0, [0], SEEK_CUR) = 0
1185021297.244143 _llseek(4, 0, [0], SEEK_SET) = 0
1185021297.244204 stat64("jak.test2", {st_mode=S_IFREG|0644, 
st_size=17879, ...}) = 0
1185021297.244305 open("jak.test2", O_RDONLY) = 5
1185021297.244370 send(3, "POST /printers/JOSHUA HTTP/1.1\r\n", 32, 0) = 32
1185021297.244477 send(3, "Content-Length: 18134\r\n", 23, 0) = 23
1185021297.244589 send(3, "Content-Type: application/ipp\r\n", 31, 0) = 31
1185021297.244704 send(3, "Host: localhost\r\n", 17, 0) = 17
1185021297.244784 send(3, "\r\n", 2, 0) = 2
1185021297.244847 time(NULL)            = 1185021297
1185021297.244933 send(3, "\1\1\0\2\0\0\0\1", 8, 0) = 8
1185021297.245005 time(NULL)            = 1185021297
1185021297.245067 send(3, "\1G\0\22attributes-charset\0\niso-8859-1", 
34, 0) = 34
1185021297.245149 time(NULL)            = 1185021297
1185021297.245208 send(3, "H\0\33attributes-natural-language\0\5en-us", 
37, 0) = 37
1185021297.245272 time(NULL)            = 1185021297
1185021297.245320 send(3, 
"E\0\vprinter-uri\0#ipp://localhost:631/printers/JOSHUA", 51, 0) = 51
1185021297.245387 time(NULL)            = 1185021297
1185021297.245435 send(3, "B\0\24requesting-user-name\0\4root", 29, 0) = 29
1185021297.245494 time(NULL)            = 1185021297
1185021297.245542 send(3, "B\0\10job-name\0\tjak.test2", 22, 0) = 22
1185021297.245598 time(NULL)            = 1185021297
1185021297.245646 send(3, 
"I\0\17document-format\0\30application/octet-stream", 44, 0) = 44
1185021297.245711 time(NULL)            = 1185021297
1185021297.245758 send(3, "\2B\0\njob-sheets\0\4noneB\0\0\0\4none", 29, 
0) = 29
1185021297.245819 time(NULL)            = 1185021297
1185021297.245866 send(3, "\3", 1, 0)   = 1
1185021297.245915 fstat64(5, {st_mode=S_IFREG|0644, st_size=17879, ...}) 
= 0
1185021297.246004 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb73ee000
1185021297.246052 _llseek(5, 0, [0], SEEK_SET) = 0
1185021297.246103 read(5, "Linux version 2.4.21-47.0.1.ELsmp 
(brewbuilder at hs20-bc2-2.build.redhat.com) (gcc version 3.2.3 20030502 
(Red Hat Linux 3.2.3-56)) #1 
1185021297.253485 read(5, "", 12288)    = 0
1185021297.253530 time(NULL)            = 1185021297
1185021297.253579 send(3, "Linux version 2.4.21-47.0.1.ELsmp 
(brewbuilder at hs20-bc2-2.build.redhat.com) (gcc version 3.2.3 20030502 
(Red Hat Linux 3.2.3-56)) #1 
1185021297.260863 read(5, "", 32768)    = 0
1185021297.260914 recv(3, "", 2048, 0)  = 0
1185021598.248734 close(3)              = 0
1185021598.248819 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3


cups strace:

1185021297.233490 send(5, 
"D\0\17media-supported\0\7DefaultD\0\0\0\5Tray1D\0\0\0\5Tray2D\0\0\0\5Tray3D\0\0\0\5Tray4D\0\0\0\10EnvelopeD\0\0\0\6Manua
1185021297.233909 select(1024, [0 1 3 5], [5], NULL, {1, 0}) = 1 (out 
[5], left {1, 0})
1185021297.234384 send(5, "D\0\rmedia-default\0\6Letter", 24, 0) = 24
1185021297.234669 select(1024, [0 1 3 5], [5], NULL, {1, 0}) = 1 (out 
[5], left {1, 0})
1185021297.235085 send(5, 
"D\0\17sides-supported\0\3oneD\0\0\0\rtwo-long-edgeD\0\0\0\16two-short-edge", 
60, 0) = 60
1185021297.235393 select(1024, [0 1 3 5], [5], NULL, {1, 0}) = 1 (out 
[5], left {1, 0})
1185021297.235808 send(5, "D\0\rsides-default\0\3one", 21, 0) = 21
1185021297.236095 select(1024, [0 1 3 5], [5], NULL, {1, 0}) = 1 (out 
[5], left {1, 0})
1185021297.236509 send(5, "#\0\24finishings-supported\0\4\0\0\0\3", 29, 
0) = 29
1185021297.236832 select(1024, [0 1 3 5], [5], NULL, {1, 0}) = 1 (out 
[5], left {1, 0})
1185021297.237248 send(5, "#\0\22finishings-default\0\4\0\0\0\3", 27, 0) 
= 27
1185021297.237541 select(1024, [0 1 3 5], [5], NULL, {1, 0}) = 1 (out 
[5], left {1, 0})
1185021297.237957 send(5, "#\0\fprinter-type\0\4\0\0000\34", 21, 0) = 21
1185021297.238096 send(5, "\3", 1, 0)   = 1
1185021297.238405 select(1024, [0 1 3 5], [], NULL, {1, 0}) = 1 (in [5], 
left {0, 990000})
1185021297.244679 select(6, [5], NULL, NULL, {1, 0}) = 1 (in [5], left 
{1, 0})
1185021297.244764 recv(5, "POST /printers/JOSHUA 
HTTP/1.1\r\nContent-Length: 18134\r\n", 2048, 0) = 55
1185021297.245137 select(6, [5], NULL, NULL, {1, 0}) = 0 (Timeout)
1185021298.239049 select(1024, [0 1 3 5], [], NULL, {1, 0}) = 0 (Timeout)


tcpdump -i lo:

08:34:57.234432 localhost.localdomain.36406 > localhost.localdomain.ipp: 
.. [tcp sum ok] 168:168(0) ack 3483 win 32767 <nop,nop,timestamp 9013610 
9013610> (DF) (
08:34:57.235134 localhost.localdomain.ipp > localhost.localdomain.36406: 
P 3483:3543(60) ack 168 win 32767 <nop,nop,timestamp 9013610 9013610> 
(DF) (ttl 64, id 
08:34:57.235149 localhost.localdomain.36406 > localhost.localdomain.ipp: 
.. [tcp sum ok] 168:168(0) ack 3543 win 32767 <nop,nop,timestamp 9013610 
9013610> (DF) (
08:34:57.235840 localhost.localdomain.ipp > localhost.localdomain.36406: 
P [tcp sum ok] 3543:3564(21) ack 168 win 32767 <nop,nop,timestamp 
9013610 9013610> (DF)
08:34:57.235856 localhost.localdomain.36406 > localhost.localdomain.ipp: 
.. [tcp sum ok] 168:168(0) ack 3564 win 32767 <nop,nop,timestamp 9013610 
9013610> (DF) (
08:34:57.236546 localhost.localdomain.ipp > localhost.localdomain.36406: 
P [tcp sum ok] 3564:3593(29) ack 168 win 32767 <nop,nop,timestamp 
9013610 9013610> (DF)
08:34:57.236562 localhost.localdomain.36406 > localhost.localdomain.ipp: 
.. [tcp sum ok] 168:168(0) ack 3593 win 32767 <nop,nop,timestamp 9013610 
9013610> (DF) (
08:34:57.237283 localhost.localdomain.ipp > localhost.localdomain.36406: 
P [tcp sum ok] 3593:3620(27) ack 168 win 32767 <nop,nop,timestamp 
9013610 9013610> (DF)
08:34:57.237300 localhost.localdomain.36406 > localhost.localdomain.ipp: 
.. [tcp sum ok] 168:168(0) ack 3620 win 32767 <nop,nop,timestamp 9013610 
9013610> (DF) (
08:34:57.237989 localhost.localdomain.ipp > localhost.localdomain.36406: 
P [tcp sum ok] 3620:3641(21) ack 168 win 32767 <nop,nop,timestamp 
9013610 9013610> (DF)
08:34:57.238005 localhost.localdomain.36406 > localhost.localdomain.ipp: 
.. [tcp sum ok] 168:168(0) ack 3641 win 32767 <nop,nop,timestamp 9013610 
9013610> (DF) (
08:34:57.238121 localhost.localdomain.ipp > localhost.localdomain.36406: 
P [tcp sum ok] 3641:3642(1) ack 168 win 32767 <nop,nop,timestamp 9013610 
9013610> (DF) 
08:34:57.238176 localhost.localdomain.36406 > localhost.localdomain.ipp: 
.. [tcp sum ok] 168:168(0) ack 3642 win 32767 <nop,nop,timestamp 9013610 
9013610> (DF) (
08:34:57.244406 localhost.localdomain.36406 > localhost.localdomain.ipp: 
P 168:200(32) ack 3642 win 32767 <nop,nop,timestamp 9013611 9013610> 
(DF) (ttl 64, id 6
08:34:57.244519 localhost.localdomain.36406 > localhost.localdomain.ipp: 
P [tcp sum ok] 200:223(23) ack 3642 win 32767 <nop,nop,timestamp 9013611 
9013610> (DF) 
08:34:57.278689 localhost.localdomain.ipp > localhost.localdomain.36406: 
.. [tcp sum ok] 3642:3642(0) ack 223 win 32767 <nop,nop,timestamp 9013615 
9013611> (DF) 
08:35:59.238760 localhost.localdomain.ipp > localhost.localdomain.36366: 
F [tcp sum ok] 1642048074:1642048074(0) ack 1634938722 win 32758 
<nop,nop,timestamp 901


I included a line after the problem area just to show what happened 
afterwords.  Now, from looking at the tcpdump and the strace of lpr, it 
seems that lpr is waiting for data back from cups, cups sends some data 
back, but lpr never receives it.

I'm not running any kind of host firewall on the machine, there is 
nothing to indicate that localhost is configured wrong:

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:9342790 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9342790 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:676574996 (645.2 Mb)  TX bytes:676574996 (645.2 Mb)

And yes that is from the output of ifconfig while we're having the 
problem.   Up until this point in my career I had never questioned that 
the localhost interface would do its job.   I don't know, maybe lpr is 
hanging or something, but the fact that it continues retrying tells me 
otherwise.

What should I do now?  If you need more information, I recorded 
practically everything I could from the machine while we were having the 
problem.  I ever did a find on /proc and cat'd out everything to a file 
(except kcore of course).

-- 
Mark S. Krenz
Linux System Administrator
Cook Incorporated
mark.krenz at cookmedical.com
phone: (812)339-2235 ext.4224
Cell: 322-8117
alt. cell: 322-4449





More information about the cups mailing list