ApeosPort HTTP/1.1 IPP Error Hidden By a CUPS Bug

Duncan McEwan duncan at ecs.vuw.ac.nz
Thu Feb 17 18:07:57 PST 2011


Hi,

I recently reported problems printing via IPP to various different Fuji/Xerox
ApeosPort multifunction printer/copiers we have here.  Those problems were
related to the printers not correctly responding to the "Expect: 100-continue"
header that CUPS sent in each IPP request.  Thanks to Michael for responding so
quickly to those issues with the fixes in STR3778 and STR3791.

However, we would prefer to continue running 1.4.4 for now (with a likely
upgrade to the latest 1.4.x soon) so I decided to try backporting those fixes.

During that process I discovered one way that the ApeosPort appears not to
implement HTTP/1.1 correctly (perhaps one of the things Michael was referring
to in STR3791 when he wrote "I've already contacted Fuji-Xerox about this issue
and are working with them on a fix for this and some IPP conformance issues").

But (perhaps more interestingly for this audience) I also found what I think
is a bug in the CUPS svn trunk that hid this ApeosPort HTTP/1.1 problem. I'd be
happy to submit an official bug report if this is confirmed as a bug and it
isn't already known about or fixed.

Details of the ApeosPort issue and the CUPS bug follow.  I can provide network
traces and CUPS debug logs if more information is required.

I first applied the patches provided in STR3778 and STR3791 to sources obtained
from the SVN trunk.  Printing to an ApeosPort no longer suffered from a
two minute delay before printing started and the job seemed to complete OK.

Then I backported the patches to 1.4.4.  The patches in STR3778 that Michael
provided for the 1.4 head needed some minor adjustments to apply cleanly
against 1.4.4, but that was not too difficult.

Again, I no longer saw the approx 2 minute delay before a print job reached the
printer.  But the print job never completed because the ipp backend continuously
looped trying and failing to get the job status from the printer.

I also tried with sources from the 1.4 branch head in case differences between
1.4.4 and the head prevented the patches working correctly on 1.4.4.  Same
result!

I figured there must be some other difference(s) between the trunk and 1.4.x
branch that made the patches work in the former case but not the latter.
So I started digging and found I was wrong!

The real reason for the 1.4.4 and 1.4 head failures was unrelated to CUPS and
was due to the previously mentioned (probable) bug in the ApeosPort's HTTP/1.1
implementation.  It seems that in some situations they close connections without
including a "Connection: closed" header in their HTTP/1.1 response.  From a
quick reading of the appropriate section in RFC2616 I believe this behaviour is
incorrect.

So CUPS sends a Get-Job-Attributes request on a socket connection that the
printer has closed resulting in the backend seeing an IPP_SERVICE_UNAVAILABLE
error.  As this is not a fatal error it goes on to send a Get-Printer-Attributes
request.  I think because this causes an OS error while *writing* the data (the
Get-Job-Attributes request error occurred when *reading* the response), the CUPS
library reconnects to the printer and then successfully re-sends the request.  Then the IPP backend loops around to try the Get-Job-Attributes
again, but once again the printer has closed the connection following its
response to the Get-Printer-Attributes request.  And so it goes on...

Printing works fine if you pass the backend a waitjob=off option.

I decided to investigate further to see what changes in the trunk allowed it
to work.  But I found that if it wasn't for a bug in that version of CUPS it
would suffer from the same problem!

The significant difference between the 1.4.x and trunk versions of the IPP
backend is that the latter calls cupsSendRequest() to send the Print-Job request
whereas the former calls cupsDoRequest() (which calls cupsSendRequest()
indirectly via cupsDoIORequest()).

If the 1.4.x code receives a chunked response, cupsDoIORequest() ends up
calling httpFlush() which consumes the trailing "0\r\n\r\n" (end of chunked
encoding indication).

With the trunk version, httpFlush() isn't called so that data remains in the
buffer and gets read as a response to the first Get-Job-Attributes request that
the backend sends.  As this is not a valid HTTP response cupsDoIORequest()
returns -1 with ipp_status set to "server-error-internal-error".  This is a
fatal error which causes the backend to break out of the job status checking
loop (after printing the diagnostic "ERROR: Unable to get job attributes:
Unknown").

Once I realised what was happening I added a call to httpFlush() after the
cupsGetResponse() call that reads the response to the Print-Job request.  That
caused the same continuously looping behaviour I was seeing with the 1.4.x
code.  Note: I'm not claiming that the addition of the httpFlush() is the
correct fix.  I have no idea what effect that might have on (for eg) responses
from printers that don't use chunked encoding.  I only did it to confirm my
understanding of the problem).

So although it originally appeared to me that the patched trunk code worked
better than that from the 1.4 branch, that wasn't the case.  Also, even if the
ApeosPort RFC2616 violation was fixed, the 1.5.x ipp backend would still fail
to get the job status because of the buffered end of chunked encoding
indication.

Finally, for now, I plan to work around the ApeosPort problem in our 1.4.x IPP
backend by adding an option to tell it to reopen the connection before sending
the Get-Job-Attributes request.  I do realise this is pretty hacky though!  A
better fix might be if the CUPS library transparently reopened the connection
and retried the request on detecting a read error (it already does this for
write errors, at least in some circumstances).  But I don't know how that
might affect all the code that calls the library so I chickened out of
attempting that change...

Duncan





More information about the cups-devel mailing list