Problem with FUJI XEROX ApeosPort MultifunctionDevice and the "Expect: 100-continue header"

Mon Jan 24 20:19:39 PST 2011

> and a followup after doing some investigation...

I didn't see your more recent posts until I was most of the way through drafting a reply to your first response (timezone differences and a holiday weekend for us here in Wellington, New Zealand).  Thanks for filing and then resolving STR #3778.

I spent quite a bit of time during the latter part of last week going over the CUPS code, trying to understand the cause of our delayed printing problem. This response was mostly written based on that work and prior to seeing your emails about STR #3778.

>From my investigations and contrary to what I originally stated, I'm now suspecting that the issue described/fixed in #3778 may have something to do with the printing delays we are seeing.

Unfortunately, we're not in a position right now to upgrade our CUPS installation to see if your fix to #3778 also prevents the delays.  And, despite the time I spent on it, I wasn't able to understand the code well enough to say with certainty what the cause or fix is, so I don't even know if upgrading would help.  So would you mind having a quick read over the following theory to see if it possibly explains what we are seeing?

Before describing my theory, some comments on your initial response...

> On Jan 17, 2011, at 8:45 PM, Duncan McEwan wrote:
> > ...
> > I am far from an expert in the IPP and/or HTTP protocols, but I did have a
> > quick read of rfc2616 and found the following passage:
> >
> >    An origin server MAY omit a 100 (Continue) response if it has already
> >    received some or all of the request body for the corresponding request.
>
> Correct, but the origin server cannot respond with a *successful* status
> until it has received *all* of the request body (since successful
> statuses allow for Keep-Alive and pipelining; failure status codes cause
> the connection to close...)

OK - as I said, I'm not familiar with all the details of the HTTP and IPP protocols and so I'm quite likely missing something here.  But it seems to me that the origin server *should have* received a complete request body since my tcpdump packet trace shows the HTTP request header and IPP request were both transmitted by CUPS without waiting for the 100-continue response.

> > As an aside, I'm not sure why CUPS is bothering with the "Expect:
> > 100-continue" header in this case, as the Get-Printer-Attributes request
> > is not large.  I can see that it might be useful to include it in the
> > "Print-Job" request if the file to be printed is big.
>
> The main reason is to get the "I need authentication" or "I need to upgrade
> to TLS" response before sending any of the request data.

OK - this makes sense.

> > I think the problem here may be that instead of getting no response and
> > therefore continuing as if one had been received, CUPS gets the
> > "HTTP/1.1 200 OK", which might confuse it (though I haven't had a careful
> > look at the relevant code to see if this is indeed the case).
>
> Nope, the code basically accepts any status response; if no response has
> been received we wait up to 1 second to get one.

Yes - having spent time looking at the code I can now see that its not the
"HTTP/1.1 200 OK" response that is confusing CUPS - rather it seems to be bogus
responses that I'm theorizing are being pulled out of a previously received
packet.

My original cups trace log (still available at
http://ecs.victoria.ac.nz/~duncan/co256-copier-cups-trace.txt) is
27077 lines long.  I can certainly understand that you wouldn't be interested
in delving into that!  So I spent some time extracting out what I thought
were the "interesting" bits and now have a 204 line
http://ecs.victoria.ac.nz/~duncan/cups-log-summary.txt file.  The number at
the start of each line is the line number from the full log file.

The following may be an explanation of the delayed printing we are seeing, based on the contents of that summary log file.

- CUPS sends an initial Get-Printer-Attributes request

- The problem in httpUpdate() described in STR #3778 causes pending writes to
  be flushed (causing the "malformed packet") and also consumes all of the
  incoming headers of the response.

- That causes ippReadIO() to fail to read the incoming message (line 33 of
  summary trace, 4530 of full trace) which triggers a retransmission of the
  request and another response from the printer.

- Somehow (sorry - couldn't understand the code well enough to see why) there
  is still a buffered copy of a response from the printer with some of the
  bytes already consumed.

- The next attempt to read a response from the printer gets bytes from this
  buffered response, which are out of sync with what it expects.

  This causes the sequence of "httpUpdate: Bad response line ..." that you can
  see at lines 51, 83, 113, 144, 157 and 163, each of which results in a
  "WARNING: recoverable: Network host 'co256-copier.ecs.vuw.ac.nz' is busy;
  will retry in XX seconds..." message.

  Note: from looking at the packets in the tcpdump trace these bad responses
  do seem to corespond to what you would expect if CUPS was reading data from
  the initial response packet up until a linefeed (0x0a) character each time it
  occurs.

- After the 30 second time out, the buffered data has all been consumed and
  so CUPS finally gets an expected response to its Get-Printer-Attributes
  request and can now send the Print-Job request.  The result is a delay of
  5 + 10 + 15 + 20 + 25 + 30 = 105 seconds from when the first
  Get-Printer-Attributes is sent until the Print-Job.

- A similar sequence occurs after the Print-Job request (another malformed
  packet, followed by ippReadIO() failing to read a header and then a sequence
  of "httpUpdate: Bad response line ..." messages.  But as this is after the
  print job has started appearing on the printer it isn't as noticable!

Finally, if the above is indeed the explanation of our delayed printing
problem, I'm assuming that it is only affecting the Fuji/Xerox printers because
they don't respond to the "Expect: 100-continue" header line in the original
request in the way that CUPS wants them to.  Our HP printers do respond with
"100 Continue" and we don't see the delay with them.

I don't know whether or not it is legal for the Fuji/Xerox printers to not
respond with the "100 continue" message as my knowledge of the subtlties of
HTTP/IPP is not great.

If this is the printers fault, would an acceptable workaround be a configuration
option to tell CUPS not to expect the "100 Continue" response for certain
(broken) printers?

Thanks for your time looking at this so far.

Duncan