SSL produces intermittent printer outage

John A. Sullivan III jsullivan at opensourcedevel.com
Sat Feb 12 16:51:08 PST 2011


> On 02/11/2011 08:36 PM, John A.Sullivan III wrote:
> > Hello, all.  Because our print servers and our entire environment are multi-tenant, we encrypt traffic wherever possible, including our print jobs.  However, we are having a persistent problem using encryption with CUPS version 1.4.4-3 on Debian Lenny (I believe CUPS is from Squeeze, though).
> >
> > Every once in a while, the printers will not appear in the printer list of the various applications. [...]
> >
> I might be completely wrong, but: Have you checked that the server isn't
> running out of entropy?
> cat /proc/sys/kernel/random/entropy_avail
>
> Try if running rngd helps (/sbin/rngd -r /dev/random -r /dev/urandom)
>
> hth,
><snip>

It did more than help :)  You hit the nail right on the head and opened a can of worms which very much needed to be opened, as this explains a lot of the performance issues we have been seeing in other areas.  I'll paste in an email I sent to our hardware supplier, Pogo Linux (with whom, by the way, we are very satisfied - extremely helpful and knowledgeable and worth the small premium we pay them).  Oh, also, did you mean -o /dev/random rather than -r /dev/random? Here's the email:

Hello, all.  We have been noticing random application crashes on our Pogo systems, as well as surprisingly poor performance given the load they are under.  It has been an utterly fascinating day troubleshooting this.  It seems to have very little to do with Pogo, although you may be able to help us.  I also thought I would pass this along in case you have other customers with similar issues.  All of the systems involved are Atlas 1201.

The bottom-line questions for Pogo are: is there a hardware random number generator available on these devices? If not, do you sell something akin to an entropy key (http://www.entropykey.co.uk/)? No need to read further, but I'll include how we came across this in case it helps others.

This began as a printing problem on our Debian Lenny CUPS server running inside a VServer on CentOS 5.5 on an Atlas 1201. Every once in a while, the printers failed to appear in the printer list of the various applications.  The result ranged from nuisance (OpenOffice shows only a Generic Printer) to really nasty (Timberline crashes and can potentially corrupt its database).  This only happens when we encrypt print traffic.  Someone on the CUPS list asked if we had sufficient entropy and suggested using rngd if we did not.  That led to a half day's research on the subject and opened a huge can of worms which needed to be opened.

To sum it up, we make heavy use of encryption in our multi-tenant environment.  However, for security reasons, Linux has only a limited number of sources available to fill the entropy pool used for /dev/random, which blocks when it is out of bits until the pool is replenished.  Most of the entropy comes from mouse and keyboard.  Of course, our servers in the data center are headless and handless! No mouse or keyboard.  The available entropy (which should normally be around 4096 bits) was usually under 200.  This would explain why everything from our printing, to our web servers (mostly SSL), to our email (both traffic and web interface), to our virtual desktops (transmitted via NX over SSH) was getting slower and slower and slower - they rely upon /dev/random, which blocks until it can gather sufficient random data.
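
For anyone who wants to check their own machines, here is a minimal Python sketch of that check.  It reads /proc/sys/kernel/random/entropy_avail (the file suggested on the list) and the neighbouring poolsize file.  The function names and the 200-bit warning level are illustrative assumptions (200 is simply the level we were seeing), not anything standard, and newer kernels may report these values differently.

```python
from pathlib import Path

# Kernel interface for the random subsystem (Linux only).
RANDOM_DIR = Path("/proc/sys/kernel/random")


def read_int(path):
    """Return the single integer stored in a /proc-style file."""
    return int(Path(path).read_text().strip())


def entropy_status(avail, poolsize, low_threshold=200):
    """Describe the pool level; low_threshold is an illustrative assumption."""
    state = "LOW" if avail < low_threshold else "ok"
    return f"entropy_avail={avail} of poolsize={poolsize}: {state}"


if __name__ == "__main__":
    if RANDOM_DIR.is_dir():
        avail = read_int(RANDOM_DIR / "entropy_avail")
        poolsize = read_int(RANDOM_DIR / "poolsize")
        print(entropy_status(avail, poolsize))
    else:
        print("No /proc/sys/kernel/random here; is this a Linux system?")
```

Running it a few times while the encrypted services are busy should show whether the pool is being drained.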

To address this until we obtain some form of hardware RNG, we have used an ugly rngd hack: we run rngd and tell it to use the non-blocking /dev/urandom to feed the entropy pool.  That is a theoretical security compromise, since the pool is credited with entropy that comes from a pseudo-random source rather than a true random one, although there are no published exploits.  To do this, we need to do two things:

1) set /proc/sys/kernel/random/write_wakeup_threshold to a much higher value than the default 128 (sysctl -w kernel.random.write_wakeup_threshold=1024)
2) run rngd to use /dev/urandom as the randomness source (rngd -t 1 -r /dev/urandom)
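
The two steps above can be wrapped in a small script.  The following Python sketch reuses only the sysctl key and the rngd options quoted in the steps; the function names, the dry-run default, and the default arguments are illustrative assumptions.  It has to run as root to actually apply anything, and it carries the same theoretical compromise described above.

```python
import subprocess


def entropy_workaround_commands(threshold=1024, source="/dev/urandom"):
    """Build the two commands from the steps above as argument lists."""
    return [
        ["sysctl", "-w", f"kernel.random.write_wakeup_threshold={threshold}"],
        ["rngd", "-t", "1", "-r", source],
    ]


def apply_workaround(dry_run=True):
    """Print each command; execute it only when dry_run is False (needs root)."""
    for cmd in entropy_workaround_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default so nothing is changed by accident.
    apply_workaround(dry_run=True)
```

With dry_run left at True the script only prints the two command lines, which makes it easy to review them before applying them on a host.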

There is another, theoretically more secure option, but it is very new and not even packaged by the major distributions.  That is to use haveged, developed by one of the fellows in the references below.  It can be downloaded from http://www.issihosts.com/haveged/.  We have installed it on some of our less critical systems and will see if it proves stable.  We are also tracking available entropy on our systems to compare haveged, rngd, and doing nothing.  We'll see how we fare.  If haveged works well, it may eliminate the need to purchase a separate hardware RNG if there is not one already built into the Atlas 1201 servers.
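
Tracking of this kind can be as simple as sampling entropy_avail at a fixed interval and summarising the readings.  Below is a minimal Python sketch, assuming the standard /proc path; the function names, sample count, and interval are hypothetical, and this is not necessarily how the tracking is done on our systems.

```python
import statistics
import time
from pathlib import Path

ENTROPY_FILE = Path("/proc/sys/kernel/random/entropy_avail")


def sample_entropy(count=5, interval=0.5, path=ENTROPY_FILE):
    """Read the available-entropy counter `count` times, `interval` seconds apart."""
    readings = []
    for i in range(count):
        readings.append(int(Path(path).read_text().strip()))
        if i < count - 1:
            time.sleep(interval)
    return readings


def summarise(readings):
    """Return min, mean, and max of the readings for side-by-side comparison."""
    return {
        "min": min(readings),
        "mean": statistics.mean(readings),
        "max": max(readings),
    }


if __name__ == "__main__":
    if ENTROPY_FILE.exists():
        print(summarise(sample_entropy()))
    else:
        print("entropy_avail not found; is this a Linux system?")
```

Running the same script on a host with haveged, a host with rngd, and an untouched host gives three summaries that can be compared directly.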

The nice thing about VServer is that doing this on the host does it on all the guests automatically.  For KVM devices, we will need to do this in each VM.  Do you have any idea how we add entropy for Windows systems?

I do not have a lot of analytical data to measure performance since implementing these changes but I can say, anecdotally, that our web servers for asset tracking/helpdesk, monitoring, and project management which were running as slow as molasses are now behaving like regular web servers.  It will take us a week or two to know if the printing issues have been resolved as it was a rare and random event.  I'll paste in some references below.  Thanks - John

http://www.chrissearle.org/blog/technical/increase_entropy_26_kernel_linux_box
https://www.centos.org/modules/newbb/viewtopic.php?topic_id=17789&forum=43&post_id=65467  - very thorough
http://communities.vmware.com/message/530909
http://www.arnebrodowski.de/blog/273-Entropy-drained.html
http://www.linuxfromscratch.org/hints/downloads/files/entropy.txt
http://strugglers.net/~andy/blog/2010/06/06/adventures-in-entropy-part-1/
http://strugglers.net/~andy/blog/2010/06/07/adventures-in-entropy-part-2/
http://www.entropykey.co.uk/
http://www.issihosts.com/haveged/



