Software, technology, sysadmin war stories, and more.
Saturday, June 2, 2012

How to tell when your print server truly hates you

Back in April, I wrote a post about a tool called catslow. It was something a couple of my friends invented on our school's Unix box to work around a serious flow control configuration problem on the dialup lines. I always thought it was kind of silly, and never had any need for it. Last night, that all changed.

I have this dumb little print server box called a Hawking Technology HPS12U. It's just a generic plastic box with a barrel plug DC connector, an Ethernet port, a parallel port, and a couple of USB ports. It basically lets you take a non-networked printer and put it directly on your network.

I bought it several years ago when I decided there was no reason for my LaserJet to take up room on my desk, given how infrequently I used it. The idea was to buy this box, put it relatively far away, then drag Ethernet over to it. Power would come from a local outlet, so it wouldn't take up precious space on my desk's power strip, either.

That was the theory, and for the most part, it's allowed me to do that. The problem is that it's never been completely right in the head. I've always had problems printing anything of any size. Back when I was working on school projects, sometimes I would need to print out a dozen pages in order to use them as reference. More often than not, it would just get stuck while printing.

When I say stuck, I mean the "data received" light would be lit on my printer, but it would never do the clicks and whirs of actually transferring toner to paper. It would just sit there. Likewise, the computer would just sit there and look at me stupidly.

Cancelling the job never really seemed to work, and I'd usually get a page full of raw binary junk for my trouble. I had to power-cycle that printer more times than I'd like to admit, and probably caught it in the middle of feeding paper far too many times. I had to pop the cover and extract a wayward sheet more than once before I figured out how to make it stop safely: pull the paper tray and wait.

This didn't just happen when printing from Windows. It would also happen from my Mac and even from my Linux box. I couldn't figure it out. I could still print relatively small jobs, so I just had to break them up and send a page or two at a time. This was stupid, but it did work.

At some point, I decided it was time to try a different strategy, so I dug around on some used parts sites and found the PostScript module for this printer. The way HP did it in those days (and may still for all I know) is that you bought a SIMM to expand the memory, and the PS interpreter came on the chip. Don't ask me how that works, but that's how they implemented it.

I figured this would let my machines submit relatively small PostScript jobs instead of big raw PCL dumps, and that would mean less data going to the printer. Then maybe it would work. Besides, more RAM has to be good, right? I wasn't sure if my problem was just the printer running out of memory or what, so throwing more at it couldn't hurt.

I did this, and things seemed to improve, but I still never printed enough to really gauge it one way or the other. That was pretty much the status quo until last night when the thing crossed me for the last time. I wanted to print a single sheet containing a relatively complicated black and white screenshot, and it just would not go.

After going through the reset and restart dance a few times, I finally just gave up on printing from my Mac and decided to do it from the Linux box where I could have much more control over what happened. First, I turned the file into a PostScript dump by printing to a file from xv, and then I used nc to fling it at the printer over the network.

That's when I saw it at last: the print server and my machine were having a full-on food fight on the network. For some reason, the print server would eventually dial down the TCP window to zero and stop accepting new traffic. This would make my end start buffering the outgoing data, and when that buffer filled up, write() would block and nc would just sit there.
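The same kind of stall can be reproduced in miniature on a single machine. This is a minimal sketch under stated assumptions, not a model of the print server itself: it uses Python's standard socket module over loopback, and the function name fill_until_blocked is hypothetical. A peer that accepts a connection but never reads lets its receive buffer fill, at which point its kernel advertises a window of 0. The sender's kernel then queues data until its own send buffer is full. A blocking write() would hang right there, the way nc did; the sketch makes the socket non-blocking so the stall surfaces as a BlockingIOError instead.

```python
import socket

def fill_until_blocked(chunk_size: int = 1024) -> int:
    """Send into a peer that never reads; return bytes accepted before send() would block."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free loopback port
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname())
    # Accepted but never read from: its receive window shrinks and eventually hits 0.
    receiver, _ = listener.accept()
    # Non-blocking, so we get an exception instead of hanging the way nc did.
    sender.setblocking(False)
    chunk = b"x" * chunk_size
    total = 0
    try:
        while True:
            try:
                total += sender.send(chunk)  # may accept only part of the chunk
            except BlockingIOError:
                # Peer window is 0 and our send buffer is full: a blocking write() would stall here.
                return total
    finally:
        for s in (sender, receiver, listener):
            s.close()
```

fill_until_blocked() returns the number of bytes the local kernel accepted before giving up. That figure depends on the operating system's socket buffer sizes, so no particular value should be expected.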

Sooner or later, something would clear, and it would start going again for a second or two, but then it would hang up again. Here's what it looks like when it starts lobbing "win 0" at my Linux box:

01:56:19.750965 IP a.b.c.10.9100 > a.b.c.2.39052: Flags [.], ack 2130009, win 0, options [nop,nop,TS val 9702476 ecr 2378987518], length 0

Every time this happens, my sending machine backs off and waits for 5 seconds before trying again. Sometimes it starts up again, but other times, it just gets stuck firing these back and forth. Here's what it looks like when the Mac is involved:

02:03:25.054898 IP a.b.c.10.9100 > a.b.c.3.51962: Flags [.], ack 1127001, win 0, options [nop,nop,TS val 9745041 ecr 683216363], length 0
02:03:33.254996 IP a.b.c.3.51962 > a.b.c.10.9100: Flags [.], seq 1127001:1127002, ack 1, win 33304, options [nop,nop,TS val 683224499 ecr 9745041], length 1
02:03:33.255420 IP a.b.c.10.9100 > a.b.c.3.51962: Flags [.], ack 1127001, win 0, options [nop,nop,TS val 9745860 ecr 683224499], length 0
02:03:41.469104 IP a.b.c.3.51962 > a.b.c.10.9100: Flags [.], seq 1127001:1127002, ack 1, win 33304, options [nop,nop,TS val 683232654 ecr 9745860], length 1

On the Linux box, it backs off even more, roughly doubling the wait between probes:

02:20:49.138620 IP a.b.c.2.34813 > a.b.c.10.9100: Flags [.], ack 1, win 512, options [nop,nop,TS val 2380457152 ecr 9158], length 0
02:20:49.138960 IP a.b.c.10.9100 > a.b.c.2.34813: Flags [.], ack 178105, win 0, options [nop,nop,TS val 10485 ecr 2380430763], length 0
02:21:15.698617 IP a.b.c.2.34813 > a.b.c.10.9100: Flags [.], ack 1, win 512, options [nop,nop,TS val 2380483712 ecr 10485], length 0
02:21:15.698948 IP a.b.c.10.9100 > a.b.c.2.34813: Flags [.], ack 178105, win 0, options [nop,nop,TS val 13140 ecr 2380430763], length 0
02:22:08.818618 IP a.b.c.2.34813 > a.b.c.10.9100: Flags [.], ack 1, win 512, options [nop,nop,TS val 2380536832 ecr 13140], length 0
02:22:08.818969 IP a.b.c.10.9100 > a.b.c.2.34813: Flags [.], ack 178105, win 0, options [nop,nop,TS val 18449 ecr 2380430763], length 0
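The doubling shows up if you subtract the timestamps of the Linux box's probes. Here is a small sketch using only Python's datetime module; the function name probe_gaps is hypothetical, and the three lines are copied from the trace above. A doubling interval is consistent with the exponential backoff Linux applies to its zero-window probe timer, although the trace by itself doesn't prove that this is the mechanism.

```python
from datetime import datetime

# Probes sent by the Linux box (a.b.c.2) to the print server, copied from the trace above.
LINUX_PROBES = [
    "02:20:49.138620 IP a.b.c.2.34813 > a.b.c.10.9100: Flags [.], ack 1, win 512, options [nop,nop,TS val 2380457152 ecr 9158], length 0",
    "02:21:15.698617 IP a.b.c.2.34813 > a.b.c.10.9100: Flags [.], ack 1, win 512, options [nop,nop,TS val 2380483712 ecr 10485], length 0",
    "02:22:08.818618 IP a.b.c.2.34813 > a.b.c.10.9100: Flags [.], ack 1, win 512, options [nop,nop,TS val 2380536832 ecr 13140], length 0",
]

def probe_gaps(lines: list[str]) -> list[float]:
    """Seconds between consecutive tcpdump lines, using the leading HH:MM:SS.ffffff stamp."""
    stamps = [datetime.strptime(line.split()[0], "%H:%M:%S.%f") for line in lines]
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]

gaps = probe_gaps(LINUX_PROBES)
print([round(g, 2) for g in gaps])  # [26.56, 53.12]
print(round(gaps[1] / gaps[0], 2))  # 2.0
```

Running the same function over the Mac's two probe lines gives a steady gap of about 8.2 seconds instead.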

When that happens, it has officially crossed from merely ridiculous to completely hopeless. Deleting the job in the Mac OS printer job list doesn't actually do anything useful. Whatever process is doing the writing is still out there underneath all of this despite my request. If I now power-cycle the printer, when it wakes up, it will start receiving data mid-stream, and it will start spewing binary gunk as soon as it gets a ^L (form feed).

Naturally, both lpq and lprm claim there are no active jobs. Finally, I just reboot the stupid print server box so it will RST the connection and the Mac will stop trying to push data at it. Only now can I reset the printer without inviting the wrath of the binary garbage monster.

For some reason, this struck me as a problem related to things going too quickly. I had wondered about this ever since I first noticed the correlation between big jobs and badness years ago, but now it was time to take action. I wanted to see if something could be done about it.

At this point, the only thing to do was admit that I had to do something really evil, so I sat down and wrote a quick program to open a file, open a TCP socket, and throw data from one to the other 1 KB at a time while sleeping 100 ms between writes.
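The program itself isn't shown here, so what follows is a minimal Python sketch of the same idea: read a file 1 KB at a time, write each chunk to a TCP socket, and sleep 100 ms between writes. It assumes the print server takes raw jobs on port 9100, the port that appears in the traces above; the function name slow_send and its parameters are hypothetical.

```python
import socket
import time

CHUNK_BYTES = 1024    # 1 KB per write
PAUSE_SECONDS = 0.1   # 100 ms sleep between writes

def slow_send(path: str, host: str, port: int = 9100,
              chunk: int = CHUNK_BYTES, pause: float = PAUSE_SECONDS) -> int:
    """Copy a file to host:port in small, spaced-out writes; return the byte count sent."""
    sent = 0
    with open(path, "rb") as src, socket.create_connection((host, port)) as sock:
        while True:
            block = src.read(chunk)
            if not block:  # end of file
                break
            # sendall() still blocks whenever the print server advertises a zero window.
            sock.sendall(block)
            sent += len(block)
            # Throttle: caps throughput at roughly chunk / pause bytes per second.
            time.sleep(pause)
    return sent
```

For example, slow_send("screenshot.ps", "a.b.c.10") would push a PostScript file like the one printed from xv. The file name is hypothetical, and a.b.c.10 stands in for the redacted address in the traces. Sleeping after each write only sets a ceiling on throughput, since the time spent inside sendall() is added on top of the pause.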

This one behaves significantly differently. Oh, sure, it still gets wedged every now and then, but the difference here is that it usually manages to recover and starts going again! Yes, capping the send rate at a mere 80 kbps seems to be the key. Slowly but surely, it grinds through the file and eventually finishes, and I get a piece of paper as my prize.
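The 80 kbps figure follows from the 1 KB per 100 ms schedule, taking 1 KB as 1000 bytes:

$$\frac{1\,\text{KB}}{100\,\text{ms}} = 10\,\text{KB/s}, \qquad 10\,\text{KB/s} \times 8\,\frac{\text{bit}}{\text{byte}} = 80\,\text{kbit/s}$$

With 1 KB = 1024 bytes it comes to 81.92 kbit/s. Either way it is a ceiling, because the time spent in each write is added to the 100 ms sleep.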

How long does it take to run at this glacial pace, including the hangs?

Almost eight minutes. Yes, this stupid thing reduces my printer from 8 pages per minute (its official rating) to 8 minutes per page.

Fixing the print server is not an option. It's a scary little embedded box with an ancient MIPS Linux environment and a hard-coded root backdoor to boot (username: edimax, password: software01). Enjoy!

I guess I have to make a trip to a store this weekend and sort out yet another one of life's little technology failures. I'm sure it will be all kinds of fun finding something which actually has a Centronics... err, sorry, IEEE-1284... port on it. It'll probably wind up being a combination of a USB print server and some kind of adapter cable in between.

Oh, yeah, I'm sure that will end well.