
Thursday, March 5, 2020

Popular posts from leaky bug-tracking systems

My web server logs referrer data whenever it's available. That means I pick up a bunch of attempted referrer spam from some crazy Kyivstar GSM blocks, but it also means I can see when certain sites link to me. While the amount of referrer data has shrunk over the years as browsers curtail what they're willing to convey to foreign origins, it's not quite dead yet.

Watching this over the past year or so has shown me that a handful of my posts are apparently very popular in bug-tracking systems. I see quite a few inbound visitors from various corporate JIRA systems, and even a couple from some GitHub issue tracking pages.

I figured I'd list some of those here, just because they seem to be problems that a lot of people are still hitting, years after they were identified and documented. Maybe having all of these "greatest (bug) hits" in one place will help get the word out about them.

I'll leave out the details of who's having what problems for their sake.

March 17, 2013: Time handling is garbage

This is where I tear into the whole thing where mktime() and strptime() are a recipe for disaster. Their behavior is inconsistent across the different quasi-Unix systems I have access to.

At least 2 separate companies have this one.
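If you want a taste of the problem without reading the whole post, here's a minimal C sketch of one classic trap in this area (not necessarily the exact one from the post): strptime() only fills in the fields it actually parses, so an uninitialized struct tm plus a guessed tm_isdst can quietly shift the answer from mktime() by an hour, and different libcs disagree on the fine print.

```c
/* Sketch: parse a local timestamp without tripping over uninitialized
 * struct tm fields. strptime() needs _XOPEN_SOURCE on glibc. */
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <string.h>
#include <time.h>

int main(void)
{
    struct tm tm;
    memset(&tm, 0, sizeof(tm));         /* don't inherit stack garbage */

    if (strptime("2020-03-05 08:00:00", "%Y-%m-%d %H:%M:%S", &tm) == NULL) {
        fprintf(stderr, "parse failed\n");
        return 1;
    }

    /* A zeroed tm_isdst means "DST is definitely not in effect", which is
     * often wrong; -1 tells mktime() to figure it out itself. */
    tm.tm_isdst = -1;

    time_t t = mktime(&tm);             /* interprets tm as local time */
    printf("2020-03-05 08:00:00 local -> %ld seconds since the epoch\n",
           (long)t);
    return 0;
}
```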

July 15, 2019: Your nines are not my nines

So-called "cloud services" are found doing this thing where they measure the success of ALL queries to them, not just per-customer, or other slices that matter to actual humans. It turns out that your business is tiny and insignificant and your solid wall of failed requests doesn't even move the needle on their end. They don't notice and don't care.

I'm guessing the two companies linking to this one are actually providers of same, and are trying to do right by their customers. Or, I suppose they might be customers of some giant evil vendor and they're suffering. It's hard to be sure.

October 27, 2014: Reading /proc/pid/cmdline can hang forever

This one still seems to be going strong based on reading some of the pages I could actually load. People doing things with Linux "containers" (cough, snort) seem to get in trouble with this a lot.

At least 2 projects and at least 1 company have hit this one.
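For anyone landing here from one of those bug trackers, here's a rough C sketch of a common mitigation (my construction, not necessarily what the post recommends): do the read in a throwaway child and give up after a timeout, so a target stuck in D state can't wedge the caller too. The function name and timeout are made up for illustration.

```c
/* Sketch: read /proc/<pid>/cmdline in a disposable child so a stuck
 * target can't hang the caller. */
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

static ssize_t read_cmdline_with_timeout(pid_t pid, char *buf, size_t len,
                                         int timeout_ms)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t child = fork();
    if (child == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        /* Child: do the potentially-hanging read, ship the bytes back. */
        close(fds[0]);
        char path[64], tmp[4096];
        snprintf(path, sizeof(path), "/proc/%d/cmdline", (int)pid);
        FILE *f = fopen(path, "r");
        if (f != NULL) {
            size_t n = fread(tmp, 1, sizeof(tmp), f);
            if (write(fds[1], tmp, n) < 0) { /* best effort */ }
            fclose(f);
        }
        _exit(0);
    }

    /* Parent: wait for data, but only so long. A child wedged in D state
     * won't be reclaimed by this, but at least we get to move on.
     * (The caller is responsible for reaping children eventually.) */
    close(fds[1]);
    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    ssize_t got = -1;
    if (poll(&pfd, 1, timeout_ms) > 0)
        got = read(fds[0], buf, len);
    close(fds[0]);
    return got;
}

int main(void)
{
    char buf[4096];
    ssize_t n = read_cmdline_with_timeout(getpid(), buf, sizeof(buf), 1000);
    printf("read %zd bytes of our own cmdline\n", n);
    return 0;
}
```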

January 30, 2017: Don't setenv in multi-threaded code on glibc

The environment really, REALLY is not your friend. Get what you need out of it at the beginning of your run and never touch it again. Maybe grab a copy of "environ", parse that, and access it if you need the details. Definitely do this before you start any threads.

At least 6 separate companies/projects have hit this one.
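Here's a rough sketch of the "grab it once, before threads exist" pattern. The helper names are mine, not from the post, and the error handling is minimal.

```c
/* Sketch: snapshot the environment once, before any threads exist, and
 * only ever read the copy afterwards. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

static char **env_copy;                 /* private, never-mutated snapshot */

static void snapshot_environment(void)
{
    size_t n = 0;
    while (environ[n] != NULL)
        n++;

    env_copy = calloc(n + 1, sizeof(char *));
    for (size_t i = 0; i < n; i++)
        env_copy[i] = strdup(environ[i]);
}

/* Look up "key" in the snapshot instead of calling getenv() while some
 * other thread might be calling setenv() behind our back. */
static const char *env_lookup(const char *key)
{
    size_t klen = strlen(key);
    for (size_t i = 0; env_copy[i] != NULL; i++) {
        if (strncmp(env_copy[i], key, klen) == 0 && env_copy[i][klen] == '=')
            return env_copy[i] + klen + 1;
    }
    return NULL;
}

int main(void)
{
    snapshot_environment();             /* do this before pthread_create() */

    const char *home = env_lookup("HOME");
    printf("HOME=%s\n", home ? home : "(unset)");
    return 0;
}
```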

June 7, 2011: Don't mix threads and forks

This is a story about Python, but it applies most anywhere else, too. If you have a process that's multi-threaded and you call fork(), you're juggling chainsaws blindfolded. Don't be surprised if you lose an arm when something doesn't go JUST right. Spin off a child runner early. Have some kind of dispatcher thing. Or, better still, if you can, never run subprocesses, for they are the source of much evil.

At least 2 projects have hit this one, too.
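That post is about Python, but since most of these entries are C/Unix problems, here's a C-flavored sketch of the "spin off a child runner early" idea: fork a single-threaded helper before any threads exist and hand it commands over a pipe, so the threaded parent never has to fork again. The one-command-per-line protocol and the use of system() are illustrative assumptions, not anything from the original post.

```c
/* Sketch: a pre-forked, single-threaded "runner" that executes commands
 * on behalf of a threaded parent. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static FILE *runner;                    /* parent's pipe to the runner */

static void start_runner(void)
{
    int fds[2];
    if (pipe(fds) != 0) {
        perror("pipe");
        exit(1);
    }

    pid_t pid = fork();                 /* still single-threaded here */
    if (pid == -1) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {
        /* Runner child: read one command per line and run it. */
        close(fds[1]);
        FILE *in = fdopen(fds[0], "r");
        char line[4096];
        while (fgets(line, sizeof(line), in) != NULL) {
            line[strcspn(line, "\n")] = '\0';
            if (system(line) == -1)
                perror("system");
        }
        _exit(0);
    }

    close(fds[0]);
    runner = fdopen(fds[1], "w");
    setvbuf(runner, NULL, _IOLBF, 0);   /* flush each command line */
}

int main(void)
{
    start_runner();                     /* BEFORE any pthread_create() */

    /* Later, from any thread: queue work instead of calling fork(). */
    fprintf(runner, "echo hello from the runner\n");

    fclose(runner);                     /* runner sees EOF and exits */
    wait(NULL);
    return 0;
}
```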

February 21, 2016: A mystery with memory leaks and a magic number

I'm guessing someone is seeing "1213486160" in their logs and is puzzling over why. Then they find out it's the letters "HTTP" and it blows their mind. Or, similarly, they're finding 2008-06-14 23:29:20 UTC all over the place and have no idea why. Same thing.
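If you'd rather not take my word for it, this little program decodes the magic number both ways: as four ASCII bytes and as a Unix timestamp.

```c
/* Sketch: 1213486160 is just "HTTP" read as a big-endian 32-bit integer,
 * and the same value treated as a time_t is that 2008 date. */
#include <stdio.h>
#include <time.h>

int main(void)
{
    unsigned int magic = 1213486160;    /* 0x48545450 */

    /* Pull out the bytes, most significant first. */
    char text[5];
    for (int i = 0; i < 4; i++)
        text[i] = (char)((magic >> (8 * (3 - i))) & 0xff);
    text[4] = '\0';
    printf("%u = 0x%08x = \"%s\"\n", magic, magic, text);

    /* Same number as seconds since the epoch. */
    time_t t = (time_t)magic;
    char when[64];
    strftime(when, sizeof(when), "%Y-%m-%d %H:%M:%S UTC", gmtime(&t));
    printf("as a timestamp: %s\n", when);
    return 0;
}
```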

May 15, 2015: Filter all ICMP and watch the world burn

The decades change, but the clowniness remains the same: if you filter all ICMP "for security", and knock out the "fragmentation needed but DF set" control flow used for path MTU discovery, your users will eventually have a really bad day. It might take a pathological case like running a VPN over a cellular wifi gateway on a bus, but it'll catch up to you sooner or later.
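For the curious, here's a small Linux-only sketch of the machinery involved: the kernel learns the path MTU from those "fragmentation needed" ICMP messages, and a connected socket will report its current estimate via IP_MTU. The address and port below are placeholders, and I'm assuming Linux's socket options here.

```c
/* Sketch: opt a socket into path MTU discovery and read back the
 * kernel's current estimate for the destination. */
#include <arpa/inet.h>
#include <netinet/in.h>     /* IP_MTU_DISCOVER, IP_PMTUDISC_DO, IP_MTU */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    /* Set "don't fragment" so the path MTU actually gets probed. */
    int pmtud = IP_PMTUDISC_DO;
    setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &pmtud, sizeof(pmtud));

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(53);                   /* placeholder target */
    inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);
    connect(fd, (struct sockaddr *)&dst, sizeof(dst));

    /* If the "fragmentation needed" ICMP replies get filtered, the kernel
     * never hears that a smaller MTU is required, this estimate stays
     * optimistic, and big packets silently vanish. */
    int mtu = 0;
    socklen_t len = sizeof(mtu);
    if (getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) == 0)
        printf("current path MTU estimate: %d\n", mtu);

    close(fd);
    return 0;
}
```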

March 2, 2014: Segfaulting atop and another trip down the rabbit hole

atop writes records with no record delimiters. If you interrupt it mid-write, it will get very, very confused when it reads that file back, and you'll probably be unable to read any data from that point on until it rolls over at midnight. Of course, you usually go back to look at that time BECAUSE the thing restarted... because the box died... and so it's broken right when you need it most.

March 15, 2018: Bus errors, core dumps, and binaries on NFS

If you run a program on a Linux box from NFS and somehow rewrite the binary on that filesystem while it's running (say, by doing it on another machine), then expect it to SIGBUS on you. After all, you moved its cheese. (Stop using NFS for production. Really. And don't run binaries from it unless you're prepared to never touch them again.)
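You can reproduce the mechanism locally without NFS at all: map a file, shrink it out from under the mapping, and touch a page that no longer has backing. This sketch (my construction, not from the post) dies with SIGBUS on Linux for the same underlying reason an executable does when its binary gets rewritten on the server.

```c
/* Sketch: a local stand-in for the NFS failure mode. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) { perror("mkstemp"); return 1; }
    unlink(path);                       /* file lives on via the fd */

    /* Give it one page of real content and map it. */
    long page = sysconf(_SC_PAGESIZE);
    if (ftruncate(fd, page) != 0) { perror("ftruncate"); return 1; }
    char *map = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    printf("first byte before shrink: %d\n", map[0]);   /* works fine */

    /* Now "rewrite" the file out from under the mapping. */
    if (ftruncate(fd, 0) != 0) { perror("ftruncate"); return 1; }

    printf("touching the mapping again...\n");
    fflush(stdout);
    printf("first byte after shrink: %d\n", map[0]);    /* SIGBUS here */
    return 0;
}
```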

August 19, 2014: fork() can fail: this is important

The -1 you get back from fork is not your friend when you think it's a pid and hand it to kill() later on. It's really not your friend when you pass in SIGKILL and you're running as uid 0. Bye bye machine. This one screwed up all kinds of cat picture action when it happened, and they actually let me admit that a few years later. Crazy, right?
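The defensive pattern is tiny, which is probably why it gets skipped: check fork()'s return value before you ever treat it as a pid. A minimal sketch:

```c
/* Sketch: the failure mode in miniature. kill(-1, SIGKILL) means
 * "signal everything I'm allowed to", which as root is basically the
 * whole machine, so never let a failed fork() masquerade as a pid. */
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t child = fork();

    if (child == -1) {
        /* DO NOT fall through to kill(child, ...) from here. */
        perror("fork");
        return 1;
    }

    if (child == 0) {
        /* Child: stand-in for real work. */
        pause();
        _exit(0);
    }

    /* Parent: only after the -1 check is it safe to use this as a pid. */
    kill(child, SIGTERM);
    waitpid(child, NULL, 0);
    return 0;
}
```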

...

Those are the ones I've managed to notice so far.