Software, technology, sysadmin war stories, and more.
Thursday, August 1, 2019

Possibly timely items from my reliability list

Last month, I put forth some items from my "reliability list". It's just a big list of things that bounce around in my head and tend to emerge when I'm troubleshooting something that broke, or while assessing a design for something new.

This time, I have some more. You may notice a pattern in the topics in this post. It's not a coincidence.

Item: Some days have 86400 seconds, while others have 86401, and some day we might have one with only 86399.

Leap seconds are a thing, and so days can change lengths in systems that many humans use for civil timekeeping. The planet's rotation has been slowing down, which has given us positive leap seconds for a few decades, but who's to say it won't go the other way some day?

If this actually affects you, I hope you're getting paid well to deal with it. This stuff is a real pain.
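
To make the mismatch concrete, here's a minimal sketch in Python. It leans on the fact that a positive leap second was inserted at the end of December 31, 2016 (UTC), and shows that the standard `datetime` module, much like POSIX time, carries on as though that day had 86400 seconds anyway. The helper name `can_represent_leap_second` is made up for this illustration.

```python
from datetime import datetime, timezone

# A positive leap second was inserted at the end of 2016-12-31 (UTC).
day_start = datetime(2016, 12, 31, 0, 0, 0, tzinfo=timezone.utc)
day_end = datetime(2017, 1, 1, 0, 0, 0, tzinfo=timezone.utc)

# datetime arithmetic counts 86400 seconds for that day, even though
# 86401 SI seconds actually went by on the wall.
elapsed_seconds = (day_end - day_start).total_seconds()


def can_represent_leap_second() -> bool:
    """Hypothetical helper: can datetime hold 23:59:60 at all?"""
    try:
        datetime(2016, 12, 31, 23, 59, 60, tzinfo=timezone.utc)
    except ValueError:
        # datetime only accepts seconds in the range 0..59.
        return False
    return True


print(elapsed_seconds)
print(can_represent_leap_second())
```

If your code ever receives a timestamp ending in :60 from some other system, this is the sort of place where it falls over.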

[Figure: leap second with and without smear]

Item: You get at most six months' warning for days of nonstandard lengths.

Yep, a leap second at the end of June is announced early in January. A December event is announced early in July. In other words, even the inconsistencies... are inconsistent. You don't get any more warning than that.

Item: Some days have 1 AM twice, while some days skip over 2 AM entirely.

If your systems schedule things using time zone definitions which include daylight saving time, you're going to have fun twice a year. One of your days will go from 1:59:59 to 1:00:00, thus repeating the hour, and another day will skip from 1:59:59 to 3:00:00, thus skipping the hour.

If you have scheduled jobs in the 1 AM block, one day a year, they will run twice on a scheduler which does not account for this. Likewise, if those jobs are in the 2 AM block, one day a year, they will not run at all.
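
Here's a rough sketch of how a scheduler might ask "how many times does this wall-clock time actually happen?" before committing to a run. It assumes Python 3.9+ with the standard `zoneinfo` module and a tz database installed on the box. The function name `wall_clock_occurrences` is my own invention, and the dates are the 2019 US transitions.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def wall_clock_occurrences(naive: datetime, tz: ZoneInfo) -> int:
    """Return how often a naive local time occurs in tz: 0, 1 or 2."""
    instants = set()
    for fold in (0, 1):
        # fold picks the first or second reading of an ambiguous time.
        local = naive.replace(tzinfo=tz, fold=fold)
        as_utc = local.astimezone(timezone.utc)
        # A skipped time does not survive the round trip back to local.
        if as_utc.astimezone(tz).replace(tzinfo=None) == naive:
            instants.add(as_utc)
    return len(instants)


# Fall back: 1:30 AM happens twice on 2019-11-03.
print(wall_clock_occurrences(datetime(2019, 11, 3, 1, 30), PACIFIC))
# Spring forward: 2:30 AM never happens on 2019-03-10.
print(wall_clock_occurrences(datetime(2019, 3, 10, 2, 30), PACIFIC))
# An ordinary day: exactly once.
print(wall_clock_occurrences(datetime(2019, 8, 1, 1, 30), PACIFIC))
```

What to do with a 0 or a 2 is a policy decision (run once anyway? skip? run at the first occurrence only?), and that decision is the part people forget to make.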

This seems obvious, but everybody gets this wrong eventually. If your company hasn't been bitten by this yet, wait. It'll find you.

Item: Not everyone goes on summer time, and those who do don't all shift at the same points in the year.

Some places laugh at DST and stay put year-round. Others shift. Still others shift, but do it earlier or later than the last group. This means that the offset between far-flung offices in your company might not be constant! Most of the year, it might be 8 hours, but for a week or two at either end, it might be 7 or 9. Think about the implications when scheduling meetings, or setting up on-call schedules which need to not have gaps in them.
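
A small sketch of that drifting gap, assuming Python 3.9+ with `zoneinfo` and an installed tz database. The two offices (Los Angeles and London) and the helper `hours_ahead` are just examples I picked. In 2019 the US shifted on March 10 and November 3, while the UK shifted on March 31 and October 27.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

LOS_ANGELES = ZoneInfo("America/Los_Angeles")
LONDON = ZoneInfo("Europe/London")


def hours_ahead(when_utc: datetime, tz_a: ZoneInfo, tz_b: ZoneInfo) -> float:
    """Hours by which tz_b's wall clock leads tz_a's at a given instant."""
    offset_a = when_utc.astimezone(tz_a).utcoffset()
    offset_b = when_utc.astimezone(tz_b).utcoffset()
    return (offset_b - offset_a) / timedelta(hours=1)


for month, day in [(1, 15), (3, 20), (7, 15), (10, 30)]:
    instant = datetime(2019, month, day, 12, 0, tzinfo=timezone.utc)
    print(instant.date(), hours_ahead(instant, LOS_ANGELES, LONDON))
```

Anything that stores "the other office is +8" as a constant is going to be wrong for a few weeks a year.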

Working on this kind of stuff will make you appreciate the meaning of "picking the wrong week to quit sniffing glue".

Item: Summer time rules have changed multiple times over the years, and are sure to change again.

The US changed its rules as recently as 2007. Remember that?

Just wait -- it'll happen again, and a whole new generation of consumer electronics will start behaving weirdly for a few weeks every six months.

Item: US/Pacific is more insidious than anyone realizes due to Silicon Valley.

You might be surprised at how many Linux boxes are running US/Pacific instead of UTC because "it's too late to fix it now". I'm talking multiple millions here across many different companies.

Why? Because of all of the companies which started on the west coast of the US and used their local time on the servers.

I mean, sure, deep down inside, the kernel is doing its thing with the same time base, but odds are, all of those userspaces think they're in PDT right now.

Item: Not all time zones are at whole-hour offsets from UTC.

Half an hour? Fifteen or forty-five minutes? Sure, why not. Those are the EASY ones. Dig around in the historical data for the really fun stuff.
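
If you want to see a couple of the easy ones for yourself, here's a quick sketch. It assumes Python 3.9+ with `zoneinfo` and a tz database on the machine. The helper `utc_offset_minutes` and the choice of zones are mine.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def utc_offset_minutes(zone_name: str, when_utc: datetime) -> int:
    """Offset from UTC, in minutes, for a named zone at a given instant."""
    offset = when_utc.astimezone(ZoneInfo(zone_name)).utcoffset()
    return int(offset.total_seconds() // 60)


instant = datetime(2019, 8, 1, 0, 0, tzinfo=timezone.utc)
for name in ("Asia/Kolkata", "Asia/Kathmandu"):
    minutes = utc_offset_minutes(name, instant)
    # divmod splits the offset into whole hours and leftover minutes.
    print(name, divmod(minutes, 60))
```

Code that stores an offset as a count of hours in an integer column can't represent either of these.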

Item: At least one month changes sizes periodically.

That's February, in case you hadn't noticed yet.
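
For the record, a tiny sketch using Python's standard `calendar` module, which follows the Gregorian rules. The wrapper name `days_in_february` is made up here. Note that "every four years" isn't the whole story: century years only count if they divide by 400.

```python
import calendar


def days_in_february(year: int) -> int:
    """Length of February in the Gregorian calendar for a given year."""
    # monthrange returns (weekday of the 1st, number of days in the month).
    return calendar.monthrange(year, 2)[1]


for year in (2019, 2020, 1900, 2000):
    print(year, days_in_february(year))
```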

Item: Some programs written from January to September will break the first day of October.

Some people's tests really don't like two-digit numbers, what can I say? You think this is an exaggeration, but it's not. This actually happened to me, when someone who should have known better left such a timebomb behind in our continuous build test suite.

Item: Some programs written in October, November or December will break the first day of January.

Same idea, only inverted. Expect it to be two digits wide, and then suddenly it isn't? Welcome to the land of misfit software.
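
The last two items can be sketched with one hypothetical timebomb. This isn't the actual code that bit me, just the general shape: something formats a date without zero padding, and something else picks the month out by character position. All of the function names here are invented for the illustration.

```python
from datetime import date


def unpadded(d: date) -> str:
    """Format a date without zero padding, e.g. 2019-9-30."""
    return f"{d.year}-{d.month}-{d.day}"


def month_assuming_one_digit(stamp: str) -> int:
    # Written in, say, July: "the month is the character after the dash".
    return int(stamp[5:6])


def month_assuming_two_digits(stamp: str) -> int:
    # Written in, say, November: "the month is the next two characters".
    return int(stamp[5:7])


def month_by_splitting(stamp: str) -> int:
    # Doesn't care how wide the field is.
    return int(stamp.split("-")[1])


print(month_assuming_one_digit(unpadded(date(2019, 9, 30))))   # fine
print(month_assuming_one_digit(unpadded(date(2019, 10, 1))))   # wrong month
print(month_assuming_two_digits(unpadded(date(2019, 12, 31)))) # fine
try:
    month_assuming_two_digits(unpadded(date(2020, 1, 1)))
except ValueError as exc:
    print("January blew up:", exc)
print(month_by_splitting(unpadded(date(2019, 10, 1))))
```

Both fragile versions pass every test you run in the month you wrote them, which is exactly the problem.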

Item: Every leap year, some programs break on December 31st.

This is because they didn't account for the year having 366 days. Any Zune owners still out there after all these years?
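
Here's a sketch of that class of bug, modeled on the widely circulated description of the Zune failure: a loop that turns "days since January 1, 1980" into a year, and never terminates on day 366 of a leap year. Treat it as an illustration in Python, not the original code. The function names and the `max_steps` guard (so the demo doesn't hang) are my additions.

```python
import calendar
from datetime import date


def year_from_days_buggy(days: int, year: int = 1980, max_steps: int = 1000):
    """days is 1-based, counted from January 1 of `year`."""
    steps = 0
    while days > 365:
        if calendar.isleap(year):
            if days > 366:
                days -= 366
                year += 1
            # days == 366 in a leap year: nothing changes, so we spin.
        else:
            days -= 365
            year += 1
        steps += 1
        if steps > max_steps:
            raise RuntimeError("stuck on day 366 of a leap year")
    return year, days


def year_from_days_fixed(days: int, year: int = 1980):
    """Same conversion, comparing against the real length of each year."""
    while True:
        length = 366 if calendar.isleap(year) else 365
        if days <= length:
            return year, days
        days -= length
        year += 1


# December 31, 2008 as a 1-based day count from January 1, 1980.
new_years_eve = (date(2008, 12, 31) - date(1980, 1, 1)).days + 1
print(year_from_days_fixed(new_years_eve))
try:
    year_from_days_buggy(new_years_eve)
except RuntimeError as exc:
    print("buggy version:", exc)
```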

Item: Some programs broke September 8-9, 2001 (depending on your time zone), and not because they were trying to do some kind of Nostradamus thing for the coming week.

This bit someone in my family. KDE's "Kmail" broke because Unix time went from 999999999 to 1000000000. Yes, the text representation got wider and so it couldn't parse its own data files, or something like that.
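
I don't know exactly what KMail tripped over, so take this as a sketch of the general hazard rather than a diagnosis: once the number gets wider, anything that treats timestamps as text stops agreeing with anything that treats them as numbers. The fixed -7 hour offset standing in for PDT is an assumption made to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone

last_nine_digit = 999_999_999
first_ten_digit = 1_000_000_000

# Numerically the order is obvious. As text, "9..." sorts after "1...".
numeric_order = sorted([first_ten_digit, last_nine_digit])
text_order = sorted([str(first_ten_digit), str(last_nine_digit)])
print(numeric_order)
print(text_order)

# Where the rollover landed, in UTC and in a fixed UTC-7 zone (PDT).
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
rollover_utc = epoch + timedelta(seconds=first_ten_digit)
rollover_pdt = rollover_utc.astimezone(timezone(timedelta(hours=-7)))
print(rollover_utc.isoformat())
print(rollover_pdt.isoformat())
```

That last pair of lines is the "September 8-9, depending on your time zone" part.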

I can't really prove it now, but I have a feeling that at least a couple of systems out there went mildly nutty in January 2004 when Unix time passed 2^30. Of course, the real fun for Unix time is still to come when we get to 2^31. See you in 2038.
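
If you'd like to check the dates for those thresholds yourself, here's a small sketch. It uses `ctypes.c_int32` to stand in for a signed 32-bit `time_t`, which is an assumption about the kind of system that will have a bad day. The helper name `utc_from_unix` is made up.

```python
import ctypes
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def utc_from_unix(seconds: int) -> datetime:
    """Convert Unix seconds to UTC without relying on platform time_t."""
    return EPOCH + timedelta(seconds=seconds)


print(utc_from_unix(2**30).isoformat())
print(utc_from_unix(2**31).isoformat())

# What a signed 32-bit counter holds one tick past its maximum.
wrapped = ctypes.c_int32(2**31).value
print(wrapped, utc_from_unix(wrapped).isoformat())
```

The wrapped value lands in December 1901, which is a fun thing to find in a log file.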