Software, technology, sysadmin war stories, and more.
Sunday, May 13, 2018

Responses to strftime and %G

I'm back. While I was gone, my post about strftime and %G made it on to Hacker News. The comments weren't too bad this time. Still, there are some points which deserve responses, which I will handle in a batch right here. Some are summaries, not exact quotes.

Programmers should pay attention to these things.

Yes, they should, but they don't. At least, not all of them pay attention to all of the things all of the time. That's where we get our problem: it has to be 100% knowledge expressed by 100% of the people 100% of the time. Introduce anything less than perfection and you will eventually start seeing problems like this. Fall short of 100% in more than one aspect and the shortfalls multiply into even lower reliability.

100% knowledge means that you know every corner case in every aspect of strftime() and friends. You know every possible way it can go wrong and know how to handle it in advance. I believe this may actually be possible in some controlled environments, but the amount of energy required to actually reach 100% in the normal unbounded environment most of us operate in probably approaches infinity. Junior people are nowhere near this.

100% of the people means that every person who touches a bit of code knows what's up for every dimension that matters. This means the programmer and any code reviewers. It includes the senior folks who are supposed to be wandering around looking for random stuff to improve and pitfalls to protect against.

100% of the time is basically the duty cycle. It means you have no down time. You're never sleepy, sick, or grumpy. You never have a bit of gunk floating in your eye that makes one letter look like another. You are never distracted, never lose your place, and never write the same code twice. Much as some of us might like to think of ourselves as unfeeling automatons who are perfectly executing some code, it just doesn't hold up in practice. People fail at things due to squishy analog reasons. People who can do something today might not be able to do it tomorrow, but will be able to do it again afterward.

Think about yourself. What kind of reliability figures have you arrived at? Let's say there's 99% knowledge, 99% of the people have it, and they're "all there" 99% of the time. .99 * .99 * .99 = .970299. That's 97 percent. "One nine", and partway to a second. Not great.

This is what we have to build for: ourselves.

Now, if you are THE ONE, this doesn't apply to you. But it does apply to the rest of us.

This kind of solution shouldn't hide in the depths of one company.

Absolutely. I shouldn't have to fix the same pitfalls at every company I work for, consult for, or engage as a contractor... but it keeps coming up. On the one hand, it means there is an unlimited pool of things to keep fixing as new companies "come up" and need to find their footing. On the other, wouldn't it be nice for this stuff to be hard to screw up by default?

Real Engineers have best practices. They know things about metallurgy and chemicals and failure modes and all kinds of interesting stuff like that. There are whole lists of things you shouldn't do when you build something or make something.

If we want to ever get away from the "first woodpecker" problem, then the entire industry is going to have to embrace a body of knowledge learned the hard way.

My writing is just my attempt at getting some of this stuff out of my head and back out into the ether for everyone else, as other folks have done for me.

A stray "%G" is no big deal.

At least one major web service was apparently clobbered by this a couple of years back. I also think that iOS stuff (iPhones and the like) used to have problems with scheduled events during the first few days of a year, and this happened for several years until they finally sorted it out.

The whole ISO year model has its uses, but it can slip unnoticed into places where you want regular year numbers (for some values of regular: not everyone is on the same calendar!), and then that's when the fun starts.
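To make the slip concrete, here's a minimal sketch of the two conversions disagreeing. The helper name format_date is made up for this example, and it leans on mktime() only to fill in the day-of-week and day-of-year fields that %G needs.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical helper: format a calendar date with the given strftime format.
std::string format_date(int year, int month, int day, const char* fmt) {
    std::tm tm{};
    tm.tm_year = year - 1900;
    tm.tm_mon = month - 1;
    tm.tm_mday = day;
    tm.tm_hour = 12;    // noon, to stay clear of DST edges
    tm.tm_isdst = -1;   // let the library decide
    std::mktime(&tm);   // fills in tm_wday and tm_yday, which %G depends on
    char buf[64];
    const std::size_t n = std::strftime(buf, sizeof buf, fmt, &tm);
    return std::string(buf, n);
}

// format_date(2018, 5, 13, "%G-%m-%d")  -> "2018-05-13"  (looks fine)
// format_date(2018, 12, 31, "%Y-%m-%d") -> "2018-12-31"
// format_date(2018, 12, 31, "%G-%m-%d") -> "2019-12-31"  (a year off)
// format_date(2016, 1, 1, "%G-%m-%d")   -> "2015-01-01"  (off the other way)
```

Monday, December 31, 2018 belongs to ISO week 1 of 2019, and Friday, January 1, 2016 belongs to ISO week 53 of 2015, which is why only a handful of days per year show the problem.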

A random thought: this sounds like another case of bare values being trouble. Perhaps some kind of typing would make it harder to screw this up, but I have no idea how you'd put that into practice. I'll leave it to the language nerds to work that one out.
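For what it's worth, here's one rough guess at what that typing might look like. Every name in it (CalendarYear, IsoWeekYear, date_stamp) is invented for illustration, and it's a sketch of the idea, not a worked-out design.

```cpp
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <string>

// Two distinct wrapper types, so a bare int can't silently stand in for either.
struct CalendarYear { int value; };
struct IsoWeekYear { int value; };

CalendarYear calendar_year(const std::tm& tm) {
    return CalendarYear{tm.tm_year + 1900};
}

IsoWeekYear iso_week_year(const std::tm& tm) {
    char buf[16] = "";
    std::strftime(buf, sizeof buf, "%G", &tm);  // needs tm_wday and tm_yday set
    return IsoWeekYear{std::atoi(buf)};
}

// Only accepts a calendar year. Passing an IsoWeekYear is a compile error.
std::string date_stamp(CalendarYear year, int month, int day) {
    char buf[48];
    std::snprintf(buf, sizeof buf, "%04d-%02d-%02d", year.value, month, day);
    return buf;
}
```

With this shape, date_stamp(iso_week_year(tm), 12, 31) won't build, so the mix-up has to be deliberate instead of accidental.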

They should RTFM.

This is pretty closely related to the first item, but it has enough nuance to warrant its own reply. It also has to do with multiplying percentages together and hoping you arrive at 100% or as close to it as you can get.

The manuals have to exist. Some systems don't install them. Other entry points to various dumb things in our code bases don't even have manuals because the code was written and then the docs were forgotten.

The manuals have to be complete. They have to cover all of the troublesome points in the API or there will be "unknown unknowns" hiding all over the place.

The manuals have to be good. I don't think everyone is necessarily equally suited to technical writing that actually connects with its audience. It's possible to be completely technically correct and yet be completely useless since nobody can make any sense of what you've said.

The reader has to understand it. Think of all of the reasons that someone could mis-parse a manual. Not everyone speaks the same languages, and even then, there are degrees of understanding for each one. Not even "native speaker" gives you a pass here. Just look at the errors I make in writing, and I've been using this language for decades now.

The reader has to know that manuals are even an option. For every person that goes "well yeah, I knew about that", someone else reading the same post probably thinks "this 'man' command is magic! TIL!"...

The reader has to have time to read it and appreciate it completely. If you're at some place where the order of the day is rush-rush-rush, it's not that surprising that you might just slap together the first thing that runs, ship it to prod, and move on to the next thing. If it breaks, you'll deal with it then.

A whole lot of the valley works this way: if it works, ship it. If we stay in business long enough to have problems to worry about, go back and maybe fix them. But only if we have to.

It's well-documented all over the place.

This reduces to the above points: I know that, you know that, but whoever wrote it doesn't know that. It's too late to go back and smack them over the head with the manual once the damage is done and the bad code is checked in. The question is: whatever will we do to stop the next new person from making the same honest mistake?

Put it another way: the me of 2018 knows this. The me of 1998 probably did not. Both of those people wrote code that's still around running in systems all over the place. What's the difference? 20 more years of experience. I did not show up knowing all of this stuff. Quite the contrary.

What do we do about the folks who haven't done their 20 years yet?

Timezone abbreviations ("PDT") are awful.

Totally. An excellent point. Use offsets from UTC, maybe? -0800 or something like that. Or just use "Z" if you can get away with it?
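A minimal sketch of both options, assuming a platform where strftime supports %z. The function names are made up for this example.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// UTC with a literal "Z" suffix: no zone abbreviation to misread.
std::string format_utc(std::time_t t) {
    const std::tm tm = *std::gmtime(&t);  // std::gmtime is not thread-safe
    char buf[32];
    const std::size_t n =
        std::strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ", &tm);
    return std::string(buf, n);
}

// Local time with a numeric offset such as -0700 instead of "PDT".
std::string format_local_with_offset(std::time_t t) {
    const std::tm tm = *std::localtime(&t);  // also not thread-safe
    char buf[40];
    const std::size_t n =
        std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S %z", &tm);
    return std::string(buf, n);
}

// format_utc(1526244242) -> "2018-05-13T20:44:02Z"
```

The local variant's output depends on the machine's time zone settings, so no fixed output is shown for it.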

Killing format strings makes l10n much harder.

Does it? Can't you just say {%year_month_day} or some other call-out in your templating stuff and then just have the translation system do the right thing? Why does it have to be a format string?

My wish is to make your desires obvious. %year_month_day would suggest normal human-readable text, whatever that may be for the reader's settings. %ISO_year_week would be something else entirely. You can rig them to look significantly different so that people would think twice before going for the weird versions of things.

Compare that to a %G vs. %Y situation, where a cursory glance at the man page says something about years, and it looks good enough most of the time.

Easy stuff should be easy, but hard stuff should be possible. We've got the second part covered, but the first part isn't there yet.
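Here's a rough sketch of what those named call-outs could look like in a template expander. The function expand_callouts and its table are hypothetical, and a real l10n system would pick the rendering from the reader's settings instead of the fixed formats hard-coded here.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Replace named call-outs in `text` with dates rendered from `tm`.
// Callers never type a bare %G or %Y; they have to spell out what they want.
std::string expand_callouts(std::string text, const std::tm& tm) {
    struct Callout { const char* name; const char* fmt; };
    static const Callout kCallouts[] = {
        {"{%year_month_day}", "%Y-%m-%d"},  // ordinary calendar date
        {"{%ISO_year_week}", "%G-W%V"},     // deliberately looks different
    };
    for (const Callout& c : kCallouts) {
        char buf[64];
        const std::size_t n = std::strftime(buf, sizeof buf, c.fmt, &tm);
        const std::string rendered(buf, n);
        const std::string needle(c.name);
        std::size_t pos = text.find(needle);
        while (pos != std::string::npos) {
            text.replace(pos, needle.size(), rendered);
            pos = text.find(needle, pos + rendered.size());
        }
    }
    return text;
}

// For a tm holding Monday, December 31, 2018:
// expand_callouts("Due {%year_month_day}", tm) -> "Due 2018-12-31"
// expand_callouts("Week {%ISO_year_week}", tm) -> "Week 2019-W01"
```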

People don't pay attention to what they're reading?

If they're reading it.

You're making this up. This doesn't happen.

I assure you that this has happened all over the place already. Stuff has broken. I have also found cases that would have broken had they not been noticed and patched before the end of the year rolled around.

My stories are based on the things that real people have done, and in some cases, continue to do. You can pretend it doesn't happen, but I can't afford to do that.

They should have written tests.

What is a test, really? Let's say your code takes in a time_t and returns a human-formatted string. How do you test that? You probably do something like "I call it with number X, and it returns string Y".

EXPECT_THAT(f(1526244242), StrEq("Sun May 13 13:44:02 PDT 2018"));

What if none of the values of X include the funky days where %Y and %G differ? The tests will all pass. You might even have 100% code and branch coverage... but the bug will still be there.
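A minimal sketch of that situation, using a made-up buggy_date function as the code under test. It formats in UTC to keep the example independent of the machine's time zone.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Deliberately buggy: %G (ISO week-numbering year) where %Y was intended.
std::string buggy_date(std::time_t t) {
    const std::tm tm = *std::gmtime(&t);
    char buf[32];
    const std::size_t n = std::strftime(buf, sizeof buf, "%G-%m-%d", &tm);
    return std::string(buf, n);
}

// The obvious mid-year check passes, so the bug ships:
//   buggy_date(1526244242) -> "2018-05-13"
// Only a value from the funky days exposes it:
//   buggy_date(1546214400) -> "2019-12-31"   (really 2018-12-31 UTC)
```

Every line and branch of buggy_date is exercised by the first call alone, which is exactly how you end up with full coverage and a live bug.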

You'd have to get exhaustive in terms of permutations. Do we teach people how to think that way when writing tests? There's a certain snarky, negative, pessimistic mindset that you have to get into to really do it justice. You have to think of every possible thing that could go sideways ahead of time. The test has to be better than the code it's testing.

Considering the same programmer probably writes both the buggy code and the tests, what are the odds of it being totally correct on both ends?

Do you force them to write out every single permutation for the next 100 years, just to be safe? I know, that's reducing to absurdity, but really, what else can you do to be sure? How many EXPECT_THAT lines do they get to write?

How would you solve for that? Have something that just fuzzes the input and starts cramming in all kinds of stuff? Let's say you do that. Now how do you know what the output should be? You can't use the same code to get the "expected" output, since that would be a tautology. It would always match.

No, you'd have to use some other system to do the time_t -> asc thing for you. Maybe you shell out to /bin/date or something.

This could actually work, assuming /bin/date itself (and everything it relies on) did things properly in the first place. It would eventually find a number in the wonky part of the year where the year number and ISO year were different, and then the test would fail.
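Here's a sketch of that kind of sweep. Instead of shelling out to /bin/date, it swaps in a different independent oracle: building the string straight from the tm fields with snprintf. All of the names are invented for the example, and it assumes time_t counts seconds, as it does on POSIX systems.

```cpp
#include <cstddef>
#include <cstdio>
#include <ctime>
#include <string>

// Independent oracle: no strftime format letters involved at all.
std::string oracle_date(std::time_t t) {
    const std::tm tm = *std::gmtime(&t);
    char buf[48];
    std::snprintf(buf, sizeof buf, "%04d-%02d-%02d",
                  tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday);
    return buf;
}

std::string format_with(const char* fmt, std::time_t t) {
    const std::tm tm = *std::gmtime(&t);
    char buf[48];
    const std::size_t n = std::strftime(buf, sizeof buf, fmt, &tm);
    return std::string(buf, n);
}

std::string format_with_Y(std::time_t t) { return format_with("%Y-%m-%d", t); }
std::string format_with_G(std::time_t t) { return format_with("%G-%m-%d", t); }

// Walk one day at a time and return the first time_t where the formatter
// under test disagrees with the oracle, or -1 if they always agree.
std::time_t first_mismatch(std::string (*formatter)(std::time_t),
                           std::time_t start, int days) {
    const std::time_t kDay = 24 * 60 * 60;
    for (int i = 0; i < days; ++i) {
        const std::time_t t = start + i * kDay;
        if (formatter(t) != oracle_date(t)) return t;
    }
    return -1;
}

// Starting at 2018-01-01 00:00:00 UTC (1514764800) and sweeping 730 days:
// first_mismatch(format_with_Y, 1514764800, 730) -> -1
// first_mismatch(format_with_G, 1514764800, 730) -> 1546214400 (2018-12-31)
```

The same caveat applies as with /bin/date: the sweep is only as trustworthy as the oracle it compares against.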

%m and %d mean month and day, so %G for year should raise suspicions.

Indeed! But... it still happens.

Sometimes it's just copypasta. One person makes the mistake, and 10 others grep the code base, find it, and copy it. Now you have 11 mistakes in your tree.

Other times, it's not as simple as Y and M and D.

This happens in other languages. Did you know that PHP has a date() function that takes format strings without percent signs? It does, and it also gives you the option of ISO years, except in this case it's "o".

Interestingly enough, the PHP manual page online at the time of this writing actually puts the year options in the following order:

This is intermixed with characters both before and after those in the alphabet. It's unclear exactly what sorting (if any) they have applied to arrive at the order in that man page. I suspect it's a matter of trying to "bunch related things together", and then alpha-sorting within those groups.

Still, the one you probably don't want is first in the list, just like with strftime.

I've seen a lot of PHP code, and I've seen this error in it.

Also, this can be hard to grep for. Imagine this:

strftime(x, y, "%G-%m-%d", ...)

Okay, that's easy. But what about this?

strftime(x, y, fmt, ...)

Oops. Hope you weren't grepping for literally "strftime.*%G". So okay, you look for "%G-%m-%d" and so you find it this way:

string fmt = "%G-%m-%d";

Cool, cool. But, you're not going to catch this one:

string fmt = "%G";
if (meh) {
  fmt.append("-%m-%d");
}

People do this sometimes!

The point is: short of (some really good) static analysis or just blocking calls to the potential trouble spots, you're not going to catch all of them with grep alone.
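As one possible take on the "blocking calls" option, here's a sketch of a wrapper that inspects the final format string at run time, after all of the appending is done. The names uses_iso_week_year and checked_strftime are made up, and the list of skipped modifiers is an assumption based on common flags, not a complete parser for every libc.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Flags, widths, and E/O modifiers that may sit between '%' and the letter.
static bool is_modifier(char c) {
    return c == '_' || c == '-' || c == '^' || c == '#' || c == '+' ||
           c == 'E' || c == 'O' || (c >= '0' && c <= '9');
}

// True if the format contains an ISO week-year conversion (%G or %g).
bool uses_iso_week_year(const std::string& fmt) {
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] != '%') continue;
        std::size_t j = i + 1;
        while (j < fmt.size() && is_modifier(fmt[j])) ++j;
        if (j < fmt.size() && (fmt[j] == 'G' || fmt[j] == 'g')) return true;
        i = j;  // skip the conversion letter so "%%G" is not misread
    }
    return false;
}

// Refuses to format when the ISO week year sneaks in. Returns false then.
bool checked_strftime(std::string& out, const std::string& fmt,
                      const std::tm& tm) {
    if (uses_iso_week_year(fmt)) return false;
    char buf[128];
    const std::size_t n = std::strftime(buf, sizeof buf, fmt.c_str(), &tm);
    out.assign(buf, n);
    return true;
}
```

Because the check runs on the string that actually reaches strftime, it catches the appended case that grep misses. Code that really does want ISO weeks would need its own clearly named entry point.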

Everyone has a different take on this, and that's okay. Just remember, you don't have to be hateful about it.