Feedback on lessons, leap seconds, and LLMs
I'll roll up some responses to reader feedback here.
...
Someone asked if they could view the old code lessons. The one I put back online last year is where I do a terrible little TCP listener, compile it, start it in the background, and then connect to it with netcat. It's awkward as hell, but it's there if you really want to see it.
There is also the six part "protofeed" demo which showed how to fetch a feed in this protobuf-based scheme I rigged up. Spoiler: the scheme is not there any more, since nobody was using it, so it's not very useful to follow the instructions on that thing.
Ironically, that fetcher program would run afoul of all kinds of badness if it was pointed at a production site. It doesn't do conditional requests, it doesn't know about Cache-Control headers, it won't recognize that a 429 is asking for throttling, and so on. I guess that's okay for something that was a proof of concept to show how to fetch something from the network and parse it, but *actual* feed readers get all of that stuff wrong, too.
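For what it's worth, here's a rough sketch of the three behaviors listed above: conditional requests, honoring Cache-Control max-age, and backing off on a 429. It's Python using only the standard library, and it is not the protofeed fetcher's code. Every name in it (fetch_feed, the state dict, the one hour fallback) is made up for illustration.

```python
import email.utils
import time
import urllib.error
import urllib.request

DEFAULT_BACKOFF = 3600  # assumed fallback when a 429 has no usable Retry-After


def parse_max_age(cache_control):
    """Return max-age in seconds from a Cache-Control value, or None."""
    for part in (cache_control or "").split(","):
        name, _, value = part.strip().partition("=")
        value = value.strip().strip('"')
        if name.strip().lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def parse_retry_after(value, now=None):
    """Return seconds to wait from a Retry-After value (delta or HTTP date)."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return int(value)
    try:
        when = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    now = time.time() if now is None else now
    return max(0, int(when.timestamp() - now))


def fetch_feed(url, state):
    """Poll url once. Returns the new body, or None if there is nothing new.

    `state` is a plain dict the caller keeps between polls. It lives in
    memory only, so the deadline stored in it uses the monotonic clock.
    """
    if time.monotonic() < state.get("not_before", 0.0):
        return None  # still inside a max-age or Retry-After window

    request = urllib.request.Request(url)
    if state.get("etag"):
        request.add_header("If-None-Match", state["etag"])
    if state.get("last_modified"):
        request.add_header("If-Modified-Since", state["last_modified"])

    body = None
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            body = response.read()
            headers = response.headers
    except urllib.error.HTTPError as err:
        headers = err.headers
        if err.code == 429:
            wait = parse_retry_after(headers.get("Retry-After"))
            if wait is None:
                wait = DEFAULT_BACKOFF
            state["not_before"] = time.monotonic() + wait
            return None
        if err.code != 304:  # urllib reports 304 Not Modified as an HTTPError
            raise

    if body is not None:
        # Remember the validators so the next poll can be conditional.
        state["etag"] = headers.get("ETag")
        state["last_modified"] = headers.get("Last-Modified")
    max_age = parse_max_age(headers.get("Cache-Control"))
    if max_age is not None:
        state["not_before"] = time.monotonic() + max_age
    return body
```

The caller hangs onto one state dict per feed and calls fetch_feed(url, state) whenever its timer fires. A None result means "nothing new, or not allowed to ask yet".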
...
Another reader asked if the Linux "hrtimer" glitch from the leap second was fixed. I have to assume that it was, based on the fact that most people didn't have that same problem three years later when we had the one in my "leap smearing" story. My worries were about userspace stuff.
This is an opportunity for me to share just why I went to those lengths. In short, it was because of a lack of confidence in everyone everywhere doing the right thing in terms of time handling. If everyone uses monotonic clocks for measuring durations and otherwise is okay with wall time going backwards now and then, then there's no reason to smear it out. My own personal systems have never smeared a leap second. They just ride it out and keep on going.
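To make "use monotonic clocks for measuring durations" concrete, here's a minimal sketch in Python. The helper names (timed, Deadline) are invented for this example. The only real API involved is time.monotonic(), which is documented as a clock that cannot go backwards and is not affected by system clock updates.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds).

    The elapsed time comes from the monotonic clock, so a wall clock step
    in either direction during the call does not distort it.
    """
    start = time.monotonic()
    result = func(*args, **kwargs)
    elapsed = time.monotonic() - start
    return result, elapsed


class Deadline:
    """A timeout that does not care what the wall clock is doing."""

    def __init__(self, seconds, clock=time.monotonic):
        # `clock` is injectable so tests can pass a fake one.
        self._clock = clock
        self._expires = clock() + seconds

    def remaining(self):
        return max(0.0, self._expires - self._clock())

    def expired(self):
        return self._clock() >= self._expires
```

Wall time (time.time() here) is still the right thing for timestamps that humans or other machines will read. It's just the wrong tool for "how long did that take" or "has my timeout passed yet".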
I couldn't assume the correctness of such implementations at the company. Worse, even if I went and deliberately injected backwards-going time step operations and proved that it would crash some code, there was no guarantee of anything coming from it. I had found myself in a place at that company where some parts of it were completely unresponsive to the problems they were causing for themselves and sometimes for other people, and was starting to tire of the "bad cop" schtick. That's where I'd show up and go "your shit is broken" and they would do nothing to work with us (the whole team) to do something about it.
I just had this feeling that if we repeated the last UTC second of June 2015, we'd end up breaking something. What's kind of amazing is that later on that year, it actually happened.
Someone misconfigured the ailing NTP appliances to *not* apply the correction factor from GPS to UTC. This ended up forcing one appliance into shipping unadjusted GPS time to roughly half of production via NTP, and the difference at the time was something like 17 seconds. (This changes, and indeed, it's no longer 17.)
Anyway, I got to working on this after hearing about it and found roughly half the fleet running 17 seconds fast. It was completely unreasonable to try to "smear off" 17 seconds to get things back to normal - that would have taken weeks. I made the decision to fix the setting and then let every broken machine individually have its clock dragged backwards the 17 seconds to where things should be.
This broke stuff. Some kind of sharding mechanism deep inside the fabric of things was using wall time to determine something or other, and when it jumped back, it fired an assertion and killed the program. This nuked the web server (or whatever else was using that library).
So, basically, every single machine which had been poisoned with the bad time and which was running this library was going to crash exactly once and there wasn't really anything which could be done about it. It was something like 2 in the morning by this point and I opted to let it happen.
About the only good thing about this is that the adjustments happened at different intervals depending on the ntpd poll rate, so it's not like hundreds of thousands of machines all crashed their workloads at the same time. One would pop off here, then one there, and so on... over the span of an hour or two... until it was all done.
Thus, some services didn't really go down, but they did have a bad time with a bunch of failed/dropped requests which were on the affected systems.
That one was dubbed the "Back to the Future" SEV. At least one team took a screenshot of some display showing the 17 second offset and made it the banner of the group where they talked about production issues.
Stuff like that is why I smeared it out. When you can't be sure of the correctness of implementations, and there are good chances that attempts to fix them will be ignored, rebuffed, or actively attacked, you have to "treat the whole situation as damage" and route around it. You remove the discontinuity in wall time to save them from themselves.
...
A reader asked for my take about "AI" and LLMs and all of this.
In the vein of the "annoyances" post from earlier in the month, I'll start by saying that I don't push any of that on you here, either. All of this stuff is straight off my keyboard with a sprinkling of ispell applied after the fact. Even that's of limited utility since there are a bunch of technical terms and not-really-words that I use for various reasons.
I think all of the hype and waste has generated an enormous mountain of useless nonsense that has attracted the absolute worst of the vampires and buzzards and bottom-feeders who are looking to exploit this stuff for their own benefit.
The LAST thing we needed was a better way to generate plausible-looking horse shit for random gullible people to consume unwittingly, but here we are, and it's only going to get worse.
I think a lot of this falls into "the Internet we deserve".
So no, I don't use anything of the sort, and I tell people not to quote any of that crap at me, or to send me screenshots of it pretending to be an answer to something, and that they need to find actual sources for their data. This has not made me the most popular person.
But hey, I've already said that I'm obviously out of touch with what most people are up to. My green-on-black terminal with nano in it that's writing up a bunch of plain text with a handful of triggers for callouts to other posts should be proof positive of that already. Hardly anyone else does things this way any more. That makes me the weirdo, not them. I know this. I'm okay with this.
