Writing

Software, technology, sysadmin war stories, and more.
Tuesday, October 20, 2020

Violating monotonic time expectations on Mojave

Monotonic time is one of those things that you probably don't learn about until you've been bitten by not using it. The whole point of having a clock which only ever ticks upward is that the system makes an agreement with you that it will never ever go backwards. You can then use this to measure durations no matter what the human-readable clock ("wall time") might be doing.

That is, even if the admin of a box sets the system time back 55 seconds in the middle of your calculations, you can still reliably count out 60 actual seconds and wait the right amount of time. If you did that with the system clock (wall time), then it would be affected by that adjustment.

Let's say you're a developer, and this is you. You were counting out durations with time() or gettimeofday() or something like that, and then a leap second made a hash of your system, or someone seriously screwed up your timekeeping and your machine jumped forward and then back 17 seconds. You learned your lesson and started using monotonic timers.

Now let's say you've moved on to doing stuff on laptops and other systems which go to sleep sometimes. You'd like to track durations when the machine is actually alive, and not just absolute time. You go to the man page for clock_gettime on such a machine, and you find this:

CLOCK_UPTIME_RAW: clock that increments monotonically, in the same manner as CLOCK_MONOTONIC_RAW, but that does not increment while the system is asleep. The returned value is identical to the result of mach_absolute_time() after the appropriate mach_timebase conversion is applied.

Well well well! That sounds pretty awesome, right? It ticks upward, never goes backwards, and only ticks when the machine is awake. That sounds perfect!

You build your stuff around this, and things are okay for a while. But then one day, you notice that your code is very unhappy in certain situations. Indeed, it looks like your stuff is running in very tight circles for unreasonable amounts of time. It was supposed to spin in that tight loop for at most a few milliseconds, but here it is going for minutes, and hours, even!

Then one day you notice that it happens after a deep system sleep of an hour or more, and only when the battery is below 50%. That's when you see it: on newer versions of the OS, that timer resets when the machine wakes back up!

You now know why your code would eat 100% of the CPU in that spinloop and would keep going until the machine's actual uptime had roughly doubled. That is, if you put it to sleep after 45 minutes of uptime with a low-ish battery, then woke it up some time later, it would spin like that for another 45 minutes.
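To make the arithmetic concrete, here's a sketch of a deadline check that notices when the clock has gone backwards instead of blindly waiting for it to catch up. None of this comes from the bug reports: guarded_wait, guarded_wait_begin(), guarded_wait_check() and naive_remaining_ns() are hypothetical names, and the functions take plain nanosecond readings so they don't care which clock you feed them.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    KEEP_WAITING,
    DEADLINE_REACHED,
    CLOCK_WENT_BACKWARDS
} wait_state;

/* Hypothetical bookkeeping for one bounded wait. */
typedef struct {
    int64_t deadline_ns;  /* reading at which the wait is over */
    int64_t last_ns;      /* most recent reading we have seen */
} guarded_wait;

/* Start a wait of budget_ns nanoseconds from the reading now_ns. */
guarded_wait guarded_wait_begin(int64_t now_ns, int64_t budget_ns) {
    assert(budget_ns >= 0);
    guarded_wait w = { now_ns + budget_ns, now_ns };
    return w;
}

/* Feed in a fresh reading. A reading lower than the previous one means
 * the "monotonic" clock broke its promise, so report that to the caller
 * instead of spinning until it climbs back up to the deadline. */
wait_state guarded_wait_check(guarded_wait *w, int64_t now_ns) {
    if (now_ns < w->last_ns) {
        return CLOCK_WENT_BACKWARDS;
    }
    w->last_ns = now_ns;
    return (now_ns >= w->deadline_ns) ? DEADLINE_REACHED : KEEP_WAITING;
}

/* How long a naive "spin until now >= deadline" loop would still run
 * given the reading now_ns. */
int64_t naive_remaining_ns(const guarded_wait *w, int64_t now_ns) {
    int64_t left = w->deadline_ns - now_ns;
    return left > 0 ? left : 0;
}
```

Start a 5 ms wait at 45 minutes of uptime, have the clock reset to zero, and naive_remaining_ns() reports 45 minutes plus 5 ms still to go, which lines up with the spinning described above. A loop built on guarded_wait_check() would get CLOCK_WENT_BACKWARDS on the first reading after the reset and could bail out or start over.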

Based on a report I got from a reader, this seemed to be very real on Macs running Mojave, whereas it may not be a thing on Catalina. You can see where this has been reported to libuv and llvm too.

I haven't tried to duplicate this yet on my own Macs, given that it takes a lot of battery drain and then a substantial time asleep, and I have other things to do with these boxes. I'm willing to believe this is real based on those bug reports.

Talk about violating basic assumptions, am I right?

...

Important update: this post was originally written flipped around, i.e., implying that Mojave was fine and Catalina was broken. This is not supported by the bugs linked. Clearly, I screwed this one up. Still, since it might be mildly useful to anyone still targeting Mojave, I'm leaving it up with the fixes applied. Thanks to Theodore for setting me straight on this.