Writing

Software, technology, sysadmin war stories, and more.

Monday, April 29, 2024

Hitting every branch on the way down

I keep seeing people saying that the answer to my complaints about autoconf is to rub *more* autoconf on the problem. I don't like this. In the general vein of "this should not be that hard", I decided to revisit something from two years ago and try to use my build tool to generate my stuff on a fresh BSD-flavored install. (The exact flavor is unimportant here, and mentioning it by name would only trigger the weenies in the crowd, so I won't.)

I wanted to prove to myself that yes, my stuff can Just Build on other (i.e., not Linux or Mac) systems without resorting to the kinds of stuff that I wish we had collectively left behind in the 90s.

The OS itself was fine. The install process on a throwaway VM image was quick and painless. I knew how to get my usual tools installed - bash, nano, that kind of thing. Unlike last time, I opted to not do X and just focused on getting my stuff to build.

But then I made a mistake: I told it to install "protobuf" since I use that library in my build tool. That actually installed "protobuf-24.4,1", which is some insane version number I'd never seen before. All of my other systems are running 3.x.x-type versions.

Now, realize, I didn't know this was a mistake yet, and kept on going. I did manage to bootstrap my build tool into a usable binary, and then started a "build the world" process, at which point it blew up. It was complaining about not being able to find a library called "google/protobuf/arena" inside my personal source tree.

This made no sense, so I started digging, and found out that the "protoc" compiler in that version of the software spits out code like this:

#include "google/protobuf/thing1.h"
#include "google/protobuf/thing2.h"
#include "google/protobuf/thing3.h"

... you get the idea. It's a third-party library that's installed at the system level, and yet it's using "" like it's all chummy and hanging out with your code in your local repo. Yeah, no. That's wrong. They should be <> includes, like this:

#include <google/protobuf/thing1.h>
#include <google/protobuf/thing2.h>
#include <google/protobuf/thing3.h>

What's weird is... it *is* that way on all of my other machines - my Mac with Macports and my Debian/Raspbian boxes all generate those #includes with <> like they're supposed to, and everything Just Works.

I won't lie. This really made me angry at first. I was like, okay, they did yet another stupid thing upstream, and now everyone else is going to have to work around it. It got me thinking thoughts like "just how hard would it be to NOT use protobuf, anyway". I figured that this abomination would eventually filter down to Macports and Debian's apt repo and whatnot, and then I'd have to deal with it, or toss it.

After a few minutes of cooling off, it occurred to me that I could do something super-duper obnoxious: wrap protoc, and run a nasty little sed command afterward to flip the "" to <>. So I did that, and things proceeded. Awful.

Of course, then I ran into some other fun problems with my code, like IPPROTO_* definitions not being available. I have a wrapper for getaddrinfo() and it uses IPPROTO_TCP in the .ai_protocol field. I had all of the #includes that the man pages say to have for using that function, but that's not enough on this particular system.

I assume that there's some transitive #include on Macs and on glibc-flavored Linuxes that drags this in for me, but on this one BSD it doesn't work that way. The fix was simple enough, and mighty stupid:

#include <netinet/in.h>

And no, that's not listed in their getaddrinfo(3) manual page, even though IPPROTO_UDP and _TCP are both explicitly mentioned in it. Dig around online and you'll find this tripping up other people. That's the extent of the self-inflicted damage I had to fix to make it build: a few missing #includes.

Stuff like this is why I tend to wall off calls into the C library with a bunch of compatibility gunk and then use my own interfaces above that.

At some point during this, I decided to go back into the protobuf git repo to see just when they decided to dump the angle brackets in favor of the double-quotes, and that's when I hit another wall of stupid. Apparently it's possible to change a git repo in such a way that "git log -p" will never show it. Did you know that? Before yesterday, I definitely didn't.

Here's how I discovered this: obviously, there was code that would do the <> stuff at some point. The last version of it I could find looked like this:

  std::string left = "\"";
  std::string right = "\"";
  if (use_system_include) {
    left = "<";
    right = ">";
  }
  return left + name + right;

It seems simple enough, if a little goofy: return "input" unless use_system_include gets set a few lines up, in which case it should return <input>. No big deal, right?

But... that code exists nowhere in the repo as it stands now. Silly naive me, I thought I could just "git log -p" and do a / search in less for "use_system_include" to find the commit which dropped it. I wanted to learn why they did this: maybe they had a good reason, and if I complained about it, I'd at least know what I was up against.

I found nothing.

This kicked off a terrible sequence where I checked out different commits from the tree to see what it looked like at various points in the past. I eventually got it down to a commit that contained the above code and the very next commit, which dropped it.

This must be it, right? I should be able to "git log -p" and see it, right? Nope.

commit d85c9944c55fb38f4eae149979a0f680ea125ecb (HEAD)
Merge: 7764c864b 0264866ce
Author: <removed because it's not their fault>
Date:   Mon Sep 19 14:10:44 2022 -0700

    Sync from Piper @475378801
    
    PROTOBUF_SYNC_PIPER

The next line in the git log output is the next commit. There's no "body" to this commit. It's just a "Merge:" and two other commits.

7764c864b and 0264866ce, right? I should be able to sync to those with git checkout and see which one dropped it, yeah? Well, I'll spare you the effort and just say that BOTH OF THEM have the old code in them.

So... this commit somehow drops the code even though it's merging two "ancestral commits" that both contain it, and there's no diff shown.

Confusing, right?

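I don't know how I finally figured this out, but after a whole lot of cursing and thrashing, I found that "git show <commit>" will FINALLY give me the results I want, ish. It contains the change which dumped the <> code and put in the new stuff. (The reason: by default, "git log -p" doesn't print any diff at all for merge commits unless you also ask for one with something like -m, -c, or --cc, while "git show" renders a merge as a "combined diff" against all of its parents. That's why each line below has two columns of -/+ markers, one per parent.)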

--  std::string left = "\"";
--  std::string right = "\"";
--  if (use_system_include) {
--    left = "<";
--    right = ">";
--  }
--  return left + name + right;
++  return absl::StrCat("\"", basename, "\"");

There's no explanation or other context. Presumably that all got squashed out when it was exported from whatever they use internally.

"Why" is gone. I just have "when", and that's not very interesting: it was merged in September 2022, ho hum. That just means that whenever Linux distributions and Macports catch up with at least that point, I'm going to have to deal with this for real.

Oh, there's one more bit of batshittery which needs to be mentioned here. My stuff uses pkg-config to find out how to compile and link against these libraries, right? Well, when it was using this oh-so-new protobuf version, the commands it was running were so long that they scrolled right off my standard 80x25 terminal.

$ pkg-config --cflags protobuf | wc -c
    4326

Yep! 4 KB of cflags. Here's just the top part of it:

# pkg-config --cflags protobuf
-I/usr/local/include -DPROTOBUF_USE_DLLS -Wno-float-conversion 
-DNOMINMAX -Wno-float-conversion -DNOMINMAX -Wno-float-conversion 
-DNOMINMAX -Wno-float-conversion -DNOMINMAX -Wno-float-conversion 
-DNOMINMAX -Wno-float-conversion -DNOMINMAX -Wno-float-conversion 
-DNOMINMAX -Wno-float-conversion -DNOMINMAX -Wno-float-conversion 

... and it just goes on like this. It actually worked, though!

Finally, remember when I said that I made a mistake by installing their "protobuf" package without realizing it? Yeah, it turns out they also have "protobuf3", which is a nice sane version just like the ones on my other machines, #include <...> and all. So, I removed the bad one, installed this other one, and dropped my sed hack.

What a night.