Writing

Software, technology, sysadmin war stories, and more.

Tuesday, September 19, 2023

Add extra stuff to a "standard" encoding? Sure, why not.

I've built more than a few projects which use protocol buffers somewhere in them to store data or otherwise schlep it around - in files, over the network, and that kind of thing. A friend heard about this and wanted to write an implementation in another language and so I supplied the details. Everything seemed to be going fine, but then we started getting *really weird* errors when he tried to point his new client at my server process.

Just trying to get the outermost "envelope" thing to pass would fail. This made no sense. We finally had to get down to individual bytes from the network dump to try to sort it out. Then we tried to encode "the same thing" and got two different results. His end was generating "1f 0a 0b (string)" and mine was doing "0a 0b (string)".

Where was this extra 1f coming from? We started trying to unravel it according to the rules of protobuf: the tag of a record is a varint which comes from the field number and wire type and blah blah blah... and I won't even bother with the details here since that was also a dead end. It decoded to "field 3, type 7" but there isn't a type 7. There are just 0-5. So, again, WTF? What is this "invalid wire type 7" thing? (And yes, that string in this post is entirely deliberate.)
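For anyone who wants to check the arithmetic, here is a minimal sketch of that tag split. The function names are made up for this illustration and are not from any protobuf library; the only facts it leans on are that the low three bits of the tag are the wire type and the remaining bits are the field number.

```python
# Wire types the protobuf encoding defines (3 and 4 are the old group markers).
WIRE_TYPES = {
    0: "VARINT",
    1: "I64",
    2: "LEN",
    3: "SGROUP",
    4: "EGROUP",
    5: "I32",
}


def split_tag(tag):
    """Split an already-decoded tag value into (field_number, wire_type)."""
    return tag >> 3, tag & 0x07


def describe(tag):
    """Render a tag the way a confused human would read it off a hex dump."""
    field, wire_type = split_tag(tag)
    name = WIRE_TYPES.get(wire_type, "invalid")
    return f"field {field}, type {wire_type} ({name})"


print(describe(0x1F))  # field 3, type 7 (invalid)
print(describe(0x0A))  # field 1, type 2 (LEN)
```

The 0a that both sides agreed on is a perfectly ordinary "field 1, length-delimited" tag, which is what you would expect in front of a string. The 1f is the one that falls off the end of the table.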

My friend is good at this sort of thing, and so started digging in deeper... and it started looking like a length byte. It's like, wait, what? Hold on. Protobufs do not work that way! They don't have their own framing. That's why recordio was invented, along with countless other ways to bundle them up so you know what type they are, how long they are, and all of that other stuff. The actual binary encoding of the protobuf itself is bare bones! So what's up with this length byte?
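One rough way to test the "is that a length byte?" hunch is to decode the first varint and see whether it equals the number of bytes that follow. This is only a heuristic sketch: a bare message could pass by coincidence, and the sample frame below is invented for the illustration (only its first three bytes match the dump described above).

```python
def decode_varint(buf, pos=0):
    """Decode one varint from buf starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:              # high bit clear means last byte
            return value, pos
        shift += 7


def looks_length_prefixed(buf):
    """True if the leading varint equals the count of bytes after it."""
    length, pos = decode_varint(buf)
    return length == len(buf) - pos


# Hypothetical frame: 1f, then a 31-byte message that starts with 0a 0b.
# The field 2 filler exists only to pad the message out to 31 bytes.
frame = bytes([0x1F, 0x0A, 0x0B]) + b"hello world" + bytes([0x12, 0x10]) + bytes(16)

print(looks_length_prefixed(frame))      # True
print(looks_length_prefixed(frame[1:]))  # False
```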

So then we started looking at this protobuf library he had selected, and sure enough, the author decided it was a good idea to prepend the message with the message length encoded as a varint.
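To make the difference concrete, here is a small hand-rolled sketch of the two encodings. It does not use his library or mine, and every name in it is hypothetical. The envelope contents are also invented: an 11-byte string in field 1 plus 16 bytes of filler in field 2, chosen only so the total lands on 31 bytes and the prefix comes out as 1f like in the dump.

```python
def encode_varint(n):
    """Encode a non-negative integer as a varint (7 bits per byte, low bits first)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def len_field(field_number, payload):
    """Encode one length-delimited (wire type 2) field: tag, length, payload."""
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# Hypothetical envelope; the real message layout is not shown in this post.
bare = len_field(1, b"hello world") + len_field(2, bytes(16))

# What the "helpful" library effectively did: message length first, as a varint.
delimited = encode_varint(len(bare)) + bare

print(len(bare))               # 31
print(bare[:2].hex(" "))       # 0a 0b
print(delimited[:3].hex(" "))  # 1f 0a 0b
```

A receiver expecting the bare form reads that leading 1f as a tag, and that is how you end up staring at "field 3, type 7".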

WHY? Oh, why?!

And yes, it turns out that other people have noticed this anomaly. It's screwed up encoding and decoding in their projects, unsurprisingly. We found a (still-open) bug report from 2018, among others. They all manifest slightly differently, so not everyone realizes that it's all from the same root cause.

The fix was dubious, but it did work: you skip the "helper" function that's breaking things. That gives you just the proper bytes, and then everything is happy.

That's how I got both a "second source" for speaking my goofy RPC language and another story about wacky broken libraries at the same time.