
Saturday, May 1, 2021

The Honest Troubleshooting Code of Conduct

Yesterday, I wrote a little about what happens when companies have a pervasive culture of shutting down attempts to talk about things that aren't seen as completely positive. I mentioned there was more to the story, and this is part two.

I found myself in yet another tech gig in an attempt to provide reliability to a company that (as it turned out) was about to have its IPO. Since people like me weren't allowed to know the actual date and time of these things, we could only read the same tea leaves as the rest of the world. We could watch the news cycle and the IPO road show and guess at the rest.

It would be *really bad* for this kind of stuff to go down, particularly when the execs are out doing their dog and pony thing in front of the press. Then of course, exactly that happened one afternoon. What's kind of amazing is that this event isn't captured anywhere in the press cycle, but internally, everyone knew about it.

This place hadn't quite ascended to the level of giving outages notable names ("Call the Cops", "Silent Night", "A Tree Did It" being just a few from elsewhere), so it remained just "1152" after an associated ticket in a bug tracking system.

It was clear: we had to do better than this. We needed to talk about broken stuff, and soon, but every time we did this, some annoying person would show up and start "well actuallying" the conversation to death - usually several hours later, after everyone else had moved on! I gave an example of this yesterday: someone showed up, completely mis-parsed "Ubuntu is doomed" due to a lack of context, and jumped on the person who said it.

The problem is, it didn't end there. There was all kinds of shit being reported up the line through managers, up to directors and/or HR, and then *back down* onto the speakers of these lines. This happened to anyone who brought up difficult topics about broken things that threatened reliability, security, or really anything else you can imagine.

It even came down on me a few times. I remember being given crap for talking about something in a supposed "leads" channel which was private and had perhaps eight people total in it. Worse yet, the person in question didn't have the character to talk to me directly about their issue (which at least the "Ubuntu" person did...), and so I had to hear about it from their manager.

It didn't take me very long to do the set logic: find the place where person A was present, person A's manager wasn't, and the topic had been discussed.
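For the curious, that set logic amounts to something like the sketch below. The channel names, member names, and the `candidate_channels` helper are all made up for illustration; this is not the actual data or any real Slack API, just the shape of the filter.

```python
# Hypothetical data: channel name -> (set of members, whether the topic came up there).
channels = {
    "leads": ({"me", "person_a", "lead_b", "lead_c"}, True),
    "infra": ({"me", "person_a", "manager_of_a", "lead_b"}, True),
    "random": ({"me", "person_a", "manager_of_a"}, False),
}


def candidate_channels(channels, reporter, manager):
    """Return channels where the topic came up, the reporter was
    present, and the reporter's manager was not."""
    return sorted(
        name
        for name, (members, topic_discussed) in channels.items()
        if topic_discussed and reporter in members and manager not in members
    )


print(candidate_channels(channels, "person_a", "manager_of_a"))
```

With only a handful of channels in play, the list of candidates gets short very quickly.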

Given this, you'd think I was talking about politics or religion or some stupid crap like that. Hell no. I was talking about infrastructure, and specifically the wobbly mess of shitty "we're gonna run Flask everywhere and call it microservices" that somehow let people accomplish things over the Internet. Absolutely amazing.

They wanted me to dial it back and not talk about these things... in a private channel... with other "leads" (whatever that even meant at a company that barely builds anything). What, you want me to not do my job? Why in the hell did you hire me? Are you serious?

At some point, I decided to come up with something truly ridiculous. It's something that should have never needed to be written, and indeed, in the 24 years I had been paid to do this work up to that point, the need had never come up. No company had needed it before. They were mostly adults. This one wasn't, and so I had to spell it out longform.

I called it the "Honest Troubleshooting Code of Conduct", and it goes a little something like this:

These are the terms of engagement for participating in any Slack channel which has decided to gate membership on this code. You don't have to accept the code, but you also don't have to be a member of that channel, either.

Assume the best intent.

We're all here to make things better at $COMPANY. So, if you think that someone is doing something not-great or it feels like they've violated the code, try to come up with the best possible motivations on their end, and honestly believe that's where they're coming from.

We're here to improve the reliability of $COMPANY.

We do this to improve the lives of our <customers>. Keep that in mind when considering any statement in an HTCoC-governed channel.

This work can be tough and personally draining at times. We need it to be at least a little fun at the same time. Make the environment one where you still want to come back and do this job again tomorrow even though it's hard, confusing and complicated. Make it so the other folks around you are having fun, too.

Most importantly, don't make it hard for other people to enjoy their jobs.

We never attack people.

We're not here to tear anyone down, whether living, dead, or undead. Sometimes it might feel like an HTCoC-governed channel is a little unvarnished, but you should not mistake jokes, shorthand, references, or swearing for something that's intended to harm specific people or groups. Or zombies. (See, that's a joke.) Yes, there is naughty language here. People tend to swear when dealing with broken things. (Ever hit your thumb with a hammer? You get the idea.)

In other words, save your venom for problems, not people: broken computers, networks, tech stuff, etc.

Honesty is productive.

No feigning surprise. Someone who hasn't heard of something is a learning opportunity, not a flaw. Think "you gotta see this!" instead of "what?! how can you not know about X?".

Anyone in an HTCoC-governed channel is encouraged to say "WTF" when they don't "get" something that's being discussed. Literally "what the fuck" is fine too. (See, naughty language!)

Violations?!

If you think someone has violated one or more of these rules, consider the possibility that you're missing context, have incomplete information, or have otherwise misinterpreted their words. Give them the benefit of the doubt and assume best intent. If you're still concerned, address problems directly via Slack DM, video call, or (preferably!) in person.

Commentary on why this exists

Why have this? So we can talk about problems openly, and not have people think we are attacking them or their projects personally.

For example: "Ubuntu is doomed" out of context seems like someone is hating on all Ubuntu, everywhere, and Canonical too. "Ubuntu is doomed" at $COMPANY (note context) is a statement that we're switching to a new base OS which is built around Fedora, and so, yes, we will be moving away from it. Some day it will be gone. So, if someone says that, and you don't know the context, you have no place jumping on them for "hating on Canonical", because they aren't, and you're wrong.

...

[ Side note: I've squished down the actual company name to $COMPANY and a few identifying marks, but if you've been reading me for a while, you can probably guess who this is. This is squarely about wages and working conditions. Inter-worker communications for improving work conditions? Protected concerted activity? You better believe it. ]

...

Anyway, once I had that written, I decided the way forward would be to create a Slack channel that would be invitation-only. Anyone could be invited to it, and anyone could join it, but they had to agree to the code of conduct shown above. That was the only condition, and then it was on them, as it should be.

This started with a handful of seed people who did the same sort of work I did, unsurprisingly, and grew. It started getting legs. We started having actual conversations with people we hadn't met in person before. There were third- and fourth-generation members I had never even heard of, who had signed up to talk about reliability without being second-guessed to death and hounded by people who just wanted to talk about "our tone" or "a lot of work went into that".

Did it work? Kinda. I heard from a couple of people that they thought they were losing their minds because they seemed to be surrounded by people in the company who kept crapping on them for trying to fix things. They thought they were all alone in this, and then some person reached out with the invitation to the channel, and they realized that no, they were not alone.

At least one of these folks told me over coffee and tea one day at the Philz down the street that "it probably kept them at the company longer" since it gave them faith that it could be brought around to doing good, solid and real work. They believed in the actual stated mission of the company and just wanted to deliver on that as much as possible without being treated like a child by other children.

...

I'll leave this one here for now, but I may have more to say on the topic at some point in the future. Let's just see if opening up about what happened is actually worth it.