Writing

Software, technology, sysadmin war stories, and more.

Tuesday, May 19, 2020

Infra teams: good, bad, or none at all

I'm used to working at companies which have "product" teams and "infra" teams. The "product" teams are the ones who think about what the end users will be doing to share cat pictures or order pizzas or do whatever they end up doing.

The infra people, then, are the folks like me toiling down below them, keeping all of the plumbing working. If I do my job properly, the product people never know I exist. They just keep using my stuff and it works for them and stays out of the way. They can focus on doing what they do best, which is making the end users happy. They don't have to worry about service tiers or YAML files or any other abominations we foist on each other in these dark alleys of infrastructure.

But... not all companies handle this the same way. Let's look at three scenarios, each with a product team trying to get something done at their company. They have different experiences depending on how their company handles certain decisions. Watch what happens.

...

Company A has a team that provides a service to internal customers. Maybe they have a way for people to run jobs on the company's "job running" solution. It could be physical hardware, VMs, or some container orchestration situation. Company A's team does it, and does it well. The product team has stuff to run, connects it up with the infra team's service, and off it goes. People are relatively happy.

The product team gets to think about their product and not the infra. They excel because they spend their precious cycles on what they're good at - their product - and they don't have to worry about the lower levels of the stack so much. There's a whole team doing that and doing it well.

...

Company B also has a team that provides a service to internal customers. It also claims to run jobs on the company's "job running" solution. The exact implementation isn't important to the story here. What you do need to know is that they are absolutely terrible at it. Product teams know this, and wish they didn't have to deal with it.

They have no choice in the matter. Any attempt to "treat the team as damage and route around it" is met with swift reprisal. As soon as the infra team finds out that some other team is even kinda-sorta working in that space, they swarm the project and attempt to kill it. Anything which threatens their monopoly on this space must be controlled, and if they can't control it, they must destroy it.

Random product people now have to know about arbitrary crap enforced by someone else like "kubelets" and "Jenkins". They burn their cycles and sanity on terrible systems instead of improving the experience for the company's end user.

Everyone loses. Nobody's happy. Nobody can do anything about it.

...

Company C, for some reason, doesn't even have a team to provide the service of running jobs. Maybe they haven't thought about organizing things that way yet. Maybe they *had* such a team, and it sucked really badly, and so they disbanded it. Whatever. The point is, the product teams are on their own.

What happens is that each product team winds up doing a shot in the dark. Some find good solutions. Some find bad solutions. Taken in aggregate, it resembles a "random walk" of the problem space.

With a little luck, and more than a little oversight, the teams might eventually share notes and converge on the things which have been shown to work for them, and will eschew the ones which don't. Maybe they'll teach each other how to get things done.

It's not particularly efficient since you have product teams having to wrangle infra crap instead of dealing with what they are actually there to do -- build and grow their product -- but it does work.

...

Let's review here.

Company A is on the ball, and product teams offload their infra burden onto a clueful team. These teams do what they do best. Their product people keep their eyes on the prize, and they excel in the business and don't get stuck in dumb infra stuff.

Company B is a pathetic mass of losers. It has infra teams that can't deliver usable services to their fellow employees who are all in the same boat, and management which backs attacks on any team that tries to solve its own problems by routing around those infra teams. The product teams hate their lives because they are held hostage by the infra teams.

Company C is a rag-tag bunch of people trying different things, hoping it'll work. Nobody's attacking them for trying these things, since it's not "owned" by anyone. It's not part of someone's managerial "realm".

My guess is that the "random walk" situation created by the teams in the third company is actually far better than the "actively hostile" situation in the second one.

Or, looking at this another way:

A: You will probably have a good time because someone's there trying to make sure you have a good time.

B: You will definitely have a bad time, and you have no control over it. You WILL bow down before them, or you WILL be smited.

C: You might have a good time, or you might not, but at least it's up to you. You control your own fate. You can try new things and maybe improve. You aren't stuck with your decision.

Remember, it's not good vs. bad. It's good vs. bad vs. nothing. There are actually *two* choices which let you make progress.

Unfortunately, there's entirely too much "Company B" in this industry.