Software, technology, sysadmin war stories, and more.

Wednesday, September 1, 2021

Followup on the dashboard story

Over the weekend, I wrote a post about a big company project that dragged on for six months. It elicited a bunch of responses in the usual places, but two or three of them were a little... off. I figured I'd handle all of them at once right here.

First, someone pointed out that I had checked in with the dashboard team later on the same day as the original meeting, and implied that this was awful. Several years later, I'm not sure exactly why that happened, but I can guess based on how I'd likely handle such a thing now.

I bet something came up in the meeting where I said something like "you should be able to talk to our server right now, but give it a try and let us know if you can't". That meant our RPC definitions were checked into source control like you'd expect, our service was already running in production (and NOT on someone's dev machine), and it had a well-defined service name. Anyone who knew how to speak the company's flavor of RPC could totally send it a request and get something back with just a little yak shaving.

Given this, I think they should have tried the equivalent of a hello world, or basically flipping the metaphorical light switch on and off to see what it controls and what happens in general. If it didn't work for them, then they could have told us and we would have unblocked them right away.

I didn't expect the whole thing to happen that same day. Now, honestly? Given how long it took me (a non-frontend person) to actually hate-build the thing months later, it *should* have taken a person who does nothing BUT frontends (from the dashboard team) an afternoon. It would have been a long afternoon, but it would have still been an afternoon. Make RPC, create list of objects, do the async "generate" calls on all of them, render into page, ship to client, done.

Phase 1 was expected to just be the list, and phase 2 would have added the panic button thing. Neither of them should have taken very long.
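To show just how small that "make RPC, create list of objects, do the async generate calls, render, ship" recipe is, here's a rough sketch of phase 1. It's a hypothetical illustration, not the real code: the company's RPC flavor isn't described here, so `FakeStatusClient`, `list_items()`, and `generate()` are invented stand-ins, and "shipping to the client" is reduced to returning an HTML string.

```python
import asyncio
import html
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    name: str


class FakeStatusClient:
    """Hypothetical RPC stub; list_items() and generate() are invented names."""

    async def list_items(self) -> List[Item]:
        return [Item("alpha"), Item("beta"), Item("gamma")]

    async def generate(self, item: Item) -> str:
        await asyncio.sleep(0)  # stand-in for waiting on the network
        return f"{item.name}: ok"


async def build_page(client) -> str:
    # 1. Make the RPC that returns the list of objects.
    items = await client.list_items()
    # 2. Fire off the async "generate" calls for all of them concurrently.
    #    asyncio.gather returns results in the same order as the inputs.
    statuses = await asyncio.gather(*(client.generate(i) for i in items))
    # 3. Render into a page, escaping whatever came back from the server.
    rows = "\n".join(f"  <li>{html.escape(s)}</li>" for s in statuses)
    return f"<ul>\n{rows}\n</ul>"


if __name__ == "__main__":
    # 4. "Ship to client": here we just print the HTML.
    print(asyncio.run(build_page(FakeStatusClient())))
```

Swap the stub for a real client and put the output behind whatever serves pages, and that's the whole list view. The panic button from phase 2 would be one more RPC wired to a form.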

...

Next, someone managed to read the whole post and still said I had "no strong sense of urgency or ownership". This sounds like a person who's never been in that situation. If you lean on some of these people too much, they will actually *slow down* and then make your life even more interesting! They will say terrible things about you to screw up your performance review, or whatever your company calls it.

That covers urgency, so now let's talk about ownership. That complaint probably comes from the fact that I didn't go rogue and just build the page myself in the first place. Again, this is obviously the sort of thing someone who's never lived through this would say. When you're in this situation, there are reasons you at least *try* to go through the "official channels" to get things done.

It all comes down to not wanting to piss off people who can then make your life interesting and ultimately turn the company into a terrible place to work.

The very fact that we went and wrote our own page in the end proved that ownership and attention never lapsed from our side of things. At that point, there was enough badness from the supposed dashboard team to balance out whatever "trouble" might come from routing around them.

This is another part of that kind of life: instead of just doing the right engineering thing from the start, sometimes you have to wait for enough badness to build up so you can "get away with" going and doing it yourself.

Further complicating things, the same outcome can be viewed as a great good or a great evil depending on what purpose it serves for the person doing the judging. Someone who's trying to support you will say "they played it cool, kept calm, were very patient, and then went and learned how to do it themselves to solve the problem".

Meanwhile, someone who is not interested in seeing you succeed will turn it into something like "they were impatient, not a team player, and sowed discontent by making the people on the dashboard team feel bad about themselves".

This is the sort of thing I was talking about earlier this year.

...

A third comment asked what was wrong with the shell script. Oh, so many things were wrong with it. Let's see. First, it was running in screen on my dev machine. That is not how you run a production service! When the machine rebooted a few months in, the crappy public_html version of the page stopped updating. That day, I actually thought our service was broken, since the page hadn't changed in hours.

It had other problems, too. It ran every couple of minutes no matter what, so it kept doing work to generate a page that nobody might ever look at. Then, when someone DID load it, they'd see the status as of a few minutes ago. That means it was both wasteful AND laggy at the same time. That's terrible!

Finally, the placeholder page wasn't "plugged into" the rest of the company's data model and objects. Some things in that status were supposed to be nice pointy-clicky links to other things, but that only worked on a "proper" page. The crappy static HTML version would never be able to do that.
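For contrast with the "wasteful AND laggy" cron-style loop, here's a minimal sketch of the opposite approach: do the work only when someone actually asks, with an optional short cache so a burst of reloads doesn't hammer the backend. This is an assumed illustration, not how the eventual page was built; `OnDemandPage`, `render`, and `max_age_s` are hypothetical names.

```python
import time
from typing import Callable, Optional


class OnDemandPage:
    """Render only on request, reusing the last result for up to max_age_s.

    `render` is a hypothetical callable that does the real work (e.g. the
    RPC calls); `clock` is injectable so the behavior can be tested.
    """

    def __init__(self, render: Callable[[], str], max_age_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._render = render
        self._max_age_s = max_age_s
        self._clock = clock
        self._cached: Optional[str] = None
        self._rendered_at = float("-inf")

    def get(self) -> str:
        now = self._clock()
        # No requests means no work: nothing runs between calls to get().
        if self._cached is None or now - self._rendered_at > self._max_age_s:
            self._cached = self._render()
            self._rendered_at = now
        # A served page is never more than max_age_s older than the request.
        return self._cached
```

The cache is the only knob: a few seconds of possible staleness buys protection against reload storms, versus the shell script's guaranteed minutes of lag plus work done for nobody.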

...

When it comes right down to it, there are people who have lived it and get it. Then there are people who complain that the story is being told at all, because apparently it's not okay to talk about such things, even though they really did happen. There are also people who don't believe such things happen, for whatever reason, and, well, they'll find out for themselves some day.