Writing

Software, technology, sysadmin war stories, and more.
Friday, November 15, 2019

Utter frustration with the eval board's tech support

While most of the machines I work on are x86 or x86_64 based, it's not always the case. There are a significant number of alternatives that you start encountering when you get off the beaten path and start talking about weird embedded projects and other things that stop resembling general-purpose computing.

In this world, you get into situations where you have something like an "evaluation board" from a vendor, and you use it as your test bed. You take whatever it is you were working on, and figure out how to get it to run on that platform. If it works out, then maybe you design your actual product around the chipset, or CPU, or ASIC, or FPGA, or whatever it is, and that's what gets shipped.

Still, long before it's a product, it's a weird little board on your desk with a bunch of cables connecting it back to your "real" computer. That's where I was at one point when this story happened: trying to get a system up and running in a weird environment I had never used before.

Obviously, something went wrong, or I wouldn't be here writing about it today. Oh yeah, something definitely happened, and I gained a new entry on my list of companies I never want to deal with ever again because they clearly don't give a damn about their customer experiences.

There I was, with an eval board fresh out of the box. I took all of the usual anti-static precautions, and gingerly set it up on my desk. I connected it to power and did the hardware test it tells you to do: flip these DIP switches this way, then push these buttons, verify these LEDs do this thing, then flip them another way, and so on.

It passed this just fine, so I flipped the DIP switches from test mode to "boot from SD card" mode, and fed it a card with an image full of their official dev environment cruft. Then I plugged it into the network, sshed in from my workstation, and started doing stuff on it. My main objective was to make sure all of the stuff I had created actually compiled and ran on this architecture and could be packaged up. After all, it would eventually run on a much bigger version of this thing, and if it didn't work here, it sure wouldn't work there.

I started up a big compile job for a massive framework and turned my attention to other things. Time passed, and I noticed at some point that my ssh had stopped scrolling. I waited for a while, figuring it was just chewing on a really big compiler or linker command, but it didn't help. ^C did nothing. Nothing happened on the serial console, either. I couldn't open a new connection to it. Nothing worked.

I powered it off, waited a few seconds, and powered it back on. Nothing happened. Normally the board would spit some random junk down the serial lines as soon as it got power, but this time it said nothing. I tried this a few times, and got the same results: nothing, nothing, and more nothing.

Thinking back to the self-test, I figured it was time to see if that would still pass. I grabbed my pen and went to flip the DIP switches back to test mode, and the head of one of them snapped off. The actual switch body was still intact, and it could be moved around with a pin or something else small enough to get in there, but the usual white pointy bit was long gone. Still, I'm sure it got flipped into test mode, and yet, that didn't work either. The board was just plain dead.

I reported this to the people who were in charge of the hardware at this particular concern and somehow, it wound up turning into "make your software engineer re-check the board". No amount of "trust me, it died without me even touching it, during a hands-off compile" would make them take me seriously, so I decided to humor them. I took the board back to my desk and tried it again, and again, and again. I tried different SD cards. I tried all kinds of DIP switch settings. I poked at the different serial ports on the thing, thinking maybe one of them would work -- they're not all consoles, after all. Some of them let you talk directly to the subsystems on the board, believe it or not. Those were also dead ends.

All the while, the customer support person from this vendor was going back and forth with us in e-mails, asking us to do this and that, and none of it changed anything. This is when I made a request that I wish I had followed through on. I basically offered to buy the board from the company I was working with, just so I could take it out to the parking lot and destroy it with a hammer. Then we could get back to actually working, you know?

They didn't go for that (obviously) and the next step was to set up a call with this support tech. Of course, instead of it just being a phone call with actual phones involved, he wanted it to be a "webex". A what? I had to look this up, since all I knew of that name was "things that happen at soul-sucking companies that I'm glad I've never worked for", and "a name on an office building by Great America".

I thought it meant "video conference call", and I've done plenty of those, so I was okay with that. Then I found out it involved far more. The link they sent actually tried to install all kinds of other stuff. So, no, what webex really means is "video conference call" with a healthy side of random shit we install on your machine TO TAKE IT OVER REMOTELY.

I went to join the call and my browser lit up like a Christmas tree. Install this extension? Run this application? NO! NO NO NO NO.

Yes, they expected me to load all kinds of sketchy crap on my machine so this guy could jump through it and poke at their board remotely. I swear I am not making this up. Obviously, I've done the sysadmin thing for long enough to know that you do not ever let random crap like this onto your machines or network, so I reported it to the security/IT folks there. They agreed: absolutely no remote access for these weenies.

I said as much to the support person, and so, finally, it turned into a regular phone call... a call in which I managed to copy maybe 30% of what this person was saying due to the terrible audio quality, lag, and who knows what else.

I have chat logs of talking with friends during this. They all boil down to "KILL ME NOW". I spent over an hour putting up with this crap, trying to convince him that no, the broken DIP switch does not excuse you from dealing with this problem. The machine was broken BEFORE the little plastic bit split off. It has nothing to do with that. Take the damn thing back and RMA it, already! He wasn't having it.

This dragged on for days, with them wasting my time and refusing to do anything useful about the situation. Someone commented they "could feel my frustration from all the way across the office".

It's like a curse. If something's the least bit broken, I am probably going to trip over it. If there's a "stupid tree", I'm going to fall out of it and hit every single branch on the way down.

I used to read jwz's writings back in the day and initially thought "he must be really bad at this stuff since it keeps breaking". How foolish I was. I've since realized that all of these things happen to me, too.

As for the problem with my board? It got solved in a most unusual fashion. I randomly happened to meet some people from another hardware company. This one was more of an integrator: they actually build the products that you'd end up buying with the chip from the place with the idiot customer support. That is, if the chip came from company X, it'd be rolled into a board and a box by company W, and that is what would get shipped as the final product.

I guess they saw the board on my desk, since we got to talking about it and the bigger situation, and they thought it was batshit insanity. They actually called up the support guy at company X and chewed him out. The gist of their conversation was "what, why would you tell the customer to do that?". Finally, someone listened.

The vendor finally took the board back, and that was the last I heard of it. As for my project? I got the lab people to check out a second board to me, and I carried on using it. The second one didn't suffer from infant mortality, and so I was able to get my actual work done. What a concept, right?