Why Is Your Computer System Down?

June 23, 2009

Today I went to a local Texas Drivers License office to replace the license I lost at the airport recently (thanks, TSA). When I arrived, there was a sign, nicely laminated because it had obviously been well used, saying all the computers were down and they could do nothing for anyone.

Give me a break. It's embarrassing to admit you work in an industry where things fall apart at random times. Would you go to a doctor who said he would keep you alive 98% of the time?

Of course this is a "new" system, which was promoted in April and then was supposedly shut down by a virus or worm (articles seem to be light on the details), causing a four-week delay in processing licenses.

I asked the information person if this happened frequently. She didn't say how often, but apparently it is often enough to warrant the nearly permanent sign. Oddly enough, certain offices in the state still had service, but it seemed hit or miss.

I imagine it was built by some big firm billing $500 per hour, which then promptly hired coders off the street at $20. When I worked for a small consulting firm, we would routinely be clobbered by one of the big boys, who would brag about having 100 people here tomorrow. Today I assume some of these big outfits outsource to squeeze even more profit out of big projects, particularly state projects where the clueless agencies have no idea what a good project team looks like.

I haven't seen many details of what the "new" system looks like, but anything developed today shouldn't suffer from random fluctuating downtime on a statewide basis. Sure, even mighty Google and Amazon have their issues, but those are worldwide 24x7 systems; even in a state as large as Texas, these systems are only used at most 10 hours per day, 5.5 days per week, leaving plenty of time for testing and maintenance during off-peak hours.
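To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 10 hours per day and 5.5 days per week are my own upper-bound guesses from above, not published office hours, and the function names are made up for illustration.

```python
# Rough estimate of how much of the week a business-hours-only system is idle.
# The schedule figures are assumptions (upper-bound guesses), not official hours.

HOURS_PER_WEEK = 7 * 24  # 168


def weekly_hours(hours_per_day: float, days_per_week: float) -> float:
    """Hours per week the offices actually need the system."""
    return hours_per_day * days_per_week


def off_hours(hours_per_day: float, days_per_week: float) -> float:
    """Hours per week left over for testing and maintenance."""
    return HOURS_PER_WEEK - weekly_hours(hours_per_day, days_per_week)


if __name__ == "__main__":
    busy = weekly_hours(10, 5.5)
    idle = off_hours(10, 5.5)
    print(f"in use: {busy} h/week, idle: {idle} h/week")
    print(f"share of the week in use: {busy / HOURS_PER_WEEK:.0%}")
```

Under those assumptions the offices need the system for 55 of the 168 hours in a week, which leaves 113 hours when nobody is standing in line.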

There is no way of knowing whether this was crappy software, poor systems architecture, crummy management, cheap hardware, or something else, but it still reeks of incompetence. I remember a story of a California attempt to build a new drivers license system. After tens of millions in development costs it was rolled out to the whole state, at which point it became obvious that the architecture was non-scalable and apparently unfixable as well.

Why do we tolerate such garbage in our industry? Programming by now is more than 50 years old, hardware is unbelievably faster and cheaper than ever, and in business systems there are few unknowns anymore. Yet today numerous Texas Drivers License offices' computers were down, like some bad stereotype from a 1960s B-movie.

Tomorrow I will call first. Maybe I will get lucky and the computers won't go down the moment I step into the office.

Update: 24 hours later, they are still down.

Update: 48 hours later, still down. They were up for a couple of hours yesterday, but when the system went down again they lost all the transactions.