Healthcare.gov and LSCITS – The Moderate Voice

We are all familiar with the failure of the ACA’s website – the politics won’t allow us not to be. But is this just an example of our future? About 12 years ago, before I retired, I was an engineer for a multinational manufacturing company. I was on the team introducing a new enterprise software system. We spent well over a year testing and tweaking the system, but when we went “live” it crashed, forcing us to revert to the old system. It was another 6 months before we felt confident enough to go “live” again, and another 6 months of tweaking before it was operating the way we wanted it to.

Felix Salmon wonders if software systems have become so complex that it is impossible to predict when or how they will fail. He gives the example of Knight Capital.

David Wilson has found a wonderful example in the SEC’s censure of Knight Capital. Knight blew up as a result of badly-designed computer systems, and the cascade of mistakes in this case was particularly egregious: it kept important deprecated code on its active servers, it didn’t double-check to ensure that new code was installed correctly, it had no real procedures to ensure that mistakes like this couldn’t happen, it had no ability to work out why something called the 33 Account was filling up with billions of dollars in random stocks, despite the fact that the account in question had a $2 million gross position limit, it seemingly had no controls in place to stop its computers from doing massive naked shorting in the market, and so on and so forth.

In the end, over the course of a 45-minute period, Knight bought $3.5 billion of 80 stocks, sold $3.15 billion of another 74 stocks, and ended up losing a total of $460 million.

Think Swiss cheese:

That’s a lot of mistakes; nearly as many as can be seen in the Knight Capital case. But when you see a list this long, the first thing you should think about is Swiss cheese. Specifically, you should think about the Swiss cheese model of failure prevention, as posited by James Reason, of the University of Manchester:

“In the Swiss Cheese model, an organization’s defenses against failure are modeled as a series of barriers, represented as slices of cheese. The holes in the slices represent weaknesses in individual parts of the system and are continually varying in size and position across the slices. The system produces failures when a hole in each slice momentarily aligns, permitting (in Reason’s words) “a trajectory of accident opportunity”, so that a hazard passes through holes in all of the slices, leading to a failure.”

In other words, we should maybe be a little bit reassured that so many things needed to go wrong in order to produce a fail. The Swiss cheese model isn’t foolproof: sometimes those holes will indeed align. But a long list of failures like this is evidence of a reasonably thick stack of cheese slices. And in general, the thicker the stack, the less likely failure is going to be.
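To put rough numbers on that last point, here is a minimal sketch of the arithmetic. It assumes each slice fails independently of the others, which Reason's model does not promise, and the 10% hole probability is a made-up figure for illustration, not data from Knight or Healthcare.gov.

```python
from math import prod


def breach_probability(hole_probs):
    """Chance a hazard passes every slice, ASSUMING the slices fail independently.

    hole_probs: one hypothetical probability per slice that its hole is
    in the hazard's path.
    """
    return prod(hole_probs)


# Illustrative numbers only: every slice has a 10% chance of a hole in the way.
thin_stack = [0.1] * 3
thick_stack = [0.1] * 6

print("3 slices:", breach_probability(thin_stack))
print("6 slices:", breach_probability(thick_stack))
```

Under those assumptions, three slices let roughly one hazard in a thousand through, and six slices roughly one in a million. If the holes are correlated (say, the same team skipped the same checks everywhere), the real odds would be worse than this product suggests.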

That brings us to LSCITS:

The concerns expressed here about modern computer-based trading in the global financial markets are really just a detailed instance of a more general story: it seems likely, or at least plausible, that major advanced economies are becoming increasingly reliant on large-scale complex IT systems (LSCITS): the complexity of these LSCITS is increasing rapidly; their socio-economic criticality is also increasing rapidly; our ability to manage them, and to predict their failures before it is too late, may not be keeping up. That is, we may be becoming critically dependent on LSCITS that we simply do not understand and hence are simply not capable of managing.

So yes, we add additional layers of Swiss cheese in an attempt to mitigate failure, but the more complex systems become, the more likely the holes in the cheese are to line up. This applies not just to government web sites and computerized trading but also to the so-called “smart electrical grid.”
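One way to see how extra layers can lose the race with complexity is a back-of-the-envelope sketch. It rests on assumptions of my own: that "more complex" means more independent paths a hazard can take through the system, that every slice has the same hole probability, and that the numbers below are invented purely for illustration.

```python
def any_breach_probability(hole_prob, layers, hazard_paths):
    """Chance that at least one hazard path gets through every layer.

    ASSUMES each path and each layer behaves independently; all three
    parameters are hypothetical, chosen only to illustrate the argument.
    """
    per_path = hole_prob ** layers          # one path lines up with every hole
    return 1 - (1 - per_path) ** hazard_paths


# A simple system: 3 layers of defense, 10 ways for things to go wrong.
simple = any_breach_probability(hole_prob=0.1, layers=3, hazard_paths=10)

# A complex system: one more layer, but 10,000 ways for things to go wrong.
complex_system = any_breach_probability(hole_prob=0.1, layers=4, hazard_paths=10_000)

print("simple system: ", round(simple, 4))
print("complex system:", round(complex_system, 4))
```

With these made-up figures the simple system fails about 1% of the time, while the complex one fails more often than not, even though it has an extra slice of cheese. The extra layer cuts the risk per path tenfold, but the number of paths grew a thousandfold.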

A couple of years ago I reviewed Joseph Tainter’s The Collapse of Complex Societies. His conclusion was that complex societies collapse because they become too complex. I don’t think he had software in mind, but perhaps he should have.