Software Project Disaster Types: #6 The Big Bang

December 02, 2006

The Big Bang is the mother of all software development disasters: in an attempt to fix or change everything, everywhere, all at once, an organization tries to revolutionize itself but almost always ends up as the Titanic did. The larger the IT project, the more likely it is that some or all of it will fail. In the sixty or so years of software projects, the industry still has not found a recipe for success in mega-projects.

Finding examples of large IT project disasters is easy; almost every market abounds with stories of big projects that were cancelled, went way over budget, failed to deliver useful solutions, or all three and more. Even when something is finally deployed, it often falls short of the 'big bang' result promised in the early days of the project.

A current example is the British National Health Service's National Programme for Information Technology, which is attempting to completely change how every doctor, hospital and other medical provider manages, handles, schedules and reports on every person in the U.K. The programme has massive software and hardware development portions, and is being developed by a number of large and small contractors and sub-contractors. So far the cost has escalated from £2 billion over three years to nearly £30 billion over ten years. Some primary contractors have already given up their contracts because the project is simply too complex. The goal was exciting, big, bold and utterly unlikely ever to succeed. As with all Big Bang projects, the vision obscures the history of such projects failing.

My primary example is one I briefly experienced while working in Developer Support at Apple Computer in the mid-90s. Before MacOSX ever existed, Apple needed a more modern operating system and, after a number of failed approaches, started working on what became code-named Copland. What they were attempting to do was allow System 7 (the Mac operating system of the time) to continue to co-exist with a brand new OS that Apple could base its future on. Even though old software could continue to run inside a box, the new system could not use the old APIs, so new applications would have to be written from scratch. When I started at Apple the assumption was that Copland was still on track.

I remember the day I realized it was a complete disaster. The Copland team had called a meeting of all the development groups and their representatives, and I volunteered to go as the Developer Support team's representative, just to find out what was happening: we would eventually be called on to support Copland, and there was almost no useful information available. During the meeting the project leaders talked about the state of the system and its various features and then took questions from the development groups. One after another, the groups hammered the leaders on every aspect of the design, and every answer was greeted with anger and derision. The QuickTime team, for example, was not going to be able to get time slices accurate enough to deliver smooth video. The design of the OS was so limiting and so full of obvious problems that the meeting ended with most of the audience completely disgusted. We went back to our office area and told everyone it was a complete mess. In talking later with some of the developers, it became obvious that no one was really able to manage the development of the code, much less the design. People were checking code into the OS with no testing, based on incomplete knowledge of the other systems it had to interact with, and as a result what little was running rarely worked. Lots of cool technologies were being developed, but the core operating system was basically useless.

After I left (I was just a 5-month contractor), Ellen Hancock was hired to fix Copland, discovered it was basically unfixable, made the suggestion that led Apple to buy NeXT, and then left. For someone who barely lasted a few months, she had an impact on Apple that is still being felt today.

Copland was a classic 'Big Bang' in that it was intended to fix everything that was wrong with Apple (while I was there Apple lost $750 million in a single quarter) while still maintaining perfect compatibility with everything in System 7. Because supporting System 7 (helping developers write their applications) was our main job, we had access to the source code, and it was obvious just how many bugs, quirks, and downright weird pieces of code it contained; emulating all of that while also delivering a completely new OS and GUI seemed impossible to me, and to everyone else in DTS as well. Watching Microsoft struggle for years with Vista (which started out Big Bang-like and ended up being more like NT with fancy lipstick) brought back those memories of Copland.

So how do you avoid the Big Bang disaster? The answer is not much different from that for the previous five disasters: smaller iterations. The key, at least from my point of view, is to keep the development team as small as possible and to build a carefully chosen core (kernel might be a good term) that demonstrates the base of the larger system and provides a foundation for its larger portions. The more people are thrown together to start a huge development all at once, the more communication has to take place, the more management is necessary, and the more detailed planning is required. As the number of people goes up, the likelihood of success goes down even faster. Copland had at least 500 engineers working on a poorly designed, poorly managed system; no one really had any control over, or even any idea of, what was going on.
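
The point about communication growing faster than headcount can be made concrete with a back-of-the-envelope count. The following is only a minimal sketch, assuming the simplest possible model in which every pair of engineers is one potential communication channel; the function name is hypothetical and nothing here was measured on Copland.

```python
def communication_channels(team_size: int) -> int:
    """Count distinct pairs in a team of the given size: n * (n - 1) / 2.

    Assumption: every pair of people is one potential channel.
    """
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


# Compare a small core team with a Copland-sized organization.
for size in (5, 50, 500):
    print(size, communication_channels(size))
```

Under this assumption a five-person core team has 10 potential channels, while 500 engineers have 124,750: a hundred times the people, but more than twelve thousand times the channels.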

Why did OSX work so well when Copland was a disaster? The core team was small, and the kernel was relatively modern (at the time) and well understood. It was built in a dynamic language (Objective-C) that lent itself to groups of small development teams. It used a core with a long history (BSD). It was a Big Bang in that it was part of what revolutionized Apple, but it was not a Big Bang project that tried to fix everything from scratch. Oddly enough, one of the main culprits in Copland, being able to reuse old APIs and run old code on the new OS, turned out to be fairly easy on MacOSX.

The main lesson is that big projects always fail unless they are turned into carefully designed smaller projects that build upon one another. Keep the core development team small and highly experienced, ensure that everyone knows what is going on at all times, and always keep the Big Bang in astronomy and out of the IT lexicon.