The Codist

Project Success, Project Failure, How Do You Tell the Difference?
Mar 19, 2007 20:45

If you believe the various statistics, 70 or 80 percent of all software projects fail or are cancelled (for example, a little old but still relevant). Reading this kind of thing always makes me wonder what the definition of failure, and thus success, actually is.

One definition could be, any project which is not cancelled, and is in active use by someone, must therefore be a success. It's not very satisfying; it's like defining life as the absence of death. If I am technically alive, but my brain function is zero, it's not a very useful existence. I have seen many projects which were developed, put into production, used by a large number of people and actively updated, yet strategically speaking are destroying the company's future. If the software is buggy and unstable, is difficult to update based on the needs of the business, hurts customer retention and new business by its inability to function properly, and wastes large amounts of employee work time, then saying it's a success has a hollow feeling.

Yet in most companies today, there live many software systems which exhibit exactly this lame definition of success. Almost everyone I know who uses a computer system as part of their daily work (and is not a developer) tells me stories of software that just doesn't work. Workarounds, frustration, inability to get work done, poor interfaces and frequent downtime are common complaints, despite the often mission-critical nature of many of these systems. Every such application had managers, developers, architects, testers and above all decision makers involved in the process, yet the result was still crap. I bet too that many of these systems were at some point considered a "success" as they were rolled out to customers. But are they a success, simply because they are in use?

No, and they shouldn't be. In many places, however, it may be the only type of success you can get; the organization may simply be incapable of doing any better. What an ugly truth! You (of course not me, we all say) write crap because you can't do any better.

Many projects are cancelled long before they reach real users. Yet are these to be considered failures? In most cases canceling a disaster before it happens is actually a good thing if it leads to a better system. Sometimes projects are cancelled or never even started due to poor decision making, fear, expense, or other reasons not related to the actual development. This of course muddies the definition of failure and success. Is canceling a bad project a success or finishing a pile of crap a failure? It makes your brain hurt.

A better definition might be, any project which is not cancelled, is actively in use, continually meets the needs and strategic goals of the organization with frequent updates, maintains a high degree of availability and stability, and no one complains about it, must be a success.

Sadly it's tough to find applications like this in most companies, so success might be an elusive quest. When you really look at what software is supposed to do, this definition makes sense. I need software to function to do my job, my company needs me to do my job, and the company's customers need my company's services or products. If these goals are being met (and are able to change with the market) then there is a measure of success. Of course a lot of software is not strategically necessary for a company's business, but ultimately if the business is affected even subtly then success still matters.

Writing successful software is hard work, but sometimes working on live failures may be even more difficult. One of the most difficult decisions to make is when to toss out the old and start over. Do you stick with the crap you know, or risk developing new crap you don't know as well? My philosophy has always been "when in doubt, throw it out" but that isn't so easy in larger systems, where the risk of a rewrite may be more expensive than dealing with the existing failure.

As an example, in 1993 we were finishing up our 5th major update to Deltagraph. We approached the publisher about starting on a rewrite in C++, as the original source was then 5 years old and written in C (with some object extensions that embarrass me now). With the company (at the time, Deltapoint) looking towards other products the decision was made to stick with the code, and we parted company at that point. Now 13 years later the current owner is still saddled with the same codebase, which makes updating (especially the Windows version) a living hell. By now the code has seen 100 engineers and 18 years of development, so I can only imagine how awful it is to work with. But in this case it is still a success, as it has provided customers with a superior product, and the companies that owned it (Deltapoint, SPSS and Red Rock) with a lot of revenue over the years. Would it be as successful if we had rewritten it (or for that matter anyone since then) in a more modern language?

So even my example is a muddy definition of success: the code sucks, it's old, everyone hates to work on it (I suppose), but it still meets the needs of the company and the customers.

It looks like even defining success is a failure. The nature of our work as programmers is surely an odd way to make a living.


  • Jeff Atwood: Mar 25, 2007 02:23

    Super Pro Tip: most of the time, there is no difference.


How to Screw up Choosing a Vendor: First Form a Committee...
Mar 12, 2007 19:33

Although many of my disaster stories involved places I've worked, this one takes place in the city I live in, Arlington, Texas.

The Story

Arlington is best known as the largest city in the US with no public transportation, and home of one of the most dense retail areas in the country. It has 360,000 residents and is around the 50th largest city in the US. Its police write around 200,000 tickets a year, all handled in a municipal court system.

In 2000, the city looked to replace an aging mainframe based court management system that was no longer capable of handling the volume and lacked many needed functions.

So the city formed a committee from various departments to help select a new software vendor for a municipal court system. They then hired a consulting firm to help select a new vendor, and collected bids ranging from 1.3 to 2.6 million dollars, from which the firm suggested two bids.

That's the last thing that went right.

The committee decided that the bids were too high, changed the requirements, and collected three new bids, one far lower than the others at only $300,000 (the next lowest bid was $761,000). They jumped at the "deal" and negotiated with the company. That the company's software had never handled a city as large as Arlington, had poor comments from its other customers, and that the current version was not actually in use anywhere else, didn't seem to matter. The city council was told none of the negatives (and didn't ask for any) and thus voted for the contract. One last thing: the city had to pay the costs up front. Nice deal if you can get it.

First the go-live date was a year behind schedule. Once it was running users started reporting the software didn't or couldn't manage simple tasks it was supposed to do, like court scheduling and processing bonds. It crashed a lot and ran sluggishly. In the first year of use the city lost money despite record ticket volume as the software made collections difficult. Soon the court system was more than 100,000 cases behind. The company claimed that the city wasn't installing the frequent patches, but the city found patches were making the software even more unstable.

Of course the city never hired anyone to be a project manager either. Amazing how people think IT projects can run themselves. During the time the software was in use it totaled nearly $1.7 million in actual costs, not including city employee time and money, nor the lost revenues. A wee bit over the original bid.

Last year the city finally gave up and hired another consulting firm to analyze the software and see if it was fixable. It wasn't. The city then hired the consulting firm to oversee choosing a new vendor, which is ongoing.

There wasn't any mention of a new committee.

The Lessons

I have never understood why people think the lowest bidder is a valid way to pick a vendor. If you have at least 3 vendors, tell everyone you will be picking the second lowest bidder. This way there is an incentive to be realistic in the bids, since lowballing will likely cost a vendor the contract. Sometimes this isn't possible for legal reasons.

Another thing with bidding is to keep the bid prices from the decision makers until every bidder has been examined on all other details: other customers investigated, trial uses run, documentation reviewed, and any other important information analyzed. The whole point is to avoid prematurely choosing a vendor strictly on price. Governments are suckers for low bids, since it looks good politically to "save voters' money". Rarely does the final price ever get publicized. That's why I would prefer to see the second-lowest-bid process.
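The two ideas above (review everything except price first, then award to the second lowest bidder) can be sketched in a few lines of Python. This is only an illustration of the rule as I described it, not any city's actual procurement procedure. The `Bid` class, the `passed_review` flag, the vendor names and the dollar figures are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    vendor: str          # hypothetical vendor name
    price: int           # bid price in dollars
    passed_review: bool  # result of the price-blind review (references, trials, docs)


def pick_second_lowest(bids):
    """Return the second-lowest-priced bid among those that passed review.

    Returns None when fewer than three bids qualify, since the rule
    assumes at least three vendors are competing.
    """
    qualified = sorted((b for b in bids if b.passed_review), key=lambda b: b.price)
    if len(qualified) < 3:
        return None
    return qualified[1]


# Illustrative figures only; these are not the actual Arlington bids.
bids = [
    Bid("Vendor A", 300_000, passed_review=False),  # lowball, failed the review
    Bid("Vendor B", 760_000, passed_review=True),
    Bid("Vendor C", 900_000, passed_review=True),
    Bid("Vendor D", 1_200_000, passed_review=True),
]
winner = pick_second_lowest(bids)
print(winner.vendor if winner else "no award")
```

Note that the lowball bid never reaches the price comparison at all, because the review happens before anyone looks at the numbers.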

The entity hiring the software vendor should be prepared to have a project manager (either external or internal) involved from the earliest possible moment until the software is fully in production and enough time has elapsed to see that it is functional and usable. Ideally this individual (or group if necessary) should have no connection to the vendor but have valid experience and be given sufficient authority to ensure the entity is getting what it paid for.

All software has bugs (unless you are NASA, then your software has no bugs but your hardware blows up). The software should be able to do all of the required and promised functionality, do it reliably, be relatively usable by the actual users, and require only a minimum of patches. The vendor should have a sufficiently large QA department and be able to demonstrate some kind of repeatable test plan for each patch. It's amazing how many companies I have seen which develop software (either for internal use or sale) with no QA team at all. My two software companies both had around 1 QA person for 4 programmers. Netscape in the 90's hired 100 programmers before they hired a single QA engineer. Never buy from a vendor whose QA department is their customers.

I read an article this weekend where a local columnist wondered if it was possible to go back to a paper-based system like "the old days". This is sad commentary on how easy it is for software projects to make people wish for how nice it was when armies of clerks pushed paper around. With some common sense, good engineering, good management and smart people it is possible to improve business or government with software solutions.

Sadly in most places "common sense, good engineering, good management and smart people" are rarer than bugs in NASA's mission critical software. Witness the continual disasters at the FBI.

Maybe if we all formed a committee...


  • sanjarUzbekistan: Mar 12, 2007 21:56

    It is worth going for the lowest bid price if you are procuring some standard good or service such as annual audit, accounting services or plain computer equipment, because you know for sure that you will get acceptable quality from any of bidders. But then this is called request for quotation not a bidding.

    In the above case, I'd have requested potential bidders to submit technical and financial proposals in separate envelopes. The technical proposals are opened first and evaluated. After the most appropriate technical solution has been determined, the evaluation committee opens the envelope with financial proposal. Unless the stated bid price substantially exceeds the budget, the bid is deemed accepted and all other financial proposals are returned to bidders unopened.


WTF Stories #2: Here Little Virus, Virus
Feb 22, 2007 08:14

What could be worse than a computer virus infection? How about a mother-of-all-virus-checkers infection? Note that this story is true despite its unbelievable details.

The day known as Black Wednesday started innocently enough: 600 employees' PCs humming away, running Mcafee version 7, which although occasionally irritating was not causing much trouble. The company had had no virus infections since I started working there, so our internal security seemed to be working. Then at 12 noon our dear beloved Network Operations group (please note the sarcasm intended) without any warning turned on the Windows XP automatic update and simultaneously auto-updated Mcafee to version 8 for all non-production computers and servers.

The whole company ground to a halt.

Every PC tried to collect and install around 70 updates to Windows (requiring multiple restarts) and install the new version of Mcafee (with all of its various parts) all at the same time. The network was overwhelmed, even our external web presence stopped working (our internal and external network traffic shared the same bottlenecks, don't ask). The lunch people were unable to supply lunch (their POS PC froze). Customers couldn't be served, our field staff was unable to create orders, basically the whole company went dormant for the rest of the day.

The rest of the Java team and I went home that evening assuming things would work themselves out by the next day. We were wrong; the hell of the coming months started the next day.

Our Java team used the usual Java tools such as Eclipse, IntelliJ, Weblogic Workshop, DBVisualizer, and Weblogic. In Java, every piece of code (and often associated data) is packaged in JAR, WAR and EAR files, which are specialized versions of ZIP files. Everything you do in Java involves reading and writing these files, both when running Java applications and building applications. In all cases these files are read by the Sun Java runtime, and executable code is limited to valid and secure Java bytecode, not native code.

The Network Operations folks had turned on every feature in the new Mcafee, including the dreadful "Uncompress compressed files" setting. All of our Java development was stopped virtually dead in its tracks. Every access of a JAR, WAR or EAR file now caused the computer to freeze as Mcafee opened the file, uncompressed its contents, scanned each file inside, and only then returned control to the application. During this time the CPU was entirely utilized by Mcafee, which was set to high priority. Launching your IDE used to take a few seconds; now it took 20 minutes. Some application builds took hours (especially with the weblogic.jar file, which was enormous) instead of minutes. Just typing in your IDE was type type type, wait 2 minutes, type type type. It turned your PC into a single-tasking computer circa 1979 running the full Windows XP.
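To make the cost concrete, here is a rough Python sketch of what "uncompress and scan each file inside" implies for a JAR, which is just a ZIP archive. It is not how Mcafee works internally (I have no idea what its code looks like); the `build_fake_jar` and `scan_archive` helpers, the entry names and the sizes are all invented for illustration, and the "class files" are dummy bytes rather than real bytecode.

```python
import io
import zipfile


def build_fake_jar(entry_count, entry_size):
    """Build an in-memory ZIP laid out like a small JAR (contents are dummy bytes)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as jar:
        jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
        for i in range(entry_count):
            # Placeholder payload, not real Java bytecode.
            jar.writestr(f"com/example/Class{i}.class", bytes(entry_size))
    return buffer.getvalue()


def scan_archive(jar_bytes):
    """Decompress every entry, as an unpacking scanner must.

    Returns (number of entries, total uncompressed bytes read).
    """
    entries = 0
    total = 0
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        for info in jar.infolist():
            total += len(jar.read(info))  # full decompression of this entry
            entries += 1
    return entries, total


jar_bytes = build_fake_jar(entry_count=1000, entry_size=4096)
entries, total = scan_archive(jar_bytes)
print(f"{entries} entries, {total} bytes decompressed for a single access")
```

The application only wanted one class out of the archive, but a scanner in this mode pays for every entry on every access, and a build touches the same large archives over and over.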

At first we thought it was simply a configuration error, and reported it as such. Network Operations said it was working as intended. We had to blink; this made no sense. Every Java developer was getting about 1 hour of work done per day at best. Being engineers, we figured out which feature of Mcafee was causing the trouble. We couldn't turn it off; it was locked. We complained to our management, but Network Operations was more highly regarded in the company pecking order, and they simply said we were whining and the company's safety versus virus attacks (even though we had never had one before) was more important. We argued that native code viruses hiding in Java JAR files couldn't be executed by the Sun runtime and that there were no such viruses.

So they said (to everyone) we know of many Java viruses in the wild. Given that we had done an exhaustive search and found nothing, this was unbelievable so we demanded a meeting. The head of the NO department said (to a roomful of Java engineers) that he had a list of 500 Java viruses and that was the basis of why this was being done. No matter what we said he said the same thing over and over.

After the meeting I challenged him to turn over the list to us, but he passed it off, saying the head of Security was assembling the list. This guy never returned any messages (what could he say, he really worked for NO). Later one of the NO employees said that they had typed Java into a virus checker company search form and found 500 hits. Really. We looked, and the only actual viruses were in the ancient Microsoft Java runtime, which only ran in IE, and only for applets. Pointing this out to everyone made no difference. The NO had spoken.

This went on for about 3 months (I had just started a project that should have taken a couple of weeks, and it dragged out over the whole 3 months) before we finally got our manager to suggest that we hire an "expert" to come in and suggest some kind of improvement. NO agreed, but only if we paid for it and they were allowed to pick the vendor. This was approved, and so they hired an "expert" who turned out to be the vendor whose main customer was the NO group. His "analysis" said to turn off the compressed file option, but only for those employees who would run their PCs as ordinary users and give up their administration rights. Say what?

For those who don't do development on Windows XP, everyone runs as admin since ordinary users don't have enough rights to make any changes in their environment. It can be (painfully) made to work for simple Word users, but for programmers it's impossible. So now we had paid for the "expert" and it would look like we were just whiners who didn't want to get any work done if we didn't acquiesce to this. So we had no choice. First they wanted to test the choice so a few Java developers (and the QA team) lost their admin privileges.

Let me also add that this was not just affecting the Java team, though we had it much worse. Our DBAs used DBVisualizer, which is a Java application, and even they had trouble getting anything done. It didn't help that people like our "Enterprise Architect" said he didn't understand why we were whining, since it didn't make his computer any slower (he only used Word and Powerpoint). Some ordinary users reported virus infections because their computers were sluggish and stalled at times (no one ever told them about the changes) but were told it was nothing.

Upper management didn't seem to care or understand why the Java team had so much trouble getting any work done. After all, the production servers still ran as fast as usual (still running Mcafee version 7, yes our production servers ran virus checkers!) and their PCs seemed OK. The web systems folks were also in pain, as builds took all day (and often they turned off the virus checker, despite warnings of firing). Many of our test and QA systems were built with multiple copies of VMWare, so they ran terribly slowly as each virtualized system ran a copy of Mcafee 8.

Anyway, giving up admin rights turned out to be a disaster as well, as we could no longer install software; even a Java runtime upgrade was impossible. The QA team could not even install our C++ applications to test them. Every install required filing a ticket and waiting for someone to get around to doing it. The support staff was overworked now as well.

It didn't matter, viruses were everywhere, security was more important!

The "test period" went on for months as well; nearly a year later not everyone had even been "upgraded" to the new option, which itself was terrible. However by that time I was gone, and a flood of other senior people left as well. Eventually the departure of so many valuable people was enough that upper management finally told the NO team to start caring.

Of course I no longer cared.

In this entire time not a single virus was ever caught anywhere in the entire software development area. I never heard of any computer actually spreading a virus anywhere in the company. And of course our production servers still ran Mcafee 7, so the argument that Mcafee 8 was necessary was itself silly.

The real irony of this whole story is that it's the same company in WTF #1: It's Not The Database Stupid.


  • Michael Chermside: Feb 22, 2007 09:29

    I'd just like to say that my company has the same stupid policy. They too insisted on scanning every jar file that was touched by any program. And it did indeed make it nearly impossible to get work done. The first solution was to buy *very* powerful (and expensive) PCs for all developers... but that didn't really help. The next solution was to switch to Windows XP (previously we were using Win2000). Apparently somehow Symantec (that's what we use) didn't behave the same way running on Windows XP. I believe it is still out there trying to scan every .jar file, but it doesn't take over the whole PC to do so.

    Unbelievable, but true.

  • Tristram Brelstaff: Feb 22, 2007 12:41

    I've often thought that Mcafee was one of the best arguments in favour of Linux.

  • Iain Delaney: Feb 22, 2007 14:44

    Please, please! submit this story, (or a version of it) to www.theDailyWTF.com. More people need to hear this.

  • Saskia: Feb 22, 2007 15:45

    Isn't it easier to just use Linux and do (Java) development on that? If you aren't allowed to install it directly, just install VMWare and run Linux with all your Java tooling inside that. Alternatively, boot from a live CD and keep all your settings and installs in a virtual HD file that you store on your windows filesystem.

  • codist: Feb 22, 2007 16:33

    That would have made sense, but that's not the way NO worked.

  • Scared..: Feb 22, 2007 16:54

    Wow. Can you PLEASE at least give us a HINT as to which company this is, so I never even consider working for them?

  • Matt: Feb 22, 2007 19:30

    I've been in similar shoes before re: Admin rights. At one company we were told they were going to use our department as the Guinea pigs to test no admin. Thankfully they gave us that warning... we immediately added a DB2ADMIN user (with Admin rights of course) to each developer machine. I was the only person actually using DB2 for testing, but the Network guys didn't need to know that. ;)

    Unfortunately I've been in other shops that had also blocked admin, and I could not fall back on my old tricks. In fact the machine I got was once used by someone who DID have admin, and had somehow secured just about every file under his user, causing me all sorts of pain. I had to make a request every time I found a file that was secured improperly... it was weeks before I had things set right.

  • GUI Junkie: Feb 22, 2007 23:31

    Same thing happened here. We were using Eclipse and the whole build cycle came to a halt. Somebody told me Eclipse uses some overflow mechanism to optimize the compile process which triggers the antivirus to check.

    We tried to convince the IT department to change the virus settings. Very similar to your story: ‘No problem here’ reactions.

    Up until this day, I go to the Services, select McAfee and stop the virus checker. I repeat this every hour as the virus checker starts again, and again...

  • Nuri: Feb 23, 2007 06:48

    GUI Junkie - you could write a service of your own that watches McAfee and shuts that service down if it's ever active. You can even have this new McAfee-watcher run using your credentials.

  • martin: Feb 23, 2007 08:49

    I dont believe that a bunch of developers with local adminstrator rights are not able to get rid of something like mcaffee.

    Stop the service

    Change registry settings

    Fake network traffic

    You have so much time to find a creative solution.

    *shaking head*

  • codist: Feb 23, 2007 09:28

    Believe me we tried every creative solution. We wrote services, we wrote applications to drop the priority, etc. Everything worked for a while before they caught on and deleted the applications remotely or threatened us with firing. Every action on our PCs was logged. We had no rights to terminate the services involved. 12 people devoted a lot of the time spent watching the computer spin to finding a way around this. Once we lost admin rights there was no hope at all.

  • Masiosare: Feb 23, 2007 13:27

    For other people with the same situation... you can change file permissions so not even McAfee can read it, so it can disable the antivirus even if the interface says otherwise.

    Stealth enough so you don't have to run new processes and don't get caught by admins...

    Just don't tell my network admins =P

  • FostWare: Feb 23, 2007 17:29

    Ask politely to have your machines VLANed and firewalled off from the rest of the network...

    While the admin for a NOC, I separated and firewalled each department and gave certain groups (like devs) a little more autonomy.

    I had to... the CSRs took laptops to mine sites for week long stints. Even after a quarantine virus scan, we could not guarantee the machine would not do something stupid on the network (Internet Connection Sharing or alternate AD domains were always favorites).

    Network segregation was also necessary, as we had managed accounting systems for clients on servers in our NOC.

  • codist: Feb 23, 2007 20:24

    Our network had no internal firewalls, although it could be partitioned in some fashion, it would then allow no access to any of the databases needed for development (such as the AS/400). Again, we went through every gyration you can think of but none worked.

  • AdrienMerridiah: Feb 25, 2007 10:57

    Wow. That's really crap. I feel sorry for you. That would drive me insane. I would have quit months earlier.

  • Fab: Mar 09, 2007 09:02

    Honestly I don't know what to think about that - I have been confronted with unreasonable IT policies and with less than competent people, but at some point someone should be able to stand up and be a little bit confrontational. Something that upper management understands very well is cost; if your manager is not willing to plead your case, go over his head and do the same thing until you find someone who listens and understands you.


Software Project Disaster Types: #3 Sisyphus and #4 Ten-Foot-Pole
Nov 28, 2006 09:37

Disaster #3: The Sisyphus Project

Description

Sisyphus was an ancient Greek mythical character who was eternally condemned to roll a stone up a great hill, only to have it always roll back down so he had to start over. In this type of disaster the project can only be worked on with great effort, long hours, difficult work and usually massive stress and pressure from management. Once a version is delivered it begins again. This type of project usually burns out developers, leading to many new additions not familiar with the beast, so it actually gets more difficult over time.

Example

I've been fortunate to have never experienced this type of project, so I don't have any concrete examples. I do know that this is not an uncommon situation in many companies and there are usually two underlying causes: a lack of experienced technology leadership and management ignorant of the realities of software development.

Software development today is actually much more difficult than it was when I started 25 years ago; the number of technologies required to build and maintain a modern application is almost too much for anyone to comprehend, much less architect, design, test and ensure interoperability with other systems. Simply choosing a set of tools and technologies to use is a challenge since no one can have experience in everything (despite what you see in job ads). If the technology leaders (architects, project managers, project leads) have only limited experience with better alternatives, or even refuse to acknowledge them, then the base of an application may suffer for the rest of its lifetime (and the programmers with it). Discerning potential problems in technology choices before they become a real problem requires a willingness and time to experiment or prototype before embarking on a big project. Jumping into a large project with unknown technology is begging for a disaster.

Top-level management often has no real clue on how technology choices may affect the cost, time and performance of a large project, preferring to listen to sales pitches from vendors instead of their own technology leaders. Then when a project starts to have difficulty they may assume their team is at fault, as the vendor assured them "it was easy". Now they pressure the team to deliver and the ball starts rolling uphill.

One company I worked for embarked on a new portal strategy for its customer and sales force facing sites without any real idea why, other than a desire to join the crowd. Porting the customer site (already written in a mishmash of technologies and styles) to the portal took much longer than anyone predicted and created an even worse mashup of code which was difficult for anyone to properly test or sometimes simply get to run. The sales force application portal was always threatened but no one ever wanted to start it. See below.

Disaster #4: The Ten-Foot-Pole

Description

Some projects are so important and far-reaching, they simply cannot be started at the present time due to manpower shortages, business reasons, cost, or a surplus of cosmic ray particles. Actually any reason is sufficient to avoid having to do this project even though not doing the project may cause even greater problems. In most businesses it’s far easier to justify painful existing processes or systems than to step forward and tackle a replacement. Due to its enormous importance the project never goes away but remains just out of reach.

Example

The best example of this was a smaller local defense contractor which asked the consulting firm I worked for at the time to come in and see what we could do for them. Some of their managers discussed with us their problems in development for the web (this was the late 90's) since no one had any experience with it. They knew many of their systems and applications were too inflexible and hard to use and wanted some (easy) way to fix them.

This all seemed reasonable to us in the early heady days of the web until we asked about their database systems. They mentioned they were still using databases designed in the early seventies and had coupled the applications directly to them. When asked why they hadn't updated the database system in nearly 25 years they replied that with a project-based accounting system, someone would have to pay for it out of their project budget, and then be held responsible for any problems it would cause (likely to be huge since it touched everything the company did). There was no management responsible for the company's entire technology environment so no one had enough authority to even look at the problem.

Given the tremendous coupling of the company's applications to the neolithic database system we found nothing we could do to help them and nothing was ever done. One can argue that "if it isn't broken, don't fix it" but often the problem is not that the current system is unusable, it's that it's inflexible and potentially unmaintainable. In this situation the company was unable to improve their productivity as the old applications were difficult to use, brittle, and especially hard to modify as business needs changed. So changing technologies was painful, but keeping the old technology was equally painful but at least familiar. The company is still in business but losing money (a rarity for a longtime defense contractor).

Solutions

Both of these project disasters are complex problems, usually rooted in the decision-makers of an organization. Unless you are one of them it's difficult to fix the problems from below. If you are one of them the best solutions are:

  • Educate Yourself - read unbiased reports, news and even technology sites to gain an understanding of what people are doing, and how effective the solutions are.
  • Never Believe Vendors - don't let vendors convince you of anything unless you can get independent verification. A consulting firm salesman I once worked with told me that his job was to lie to customers, and mine was to make him look good.
  • Hire Broadly Experienced Technology Leaders - find people who are willing to learn and use anything, have worked with many technologies, preferably hands-on, and especially are willing to speak their mind.

Copyright © 2007 By Andrew Wulf