What Is Software Quality?


Everyone wants the software they work on to be a quality product, but what does that mean? And how do you know when you have it?

This is the longest single blog post I have ever written.

I spent four decades writing software used by people (most of the server code I worked on also included clients) and leading teams responsible for building desktop, web, and mobile apps. I released my first application to the public in January 1987, so I have had a lot of time to think about how to deliver something that works, try all the ways to achieve it, and see the result.

After all that time, my favorite definition of software quality is "The code always does what it is supposed to do, never does what it is not supposed to do, you are never surprised by what it does when it gets to the customer, and you can track its quality long after customers start using it."

That definition has four essential parts, the third of which is probably not apparent to many people. I will review each part.

Note that some of this targets those responsible for the entire project or product. If you are working on tickets assigned to you by someone else, you might not have any involvement in quality beyond the actual code or tests you are writing. For most of my career, I led or at least contributed to the quality of an entire codebase.

Having started in the 1980s, when no one knew what good practices were for building software, I had plenty of time to learn and try different things. Starting and running two small software companies from 1985 to 1994 (both as leader and programmer on my teams) gave me a head start on those who began as programmers or leaders in the following decades. We had none of the knowledge, tools, or processes that exist today, yet teams, including mine, could still ship software that worked.

What Should It Do?

Understanding what your code-based product or project is supposed to do seems like a no-brainer. Yet it can be surprisingly hard to know for sure: people change their minds about what should be in or out, schedules change, teams change members, and other teams you need to work with have different priorities than yours. The stereotype of knowing exactly what you are supposed to build up front, and of that never changing (a waterfall concept), is rare if it happens at all. Even a single sprint being planned and then completed exactly as planned is rare enough; imagine how unlikely it is that a project running 20 or more sprints will nail it perfectly. My team did two major projects at my last employer, each with 16 months of development time, yet the end product bore little resemblance to the initially proposed product. So, knowing what your code should do can be a very dynamic thing.

This means that as a leader, you have to understand what the product team is asking for today, imagine what might be asked for tomorrow, deal with details that may only be decided deep into the schedule, adjust expectations around schedules, offer alternatives or propose a different path, go to way the hell too many meetings, and somehow keep all of this in your head (and in whatever tools you may have). For product teams to want to communicate with you, you have to understand the business needs, be able to ask important questions at the right time, and always stay in contact. You can't simply be handed some user stories and implement them as written without continuous clarification, explanation, or even pointing out missing items. If a product team does not care to communicate at all (I've seen this in earlier years; thankfully, at my last, very large, employer I always had reasonable access to them, and they tended to appreciate the communication), then achieving quality will be difficult or impossible.

Another consideration is the future: imagining what might change after you ship. You can never know for sure, and you should never speculate in code about eventual changes, but you also need to make sure you don't lock yourself into a corner you can't get out of later. That is a tricky balancing act, but it gets better with experience. Sometimes you know the product well, understand the business, and realize who is likely to request radical changes. Sometimes you have no idea what direction might be taken. In my final job, I had a decent idea of what might happen and was able to make sure we were flexible enough to deal with it. Still, in one case, a giant addition that came out of nowhere for an almost completed project forced me to work 7 days a week for 3 months, because we were caught with insufficient people to do it (namely, just me!) and no additional budget was allocated.

Another concern for client-side applications is that the user interface and user experience must be well understood by the programming team. While you might think understanding the UI is not essential for programmers, I always found that knowing why the design team chose what they designed helped me ask relevant questions, so that I knew how we should write the code. Sometimes you can find easier ways to meet the requirements with minor changes; again, understanding the "why" helps produce a better implementation. For example, in one project, the design team wanted certain shadows they had done in Photoshop that were impossible in an iOS app without extensive development. Explaining how they could accomplish something simpler that we could implement quickly helped them understand how to keep development time down.

I always appreciated it when our QA team complained about the difficulty of certain operations, since they performed them all day. Feedback like that can help make the implementation easier, assuming your design team will listen.

What Should It Not Do?

Assuming you know what the code is supposed to do, the next challenge is making sure it only does that and never does anything wrong. I consider "wrong" to cover unintended behavior (you did not realize it could do that), incorrect behavior (it doesn't follow the requirements), and poor code quality (random failures).

Some incorrect functionality might not make the code unusable but only be annoying or look bad. Depending on how important it is to your customers, you can live with that type of problem. Odd behavior in a game might not be as crucial as in a banking application.

Ensuring your code does what it is supposed to do is only half of the pursuit of quality. The other half is understanding and planning for anything that could go wrong in your customer's hands. Identifying all the failure modes requires imagination, experience, and persistence, and only then can you truly test your code. Writing or planning tests that only assert positive behavior, for example, leaves your quality on shaky ground. Murphy's law (anything that can go wrong will go wrong) applies to everyone. I learned this lesson the hard way back in January 1987 when my company first showed Trapeze at Macworld in San Francisco.

During development and testing, we always ran Macsbug (Apple's debugger) to catch any crashes and debug them on the spot. At Macworld, we rented some Macs but did not install Macsbug. During demos, there was a decent statistical chance it would crash. It was highly embarrassing. Later, I realized that the presence of the debugger altered low memory (recall that MacOS back then had no memory protection or virtual memory) and was hiding a nil pointer error. I failed to understand what could go wrong, which made me realize how vital knowing failure modes could be.

A later example from my final job is even better. Another team wrote code to convert some service data into local objects in the mobile app. They wrote unit tests (positive only) and had a code review (all LGTM comments). It worked fine for several releases, until the service changed its API. The updated service was deployed to the staging environment, and no crashes were seen. However, once the app update appeared in the App Store, it became apparent that something terrible was happening. The app crash rate spiked to 50% multiple times daily, causing panic across all the mobile teams. No one understood why.

Eventually, several of us tracked down the problem: the service had changed its API, and no one had noticed. It worked in staging because the service data in question was only sent based on real-world activities, which did not happen there. The unit tests only used the API output from when the code was written, which is a positive test. The programmer had "handled" any service data issue by calling fatalError(), the only way to crash an iOS app deliberately. He defended the code by saying, "It would just have crashed somewhere else." Code review had seen nothing (the Sgt. Schultz form of review). Unit tests had not exposed his lack of error handling. For no apparent reason, his team continued to employ him.
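A single negative test would have exposed the missing error handling long before the App Store did. Here is a minimal sketch of the idea (not their actual code); `Account` is a hypothetical stand-in for the service model:

```swift
import Foundation
import XCTest

// Hypothetical model standing in for the service data in this story.
struct Account: Decodable {
    let id: Int
    let name: String
}

final class AccountDecodingNegativeTests: XCTestCase {
    // A positive test only proves the happy path. These feed the decoder
    // payloads that no longer match the contract and assert that we get an
    // error back instead of a crash.
    func testMalformedPayloadProducesErrorNotCrash() {
        let malformed = Data(#"{"identifier": "not-an-int"}"#.utf8)
        XCTAssertThrowsError(try JSONDecoder().decode(Account.self, from: malformed))
    }

    func testEmptyPayloadProducesErrorNotCrash() {
        XCTAssertThrowsError(try JSONDecoder().decode(Account.self, from: Data()))
    }
}
```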

Your application will be low-quality if you don't understand and plan for all failure modes. Tests of any type (unit, automated, manual, etc.) are not enough if all they do is validate that the code performs its required tasks. Once the application (whatever it is) is in the hands of your customers or users, none of your testing matters; only the defenses you put into the codebase will stop, or at least minimize, any problems.

You must minimize problems such as incorrect or missing data from servers (never trust a server!), random user behaviors (people are devious and do not always do what you expect), and incorrect responses from internal or external APIs. You may suffer from errors introduced by OS releases, open-source library changes, or other environment changes. You might have to use your imagination for some, but you can plan for others. This is a lot of work, and I've seen many people not realize how much. Poor management often only cares about shipping features.

One practice I've always followed is to use the programming language's features and the architecture to ensure that writing correct code that minimizes failure modes is easy and does not require too many ad hoc solutions. It's easier to prevent problems up front than to fix them later. In my last job, using Swift correctly made this easy. Some languages, such as JavaScript, may have fewer features you can exploit.

For the final and most extensive project I did in that last job, I wrote all the service calls and data management in our iOS project (part of the division's primary iOS app). Before I started, I built a protocol-oriented service call stack that abstracted all the standard functionality, leaving only a tiny stub for each service. Each service was handled in two phases: the first freely converted whatever came from the service into a generic intermediate structure, and the second converted that structure into data objects with strict adherence to the agreed-upon service contract. I eventually had to support nearly 70 calls (the crazy project changed repeatedly over 16 months). The code never crashed, either in QA or after it was released, despite serving almost 100,000 people daily. That should not be an extraordinary event; it's what you should expect.
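I no longer have that code, but a minimal sketch of the two-phase idea, using a hypothetical vehicle-rental service, might look like this. Phase one decodes leniently into an intermediate structure; phase two enforces the contract and returns an error the caller can handle rather than crashing:

```swift
import Foundation

// Phase 1: a lenient intermediate structure. Every field is optional, so
// whatever the service actually sends can be decoded without failing.
struct RawVehicle: Decodable {
    var id: String?
    var seats: Int?
    var dailyRate: Double?
}

// Phase 2: the strict domain object the rest of the app relies on.
struct Vehicle {
    let id: String
    let seats: Int
    let dailyRate: Decimal
}

enum ServiceError: Error {
    case contractViolation(String)
}

// Each service supplies only a tiny stub: its path and how to map the
// intermediate structure onto the domain object.
protocol ServiceCall {
    associatedtype Raw: Decodable
    associatedtype Model
    var path: String { get }
    func convert(_ raw: Raw) throws -> Model
}

extension ServiceCall {
    // Shared plumbing: phase 1 (lenient decode), then phase 2 (strict convert).
    // A failure in either phase becomes a Result the caller must deal with.
    func handle(response data: Data) -> Result<Model, Error> {
        Result {
            let raw = try JSONDecoder().decode(Raw.self, from: data)
            return try convert(raw)
        }
    }
}

struct VehicleCall: ServiceCall {
    let path = "/v1/vehicles"

    func convert(_ raw: RawVehicle) throws -> Vehicle {
        // Strict adherence to the agreed-upon contract happens here;
        // violations become errors, never deliberate crashes.
        guard let id = raw.id, let seats = raw.seats, let rate = raw.dailyRate else {
            throw ServiceError.contractViolation("missing required vehicle fields")
        }
        return Vehicle(id: id, seats: seats, dailyRate: Decimal(rate))
    }
}
```

The real stack also handled networking, authentication, and the rest of the plumbing; the point of the sketch is that the strict conversion is the only place the contract is enforced, and it can only ever fail with an error, not a crash.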

Another idea is to question all assumptions. Are they always reliable? Is it possible for something unexpected to occur? If so, can you ensure that there is something in your code that can at least minimize an unforeseen problem? This can be tricky since it involves questioning things others might think are not worth considering.

I remember being bitten by this in some code we inherited when I started at the Travel company in the early 2010s. A third party had written our iPad app before I started. The code was quite odd, and we were ordered to make it shippable quickly. After it was released, we found a bizarre bug reported by a customer, wherein if they searched for van rentals in Branson, Missouri, the app would crash.

After figuring out how to reproduce the crash, I found an EBCDIC character in the description of one van, even though the service returned UTF-8 encoded JSON data. Somehow, this incorrectly encoded character sequence was sent to an app expecting JSON data. I traced the code the third party had written and found it called a function in iOS (Objective-C back then) to convert the JSON payload to a string. Because the data was not properly encoded, it returned a NULL. Eventually, that NULL was passed to a C function, which crashed. The programmer had assumed that JSON would always be properly encoded (which it should be) but made no accommodations in case it wasn't. In this case, the data for each van was entered into an IBM system using EBCDIC and supposedly converted to XML and then converted to JSON. Most people would likely have assumed that would not fail.

Fixing it was reasonably straightforward. If the function returned NULL, I called another function that "fixed" the data (by removing incorrectly encoded characters), reran the conversion, and, if it still failed, made it an error.
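A similar defense in modern Swift (the original was Objective-C) is short; `jsonString(from:)` below is a hypothetical helper, not the original function:

```swift
import Foundation

enum PayloadError: Error {
    case unreadable
}

// Try the strict conversion first, then fall back to a lossy one that
// replaces invalid byte sequences with U+FFFD, and only then give up
// with an error instead of passing nil along to something that crashes.
func jsonString(from data: Data) throws -> String {
    if let strict = String(data: data, encoding: .utf8) {
        return strict
    }
    let repaired = String(decoding: data, as: UTF8.self)
    guard !repaired.isEmpty else { throw PayloadError.unreadable }
    return repaired
}
```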

Something I started doing in the late 80s, and also made sure QA did, was to run through the entire application every single day. I tried to run the app as a customer would, as best I could. I did not just test new functionality, follow user stories, or use some script. Doing this every day, even if it only takes 15 minutes, can help find a lot of issues, as you get very familiar with the application as a whole and see things break or change from one day to the next. It seems like a simple idea, but it always allowed me (and others) to discover and deal with issues immediately, because we could easily compare each day's state. If you only look at new things, you will likely miss problems elsewhere.

This sounds like a lot of effort, but it is necessary to make your product robust. Quality does not come quickly or cheaply.

Why You Should Not Fear Shipping

The third part of this definition is the last step. Imagine an escalator: the steps rise and take you to the top. At the top, the steps flatten and deposit you on the next floor. The same should happen with your development process. The easiest, most straightforward step should be to go from the end of testing to delivery (deployment to production, app store uploads, etc.). At the Travel company, I usually did the final build (we didn't have CI/CD back then), and after a final test, uploaded it to the App Store. Often, I let someone else push the button. I never had any concerns, as I knew everything would be fine, and it generally was.

So when should you ship?

One sign for me is that your testing (QA, test suites, etc.) is not finding anything wrong. This presumes you have a robust process that can be repeated as necessary. Any serious issue is a warning sign that you are not ready. I always wanted my QA team to have to work hard to find anything wrong, even in the middle of active development. Having QA find crashes, or test suites show serious failures, means you are not working hard enough to make quality a continuous process. That's why I never wanted my QA team or process to find problems; as much as possible, I wanted any failures found and dealt with long before they ever saw them. Keeping testing clean during development means you are less likely to have late problems and more likely to have a smooth shipping process. Waiting until the end of development to deal with serious issues is a bad sign that quality will likely be low.

If you test the whole app daily and deal with issues immediately, final testing should only be validating what you already know works and exercising the defenses you built into the code for potential problems. That final bit of testing will be easy, and when to ship will become more apparent.

Sometimes, though, things you can't do anything about still happen. Even if you did everything described above to minimize or eliminate the sources of problems you can control, you can still have issues with things you can't control.

We added a library to scan credit cards for a brand new app we released at the Travel company; we were likely the first to ship this feature. The third-party company provided us with what was supposed to be the production binary, which we extensively tested, and everything was fine. A few days after the release, customers complained that the app was telling them the demo had expired. The third party had given us a binary with the time-locked demo functionality still intact! So, we got the correct library, tested it, and uploaded a new version to the App Store the same day. There was no way to detect that it was a demo library. That third party didn't last very long as a business.

In another case, Apple replaced the old Google Maps support with its own MapKit. We had a hotel detail page in the iPad app with a tiny map showing the location. After an iOS update, I noticed that the iPad app suddenly started crashing occasionally for some customers. I tracked it down to the map, which was previously a Google map and was now a MapKit map. I discovered that Apple's new servers sometimes delayed sending a map tile until after the page had been closed; MapKit did not retain the map object, so it was deallocated and then updated when the tile arrived, leading to a crash. I had to keep all the maps on a list for a few minutes before allowing them to be deleted.
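I don't have the original code, but a sketch of that workaround, assuming an MKMapView on the detail page, looks roughly like this:

```swift
import Foundation
import MapKit

// Keep a strong reference to each dismissed map for a few minutes so a
// late-arriving tile update still has a live object to talk to, then let it go.
final class MapGraveyard {
    static let shared = MapGraveyard()
    private var parked: [MKMapView] = []

    func park(_ map: MKMapView, for seconds: TimeInterval = 180) {
        parked.append(map)
        DispatchQueue.main.asyncAfter(deadline: .now() + seconds) { [weak self] in
            self?.parked.removeAll { $0 === map }
        }
    }
}

// Hypothetical call site, in the hotel detail view controller's teardown:
// MapGraveyard.shared.park(mapView)
```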

This type of unexpected error usually doesn't show up immediately, but you must be paying enough attention to notice it. I will cover that in the final section.

You Are Not Done When You Ship

Just because you shipped your code does not mean you are finished and have nothing more to do.

Monitor every source of information available about your customers' experience. At my last job, I read the App Store reviews and the crash reporting tool daily. I frequently reviewed our defect list. I talked with my fellow employees, who were also customers. I kept asking for access to analytics, but no one would approve it. I listened to customer support complaints. All of this monitoring aimed to validate my assumptions about testing. 

When I started my final job, I asked for access to the crash reporting tool (my team and I did iOS development on the largest single piece of our two biggest iOS apps). After I received it, I looked at the audit page that showed who was looking at the reports and found that almost no one had ever looked. Over the following four years, I reported on what I saw, particularly right after any app release (two apps, two platforms) on the shipping Slack channel. I monitored every crash, not just mine (the only meaningful ones we had were in the Objective-C codebase I inherited). I also trained non-programmers on how to understand what the reports meant.

This was not my job; I just cared enough and managed to keep it politically neutral. When I started, the larger app's crash rate was around 1 per 100 sessions, which is mediocre. Over time, more and more people started looking and responding to crashes on various teams. By the time I retired, everyone was looking at them, and the app crash rate had been reduced to a very good 1 per 500 sessions. The final major project (in that app) that my team and I built had a crash rate of 1 per 100,000 sessions, with about 80% of the sessions in our code. The pandemic killed our business for a long time, and a new project (that I started but it shipped long after I retired) took its place.

Here are a few examples of how paying attention after shipping can pay off.

The Sync Process

Around the time of iOS 11's release, I noticed the App Store suddenly had many 1-star reviews daily. Our app did not usually get any; now I was seeing 25 or so per day. The customers were saying they could not launch our app at all. I reported this, but people ignored me, a senior executive said they were "people venting," and the crash reports showed nothing new.

This went on for two months. Then someone emailed our CEO complaining they could not launch the app, and the executives set up a big war room to figure out what was happening. Some folks discovered that the crash reporting vendor had been incorrectly filtering out 99% of the crashes, which was fixed, but the lack of new crashes associated with this problem initially confused everyone.

Testing on various iPhones finally showed that anyone on an iPhone 6 could not launch the app after not using it for a few days. It took an average of 6-8 launches before the app would run; before that, it would sit there for 10 seconds and then appear to crash. Newer iPhones had fewer issues but still took a long time to launch.

The launch method (the method iOS calls when your app launches) was syncing the local app database with a backend CMS server. For no apparent reason, an enormous download would appear every few days. I remember looking at a sample; it went on forever and consisted mainly of redundant changes. Many entries had hundreds of changes, primarily deletions, additions, and name changes, repeated almost endlessly.

iOS has a watchdog process that will shut down any app that takes longer than 10 seconds in the launch method.

The source for the launch method included a comment mentioning that nothing that takes a long time should be called in it!

The responsible team moved the sync to the first tick after launch, but the app still took a minute or two to become functional on the older iPhones. After some analysis of the CMS server, it appeared that someone had added multiple writers (the data came from many internal servers), but the CMS did not support that configuration. Each writer competed with the others, generating massive updates as they fought. The solution was to reduce it back to a single writer. I never found out why the multiple-writer change happened.
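A sketch of what keeping the launch method fast looks like in today's Swift; `ContentSync` is a hypothetical stand-in for the CMS sync code:

```swift
import UIKit

// Hypothetical stand-in for the CMS sync described above.
final class ContentSync {
    static let shared = ContentSync()
    func run() { /* long-running database sync goes here */ }
}

@main
class AppDelegate: UIResponder, UIApplicationDelegate {
    func application(_ application: UIApplication,
                     didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
        // Nothing slow in here; the watchdog is counting.
        DispatchQueue.main.async {
            // First "tick" after launch: hand the sync to a background queue.
            DispatchQueue.global(qos: .utility).async {
                ContentSync.shared.run()
            }
        }
        return true
    }
}
```

Moving the work off the launch path keeps the watchdog happy, but as the team found, it does nothing about the minutes of syncing itself; that had to be fixed at the CMS.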

After this, I noticed more people paid attention to my reports.

The Impossible Bug

A second example comes from my team. The smaller of the two apps contained a large module written by another programmer and me (and, for the last 8 months of the project, by me alone). While there were a few crashes in our code (mostly outside of the top 20 crashes), one rare crash happened a few times every day, and it made no sense. It wasn't all that common, so ordinarily it wouldn't have been worth studying, and it was challenging to reproduce (3-4 occurrences per day out of some 700,000 sessions!), but it bugged me.

It happened at the end of our flow, and the crash showed no root ViewController, which made no sense. In iOS, there is normally only one window and one common way to get the ViewController stack. So how did the flow reach a point where that root ViewController vanished?

Months passed, and it still bugged me, until one of my QA team members came to me with a strange feature she had never seen before. Usually, QA uses a tool that configures users for testing. That day, the tool was down, and she had to create users in the app the way a customer would. Midway through the flow, a modal asked the customer to choose security questions. She had never seen this before, and it was not part of our code. As I watched her reproduce it, I saw it put up a little UIAlertView, which hung around for a few seconds. Boom! That was the issue.

In iOS, a UIAlertView creates another UIWindow. Because it stuck around precisely while I manipulated the UIViewController stack, the topmost UIWindow did not have the root controller. This only happened when it was triggered at the end of the flow.

OK, so maybe it’s not a big deal, but it proves that you need to know everything that can affect your code at runtime. In this case, the team responsible had not told anyone about this feature. It was easy to fix (find the bottommost UIWindow), and maybe I should not have assumed a random alert would not be stuck over my code!
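The fix itself, in sketch form: rather than assuming the topmost window owns the flow, walk back to the app's own (bottommost) window and take its root. This is a modern-Swift sketch of the idea, not the code as it was written then:

```swift
import UIKit

// An extra UIWindow (like the one UIAlertView creates) can sit on top of the
// app's window. The first window in the list is the one the app created, so
// its rootViewController is the one the flow actually belongs to.
func appRootViewController() -> UIViewController? {
    return UIApplication.shared.windows.first?.rootViewController
}
```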

The following two examples are from my Travel company job, around 2010 or thereabouts.

The Exceptional Names

I was on the mobile team, and we consumed APIs generated by the web team. The web team had 10-15 times more programmers than we did, and it took months to deliver updates. At one point, they decided to install IBM Tealeaf, which allowed them to see what a web page looked like from a customer's viewpoint. To their horror, it showed that anyone booking a hotel reservation (I think it was hotels, but it might have been flights) with particular punctuation or non-ASCII characters in their name would get a nasty exception on the booking page. They looked at the server logs and found that this exception appeared about 1% of the time, going back a long way. They had been losing 1% of revenue to this error! If they had been mining the logs for problems, it could have been dealt with long before it became a loss of revenue. Always watch your logs for errors; never wait until a customer complains or you randomly find the problem years later.

New York City Vanishes

At one point, I noticed people complaining in the App Store reviews for our iPad app that searching for a location in our hotel booking flow would return “No Results Found,” even for obvious locations like New York City.

I tried it myself, and it worked, so I set up a loop in a command-line app, calling the API service we used from the backend. Running the New York City search all day eventually hit some empty responses.
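That command-line loop was nothing fancy. A sketch of the same idea, with a hypothetical endpoint standing in for the real search API:

```swift
import Foundation

// Hammer the search endpoint and count responses that come back empty.
// This is a simplified check; the real loop parsed the JSON and flagged empty result sets.
let endpoint = URL(string: "https://api.example.com/locations?q=New%20York%20City")!
var attempts = 0
var empties = 0

while true {
    attempts += 1
    let semaphore = DispatchSemaphore(value: 0)
    URLSession.shared.dataTask(with: endpoint) { data, _, _ in
        if data?.isEmpty ?? true { empties += 1 }
        semaphore.signal()
    }.resume()
    semaphore.wait()
    if attempts % 100 == 0 {
        print("\(empties) empty responses out of \(attempts) attempts")
    }
    Thread.sleep(forTimeInterval: 1)
}
```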

We talked with the backend team that managed that API. They had multiple servers with a load balancer in front and a health check to ensure each server was running. The data was updated a few times daily, and the servers were restarted, loading the data into memory. Once I convinced them there was a problem, they realized that the data loading had a bug where it would occasionally fail to load anything. The health check only tested whether the server was running, not whether it was returning results. They decided not to fix the bug but instead changed the health check so that any failed server would be restarted. This problem also existed in the web app, but there was no way to report such a failure there. It was found only by checking the App Store reviews.

Final Word

The point of these examples is to show that quality requires attention to everything, even after shipping! No matter how thorough your development is, you should continuously monitor what happens at the customer level. This reinforces and validates whatever you did before shipping and communicates what might need to be changed in your approach to quality.

Naturally, your employer might not care about quality or be unwilling to pay for it. I have seen that in many places and have often experienced it on the web or while using apps. You might be able to convince them otherwise, or they might ignore you. I was able to get people to pay attention to the crash rate of our apps despite that not being a priority when I started, but you can't always make that sort of subversive act work. I never needed another job, but if I had interviewed for a mobile job elsewhere, I would have asked about the crash rate. If they don't know it, they probably don't care. Perhaps another company might be a better choice.

I hope this long post gives you some useful ideas. Quality is not easy, quick, or cheap. It requires discipline, imagination, thoroughness, and continuity. It’s easy to lose and difficult to regain. In the long run, your customers greatly appreciate something that works and never gets in their way. As long as you view quality as necessary, it doesn’t have to be an impossible goal; you must make it part of your DNA. That embarrassing crash while giving demos back in 1987 paid off in the long run!