What Your App's Crash Rate Can Tell You

When I shipped my first app 30 years ago, once we put it on a floppy, boxed it up, and sent it away, we had basically no idea what the customer ultimately experienced unless they called us.

Today the options for knowing what the end user is experiencing, on desktop and particularly mobile apps, are amazing. Along with various analytics, one of the most useful and objective statistics for telling how well you did and what your customers are seeing is the crash rate.

Crash Rate

The crash rate is typically calculated as the number of app launches in a period divided by the number of crashes in the same period. Usually I look at 24 hours' worth; our mobile apps are used 24x7, but practically about 16 hours a day. There are multiple ways to express a crash rate: how many launches it takes to produce one crash (as in 1 in 100), the percentage of launches that crash (1%), or the percentage that don't (99%). Whichever makes more sense to you, it's a consistent and comparable way of looking at crashes across time periods, versions, or apps. Usually I like to use the 1 in 100 style.
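To make the arithmetic concrete, here's a minimal sketch in Java with made-up launch and crash counts (not numbers from any of our actual apps), printing one 24-hour window expressed all three ways:

```java
// Crash-rate arithmetic for a single 24-hour window, using made-up example counts.
public class CrashRate {
    public static void main(String[] args) {
        long launches = 120_000; // app launches in the last 24 hours (example number)
        long crashes = 400;      // crashes in the same 24 hours (example number)

        double oneInN = (double) launches / crashes;      // the "1 in N" style: 1 in 300
        double crashPercent = 100.0 * crashes / launches; // percentage that crash: 0.33%
        double crashFreePercent = 100.0 - crashPercent;   // percentage without a crash: 99.67%

        System.out.printf("1 in %.0f launches crash%n", oneInN);
        System.out.printf("%.2f%% of launches crash%n", crashPercent);
        System.out.printf("%.2f%% crash free%n", crashFreePercent);
    }
}
```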

When I first started at the travel company (now sadly just a brand of our former competitor), during the iOS 5 timeframe, the only crash reporting was in Apple's iTunes Connect, and it sucked and often failed to download anything. When Crashlytics first appeared (I think it was them) and I saw the first live reports from our iPad app, I was hooked. I've viewed crash reports almost every day since, even for apps of ours I have nothing to do with.

Good, Bad and Average

So what is a good, bad, or average crash rate? A few years ago, when Crittercism still existed, they released a report showing the average iOS and Android app crashed at a 2% (or 1 in 50) rate. There hasn't been a comparable report since, but given improving tools, and especially ARC and Swift on iOS, I think the average is now more like 1.5% or lower. At my well-known employer, alerts fire whenever the crash rate hits 1.5% for 15 minutes.

Now I don't consider the average rate even remotely acceptable. We typically ship 4 apps at once (2 apps for different business units, on iOS and Android) about 7 or so times a year. Right now 3 of the apps are seeing around 0.3% (1 in 300) and one is terrible at 2.5% (1 in 40). Even though those are quite acceptable for most people, I still view them as not good enough.

At the travel company we replaced our flagship iOS app with a brand-new, from-scratch app that launched on iOS 7 launch day (Apple wanted us there). Despite the rather sketchy iOS betas, the new app started off at 1 in 700, and when Apple fixed a use-after-free in MapKit it improved to 1 in 1400 and stayed there for pretty much the entire 8 months it existed, before the brand was sold and all of our tech was furloughed. To me that's much better.

The team I am on recently wrote a giant addition to one of our 4 apps, adding basically 40% more code to the app after 16 months of effort, and so far this code, which is used by most of our customers though not all, contributes around 1 in 10,000 to the crash rate. That's basically 99.99% crash-free users every day. The entire app had a decent 0.3% rate before we launched our part, and that didn't change at all.

It's not always possible to hit such a low rate, and the crash rate can be hard to calculate when you don't control the entire app. You always sit on top of other code, open and closed source libraries, the operating system, and devious users, so there will always be something that doesn't work, but clearly keeping your app nice and crash free is a big plus.

Before I came aboard, the travel company's flagship iOS app crashed so much (this was of course before any crash tracking) that it had 11,000 one-star reviews! Yikes! We took 3 months to fix everything we could, and I even trained our Java folks to do Objective-C so we could throw more people at it. Sometimes you have to take the time to clean things up. Afterwards our rating jumped a lot.

One Crash OK, More Bad

One thing I have learned about crashes is that mobile users will ignore a single crash. If your app crashes several times in a row so that they can't do what they wanted to, then they will get angry and leave nasty reviews. A single crash is assumed to be normal, so people shrug it off.

A perfect example of that came earlier this year, when one of our apps was released and, because of some server change, most of the people still running the previous version suddenly crashed repeatedly. They left lots of nasty reviews, which is something we rarely get. Once they upgraded they were fine.

The next release went out with a horrible bug: depending on where you were in the app, about 3 times a day when some data was synced, the crash rate would spike to 50% for 20 minutes or so! We were horrified (this happened in an app we have some code in but rarely change, and thankfully the issue had nothing to do with us). Yet there were virtually no bad reviews, because the data had made it into the app and only the UI update crashed, so once you restarted everything was fine. No customer knew, since everyone assumed it was just a random crash that happened to them.

Of course what you really want is to make crashes highly unlikely. I've been shipping apps for 30 years, most of them targeted at people (even the server-side web work I've done has always included the client as well), so there are things I know make things work better, just from long experience.

Suggestions To Help Lower Your Rate

(1) Test the whole app every day during the entire development process.

Many times these days people only test whatever they just wrote, or what was in the sprint, or even wait until the end of the project (sadly we do too much of that). Apps meant for people should be tested by people; automation and other mechanical types of testing are fine but insufficient. If I've learned anything, it's that people are devious and do stuff you never expected, so hire QA/tester people who enjoy torturing your application! I insisted on this 30 years ago, and even at the travel company we did the same thing. If your app is tested in its entirety every day, all day, you won't ever be surprised by what the customer sees.

(2) Leave plenty of time.

We suck at this at my present employer: every release is planned a year in advance, there is never enough time to do an adequate job of quality development, and every release is late and leaks into the next one's schedule, which then makes that one also require heroic effort to release, and inevitably something horrible happens. Quality software takes time. 25 years ago, working on Deltagraph, we had a great working relationship with the publisher: we had a vague target for shipping a new release, and both we and they were perfectly ready to dump features for quality as we neared that goal. If your business refuses to accept the reality that software takes time, then crash-filled releases will happen a lot.

(3) Less is more.

The fewer people you have involved in the development of an app, the easier communication is and the faster decisions can be made while you still have time to change. Product decision makers, test leaders, designers, and developers should work together as unified teams if possible. Combine that with constant testing and feedback becomes continuous; it's much harder to wind up with unrecognized issues. At my present employer the apps are built by way too many people and teams spread out all over the country, with isolated product decisions often at odds with other teams'. Feedback is almost non-existent, problems linger, and throw-it-over-the-wall, last-minute testing by random people no one even knows results in many missed problems. The first day the customers get your app should be the easiest day you ever have.

(4) Make sure your server environments match production.

Many times over my career, when there are servers involved, I have seen people fail to ensure that testing environments match the production environment. People refuse to spend money on hardware, or use out-of-date test data, or use different configurations or even different components.

One time in the past my financial employer hired contractors to build an EJB-based server, and these folks thoughtlessly saved state in Stateless Session Beans. The stage server was a single machine, and production was a two-server system. Of course when the app was released, people saw each other's data as they bounced from server to server. It was a terrible mess.
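For illustration, here is a hedged sketch in Java of the kind of mistake involved; the bean and its methods are invented for this example, not the contractors' actual code. Per-user state lives in instance fields of a stateless session bean, which only appears to work when a single lightly loaded server handles every request:

```java
import java.util.ArrayList;
import java.util.List;
import javax.ejb.Stateless;

@Stateless
public class OrderBean { // hypothetical bean, for illustration only
    // BUG: per-user state stored in fields of a *stateless* session bean.
    // The container hands pooled instances to any caller it likes, and a
    // second production server has its own instances entirely, so requests
    // that bounce between servers (or instances) mix up users' data.
    private String currentUserId;
    private final List<String> items = new ArrayList<>();

    public void startOrder(String userId) {
        currentUserId = userId;
        items.clear();
    }

    public void addItem(String item) {
        items.add(item); // on the next request this may be some other user's order
    }

    public List<String> currentItems() {
        return new ArrayList<>(items); // may return another user's items in production
    }
}
```

On a single, lightly loaded stage server the same pooled instance can end up serving each test session in turn, so a bug like this may never show up until real traffic hits multiple servers.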

The app version I mentioned above, where the crash rate spiked to 50%, worked great in stage, where all testing is done. Production, however, had different data pushes that did not occur in stage, so when our users ran the app there were multiple crashes. It took 2 weeks and 250,000 crashes before they were fixed, and two days of sleuthing before anyone even knew of the differences that caused the crashes.

(5) Learn from your mistakes.

I like the saying: "if you don't learn from your mistakes, you are optimizing for them." If you have a bad release, why did it happen? What can be done to minimize it ever happening again? Maybe your whole approach to building apps, and not just development, is simply incapable of releasing things that work. Of course this requires that your leadership is able to honestly evaluate what is going on, instead of just tossing out blame, forcing people to work overtime, or (as happens here a lot) ignoring it each time and hoping for the best.

I learned this lesson when I released my first app, Trapeze, 30 years ago last January. We did a release just for the Macworld show, and every time I demoed the app there was a statistical chance it would crash; it happened because we always tested with a debugger attached and our demo Macs had none. It was embarrassing. After that I vowed to ensure that testing more accurately mirrored what our customers were doing, and came up with the continuous testing idea. I wanted the "release" to be a no-brainer day, because it wasn't anything we hadn't done before. Since then I haven't had a single release with crashing issues, even the recent major 16-month project.

Crashing is a fact of life, but it should be as tiny a fact as possible. You can fix a broken app, and you can release apps that meet the basic definition of quality: "The app always does what it is supposed to do; it never does what it is not supposed to do; and you are never surprised by what it does." I've lived with that philosophy for the past 3 decades.

There are always more things you can do that work in your industry, or company, or team, or project. The important thing is: if it's broken, fix it; if it works, keep doing it! Your customers will appreciate it.