I Have To Fix Broken Things

Call it a character flaw or a character benefit—I hate being around broken code, processes, products, or UI. If it's broken, I want to fix it. If I can't, it grates on me.

After I graduated from college, my parents, a friend, his parents, and I went out to eat, and my dad sat in a broken chair. So he laid it down and attempted to fix it, which embarrassed me then. But today, I understand why he did it, and clearly, I inherited the trait (though about technology, not furniture).

Throughout my multiple decades as a programmer, nothing irritated me more than seeing something that should work correctly, or at least function, do neither. It's been both a positive and a negative in my career: sometimes people were thrilled that I made things better, and a few times I lost my position because someone did not like me trying to fix something they preferred to leave alone (often because they had broken it and would look bad). Sometimes, I had to be a little sneaky or make things better a little piece at a time.

A Failure Teaches Me A Valuable Lesson

I learned a valuable lesson about delivering a quality application in 1987 at Macworld in San Francisco. We were showing our app Trapeze, which was about to ship (we would assemble the packages after the show). I spent several days demoing our product. The key feature was a hierarchical popup menu; we had the first Mac app with such a thing (Apple would release this in Mac OS a few months later), so people were interested in seeing it work. However, the app crashed intermittently, which was embarrassing. During development and testing, everyone ran a debugger in case of a crash. In those days, the debugger lived in the same address space as the app and loaded first, changing the values at low addresses. The code had an uninitialized pointer, which happened to be harmless with the debugger present; at the show, we had rental Macs and no debugger. It was easy to fix back in the office, but it wasn't very pleasant during the show.

What changed in me was that I spent more time at the start of a project considering how to build more reliable code, hired permanent QA, and had them test the whole app daily. Remember, back then, no one wrote unit tests or did anything like what you can do today (we did not even have a code repository, as there were none). After this early failure, nothing that I or a team I led shipped was ever released with severe errors, right up until I retired.

I did a two-week contract in the mid-90s to help complete a release of an email program. They had several mysterious crashes and needed another person to fix them. I spent a little time analyzing the source and discovered that someone had commented out the low-memory handling code (vital in the fixed-memory classic Mac OS) a couple of releases prior, and the lines had never been uncommented. This was the key reason why the app was so fragile. At the end of the short contract, the entire team took me out for lunch.

Afterward, I worked at Apple for half a year, but this was before Jobs returned, and the company seemed headed toward going out of business. I did not want to be there, so I left. That was too broken for me!

I Fix Broken Things

In the mid-2000s, I went to work as an architect at a financial services company, and the interview consisted of the two architects telling me horror stories to see if I would run away. So much brokenness to fix was exciting to me. So during the employee orientation, when everyone was asked what their job was, I responded, "I fix broken things."

Little did I know how much stuff was broken. I spent 18 months trying to fix everything I could get to, but it was hopeless; there was too much. My most significant success required cleverness. The mainframe folks hated the Java programmers and blamed everything on them. The worst issue was that the app servers and web servers would frequently lock up, creating a lot of complaints from our 1,000 field offices and our 60,000 customers. The databases backing the apps ran on the AS/400 (DB2), and I suspected this was an issue, but the challenge was proving it.

I wrote a simple test using the DB2 JDBC driver that fetched only the first row of each of the approximately 200 tables. Then our Oracle DBA told me he routinely downloaded the entire production database from DB2 to Oracle running on an old PC. So I got him to run the benchmark against both the production DB2 database and the PC running Oracle with the same tables. The difference was astounding: the little PC appeared significantly faster than the expensive AS/400!
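The original test is long gone, but a minimal sketch of the idea in modern Java might look something like this. The class and method names (FirstRowBenchmark, RowFetcher, jdbcFetcher) are made up for illustration, and the table names are assumed to come from a trusted list, since they are concatenated straight into the query.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FirstRowBenchmark {

    /** Hypothetical abstraction: fetches the first row of one table. */
    interface RowFetcher {
        void fetchFirstRow(String table) throws Exception;
    }

    /** JDBC-backed fetcher; should work with any driver (DB2, Oracle, ...). */
    static RowFetcher jdbcFetcher(Connection conn) {
        return table -> {
            try (Statement st = conn.createStatement()) {
                st.setMaxRows(1); // we only care about the first row
                // Assumption: table names are trusted, not user input.
                try (ResultSet rs = st.executeQuery("SELECT * FROM " + table)) {
                    rs.next();
                }
            }
        };
    }

    /** Times one first-row fetch per table; returns elapsed nanoseconds per table. */
    static Map<String, Long> run(RowFetcher fetcher, List<String> tables) {
        Map<String, Long> timings = new LinkedHashMap<>();
        for (String table : tables) {
            long start = System.nanoTime();
            try {
                fetcher.fetchFirstRow(table);
            } catch (Exception e) {
                throw new RuntimeException("Fetch failed for table " + table, e);
            }
            timings.put(table, System.nanoTime() - start);
        }
        return timings;
    }

    /** Sums the per-table timings so two databases can be compared at a glance. */
    static long totalNanos(Map<String, Long> timings) {
        long total = 0;
        for (long t : timings.values()) {
            total += t;
        }
        return total;
    }
}
```

Running the same table list through jdbcFetcher with a connection to each database, then comparing the totals, gives the kind of side-by-side numbers that ended up in my charts.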

So I wrote it up with charts and sent it to various people, which caused quite a stir. The AS/400 DBA ran the same benchmark while watching metrics and discovered that someone had set the DB2 partition to only 100MB, so the AS/400 was swapping like mad between the benchmark and the regular production load. It was clearly a deliberate choice. When the CIO ordered them to increase the memory to 1GB (and promised to buy them more RAM), all the issues instantly disappeared.

I was a happy camper!

Not So Happy Fixes

Not everything I did to fix things worked out as well. In my next job (after there were too many broken things in that one), I thought I had found a great position: all Macs, a startup with a great market niche that wanted to expand its product line, and the possibility of an IPO at some point. I was the second Java programmer, as the first complained he could not do the whole project alone. Little did I know...

After a couple of months of building the new online store's front end, I felt ready to investigate what the other programmer's back end (that he talked about endlessly) did. So I went looking in the repository and found absolutely nothing. All that was in any repository was some sample code. He had been working for ten months, and there was nothing.

I reported that to the manager and then the CEO, but what the manager said to the CEO floored me: "Oh, he never checks in any code until it's perfect." Yet there was no proof he had done anything. So after that, I became persona non grata with him and a few other technical people. The programmer also built a cardboard wall so I could not see his display (it was an open office with desks), as I was the only other Java programmer and could have told what he was working on. I am sure he was working on contracts for other people while taking the salary.

I tried repeatedly to convince the two other partners that something was wrong, but the CEO did not listen to them either. I stayed a couple more months, writing some code to automate the back office (printers), but I had had enough and quit. A year passed, and I heard they finally fired the programmer and the manager. They did not get a new store, stayed in their narrow market, and never expanded, but remained in business. They only needed a new online store; they had a great back office and printing expertise.

The following job I took (as an Architect again) ended even worse. Our main application (a batch system written in Java, chopped up into 20 individual applications) leaked memory like a sieve; two people had to restart the apps multiple times a day and took turns restarting them during the night (they did not accept data from our customers on weekends). It frustrated me that something so important was so broken, so I looked into it. The system used a C++ framework called by the Java applications. The documentation indicated that the Java code needed to tell the framework when memory was no longer needed so it could free it on the C++ side. But, looking at the Java code, no such calls were anywhere. I tried to explain this to the person who had written it (now Chief Architect) and got no response. I asked the programmers who supported it, and they said they knew about it but were not permitted to fix it, as it would make the author look bad to the executive team. After that, my tenure at the company did not last long; after a period of highly hostile actions, I was laid off.
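To illustrate the kind of bug this was, here is a minimal sketch in modern Java. It is not the actual framework: NativeBuffer is a hypothetical wrapper, and a simple counter stands in for the C++ side so the example runs without any native code. Try-with-resources is one way today to make sure the release call always happens.

```java
import java.util.concurrent.atomic.AtomicInteger;

class NativeHandleDemo {

    /** Stand-in for the native side: counts blocks allocated but not yet freed. */
    static final AtomicInteger liveNativeBlocks = new AtomicInteger();

    /** Hypothetical wrapper around memory owned by a C++ framework. */
    static final class NativeBuffer implements AutoCloseable {
        private boolean released;

        NativeBuffer() {
            // Real code would call into the native framework to allocate here.
            liveNativeBlocks.incrementAndGet();
        }

        @Override
        public void close() {
            if (!released) {
                released = true;
                // Real code would tell the native side to free the block here.
                liveNativeBlocks.decrementAndGet();
            }
        }
    }

    /** Leaky pattern: the Java object becomes garbage, but the native block is never freed. */
    static void processLeaky() {
        NativeBuffer buf = new NativeBuffer();
        // ... use buf, then simply drop the reference ...
    }

    /** Fixed pattern: try-with-resources guarantees the release call, even on exceptions. */
    static void processSafely() {
        try (NativeBuffer buf = new NativeBuffer()) {
            // ... use buf ...
        }
    }
}
```

The Java garbage collector knows nothing about memory held on the native side, so every call to something like processLeaky leaves one more block behind until the process is restarted.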

After I left, they tried to write a new application, but it failed to work, and I heard a new CIO came in, canceled the project, and went out and bought something commercial.  

Back To Fixing Again

I then worked at a game company with an MMO/FPS game. The team was too small, we had little money, and the game engine was homegrown and had a lot of issues. Despite the low pay (as in 1/3 of what I could have made elsewhere), it was so much fun fixing everything! I even got to battle a company selling a hack, and win. Sadly, I stayed too long at that pay rate, so I had to leave. They are still in business and hoping to eventually move to Unreal 5.X.

Working at a well-known online travel agency, my first mobile job, allowed me to fix stuff again. This was around the iOS 5 timeframe. The main iOS app had been the first travel app in the App Store and had been written by two managers, as the company did not think mobile mattered and so had no team. While it was making some sales, the app crashed continuously and had 9000 one-star reviews (I asked about that during the interview), which I think was a record count at the time.

I wound up training all of our Java programmers on Objective-C and iOS, and we took three months to rebuild the app, removing 300 warnings and 500 static analyzer errors, plus improving much of the UI. The app was far more stable and better received when we released it. Eventually, we built an entirely new app just in time for iOS 7 launch day; only our app and one other in the travel industry appeared that day with full iOS 7 features. This team was very flexible and productive; this was the best place I had ever worked in many ways, but our parent company sold our brand to our biggest competitor, and everyone was laid off. Bummer.

Fixing In The Big Leagues

My last job was at the largest company I ever worked at. It presented a lot of challenges, along with many opportunities for fixing things (not just code but also processes) and building mobile code that worked correctly. All in all, I think it was an excellent place to end my career.

I could list several things, but this post is too long, so I will only mention one.

When I started, I worked on a 16-month-long project and inherited a large codebase, written by an army of contractors, that had just shipped. I naturally wanted to see what level of crashes the code had, but no one had access to or cared to look at crash reports. So I requested access and started looking; from then on, I reported what I saw in the shipping Slack channel every release. At first, few people paid attention, but two high-profile failures made people start considering crashes as important. In one case, the app, as soon as it was released, showed a 50% crash rate several times a day, which alarmed everyone; it took all the leads working together about two weeks to solve the issues (made more difficult as no one had uploaded the symbol file!). The second issue followed two months of complaints in the App Store reviews (which I also read daily), which were initially ignored as mere venting. However, an email to our CEO suddenly made the issue significant. The crash was not being reported because it occurred in the launch method: the app was trying to sync a database there, which took way too long on slower iPhones, so the iOS watchdog simply killed the app as unresponsive. Again, this failure of the app to work correctly started a trend of more and more people looking at crash reports regularly.

The app overall had a terrible crash rate of 1-2% during this time, but now more people were paying attention. During the pandemic, when our business was slow and many projects were on hold, people started fixing the common crashes. By the time I retired in 2021, the crash rate had gotten down to 0.25%, and I no longer had to say anything.

I managed to slowly convince a large division to care about the crash rate of the two apps, armed with just data and simple repetition. Over time, I also taught several people, including non-technical staff, how to properly read crash reports. Finally, my need-to-fix obsession paid off!

After Retirement

I still care about broken stuff; websites and apps sometimes frustrate me. I went back and forth with Citibank because their website, upon login, would tell me that the version of Safari I was using was too old, despite my running the latest version on a Mac Studio (it also included a helpful link to a non-existent page). They finally admitted they knew about the issue, and after a few months, I saw it was fixed.

Then take Twitter. Please. My account with 5,600 followers has been bugged for a year: it does not show content to my followers, only to people not following me, which has reduced my views to 5% of what I was getting a year ago. Even looking at the source and trying what I could accomplished nothing. If I worked there, I could figure it out (and would have to, being obsessed with fixing things), but I wouldn't last 5 seconds with the owner!

So having to fix broken things has been a great benefit, and sometimes a terrible curse, but that's who I am, and I have no intention of changing!