Looks Good To Me: When Code Reviews Go Awry

Cartoon person and headline Looks Good To Me

Code reviews can effectively improve code quality in large or mixed teams with experience differences. They can also be useless if not done correctly or if management does not support the time to do them.

A code review is a modern invention, as the technology to do them easily did not exist for a large part of my career. I did not even have access to a code repository for the first 14 years (the first one I saw was in 1995); although I knew about them, we had no access to one when we built Mac applications in my two startups from 1985 to 1994. Even after this point, I did not see much use of this technique until my last job (2015-2021). Of course, code reviewing as a part of a process likely would have been used elsewhere, just not where I worked.

To do a code review properly, you have to look at three things: (1) what does the code affected currently do, (2) is the change being made appropriate to what was asked for, and (3) what effect does the change have? Doing this may require a large percentage of the time taken by the person who made the change in the PR, depending on how familiar the reviewer is with the code and how complex the change is. Trivial changes may take little time, and complex implementations such as new features, API additions, or large refactorings may be time-consuming.

Note that people, including me, shipped large complex applications for decades before anyone could continuously review changes. My team built Trapeze (a spreadsheet-like program released in 1987), assisted on Persuasion (the only real competitor Powerpoint ever had, released in 1988, we contributed only to version 1.0), and Deltagraph (for the first five years only, a charting and graphing application, released in 1990 and existing until the pandemic finally killed the last owner in 2019 or so). All three applications were large and complex and functioned well.

In contrast, teams are far larger and much more diverse today, with people spread out over geographic areas, operating in a far more complex programming world, interacting with multiple teams, and having many different levels of experience. You can still be successful without code reviews but in far more limited circumstances. Teams in the 80s and 90s tended to be very small due to the inability to manage codebases with more than a handful of people without robust code repositories. My teams for that decade never had more than 4-5 programmers, and we had to be very disciplined to manage our code in a shared folder! We also had to share the code with the primary author of Persuasion, and in the last version of Deltagraph, we shipped the code on hard drives via FedEx. We didn't even have company email until 1991, so only phones could be used to discuss anything remotely. Yet it worked fine because the teams were small and careful.

The first large team I saw was at Apple when they were trying to build a new OS. Around 500 people were contributing to Copland, and it was a complete disaster. People checked in code without any oversight or control, and the project was a nightmare. The tooling to organize such a massive team did not exist, nor did any idea of how to coordinate them all. It's a good thing it was canceled (after I left Developer Support and gave up on Apple), bringing back Steve, and everything changed.

Code reviews, if done correctly, can make teams better. Done poorly or not done at all, they can lead teams into disaster, as everyone assumes the code has been reviewed and lessons learned, yet nothing of the sort happened.

My last employer was an extremely large non-tech company with many divisions; I worked for a high-profile group in one of the largest. We owned the largest vertical piece in our two apps, but many teams contributed to them, including the largest team that managed the mobile app container. Management, particularly project management, insisted that programmers do 40 hours a week of sprint-related work (which is pretty silly). Yet, some teams insisted that everyone also do code reviews on top of that. To do this, you either had to work overtime (or if a contractor is forbidden to work overtime without permission) or make the problem disappear.

Thus enters "Looks Good To Me" or LGTM.

I did not make my team do any required code reviews; however, I also picked everyone on my (small, 3-5) team so I had no inexperienced programmers. Despite being the lead, I was also a full-time programmer on my team, and we all were highly conscious of how we did everything, including asking for reviews if there was any doubt, harkening back to my earlier experience. What we built had extremely high visibility in the company and with our customers, yet we were constantly affected by budgetary constraints; it was either picking a really good small team or getting stuck with an army of cheap contractors. For my team, what we did worked well, and we could do more with fewer people with high-quality results, but this is not appropriate for most situations.

Other teams were much larger, and a mix of programmers with varying experiences, plus many random contractors that came and went (a codebase I inherited was an average of eight programmers over nine months, yet sixty total programmers contributed but left). Due to the large amount of work required and the insistence on spending each programmer's time writing code, code reviewing was reduced to trivialities such as the ubiquitous LGTM. I was mainly asked to review changes that affected my team's responsibilities. The core team would require changes we requested in their codebase to be reviewed, which then would be criticized with pointless comments (like, you need a space before a semi-colon) instead of anything important. Sometimes, we wondered if it was political (company politics can be so much fun).

I occasionally looked at code reviews for other team's changes that eventually became a problem, often crashes that riled up customers and management once they were fixed, to see what happened.

The most egregious example was support that had been added to the iOS app for a set of schedules pushed to the app several times a day. It was "reviewed" by that team and had a positive unit test for code that converted a JSON service response into model objects. The service changed, but no one paid attention; it was never tested in stage at all, and the day it was released to the app store, the crash rate of the app spiked to 50% several times a day! Alarm bells went off everywhere, and all team leads were required to figure out what was wrong; this was made more difficult because the symbol file was never uploaded to the crash reporting service, so we had no idea what was wrong. Once we figured out what the issue was, looking at the PR said it all: LGTM strikes again! The problem was that the programmer (a contractor) had written code that, if the service response did not match expectations, called FatalError(), the one method on iOS that will kill your application. This appeared five times in the PR, and no one had looked at it. The unit test was only positive; a negative test (if the service response was wrong) might have caught the problem. When confronted with the FatalError() debacle, the programmer said, "Well, it would have crashed somewhere else." Sigh.

If a company expects programmers to do code reviews, then it needs to hire more programmers. If each person is doing ten hours properly reviewing the code of others, you need a third more people if you want the same amount of development time. This investment is not an issue for some companies, but not everyone is willing to spend the money. Another consideration is who reviews the code (experience matters); the quality of who writes the code also makes a significant difference, as the likelihood of poorly written code increases. Other factors also include the size of PRs (I've been asked to review changes made by other teams that changed 500 source files, no thank you!), the complexity of changes (I've seen PRs with only a few lines changed that nonetheless would take hours to understand), and unfamiliarly with the codebase, business logic, and algorithms. Many programmers also dislike reviewing code and prefer to work on their own code, and management in some companies may think reviewing code is a waste of time and money. Increasingly larger teams make reviewing necessary, yet may also become ponderous and time-consuming. My small handpicked team was an uncommon option.

Today, code reviewing is well supported by technology (maybe with AI, it can be made easier, but I am not sure if I would trust it at this point), and given the complex nature of programming today, it is more often than not a benefit, but only if it is done correctly and with sufficient time and investment to accomplish that. Doing it poorly or suffering from "Looks Good To Me" syndrome may be worse than doing no reviews, as people will assume everything is fine when it is not.