The Best Code I Ever Wrote (Part 1 of 2)
May 26, 2016

This project was the most pressure-filled thing I ever did, yet in many ways it was also the most fun. In the end it was also the best code I ever wrote.
In 1999 or so I was working at a consulting firm on an application for the US Post Office, and my officemate was assigned to a big new project for a popular magazine. I didn't pay much attention at first, since I was trying to avoid going postal (USPS was fairly painful to work with). The magazine had promised the previous fall that in one year it would have a fabulous new website that would let consumers search its massive consumer product catalog, and especially a database of every make, model, package and feature from most of the world's automobile manufacturers. It was a bold promise.
I knew it was a big deal since I had heard the magazine had hired another consulting firm, which worked on the project for six months before giving up and telling the magazine that what they wanted to do was impossible. Of course our CEO immediately said we could do it, and a team started from scratch. The magazine had all the data in raw form. We (as in the team, not me; I was still postalized) built a database and a data entry system so that an army of 20 people could input and categorize the data.
Work also started on the website itself, and the firm hired a supposed search-engine expert to build the crucial engine. Strangely enough he lived and worked on a mountaintop in Colorado or Wyoming (I forget which). We also bought the hardware (3 servers from HP) and were going to host it in our server room (no cloud back then).
So about 3 months in, with less than 2 months to go before the promised date, panic began to set in, and I heard lots of discussions and concern. The web application was built with WebObjects (Apple's by then, though it had started at NeXT) and Enterprise Objects, the database was Oracle, and everything ran on HP-UX. The search engine, like the app, was written in Objective-C, and had been delivered, installed in the application, and put on the production hardware.
It was slow. Not just a little slow: 70-seconds-per-request slow. Apparently the "expert" had written the engine against just a few rows of test data. Once it was exposed to the real database, which had a lot of data and especially all the automotive data, it was dreadfully slow.
Of course our DBA and architect and everyone else not tied down started to investigate, especially the database query, which ran 4 printed sheets of paper long (I know because the DBA stapled it to his office door). Nothing anyone could think of helped. No amount of indexing made any difference. People were beginning to worry that maybe it really was impossible, just as the other firm had concluded.
I remember asking my officemate, the architect, to let me take a look. My project was at a slow point and I thought it might be useful to help out.
At this point I knew basically nothing, which I think might have been a big advantage. So someone got me access to the database and gave me a brief overview, and of course I could play with the app itself.
What I learned first was how automobile manufacturers put together vehicles. There were individual features, from vanity mirrors to heated seats to engines, assembled into various packages. Each vehicle started with a base model to which you could apply packages. The packages themselves had rules for combinations: some packages required others (and), possibly from a set of choices (or), and some packages could not be combined with others (not). For example, a package with an engine could not be combined with another package containing an engine, and heavy-duty towing might require a bigger engine from some other package. The magazine had organized the data across all the manufacturers with a categorization scheme that would allow search and display, so that the user could search for any combination of features, model types and cost across all manufacturers. This also worked with non-automobiles, but that wasn't all that exciting. Automobiles were the killer feature.
Now the data in the database contained the features individually from each manufacturer, combined into packages; the rules were encoded into the database and linked, through a complex web of categories, to the vehicles they applied to across a manufacturer's models. Once I stared at it for a bit I realized the data was a poor match for a relational database: there was no relation that took you from a feature to a vehicle model. It was more a description of how to derive the actual data than the data itself. The database schema had been built in a hurry, and no one had really taken the time to understand the nuances of the problem since time was so short. It might have been possible to model the data better, but by this time an army of people had already entered it into the current database and there was no time to do it again.
So while everyone else kept hammering on the existing code, I began by extracting the data from the database so I could see how I might model it differently. Of course I could not change the database, since everything else was complete, so the solution would have to be something else.
Now, prior to working for this firm I had written and sold a commercial memory allocator and heap debugging library called HeapManager for MacOS (prior to OS X). So I figured maybe I could organize the data in memory and search that, instead of going to the database directly.
At this point no one expected much of me. There were only 4 weeks to go before beta and 6 before the promised date.
So I wrote an extractor to pull all the relevant searchable data out of the database, along with the encoded rules that described how the packages could be combined. The search feature of the app would let the user pick from categories of features and other criteria, like 2-door vehicles or trucks, and specify a target price. The results would be a list of models across all the manufacturers that could be customized with any set of packages that was legal to combine. The idea was that you could look for whatever combination of things you wanted, and it would figure out what an auto manufacturer could custom build for you. If this worked, it was something even the auto manufacturers didn't have, as far as I knew.
I saw that the package rules formed a sort of tree where the relationships to the children were 'and', 'or' and 'not'. Picking a package, you could traverse its children, but you had to follow the boolean type: if the child was an 'and' you had to move down into it, if the child was part of an 'or' you had to move down each of its siblings in turn, and if the child was a 'not' you could stop. Each package's children described all the potential and required child packages, until you exhausted all the possibilities. So starting at a feature like a heated steering wheel, you could find all the packages that included it, traverse the children of each recursively, and eventually find all the models connected to those packages.
So for example: feature a was in packages b and c; b required one of d, e or f; c required g but not f; d required h; e required i; you get the picture. And the same feature likely existed at every manufacturer, so it was a huge forest.
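That traversal can be walked mechanically. Here is a minimal sketch in Python (the real engine was written in Objective-C, and its actual data layout is long gone) of the and/or/not expansion, using the hypothetical package names from the example above; everything here is illustrative, not the original code:

```python
# Hypothetical encoding of the example: each package lists its 'and'
# requirements, its 'or' groups (pick one member), and its 'not' exclusions.
PACKAGES = {
    "b": {"and": [], "or": [["d", "e", "f"]], "not": []},
    "c": {"and": ["g"], "or": [], "not": ["f"]},
    "d": {"and": ["h"], "or": [], "not": []},
    "e": {"and": ["i"], "or": [], "not": []},
    "f": {"and": [], "or": [], "not": []},
    "g": {"and": [], "or": [], "not": []},
    "h": {"and": [], "or": [], "not": []},
    "i": {"and": [], "or": [], "not": []},
}

FEATURE_TO_PACKAGES = {"a": ["b", "c"]}  # feature 'a' appears in b and c

def expand(pkg, chosen, excluded):
    """Yield every legal (chosen, excluded) closure reachable from pkg."""
    if pkg in excluded:
        return                      # a 'not' rule forbids this branch
    if pkg in chosen:
        yield chosen, excluded      # already satisfied by a shared dependency
        return
    rules = PACKAGES[pkg]
    states = [(chosen | {pkg}, excluded | set(rules["not"]))]
    # Every 'and' child must be satisfied in every surviving state.
    for req in rules["and"]:
        states = [s for c, e in states for s in expand(req, c, e)]
    # Each 'or' group multiplies the states by its alternatives.
    for group in rules["or"]:
        states = [s for c, e in states
                    for alt in group
                    for s in expand(alt, c, e)]
    yield from states

def combos_for_feature(feature):
    """All legal package combinations that include the given feature."""
    combos = set()
    for pkg in FEATURE_TO_PACKAGES[feature]:
        for chosen, excluded in expand(pkg, frozenset(), frozenset()):
            if not chosen & excluded:
                combos.add(chosen)
    return combos
```

Run on feature a, this yields the four legal closures from the example: {b, d, h}, {b, e, i}, {b, f}, and {c, g}.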
So now I had a good idea of what I needed to figure out. Given a set of features and other restrictions (like pickup truck, or 4 doors, or under $20,000), could I come up with results for every conceivable query in no more than a second?
Briefly I thought of generating every possible vehicle from the data, but the combinations were simply too large, and I knew we were limited to just the hardware the customer had authorized. I also didn't feel confident I could verify I was actually generating them all.
Today, of course, precomputing everything would be the correct answer! But not in our situation, and in 1999 the options were pretty limited.
At this point I was fairly confident I understood the problem. I then analyzed the existing search engine (still chugging along at almost a minute per query) and realized it was not only stupidly slow, it was also very wrong. By hand-tracing my data I could see that its answers missed a lot of correct results and also included a lot of completely bogus ones. There was no fixing it.
So I presented my analysis to the team, which was desperate enough to tell me to go ahead and see if I could build a replacement while they continued trying to fix the original. By this point I had 3 weeks left.
To be continued in the next post! Yeah, a cliffhanger.