I admit I love shopping at Amazon: they sell virtually everything, and you can order it online and get it quickly. I also admit I hate the amazon.com website. There is simply too much crap on every page, finding stuff can be tedious, and you can easily get overwhelmed with too much information. A sample page I clocked with Firebug had 43 images and took 330K of network bandwidth to display.
So I decided to start building an alternative interface. With Amazon's web interfaces you can pull down almost everything they know about everything they sell, and then some. There is a big catch, though: you can only call them once per second per IP (more or less), to keep from overwhelming their servers. Of course, you could set up a server farm with dozens of IPs, each accessing them only once per second, so I'm not sure how meaningful the limit is. You could even use Amazon's EC2 service to set up the farm.
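For what it's worth, staying under a once-per-second limit from a single machine is simple. Here is a minimal sketch in Python, assuming the limit really is one call per second; the `Throttle` class and the `fetch` callable are hypothetical names for illustration and are not part of any Amazon library.

```python
import time


class Throttle:
    """Space calls at least `min_interval` seconds apart (hypothetical helper)."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous call, None before the first one

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(urls, fetch, throttle):
    """Call `fetch(url)` for each URL, pausing between calls as needed.

    `fetch` is an assumed callable supplied by the caller, for example
    a thin wrapper around an HTTP GET against the REST interface.
    """
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results
```

Calling `fetch_all(urls, fetch, Throttle(1.0))` would then issue at most one request per second. A farm of machines would simply run one such loop per IP.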
So if you have only one IP (like I have at home), you can make about 86,400 API calls (in my case via REST) per day. I am currently fetching the entire catalog tree, and will then fetch the entire product list for each node in the tree. They only return 10 items per call, so the most I can get is 864,000 or so products per day. I imagine I will have to choose only certain stores to keep this reasonable and be able to keep it fresh. If this idea gets traction, I can set up a farm and make this faster.
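The arithmetic behind those daily figures is easy to check. The sketch below assumes exactly one call per second per IP and 10 items per call, as described above; the catalog size passed to `days_to_fetch` is a made-up number for illustration only, since I don't know how many products there really are.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
CALLS_PER_SECOND = 1             # assumed limit per IP
ITEMS_PER_CALL = 10              # items returned by each call


def daily_budget(ips=1):
    """Return (calls per day, items per day) for the given number of IPs."""
    calls = SECONDS_PER_DAY * CALLS_PER_SECOND * ips
    return calls, calls * ITEMS_PER_CALL


def days_to_fetch(total_items, ips=1):
    """Whole days needed to fetch `total_items` products at full speed."""
    _, items_per_day = daily_budget(ips)
    return math.ceil(total_items / items_per_day)


print(daily_budget())                      # (86400, 864000)
print(days_to_fetch(10_000_000))           # 12, for a hypothetical 10M-item catalog
print(days_to_fetch(10_000_000, ips=12))   # 1
```

This counts only the product-list calls. Walking the catalog tree itself and re-fetching items to keep them fresh would eat into the same budget.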
I wish Amazon had a dump API like Wikipedia's.
You might wonder why I don't do what other Amazon web services users do and grab information via the APIs only as needed. The main reason is that I want to build a different kind of search technology than what Amazon natively offers, and for that I need the complete data for each item. Their search is extensive, but like so many search APIs it returns way too many useless results, and unless you ask a precise question you may miss many possible matches. I call these (highly untechnically) "useless hits and missing bits". More than 10 years ago I came up with a different way to find stuff in a large collection but never pursued it, and this seems like an interesting application and test case. Ever since the work I did on the Consumer's Digest website in 1998 (documented in a post), I've wanted to get back into search, if only part time.
Building a better way to browse and find items in Amazon is not easy, but the main issues are not technical (other than overcoming the API limitations); they are user experience issues. I have been working on UI (to use the old terminology) designs for most of my career, so it's not a stretch. Can I build something people would use (or, for that matter, Amazon itself could use) as an alternative? Part of my goal is to blend (mash up) an alternative experience with other sources of information, like Wikipedia and web searches, assuming I can add value without complicating the design and winding up back where amazon.com is today.
The beauty of the web today is that there are so many opportunities to explore, given the wealth of public APIs, sources of information, and affiliate programs that provide raw materials for new ideas. Combine that with a lot of new languages, frameworks and technologies to build with, and it's no wonder you read every day of something new being released. Of course, in a Darwinian environment a lot of stuff winds up in the deadpool, but that's the beauty of experimentation. There was a time when eBay, Amazon and Google were startups with an uncertain future.
So while I work on starting up my consulting business (coming soon), I will be spending some time working on the "Amazon Experiment".
At the current rate of 1 call per second, I should be ready with data sometime next century, so there's plenty of time.