Home About The Codist RSS Feed

Writing Multithreaded Code Is Like Juggling Chainsaws
Feb 05, 2008 19:24 perm link Readers: 12639

Writing multithreaded code is like juggling chainsaws; amazing when it works and truly sucky when it doesn't.

Right now at my job I am writing the foundation for a transaction processing cluster in Java, so I'm immersed in lots and lots of threads and interacting applications. When you are processing 8000 of something per second, any problems in your approach or in your choice of frameworks is magnified.

In job interviews, a popular question is "what is the major problem you have to solve in writing multithreaded code?" Generally, if they have read a little about it, they often say "avoiding deadlocks". If they have done a bit of thread coding, maybe in Swing, they might say "protected shared data". Only the truly experienced in complex threaded coding will say "avoiding doing nothing".

What's so bad about nothing? Assuming you can avoid deadlocks (generally not hard if you're disciplined) and understand when to protect data, when working in a complex high speed system you want to accomplish both of those basic requirements without slowing down the real work your threads are doing. In a Swing app, so a couple threads block for a while, or even less of a problem is a few waiting around for something to happen. In a large cluster of servers, having threads sitting around doing nothing is a waste of money. If Google is processing millions of searches daily, and half of their cpu's capacity is basically blocked waiting for some resource, it costs a lot of money and power: instead of 50,000 servers you need 100,000. That a hefty price to pay for poor thread coding.

Sure, my company will probably only use 100, and they could afford 200 servers, but why waste resources just to save a few brain cells.

For example, one of my trials used Jetty on one application, and Apache HttpClient on another. With two worker threads on the HttpClient I was able to process my test transactions. When I increased the worker threads to 7, they fell behind; at 10 it was almost dead in the water. WTF? Doing some debugging (with Yourkit Profiler, very nice) the Jetty side was mostly sitting around waiting, but the worker threads in the other app were mostly blocking (as much of 95% of the cpu time). Ah the joys of threaded coding.

The issue turned out to be a synchronized method in HttpClient that deals with Http Parameters, for each parameter for every call on every thread it would pass through this method, creating a common sync point for all the threads. Something that might seem OK for a couple threads was fatal with 10 threads running full tilt at 1000 somethings per second. The chainsaws can be really brutal.

I finally settled on a couple of technologies that work pretty well, on my developer PC (4 core Core Duo something) I was handling 8000 somethings per second at about 85% CPU.

That's another sign of properly avoided nothing, if you slam your creation with unfettered requests and the CPU rises close to 80-95% then you are seeing proper behavior. In my HTTPClient test the CPU rarely got above 25%; mostly a lot of nothing was going on. This was clear with Yourkit. Nothing is not your friend. Don't stop coding when nothing is broken; that's only step one.

Multithreaded coding in a complex application requires a lot of discipline (never assume anything, test everything), experimentation (there is no one way to make it work) and patience (sometimes lots of negative problems lead you to see the solution). It's not easy and it's not quick (never mind the project manager).

Then again neither is juggling chainsaws; plus the downside is pretty nasty.

My Tags:

  • Daniel: Feb 05, 2008 21:39

    "Don't stop coding when nothing is broken; that's only step one."

    That's awesome, I think I'll add that one to my fortune quotes list.

    Ps

    s/What's do bad about/What's so bad about/;

  • AllenJB: Feb 06, 2008 01:45

    Your link & link text are the wrong way round for the YourKit link

  • Tony: Feb 06, 2008 05:36

    On our big, custom data analysis system, we tried to do the whole "framework" and "threads" thing in C++. We wrote about 75K SLOC and it failed utterly due to a host of issues related to timing, overall complexity and lack of design/testing discipline (as you rightly point out). So we threw it all away, went back to the Unix philosophy of "libraries" and "small, single-purpose executables" and "event loops" in C. Wow, what a difference! If you can solve a problem using an "old" technology approach, do yourself a favor and use it.

    I swear I will never work on a project where the words threads, dead-lock, mutex or timing-issues are uttered in meetings again... To butcher a Jamie Zawinski quote: "Some people, when confronted with a problem, think 'I know, I'll use a threaded framework.' Now they have two problems." ;)

    Ha! Love the article.

    Tony

  • Stu Smith: Feb 06, 2008 07:28

    "Writing multithreaded code is like juggling chainsaws; amazing when it works and truly sucky when it doesn't."

    I absolutely agree. What amazes me is that the suggested techniques for dealing with it are exactly the same as they were ten years ago. We seem to have made no practical progress in this area. There are some promising (experimental) techniques for dealing with the problem, but they've been ignored in every modern framework. 'Threading support - stick in a mutex class and let's go to the pub'.

    I'm going to take exception with one paragraph; admittedly I don't know much about your systems, but:

    "That's another sign of properly avoided nothing, if you slam your creation with unfettered requests and the CPU rises close to 80-95% then you are seeing proper behavior. In my HTTPClient test the CPU rarely got above 25%; mostly a lot of nothing was going on."

    Seems wrong to me. Surely anything to do with the network should sit at low CPU levels, it not being a CPU-bounded task? And high CPU levels are not necessarily a sign that all is well. Surely a better metric might be something like "double the number of CPUs, and see a (nearly) 2x improvement in speed"? (Assuming, as I said, it's a CPU-bound operation).

    Incidentally, once experimental technique that caught my eye (software transactional memory) I wrote up into actual working code... if you're interested:

    http://www.feedghost.com/Blogs/BlogEntry.aspx?EntryId=17791

    Stu

  • a: Feb 06, 2008 08:05

    Uhhh, okay, so the synchronized block created a constrained resource.

    That all the other threads were blocked from using, since one thread was probably hogging it.

    While not permanent deadlock, that is basically deadlock, not doing nothing. It's a constrained resource that one thread was holding to the exclusion of the others.

    An interesting story though, I wonder why the parameter processing was synchronized. Pretty crappy design if you ask me...

  • Ed: Feb 06, 2008 08:13

    "Seems wrong to me. Surely anything to do with the network should sit at low CPU levels, it not being a CPU-bounded task?"

    It depends on how big your requests are. If your hitting the server with alot of small requests (<1kb lets say) then it probably be more CPU bound then network bound. Using 1Kb requests size as an example, if the server is processing 8000 requests a second, then that's only 8 MB/s transfer, which isn't a considerable amount of bandwidth. And for the server I'm wrote and am current maintaining for works requests are actually smaller then 100 bytes, an order of magnitude smaller.

  • Michael Davis: Feb 06, 2008 08:21

    "Seems wrong to me. Surely anything to do with the network should sit at low CPU levels..."

    I guess it depends on what the application is doing. If it's doing serious encryption or advanced mathematical transformations, it may well be CPU-bound. If it's doing database searching, it will likely be IO-bound.

  • Stephane Grenier: Feb 06, 2008 08:24

    I love your opening line: "Writing multithreaded code is like juggling chainsaws; amazing when it works and truly sucky when it doesn't." It's perfect!

    I remember writing an application a while back where we were trying to surf the net, just the first page of every website, parse it, and store some resulting calculations in a database for research purposes. Avoiding blocking was pretty simple. Even avoiding data issues wasn't too hard. The really hard part, as you stated, was unused processing time. How do you max out each thread to it's max potential. Not CPU processing, but effectiveness. This took quite some time.

    For example for us, sometimes a thread would have to wait for a website to return. Not all servers have good response times. This is dead time for a thread. We could increase the thread count, but then there's more process swapping. Figuring this out, and maximizing the effectiveness took a good deal of effort. And it wasn't until we got into these optimizations that we started to face harder deadlocking/data issues. Before we optimized our world was much simpler. Optimizing to avoid deadtime significantly increased our threads.

    The good news is that we went from surfing 100k sites to 4 million on the same hardware. A very significant gain!

    Great article!

  • Domagoj Klepac: Feb 06, 2008 08:39

    So how did you solve your HTTPClient params problem? Because I have the same problem, and have to set HTTP params for each request. :)

  • codist: Feb 06, 2008 09:11

    I wound up using Apache Mina and a custom protocol over plain sockets instead of HTTP. THere is probably some way with HTTPClient to avoid sharing the parameter issue (I didn't go further with it, but I am sure you can have a separate state for each Thread).

  • Domagoj Klepac: Feb 06, 2008 09:59

    Wow, didn't know about Apache Mina, looks interesting. Thanks for response, seems that I'll have to dig into HTTPClient...

  • Aminorex: Feb 06, 2008 11:14

    One has to ask why you keep juggling chainsaws. It seems a remarkably unclever thing to do. I have to guess that you are not getting the kind of risk-reward that the Brothers Karamazov got during their extended stints in Vegas. It's lovely to know that you've learned to do it well enough so as to minimize finger reattachment surgeries, but even so...

  • jc: Feb 06, 2008 11:15

    I can truly sympathize with you. I gained much (experience) and lost some (hair) in writing threaded code. Its akin to being a babysitter with infinite babies to "sit". It was also fun.

  • JC: Feb 06, 2008 14:58

    I feel your pain (and joy)! I wrote a replacement for Java RMI with: resizable thread pools (while the system is processing requests!), controlled timeouts, separate thread allotments based on the type of workload, O(1) thread allocation/deallocation, pluggable transports, asynchronous calls, controlled degradation (server busy messages anyone?) and denial-of-service protection.

    It was one of the most painful and gratifying projects I've ever worked with. I still think of it as one of the pieces of work that make me proudest (is that even a word?).

    I really enjoyed your article, keep'em coming!

  • Imp: Feb 07, 2008 13:01

    I agree that it's bad if the threads spend too much time blocked. It's an indication that your locks aren't granular enough (e.g. you should look at locking a row rather than the entire table). However, just having the CPU rise with load is not indication that you've gotten it right either. In general more granularity means more concurrent threads, but if your locks are too granular then you can waste a lot of cycles locking and unlocking unnecessarily (e.g. you should consider locking a page rather than an single row).

    One good indication that you've gotten the granularity right is how much contention there is for locks. If an average lock takes 1ms to get, and is held for 100ms then if you don't have contention for 1% of locks then you can probably get better performance by having less granularity in your locks (even though it might mean that the CPU isn't pegged).

  • Steve Campbell: Feb 08, 2008 11:15

    There are some API patterns that can insulate from many of the complexities, at the expense of using thread-pools. Working with individual threads is painful in comparison. Kindof like programming in Assembly vs C, or C vs Java (similar performance arguments apply).

    Some examples... the BackgroundWorker component in .NET. Or even the basic .StartSomething + .EndSomething (callback) pattern common in many .NET libraries. (Sorry, I do not know Java examples).

  • Add Comment

Kindergarten, The Digg Effect, Shared Hosting and Traffic Rankings
Apr 09, 2007 09:40 perm link Readers: 1038

Hows that for a complicated title?

I posted All I Need To Know To Be A Better Programmer I Learned In Kindergarten last Thursday morning, and watched it do absolutely nothing that day (600 readers or so, way below an average day).

The next day it was added to Reddit and during the day had about 6,000 readers (good, but no record). That night I went to Good Friday services and shared a meal with friends. Before going to bed (late) I check the stats and noticed 2,000 people had read the article in the last hour. I saw that it had 69 diggs so I visited Digg to see where it was. It turned out it was #2 on Digg. This was the first time one of my articles went anywhere on Digg (usually Reddit is my main traffic site). In the next two days I had nearly 40,000 readers and over 1,000 diggs.

This site runs on a shared host on linux (some kind of mid-level single CPU Athlon box) at kattare.com with apache on the front end passing all traffic to my jetty server, which runs my java blog application (BlogFiche) on top of my own web application framework (Fiche) using H2 as the database. This instance of jetty is also running some other (lower traffic) sites. I wasn't remotely afraid of the "Digg Effect" since I knew exactly how much traffic this platform was designed for. I wouldn't get worried until traffic hit 5 per second or so sustained (that's 18K+ per hour). The max I saw was probably around 3K per hour, so the system was still mostly idle.

My site caches all generated page content and only regenerates it if something changes (I edit the site or a comment is posted). All static content is served via the jetty default servlet, which also does caching. Jetty itself uses Java NIO (non-blocking IO), so it's extremely efficient at serving web content. So in effect the only thing happening on 99.9% of all connections is IO. Digged? No problem.

Of course I wouldn't advertise myself as knowing and solving performance issues if my own site performance sucked. Seeing blogs getting digged and die is a little irritating to people; I can understand a graphics or video site getting swamped, but a simple blog shouldn't have any trouble with 1 connection per second (3,600 per hour).

The funny thing about the traffic is the difference between Google and Alexa. Even though I had 40,000 readers for this one article, my Alexa ranking dropped a lot over the same timeframe. Google's index is filled with references (and outright copies!) of the article, but Alexa shows nothing happened at all. Maybe fewer people are using the Alexa toolbar these days.

I tend to look at my adgridwork account for quick updates to my traffic. Although I am collecting some stats I have been lazy to update the admin portion of this blog so I can see (and show) actual counts on the articles.

It was pretty cool (and humbling) that so many people read the article from all over the world. The web is an amazing place.

My Tags:

  • codist: Apr 09, 2007 11:45

    Another blog posted a similar article at Surviving the digg effect. Unfortunately he hit the digg front page twice at the same time and it leveled his server. Ouch.

  • Dan: Apr 12, 2007 13:26

    Congratulations! It was a great article!

  • Add Comment

Links: Yahoo On Web/Ajax Peformance
Apr 02, 2007 08:07 perm link Readers: 701

Found this series of articles on the yahoo UI blog. Some interesting stuff. Pardon if you all read this already. There is so much information on the web it's easy to miss it the first time around.

Performance Research Part 1: What the 80/20 Rule Tells Us about Reducing HTTP Requests

Performance Research Part 2: Browser Cache Usage - Exposed!

Performance Research Part 3: When the Cookie Crumbles

My Tags:

Name:


Optional URL:


Comment:


Save Cancel

Copyright © 2007 By Andrew Wulf