Home About The Codist RSS Feed

Writing Multithreaded Code Is Like Juggling Chainsaws
Feb 05, 2008 19:24 perm link Readers: 12638

Writing multithreaded code is like juggling chainsaws; amazing when it works and truly sucky when it doesn't.

Right now at my job I am writing the foundation for a transaction processing cluster in Java, so I'm immersed in lots and lots of threads and interacting applications. When you are processing 8000 of something per second, any problems in your approach or in your choice of frameworks is magnified.

In job interviews, a popular question is "what is the major problem you have to solve in writing multithreaded code?" Generally, if they have read a little about it, they often say "avoiding deadlocks". If they have done a bit of thread coding, maybe in Swing, they might say "protected shared data". Only the truly experienced in complex threaded coding will say "avoiding doing nothing".

What's so bad about nothing? Assuming you can avoid deadlocks (generally not hard if you're disciplined) and understand when to protect data, when working in a complex high speed system you want to accomplish both of those basic requirements without slowing down the real work your threads are doing. In a Swing app, so a couple threads block for a while, or even less of a problem is a few waiting around for something to happen. In a large cluster of servers, having threads sitting around doing nothing is a waste of money. If Google is processing millions of searches daily, and half of their cpu's capacity is basically blocked waiting for some resource, it costs a lot of money and power: instead of 50,000 servers you need 100,000. That a hefty price to pay for poor thread coding.

Sure, my company will probably only use 100, and they could afford 200 servers, but why waste resources just to save a few brain cells.

For example, one of my trials used Jetty on one application, and Apache HttpClient on another. With two worker threads on the HttpClient I was able to process my test transactions. When I increased the worker threads to 7, they fell behind; at 10 it was almost dead in the water. WTF? Doing some debugging (with Yourkit Profiler, very nice) the Jetty side was mostly sitting around waiting, but the worker threads in the other app were mostly blocking (as much of 95% of the cpu time). Ah the joys of threaded coding.

The issue turned out to be a synchronized method in HttpClient that deals with Http Parameters, for each parameter for every call on every thread it would pass through this method, creating a common sync point for all the threads. Something that might seem OK for a couple threads was fatal with 10 threads running full tilt at 1000 somethings per second. The chainsaws can be really brutal.

I finally settled on a couple of technologies that work pretty well, on my developer PC (4 core Core Duo something) I was handling 8000 somethings per second at about 85% CPU.

That's another sign of properly avoided nothing, if you slam your creation with unfettered requests and the CPU rises close to 80-95% then you are seeing proper behavior. In my HTTPClient test the CPU rarely got above 25%; mostly a lot of nothing was going on. This was clear with Yourkit. Nothing is not your friend. Don't stop coding when nothing is broken; that's only step one.

Multithreaded coding in a complex application requires a lot of discipline (never assume anything, test everything), experimentation (there is no one way to make it work) and patience (sometimes lots of negative problems lead you to see the solution). It's not easy and it's not quick (never mind the project manager).

Then again neither is juggling chainsaws; plus the downside is pretty nasty.

My Tags:

  • Daniel: Feb 05, 2008 21:39

    "Don't stop coding when nothing is broken; that's only step one."

    That's awesome, I think I'll add that one to my fortune quotes list.

    Ps

    s/What's do bad about/What's so bad about/;

  • AllenJB: Feb 06, 2008 01:45

    Your link & link text are the wrong way round for the YourKit link

  • Tony: Feb 06, 2008 05:36

    On our big, custom data analysis system, we tried to do the whole "framework" and "threads" thing in C++. We wrote about 75K SLOC and it failed utterly due to a host of issues related to timing, overall complexity and lack of design/testing discipline (as you rightly point out). So we threw it all away, went back to the Unix philosophy of "libraries" and "small, single-purpose executables" and "event loops" in C. Wow, what a difference! If you can solve a problem using an "old" technology approach, do yourself a favor and use it.

    I swear I will never work on a project where the words threads, dead-lock, mutex or timing-issues are uttered in meetings again... To butcher a Jamie Zawinski quote: "Some people, when confronted with a problem, think 'I know, I'll use a threaded framework.' Now they have two problems." ;)

    Ha! Love the article.

    Tony

  • Stu Smith: Feb 06, 2008 07:28

    "Writing multithreaded code is like juggling chainsaws; amazing when it works and truly sucky when it doesn't."

    I absolutely agree. What amazes me is that the suggested techniques for dealing with it are exactly the same as they were ten years ago. We seem to have made no practical progress in this area. There are some promising (experimental) techniques for dealing with the problem, but they've been ignored in every modern framework. 'Threading support - stick in a mutex class and let's go to the pub'.

    I'm going to take exception with one paragraph; admittedly I don't know much about your systems, but:

    "That's another sign of properly avoided nothing, if you slam your creation with unfettered requests and the CPU rises close to 80-95% then you are seeing proper behavior. In my HTTPClient test the CPU rarely got above 25%; mostly a lot of nothing was going on."

    Seems wrong to me. Surely anything to do with the network should sit at low CPU levels, it not being a CPU-bounded task? And high CPU levels are not necessarily a sign that all is well. Surely a better metric might be something like "double the number of CPUs, and see a (nearly) 2x improvement in speed"? (Assuming, as I said, it's a CPU-bound operation).

    Incidentally, once experimental technique that caught my eye (software transactional memory) I wrote up into actual working code... if you're interested:

    http://www.feedghost.com/Blogs/BlogEntry.aspx?EntryId=17791

    Stu

  • a: Feb 06, 2008 08:05

    Uhhh, okay, so the synchronized block created a constrained resource.

    That all the other threads were blocked from using, since one thread was probably hogging it.

    While not permanent deadlock, that is basically deadlock, not doing nothing. It's a constrained resource that one thread was holding to the exclusion of the others.

    An interesting story though, I wonder why the parameter processing was synchronized. Pretty crappy design if you ask me...

  • Ed: Feb 06, 2008 08:13

    "Seems wrong to me. Surely anything to do with the network should sit at low CPU levels, it not being a CPU-bounded task?"

    It depends on how big your requests are. If your hitting the server with alot of small requests (<1kb lets say) then it probably be more CPU bound then network bound. Using 1Kb requests size as an example, if the server is processing 8000 requests a second, then that's only 8 MB/s transfer, which isn't a considerable amount of bandwidth. And for the server I'm wrote and am current maintaining for works requests are actually smaller then 100 bytes, an order of magnitude smaller.

  • Michael Davis: Feb 06, 2008 08:21

    "Seems wrong to me. Surely anything to do with the network should sit at low CPU levels..."

    I guess it depends on what the application is doing. If it's doing serious encryption or advanced mathematical transformations, it may well be CPU-bound. If it's doing database searching, it will likely be IO-bound.

  • Stephane Grenier: Feb 06, 2008 08:24

    I love your opening line: "Writing multithreaded code is like juggling chainsaws; amazing when it works and truly sucky when it doesn't." It's perfect!

    I remember writing an application a while back where we were trying to surf the net, just the first page of every website, parse it, and store some resulting calculations in a database for research purposes. Avoiding blocking was pretty simple. Even avoiding data issues wasn't too hard. The really hard part, as you stated, was unused processing time. How do you max out each thread to it's max potential. Not CPU processing, but effectiveness. This took quite some time.

    For example for us, sometimes a thread would have to wait for a website to return. Not all servers have good response times. This is dead time for a thread. We could increase the thread count, but then there's more process swapping. Figuring this out, and maximizing the effectiveness took a good deal of effort. And it wasn't until we got into these optimizations that we started to face harder deadlocking/data issues. Before we optimized our world was much simpler. Optimizing to avoid deadtime significantly increased our threads.

    The good news is that we went from surfing 100k sites to 4 million on the same hardware. A very significant gain!

    Great article!

  • Domagoj Klepac: Feb 06, 2008 08:39

    So how did you solve your HTTPClient params problem? Because I have the same problem, and have to set HTTP params for each request. :)

  • codist: Feb 06, 2008 09:11

    I wound up using Apache Mina and a custom protocol over plain sockets instead of HTTP. THere is probably some way with HTTPClient to avoid sharing the parameter issue (I didn't go further with it, but I am sure you can have a separate state for each Thread).

  • Domagoj Klepac: Feb 06, 2008 09:59

    Wow, didn't know about Apache Mina, looks interesting. Thanks for response, seems that I'll have to dig into HTTPClient...

  • Aminorex: Feb 06, 2008 11:14

    One has to ask why you keep juggling chainsaws. It seems a remarkably unclever thing to do. I have to guess that you are not getting the kind of risk-reward that the Brothers Karamazov got during their extended stints in Vegas. It's lovely to know that you've learned to do it well enough so as to minimize finger reattachment surgeries, but even so...

  • jc: Feb 06, 2008 11:15

    I can truly sympathize with you. I gained much (experience) and lost some (hair) in writing threaded code. Its akin to being a babysitter with infinite babies to "sit". It was also fun.

  • JC: Feb 06, 2008 14:58

    I feel your pain (and joy)! I wrote a replacement for Java RMI with: resizable thread pools (while the system is processing requests!), controlled timeouts, separate thread allotments based on the type of workload, O(1) thread allocation/deallocation, pluggable transports, asynchronous calls, controlled degradation (server busy messages anyone?) and denial-of-service protection.

    It was one of the most painful and gratifying projects I've ever worked with. I still think of it as one of the pieces of work that make me proudest (is that even a word?).

    I really enjoyed your article, keep'em coming!

  • Imp: Feb 07, 2008 13:01

    I agree that it's bad if the threads spend too much time blocked. It's an indication that your locks aren't granular enough (e.g. you should look at locking a row rather than the entire table). However, just having the CPU rise with load is not indication that you've gotten it right either. In general more granularity means more concurrent threads, but if your locks are too granular then you can waste a lot of cycles locking and unlocking unnecessarily (e.g. you should consider locking a page rather than an single row).

    One good indication that you've gotten the granularity right is how much contention there is for locks. If an average lock takes 1ms to get, and is held for 100ms then if you don't have contention for 1% of locks then you can probably get better performance by having less granularity in your locks (even though it might mean that the CPU isn't pegged).

  • Steve Campbell: Feb 08, 2008 11:15

    There are some API patterns that can insulate from many of the complexities, at the expense of using thread-pools. Working with individual threads is painful in comparison. Kindof like programming in Assembly vs C, or C vs Java (similar performance arguments apply).

    Some examples... the BackgroundWorker component in .NET. Or even the basic .StartSomething + .EndSomething (callback) pattern common in many .NET libraries. (Sorry, I do not know Java examples).

  • Add Comment

Name:


Optional URL:


Comment:


Save Cancel

Copyright © 2007 By Andrew Wulf