The Codist - Programmerthink

Writing Multithreaded Code Is Like Juggling Chainsaws

Posted: 02/05/2008, Perm Link Readers: 21223


Writing multithreaded code is like juggling chainsaws; amazing when it works and truly sucky when it doesn't.

Right now at my job I am writing the foundation for a transaction processing cluster in Java, so I'm immersed in lots and lots of threads and interacting applications. When you are processing 8000 of something per second, any problems in your approach or in your choice of frameworks is magnified.

In job interviews, a popular question is "what is the major problem you have to solve in writing multithreaded code?" Generally, if they have read a little about it, they often say "avoiding deadlocks". If they have done a bit of thread coding, maybe in Swing, they might say "protected shared data". Only the truly experienced in complex threaded coding will say "avoiding doing nothing".

What's so bad about nothing? Assuming you can avoid deadlocks (generally not hard if you're disciplined) and understand when to protect data, when working in a complex high speed system you want to accomplish both of those basic requirements without slowing down the real work your threads are doing. In a Swing app, so a couple threads block for a while, or even less of a problem is a few waiting around for something to happen. In a large cluster of servers, having threads sitting around doing nothing is a waste of money. If Google is processing millions of searches daily, and half of their cpu's capacity is basically blocked waiting for some resource, it costs a lot of money and power: instead of 50,000 servers you need 100,000. That a hefty price to pay for poor thread coding.

Sure, my company will probably only use 100, and they could afford 200 servers, but why waste resources just to save a few brain cells.

For example, one of my trials used Jetty on one application, and Apache HttpClient on another. With two worker threads on the HttpClient I was able to process my test transactions. When I increased the worker threads to 7, they fell behind; at 10 it was almost dead in the water. WTF? Doing some debugging (with Yourkit Profiler, very nice) the Jetty side was mostly sitting around waiting, but the worker threads in the other app were mostly blocking (as much of 95% of the cpu time). Ah the joys of threaded coding.

The issue turned out to be a synchronized method in HttpClient that deals with Http Parameters, for each parameter for every call on every thread it would pass through this method, creating a common sync point for all the threads. Something that might seem OK for a couple threads was fatal with 10 threads running full tilt at 1000 somethings per second. The chainsaws can be really brutal.

I finally settled on a couple of technologies that work pretty well, on my developer PC (4 core Core Duo something) I was handling 8000 somethings per second at about 85% CPU.

That's another sign of properly avoided nothing, if you slam your creation with unfettered requests and the CPU rises close to 80-95% then you are seeing proper behavior. In my HTTPClient test the CPU rarely got above 25%; mostly a lot of nothing was going on. This was clear with Yourkit. Nothing is not your friend. Don't stop coding when nothing is broken; that's only step one.

Multithreaded coding in a complex application requires a lot of discipline (never assume anything, test everything), experimentation (there is no one way to make it work) and patience (sometimes lots of negative problems lead you to see the solution). It's not easy and it's not quick (never mind the project manager).

Then again neither is juggling chainsaws; plus the downside is pretty nasty.

Daniel 02/05/2008 23:39

"Don't stop coding when nothing is broken; that's only step one."

That's awesome, I think I'll add that one to my fortune quotes list.

Ps

s/What's do bad about/What's so bad about/;

AllenJB 02/06/2008 03:45

Your link & link text are the wrong way round for the YourKit link

Tony 02/06/2008 07:36

On our big, custom data analysis system, we tried to do the whole "framework" and "threads" thing in C++. We wrote about 75K SLOC and it failed utterly due to a host of issues related to timing, overall complexity and lack of design/testing discipline (as you rightly point out). So we threw it all away, went back to the Unix philosophy of "libraries" and "small, single-purpose executables" and "event loops" in C. Wow, what a difference! If you can solve a problem using an "old" technology approach, do yourself a favor and use it.

I swear I will never work on a project where the words threads, dead-lock, mutex or timing-issues are uttered in meetings again... To butcher a Jamie Zawinski quote: "Some people, when confronted with a problem, think 'I know, I'll use a threaded framework.' Now they have two problems." ;)

Ha! Love the article.

Tony

Stu Smith 02/06/2008 09:28

"Writing multithreaded code is like juggling chainsaws; amazing when it works and truly sucky when it doesn't."

I absolutely agree. What amazes me is that the suggested techniques for dealing with it are exactly the same as they were ten years ago. We seem to have made no practical progress in this area. There are some promising (experimental) techniques for dealing with the problem, but they've been ignored in every modern framework. 'Threading support - stick in a mutex class and let's go to the pub'.

I'm going to take exception with one paragraph; admittedly I don't know much about your systems, but:

"That's another sign of properly avoided nothing, if you slam your creation with unfettered requests and the CPU rises close to 80-95% then you are seeing proper behavior. In my HTTPClient test the CPU rarely got above 25%; mostly a lot of nothing was going on."

Seems wrong to me. Surely anything to do with the network should sit at low CPU levels, it not being a CPU-bounded task? And high CPU levels are not necessarily a sign that all is well. Surely a better metric might be something like "double the number of CPUs, and see a (nearly) 2x improvement in speed"? (Assuming, as I said, it's a CPU-bound operation).

Incidentally, once experimental technique that caught my eye (software transactional memory) I wrote up into actual working code... if you're interested:

http://www.feedghost.com/Blogs/BlogEntry.aspx?EntryId=17791

Stu

a 02/06/2008 10:05

Uhhh, okay, so the synchronized block created a constrained resource.

That all the other threads were blocked from using, since one thread was probably hogging it.

While not permanent deadlock, that is basically deadlock, not doing nothing. It's a constrained resource that one thread was holding to the exclusion of the others.

An interesting story though, I wonder why the parameter processing was synchronized. Pretty crappy design if you ask me...

Ed 02/06/2008 10:13

"Seems wrong to me. Surely anything to do with the network should sit at low CPU levels, it not being a CPU-bounded task?"

It depends on how big your requests are. If your hitting the server with alot of small requests (<1kb lets say) then it probably be more CPU bound then network bound. Using 1Kb requests size as an example, if the server is processing 8000 requests a second, then that's only 8 MB/s transfer, which isn't a considerable amount of bandwidth. And for the server I'm wrote and am current maintaining for works requests are actually smaller then 100 bytes, an order of magnitude smaller.

Michael Davis 02/06/2008 10:21

"Seems wrong to me. Surely anything to do with the network should sit at low CPU levels..."

I guess it depends on what the application is doing. If it's doing serious encryption or advanced mathematical transformations, it may well be CPU-bound. If it's doing database searching, it will likely be IO-bound.

Stephane Grenier 02/06/2008 10:24

I love your opening line: "Writing multithreaded code is like juggling chainsaws; amazing when it works and truly sucky when it doesn't." It's perfect!

I remember writing an application a while back where we were trying to surf the net, just the first page of every website, parse it, and store some resulting calculations in a database for research purposes. Avoiding blocking was pretty simple. Even avoiding data issues wasn't too hard. The really hard part, as you stated, was unused processing time. How do you max out each thread to it's max potential. Not CPU processing, but effectiveness. This took quite some time.

For example for us, sometimes a thread would have to wait for a website to return. Not all servers have good response times. This is dead time for a thread. We could increase the thread count, but then there's more process swapping. Figuring this out, and maximizing the effectiveness took a good deal of effort. And it wasn't until we got into these optimizations that we started to face harder deadlocking/data issues. Before we optimized our world was much simpler. Optimizing to avoid deadtime significantly increased our threads.

The good news is that we went from surfing 100k sites to 4 million on the same hardware. A very significant gain!

Great article!

Domagoj Klepac 02/06/2008 10:39

So how did you solve your HTTPClient params problem? Because I have the same problem, and have to set HTTP params for each request. :)

codist 02/06/2008 11:11

I wound up using Apache Mina and a custom protocol over plain sockets instead of HTTP. THere is probably some way with HTTPClient to avoid sharing the parameter issue (I didn't go further with it, but I am sure you can have a separate state for each Thread).

Domagoj Klepac 02/06/2008 11:59

Wow, didn't know about Apache Mina, looks interesting. Thanks for response, seems that I'll have to dig into HTTPClient...

Aminorex 02/06/2008 13:14

One has to ask why you keep juggling chainsaws. It seems a remarkably unclever thing to do. I have to guess that you are not getting the kind of risk-reward that the Brothers Karamazov got during their extended stints in Vegas. It's lovely to know that you've learned to do it well enough so as to minimize finger reattachment surgeries, but even so...

jc 02/06/2008 13:15

I can truly sympathize with you. I gained much (experience) and lost some (hair) in writing threaded code. Its akin to being a babysitter with infinite babies to "sit". It was also fun.

JC 02/06/2008 16:58

I feel your pain (and joy)! I wrote a replacement for Java RMI with: resizable thread pools (while the system is processing requests!), controlled timeouts, separate thread allotments based on the type of workload, O(1) thread allocation/deallocation, pluggable transports, asynchronous calls, controlled degradation (server busy messages anyone?) and denial-of-service protection.

It was one of the most painful and gratifying projects I've ever worked with. I still think of it as one of the pieces of work that make me proudest (is that even a word?).

I really enjoyed your article, keep'em coming!

Imp 02/07/2008 15:01

I agree that it's bad if the threads spend too much time blocked. It's an indication that your locks aren't granular enough (e.g. you should look at locking a row rather than the entire table). However, just having the CPU rise with load is not indication that you've gotten it right either. In general more granularity means more concurrent threads, but if your locks are too granular then you can waste a lot of cycles locking and unlocking unnecessarily (e.g. you should consider locking a page rather than an single row).

One good indication that you've gotten the granularity right is how much contention there is for locks. If an average lock takes 1ms to get, and is held for 100ms then if you don't have contention for 1% of locks then you can probably get better performance by having less granularity in your locks (even though it might mean that the CPU isn't pegged).

Steve Campbell 02/08/2008 13:15

There are some API patterns that can insulate from many of the complexities, at the expense of using thread-pools. Working with individual threads is painful in comparison. Kindof like programming in Assembly vs C, or C vs Java (similar performance arguments apply).

Some examples... the BackgroundWorker component in .NET. Or even the basic .StartSomething + .EndSomething (callback) pattern common in many .NET libraries. (Sorry, I do not know Java examples).

Chelsius 07/13/2008 04:39

I think that's mostly because you're one of those Java programmers who never really learned how hardware works, what a state engine is, how to code in C++, etc. Do you even know what a context switch is? How about a TLB? Moron...

Code in Java...go to jail...

Chelsius 07/13/2008 04:39

I think that's mostly because you're one of those Java programmers who never really learned how hardware works, what a state engine is, how to code in C++, etc. Do you even know what a context switch is? How about a TLB? Moron...

Code in Java...go to jail...

codist 07/14/2008 22:57

More lols. I learned C++ using Cfront when it first came out around 1989. I also wrote a commercial C++ memory allocator in C++ in 1996. You can even see the code on this site.

gen 09/23/2009 11:09

Hi, This is very nice article. We use httpclient extensively along with httpunit to load test websites. also we use it to scrape web data <a href="http://www.scrappingexpert.com/"> Scrappingexpert.com Web data Extraction , crawling, data mining</a>

thanks, Prat

Arjun Pakrashi 12/26/2009 23:11

I am a beginner in threads. As far as i have encountered, is it is very difficult to avoid the "Doing nothing". I made a 2 thread solution of a certain permutation generation algorithm, which shared no common data at all, so i could omit any synchronizations. Also the calculations involved mostly the CPU with only incrementing and shifting of values in an array. I used pthreads with C language. When it runs in my 2 core AMD cpu, the system monitor shows both CPU with 100% load all the time. But the execution time is much more than that of a single threaded version of the same code. Although the CPU shows 100% all the time the process waits for 'some thing to happen' . Now that is a problem with me. I hope i would be able to learn the art.

Dr. Dude 01/28/2010 06:06

The key to writing any code is to find someone who's already solved the problem. It sounds like you could have implemented a SEDA architecture to handle this. I think there's a current Java implementation floating around.

ajuc 01/28/2010 07:02

Have you tried clojure? It's language on jvm that is designed to make multithreading managable.

And it has different model of threadings than locking.

It has software transactional memory, and allows writing threaded code like you were writing sql (commits, rollbacks, etc).

It is considerably easier than Juggling with Locks.

Brian Hammond 01/28/2010 10:08

Take a look at a system like Node.js, an everything-is-async server-side Javascript framework for highly concurrent network services. Almost everything down the stack from request processing, file I/O (yes, really), and even DNS queries are async. Sure it's only single threaded at the moment. The point is, it is certainly possible to keep threads or single-threaded processes (behind a load balancer) busy... You just have to avoid blocking on anything and everything which has in the past been difficult (e.g. RDBMS drivers block, etc.)

Codist 01/28/2010 11:04

I wrote this article almost two years ago and don't work there any more and haven't used Java since I left. Odd that suddenly the article appeared in Reddit again.