Why Events are a Bad Idea (for high concurrency servers)
And Why Mixing Events and Threads is Even Worse
I've just been badly burned by mixing two paradigms: threading and events. All was working fine until the day before go-live, when the testers started to pour 400 stock baskets through the system rather than 40, producing one of those issues that make your heart sink as a programmer. Between 5 and 15 stocks would go into pending, which meant there was a threading issue somewhere, and we were due to go live that evening. Tracking it down proved difficult because of the poor separation of duties between the threads: the design had its origins in a single-threaded, serial set of calls that took data from an event generated on one side and dispatched a modified message to the receiver, and vice versa. In retrospect, the problem would have been avoided by putting a single queue between the two threads, following the asynchronous put/take connector pattern. That would have ensured complete separation and higher throughput.
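For the curious, here is a minimal sketch of that put/take connector in Java, using a bounded BlockingQueue as the only thing the two threads share. The Order type, queue size and thread roles are illustrative assumptions, not the actual system.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the asynchronous put/take connector: one bounded queue sits between
// the two threads, so no other shared state needs a mutex at all.
public class PutTakeConnector {
    // Order is a stand-in for whatever message the real system passes around.
    record Order(String stock, int quantity) {}

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Order> queue = new ArrayBlockingQueue<>(100);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 400; i++) {
                    // put() blocks when the queue is full, giving natural back-pressure.
                    queue.put(new Order("STOCK-" + i, 10));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    // take() blocks until an order arrives; no polling, no shared flags.
                    Order order = queue.take();
                    dispatch(order);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.interrupt();   // shut the consumer down once the producer is done
    }

    static void dispatch(Order order) {
        // Placeholder for the "modified message dispatched to the receiver".
        System.out.println("Dispatching " + order);
    }
}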
Mutex Spaghetti
As it happens, it was impossible to redesign the solution in the time available, and the simplest course of action was to go back, reluctantly, to a single-threaded implementation; time to test was a major factor here. Not before some time was spent playing the mutex game, though. I started mutexing at a very low granularity, which almost appeared to fix the issue (or break things completely): I was down to one or two trades pending, which was still not good enough.
As I added more mutexes, it quickly became apparent that it is very easy to get into trouble, either by blocking on a mutex that is already locked or simply by having so many mutexes that nobody can reason about them, which means you end up in a mess anyway. After four hours of mutex soup, I made the decision to take the code back to a single-threaded implementation. Performance, at the moment, is not an issue.
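The classic way to end up blocked on a mutex someone else already holds is two threads taking the same pair of locks in opposite order. A small Java sketch of that trap follows; the lock names and the sleeps (which just widen the race window) are purely illustrative and nothing to do with the real trading code.

import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: two threads acquire the same two locks in opposite order,
// so each ends up blocked on a mutex the other thread already holds (deadlock).
public class MutexSoup {
    private static final ReentrantLock basketLock = new ReentrantLock();
    private static final ReentrantLock tradeLock  = new ReentrantLock();

    public static void main(String[] args) {
        Thread t1 = new Thread(() -> {
            basketLock.lock();
            try {
                sleep(50);                // widen the race window
                tradeLock.lock();         // blocks: t2 already holds tradeLock
                try { /* update trade */ } finally { tradeLock.unlock(); }
            } finally {
                basketLock.unlock();
            }
        });

        Thread t2 = new Thread(() -> {
            tradeLock.lock();
            try {
                sleep(50);
                basketLock.lock();        // blocks: t1 already holds basketLock
                try { /* update basket */ } finally { basketLock.unlock(); }
            } finally {
                tradeLock.unlock();
            }
        });

        t1.start();
        t2.start();   // with the sleeps in place this almost always hangs forever
    }

    private static void sleep(long millis) {
        try { Thread.sleep(millis); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

The usual remedies are a single, agreed lock ordering across the whole code base, or, better still, the single queue described above so that only one piece of shared state exists in the first place.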
So the moral of the story is: look for a design pattern that fits your code, understand it, and think the design through. We seldom have the time, but if you can, use UML to build a sequence diagram; then you'll see the call chain and understand where the threads conflict.
A friend of mine related the story of Sun's attempts to make the Solaris kernel MT-safe, which turned out to be a much harder task than they anticipated. Most of their effort centred on protecting the key data structures rather than changing the programming paradigm so that users took responsibility for data allocation.
Another pointed out a Slashdot article this morning, "Is Parallel Programming Just Too Hard?", which raises concerns that, as we can see from the above, seem to be valid. If you're going to write parallel threads, you have to spend time on the design: it takes three times the effort, and you really need to use patterns, example code and sequence diagrams. I hacked it and got away with it, to a point. If performance proves an issue, and you can bet it will at some stage soon, then it will be back to the drawing board, and this time the design will come first.
Threads versus events is an interesting debate in its own right, and the paper "Why Events Are A Bad Idea (for high concurrency servers)" argues it well, with threading comprehensively winning the day as a paradigm over events. I've always been of the opinion that separation of duties via a thread is intuitively faster than event dispatch; this paper goes some way towards proving my intuition right.
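One part of the paper's case is about readability: with a thread per task the control flow reads top to bottom and the state of a half-finished request simply lives on the stack, whereas event-driven code rips the same logic across callbacks. The following Java sketch is my own paraphrase of that point, not code from the paper; lookup() is a hypothetical stand-in for any blocking I/O call.

import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

public class ThreadsVsEvents {

    // Thread-per-request style: the logic reads top to bottom, and the state of
    // a half-finished request lives on this thread's own stack.
    static String handleThreaded(String request) {
        String data = lookup(request);      // blocks this thread only
        return "response:" + data;
    }

    // Event/callback style: the same logic is split across callbacks, and any
    // intermediate state has to be captured and carried along by hand.
    static void handleEvented(String request, Consumer<String> onDone) {
        lookupAsync(request, data -> onDone.accept("response:" + data));
    }

    // Stand-ins for real I/O.
    static String lookup(String request) {
        return request.toUpperCase();
    }

    static void lookupAsync(String request, Consumer<String> callback) {
        CompletableFuture.runAsync(() -> callback.accept(lookup(request)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(handleThreaded("basket-42"));
        handleEvented("basket-42", System.out::println);
        Thread.sleep(100);                  // give the async callback time to run
    }
}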
Finally, a well-deserved mention for functional programming languages such as Erlang and Haskell, which hold much promise for multi-core programming, as outlined in these excellent slides: Data Parallel for Haskell.