06 April 2009

What is so Hard About Parallel Programming? Lots!

Over at JavaLobby, Clay Breshears asks "What is so Hard About Parallel Programming?".

Hmmm. Interesting. I've spent quite a bit of time thinking about this question too. I happened to write my master's thesis on the subject of parallel computation.

I think that Mr. Breshears is obviously a very smart individual, but I disagree with his assertion that:
[...] learning parallel programming is no harder than learning a second programming language.
Let me be clear: I am not violently disagreeing with Mr. Breshears. But put it this way: I would suggest that for a reasonably intelligent computer science student, learning a second programming language rates around a "3" on a 1-10 scale of difficulty. For that same student to learn parallel programming, I would estimate the difficulty at a "7" on that same scale.

So, I disagree with Mr. Breshears.

What is my rationale for my disagreement? In my experience, it is often hard to write a serial program to solve a particular problem. I mean, this should be uncontroversial. However, I think that moving from a serial program that solves a given problem to a correct parallel program that solves the same problem requires a paradigm shift. The central part of this transformation does not involve more rote learning of some new programming language's syntax and APIs -- this process requires a completely different way of thinking. It is one thing to hazily understand that an execution unit on a given machine is going to be executing a given routine; it is another thing entirely to be able to grok that multiple execution units on a given machine might be executing the same routine all at the same time.
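To make that mental shift concrete, here is a minimal sketch (the routine and worker names are invented for illustration; Python threads stand in for whatever execution units a real machine provides) of the very same routine being executed by several execution units at once:

```python
import threading

def routine(worker_id, results):
    # Every worker executes this identical code, concurrently.
    results[worker_id] = sum(range(1000))

results = {}
threads = [threading.Thread(target=routine, args=(i, results))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # four workers, one routine, overlapping in time
```

The serial mindset says "the machine is running `routine`"; the parallel mindset has to hold all four in-flight copies in your head at once.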

This transformation requires one other thing as well: a fantastic attention to detail. Coming up with a correct, efficient parallel program means the programmer has to completely understand the problem at hand (especially its data dependencies). The programmer also has to know the semantics of the parallel programming system they are using like the back of their hand. This last bit cannot be stressed enough: the number of details involved can be mind-boggling.
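Data dependencies are the crux of that understanding. A hedged sketch (the function names are made up for illustration) of the two cases a parallelizing programmer must learn to tell apart:

```python
def scale(xs):
    # No dependencies between iterations: each output element depends
    # only on its own input, so this loop could safely be split
    # across execution units.
    return [2 * x for x in xs]

def running_total(xs):
    # Loop-carried dependency: each output depends on the previous
    # accumulated value, so naively splitting this loop across
    # threads would produce wrong answers.
    out, acc = [], 0
    for x in xs:
        acc += x
        out.append(acc)
    return out
```

Both loops look equally innocent in serial code; only the second one fights back when you try to parallelize it.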

How difficult can it be to come up with a parallel implementation of a serial program? Well, let me answer this question somewhat indirectly. One of the most seductively popular ways to write a parallel program is to use threads and synchronization mechanisms. In fact, threads and synchronization mechanisms aren't just used for parallel programming -- they're also used quite a bit in concurrent systems programming. This is where I spend quite a bit of my time as a software engineer, actually. Over the years, I have seen quite a few problems in real products that have as their root cause something to do with threads and synchronization primitives. These problems have all been inadvertently created by engineers with at least a few years of industry experience (sometimes much more...). If engineers with this level of experience can get confused and make mistakes in such an environment, what chance does the average college sophomore (as Mr. Breshears suggests) have to use similar technology and create parallel programs? I would humbly suggest that the answer to this question is "not much".

1 comment:

Brian St. Pierre said...

I violently agree, for many of the same reasons. I have also fixed a lot of multi-threading related bugs. I've caused some of these even when I *thought* I had a very good understanding of all of the pieces that were in motion -- and that was just on a single execution unit.

Throw in a second CPU, memory races, and components whose requirements, implementation, and interface are all fluid, and it really is a hard problem.

But the original poster is probably right too: on a sophomore-level project of maybe 1 KLOC with a small scope and well-defined, artificial requirements, parallel programming is easy!