26 March 2008

Multiple Function Return Points Considered Harmful

I prefer to see functions written in such a manner that there is one consistent return point. I prefer this:


int f(int x)
{
int result;

if (x < 3)
result = 1;
else
result = 0;

return result;
}


...over this:


int f(int x)
{
if (x < 3)
return 1;

return 0;
}


This is a religious issue (to some extent). Clearly, a good optimizer is going to render the same code in either case, so the latter code isn't faster -- it is just shorter. I prefer my way because it goes along with my conservative coding style -- I prefer to make the code so simple that I can even understand it when I am tired.

By the way, I'm not so religious about this matter that I always follow my own advice. Every rule has exceptions.

However, I can think of one case in which I believe that my methodology (having a consistent return point) is a clear winner. I will illustrate this with a story.

....

One day at $DAYJOB, my $MANAGER informed me that I'd been assigned to work on a new project. I'd been assigned to add a $FEATURE to a $BIG_SUBSYSTEM in the product. I didn't know anything about this subsystem. $MANAGER told me that this was fine -- $COWORKER was the expert on this $BIG_SUBSYSTEM, I could use $COWORKER as a resource.

I'd never even heard of $COWORKER at this point. After exchanging a couple of terse emails with $COWORKER I figured out that $COWORKER worked really strange hours and actually worked in a cubicle a couple of rows away from me.

Later that day I managed to find $COWORKER in his cubicle, so I stopped by and tried to introduce myself:

Hi, I'm Kevin. I've been assigned by $MANAGER to work on $FEATURE in the $BIG_SUBSYSTEM. $MANAGER tells me that I can use you as a resource for this project. I'm trying to come up to speed with $BIG_SUBSYSTEM ; I've started reading the available documentation, but could you perhaps give me an overview of $BIG_SUBSYSTEM so that I have a better idea of what is going on in this subsystem.

$COWORKER looked at me with disdain and said "I'm sure that you'll figure it out.". Then $COWORKER put his headphones back on and turned back to his computer.

$COWORKER also managed to turn down my offer of a friendly handshake.

"Great...", I thought, "...$COWORKER is an unhelpful jerk". I knew what this meant too, because complaining to $MANAGER wasn't going to help me one single bit. I would have to familiarize myself with $BIG_SUBSYSTEM and implement $FEATURE on my own.

Weeks passed. I worked my ass off to come up to speed on $BIG_SUBSYSTEM. I barely had any interactions at all with $COWORKER. $MANAGER asked if $COWORKER was being helpful and I honestly answered "no, he seems to be working on his own work". "Oh, he must be busy" was the response.

Eventually, I was done. I tested my code and even showed it to $COWORKER. He made a bunch of comments that ranged from being somewhat helpful (things that I wished I had known before I started the project) to comments that just reflected his opinions about the code.

So, I tested one more time and checked my changes in.

There are four things that you need to know about my $DAYJOB before I continue: the project written in C, it was very large, the project was heavily multithreaded and the project made use of a huge number of branches in the source control system.

Soon after I checked my changes in $MANAGER told me that my changes were needed in a different branch in the source control system. So, I performed the arduous process of merging my changes into the new branch.

Let's just say that I performed this merge process into N different branches...

Each merge really was arduous. Due to the way that the software organization used the source control subsystem at $DAYJOB, it was very difficult to use the tools that came with the source control system to perform the merge. I soon concluded that the best way for me to do merges was via patch, ediff/Emacs, and a huge amount of attention to detail. Each merge took well over a day of solid effort, sometimes quite a bit more.

I repeated this merge several times, as needed. While I was doing all of this work, I had lots of time to think to myself "how could this be made better?" But more on that some other time...

Anyways, one day somebody in SQA contacted me and told me that he wasn't sure what was going wrong, but one of our internal software releases was dying strangely, and since I was the last person who made a major modification to the branch, he thought I might have something to do with this.

This was the start of the badness. As soon as the problem was found, $MANAGER was informed of the problem, and now $MANAGER was pretty insistent that I find the problem...and quickly. $COWORKER even came by and reminded me of the importance of finding the problem quickly. This was the most contact I had had with $COWORKER in months! Great...

So, under a bit of pressure, I started looking for the bug. $COWORKER started looking for the bug too, independently (of course). I felt some pressure to try to find the bug before $COWORKER, because technically the bug was probably mine.

Many hours passed. I had a difficult time trying to find this bug because I couldn't figure out what was different about my work on this branch versus all of the other branches that I had worked on. My work on all of those other branches worked fine and had passed SQA testing. This was a tough bug to isolate...

Eventually, $COWORKER found the bug. I was pretty unhappy. The bug was in my changes for $FEATURE, and $COWORKER was more than happy to rub my nose in the problem. The problem was a thread synchronization problem. My coworker pointed at the code and then at me and said "You need to pay more attention to details when you write code". My head was swimming. I ruefully noted that this was the most interaction with $COWORKER that I had ever had and that it had gone very badly. My code had a bug. I was responsible for the problem. Where did I go wrong?

...

I've been in the software engineering business a long time. Sometimes I make mistakes. Really, I try to learn from everything that I do. So, after the bugfix got checked in and everybody stopped freaking out, I tried to track down what went wrong.

A little while later, I figured out what went wrong.

When I originally implemented $FEATURE, I had to integrate my changes into the already large codebase. When I made my original implementation, my code changes mimicked the style that I found in the rest of the codebase -- of course.

Here is where we get back to the subject at hand: having a consistent place in each function where a function returns.

The codebase at $DAYJOB didn't adhere to my preferred pattern here. And, like I mentioned, it was heavily multithreaded (and this implies that it used a lot of synchronization primitives). So, part of my changes modified a function that looked like this:


void f(int x)
{
LOCK(&mutex);

switch (x) {

case FOO:
if (someFunc() == ERROR) {
UNLOCK(&mutex);
return;
}
/* do something else */
}
break;


/* HERE IS MY CODE */
case BAR:
if (someFunc() == ERROR) {
UNLOCK(&mutex);
return;
}
/* do something else */
}
break;

/* ...100 more cases... */


}

UNLOCK(&mutex);
}


But, when I merged my code into the new branch with patch, the patch miraculously managed to apply cleanly on this file, and the resulting/buggy file looked like this:

void f(int x)
{
LOCK(&mutex);
LOCK(&some_other_mutex);

switch (x) {

case FOO:
if (someFunc() == ERROR) {
UNLOCK(&some_other_mutex);
UNLOCK(&mutex);
return;
}
/* do something else */
}
break;


/* HERE IS MY CODE */
case BAR:
if (someFunc() == ERROR) {
UNLOCK(&mutex);
return;
}
/* do something else */
}
break;

/* ...100 more cases... */


}

UNLOCK(&some_other_mutex);
UNLOCK(&mutex);
}

See the problem? In this new branch of code, which I had never worked with before, somebody had introduced a new mutex (some_other_mutex), and my code wasn't cleaning up properly.

This is where I want to make my point: this code could have been written a lot more cleanly, like this:

void f(int x)
{
LOCK(&mutex);
LOCK(&some_other_mutex);

switch (x) {

case FOO:
if (someFunc() == ERROR) {
....
}
else {
....
}
}
break;


/* HERE IS MY CODE */
case BAR:
if (someFunc() == ERROR) {
....
}
else {
....
}
break;

/* ...100 more cases... */


}

UNLOCK(&some_other_mutex);
UNLOCK(&mutex);
}

...and, if it were, not only would my changes have applied cleanly with patch, but we would have avoided a long, gory, multi-hour, "let's find the bug" session.

I never did get to tell $COWORKER about what the true root cause of the problem was here. As you can see, this is a long story, and $COWORKER never had any patience for me. In fact, $COWORKER probably thinks I am a moron to this day. But I was glad to figure out what the root cause of the problem was, and this is one of the reasons why I really try to write functions in such a way that there is a single consistent return point -- especially in multithreaded code that utilizes synchronization primitives.

Postscript: several months after this bug reared its ugly head, I found two places in $COWORKER's code that suffered from the same bug. I was tempted to treat $COWORKER as rudely as he had treated me, but in the end I just decided to be polite about it.