26 May 2008

I am a fan of Programming by Contract and gcc's -Wcast-qual

Late one Friday afternoon at $DAYJOB I was nearly finished implementing a new feature for the product that I worked on. This was the culmination of several long days effort, and I was looking forward to finishing my task and going home for the weekend.

As I was hooking everything up to enable the new feature, I spotted a bug. This was a strange bug, but I felt confident that I'd find it quickly. I called my wife and let her know that I'd be a little late for dinner.

Like I said, this was a strange bug. I noticed this bug because the unit test I was cobbling together acted strangely in the corner case that I was trying to get right.

Basically, the problem came down to the fact that at some point in time in my C program's execution, I had a (char *) variable that pointed to a particular string, but, later in the program's execution, for some unknown reason the contents of the string changed. This was very unexpected, and in fact this change corrupted a larger data-structure that my code was maintaining.

So, I looked at the code that maintained the (char *) variable with all of the concentration that I could muster on a Friday afternoon. This exercise proved to be fruitless -- I soon concluded that the code that I was looking at was correct.

I called my wife and told her that I was going to be a bit later...

Since the code seemed to be correct, I decided to pull out the big guns -- I decided to run the program through the memory debugger. I guessed that there might have been a improper memory access in the code, and this was the thing that was corrupting my string.

Fire up $MEMORY_DEBUGGER. Instrument. Wait.... Wait some more.... Run.

Running $MEMORY_DEBUGGER yielded absolutely nothing: no bad memory accesses.

Now I was getting irked. I called my wife and told her that I wasn't sure when I would be home. Luckily we were having leftovers that night...

So, now I focused on the problem again:

  1. my string was getting modified strangely.
  2. the code appeared to be correct.
  3. there didn't appear to be any memory access errors in the code.

At this point, I did what I probably should have done earlier: I fired up the debugger and added a watchpoint to the contents of the string. When I re-ran my program, I had my smoking gun: the string was being changed by some library code written by somebody else at $DAYJOB.

I was still a little bit confused though, because, like I said, the code that I had looked at was technically correct. But I hadn't looked at the library code...

The library code looked like this:

void do_something(const char *s)
char *s2 = (char *)s;
s2[0] = 'a';

I am hand-waving a little bit here, because I can't remember the exact circumstances. All I really remember is that the situation was a lot more complicated than this code snippet, with many levels of indirection.

When I saw this, especially the first line of do_something(), the problem was obvious: whoever wrote this function broke a fundamental rule of C -- you are not allowed to modify the thing being pointed at by a (const T *) object.

So, I figured out who wrote this function, sent $COWORKER an email asking them to please fix their code to adhere to the rules, and then I packed up and went home. I couldn't check my change in with this bug still in the system, so I decided to wait until Monday.

You might think that the conclusion to this story might be boring, cut-and-dry, etc., but it wasn't for me.

Monday morning came. My $COWORKER who wrote the buggy function read my email and then responded via email. To sum up his email: (1) he was not going to fix this function and (2) the problem was mine because (paraphrasing) "'const' does not mean what the C standards bodies have defined it to mean; instead, 'const' means what they defined it to be at $SOME_PREVIOUS_COMPANY_THAT_HE_WORKED_AT".

I was flabbergasted at this response, so I went over to talk with $COWORKER. He amazed me with his tenacity. There was no line of argument that I could employ that would change his deep held belief here. I never really could pin down an exact definition of what "const" meant at his previous company. He did further clarify his position here by telling me that, in his experience, "most programmers are too stupid to know what the standards bodies say about 'const'". I protested that I thought that I understood pretty clearly what "const" meant, and he agreed with this, but he held fast to his "programmers are too stupid" point.

At this point I even pointed out that GCC had a "-Wcast-qual" flag that would catch errors like this.

"So what?" he responded.

"Do you think that they would add this check into the compiler if it wasn't, like, important?" I asked.

"I don't care.", he responded.

"Do you understand that I stayed late on Friday night because of this bug?" I asked.

"That's unfortunate.", he responded.

We continued this fun interplay for a few minutes, but I eventually had to give up. Clearly, he wasn't going to change his code, and I simply could not fix all of the places in the codebase that used "const" in this non-standard way.

The maddening thing for me here was that $COWORKER was a very smart engineer, and he certainly understood the concept of Programming by Contract, but he was basically asking me to enter into a contract that said "no matter what other legal mumbo-jumbo is in this document, it is OK if my code does anything whatsoever, even if it is wrong or goes against the spirit of everything else in this contract". This isn't a very useful or meaningful contract.

I eventually learned that anytime I saw the keyword "const" certain subsystems in the code that I should attribute no meaning to this keyword. Seriously, in those subsystems, "const" meant whatever the author meant on that day, and tomorrow the meaning might (and in fact did) change. I started to write my code in a very defensive manner, making copies of important data-structures and sending these copies off to these Alice-in-const-Wonderland subsystems.

I did have enough control over the system to ensure that all of my code compiled with gcc's "-Wcast-qual", and this did keep me out of a bit of trouble a couple of times. I knew that my code was bulletproof and correct.

One day after I had given up on getting $COWORKER's code to be "const correct" I was struck with an idea, so I made a modest proposal to $COWORKER: since he seemed to want to use a keyword to attribute some meaning to variables (whatever meaning "const" meant at $SOME_PREVIOUS_COMPANY_THAT_HE_WORKED_AT), I suggested that we could migrate his code away from using "const" and we could instead define a new keyword for his code with the C preprocessor. I suggested that we could do something like this:

/* This is some header file */

/* For an exact definition of this keyword, please ask $COWORKER */

...and then we could have updated the original function that caused me to stay late at work like so:

void do_something(MEANINGLESS_CONST char *s)
s[0] = 'a'; /* much more streamlined!!! */

I even offered to make all of the changes for him...

So, now our system would have been improved! We'd still have the use of the "const" keyword as the standards bodies had intended, but we'd also have MEANINGLESS_CONST too, and our code would be more streamlined and definitely a lot more understandable. We'd even be able to use "gcc -Wcast-qual" too!

Anyways, I presented this idea to $COWORKER. I give him credit for sticking to his guns -- "no" was his simple flat response to my proposal.

I wish I had some neat way to wrap up this story, but I don't. The Alice-in-const-Wonderland code continued to exist in the product until I left. This probably cost the company a bit of money in terms of bugs and lost efficiency, but that's the way things work sometimes. I just had to learn to live with this behavior in the code.

To sum things up, I'm a big fan of Programming by Contract and "gcc -Wcast-qual". I'm also a big fan of sticking to reasonable standards, and assuming that my co-workers are smart until they go out of their way prove otherwise.

1 comment:

The Beast said...

Wow, this story almost made me incoherent with rage. In fact, I was pretty well pissed just by looking at the code sample. What was the point of declaring something const in the first place?

I wish I could say I would have beat this programmer with a blunt object. My more passive-aggressive action would have been, every time I had to bulletproof my own code, to include a comment about how it's necessary in order to deal with the other programmer's buggy programming. Then wait weeks or months for someone to notice the comment.