29 March 2008

VESA driver works for me with very old Dell Latitude C800 ATI M4 32MB

I wanted to upgrade my ancient Dell Latitude C800 from Fedora Core 4 to something newer. So, I decided to try Fedora 8. During the install the video was garbled so I did a text install. After everything installed, I tried to get X running. I didn't have a lot of success. Everything was weird and still garbled. The install itself detected the ATI card so I continued to work with the ATI driver.

It was late and I still wasn't having any success, so I decided to try out Ubuntu 6.x (I happened to have the CD handy). This didn't work either, giving me basically the same problem as the Fedora 8 install. So, I decided to call it a night.

In the morning, in an attempt to get something to work, I tried out the VESA driver. I discovered that this worked fine. I was even able to configure things so that I could display at 1400x1050.

So, my conclusion is that the newer Xorg ATI drivers don't work very well with my ancient hardware, but the VESA drivers work fine. We're not talking about high-performance hardware here, so this is good enough for me.

27 March 2008

Tour de Cure -- American Diabetes Association -- 4-may-2008

In honor of a few people I know who suffer from diabetes, I will be riding my bike 75 miles in this year's Tour de Cure for the American Diabetes Association. I believe that this is the fourteenth straight time that I have participated in this ride.

If you would like to support me in this endeavour, please visit my TdC page.

I have ridden the century TdC ride (actually ~107 miles) many, many times in the past, but, to be honest, 107 miles this early in the season is pretty difficult for me to train for. It seems prudent to do the 75-mile ride, so that is what I am doing.

This is a good cause, and I would appreciate any support.

26 March 2008

Multiple Function Return Points Considered Harmful

I prefer to see functions written in such a manner that there is one consistent return point. I prefer this:


int f(int x)
{
    int result;

    if (x < 3)
        result = 1;
    else
        result = 0;

    return result;
}


...over this:


int f(int x)
{
    if (x < 3)
        return 1;

    return 0;
}


This is a religious issue (to some extent). Clearly, a good optimizer is going to render the same code in either case, so the latter code isn't faster -- it is just shorter. I prefer my way because it goes along with my conservative coding style -- I prefer to make the code so simple that I can even understand it when I am tired.

By the way, I'm not so religious about this matter that I always follow my own advice. Every rule has exceptions.

However, I can think of one case in which I believe that my methodology (having a consistent return point) is a clear winner. I will illustrate this with a story.

....

One day at $DAYJOB, my $MANAGER informed me that I'd been assigned to work on a new project. I'd been assigned to add a $FEATURE to a $BIG_SUBSYSTEM in the product. I didn't know anything about this subsystem. $MANAGER told me that this was fine -- $COWORKER was the expert on this $BIG_SUBSYSTEM, I could use $COWORKER as a resource.

I'd never even heard of $COWORKER at this point. After exchanging a couple of terse emails with $COWORKER I figured out that $COWORKER worked really strange hours and actually worked in a cubicle a couple of rows away from me.

Later that day I managed to find $COWORKER in his cubicle, so I stopped by and tried to introduce myself:

Hi, I'm Kevin. I've been assigned by $MANAGER to work on $FEATURE in the $BIG_SUBSYSTEM. $MANAGER tells me that I can use you as a resource for this project. I'm trying to come up to speed with $BIG_SUBSYSTEM; I've started reading the available documentation, but could you perhaps give me an overview of $BIG_SUBSYSTEM so that I have a better idea of what is going on in this subsystem?

$COWORKER looked at me with disdain and said "I'm sure that you'll figure it out." Then $COWORKER put his headphones back on and turned back to his computer.

$COWORKER also managed to turn down my offer of a friendly handshake.

"Great...", I thought, "...$COWORKER is an unhelpful jerk". I knew what this meant too, because complaining to $MANAGER wasn't going to help me one single bit. I would have to familiarize myself with $BIG_SUBSYSTEM and implement $FEATURE on my own.

Weeks passed. I worked my ass off to come up to speed on $BIG_SUBSYSTEM. I barely had any interactions at all with $COWORKER. $MANAGER asked if $COWORKER was being helpful and I honestly answered "no, he seems to be working on his own work". "Oh, he must be busy" was the response.

Eventually, I was done. I tested my code and even showed it to $COWORKER. He made a bunch of comments that ranged from being somewhat helpful (things that I wished I had known before I started the project) to comments that just reflected his opinions about the code.

So, I tested one more time and checked my changes in.

There are four things that you need to know about my $DAYJOB before I continue: the project was written in C, it was very large, it was heavily multithreaded, and it made use of a huge number of branches in the source control system.

Soon after I checked my changes in, $MANAGER told me that my changes were needed in a different branch in the source control system. So, I performed the arduous process of merging my changes into the new branch.

Let's just say that I performed this merge process into N different branches...

Each merge really was arduous. Due to the way that the software organization used the source control subsystem at $DAYJOB, it was very difficult to use the tools that came with the source control system to perform the merge. I soon concluded that the best way for me to do merges was via patch, ediff/Emacs, and a huge amount of attention to detail. Each merge took well over a day of solid effort, sometimes quite a bit more.

I repeated this merge several times, as needed. While I was doing all of this work, I had lots of time to think to myself "how could this be made better?" But more on that some other time...

Anyways, one day somebody in SQA contacted me and told me that he wasn't sure what was going wrong, but one of our internal software releases was dying strangely, and since I was the last person who made a major modification to the branch, he thought I might have something to do with this.

This was the start of the badness. As soon as the problem was found, $MANAGER was informed of the problem, and now $MANAGER was pretty insistent that I find the problem...and quickly. $COWORKER even came by and reminded me of the importance of finding the problem quickly. This was the most contact I had had with $COWORKER in months! Great...

So, under a bit of pressure, I started looking for the bug. $COWORKER started looking for the bug too, independently (of course). I felt some pressure to try to find the bug before $COWORKER, because technically the bug was probably mine.

Many hours passed. I had a difficult time trying to find this bug because I couldn't figure out what was different about my work on this branch versus all of the other branches that I had worked on. My work on all of those other branches worked fine and had passed SQA testing. This was a tough bug to isolate...

Eventually, $COWORKER found the bug. I was pretty unhappy. The bug was in my changes for $FEATURE, and $COWORKER was more than happy to rub my nose in the problem. The problem was a thread synchronization problem. $COWORKER pointed at the code and then at me and said "You need to pay more attention to details when you write code". My head was swimming. I ruefully noted that this was the most interaction with $COWORKER that I had ever had and that it had gone very badly. My code had a bug. I was responsible for the problem. Where did I go wrong?

...

I've been in the software engineering business a long time. Sometimes I make mistakes. Really, I try to learn from everything that I do. So, after the bugfix got checked in and everybody stopped freaking out, I tried to track down what went wrong.

A little while later, I figured out what went wrong.

When I originally implemented $FEATURE, I had to integrate my changes into the already large codebase. When I made my original implementation, my code changes mimicked the style that I found in the rest of the codebase -- of course.

Here is where we get back to the subject at hand: having a consistent place in each function where a function returns.

The codebase at $DAYJOB didn't adhere to my preferred pattern here. And, like I mentioned, it was heavily multithreaded (and this implies that it used a lot of synchronization primitives). So, part of my changes modified a function that looked like this:


void f(int x)
{
    LOCK(&mutex);

    switch (x) {

    case FOO:
        if (someFunc() == ERROR) {
            UNLOCK(&mutex);
            return;
        }
        /* do something else */
        break;

    /* HERE IS MY CODE */
    case BAR:
        if (someFunc() == ERROR) {
            UNLOCK(&mutex);
            return;
        }
        /* do something else */
        break;

    /* ...100 more cases... */

    }

    UNLOCK(&mutex);
}


But, when I merged my code into the new branch with patch, the patch miraculously managed to apply cleanly on this file, and the resulting/buggy file looked like this:

void f(int x)
{
    LOCK(&mutex);
    LOCK(&some_other_mutex);

    switch (x) {

    case FOO:
        if (someFunc() == ERROR) {
            UNLOCK(&some_other_mutex);
            UNLOCK(&mutex);
            return;
        }
        /* do something else */
        break;

    /* HERE IS MY CODE */
    case BAR:
        if (someFunc() == ERROR) {
            UNLOCK(&mutex);
            return;
        }
        /* do something else */
        break;

    /* ...100 more cases... */

    }

    UNLOCK(&some_other_mutex);
    UNLOCK(&mutex);
}

See the problem? In this new branch of code, which I had never worked with before, somebody had introduced a new mutex (some_other_mutex), and my code wasn't cleaning up properly.

This is where I want to make my point: this code could have been written a lot more cleanly, like this:

void f(int x)
{
    LOCK(&mutex);
    LOCK(&some_other_mutex);

    switch (x) {

    case FOO:
        if (someFunc() == ERROR) {
            ....
        }
        else {
            ....
        }
        break;

    /* HERE IS MY CODE */
    case BAR:
        if (someFunc() == ERROR) {
            ....
        }
        else {
            ....
        }
        break;

    /* ...100 more cases... */

    }

    UNLOCK(&some_other_mutex);
    UNLOCK(&mutex);
}

...and, if it were, not only would my changes have applied cleanly with patch, but we would have avoided a long, gory, multi-hour, "let's find the bug" session.

I never did get to tell $COWORKER about what the true root cause of the problem was here. As you can see, this is a long story, and $COWORKER never had any patience for me. In fact, $COWORKER probably thinks I am a moron to this day. But I was glad to figure out what the root cause of the problem was, and this is one of the reasons why I really try to write functions in such a way that there is a single consistent return point -- especially in multithreaded code that utilizes synchronization primitives.

Postscript: several months after this bug reared its ugly head, I found two places in $COWORKER's code that suffered from the same bug. I was tempted to treat $COWORKER as rudely as he had treated me, but in the end I just decided to be polite about it.

24 March 2008

Interesting article on TCP congestion control

Interesting article on the subject of fixing TCP's congestion control algorithm:

http://blogs.zdnet.com/Ou/?p=1078

A lot of reasonable things are presented here, at a very high level. The graphs are especially interesting too. From what I have read of SCTP, this very nice protocol also has the potential to suffer from this problem.

Interesting thought question: what do you suppose an ISP is going to do when a large number of its customers begin hammering on their network with RTP traffic?

....

On the other hand, here is something that is very funny and very unreasonable:



This was originally here: http://i7.tinypic.com/5z6vt4n.jpg

20 March 2008

The VLAN Koan

I wrote the following parody/koan one day to help a tech-writer friend of mine understand VLANs. He liked it. I hope you do too.

Goso said: ``When a packet goes out of its egress to the edge of the abyss, its header and its payload all pass through, but why can't the VLAN tag also pass?''

Kevin's comment: If anyone can open one eye at this point and say a word of layer-2 bridging, he or she is qualified to discuss the intricacies of ``one-armed routers'', and, not only that, he or she can save all sentient beings under them from broadcast storms caused by Windoze boxen. But if he or she cannot say such a word of true IEEE 802.1Q VLANs, he or she should turn back to their tag.

If the packet is transmitted, it will fall into the ether;
If it remains in the queue, it will exceed its TTL,
That little VLAN tag
Is a very strange thing.


(apologies to Mumon... (-: )

13 March 2008

A conversation with Eliza

M-x doctor RET

I am the psychotherapist. Please, describe your problems. Each time you are finished talking, type RET twice.

I am upset. Joseph Weizenbaum has passed away.

Weizenbaum? Hah! I would appreciate it if you would continue.

I think that the curse of being middle-aged is that you see all of the
greats that preceded you eventually pass away.

Is it because the curse of being middle aged is that I see all of the greats that preceded me eventually pass away that you came to me?

Yes. I mean, how could I argue with that bit of insight?

Does it bother you that could you argue with that bit of insight?

Nope.

Can you elaborate on that?

Yow!

Yow! Are we interactive yet?

...

12 March 2008

09 March 2008

Ack is my favorite new tool

Ack is my favorite new tool. I highly recommend it. It is even written in my favorite programming language -- you know, the one with the really supercharged and usable regexp library.

Over the years, I have written some handy tools in this area....like for example, my "txtfind" shell alias:

 
# put this in your .bashrc/.kshrc/etc.
txtfind () {
    if [ $# -eq 0 ] ; then
        txtfind .
    else
        perl -MFile::Find -e 'find(sub{ print "$File::Find::name\n" if (-f && -T); }, @ARGV);' "${@}"
    fi
}


This alias takes an (optional) list of directory names and searches underneath them -- when a "text" file is found, the filename is printed. To a shell hacker like me, it is very common to type things like:
txtfind /etc | xargs grep 192.168.9.37
...if, for example, I wanted to figure out why a machine was configured to use some strange IP address. This beats typing something like:
find /etc -type f -print | xargs grep 192.168.9.37
...because, of course, this will likely cause grep to troll through some binary files. In the end, this might hose your terminal.

I even have shell aliases like "binfind" and "dostxtfind". I have another alias called txtfind0 that allows me to use it in this manner:
txtfind0 /usr | xargs -0 grep foo
...but with this new tool, ack, I'll be able to eliminate a lot of hassle by simply typing something like:
ack --type=text foo /usr
That's not to say that things like my txtfind0 alias are now obsolete. Let's say, for example, I wanted to find a file that contained both the phrases "weapons of" and "mass destruction" -- in this case it would be as easy as:
txtfind0 /usr/secret | \
xargs -0 perl -l -0777 -ne 'print $ARGV
if (/weapons of/i && /mass destruction/i)'
But, just the same, I am enthusiastic to have a great new tool in my arsenal. Ack has a bunch of other features that I haven't really touched on here (my favorite is its intelligent ability to skip files in .svn directories) -- I encourage everybody to check ack out.


06 March 2008

Development Tip: Multiple Build Areas

Here is a code development tip that I nearly always employ in any workplace. I have employed this strategy for years, and several of my colleagues have told me "wow! that's a really good idea!" so I thought that I would share it.

I always set up multiple build areas to go along with the source control system that I am using. At the very least, I always have a "-work" directory (where I work on my current task), but (and here is the important bit) I always have a second "-clean" build area. I never modify any files in the "-clean" area! Ever. The only things that I ever do with the -clean area are (0) update this build area from source control, (1) run a build in this area and (2) run a regression test on this build area. Again, I never modify any files in this directory.

Having a "-clean" directory is terribly useful. For example, if I am making a big change that modifies ten files and my changes also depend on the addition of two files to the build tree, when I am done with my work, I will checkin my changes (under my "-work" area), and then I will immediately update my "-clean" area to run a build and a regression test. If I somehow forgot to add those two source files to the source tree, the build will fail -- but I will immediately notice this. It is much better for me to notice this immediately rather than my co-workers.

If you are a professional software engineer, the problem that I have just cited here has probably plagued you, what? -- several dozen or hundred times in your career? Yes? How much of your time has been wasted due to this problem? If only everybody employed this technique.

Like I mentioned, I always set up multiple build areas. In fact, I usually have at least a half dozen build areas going at the same time. I usually have a "-clean" area going for every source code branch that I work with, and I usually have a build area going for every task that I work on as well. This latter use of build areas seems to be particularly useful: colleagues who were dead-set on creating a new source control branch to do their work have told me, after I explained my multiple-build-areas methodology to them, that this trick saved them a lot of grief. Let's not forget, every time your organization creates a new branch, this costs your organization time and money. Sometimes you need a new branch, but many times you do not. This trick costs a modest amount of disk space, and disk space is cheap. Branches are never cheap.

I have used this trick with dynamic views under ClearCase, static views under ClearCase, and directories under Subversion too. This trick can be used anywhere.


Update:  yeah, yeah, yeah, I realize that folks who use DVCS systems will probably look at this post quizzically.  Let me issue the following reminder:  not all shops use DVCS.

04 March 2008

A DIALOGUE WITH SARAH, AGED 3: IN WHICH IT IS SHOWN THAT IF YOUR DAD IS A CHEMISTRY PROFESSOR, ASKING “WHY” CAN BE DANGEROUS

Can you say ‘hydrophilic’? Link

Recursive Make Considered Harmful

One of the happiest days of my life was when I typed "make print" and I watched make invoke LaTeX and then my master's thesis started spewing out of my laser printer. Make is a dependable tool that, by definition, knows how to handle dependencies and is flexible enough to handle complex tasks (Towers of Hanoi, anybody?).

People who are fans of make like myself will probably like Recursive Make Considered Harmful.

03 March 2008

Debugging War Story 2

At one of the projects I worked on in the past I got to work on some protocol design and implementation. This was actually one of my favorite projects ever; it was a project where I had a lot of responsibility, I got to work with a lot of interesting people on some hard problems, and I got to work on a project that allowed me to be creative and technical at the same time.

Anyways, I was in charge of the protocol implementation. I am a very careful and a very conservative programmer, and a lot of the protocol implementation was coded in the style that I prefer.

The protocol itself was binary (out of necessity). One day as we continued the design of the protocol we concluded that we needed to add a 64-bit integer to one of the message fields. After agreeing on the design, I updated one of the structs that I had created to store the message fields:

#if defined(SOME_COMPILER) || defined(SOME_OTHER_COMPILER)
#pragma pack (1)
#endif
struct SomeMsgStruct {
    uint32_t field1;
    uint32_t field2;
    ....
    uint64_t fieldN; /* NEW FIELD */
}
#if defined(__GNUC__)
__attribute__ ((__packed__))
#endif
;
#if defined(SOME_COMPILER) || defined(SOME_OTHER_COMPILER)
#pragma pack (0)
#endif


The key point you need to understand here is that I wanted to make sure this struct was packed and there was no padding in the struct (as the C standard allows). You need to understand that I was supporting several different compilers and target architectures, some of which I did not have easy access to.

So, in my conservative programming style, I also added the following code to one of the protocol's initialization functions:


SomeMsgStruct x;
assert(sizeof(x) == (sizeof(x.field1) + sizeof(x.field2) + sizeof(x.fieldN)));


After testing my code, I checked it in, announced my changes, and then moved on to my next tasks.

....

Several days later, one of my colleagues, who, shall we say, was not a detail-oriented engineer, complained to me that something was wrong with the protocol stack and that there was garbage being transmitted on the wire. I knew this problem had "wild goose chase" written all over it, but I didn't have a magic wand to fix this problem. So, at around 2pm, we sat down to debug the problem.

We analyzed logfiles. We looked at protocol traces. I tried to reproduce the problem on my setup, but the problem only reared its ugly head on my colleague's setup. My colleague's setup included a different target processor than what I had in my setup, so I was quickly forced to try to understand this problem on my colleague's foreign setup (which seemed to include a very tedious compile/link/load-the-binary-onto-the-target phase).

Eventually, in one of the logfiles, I noticed that something seemed to be wrong with "struct SomeMsgStruct". Ah! So I had my colleague add this to the code:

printf("sizeof SomeMsgStruct: %lu\n", (unsigned long) sizeof(SomeMsgStruct));

And, sure enough, the output was not what I was expecting.

Now I was really confused. At around 7:30pm, I wondered aloud to my colleague "How could this possibly be happening?! How could the compiler be doing this? I even put an assert() in the code to make sure that everything was right!".

At this point my co-worker blurted out:

assert()? Oh, what's that? When I first started working with your new code this morning, I kept on getting this `assertion failed' message. But I just wanted to get the code going, so I commented out that pesky line of code.

After this revelation, I excused myself from the noisy lab where I had just spent the entire afternoon, went outside to the parking lot, and had a good scream. After I composed myself, I went back inside and re-wrote the code so that the structure would be packed correctly on my colleague's target architecture. I also told my colleague, in no uncertain terms, to never modify my code again and to absolutely positively never ever remove an assertion from the code again unless he knew what he was doing and was prepared to deal with the consequences.

I am saddened to inform you, kind reader, that this wasn't the last time in my life that a colleague removed an assertion from my code. I guess that I am better able to deal with this now, but I am no less surprised when I see it.

02 March 2008

Tom and Atticus

I am a fan of Tom and Atticus. Maybe you would like them too.

01 March 2008

Gear Review: Rudy Project Horus Cycling Eyeglasses

My old cycling glasses (a cheapo pair of prescription sunglasses) died after a decade of abuse. I have been doing more and more cycling lately, including some interesting night rides. Because I have had several incidents over the years in which things have pinged off of my eyeglasses as I have been riding, and because ${employer} was chipping in in terms of employee benefits, I decided to buy something that would last a long time.

The thing that complicated this purchase is that I wear prescription lenses. I do not wear contacts, and I am not interested in Lasik. In fact, I like wearing glasses.

Again, my constraints were: (1) must be bombproof, (2) must be prescription-friendly, and (3) would be really nice if I could use these in wildly varying light conditions (bright sun to glare/fog to complete darkness).

It turns out that a product that satisfies all of these constraints is difficult to find. I definitely couldn't find anything locally.

Eventually, I ended up at a website called www DOT bicyclerx DOT com . I talked to a sales guy there on the phone and eventually I ordered a set of Rudy Project Horus cycling eyeglasses. I ordered these with two sets of detachable lenses, one clear and one tinted. The tinted lenses are polycarbonate, which is an upgrade. How much? Are you sitting down? $342. This is waaaay more than I wanted to spend, but again, my ${employer} paid for most of this. I also rationalized this by thinking that these would last a long time.

So, what is my review? These glasses are nice, but certainly not $342 nice. These glasses have one almost fatal flaw -- the lenses sometimes pop out.

I sometimes ride in cold weather, and when I do I wear a hat. There is something about the added resistance of a hat that occasionally doesn't interact well with the motion of putting these glasses on your head. I have been in the following situation twice: in complete darkness, with my bicycle, looking for one of my clear lenses with only the light from my bicycle light to help me. This sucks. In both cases I was able to find my lens, but this was a bad situation to be in.

I emailed Bicyclerx and Rudy Project telling them of my experiences. The salesman from Bicyclerx offered to swap frames in hopes that this would help, but I declined -- I'm just certain that this wouldn't help. I even suggested in my email how they might make the lens mounting mechanism more reliable but this generated no response. Whoever answers email at info@rudyproject.com didn't think that my email deserved a reply.

I am happy that I have my new glasses. Eye protection is very important on the bike. I just don't think that these glasses are worth $342, not with the flaw that I have mentioned.

I have probably made these glasses quite a bit more reliable with the following trivial modification: I placed a tiny strip of clear package sealing tape on the bottom edge of the glasses. I haven't had any problems with these glasses since I made this modification.

Oh well. That's my review.

Debugging War Story

One day, one of my colleagues updated me about one of the problems that he was trying to fix in one of the older products that he maintained.

My colleague informed me:

We can't fix the problem because we can't even produce a build with the current codebase that doesn't crash instantly on the board. Something in the code changed. I've tracked it down; it is a compiler bug. We'll have to call the compiler vendor.

You have to realize, this problem report pushed so many wrong buttons in my engineer's brain that my head immediately started to hurt. I got a cup of coffee and prepared for battle.

When I got back to my colleague's desk I asked "when was the last time you produced a working build for this product?". Let's just say that the answer was "a lot longer than 6 months and many code re-orgs ago".

Great. The pain in my head became a dull throb.

I thought for a moment and then asked "You mentioned that you had tracked this down to being a compiler bug -- how do you know this?"

A few minutes later my colleague was showing me the assembly language output generated by the compiler. I was a bit out of my element here; I was not familiar with the target processor or its assembly language.

"So, what exactly is the bug here?" I asked. My co-worker explained to me that the compiler was dealing with some code that was working with uint32_t values, but in one particular case it just decided to deal with a uint32_t value using the processor's 16-bit instructions. So, a value in a register was getting "shaved", and this was the root cause of the fatal error the product was experiencing.

Again, I was not familiar with the target processor, but I did manage to look through a reference book on my colleague's desk and I did verify that, sure enough, the assembly language output was using 16-bit instructions in a sea of other code that treated the value properly as a 32-bit value.

At this point I learned a little bit more about the compiler. It wasn't GCC -- this compiler was provided by the chip vendor. The whole compiler seemed to be tightly integrated to the vendor's IDE, some win32 app that seemed a little flaky at best. I'd never used this compiler before in my life.

At this point I had two conflicting thoughts going on in my brain: (1) my co-worker was telling me that there was a compiler bug and (2) I hadn't seen an actual compiler bug in a C compiler in over a decade, especially for code as simple as this.

So, I decided to look at the C code in question a little more carefully. It turned out that the problem description of "the compiler is generating code that uses 16-bit instructions to work with 32-bit values" was a bit of an oversimplification; rather, the problem could more accurately be described as "the compiler was emitting 16-bit instructions to move a 32-bit return value (returned from a function call) off of the stack". Let's call the function in question foo().

Oh. I was starting to get a hunch about the problem.

"Is there a prototype for this function that returns a uint32_t?" I asked my colleague. "Yes" was his response. Sure enough, he showed me the prototype in a header file. Damn, this was a minor setback to my hunch. It looked like this in the code, of course:
extern uint32_t foo(uint32_t some_param);
So, at this point I directed my colleague to utilize one of my favorite debugging techniques -- I asked him to run the compiler on the source file in question, but to only run the C preprocessor on the file. This is usually as simple as invoking the compiler like "cc -E" or "gcc -E". After a few minutes of futzing around with the win32 IDE that controlled the compiler, we were eventually able to generate the preprocessed output, all dumped to a file.

As soon as we generated the file, I had my smoking gun.

We imported the file into a text editor and I immediately asked my colleague to look for "foo" in the file. Sure enough, the first occurrence of this string in the file was at the place where this function was invoked. Let me be really clear here: yes, there was a prototype for this function, and it existed in a header file, but in the .c file that we were looking at, that header was never #included!

I asked my colleague one more question, but I knew what the response would be before I even asked:

"What size are ints on this processor?"

"16 bits." was his response.

I started doing a little jig in his office....problem solved!

There was no compiler bug. The problem was that the compiler was being asked to generate some code to invoke a function called foo(), but it had never heard of that function before. But this is C, and this is legal. So, the compiler generated the code to pop the return value off of the stack using the default type that C uses -- int -- and on this particular target, ints were 16 bits wide.

What are the lessons from all of this? I would humbly suggest that there are three:

1: Quality code is built in an environment in which compiler warnings are copiously enabled and paid attention to.

2: If you have a product and you're not building and testing the build output frequently, you're doing something wrong.

3: Occasionally, it is handy to have an engineer who can debug issues like these on staff...