31 March 2014

Debuggers

Recently, I enjoyed reading Ellen Ullman's book The Bug.  I picked this book up a long time ago and just hadn't gotten around to reading it until now.  Overall, I thought this book was interesting and worth the time that I put into reading it.

Here was one of my favorite sections:

Better fix that bug, Harry had said.  William Harland is acting like he never saw a bug before.  The VCs are watching.  Tomorrow he was supposed to meet with Harry about the schedule.  He could not stop and simply leave a note for himself.  Tomorrow there would be no "Here you are, Ethan."  No message saying, "Investigate pointer indirection."   No, the only note he could leave himself could be the bug report itself, with its message to the testers saying, "Fixed."

Step, step, step.

Some part of him knew that he should get away from the debugger.  He should get away from the machine, stop and think on a yellow pad, a whiteboard.  He wasn't making headway like this.  He kept beating against the same certainties--here, else here, else here.  Writing and sketching might break his thinking patterns, force him into other channels.  But there was something seductive about the debugger:  the way it answered him, tirelessly, consistently.  Such a tight loop:  Step, he said.  Line of code, it answered.  Step, line of code: step, line of code.  It was like the compulsion of playing solitaire:  simple, repetitive, working toward a goal that was sure to be attained in just one more hand, just one more, and then one again.

And so the paradox:  The more the debugger remained the tireless binary companion it was designed to be--answering, answering, answering without hesitation or effort late into the night---the more exhausted and hesitant the human, Ethan Levin, found himself to be.  He was sinking to the debugger's level.  Thinking like it.  Asking only the questions it could answer.  All the while he suffered what the debugger did not have to endure:  the pains of the body, the tingling wrists and fingers, the stiffness in the neck, the aching back, the numb legs.  And worse, the messy wet chemistry of the emotions, the waves of anxiety that washed across him, and then, without warning, the sudden electric spikes of panic.

I think any decent programmer knows exactly what is going on in Ethan Levin's head.  Every good programmer knows what it is like to struggle with an elusive bug.

....

With regard to debuggers, I consider them a useful tool in my toolbox.  Some programmers whom I hold in high regard really dislike them (rather famously Mr. Torvalds).  I respectfully disagree, but I think I understand some of their perspective.  I believe that my occasional use of a debugger is prudent.  For example, I once used a debugger strategically to track down a stupid bug that was keeping me at work late one night.  The debugger helped me find the root cause in minutes.  Without it, I would have spent another few hours tracking the problem down.

On the other hand, I do recognize that a debugger can be a seductive tool that doesn't always help solve a problem.  A debugger can even get in the way of understanding the actual problem at hand.

All of this reminds me of a story from my past...

....

One day at {dayjob} I found myself in the following position:  the software product that our group was building was in the final stages of development.  There were maybe another 10 blocking bugs left to resolve before we could declare the project done.  It had been a long, slow grind to get the project to this point.  The finish line was just becoming visible.  We HAD TO make this upcoming deadline.

Of these 10 blocking bugs, most were straightforward.  However, one of these bugs was elusive and strange and difficult to reproduce.  The bug itself reared its ugly head (on average) only once per week, and always at inopportune times.  A full system restart was the only way to clear the system of the problem.  In the office we referred to this bug simply as "System Haywire".

My role on this project was mostly to write code, but I had a leadership role as well.  The official project leader was unfortunately in the hospital, and so I found myself leading the team towards the finish line.  "Okay, I can do this," I thought.

So I managed the project, assigning engineers to the remaining bugs and helping out where I could.  But there was still the matter of "System Haywire".  This problem was frustrating because hardly anybody could reproduce it.  It is tough to fix what you can't reproduce...

One day at a status meeting, one of our junior engineers mentioned that with his slightly non-standard setup, he was frequently hitting a weird error that forced him to restart.  The more he described what he saw, the more convinced I became that his problem was, in actuality, "System Haywire".  "Oh, is that what this is?" he asked.

After the meeting was over, I went over to this engineer's office and asked him if he could show me "System Haywire".  "Sure", he replied.  Within ten minutes, I was looking at a totally haywire system.

This is where things got strange.

I was happy that we were finally able to reproduce the problem.  This was very good news.  So, I asked my co-worker, the junior engineer, if we could restart the whole system and try it again.  "Okay..." was his response.  So, we restarted.

Now that the system was restarted, I asked my co-worker if he could enable all of the system's debug logging.  "Huh?  How do you do that?" was his reply.  I thought it was a little weird that, at this late stage of the project, somebody on the team still didn't know how to turn on debug logging, but I rolled with it:  I taught my co-worker how to do this.  He went through the steps impatiently.  I also asked him to enable a protocol trace for the networking protocol the system used.  Again, there was more befuddlement and frustration from my co-worker.  "Please be patient," I implored, "this is going to help us find the System Haywire bug."  My co-worker replied, "I don't see how."

We had spent around 15 minutes setting everything up, with me explaining various steps along the way.  My co-worker was becoming more and more exasperated with me.  Eventually, everything was set up, so we started the system.  It took a few minutes of futzing around to get the system back into the System Haywire state.  My co-worker seemed to be really annoyed with me at this point.

It happened.  System Haywire reared its ugly head.  Yippee!  I started collecting the logfiles....

At this point, my co-worker was becoming unglued.  He did not want to help me get the logfiles and the protocol trace off the machine.  I implored, "Look, we're nearly done here -- all I need are the logfiles.  It'll just take another five minutes."

For a minute or two, my co-worker relented.  We got one of the logfiles back to his local machine.  I asked him to "load up" the logfile so we could take a quick look at it.  He shrieked, "HOW DO YOU EXPECT ME TO DO THAT?"  I responded, "Well, load it into your text editor -- it's only a couple of megs."  He replied, "DO YOU MEAN MY IDE?"  "Yeah, your IDE, I guess," I replied.  Something I couldn't quite put my finger on was confusing me here... a programmer didn't know how to look at a logfile?  What...?

My co-worker's IDE loaded the text file, and the logfile was on the screen.  Standard logfile stuff:  [time] [subsystem] [some message].  "Good, we got the output," I said.

My co-worker exploded at me.  A full 30 seconds of vitriol and shouting:  "[expletive] [expletive] [expletive] THIS IS NOT HOW YOU [expletive] DEBUG A [expletive] PROGRAM.  YOU LOAD THE PROGRAM INTO A [expletive] DEBUGGER AND YOU START IT UP AND PUT BREAKPOINTS INTO THE [expletive] PROGRAM.  THAT'S HOW YOU [expletive] DEBUG A [expletive] PROGRAM YOU CLUELESS [expletive] TWIT" was one of the things that he screamed at me.

I tried to reason with my co-worker:  "Look, this doesn't seem to be the type of problem that a debugger is going to help us with.  There are no memory problems here, and the system isn't crashing -- it is going haywire."

He continued to shout at me.

And this is when the following fact hit me like a ton of bricks:  my co-worker and I came from completely different worlds.  The only way he knew how to debug a program was with the debugger in his IDE.  He had no capacity to debug a program any other way -- by looking at the code, working things through on a whiteboard, reading a logfile, or analyzing a protocol trace, for example.  I was fairly stunned by this realization.

But... I still desperately needed those logfiles.  My co-worker was still shouting at me, so I decided to yell back.  I told him, "LOOK, I DON'T CARE IF YOU ARE UPSET.  I WANT THOSE LOGFILES....NOW.  GIVE ME THOSE LOGFILES AND I'LL GET OUT OF YOUR HAIR.  I'M NOT LEAVING UNTIL I GET THOSE LOGFILES.  RIGHT NOW."

So, after a few tense minutes in which I watched my angry co-worker's every single move (because that's how important these files were to me), I got the logfiles transported back to my computer for further analysis.  As I left his desk, my co-worker told me that he was going to delete his copies of "those stupid [expletive] logfiles right [expletive] now".

Wow.  I did not need this in my day.

.....

Anyway, after I got back to my desk, I spent the next six hours going through the logfile and the protocol trace and thinking my way through the code.  Eventually, I found the problem... well, two problems, actually.  One was a protocol error.  The other was an extremely subtle problem in the code.  The fact that there were two separate problems was one of the things that had made "System Haywire" such a beast to find and reproduce.

The "System Haywire" problem was fixed, and soon after we shipped.

....

I still consider a debugger to be a useful tool to have in one's toolbox, but I think that it is important for a software engineer to have other tools in there too.