kdc-blog: development

22 June 2010

How is your software project trending?

How is your software project trending? Are you going to make your deadline with time to spare? Is your project going to spend all of its time in development, leaving not enough time for testing? Is your project bogged down?

Can you understand how things are trending in ten seconds or less? Can other people in your organization understand these trends as well?

Do you employ any metrics to track this trend?

...

At this point in my career, in addition to doing technical things, I also manage projects. When I first started managing projects (several jobs ago), the thing that I really wanted was a way to understand, in ten seconds or less, how the project was trending. I am somebody who believes that some metrics are very useful, and this is a metric that I really wanted.

I knew that the information that I wanted to understand was ultimately stored in the bug-tracking system that I was using, but I simply could not find any pre-packaged solution that gave me what I wanted.

So, I hacked together my own solution.

My solution basically consists of a "cron" job that queries the bug-tracking system for its data. I run this cronjob every morning at 12:05am.

What I do is as follows: I organize projects by milestones, and I track bugs by the milestone that they need to be addressed by. How all of this works at a given jobsite varies from company to company. But the essential point is that all of the bugs are organized by clear milestones.

Then, every morning at 12:05am, my cronjob runs, querying the bug-tracking system for its information. Generally, I think that looking at the previous 45 days' worth of data provides a reasonable view of how a project is going, so my scripts default to using a 45-day window.

And then....my scripts produce graphs...one per milestone that I am interested in. One of the first things I do every morning when I get into work is to look at these graphs. I also publish these graphs so that everybody in the organization can see and understand these. I am, after all, a big believer in transparency.

Having this system run automatically at night is a win for me, because it gives me more time to do actual technical things during the day. I simply do not have time to futz with graphs every day.
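To make the counting behind such a graph concrete, here is a minimal sketch in C. It is not the author's script: the ticket record, the day numbering, and the names count_states() and fill_window() are all assumptions made for illustration, and a real version would pull these dates out of whatever bug-tracking system is in use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ticket record. Day numbers count up from some fixed
   start date; -1 means "has not happened yet". */
struct ticket {
    int opened_day;   /* day the ticket was written */
    int fixed_day;    /* day it was handed to SQA for testing, or -1 */
    int closed_day;   /* day SQA verified and closed it, or -1 */
};

/* One point on each of the three lines of the graph. */
struct day_counts {
    int open;      /* written but not yet fixed */
    int testing;   /* fixed, waiting on SQA */
    int closed;    /* verified and closed */
};

/* Count the tickets in each state at the end of the given day. */
struct day_counts count_states(const struct ticket *tickets, size_t n, int day)
{
    struct day_counts c = {0, 0, 0};

    assert(tickets != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct ticket *t = &tickets[i];

        if (t->opened_day > day)
            continue;                      /* not written yet */
        if (t->closed_day != -1 && t->closed_day <= day)
            c.closed++;
        else if (t->fixed_day != -1 && t->fixed_day <= day)
            c.testing++;
        else
            c.open++;
    }
    return c;
}

/* Fill out[0..window-1] with the counts for the `window` days that
   end at last_day (for example, window = 45). */
void fill_window(const struct ticket *tickets, size_t n,
                 int last_day, int window, struct day_counts *out)
{
    assert(window > 0 && out != NULL);
    for (int i = 0; i < window; i++)
        out[i] = count_states(tickets, n, last_day - window + 1 + i);
}
```

Each entry of the output array is one day's worth of points for the three lines, so plotting the array gives one graph per milestone. Restricting the input to tickets for a single milestone (or to "blockers" only) is just a matter of filtering the ticket list before calling fill_window().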

I'll show you some examples of these graphs, but before I do, I have to issue the following disclaimer: ALL OF THE DATA THAT WAS USED TO GENERATE THESE GRAPHS IS FAKE.

Seriously, the data is fake, OK? Trust me.

So, here is an example:

So, what we have here is a project in which ~100 bug tickets were written. Some of these tickets might have corresponded to feature requests and some of these tickets might have corresponded to actual bugs in the project itself. The graph here clearly tells how the work towards this milestone went overall: roughly 45 days before "01-apr-2049", work started in earnest. Work on the project started slowly, but then things picked up. If you consider the fact that the number of "Open Bugs" appears to be relatively constant over time and the number of "Bugs Being Tested" appears to be increasing greatly over time, then you can easily deduce two things:

during this time, the number of new bugs being written against this project/milestone was roughly equal to the number of bugs being "fixed" and sent to SQA for testing.
SQA was getting a little bit backed up with tickets that they needed to test and verify...but this doesn't seem to have been a big deal because SQA seems to have had the capability to test/verify/close a large number of tickets in a short amount of time.

By interpreting this graph, you can even see pretty clearly that the project came to a pretty deliberate end. This is good. Everybody who looks at this graph should easily be able to understand all of this. This is the whole point of the graph.

Here is an example of another graph:

This is a much different graph than the previous graph. There are two things that I would like to immediately point out about this graph: this graph only includes tickets that have been deemed to be "blockers" to the release and also the window that this graph depicts has been increased to three months.

The addition of the "blockers" criterion is something that I have added to my reporting tools. My thinking is that, at some point in a software release, the whole organization needs to just concentrate on blockers and nothing else. Hence, my reporting tools offer this capability.

I have increased the reporting window here to ninety days to better illustrate the points that I will make next.

There is something very likely wrong with a development effort that produces the preceding graph. The number of open "blocker" bugs never trends towards zero, and neither does the number of "testing" bugs. It should be very easy to understand the following point: unless and until these two lines reach "zero", the organization cannot complete its software release. Even more troubling than this is the fact that the number of "closed" blockers is constantly increasing, which indicates that there is a constant stream of "blocker bugs" being found and fixed...for nearly three months.

How serious is all of this? Well, it depends on THE SCHEDULE, of course. In my experience, three months is a long time to be working on a single milestone, so, if I were to see a graph like this, I'd be pretty concerned.

...

These graphs are a (trivial) invention of mine, and I find them to be useful in my work in managing projects. I hope that others find this technique to be useful!

....

Update: naturally, soon after I published this, I learned that this type of graph has an official name: burn down chart. I continue to find these charts to be useful in my own planning and management.

04 January 2009

(Don't) Throw It Over The Wall

I used to work at an interesting shop in which the general culture of the place regarded the SQA staff as being one step above moldy bread or something you find growing between your toes. This situation was succinctly expressed by a single catch-phrase; this phrase was uttered whenever a new software release was given to SQA:

THROW IT OVER THE WALL

In the culture of this place, the ultra-smart software engineers sat on one side of an imaginary wall, and the not-very-bright SQA staff sat on the other side of the wall. "Throw it over the wall" was the derisive phrase that the software organization used when it wanted to get the SQA organization to test some shiny and new software trinket that they produced. In general, the relationship between the software engineers and the SQA staff was not good.

I really wasn't wild about this dynamic. I mean, let's be honest: there were people of all abilities in both groups....just like at every other company.

The thing that bothered me about the "throw it over the wall" dynamic that went on at this place was that all this culture did was serve to demoralize the SQA staff. Furthermore, "throw it over the wall" frequently meant "give the SQA staff the software with very little documentation -- very little in the way of requirements, functional specifications, etc.". Sometimes I had no idea how the SQA team tested out the final product. I could tell that this situation bugged the more talented members of the SQA staff a lot.

There was very little I could do to change the dynamic at this place. For the period of time that I worked there, I at least tried to treat the SQA staff that I worked with as if they were partners in creating the product that we were all supposed to be creating. The results of this were generally positive -- when I worked with the more talented members of the SQA staff, they were definitely able to more thoroughly test the code that I produced. There were even one or two occasions in which a question posed to me by the SQA staff caused me to radically change the product that I worked on, because it turned out that my original design was flawed.

A few jobs later I again found myself working in a "throw it over the wall" shop. This time I was working in an incredibly intense environment, complete with aggressive schedules and incomplete requirements. Still, I took the time to establish a good relationship with the SQA staff, and I even wrote test tools for the SQA staff so that they could better test out my code. When the deadline came and the product shipped, I was pretty happy that my part of the product was well tested and performed well in the field. As for the "throw it over the wall" crowd, well, let's just say that there was a maintenance release and a hairy upgrade in the field...

26 May 2008

I am a fan of Programming by Contract and gcc's -Wcast-qual

Late one Friday afternoon at $DAYJOB I was nearly finished implementing a new feature for the product that I worked on. This was the culmination of several long days' effort, and I was looking forward to finishing my task and going home for the weekend.

As I was hooking everything up to enable the new feature, I spotted a bug. This was a strange bug, but I felt confident that I'd find it quickly. I called my wife and let her know that I'd be a little late for dinner.

Like I said, this was a strange bug. I noticed this bug because the unit test I was cobbling together acted strangely in the corner case that I was trying to get right.

Basically, the problem came down to the fact that at some point in time in my C program's execution, I had a (char *) variable that pointed to a particular string, but, later in the program's execution, for some unknown reason the contents of the string changed. This was very unexpected, and in fact this change corrupted a larger data-structure that my code was maintaining.

So, I looked at the code that maintained the (char *) variable with all of the concentration that I could muster on a Friday afternoon. This exercise proved to be fruitless -- I soon concluded that the code that I was looking at was correct.

I called my wife and told her that I was going to be a bit later...

Since the code seemed to be correct, I decided to pull out the big guns -- I decided to run the program through the memory debugger. I guessed that there might have been an improper memory access in the code, and this was the thing that was corrupting my string.

Fire up $MEMORY_DEBUGGER. Instrument. Wait.... Wait some more.... Run.

Running $MEMORY_DEBUGGER yielded absolutely nothing: no bad memory accesses.

Now I was getting irked. I called my wife and told her that I wasn't sure when I would be home. Luckily we were having leftovers that night...

So, now I focused on the problem again:

my string was getting modified strangely.
the code appeared to be correct.
there didn't appear to be any memory access errors in the code.

At this point, I did what I probably should have done earlier: I fired up the debugger and added a watchpoint to the contents of the string. When I re-ran my program, I had my smoking gun: the string was being changed by some library code written by somebody else at $DAYJOB.

I was still a little bit confused though, because, like I said, the code that I had looked at was technically correct. But I hadn't looked at the library code...

The library code looked like this:


  void do_something(const char *s)
  {
     char *s2 = (char *)s;
     s2[0] = 'a';
  }

I am hand-waving a little bit here, because I can't remember the exact circumstances. All I really remember is that the situation was a lot more complicated than this code snippet, with many levels of indirection.

When I saw this, especially the first line of do_something(), the problem was obvious: whoever wrote this function broke a fundamental rule of C -- you are not allowed to modify the thing being pointed at by a (const T *) object.

So, I figured out who wrote this function, sent $COWORKER an email asking them to please fix their code to adhere to the rules, and then I packed up and went home. I couldn't check my change in with this bug still in the system, so I decided to wait until Monday.

You might think that the conclusion to this story might be boring, cut-and-dried, etc., but it wasn't for me.

Monday morning came. My $COWORKER who wrote the buggy function read my email and then responded via email. To sum up his email: (1) he was not going to fix this function and (2) the problem was mine because (paraphrasing) "'const' does not mean what the C standards bodies have defined it to mean; instead, 'const' means what they defined it to be at $SOME_PREVIOUS_COMPANY_THAT_HE_WORKED_AT".

I was flabbergasted at this response, so I went over to talk with $COWORKER. He amazed me with his tenacity. There was no line of argument that I could employ that would change his deeply held belief here. I never really could pin down an exact definition of what "const" meant at his previous company. He did further clarify his position here by telling me that, in his experience, "most programmers are too stupid to know what the standards bodies say about 'const'". I protested that I thought that I understood pretty clearly what "const" meant, and he agreed with this, but he held fast to his "programmers are too stupid" point.

At this point I even pointed out that GCC had a "-Wcast-qual" flag that would catch errors like this.

"So what?" he responded.

"Do you think that they would add this check into the compiler if it wasn't, like, important?" I asked.

"I don't care," he responded.

"Do you understand that I stayed late on Friday night because of this bug?" I asked.

"That's unfortunate," he responded.

We continued this fun interplay for a few minutes, but I eventually had to give up. Clearly, he wasn't going to change his code, and I simply could not fix all of the places in the codebase that used "const" in this non-standard way.

The maddening thing for me here was that $COWORKER was a very smart engineer, and he certainly understood the concept of Programming by Contract, but he was basically asking me to enter into a contract that said "no matter what other legal mumbo-jumbo is in this document, it is OK if my code does anything whatsoever, even if it is wrong or goes against the spirit of everything else in this contract". This isn't a very useful or meaningful contract.

I eventually learned that anytime I saw the keyword "const" in certain subsystems of the code, I should attribute no meaning to this keyword. Seriously, in those subsystems, "const" meant whatever the author meant on that day, and tomorrow the meaning might (and in fact did) change. I started to write my code in a very defensive manner, making copies of important data-structures and sending these copies off to these Alice-in-const-Wonderland subsystems.
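For the curious, the defensive style looks roughly like the sketch below. The names untrusted_do_something() and call_with_copy() are made up for illustration; the first is just a stand-in for any routine that takes a "const" pointer and writes through it anyway.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a library routine that ignores its own "const" promise. */
void untrusted_do_something(const char *s)
{
    char *s2 = (char *)s;
    s2[0] = 'a';
}

/* Call fn on a private copy of s, so that the caller's string cannot be
   changed behind its back. Returns 0 on success, -1 if out of memory. */
int call_with_copy(void (*fn)(const char *), const char *s)
{
    size_t len;
    char *copy;

    assert(fn != NULL && s != NULL);
    len = strlen(s) + 1;          /* include the terminating NUL */
    copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, s, len);
    fn(copy);                     /* any scribbling lands on the copy */
    free(copy);
    return 0;
}
```

The price is an allocation and a copy on every call, which is exactly the kind of lost efficiency that a trustworthy "const" would have made unnecessary.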

I did have enough control over the system to ensure that all of my code compiled with gcc's "-Wcast-qual", and this did keep me out of a bit of trouble a couple of times. I knew that my code was bulletproof and correct.
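Here is a small file that shows what the flag is looking for. The function names are hypothetical, and the file name in the command further down is only an example. The first function is the shape of code that -Wcast-qual reports, because the cast removes a qualifier from the pointer's target type; the other two make no such cast.

```c
#include <assert.h>
#include <stddef.h>

/* Promises "const", then casts the promise away. The cast on the first
   line of the body is what gcc -Wcast-qual reports. */
void do_something_sneaky(const char *s)
{
    char *s2 = (char *)s;   /* discards the const qualifier */

    assert(s != NULL);
    s2[0] = 'a';
}

/* Says what it does: no "const", so callers know the string may change. */
void do_something_honest(char *s)
{
    assert(s != NULL);
    s[0] = 'a';
}

/* Keeps its promise: only reads through the const pointer. */
size_t count_char(const char *s, char ch)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (*s == ch)
            n++;
    return n;
}
```

Compiling this with something like "gcc -Wcast-qual -c example.c" should produce a warning for do_something_sneaky() and nothing for the other two functions. Note that the flag only reports the cast; it cannot tell whether the function goes on to write through the resulting pointer.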

One day after I had given up on getting $COWORKER's code to be "const correct" I was struck with an idea, so I made a modest proposal to $COWORKER: since he seemed to want to use a keyword to attribute some meaning to variables (whatever meaning "const" meant at $SOME_PREVIOUS_COMPANY_THAT_HE_WORKED_AT), I suggested that we could migrate his code away from using "const" and we could instead define a new keyword for his code with the C preprocessor. I suggested that we could do something like this:


  /* This is some header file */

  /* For an exact definition of this keyword, please ask $COWORKER */
  #define MEANINGLESS_CONST

...and then we could have updated the original function that caused me to stay late at work like so:


  void do_something(MEANINGLESS_CONST char *s)
  {
     s[0] = 'a';   /* much more streamlined!!! */
  }

I even offered to make all of the changes for him...

So, now our system would have been improved! We'd still have the use of the "const" keyword as the standards bodies had intended, but we'd also have MEANINGLESS_CONST too, and our code would be more streamlined and definitely a lot more understandable. We'd even be able to use "gcc -Wcast-qual" too!

Anyways, I presented this idea to $COWORKER. I give him credit for sticking to his guns -- "no" was his simple flat response to my proposal.

I wish I had some neat way to wrap up this story, but I don't. The Alice-in-const-Wonderland code continued to exist in the product until I left. This probably cost the company a bit of money in terms of bugs and lost efficiency, but that's the way things work sometimes. I just had to learn to live with this behavior in the code.

To sum things up, I'm a big fan of Programming by Contract and "gcc -Wcast-qual". I'm also a big fan of sticking to reasonable standards, and assuming that my co-workers are smart until they go out of their way to prove otherwise.

06 March 2008

Development Tip: Multiple Build Areas

Here is a code development tip that I nearly always employ in any workplace. I have employed this strategy for years, and several of my colleagues have told me "wow! that's a really good idea!" so I thought that I would share this.

I always set up multiple build areas to go along with the source control system that I am using. At the very least, I always have a "-work" directory (where I work on my current task), but (and here is the important bit) I always have a second "-clean" build area. I never modify any files in the "-clean" area! Ever. The only things that I ever do with the "-clean" area are (0) update this build area from source control, (1) run a build in this area and (2) run a regression test on this build area. Again, I never modify any files in this directory.

Having a "-clean" directory is terribly useful. For example, if I am making a big change that modifies ten files and my changes also depend on the addition of two files to the build tree, when I am done with my work, I will check in my changes (under my "-work" area), and then I will immediately update my "-clean" area to run a build and a regression test. If I somehow forgot to add those two source files to the source tree, the build will fail -- but I will immediately notice this. It is much better for me to notice this immediately than for my co-workers to notice it later.

If you are a professional software engineer, the problem that I have just cited here has probably plagued you, what? -- several dozen or hundred times in your career? Yes? How much of your time has been wasted due to this problem? If only everybody employed this technique.

Like I mentioned, I always set up multiple build areas. In fact, I usually have at least a half dozen build areas going at the same time. I usually have a "-clean" area going for every source code branch that I work with, and I usually have a build area going for every task that I work on as well. This latter use of build areas seems to be particularly useful: I have had colleagues who were dead-set on creating a new source control branch in the codebase to do their work tell me, after I explained my multiple-build-areas methodology to them, that this trick saved them a lot of grief. Let's not forget, every time your organization creates a new branch, this costs your organization time and money. Sometimes you need a new branch, but many times you do not. This trick costs a modest amount of disk space, and disk space is cheap. Branches are never cheap.

I have used this trick with dynamic views under ClearCase, static views under ClearCase, and directories under Subversion too. This trick can be used anywhere.

Update: yeah, yeah, yeah, I realize that folks who use DVCS systems will probably look at this post quizzically. Let me issue the following reminder: not all shops use DVCS.
