15 July 2015

An Appreciation for _Programmers at Work_

A few months ago I learned that an important book in my life had been "republished" in the form of a blog.  The book is/was Programmers at Work.  I'm happy to see the content in this book available on the web, because I dimly remember giving my own copy of this book away to somebody else who I thought could use it.

I found this book to be useful when I was younger because back then I really didn't have very many people to talk with when I was exploring the idea of going into the computer (specifically software) field.  There were literally no programmers to talk with in the area that I grew up in, and of course, this was before the era of the Internet.  The thing that I was wondering about was "how do I get from point A to point B?".  Based on the information I got from this book, I got a sense that I needed to change the trajectory of my math studies in high-school (I decided that I really needed to take Calculus in high-school) so I enrolled in a local community college (night school) and got myself onto a better path.  Later on, as a college freshman, I was glad that I had changed my trajectory.

....

Back when I read this book, I was a twisted, confused kid, who really didn't have too many people to talk to about college in the first place.  Looking back at this section of Programmers at Work:

LAMPSON: I used to think that undergraduate computer-science education was bad, and that it should be outlawed. Recently I realized that position isn’t reasonable. An undergraduate degree in computer science is a perfectly respectable professional degree, just like electrical engineering or business administration. But I do think it’s a serious mistake to take an undergraduate degree in computer science if you intend to study it in graduate school.
INTERVIEWER: Why?
LAMPSON: Because most of what you learn won’t have any long-term significance. You won’t learn new ways of using your mind, which does you more good than learning the details of how to write a compiler, which is what you’re likely to get from undergraduate computer science. I think the world would be much better off if all the graduate computer-science departments would get together and agree not to accept anybody with a bachelor’s degree in computer science. Those people should be required to take a remedial year to learn something respectable like mathematics or history, before going on to graduate-level computer science. However, I don’t see that happening.
INTERVIEWER: What kind of training or type of thought leads to the greatest productivity in the computer field?
LAMPSON: From mathematics, you learn logical reasoning. You also learn what it means to prove something, as well as how to handle abstract essentials. From an experimental science such as physics, or from the humanities, you learn how to make connections in the real world by applying these abstractions.

....maybe this was one of the reasons why I also decided to study liberal-arts at college.  Sometimes books are the source of dangerous ideas....

17 June 2015

Adventures in Virtualization and Storage Management

One fine Sunday while I was doing some work around the house, I received a series of dire work emails from people in the management of ${employer}.  Early that morning, one of our marquee customers had some problems with an appliance that we supported at their site.  From what I read, it seemed like the appliance had suffered a significant failure...but the actual cause of the failure was unclear.

Thankfully, a member of our Support staff had managed to get the appliance working again by simply rebooting.  This was a small amount of good news in this difficult situation.

The "marquee customer" in this story is a famous company in the financial services industry.  Their infrastructure is large and complex.

There was little that I could do about the problem at this point, so I sent a request to the folks in Support to please try to get some logfiles from the appliance.  It seemed like a good idea to get to work early the next day, so I decided to turn in early that night.  I was fairly sure that this situation was going to land on my desk in the morning, and I was right....

The next morning, the logfiles that I asked for soon appeared, and I learned that this customer had implemented our appliance as a VM in their big-iron ESX server farm.  The content in the logfiles was...strange.  Something Crazy seemed to have happened to the appliance's database, but beyond this, the root-cause of the problem was unclear.

So, at this point, we began to do several things:

  • trying to get more logfiles to help determine the root cause
  • looking through the code related to the database to see if there were any obvious problems
  • attending a set of conference calls with the customer to manage the problem.

This was a challenging problem to deal with.   Nothing in the logfiles really shed any light on the problem.  Unfortunately there were some issues in the database code that I really had been wanting to do something about for a long time prior to this incident, so I started to work on these (but it seemed unlikely that the problems I was concerned about were related in any way to the problems that this customer had encountered).  And, of course, the conference calls were....tense.  Everybody wished that the problem had never happened.

A detailed reading of all of the logfiles still led me to conclude that Something Crazy had happened.

On Wednesday/Thursday we deployed some new code at this site that we hoped would improve things in general.  The stress level in the office began to go back down to normal...

...

The next weekend, the machine crashed again at exactly the same time.  Now I got even more forwarded email....and this time it was obvious that this problem was being noticed by various higher-ups in both organizations.   Not good, not good....

Again, somebody from Support managed to get the machine going again by simply rebooting.

....

Monday morning came along, and we arranged a conference call with the customer's IT staff.   We started going through the logfiles, etc.  We were pulling our hair out trying to figure out what had gone wrong while the higher-ups were fuming.

Around 5 minutes into the conference call, as we were talking to this customer's IT staff (waiting for some "higher up" staff to arrive), one of the customer's sysadmins told us that the last two weekends had been pretty bad at their site.  "Why?" I asked.  The answer:  "well, the corporate SAN server was scheduled for some maintenance over the past two weekends, and when they were performing the maintenance a lot of things went haywire".  I asked "does this SAN server provide the storage for all of the VMs on the ESX server, the same ESX server that runs our appliance/VM?".  "Yes", the sysadmin said, and then he continued, "actually, when the SAN server went down, we had around 40 VMs on the ESX server crash or go into a flaky mode where they needed to be rebooted".

I pressed "mute" on the speakerphone, looked around the table and asked our Support person "at any point in this incident has this customer's staff ever mentioned the fact that their SAN server went down?"  The reply:  "nope".

...

We then learned that their SAN server had been down for 45+ minutes during both weekends.  Then I had to explain to this customer's IT staff that our appliance had a database server running on it, and that a database server doesn't react very kindly when the underlying disk drive goes away for 45+ minutes.   "Ohhhhhh" was their reply....

The end result of this incident is that we actually had to add verbiage to the customer-facing docs reminding site-admins that if they chose to deploy the appliance as a VM, they needed to ensure that the underlying storage was highly available.

I swear I didn't make this up.

02 November 2014

Philosophy Tech Support

I'm pretty sure that Philosophy Tech Support from Existential Comics is the funniest thing I'm going to read all day.

05 September 2014

Hanover / Chelsea Loop -- August 2014



This was a nice ride!  Gloomy skies for the entire day, but it never actually rained on us.  Even though it was August, I wore arm-warmers for the entire day, and a vest during descents.  The roads were super-nice, with a fair amount of dirt, but I rode all of it with no problems on 700x25s.

Just before we rolled back into Hanover, we stopped by a store that is frequented by AT hikers.  When I hike in the Whites I always give AT hikers whatever food that I have (because I know how many calories they burn through), and that's exactly what I did here in Norwich too.  It was neat to chat with one particular AT hiker, asking him about his experiences so far on the trail.

As we crossed the border back into NH, the sun came out and it got hot.  That was the first real sun we got for the entire day.  Still, this was a great ride, with a great bunch of people, and I'm glad that we shared this little adventure!

18 August 2014

Some Humorous Truth About Java


I think that Java is a well-designed language that comes with a very functional/useful set of class libraries.  There are a fair number of detractors of this language, but many detractors are fans of languages that aren't (in my opinion) nearly as well-designed as the Java ecosystem.

However....sometimes Java can be difficult to work with.  I suppose that every professional Java programmer has their own experiences working with Java.  For me, one of the things that I'll always remember about working with Java was the effort I had to go through to get some non-standard security (SSL-related, but with a twist...) code to work correctly.  I ran into a lot of sub-problems.  It was almost as if the strongly-typed class-library design was telling me at many points "you can't do this ; you can't do that ; you can't do dozens of other things either".  I was trying to do something fairly reasonable, except that in order to get things to work I had to take a fairly roundabout route.

In the end, everything worked out.  But....the picture above makes me smile...

30 June 2014

Judo In Security Engineering (first story)

One day at {dayjob}, I decided it was time to tell the VP of Engineering about my concerns regarding the product's security.  Specifically, I was concerned about the security of a part of the product that I was helping to design+implement.

We were starting to get some enterprise and government interest in the product, and I wanted to have a sensible security scheme in-place before we suffered any embarrassment.  We still hadn't released the product out into the wild yet ; the time was right to implement the security measures I had in mind.

So, I had a conversation with the VP of Engineering, gave him an overview of my plans, and got him to agree to them.  He allocated space in the schedule for me to work on this project, and he even allocated two of my co-workers' time in order to complete the project.

So far, so good....

The simple fact of the matter, however, was that we were working at a fairly crazy pace at {dayjob}, working towards the FCS of this product.  Although the VP had given me some time in the schedule, he didn't give me much....something on the order of maybe 4-6 days, total.  I had to be very strategic.

The security scheme that I had in mind involved some well-known cryptographic tools.  I was not looking to invent anything new here.

So, I sat down with my two co-workers to go over the plan.  This is where things got....interesting.  One of my co-workers {PracticalGuy} was a little more junior than I was, but very smart and very capable.  The other co-worker {MathGuy} was one of the most senior engineers in the company -- far more senior than I was.  As you might be able to tell, {MathGuy} was very mathematical in nature.  Honestly, he seemed to be really good at mathematical analysis, and frequently his analysis was a bit over my head.  {MathGuy} was also very capable.

Anyways, I described the problem to my co-workers, and the plan that I had come up with to address the problem.  Part of my plans involved using some of the crypto tools from {OpenSSL / libcrypto} in our product.  {PracticalGuy} came up to speed on what needed to be done right away.  The interesting part of what happened next had to do with {MathGuy}.  As soon as I said "OpenSSL library" {MathGuy} strongly objected.  I didn't know a huge amount about OpenSSL at the time, but I knew enough to understand that this library could be a bit tough to work with.  However, this really had nothing to do with {MathGuy}'s objections.  He stated "OpenSSL is a big complicated library, and I don't know how every single line of code in this library works.  I'd prefer to write my own crypto code.".

Wow.  I did not see this coming.

I tried to reason with {MathGuy}.  Obviously, the first two things I pointed out to {MathGuy} were (1) the schedule and (2)  the number of man-years involved in the design+implementation of {OpenSSL / libcrypto}.

"I don't care!" stated {MathGuy}.  We argued for a few minutes....it was a fairly crazy argument that went sort-of like this:

MathGuy:  I don't think that it would be very hard to write my own libcrypto.

Me:  do you understand how many man-years have gone into this library?

MathGuy:  I always wanted to write my own Diffie-Hellman.

Me:  we don't need Diffie-Hellman for this project.

MathGuy:   I think it would be better if we used it.

And then {MathGuy} ushered me away, telling me that he was going to get started on the project.  I left this conversation with little confidence that {MathGuy} would contribute to this project in any meaningful way.

A few weeks later, {PracticalGuy} and I started on the project in earnest.  {PracticalGuy} brought a lot of capabilities to this project that I didn't have, and I was truly grateful to have him working with me on this project.

I did try on one occasion to get {MathGuy} back on-track and into a mode where he could actually help us on this project.  I stopped by his cube and tried to get him excited about one aspect of the project that needed attention.  "I've got no time to help you with that problem!" said {MathGuy}, "I'm in the middle of designing my own BigNum library, and I am swamped with issues right now!".

As soon as I heard the phrase "my own BigNum library" I knew I was in a pickle.  I couldn't exactly storm into the VP of Engineering's office and tell him about this crazy situation....after all, {MathGuy} was much more senior than I was, and seemed to have much more political weight as well.  I couldn't get {MathGuy} to work on anything practical either.

So, I did what I could do.  I made sure that {MathGuy} was working on his own isolated branch in source-control.  I also continued to work with {PracticalGuy} on the security project, and we were able to complete the project with great success.  And...when the VP of Engineering asked me how the project was going and sort-of forced me to account for what {MathGuy} was working on, I told him "well, {MathGuy} has his own ideas about this security project, and he is working on those ideas, but I'm skeptical that those ideas are going to pan out.....I think that he might be looking for something else to work on....could you maybe find some other work for him to work on?".

I never really checked on what was going on out in {MathGuy}'s isolated branch in source-control.  I sort-of assumed that the whole thing fizzled out.  This was simply the best I could do, given the situation....

22 May 2014

My Take on Aaron Maxwell's "Use the Unofficial Bash Strict Mode (Unless You Looove Debugging)"

I enjoyed Aaron Maxwell's Use the Unofficial Bash Strict Mode (Unless You Looove Debugging).  Read it -- seriously.  Mr. Maxwell is clearly somebody who knows what he is talking about.

I'm a shell hacker too.  My fascination with the shell has....evolved over the years.  I no longer reach for the shell as my first tool of choice...instead, I have the following heuristics in my head when I approach a task:
  • If a program can be written in under a page of code, then a shell script is a perfectly reasonable solution to a problem.
  • If a program can't be written in under a page or two of code, it is time to think about writing the program in Python/Perl.
  • If a program basically has to fork+exec a bunch of other external programs, even if the program gets to be big, it might be just easier to keep the program implemented as a shell script.
  • But...if a shell-script evolves to be more than five (5) lines of code (or maybe even less...), then the shell script HAS TO be written according to the following template that I've come up with.

My Template

Here is my template for a reasonable shell script:


#!/bin/bash

#################################################
function usage() {

cat <<EOF
Usage:  $0 [OPTIONS]

OPTIONS

  --blah
        Allows you to specify the "blah" value.

EXAMPLE

  $0 --blah 1.2.3.4

EOF

}

########################################################
function cleanup() {
    echo "Cleaning up"
    rm -rf "$TMPFILE"
}

########################################################
function errHandler()  {
    echo "$0: some unexpected error happened." 1>&2
    cleanup
    exit 1
}

########################################################
main() {    
    IFS="`printf '\n\t'`"

    set -E    # errtrace: the ERR trap below also fires in functions and subshells
    set -u    # exit if a reference is made to an undefined variable
    set -o pipefail   # a pipeline fails if any command in it fails
  
    trap errHandler ERR



    # Let's say that somewhere in this program the code needs to
    # use a temporary file.  Well, let's have a consistent name for
    # this file, so we can ensure that it gets removed if this program
    # fails for some reason.
    TMPFILE=/tmp/some-temp-file.$$

    echo "blah" >"$TMPFILE"

  

    # Just for the sake of an example, here we're going to
    # introduce the chance for an error to occur.

    echo "The next operation may or may not fail.  Stay tuned..."
    echo

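    # Arithmetic expansion is used here instead of expr, because expr
    # itself exits non-zero whenever its result is 0.
    [ $(( $(date '+%s') % 2 )) -eq 0 ]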


    cleanup
    return 0
}

main "${@}"
exit $?



Let me explain why I write shell scripts in this way:

#!/bin/bash

I know that some old-time Unix programmers might scoff at this.  Somebody might say "but the Bourne shell (/bin/sh) is the One True Shell".  My response:   I'll be glad to make my shell script ultra-portable when it needs to be.  Until then, nearly all of my shell-script code simply has to be portable to Linux (and occasional Mac) systems....and these all have Bash available for them.

I'm explicitly using the GNU Bourne-Again SHell here.  I do not care to write code that invokes /bin/sh but actually depends on /bin/sh being the Bourne-Again SHell.  Code like this dies when /bin/sh is actually Dash.  If somebody thinks that I am making a mountain out of a mole-hill here, I would remind them that Dash is the default /bin/sh on Debian/Ubuntu systems, and I don't want code that I produce to die strangely on such platforms.
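
One way to make the dependency on Bash explicit is a small guard at the top of the script. This is only a sketch and not part of the template above; the array and its contents are hypothetical. It relies on the fact that BASH_VERSION is set only when Bash is the interpreter, so running the file as "sh script.sh" on a system where /bin/sh is Dash stops with a clear message instead of a strange syntax error later on.

```bash
#!/bin/bash

# Guard: stop early if some other shell (for example dash via "sh script.sh")
# is interpreting this file.  BASH_VERSION is set only by Bash.
if [ -z "${BASH_VERSION:-}" ]; then
    echo "$0: this script needs bash" 1>&2
    exit 1
fi

# From here on, Bash-only syntax is safe: arrays and [[ ... ]] pattern tests.
# (The array contents are made up for this illustration.)
names=(alpha beta gamma)
second=${names[1]}
if [[ $second == b* ]]; then
    echo "second element starts with b: $second"
fi
```

The guard itself uses only POSIX syntax, so any Bourne-style shell can read it before it reaches the Bash-only lines.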

I have no problems with using the original /bin/sh, "ash", or "dash", and I happily will use these if the situation is appropriate.

usage()

Putting usage information at the top of the file seems to allow me to provide documentation and useful code commentary with a minimum of duplication.

cleanup()

Many programs like this create temporary files, or some other thing that might need to be cleaned up.   I try to put all of my cleanup-related code in this function, as opposed to sprinkling code like this everywhere in the code.
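
As a rough sketch of the same idea with more than one resource: the variable name WORKDIR is hypothetical, and mktemp is my substitution for the "$$" suffix used in the template, not something the template itself does. The point is only that every removal lives in one function, and that the function is safe to call even if a resource was never created.

```bash
#!/bin/bash

# Hypothetical resources for this illustration: one temp file, one temp directory.
TMPFILE=""
WORKDIR=""

cleanup() {
    # All removal logic lives here.  The guards make it safe to call
    # cleanup before either resource has been created.
    if [ -n "$TMPFILE" ]; then rm -f "$TMPFILE"; fi
    if [ -n "$WORKDIR" ]; then rm -rf "$WORKDIR"; fi
}

TMPFILE=$(mktemp)       # mktemp picks an unused name
WORKDIR=$(mktemp -d)    # same, but creates a directory
echo "blah" >"$TMPFILE"

cleanup

if [ ! -e "$TMPFILE" ] && [ ! -e "$WORKDIR" ]; then
    echo "everything was removed"
fi
```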

errHandler() / set -E / trap errHandler ERR

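This is how this program handles unanticipated errors.

"set -e" and "set -E" look similar, but they do different jobs.  "set -e" (errexit) tells Bash to exit when a command fails.  "set -E" (errtrace) tells Bash that the ERR trap should be inherited by functions, command substitutions, and subshells.  In this template the "exit if something goes wrong" behavior comes from errHandler(), which calls exit, and in my opinion "set -E" is the better choice because it applies that behavior to subshells and functions too.  I prefer to have Bash executing in this mode as much as possible -- I don't want this behavior to (silently...) not occur just because my code happens to be running in a function.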

Notice how the errHandler() code properly calls cleanup() before it exits.

I can say a lot about Bash scripts that are written in this manner -- they exit if some unexpected error occurs.  Writing code in this manner is sort-of like employing a poor-man's exception-handler.   In my opinion, the following two points are very important to understand about code that executes in this configuration:

  • It does take some more effort to produce code that executes in this context.  The code needs to be written in such a way that every command that can legitimately return a non-zero exit value is properly handled.
  • On the other hand, code that is written in this manner is much more reliable than code that is written in the default mode.  As a programmer who is only interested in producing reliable code, I spend nearly zero time tracking down bizarre problems in code that is written in this style.   If the shell is not configured in this way (the default mode...) then you can pretty much guarantee having to track down a bizarro problem at some point in time.
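
The first point is easiest to see with a command whose non-zero exit is perfectly normal, such as grep finding no match.  The following is a sketch under stated assumptions: the sample data and variable names are hypothetical, and the inline trap stands in for errHandler().  It shows two common ways to tell Bash that a failure is expected, so that the ERR trap stays quiet.

```bash
#!/bin/bash
set -E
set -u
set -o pipefail
trap 'echo "unexpected error" 1>&2; exit 1' ERR

# Hypothetical sample data for the illustration.
sample=$(printf 'alpha\nbeta\ngamma\n')

# 1. Test the command in an "if": a failing condition does not fire the ERR trap.
if printf '%s\n' "$sample" | grep -q 'delta'; then
    found="yes"
else
    found="no"
fi

# 2. Supply a fallback with "||": grep -c prints 0 and exits 1 when nothing matches.
count=$(printf '%s\n' "$sample" | grep -c 'delta' || true)

echo "found=$found count=$count"
```

Any command that is not wrapped in one of these ways still triggers the trap when it fails, which is the behavior the template is after.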

main()

This is my invention.  I write code in this way because I've worked with hundreds of shell-scripts over the years that started out as some small bit of code...and then grew...and grew....and grew....and eventually became a steaming pile of top-level code plus functions plus more top-level code.  I've worked with some 20000 line shell-scripts in the past in which I had a hard time even figuring out where the code started executing.  Working with code like this causes me to want to wash my eyeballs in bleach afterwards.  I have little patience for this sort of thing: the code that I produce isn't going to suffer from this problem.

Please also notice that this code uses functions to group together related bits of code.  I have seen lots of shell-script code over the years that is just page after page after page of code...with no functions.  Code written in this "style" ignores basic software-engineering practices.  Yuck.

Epilogue

This shell-script template is my personal style.  I don't always write shell-script code (I'm more of a fan of higher-level languages...), but when I do write shell-script code I try to employ every reasonable technique that I can to help ensure that the code is clear and reliable.

In my opinion, the "errHandler() / set -E / trap errHandler ERR" part of this template is critically important.  I can think of several large shell-scripts that I have been called upon over the years to maintain that have benefited greatly from this scheme.  When I first started working with these large programs, the programs were HUGELY UNRELIABLE, regularly spewing errors and leaving systems in many inconsistent states.  It cost LARGE AMOUNTS OF TIME AND MONEY to fix systems that were left in an inconsistent state because of these problems.  After I started maintaining these scripts, it did take me a good deal of effort to update these unreliable programs to be written in my preferred style.  But, after this effort was over, the enhanced reliability has paid HUGE DIVIDENDS......I no longer have to chase boffo problems in these areas.

It is seductively easy to write shell-scripts.  But, the results are a lot better if some basic, common-sense rules are followed.