I've had a good couple of days.
This started a couple of days ago, when I became aware of a rather serious performance problem in some code that I help maintain. The problem showed up on a busy production machine with lots of users. Solving it was Very Important.
At this point I started gathering data, trying to answer the question "where in the code are things going bad?"
After many hours of data collection, I analyzed the data and got a sense of the general neighborhood of the problem. I read through the code in that area and thought about what it was really doing. Eventually, I had a mini-epiphany: I realized that the algorithm I was looking at was (in a non-obvious way) actually an O(N) algorithm, and that I could probably replace it with an O(1) algorithm.
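To give a flavor of how an O(N) cost can hide in plain sight, here is a small sketch. It is purely illustrative and is not the production code: the function names and the "active user IDs" scenario are made up for the example. The two functions look almost identical, but a membership test against a list scans the list, while the same test against a set is a hash lookup that takes constant time on average.

```python
# Hypothetical illustration only; not the code from the production system.

def is_active_linear(user_id, active_ids_list):
    """Looks like a single step, but `in` on a list scans it: O(N)."""
    return user_id in active_ids_list


def is_active_constant(user_id, active_ids_set):
    """Same test against a set: a hash lookup, O(1) on average."""
    return user_id in active_ids_set


if __name__ == "__main__":
    ids = list(range(100_000))  # made-up data for the demo
    id_set = set(ids)           # built once, reused for every lookup

    # Both versions give the same answers; only the cost per call differs.
    print(is_active_linear(99_999, ids), is_active_constant(99_999, id_set))
    print(is_active_linear(-1, ids), is_active_constant(-1, id_set))
```

The catch with this kind of fix is that the set has to be built once and kept up to date; rebuilding it on every call would bring the O(N) cost right back.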
So, I rolled up my sleeves and implemented that O(1) algorithm I had in mind. After doing all of the testing that I could think of, I installed the new code on the production machine.
As I said, this code was running on a production machine with lots of users, so it didn't take long for the traffic load to ramp up. I'm pretty happy with the results: before my new code was installed, this four-core machine had a 15-minute load average of around 12.0. After my updated code was deployed, the 15-minute load average dropped to 3.0.
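For a rough sense of what those numbers mean, divide the load average by the number of cores. The sketch below does just that; the helper name `load_per_core` is my own, and it treats the load average as a simple count of tasks competing for CPU, which is a simplification (on Linux the figure also includes tasks in uninterruptible sleep).

```python
def load_per_core(load_average, cores):
    """Return the load average divided by the number of CPU cores."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return load_average / cores


if __name__ == "__main__":
    cores = 4
    print(load_per_core(12.0, cores))  # before the fix: 3.0
    print(load_per_core(3.0, cores))   # after the fix: 0.75
```

Read that way, the machine went from roughly three tasks contending for each core to a bit less than one task per core.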
Fifteen-minute load averages of ~12.0 on a four-core machine remind me of a certain scene from "I Love Lucy". I'm glad I was able to help solve this problem without having to acquire some new hardware...