I only have limited control over this traffic. It just shows up (sometimes as if fired by a Gatling gun), and code that I help maintain on a certain server has to deal with it. When the traffic isn't handled in a timely manner, people get upset and the phone starts ringing.
Recently, after witnessing our server starting to have difficulty handling a certain burst in traffic, I started to think to myself "how are we going to improve?". I came up with a variety of ideas.
One of my ideas went as follows:
It is computationally non-trivial to unpack everything in every PDU that arrives at the server. Under normal operations, everything is fine, and there are plenty of CPU resources for this sort of thing. However, during heavy traffic loads, my data suggests that unpacking all of these PDUs is something that I need to pay attention to.
And....one of the problems is that the {things} that are sending our server all of these PDUs almost seem to be impatient. Some of these {things} seem to operate in the following way:
- Send a PDU with a request in it
- Wait an infinitesimal amount of time
- Have you received a response yet? If "yes", we are done! However, if "no" then goto step 1.
But...when I look at the guts of this protocol, I believe that I see a non-trivial (but not difficult either) way of limiting these re-sent packets. The code that I have in mind won't actually have to entirely unpack all of the PDUs in order to figure out which packets are re-sent packets and which packets are valid new requests.
Here is the thing that I have in mind:
- Under normal operations, everything works as it usually does.
- A "retransmission dampening filter" (RDF) will always be running.
- The RDF will let new PDUs into the system with no restrictions.
- In a CPU-efficient way, the RDF will be able to detect PDUs that are retransmissions. If a PDU has been re-transmitted by the network {thing} without having waited {N} milliseconds, then the packet will be dropped. This will spare the rest of the system from having to unpack the entire PDU, which, again, takes some CPU time.
So, I threw together a simulator, and here are my results:
So, um, yeah....obviously I'm going to write some code soon to implement this. Of course, I'll make my code tunable, but I'm pretty sure that the default will be to enforce a minimum 1-second retransmission minimum. This should be both a fun project, as well as something that will really help the server that I help maintain!
No comments:
Post a Comment