I am still trying to figure out the proper set of TCP stack parameters for a LAN, and today I took a closer look at the TCP performance of a farm of memcache servers.

Connections from clients to memcache are, for the vast majority, short lived and carry small amounts of data. The average packet size leaving the memcache system is below the MTU of 1500 bytes. But a significant number of responses are larger than the MTU, and must therefore be split across several packets to reach their destination.
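A rough way to check that average is to divide the interface's TX byte counter by its TX packet counter. The sketch below assumes the memcache traffic leaves through eth1 (the interface used later in this post) and that the counters are dominated by memcache traffic:

# rough average TX packet size, assuming eth1 carries the memcache responses
ip -s link show eth1 | awk '/TX:/ {getline; printf "average TX packet size: %.0f bytes\n", $1/$2}'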

Because the connections are short lived, the TCP window size between the clients and memcache never increases significantly. The window advertised by memcache is typically initialized around 5,800 bytes, and very rarely grows larger than 9,500 bytes.
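These per-connection windows can be observed with ss; the command below is only a sketch and assumes memcache listens on its default port, 11211:

# show congestion window, RTT and window information for established memcache connections
# (11211 is memcached's default port; adjust if your farm uses another one)
ss -tin state established '( sport = :11211 )'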

The RTT measured between the memcache nodes and their clients is around 0.140 milliseconds (140 microseconds). Since these nodes are connected over gigabit ethernet, the Bandwidth Delay Product is:

BDP = ( 1*10^9 bits/s ) * ( 140 * 10^-6 s ) = 140,000 bits = 17,500 bytes

Therefore, the path between memcache and one of its clients can never hold more than about 17,500 bytes of data in transit at any given time.
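The same number can be derived directly from a ping measurement; the snippet below is a sketch that assumes a Linux iputils ping and uses 10.0.0.42 as a placeholder address for one of the clients:

# average RTT in ms over 100 probes (10.0.0.42 is a placeholder client address)
rtt_ms=$(ping -c 100 -q 10.0.0.42 | awk -F/ '/^rtt/ {print $5}')
# BDP in bytes for a 1 Gbit/s link: bandwidth (bits/s) * RTT (s) / 8
echo "$rtt_ms" | awk '{printf "BDP = %.0f bytes\n", 1e9 * ($1 / 1000) / 8}'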

With a pipe that can hold 17,500 bytes and responses that almost always fit within the window memcache already advertises, neither the TCP window nor the congestion window is ever hit on either side of the connection. So the only thing we can improve seems to be the size of the packets, by enabling 9,000-byte jumbo frames.

ip link set mtu 9000 dev eth1
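Jumbo frames only help if every switch and client on the path accepts them, so it is worth verifying the new MTU end to end. One way is to send a non-fragmentable ping with 8,972 bytes of payload (8,972 + 20 bytes of IP header + 8 bytes of ICMP header = 9,000); 10.0.0.42 is again a placeholder for a client that has also been switched to a 9,000-byte MTU:

# confirm the interface MTU, then test the 9000-byte path without fragmentation
ip link show eth1
ping -c 3 -M do -s 8972 10.0.0.42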

Another aspect to consider is increasing the default txqueuelen from 1,000 to something better suited to gigabit links, such as 1,000,000.

ip link set eth1 txqueuelen 1000000
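Whether the larger queue is actually needed can be checked by watching the qdisc statistics for drops; the command below assumes eth1 still uses the default pfifo_fast qdisc described in the link below:

# show queue length and drop counters for the qdisc attached to eth1
tc -s qdisc show dev eth1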

See this page for a description of txqueuelen: http://wiki.linuxwall.info/doku.php/en:ressources:dossiers:networking:traffic_control#pfifo_fast