The Linux Page

How do I know whether a server sends Keep Alive signals over a TCP connection?

Today I was curious about whether a certain server was sending me keep-alives.

The fact is that Wireshark is capable of the feat. It has access to the low-level packets and can flag the ones that look like a Keep-Alive probe (a segment whose sequence number, SEG.SEQ, is one less than the next expected one) or like the acknowledgement (ACK) that answers such a probe.

So if you need to know for a specific server, it's easy enough: connect to that server and look at the packets using Wireshark.

However, from within your own software, it is not actually possible to determine whether the other side is sending Keep-Alives. This is because the TCP stack answers each probe with an ACK on its own, and nothing gets reported to the program that owns the socket. In other words, the client has no knowledge of the Keep-Alive signal at all.
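
To illustrate that limit, here is a minimal Python sketch of what a program can see: only the Keep-Alive flag on its own end of the connection. The helper name local_keepalive_enabled is made up for this example; SO_KEEPALIVE is the standard socket option. As far as I know, there is no socket option that reports whether the peer is sending probes.

```python
import socket


def local_keepalive_enabled(sock: socket.socket) -> bool:
    """Return True if SO_KEEPALIVE is set on OUR end of the connection.

    This says nothing about the remote side: the peer's probes are
    answered by the kernel and never surface through the socket API.
    """
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # A freshly created socket has Keep-Alive turned off.
        print("before:", local_keepalive_enabled(s))
        s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        print("after: ", local_keepalive_enabled(s))
```

The socket does not even need to be connected for this check, which is another hint that the flag describes local configuration and not what travels on the wire.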

Obviously, you can create a daemon which you can ask to watch a certain connection and tell you whether it looks like Keep-Alive packets are being sent by the other side. That daemon needs to run as root, but if it only tells you whether a connection is using Keep-Alive or not, it's relatively safe. Just remember that it will pretty much always be a guess. What looks like a Keep-Alive signal may be a spurious segment and not an actual Keep-Alive probe (albeit rather unlikely).
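
The guess such a daemon would make can be sketched in a few lines of Python. This is only an illustration, modelled on my understanding of the rule Wireshark documents for its "TCP Keep-Alive" flag: zero or one byte of payload, a sequence number one less than the next expected one, and no SYN, FIN or RST. The Segment class and the function name are hypothetical, and capturing the packets (the part that needs root) is left out.

```python
from dataclasses import dataclass

SEQ_MODULO = 2 ** 32  # TCP sequence numbers wrap around at 32 bits


@dataclass
class Segment:
    """The few fields of a captured TCP segment that the heuristic needs."""
    seq: int          # sequence number of the segment (SEG.SEQ)
    payload_len: int  # number of data bytes carried
    syn: bool = False
    fin: bool = False
    rst: bool = False


def looks_like_keepalive(seg: Segment, next_expected_seq: int) -> bool:
    """Guess whether `seg` is a Keep-Alive probe.

    `next_expected_seq` is the sequence number we expect the sender to
    use next, i.e. one past the last byte it has sent so far.
    """
    if seg.syn or seg.fin or seg.rst:
        return False  # connection setup/teardown is never a probe
    if seg.payload_len not in (0, 1):
        return False  # probes carry no data, or one garbage byte
    # A probe deliberately re-uses the last sequence number already sent.
    return seg.seq == (next_expected_seq - 1) % SEQ_MODULO


if __name__ == "__main__":
    print(looks_like_keepalive(Segment(seq=999, payload_len=0), 1000))
    print(looks_like_keepalive(Segment(seq=1000, payload_len=0), 1000))
```

A retransmitted one-byte segment would pass the same test, which is why the result stays a guess.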

Maybe I'll implement such a daemon one day. It would be cool to be able to get such information at the client's level.

Note 1: remember that Keep-Alive is expected to be used only on the server side. Hence this post: the client may need to know whether the server does indeed implement it.

Note 2: Also, on a normal Linux system, the first Keep-Alive segment is sent only after the connection has been idle for 7,200 seconds (2 hours), as per the Linux TCP documentation. If that probe goes unanswered, further ones follow every 75 seconds (1 minute and 15 seconds). In other words, if all your connections are very short-lived (e.g. a quick HTTP GET request), the TCP Keep-Alive is definitely useless.

Technical Notes

The three parameters that the Linux TCP stack uses to manage the Keep-Alive heart beat are found in the /proc area:

  • /proc/sys/net/ipv4/tcp_keepalive_time

This is the number of seconds a connection has to stay idle before the first Keep-Alive segment gets sent.

By default this is 7,200 seconds or 2 hours.

Remember that changing this value affects ALL the clients and servers on the machine that turn Keep-Alive on without setting their own per-socket value.

  • /proc/sys/net/ipv4/tcp_keepalive_intvl

Once the first Keep-Alive segment has been sent and not answered, the following ones are sent at this interval until one gets acknowledged, the probe count (see below) runs out, the feature is turned off, or the socket is closed.

By default this is set to 75 seconds, which means that an idle connection whose peer stays silent gets pinged that often once the Keep-Alive signal kicks in.

Note that, as far as I can tell from the kernel's behaviour, a probe that does get acknowledged resets the idle timer. The next probe is then due only after another tcp_keepalive_time seconds without traffic, not 75 seconds later.

  • /proc/sys/net/ipv4/tcp_keepalive_probes

This value is the limit used by the Keep-Alive probe counter. If this many probes in a row go unanswered, the connection is viewed as being dead. By default this is set to 9.
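
To make the three values concrete, here is a small Python sketch that reads them and works out how long an idle connection to a vanished peer survives before the kernel gives up. The function names are mine, and the fallback numbers are the usual Linux defaults (7,200 / 75 / 9), used only when the /proc files cannot be read.

```python
from pathlib import Path

PROC_DIR = Path("/proc/sys/net/ipv4")

# Usual Linux defaults, used as a fallback (an assumption of this sketch).
DEFAULTS = {
    "tcp_keepalive_time": 7200,
    "tcp_keepalive_intvl": 75,
    "tcp_keepalive_probes": 9,
}


def read_keepalive_settings(proc_dir: Path = PROC_DIR) -> dict:
    """Read the three Keep-Alive parameters, falling back to DEFAULTS."""
    settings = {}
    for name, fallback in DEFAULTS.items():
        try:
            settings[name] = int((proc_dir / name).read_text().strip())
        except (OSError, ValueError):
            settings[name] = fallback
    return settings


def seconds_until_declared_dead(time_s: int, intvl_s: int, probes: int) -> int:
    """Idle time before the first probe, plus one interval per unanswered probe."""
    return time_s + intvl_s * probes


if __name__ == "__main__":
    cfg = read_keepalive_settings()
    total = seconds_until_declared_dead(
        cfg["tcp_keepalive_time"],
        cfg["tcp_keepalive_intvl"],
        cfg["tcp_keepalive_probes"],
    )
    print(cfg)
    print("a dead peer is detected after about", total, "seconds")
```

With the defaults this gives 7,200 + 75 × 9 = 7,875 seconds, i.e. a bit over 2 hours and 11 minutes.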

Cons of the Keep-Alive Feature

Note that a TCP connection is up as long as the client and server are running and don't close the socket, whether or not the intermediate computers (especially on the Internet) are up and running.

In other words, a computer that gets rebooted in the chain of computers used to connect the client and server has no direct impact on a TCP connection as long as it comes back before the connection gets used again.

So Keep-Alive in such a scheme is problematic because it will detect malfunctions of intermediate computers and report them as "the other peer is down".

Another con is the fact that Keep-Alive increases traffic. Obviously, today we have extremely fast Internet. But that doesn't mean we should keep sending Keep-Alive signals on idle connections when we don't have to.
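
If the worry is traffic, or the side effects of changing the system-wide values, Linux also lets a program set the three parameters on a single socket. The following Python sketch shows the idea; the function name and the numbers in the usage example are arbitrary, and the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options are Linux-specific, so treat it as an assumption that your platform offers them.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int, interval_s: int, probes: int) -> None:
    """Turn Keep-Alive on for one socket and override the system defaults.

    idle_s     -> per-socket counterpart of tcp_keepalive_time
    interval_s -> per-socket counterpart of tcp_keepalive_intvl
    probes     -> per-socket counterpart of tcp_keepalive_probes
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Example values only: probe after 10 idle minutes, then every
        # 30 seconds, and give up after 4 unanswered probes.
        enable_keepalive(s, idle_s=600, interval_s=30, probes=4)
        print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))
```

Other sockets on the machine keep using the values found under /proc.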

What I found surprising at first is the idea that Linux would send those Keep-Alive signals all the time instead of just when no traffic was seen for a while. Some descriptions of tcp_keepalive_intvl read as if the interval ignored everything else exchanged on the connection, which is what I first assumed. From what I can tell, that is not what happens: the Keep-Alive timer is measured from the last segment received, so a connection that carries traffic does not get probed, and a probe that gets acknowledged pushes the next one back by the full idle time. The 75 second interval only applies while probes remain unanswered. That makes sense, since less network traffic is a better solution than more Keep-Alive segments on that network.