I found this discussion about TCP sockets in Linux sitting around half-finished in Google Drive. It was dated 2012-10-23. I have no memory of writing it. I cleaned it up and am posting it here in case anyone finds it useful.

A ‘socket’ is an abstraction for one end of a network connection. Each machine taking part in the connection has its own socket; from an application’s perspective, sockets make it possible to treat a network connection much like a file on the local system, and just read from and write to it like any other file.

There are two categories of people who might want to know more about how sockets work: sysadmins and programmers. Sysadmins need to be able to track down troublesome connections and understand the state of things on their network. Programmers can write better networking code by understanding the behavior underlying their networked applications. This article has some information for each group, so read on!

Here is an overview of the lifecycle of a TCP socket:

Establishing a connection

An application on one system (we’ll call it the “server”) creates a socket. In C [1], this is done by calling socket(), bind(), and then listen(); the bind() call is where the program specifies the port number to use. This leaves the socket in the LISTEN state. To actually accept an incoming connection, the program also needs to call accept(). The accept() call blocks until an incoming connection is made on that port, and then returns a new socket for that connection; the original socket stays in LISTEN so it can take further connections.
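As a rough sketch of the server-side calls described above (the helper name open_listener and the backlog of 16 are my own choices for illustration, not anything the calls require):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a TCP socket listening on `port`
 * (all IPv4 interfaces), or -1 on error. Port 0 lets the kernel pick. */
int open_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    /* bind() attaches the port; listen() moves the socket to LISTEN. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would then block in accept(fd, NULL, NULL), which returns a separate file descriptor for each connection that arrives.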

On another system (the “client”), an application creates a socket with socket() and calls connect(), passing it the IP address of the server and the port number of the program it wants to connect to. This puts the client-side socket in SYN_SENT and, as the state name implies, sends a SYN packet to the server. This socket also has a port number attached to it, but it is chosen by the operating system rather than the program. (This is called an ‘ephemeral’ port.)
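The client side might look something like the sketch below. Both helper names (connect_to and local_port) are made up for this example; local_port uses getsockname() to show which ephemeral port the kernel picked.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect to an IPv4 address given as dotted-quad
 * text. Returns the connected socket, or -1 on error. */
int connect_to(const char *ip, uint16_t port)
{
    struct sockaddr_in server = {0};
    server.sin_family = AF_INET;
    server.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &server.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* connect() sends the SYN and, on a blocking socket, returns once
     * the handshake has completed or failed. */
    if (connect(fd, (struct sockaddr *)&server, sizeof server) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: the local port of a socket in host byte order,
 * or -1 on error. For a client socket this is the ephemeral port. */
int local_port(int fd)
{
    struct sockaddr_in local;
    socklen_t len = sizeof local;
    if (getsockname(fd, (struct sockaddr *)&local, &len) < 0)
        return -1;
    return ntohs(local.sin_port);
}
```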

Once these two sockets (one on each system!) start talking, they go through a ‘3-way handshake’ (SYN, SYN-ACK, ACK; all of it transparent to the application) to agree on initial sequence numbers and connection options, and then both sockets change to the ESTABLISHED state.

Sending data

At this point the applications can simply read() from and write() to the socket. Each application will read() what the other application write()s. If keepalives are enabled on a socket (the SO_KEEPALIVE option; it is off by default) and the connection sits idle for long enough (two hours by default on Linux), the kernel will send a “keepalive” packet. This is a packet with no data in it that only exists to make sure the other side is still there.

As long as neither side closes the connection, and any data or keepalives that are sent keep getting acknowledged, the sockets will stay in the ESTABLISHED state.
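For reference, here is roughly how a program could turn keepalives on and shorten the timers for one socket. The helper name enable_keepalive is hypothetical; TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific options, so treat this as a Linux-only sketch.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable keepalives on a TCP socket.
 *   idle_secs     - idle time before the first probe is sent
 *   interval_secs - time between unanswered probes
 *   probes        - unanswered probes before the connection is dropped
 * Returns 0 on success, -1 on error. */
int enable_keepalive(int fd, int idle_secs, int interval_secs, int probes)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof idle_secs) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &interval_secs, sizeof interval_secs) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probes, sizeof probes) < 0)
        return -1;
    return 0;
}
```

Something like enable_keepalive(fd, 60, 10, 5) would probe after a minute of silence and give up about 50 seconds later if nothing answers.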

Shutting down the connection

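When one side of the connection calls close() on its socket, the kernel sends a special packet telling the other end it is finished (a FIN packet), and the socket that sent the FIN changes to FIN_WAIT. (Strictly, it enters FIN_WAIT_1, and moves to FIN_WAIT_2 once the other end acknowledges the FIN; I’ll call both FIN_WAIT here.)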

When a connection in the ESTABLISHED state receives a FIN, it changes to the CLOSE_WAIT state.

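When a socket that’s already in FIN_WAIT receives a FIN, it acknowledges it, lingers briefly in TIME_WAIT, and then closes. It will also time out and close on its own if it receives no response from the other system (on Linux, the tcp_fin_timeout sysctl, 60 seconds by default, controls how long a socket the application has already closed waits in FIN_WAIT_2).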

A socket in CLOSE_WAIT, by contrast, cannot close until the application calls close() on the socket. Once this happens, the system sends a FIN so that the remote end can finish closing, and the socket closes as soon as that FIN is acknowledged. The important takeaway from this is that if you see sockets stuck in CLOSE_WAIT indefinitely, that’s a bug in the application that opened the socket.

Preventing indefinite CLOSE_WAIT

From an application’s perspective, all of these socket states are invisible; they are bookkeeping done by the kernel. The application just has a file descriptor that it is writing to and reading from. So how do we detect when the other side has closed down the connection (i.e. that we are in CLOSE_WAIT)?

When you try to read() on a socket the other side has close()d, read() returns 0 once any remaining data has been consumed. This is EOF (the end of file marker). It is considered correct programming to, at this point, do any necessary clean-up work and call close() on your own socket.

Otherwise you end up with sockets hanging out forever in CLOSE_WAIT.

In other words, if a socket is stuck in CLOSE_WAIT, it is necessarily a fault in the application - the application is failing to check for the connection closing, and thus never calls close().
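A minimal sketch of that pattern: keep reading until read() returns 0, then close. The helper name drain_and_close is invented for this example, and a real program would do something with the data instead of just counting it.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read from fd until the peer closes, then close
 * our end too. Returns the number of bytes read, or -1 on error. */
long drain_and_close(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;      /* a real program would process buf here */
        } else if (n == 0) {
            break;           /* EOF: the peer sent its FIN */
        } else if (errno != EINTR) {
            close(fd);       /* real error: still release the socket */
            return -1;
        }
        /* n < 0 with EINTR: interrupted by a signal, just retry */
    }

    close(fd);               /* sends our FIN; the socket leaves CLOSE_WAIT */
    return total;
}
```

The part that matters is that every exit path calls close(); skip it and the socket sits in CLOSE_WAIT until the process exits.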

This problem is common in applications that do nothing but write() to their sockets (they never read() so they never detect the EOF). To prevent this problem, any program that opens a socket must occasionally try to read() from the socket to make sure it isn’t writing data into a closed connection. If blocking on read is a problem, you can put the socket in non-blocking mode by setting the O_NONBLOCK flag on it with fcntl(); a read() with nothing to return then fails immediately with EAGAIN instead of waiting.
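One way a write-only program could do that check is sketched below. Both helper names are hypothetical, and using recv() with MSG_PEEK is my own choice here: it lets the program look for EOF without consuming any data the peer might have sent.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: put fd into non-blocking mode.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: call on a non-blocking socket before writing.
 * Returns 1 if the peer has closed its end (time to clean up and
 * close() ours), 0 if the connection still looks open, and -1 on any
 * other error (a reset connection also lands here). */
int peer_has_closed(int fd)
{
    char byte;
    ssize_t n = recv(fd, &byte, 1, MSG_PEEK);

    if (n == 0)
        return 1;            /* EOF: the peer sent its FIN */
    if (n > 0)
        return 0;            /* unread data is waiting, peer not at EOF yet */
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        return 0;            /* nothing to read right now */
    return -1;
}
```

Note that peer_has_closed() keeps returning 0 while unread data is queued, so a program that never reads at all would still need to drain that data to see the EOF behind it.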

[1] Many languages and libraries abstract these calls and make opening sockets much, much easier. On any Unix-like system, though, they all make these calls eventually, since they are kernel syscalls.