TCP Sockets in C

To program an EPP client in C the way I want, the next thing I need to learn about is TCP sockets.

Recap

In my article Creating a Nominet-Compatible EPP Client I created a simple TLS HTTP(S) client. It doesn't have much functionality, but that is because there is no interaction.

I have decided that I will be using TCP sockets for clients to communicate with my EPP service, which will itself communicate with Nominet's EPP server.

While I will probably use encryption further down the line to secure the TCP connections, for now I just want to be able to test things using telnet. That means I just need to learn about TCP sockets (and how to program that stuff in C).

TCP

TCP is a connection-oriented (CO) protocol, as opposed to the connectionless UDP protocol. At its simplest that means that TCP confirms packets you have sent have been received, whereas with UDP you just throw packets at the destination and hope they arrive.

TCP also ensures packets arrive in the correct order, with the sender told to resend any packets that didn't arrive. For something where XML is being transmitted, it is important the packets are in the correct order, otherwise the XML will likely be invalid.

It is for that reason that the Web uses TCP rather than UDP. Gzip compressed Web pages split into packet-size chunks, sent through the tubes, and reassembled on the other side, probably wouldn't load properly if HTTP used UDP.

Anyway, the choice was between TCP and Unix Domain Sockets. Since you can't telnet to a UDS, I've gone with TCP. I may change my mind later, but the main consideration will be whether I want it to be accessible over the network or only from the local machine.

Creating a TCP Server

For this, I am going to heavily use code from Sockets Tutorial [www.LinuxHowtos.org] that I will build upon for my purpose.

The code I ended up with below has several modifications from the code in the tutorial. For example it is IPv6-based (IPv4-mapped IPv6 addresses supported) and it uses a fixed IP address and port (with no code for argv parameters).
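An IPv4-mapped IPv6 address has the form ::ffff:a.b.c.d. As a minimal sketch, inet_pton() and the IN6_IS_ADDR_V4MAPPED macro can tell such an address apart from a native IPv6 one. The helper name is_v4_mapped() is hypothetical and is not part of the server code in this article. Note that mapped addresses would normally only show up when the socket is bound to a wildcard address such as :: with IPV6_V6ONLY disabled; a listener bound to ::1 only receives connections made to ::1.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper: returns 1 if text parses as an IPv4-mapped IPv6
 * address, 0 if it parses as any other IPv6 address, and -1 if it is not
 * a valid IPv6 address at all. */
int is_v4_mapped(const char *text)
{
	struct in6_addr addr;

	if (inet_pton(AF_INET6, text, &addr) != 1)
	{
		return -1;
	}
	return IN6_IS_ADDR_V4MAPPED(&addr) ? 1 : 0;
}
```

For example, is_v4_mapped("::ffff:127.0.0.1") returns 1 and is_v4_mapped("::1") returns 0.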

This is just a starting point, and I wrote the code separate from my existing code so that I can see how it works. There is one issue with the code: a fixed buffer size of 256 bytes (255 excluding the \0) and that is all it will read.

Given my code from the previous article doesn't have that issue, I will attempt to further modify this code and remove the message size limit.

/* Standard Libraries */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <getopt.h>
/* Socket Libraries */
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

void established_connection(int);

void error(const char *msg)
{
	perror(msg);
	exit(1);
}

/* main() is server management. Established connections are not handled here. */
int main(int argc, char *argv[])
{
	char *BIND_PORT = "9999";
	char *BIND_IP = "::1";

	int sock_fd, newsock_fd, port_number, pid;
	socklen_t client_addr_len;
	struct sockaddr_in6 server_addr, client_addr;

	port_number = atoi(BIND_PORT);

	sock_fd = socket(AF_INET6, SOCK_STREAM, 0);
	if (sock_fd < 0)
	{
		error("socket");
	}

	bzero((char *) &server_addr, sizeof(server_addr));
	server_addr.sin6_family = AF_INET6;
	server_addr.sin6_port = htons(port_number);
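	/* inet_pton() returns 1 on success, 0 if BIND_IP is not a valid address. */
	if (inet_pton(AF_INET6, BIND_IP, &server_addr.sin6_addr) != 1)
	{
		fprintf(stderr, "Error: Invalid bind address.\n");
		exit(1);
	}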

	int optval = 1;
	setsockopt(sock_fd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval));

	if (bind(sock_fd, (struct sockaddr *) &server_addr, sizeof(server_addr)) < 0)
	{
		error("bind");
	}

	if (listen(sock_fd, 5) < 0)
	{
		error("listen");
	}

	client_addr_len = sizeof(client_addr);

	while (1)
	{
		newsock_fd = accept(sock_fd, (struct sockaddr *) &client_addr, &client_addr_len);
		if (newsock_fd < 0)
		{
			error("accept");
		}
		pid = fork();
		if (pid < 0)
		{
			error("fork");
		}
		if (pid == 0)
		{
			close(sock_fd);
			established_connection(newsock_fd);
			exit(0);
		}
		else {
			close(newsock_fd);
		}
	}
	close(sock_fd);
	return 0;

}

/* After a connection is successfully established, processing is no longer in main().
 * Instead, all processing within a client connection is handled within established_connection(). */
void established_connection(int sock)
{
	int n;
	char buffer[256];

	bzero(buffer, 256);
	n = read(sock, buffer, 255);
	if (n < 0)
	{
		error("read");
	}
	printf("Message: %s", buffer);

	char ack_msg[] = "Message received.\n";

	n = write(sock, ack_msg, sizeof(ack_msg) - 1); /* Exclude the terminating NUL. */
	if (n < 0)
	{
		error("write");
	}
}

Reading the EPP Header

While working on this TCP server, I tried to work out how I could get the header length from an EPP data message.

The header is 4 bytes long (with leading zeros) and indicates the full length of the reply (including the header).
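To make the header layout concrete, here is a minimal sketch of writing and reading it. The header is a 32-bit total length in network (big-endian) byte order. The names epp_write_header(), epp_payload_length() and EPP_HEADER_SIZE are hypothetical and used only for illustration.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define EPP_HEADER_SIZE 4

/* Writes the 4-octet header for a payload of payload_length octets.
 * The stored value is the total length, header included.
 * Assumes payload_length + 4 does not overflow 32 bits. */
void epp_write_header(unsigned char header[EPP_HEADER_SIZE], uint32_t payload_length)
{
	uint32_t total = htonl(payload_length + EPP_HEADER_SIZE);

	memcpy(header, &total, EPP_HEADER_SIZE);
}

/* Returns the payload length described by a header, or -1 if the total
 * length it carries is smaller than the header itself. */
int64_t epp_payload_length(const unsigned char header[EPP_HEADER_SIZE])
{
	uint32_t total;

	memcpy(&total, header, EPP_HEADER_SIZE);
	total = ntohl(total);
	if (total < EPP_HEADER_SIZE)
	{
		return -1;
	}
	return (int64_t) total - EPP_HEADER_SIZE;
}
```

A 12-octet payload gets the header octets 00 00 00 10 (16 in decimal), and reading that header back gives a payload length of 12.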

It took some work, but I ended up with the following replacement established_connection() code:

void established_connection(int sock)
{
	int n;
	char buffer[256];

	union {
		uint32_t whole;
		char bytes[4];
	} msg_length;

	n = recv(sock, msg_length.bytes, 4, MSG_WAITALL);
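	if (n < 0)
	{
		error("recv");
	}
	if (n != 4)
	{
		/* Short read: errno is not set, so perror() would be misleading. */
		fprintf(stderr, "Error: Incomplete header.\n");
		exit(1);
	}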
	msg_length.whole = ntohl(msg_length.whole);
	uint32_t msg_data_length = msg_length.whole - 4;

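	/* The octets are printed after ntohl(), so on a little-endian host they appear in reverse order. */
	printf("Message octets: %d %d %d %d\n", msg_length.bytes[0], msg_length.bytes[1], msg_length.bytes[2], msg_length.bytes[3]);
	printf("Message length: %u\n", msg_length.whole);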
	printf("Message data length: %u\n", msg_data_length);

	bzero(buffer, 256);
	n = read(sock, buffer, 255);
	if (n < 0)
	{
		error("read");
	}
	printf("Message: %s\n", buffer);

	char ack_msg[] = "Message received.\n";

	n = write(sock, ack_msg, sizeof(ack_msg) - 1); /* Exclude the terminating NUL. */
	if (n < 0)
	{
		error("write");
	}
}

In order to test, I telnet'd to ::1 9999 and sent the simplest octets I could come up with that would make the maths easy: the ^A (1) control code.

^A^A^A^AHello World!
Message octets: 1 1 1 1
Message length: 16843009
Message data length: 16843005
Message: Hello World!

^A is an octet with a value of 1. In binary that is 00000001 and in hex 01. This makes the maths easy to test: using a calculator that shows a number in different bases (e.g. gnome-calculator shows bases 2, 8, 10, and 16 in programmer mode), 01010101 in hex equals 16,843,009 in decimal.

^B is an octet with a value of 2. 02020202 in hex equals 33,686,018 in decimal.

^B^B^B^BThis is a test.
Message octets: 2 2 2 2
Message length: 33686018
Message data length: 33686014
Message: This is a test.

I have worked out the EPP header part and the maths is correct.
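The arithmetic in these tests can also be checked in code rather than with a calculator. This is a minimal sketch; octets_to_u32() is a hypothetical helper that combines four octets, most significant first, which is the same value ntohl() produces from the header.

```c
#include <stdint.h>

/* Combines four octets, most significant first, into one 32-bit value. */
uint32_t octets_to_u32(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t) a << 24) | ((uint32_t) b << 16) | ((uint32_t) c << 8) | (uint32_t) d;
}
```

octets_to_u32(1, 1, 1, 1) gives 16843009 and octets_to_u32(2, 2, 2, 2) gives 33686018, matching the "Message length" lines in the test output.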

One final test.

☠^Ahi
Message octets: 1 -96 -104 -30
Message length: 3801653249
Message data length: 3801653245
Message: hi

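☠ is encoded in UTF-8 as the octets E2 98 A0, so with ^A appended the header is E298A001, which equals… 3,801,653,249. The octets are printed in reverse order, and as negative numbers, because the code prints them after ntohl() has swapped them, which is what happens on a little-endian machine where char is signed. I really hope Nominet never send an EPP response that is nearly 4 gigabytes in size.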
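If the octets should print as 226 152 160 1 instead, one option is to format them in arrival order (before any byte swapping) and cast each one to unsigned char. This is only a sketch; format_octets() is a hypothetical helper and not part of the server code.

```c
#include <stdio.h>

/* Formats the four header octets in the order they arrived, as unsigned
 * decimal values, into out. Returns the number of characters written
 * (excluding the NUL), as snprintf() does. */
int format_octets(char *out, size_t out_size, const char header[4])
{
	return snprintf(out, out_size, "%u %u %u %u",
		(unsigned) (unsigned char) header[0],
		(unsigned) (unsigned char) header[1],
		(unsigned) (unsigned char) header[2],
		(unsigned) (unsigned char) header[3]);
}
```

The cast matters because a plain char holding E2 is negative on platforms where char is signed, which is where the -30 in the output comes from.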

Of course this code doesn't belong where it is. This code is for the server-client connection, not the server-service connection.

The Buffer

There is still the buffer issue to work out. I could just do things the way EPP does (and some other TCP protocols do) by setting the first so many bytes to the size of what needs reading.
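With a length prefix, the reader knows exactly how many bytes to expect, but a single read() on a TCP socket may still return fewer bytes than asked for. A common way to deal with that is a loop that keeps reading until the requested count has arrived. The sketch below assumes a blocking socket; read_full() is a hypothetical helper name.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads exactly count bytes from fd into buf, looping over short reads.
 * Returns 0 on success, -1 on error (errno set by read()), and 1 if the
 * peer closed the connection before count bytes arrived. */
int read_full(int fd, void *buf, size_t count)
{
	char *p = buf;
	size_t done = 0;

	while (done < count)
	{
		ssize_t n = read(fd, p + done, count - done);
		if (n < 0)
		{
			if (errno == EINTR)
			{
				continue; /* Interrupted by a signal: retry. */
			}
			return -1;
		}
		if (n == 0)
		{
			return 1; /* End of stream. */
		}
		done += (size_t) n;
	}
	return 0;
}
```

With a helper like this, the header could be read with read_full(sock, header, 4) and the payload with a second call using the length taken from the header.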

Upon writing code to do things that way, I discovered some bugs in my code. ^@ is NUL (0) in telnet, and if the first 4 bytes contain a NUL then C treats it as the end of the string.

The effect was that every strlen() call on the buffer returned a length of zero, because the first octet of the header was NUL.

Other than the first 4 bytes, however, there should not be a NUL character in the rest of the request/response. That is because NUL is not permitted anywhere within an XML 1.0 or 1.1 document, and EPP's specification uses XML 1.0.

But this is a TCP server, and XML might not be what is being received (I'm thinking about code reuse here). For that reason I have added comments to the code where modifications would be needed for data that might include NUL (such as binary).
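For data that might include NUL, the usual approach is to carry the length alongside the buffer and use functions that take a length, such as fwrite() and memcpy(), instead of printf("%s") and strlen(). A minimal sketch, with print_bytes() as a hypothetical helper:

```c
#include <stdio.h>
#include <string.h>

/* Writes exactly length bytes of data to out, NUL bytes included.
 * Returns 0 on success and -1 if fewer bytes were written. */
int print_bytes(FILE *out, const char *data, size_t length)
{
	return fwrite(data, 1, length, out) == length ? 0 : -1;
}
```

For a buffer that starts with the octets 00 00 00 09, strlen() reports 0, while print_bytes(stdout, buffer, 9) still writes all nine bytes.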

I have also renamed (and added) some variables so that the only things that need setting are the buffer size and the header size. Other variables use those numbers, and those variables are named in a way that should make the code clearer.

For example, buffer_size is the size of the buffers, and buffer_size_chars is the number of chars that can fit in buffer_size alongside the terminating NUL (i.e. equal to buffer_size − 1). I find code is a bit easier to comprehend if I don't have to keep trying to understand why I'm using a calculation in one place and a different calculation on the next line.

/* After a connection is successfully established, processing is no longer in main().
 * Instead, all processing within a client connection is handled within established_connection(). */
void established_connection(int sock)
{
	uint32_t buffer_size = 256;
	uint32_t header_size = 4;

	uint32_t buffer_size_chars = buffer_size - 1;

	uint32_t msg_data_length, position, remaining_chars = 0;
	char buffer[buffer_size], temp[buffer_size];
	int n = 0;

	union {
		uint32_t whole;
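		char bytes[sizeof(uint32_t)]; /* Fixed size: standard C does not allow a variable-length array inside a union. header_size must stay equal to this. */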
	} msg_length;

	bzero((char *) buffer, buffer_size);
	n = read(sock, buffer, buffer_size_chars);
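	/* errno is only meaningful when read() has failed, so check n first. */
	if (n < 0 && errno == EAGAIN)
	{
		exit(0);
	}
	if (n < 0)
	{
		error("first read");
	}
	if ((uint32_t) n < header_size)
	{
		fprintf(stderr, "Error: Connection closed or header incomplete.\n");
		exit(1);
	}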
	memcpy(msg_length.bytes, buffer, header_size);
	msg_length.whole = ntohl(msg_length.whole);
	if (msg_length.whole < header_size) {
		fprintf(stderr,"Error: Data length less than header size.\n");
		printf("--------------------------------------------------------------------------------\n");
		exit(1);
	}
	msg_data_length = msg_length.whole - header_size;

//	printf("Message octets: %d %d %d %d\n", msg_length.bytes[0], msg_length.bytes[1], msg_length.bytes[2], msg_length.bytes[3]);
//	printf("Message length: %u\n", msg_length.whole);
//	printf("Message data length: %u\n", msg_data_length);

	bzero((char *) temp, buffer_size);
	memcpy(temp, buffer + header_size, buffer_size - header_size);
//	printf("Message: ");

	/*
	Make msg_data large enough to hold the entire message.
	While we aren't currently storing the entire message in the array, it is there for when we do.
	msg_data_length already excludes the header, so only the NUL terminator is added.
	This is a variable-length array on the stack, so a very large length in the header will overflow the stack.
	*/
	char msg_data[msg_data_length + 1]; // With NUL.
	bzero((char *) msg_data, msg_data_length + 1);
	/*
	first_chunk is how many data bytes (header excluded) the first read() actually returned.
	read() can return fewer bytes than requested, so use n rather than the buffer size.
	*/
	uint32_t first_chunk = (n > (int) header_size) ? (uint32_t) n - header_size : 0;
	/* Ignore anything beyond the length given in the header. */
	if (first_chunk > msg_data_length)
	{
		first_chunk = msg_data_length;
	}
	memcpy(msg_data, temp, first_chunk);

	/*
	If the message contains a NUL character printf is not suitable here or in the while loop.
	*/
	printf("%s", msg_data);

	/* Position is how many bytes we have parsed. */
	position = first_chunk;
	/*
	Keep parsing until position equals the size indicated in the header.
	*/
	while (position < msg_data_length)
	{
		remaining_chars = msg_data_length - position;
		bzero((char *) buffer, buffer_size);
		if (remaining_chars > buffer_size_chars)
		{
			n = read(sock, buffer, buffer_size_chars);
		}
		else if (remaining_chars > 0)
		{
			n = read(sock, buffer, remaining_chars);
		}
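		/* errno is only meaningful when read() has failed, so check n first. */
		if (n < 0 && errno == EAGAIN)
		{
			/* Connection closed due to read timeout. */
			fprintf(stderr, "Error: Timeout while waiting for rest of data.\n");
			printf("--------------------------------------------------------------------------------\n");
			exit(1);
		}
		if (n < 0)
		{
			error("subsequent read");
		}
		if (n == 0)
		{
			/* Peer closed the connection early; without this check the loop would never end. */
			fprintf(stderr, "Error: Connection closed before the whole message arrived.\n");
			exit(1);
		}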
		printf("%s", buffer);
		position = position + n;
	}

	printf("\n--------------------------------------------------------------------------------\n");

	char ack_msg[] = "Message received.\n";

	n = write(sock, ack_msg, sizeof(ack_msg) - 1); /* Exclude the terminating NUL. */
	if (n < 0)
	{
		error("write");
	}
}

Next Step

I now have the code for a working TLS client and a functioning TCP server.

Before I start work on integrating everything into a single program (the service) the next thing I want to work on is parsing XML.

The EPP client will need to be able to understand XML to ensure what it is sending is valid EPP (and therefore valid XML after the header is stripped) and it will need to understand XML so it can manage the connection to the EPP server.

I'm not yet quite sure where I'm going to take this project, but unlike when I programmed an IRC client in C#, there is something more to this than just me looking into a protocol I use and seeing if I can understand how it really works (and actually use that knowledge).

I am a Nominet registrar. Having a Nominet-compatible EPP client that I understand the workings of because I coded it could be useful.

Although there are already some EPP clients out there, such as Net::DRI, what I want is something simple that hears EPP and speaks EPP.

I suppose it could be said what I want is the EPP equivalent of imapproxy: PHP doesn't do persistent connections, so stick a proxy in there that can. Again, there are proxies out there, such as Net::EPP::Proxy, but I want to write something from scratch in C.

I haven't really programmed in C before, so this is a rather complex experiment for me, but my hope is for an entire backend that is programmed by me.