Some of my newsletter subscribers have asked me a few times what the easiest way to think about byte slices is, or, using Go’s syntax, []byte. Folks that have experience with low-level languages, where working with bytes is widespread, usually do not have challenges comprehending what []byte means and how to use it.
But, if you come from a dynamic or a high-level language background, although everything we do ends up being a bunch of bytes, higher-level languages hide such details from us. Coming from a Ruby background, I sympathize with the folks that find this challenging.
Inspired by the conversations I had with these readers, I decided to try to help out. So, without beating around the bush – let’s explore how we can build a simple TCP-based application protocol. And thoroughly understand bytes in the process.
Standing on shoulders of giants
The protocol that we will be implementing will work on top of TCP, because:
- TCP is high level enough, so we don’t have to worry about too many low-level connection details
- Go has excellent TCP support through the net package
- The interfaces that the net package provides allow us to play with bytes (and slices of bytes)
- TCP provides certain guarantees when it comes to data transfer, so we are left with less to worry about
- Standing on the shoulders of giants is simpler than reinventing the wheel
Now, because protocol design is no small feat, we will be implementing a very tiny and toyish protocol. I am by no means a protocol designer, so what we are going to be working on here is merely an example. But, if you have any experience with protocols, please do not hesitate to drop me a comment (or a message) and let me know what I got wrong.
Let us imagine that Slack, the omnipresent collaboration and communication app, wants to turn their chat design into an open internet protocol. Being an open internet protocol would allow anyone to create a Slack-compliant server by implementing the SLCK protocol and build their version of Slack.
In theory, having an open protocol would allow Slack to become distributed, with people hosting their own Slack servers. If people hosted their own SLCK servers, the servers would communicate with each other through a cluster (or inter-server) protocol. With a cluster protocol (between servers) and a client protocol (between client and server), two SLCK servers would be able to communicate between themselves and with clients.
But, a cluster protocol is not what we will explore here, as it is significantly more challenging to implement. That’s why we will focus only on the client SLCK protocol.
Now, to make the client SLCK protocol production-ready would be a massive effort, and it is beyond what an article can cover. But, I believe that it’s a great example through which we can learn more about working with bytes.
Without further ado, let’s talk about SLCK.
SLCK is a text-based wire protocol, meaning that the data that travels on the wire is not binary, but just ASCII text. The advantage of text-based protocols is that a client can practically open a TCP connection to a server that implements the protocol and talk to it by sending ASCII characters.
Clients can connect to and communicate with a SLCK server through a TCP socket, using a set of commands and conventions that we will define next.
SLCK follows a few simple but important conventions:
- It has TLV (type-length-value) format
- It has a control line & a content line
- It uses a carriage return followed by a line feed (\r\n) as delimiter
- It contains a fixed list of subjects (tags) representing actions
- And as already mentioned, it uses ASCII encoded text on the wire
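Or, if we put all of these conventions in combination, a SLCK message would look like this one, which we will dissect later in the article: MSG #general 39\r\nHey folks, hope you are all doing well!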
So, why an ASCII-encoded text-based wire protocol? Well, the advantage of such protocols is that once the specification (usually called spec) of the protocol is public, anyone can write an implementation. Thus, such a protocol can be the backbone on top of which an ecosystem can be born.
More practically, having a simple protocol makes it easy to work with. We do not need fancy clients to talk to a server that implements SLCK. A connection over telnet would suffice, and anyone who understands the protocol specification can write the messages sent to the server by hand.
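To make the wire format concrete, here is a minimal sketch that assembles a MSG frame from the conventions listed above: a control line with the subject, the recipient and the body length, then the \r\n delimiter, then the content. The buildMsg helper is hypothetical and exists only for illustration; it is not part of the server we build later.

```go
package main

import (
	"fmt"
	"strconv"
)

// buildMsg assembles a SLCK MSG frame: "MSG <recipient> <length>\r\n<body>".
// It is a hypothetical helper used only to illustrate the wire format.
func buildMsg(recipient, body string) []byte {
	// The control line carries the subject, the recipient and the body length.
	frame := []byte("MSG " + recipient + " " + strconv.Itoa(len(body)) + "\r\n")
	// The content line is the body itself, appended byte by byte.
	return append(frame, body...)
}

func main() {
	frame := buildMsg("#general", "Hey folks, hope you are all doing well!")
	// %q shows the delimiter as escaped characters instead of a line break.
	fmt.Printf("%q\n", frame)
}
```

The length is computed from the body rather than typed by hand, which is the part that is easiest to get wrong when writing frames manually in telnet.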
Protocol commands (subjects and options)
SLCK has a few different commands:
|Sent by|Command|Description|
|---|---|---|
|Client|REG|Register as client|
|Client|JOIN|Join a channel|
|Client|LEAVE|Leave a channel|
|Both|MSG|Send or receive a message to/from entity (channel or user)|
|Client|CHNS|List available channels|
|Client|USRS|List available users|
Let’s explore each of them:
When a client connects to a server, they can register as a client using the REG command. It takes an identifier as an argument, which is the client’s username.
handle: name of the user
When a client connects to a server, they can join a channel using the JOIN command. It takes an identifier as an argument, which is the channel ID.
channel-id: ID of the channel
Once a user has joined a channel, they can leave the channel using the LEAVE command, with the channel ID as argument.
channel-id: ID of the channel
Example 1: to leave the #general channel, the client can send:
LEAVE #general
To send a message to a channel or a user, the client can use the MSG command, with the channel or user identifier as argument, followed by the body length and the body itself.
entity-id: the ID of the channel or user
length: payload length
payload: the message body
Example 1: send a Hello everyone! message to the
Example 2: send a Hello! message to
To list all available channels, the client can send the CHNS message. The server will reply with the list of available channels.
To list all users, the client can send the USRS message. The server will reply with the list of available users.
When the server receives a command, it can reply with OK or ERR.
OK does not have any text after that, think of it as an
ERR <error-message> is the format of the errors returned by the server to the client. Protocol errors never cause the server to close the connection. That means that although an ERR has been returned, the server still maintains the connection with the client.
Example 1: Protocol error due to bad username selected during registration:
ERR Username must begin with @
Example 2: Protocol error due to bad channel ID sent with
ERR Channel ID must begin with #
Implementing a server
Now that we have the basics of the SLCK protocol in place, we can move on to the implementation. There are many ways to create the server, but as long as it implements the SLCK protocol correctly, the clients will not care about what happens under the hood of the server.
Next, I will explain my approach to building the SLCK TCP server, and while we are at it, we will learn lots about bytes and slices of bytes.
(If you would like to jump ahead, the full code of the SLCK protocol server implementation can be found in this repo.)
The server design will have four different parts: a client (user), a channel (chat room), a command (from the client to the server), and a hub - the server that manages it all.
Let’s take it from the most straightforward piece towards the most complicated.
Commands are what flows from the clients to the hub. Each command received from the user, such as MSG and the others, has to be appropriately parsed, validated, and handled.
Each command is of type command. The four attributes of the type, including their explanation:
- id: the identification of the command, which can be one of the protocol commands.
- recipient: who/what is the receiver of the command. It can be a channel or a user.
- sender: the sender of the command, which is the @username of a user.
- body: the body of the command sent by the sender to the receiver.
The flow of commands will be: a client receives the wire-protocol message, parses it, and turns it into a command that the client sends to the hub.
The command type also uses a type called ID, which is an alias. We use ID so we can control the valid command types using constants. While clients have to work with the raw strings that they receive from the network, internally in the server we map the wire commands to their constant counterparts. This way, we establish strict control of all the command types, enforced by Go’s compiler. Using this approach, we ensure that the id will always be a valid command type.
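A minimal sketch of what the command and ID types described above could look like. The attribute names come from the list above; the underlying int type of ID, the field types, and the lookup helper are assumptions made for illustration.

```go
package main

import "fmt"

// ID identifies a protocol command. Using int as the underlying type
// is an assumption; any comparable type would work.
type ID int

// One constant per wire keyword, so the compiler knows every valid command.
const (
	REG ID = iota
	JOIN
	LEAVE
	MSG
	CHNS
	USRS
)

// command is what a client hands over to the hub after parsing.
// The field types are assumptions.
type command struct {
	id        ID
	recipient string
	sender    string
	body      []byte
}

// lookup maps a raw wire keyword to its constant counterpart.
// It is a hypothetical helper, not taken from the article's code.
func lookup(word string) (ID, bool) {
	ids := map[string]ID{
		"REG": REG, "JOIN": JOIN, "LEAVE": LEAVE,
		"MSG": MSG, "CHNS": CHNS, "USRS": USRS,
	}
	id, ok := ids[word]
	return id, ok
}

func main() {
	id, ok := lookup("MSG")
	cmd := command{id: id, recipient: "#general", sender: "@jane", body: []byte("Hello!")}
	fmt.Println(ok, cmd.id == MSG, cmd.recipient, cmd.sender, string(cmd.body))
}
```

Raw strings stop at the edge of the server: everything past lookup works with ID constants, so a typo in a command name becomes a compile error instead of a runtime surprise.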
Channels in the SLCK protocol lingo are just chat rooms. It’s worth mentioning they have nothing in common with Go channels, except the name.
A channel is just a type with two attributes, name and clients.
The name of the channel is just a string that contains the unique name of the channel. The clients map is a set of *clients that are part of the channel at a given time. Having the list of clients available allows us to easily broadcast messages to all clients in the channel.
Which brings us to the client. A client is a wrapper around the TCP connection. It encapsulates all the functionality around accepting messages from the TCP connection, parsing the messages, validating their structure and content, and sending them for further processing and handling to the hub.
Let’s look closer at the five attributes of the client type, in order:
- conn: the TCP connection itself
- outbound: a send-only channel of type command. This channel will be the connection between the client and the hub, through which the client will send commands to the hub
- register: a send-only channel of type *client through which the client will let the hub know that it wants to register itself with the hub (a.k.a. the chat server)
- deregister: a send-only channel of type *client through which the client will let the hub know that the user has closed the socket, so the hub should deregister the client (by removing it from the clients map and from all channels)
- username: the username of the user (of type string) that is sitting behind the TCP connection
If this is a bit confusing, worry not. It will get more evident once you see the whole thing in action.
Now, let’s move on to the client’s methods. Once we instantiate a client, it can listen for incoming messages over the TCP connection. To do that, the client has a method called read.
The read method loops endlessly using a for loop, and accepts incoming messages from the conn attribute (the TCP connection). Once a message is received, it will pass it on to the handle method, which will process it.
In case the err returned is io.EOF, meaning the user has closed the connection, the client will notify the hub through the deregister channel. The hub will remove the deregistered client from the clients map and from all of the channels that the client participated in.
Handling bytes in handle using the bytes package
Because of the protocol definition, we know the structure of the commands that the chat server might receive from the user. That’s what the handle method does: it gets the raw messages from the socket and parses the bytes to make meaning out of them.
Handling messages is where we get to see the slice of bytes ([]byte) type in action. So, what happens here? Let’s break it down.
Given that our SLCK protocol is a text-based wire protocol, the bytes that are flowing on the TCP connection are, in fact, plain ASCII text. Each byte (or octet, because a byte is eight bits) has a decimal value between 0 and 255, because eight bits can represent 2 to the power of 8 (256) distinct values. That means that each of the octets can contain any of the characters in the extended ASCII encoding. (Refer to this ASCII table to see all of the available characters.)
Having a text-based protocol allows us to easily convert each of the bytes that arrive through the TCP connection into meaningful text. That’s why each byte in the []byte slice represents one character. Because each byte in the []byte is a character, converting a []byte into a string is as easy as: s := string(slice).
And Go is good at handling bytes. For example, it has a bytes package which lets us work with []byte directly, instead of converting them into strings every time we want to work with them.
Given that all SLCK commands begin with a single word followed by a space, we can simply take the first word from the []byte, upcase it, and compare it with the valid keywords of the protocol. “But, how are we supposed to take a word from a slice of bytes?” you might ask. Since bytes are not words, we have to either compare them byte-by-byte or use the bytes package. To keep things simple, we will use the bytes package. (You can check this snippet to compare the two approaches.)
In the first line of the handle method, we take the first part of the received message, and we upcase it. Then, on the second line, we remove the first part from the rest of the message. The split allows us to have the command (cmd) and the rest of the command arguments (args) in separate variables.
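Here is one way to sketch that split using only the bytes package. Note that this version uses bytes.Cut (available since Go 1.18) to separate the first word from the rest, which is a swapped-in technique rather than necessarily what the server's handle method does; splitCommand is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"fmt"
)

// splitCommand separates a raw message into the upcased command keyword
// and the remaining arguments. It is a hypothetical helper.
func splitCommand(message []byte) (cmd, args []byte) {
	// Drop surrounding whitespace, including a trailing \r\n.
	message = bytes.TrimSpace(message)
	// Everything before the first space is the keyword; the rest are arguments.
	first, rest, _ := bytes.Cut(message, []byte(" "))
	return bytes.ToUpper(first), bytes.TrimSpace(rest)
}

func main() {
	cmd, args := splitCommand([]byte("reg @jane\r\n"))
	fmt.Printf("cmd=%q args=%q\n", cmd, args)

	cmd, args = splitCommand([]byte("MSG #general 6\r\nHello!"))
	fmt.Printf("cmd=%q args=%q\n", cmd, args)
}
```

Cutting first and upcasing afterwards means a lowercase keyword such as reg is still separated cleanly from its arguments.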
After that, in the switch construct, we handle all of the different commands. For example, handling the REG command is done using the reg method.
The reg method takes the args slice and removes any surrounding whitespace bytes (using bytes.TrimSpace). Given that the argument of the REG command is the @username of the user, it checks whether the passed username begins with @ and whether it is blank. Once it does that, it converts the username to a string and assigns it to the client (c) itself. From then on, the client has an assigned username.
As a second step, it sends the client itself through the register channel. This channel is read by the hub (the chat server), which will do more validation of the username before it successfully registers the client.
The err func simply takes an error and sends its contents back to the user, using the underlying TCP connection of the client.
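The username checks and the error reply can be sketched as two small functions. The helper names, the wording of the blank-username error, and the trailing newline on the reply are assumptions; the "Username must begin with @" message is the one from the protocol examples.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// parseUsername validates the argument of a REG command.
// It is a hypothetical helper covering only the checks done on the client side.
func parseUsername(args []byte) (string, error) {
	u := bytes.TrimSpace(args)
	if len(u) == 0 {
		// The wording of this error is an assumption.
		return "", errors.New("Username cannot be blank")
	}
	if u[0] != '@' {
		return "", errors.New("Username must begin with @")
	}
	// Only now do we leave the world of bytes and keep a string.
	return string(u), nil
}

// errLine formats an error in the "ERR <error-message>" shape. In the
// server these bytes would be written to the client's TCP connection.
func errLine(err error) []byte {
	return []byte("ERR " + err.Error() + "\n")
}

func main() {
	if u, err := parseUsername([]byte(" @jane\r\n")); err == nil {
		fmt.Println("registered as", u)
	}
	if _, err := parseUsername([]byte("jane")); err != nil {
		fmt.Print(string(errLine(err)))
	}
}
```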
We will come back to the other commands and methods once we have thoroughly explored the chat server.
The hub is the central entity that the clients connect and register with. The hub also manages the available channels (chat rooms), broadcasting messages to said channels, and relaying messages (private/direct messages) between users.
All of the above functionality means that the hub is the central place of all communications, hence the name.
First, let’s explore the hub type with all of its attributes.
The attributes, and their explanations, in order:
- channels: a map of the channels (chat rooms), with the name of the channel as the key and the channel as the value
- clients: a map of the clients (connected users), with the username as the key and the client as the value
- commands: a channel of command values that flow from the clients to the hub, which the hub will validate and execute
- deregistrations: a channel of *client through which a client deregisters itself, so the hub is informed that the user has closed the connection and cleans up any references to that client
- registrations: a channel of *client through which new clients register themselves with the hub, so the hub can accept the new client, validate their username, and add them to the clients map
So, how does the hub function? It all begins with the run method.
When we establish a new hub instance (which we will see later), we execute the run method in a goroutine. The goroutine will run the loop indefinitely, processing the registrations, deregistrations, and commands channels. Messages arriving through the registrations and deregistrations channels will be handled differently from the messages that come from the commands channel.
The run method will receive messages through the registrations channel, and it will send them to the register method for processing.
The register method will check if the hub already has a user with the given username, and it will react accordingly. If the username is taken, it will remove the username from the client and respond with an error. If the username is not taken, then it will add the client to the clients map, with the username as a key and the client reference as a value.
The run method will also receive messages through the deregistrations channel, and it will send them to the deregister method for processing.
The deregister method will check if the hub already has a user with the given username. If it finds the user, it will remove it from the clients map. Also, it will go through the map of channels and it will try to remove it from each of the channels.
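The run loop together with register and deregister can be sketched as below. This is a trimmed-down model under a few assumptions: the commands channel and the network replies are left out, the client has only a username, and the done channel exists only so the example can stop the loop.

```go
package main

import "fmt"

type client struct{ username string }

type channel struct {
	name    string
	clients map[*client]bool
}

// hub is reduced to the maps and the two channels discussed above.
type hub struct {
	channels        map[string]*channel
	clients         map[string]*client
	registrations   chan *client
	deregistrations chan *client
	done            chan struct{} // not part of the design above; stops the demo
}

// run processes one event at a time, so the maps are only ever touched
// by the goroutine that runs this loop.
func (h *hub) run() {
	for {
		select {
		case c := <-h.registrations:
			h.register(c)
		case c := <-h.deregistrations:
			h.deregister(c)
		case <-h.done:
			return
		}
	}
}

// register stores the client unless the username is already taken.
func (h *hub) register(c *client) {
	if _, taken := h.clients[c.username]; taken {
		c.username = "" // the server would also reply with an ERR here
		return
	}
	h.clients[c.username] = c
}

// deregister forgets the client and removes it from every channel.
func (h *hub) deregister(c *client) {
	if _, ok := h.clients[c.username]; !ok {
		return
	}
	delete(h.clients, c.username)
	for _, ch := range h.channels {
		delete(ch.clients, c)
	}
}

func main() {
	h := &hub{
		channels:        map[string]*channel{},
		clients:         map[string]*client{},
		registrations:   make(chan *client),
		deregistrations: make(chan *client),
		done:            make(chan struct{}),
	}
	go h.run()

	jane := &client{username: "@jane"}
	h.registrations <- jane
	h.registrations <- &client{username: "@jane"} // duplicate, rejected
	h.deregistrations <- jane
	h.done <- struct{}{} // received only after the events above were handled

	fmt.Println("clients left:", len(h.clients))
}
```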
When it comes to handling commands, things are a bit different. Each of the commands, as we already established, has an id attribute. For each of the commands that we receive, we do a switch on the id attribute, which will invoke a different method. For example, to join a channel, the id must be JOIN, which will invoke the joinChannel function.
The joinChannel function receives the username (u) and the channel as arguments. Then, if it finds the channel, it will add the client to the channel’s clients map. Otherwise, it will first create the channel, using the newChannel constructor, and then add the client as the first client to the channel.
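A minimal sketch of that join logic, assuming trimmed-down hub, channel and client types. The OK and ERR replies that the server would send back are left out, and silently ignoring unknown users is an assumption.

```go
package main

import "fmt"

type client struct{ username string }

type channel struct {
	name    string
	clients map[*client]bool
}

// newChannel creates an empty chat room with the given name.
func newChannel(name string) *channel {
	return &channel{name: name, clients: make(map[*client]bool)}
}

type hub struct {
	channels map[string]*channel
	clients  map[string]*client
}

// joinChannel adds the user u to the channel c, creating the channel
// first when nobody has joined it yet.
func (h *hub) joinChannel(u, c string) {
	cl, ok := h.clients[u]
	if !ok {
		return // unknown user; ignoring it is an assumption
	}
	ch, ok := h.channels[c]
	if !ok {
		ch = newChannel(c)
		h.channels[c] = ch
	}
	ch.clients[cl] = true
}

func main() {
	h := &hub{
		channels: map[string]*channel{},
		clients: map[string]*client{
			"@jane": &client{username: "@jane"},
			"@john": &client{username: "@john"},
		},
	}
	h.joinChannel("@jane", "#general") // creates #general
	h.joinChannel("@john", "#general") // joins the existing channel
	fmt.Println(len(h.channels), len(h.channels["#general"].clients))
}
```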
Now, let’s zoom out and see how a
client wraps a TCP connection. Then, we
will see how bytes are flowing from the user, through the
client, to the
hub ending up with the receiver (another channel or user).
The core functionality of a chat server and the purpose of our SLCK protocol is sending and receiving messages. Let’s follow the flow of the bytes and see how we can implement sending messages between clients.
The structure of the MSG command is as follows: MSG <entity-id> <length>\r\n<payload>
For example, to send a
Hello! message to the
Or, to send a
Hey! message to
Once the user sends the MSG command, the handle method accepts it. Then, in handle we extract the command and its arguments, and we invoke the msg method of the client.
The args are passed to the msg method, which does some heavy lifting.
The msg method’s first step is to check if the first argument of the message begins with a @ or a #, meaning a user or a channel name. If that’s correct, we extract the recipient, which can be the username or the channel name.
We extract the next argument after, which, according to the protocol specification, is the length of the body in bytes. Having the size of the body (the length of the bytes) as part of the command that is sent by the client allows the server to slice off the bytes it needs from the body efficiently.
Let’s see an example:
MSG #general 39\r\nHey folks, hope you are all doing well!
In the handle method, we sliced off MSG, and we sent the rest of the bytes to the msg method. In msg, we checked if the next argument is a channel or a username, which is correct. Then, we pick up the bytes of the 39, and we store them. But having the 39 as bytes is not enough: the slice of bytes that represents the 39 is [51 57]. The 51 and 57 bytes just mean that the two octets, 3 and 9 in ASCII, have 51 and 57 as their byte representation.
To make our Go code understand [51 57] as 39, we have to convert them to a string, so they become "39", and then use the strconv package to convert the string to an int.
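The conversion from the ASCII bytes of the length to a number can be sketched in a few lines. The parseLength helper name is hypothetical; strconv.Atoi is the standard library call that turns the string into an int.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseLength turns the ASCII digits of the body length into an int.
// It is a hypothetical helper name.
func parseLength(raw []byte) (int, error) {
	// string(raw) reinterprets the bytes as text: [51 57] becomes "39".
	return strconv.Atoi(string(raw))
}

func main() {
	raw := []byte("39")
	fmt.Println(raw) // prints [51 57], the ASCII codes of '3' and '9'

	n, err := parseLength(raw)
	if err != nil {
		fmt.Println("bad length:", err)
		return
	}
	fmt.Println(n + 1) // arithmetic works now that we have a real number
}
```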
Once we have the length of the body, we validate that it is at least one byte. Next, we take the remaining bytes from the args, and we slice off length bytes from them.
In context of our example above:
MSG #general 39\r\nHey folks, hope you are all doing well!
we take the length (2) of the 39 plus the length of the \r\n delimiter and then take the body out of the args slice of bytes by using the “slice” operator (:). The slicing operation takes all of the bytes after the \n up to the end of the body. If the length were less than 39, then the body would end up being shorter, because the user has sent the wrong body size to the server. Conversely, if we tried to slice off more than the size of the body, the slice expression would panic, and an unrecovered panic in any goroutine brings down the whole server.
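The whole extraction can be sketched as one function that works on the bytes after the MSG keyword. The parseMsg name and the error messages are hypothetical, and the bounds check before slicing is an addition that the walkthrough above does not perform; it is there so that a wrong length produces an error instead of a panic.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"strconv"
)

var delimiter = []byte("\r\n")

// parseMsg extracts the recipient and the body from the arguments of a
// MSG command, e.g. "#general 39\r\nHey folks, ...". Hypothetical helper.
func parseMsg(args []byte) (string, []byte, error) {
	recipient, rest, found := bytes.Cut(args, []byte(" "))
	if !found || len(recipient) < 2 || (recipient[0] != '@' && recipient[0] != '#') {
		return "", nil, errors.New("recipient must be a #channel or a @user")
	}
	// l holds the ASCII digits of the length, e.g. [51 57] for 39.
	l, _, found := bytes.Cut(rest, delimiter)
	if !found {
		return "", nil, errors.New("missing delimiter after body length")
	}
	length, err := strconv.Atoi(string(l))
	if err != nil || length < 1 {
		return "", nil, errors.New("body length must be at least 1")
	}
	// Skip the length digits and the \r\n to land on the first body byte.
	padding := len(l) + len(delimiter)
	// Guard added in this sketch: never slice past the bytes we received.
	if length > len(rest)-padding {
		return "", nil, errors.New("body is shorter than the declared length")
	}
	return string(recipient), rest[padding : padding+length], nil
}

func main() {
	r, body, err := parseMsg([]byte("#general 39\r\nHey folks, hope you are all doing well!"))
	fmt.Println(r, string(body), err)

	_, _, err = parseMsg([]byte("#general 99\r\nHey"))
	fmt.Println(err)
}
```

A declared length smaller than the real body still truncates the message, exactly as described above; only the too-large case is turned into an error.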
Once body contains the message itself, the last step of the msg method is to send the new command through the outbound channel to the hub. The command has a recipient (channel or user ID), the sender, which is the username of the message author, the body containing the body of the message, and the id that’s the identifier of the command, MSG in this case.
The hub, which infinitely loops and reads from the commands channel (which is the same channel as the outbound channel of the client), will pick up the message from the client and process it in the message method.
The message method will check if the sender username (u) is present in the list of active clients (h.clients). If the client is active, then it will check if the first byte of the recipient (r) is a # or a @. Based on the result of the switch, either it will broadcast the message (m) to the channel, or it will find the recipient (r) in the h.clients list and send the message through the recipient’s TCP connection.
Let’s see this in action. I will start the server (using go run .) and open two telnet sessions with the server:
Given that SLCK is an ASCII wire protocol, it means we can just send some
commands to the server right away. Let’s first register both clients. The first
one will be
And the second client will be
In both cases the registration with the server went well, so the server replied
with OK. Now, let’s make both of the clients join the
#general channel. If
any of the users listed the channels they would get an error:
Great, let’s make them both join #general, so the channel would be created.
@jane right after:
CHNS
#general
JOIN #general
OK
Now that both are in the channel, we can also send a message from
#general, and it should pop up on
@john’s screen too.
Sending the message:
@john’s screen we can see:
@john received the message. Let’s say
@john would like to send
@jane a direct message. He sends:
@jane’s screen she will see:
Tying it all together
Now that we went through the client, the channel, the hub, and the command, we need to see the last piece of the puzzle: the main function that ties it all together.
In the main func, we will initialize a TCP listener, through which we can accept new TCP connections. Then, we will establish a new hub and invoke its run method in a separate goroutine.
Once the hub is running, we will infinitely loop using a for. Within the for loop, we will accept new TCP connections, wrap them in a new client, and spin them off in a new goroutine.
As a refresher, within the
read function of the
client, we also infinitely
loop using a
for and accept new incoming TCP messages:
Running each read method in a separate goroutine allows us to spin off as many goroutines as we have connections. That leaves our main goroutine free to accept new connections and just spin them off into goroutines. Once a goroutine is running, it will take care of itself until it crashes or the client disconnects.
The pitfall of this approach is that we have a single hub instance, which means that there’s only one goroutine that is accepting messages from what can be thousands of clients. While having a single hub instance simplifies the design, it also introduces a single point of failure. If the hub goroutine exits or gets stuck for whatever reason, the server will be rendered useless, although all of the client connections will be working fine.
The full code of the SLCK protocol server implementation can be found in this repo.
Before we wrap up here, I would like to highlight a few of the shortcuts we took while building this server implementation. Cutting these corners was with a purpose - not making this long article even longer.
First, we are missing resource locking when creating the channels or when a user joins/leaves a channel. If multiple people joined the same channel at the same time, we could run into concurrent writes to the same map.
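One possible way to add the missing locking is to guard the clients map with a sync.Mutex. This is a sketch under assumptions, not the server's code: the channel here stores usernames instead of *client values, and joinConcurrently exists only to simulate many users joining at once.

```go
package main

import (
	"fmt"
	"sync"
)

// channel guards its clients set with a mutex so that joins coming from
// different goroutines cannot write to the map at the same time.
type channel struct {
	mu      sync.Mutex
	name    string
	clients map[string]bool // usernames, to keep the sketch small
}

// join adds a user while holding the lock.
func (c *channel) join(username string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.clients[username] = true
}

// size reports how many users are in the channel.
func (c *channel) size() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.clients)
}

// joinConcurrently simulates n users joining at the same moment.
func joinConcurrently(c *channel, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c.join(fmt.Sprintf("@user%d", i))
		}(i)
	}
	wg.Wait()
}

func main() {
	general := &channel{name: "#general", clients: make(map[string]bool)}
	joinConcurrently(general, 100)
	fmt.Println(general.size())
}
```

Without the mutex, the Go runtime may abort the program when it detects concurrent map writes, and running the code with the race detector (go run -race) would flag the unguarded access.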
Second, our server does not have a graceful shutdown. A production-ready implementation would gracefully shut down all of the connections, informing the clients about the shutdown. Then, it would potentially save some state on disk before shutting down.
Another shortcut we took was the validation of the body size in the msg method. When we are performing the slicing of the message body, we do not take into consideration whether there are enough bytes in the message. If a client sends a body size larger than the body itself, we might slice off more bytes than available, which would result in a panic with a slice bounds out of range error.
If you would like to play more with our chat server, I recommend starting with adding each of these missing functionalities to it. And drop me the link to the repo in the comments, so I can see how you pulled it off.
- 2020-04-04 10:10 UTC - Fixed the client type definition, which was missing the deregister channel, as pointed out by Rene C. over email.