Implementing a Custom Binary Protocol for your game

Let’s decompose the term “custom binary protocol”. In networking, a protocol is a set of rules about how several entities should communicate. In the context of networked games, it is a convention about how to represent and parse the data that the clients and the server will exchange. Binary means that the data will be represented in raw binary form rather than as a string. Finally, custom indicates that you have to get your hands dirty and do some of the implementation yourself!
Note: This article covers how to implement a custom binary protocol in Javascript in conjunction with Socket.io. Even if you didn’t make your game in Javascript, the main idea discussed here might be of interest to you, as the intuition remains the same and can be implemented in any programming language.

What’s a custom binary protocol and what are the benefits

Typically for browser-based online games, the data will be in JSON format. This format is very handy to represent structured data. Given the complexity of the information that has to be exchanged between the clients and the server of an online game, you will need to structure your data, and therefore rely on JSON.

If you use Socket.io, the structured objects that you send back and forth will be converted to JSON and serialized as strings. Let’s consider the following update coming from the server, in JSON:

{
   "players":{
      "2":{
          "x":190,
          "y":210
       }
   },
   "monsters":{
      "12":{
          "x":200,
          "y":210
       }
   }
}

This is a simple example, where the server instructs the client to update the position of player 2 to (190,210), and of monster 12 to (200,210). Serialized as a string, it would become something like:


"{"players":{"2":{"x":190,"y":210}},"monsters":{"12":{"x":200,"y":210}}}"

This string is 71 characters long, which amounts to 71 bytes if your encoding uses one byte per character. This looks like a lot, given that we basically only wanted to send six integers. The same information could fit in 6 bytes (actually a bit more, but you get the idea)! The 65 extra bytes all come from the property names and structuring characters. These are extremely important if the update is to be read by a human being, but if the target is a machine, it kind of feels like a waste.

If your client only expects updates in exactly that format, that is, one position update for one player and one position update for one monster (which is extremely restrictive, but let’s imagine), you could leverage this implicit information and just send the numbers, like so:

"2,190,210,12,200,210"

The result would weigh 20 bytes, which is better, but still a lot. The remaining problem is that everything is encoded as a string, ignoring the nature of the transmitted information: the number 210 takes three bytes as text, but fits in a single byte as a raw value.

The solution to that is to get rid of the string serialization, and to transmit the raw bytes corresponding to what we want to say (but still in a structured way!). In order to do that, both the client and the server have to agree on a set of rules to encode and decode the information to and from the binary format. This is called a custom binary protocol.
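To make the comparison concrete, here is a minimal sketch that builds the three representations discussed so far and measures each one. The variable names are mine, and the binary version assumes that every value fits in a single unsigned byte (0 to 255), which happens to hold for this example.

```javascript
// The update from the example, as a plain object.
var update = {
    players: { "2": { x: 190, y: 210 } },
    monsters: { "12": { x: 200, y: 210 } }
};

// 1. Full JSON string: property names and punctuation included.
var asJson = JSON.stringify(update);

// 2. Comma-separated string: values only, but still text.
var asCsv = [2, 190, 210, 12, 200, 210].join(',');

// 3. Raw bytes: one unsigned byte per value (assumes 0 <= value <= 255).
var asBytes = new Uint8Array([2, 190, 210, 12, 200, 210]);

console.log(asJson.length);      // number of characters in the JSON string
console.log(asCsv.length);       // number of characters in the comma-separated string
console.log(asBytes.byteLength); // 6
```

The third form carries no names and no separators at all, which is exactly why both ends need to agree beforehand on what each byte means.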

Implementing a custom binary protocol

The source code corresponding to the rest of this article is available here.

The good news is that with Socket.io, it is trivial to send binary data. As the second argument of emit(), you simply need to provide an ArrayBuffer object instead of a normal Javascript one. The question is, how to populate your ArrayBuffer so that it contains the information you want to send, and how to make sure the receiving end can read it properly?
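As a rough sketch of what this looks like, the helper below emits an ArrayBuffer through anything that exposes an emit(eventName, payload) method, as a Socket.io socket does. The function name sendUpdate, the event name 'update' and the stand-in socket object are assumptions made for illustration; in a real game you would pass the socket handed to you by Socket.io.

```javascript
// Hypothetical helper: sends an ArrayBuffer under the (assumed) event name 'update'.
function sendUpdate(socket, buffer) {
    socket.emit('update', buffer);
}

// Stand-in for a real socket, so that the sketch runs without a server.
var sent = [];
var fakeSocket = {
    emit: function (eventName, payload) {
        sent.push({ eventName: eventName, payload: payload });
    }
};

var buffer = new ArrayBuffer(2);      // two zero-filled bytes
new Uint8Array(buffer).set([42, 7]);  // fill them with sample values
sendUpdate(fakeSocket, buffer);

console.log(sent[0].payload instanceof ArrayBuffer); // true
```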

server.js

Before we see how to actually encode/decode the data, we first need to set up the rules that will allow the server and the client to understand each other. As a running example, we will focus on the following Javascript object, to be transmitted as an ArrayBuffer:

var update = {
    playerName : "Mario",
    playerScore : 1000,
    coins : 700,
    x : 200,
    y : 100,
    isAlive : true,
    isPoweredUp : false
};

This example has the advantage of containing a string, four integers and two booleans, which makes for a good exercise.

Defining the schema

js/codec.js

For the receiving end to know how to parse the ArrayBuffer, it needs to know how the data is structured. More specifically, it needs to know what the properties of the received object are, their types, and how many bytes are allocated for each. This is what I call the schema. This information cannot be contained in the buffer, since at this point the buffer cannot yet be read. So it has to be specified separately, and be accessible by both the encoder and the decoder so they can operate consistently.

Note that in the source code, the server sends the message and the client receives it. Therefore, the server is the encoder, and the client is the decoder. But if the client were to send a message to the server, the roles would be reversed. The encoder is simply the sending entity, while the decoder is the receiving one.

Upon receiving a message, the receiving party can look up the schema and begin to parse the message. Below is the schema of our example update packet. The schema is represented as a Javascript object. Note that it is entirely up to you how to represent a schema, as long as it contains the necessary information.

Codec.updateSchema = {
    propertiesBytes: 1, // Size in bytes of the properties bitmask
    strings: ['playerName'],
    booleans: ['isAlive','isPoweredUp'],
    booleanBytes: 1, // How many bytes are used to store the boolean properties
    numerical:{
        playerScore: 2, // Use 2 bytes to represent playerScore
        coins: 2,
        x: 1, // Use 1 byte to represent x coordinate
        y: 1
    }
};

The schema above lists all the different properties that could be encoded in the update, as well as their types and for some the number of required bytes. The rest of this section details the purpose of each line.

We don’t want our messages to be too rigid and always encode the same number of properties. If a property hasn’t changed, it should not be included in the update message. But if it is not included, the receiving end should not attempt to parse it! A solution for that is to add at the beginning of the ArrayBuffer a bitmask indicating which properties of the schema are actually present in the message. Basically, we will reserve as many bits as there are properties in the schema, and we will set them to 0 or 1 to indicate if each property is present in the message or not. If there are 4 properties in the schema, we will need a bitmask of 4 bits. For eight properties, 8 bits. And so on and so forth.

In a Javascript ArrayBuffer, we cannot allocate memory on a bit basis; we need to do so in terms of bytes (octets). So we need to round up the size of our bitmask to the next whole byte. If we need 4 bits, we will use a whole byte. If we need 15 bits, two bytes, etc. The field propertiesBytes in the schema indicates how many bytes are reserved for the bitmask. In our example, not counting the boolean properties (see next paragraph), we have 5 properties in the schema (playerName, playerScore, coins, x, y), so we need 5 bits, and therefore 1 byte. If we had 9 properties, we would need 2 bytes, and the value of propertiesBytes would then be 2.
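The rounding rule can be sketched in a couple of lines. The function names bitmaskBytes and setPropertyBit are hypothetical and not taken from the source code, and the bit order (first property on the lowest bit) is an arbitrary choice made for this sketch.

```javascript
// Number of whole bytes needed to hold one bit per property.
function bitmaskBytes(propertyCount) {
    return Math.ceil(propertyCount / 8);
}

// Mark the property at position `index` as present in the mask.
// Arbitrary convention for this sketch: index 0 is the lowest bit.
function setPropertyBit(mask, index) {
    return mask | (1 << index);
}

var properties = ['playerName', 'playerScore', 'coins', 'x', 'y'];
console.log(bitmaskBytes(properties.length)); // 1
console.log(bitmaskBytes(9));                 // 2

// A message carrying only playerName (index 0) and x (index 3):
var mask = setPropertyBit(setPropertyBit(0, 0), 3);
console.log(mask.toString(2)); // "1001"
```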

The same logic applies for booleanBytes, which denotes how many bytes are used to store the boolean properties contained in the message. In this case, the individual bits do not indicate if a property is present, but they directly indicate its value, true (1) or false (0). In our example, we have only two boolean properties (isAlive and isPoweredUp), which can fit in a single byte.
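Here is a minimal sketch of packing boolean values into one byte and reading them back. The helper names packBooleans and unpackBooleans are made up for this example, and the bit order (first name in the list on the lowest bit) is an assumption; the actual source code may lay the bits out differently.

```javascript
// Pack the listed boolean properties of `message` into a single number,
// one bit per property (first name = lowest bit, by assumption).
function packBooleans(message, names) {
    var byte = 0;
    names.forEach(function (name, i) {
        if (message[name]) byte |= (1 << i);
    });
    return byte;
}

// Reverse operation: rebuild the boolean properties from the packed byte.
function unpackBooleans(byte, names) {
    var result = {};
    names.forEach(function (name, i) {
        result[name] = (byte & (1 << i)) !== 0;
    });
    return result;
}

var names = ['isAlive', 'isPoweredUp'];
var packed = packBooleans({ isAlive: true, isPoweredUp: false }, names);
var unpacked = unpackBooleans(packed, names);

console.log(packed);               // 1
console.log(unpacked.isAlive);     // true
console.log(unpacked.isPoweredUp); // false
```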

Next, strings lists all the string properties of the schema, while booleans lists the boolean ones. That way, the receiving end knows it should expect to decode one string and two booleans, and it even knows how it should name them once decoded.

For the numerical properties, we don’t simply want to list them, we also want to be able to specify on an individual basis how many bytes should be allocated for each. In our example, assume that the x and y coordinates are bounded between 0 and 255. For such small values, a single byte is sufficient. “playerScore” and “coins”, however, exhibit higher values, beyond the range of numbers that a single byte can represent. Therefore, we allow them to use two bytes each (we could use 3 bytes if these numbers were expected to go beyond 65535, and so on).
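The largest unsigned value that fits in a given number of bytes follows directly from the number of bits available. The helper maxUnsigned below is a hypothetical name used only for this illustration.

```javascript
// Largest unsigned integer representable on `bytes` bytes: 2^(8 * bytes) - 1.
function maxUnsigned(bytes) {
    return Math.pow(2, 8 * bytes) - 1;
}

console.log(maxUnsigned(1)); // 255
console.log(maxUnsigned(2)); // 65535
console.log(maxUnsigned(3)); // 16777215
```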

With that information in hand, it becomes possible to encode and decode our messages consistently.

Encoding

In the source code, Codec.encode() is responsible for the encoding:

Codec.encode = function(message,schema){
    var size = Codec.computeSize(message,schema); // Count how many bytes should be allocated in the buffer
    var buffer = new ArrayBuffer(size);
    Codec.encodeBuffer(message,buffer,schema);
    return buffer;
};

The first thing to do is to compute how many bytes will be needed to fit the message, because this is a parameter that has to be passed when creating a new ArrayBuffer. The size is computed in Codec.computeSize(). It follows a few simple rules:
– For the numerical properties, count as many bytes as indicated in the schema;
– For the string properties, count one byte per character (may be different with a different encoding), plus 1 byte storing the size of each string;
– Reserve ‘booleanBytes’ bytes to store the boolean properties;
– Reserve ‘propertiesBytes’ bytes for the bitmask.

I refer you to the source code to see how it is accomplished in practice. In our example, we have a total of 6 bytes for the numerical properties, 6 bytes for the playerName (5 characters + the length of the string), one byte for the two boolean properties and one byte for the bitmask, for a grand total of 14 bytes.
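The rules above can be sketched as follows. This is my own minimal version, not the Codec.computeSize() from the source code: it assumes one byte per character, always reserves the bitmask and boolean bytes, and counts only the numerical and string properties actually present in the message.

```javascript
var schema = {
    propertiesBytes: 1,
    strings: ['playerName'],
    booleans: ['isAlive', 'isPoweredUp'],
    booleanBytes: 1,
    numerical: { playerScore: 2, coins: 2, x: 1, y: 1 }
};

// Sketch of a size computation (assumes one byte per character).
function computeSize(message, schema) {
    var size = schema.propertiesBytes + schema.booleanBytes;
    Object.keys(schema.numerical).forEach(function (key) {
        if (message[key] !== undefined) size += schema.numerical[key];
    });
    schema.strings.forEach(function (key) {
        // 1 byte for the length, then 1 byte per character
        if (message[key] !== undefined) size += 1 + message[key].length;
    });
    return size;
}

var update = {
    playerName: 'Mario', playerScore: 1000, coins: 700,
    x: 200, y: 100, isAlive: true, isPoweredUp: false
};
console.log(computeSize(update, schema));     // 14
console.log(computeSize({ x: 200 }, schema)); // 3
```

The second call shows the benefit of leaving out unchanged properties: a message that only updates x needs 3 bytes instead of 14.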

Once the ArrayBuffer is created, the encoding can begin (and takes place in Codec.encodeBuffer). The ArrayBuffer is not accessed directly, but can be written to or read by using an interface called a DataView, initialized as follows:

var dv = new DataView(buffer); // A DataView is needed to interact with the buffer

A DataView exposes several methods that let you write various numbers of bytes at specific locations of the buffer. All these methods require an offset as an argument, which corresponds to the number of bytes from the beginning of the buffer at which the read or write operation should take place. For example,

dv.setUint8(2,250)

will write the 8-bit unsigned integer “250” two bytes after the beginning of the buffer, that is, into its third byte. In the source code, all writing operations (and the choice between setUint8, setUint16, etc. depending on the property to encode) are wrapped up inside Codec.encodeBytes(), or Codec.encodeString() for strings.
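Here is a short, self-contained illustration of these offsets, using only the standard DataView methods. The buffer size and the values written are arbitrary.

```javascript
var buffer = new ArrayBuffer(4); // 4 bytes, all initialised to 0
var dv = new DataView(buffer);

dv.setUint8(2, 250);   // write 250 into the byte at offset 2
dv.setUint16(0, 1000); // write 1000 on 2 bytes at offset 0 (big-endian by default)

console.log(dv.getUint8(2));  // 250
console.log(dv.getUint16(0)); // 1000

// Raw content of the buffer, byte by byte: 1000 is stored as 3 * 256 + 232.
var bytes = Array.from(new Uint8Array(buffer));
console.log(bytes.join(',')); // "3,232,250,0"
```

Note that multi-byte setters and getters take an optional last argument to switch to little-endian; whichever byte order you pick, the encoder and the decoder have to use the same one.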

Armed with these methods, the encoding process becomes straightforward. The encoder method iterates over all the properties listed in the schema. For each, it checks if it is actually present in the message to encode or not. If it is, it writes the value of that property (using encodeBytes() for numerical values, encodeString() for strings, or by performing bitwise operations on the reserved byte(s) to encode the booleans). It also updates the bitmask to reflect the presence of the property in the message. If the property is not present, it does nothing and moves on to the next one.
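The process described above can be condensed into a single simplified function. This is a sketch under several assumptions of mine, not the code from the source: the byte layout is [bitmask][length-prefixed strings][numerical values][boolean byte], the bitmask and the booleans each fit in one byte, numerical values use 1 or 2 bytes, and strings use one byte per character.

```javascript
var schema = {
    propertiesBytes: 1,
    strings: ['playerName'],
    booleans: ['isAlive', 'isPoweredUp'],
    booleanBytes: 1,
    numerical: { playerScore: 2, coins: 2, x: 1, y: 1 }
};

// Simplified encoder (hypothetical, see the assumptions listed above).
function encode(message, schema) {
    var numKeys = Object.keys(schema.numerical);

    // First pass: compute how many bytes the buffer needs.
    var size = schema.propertiesBytes + schema.booleanBytes;
    schema.strings.forEach(function (k) {
        if (message[k] !== undefined) size += 1 + message[k].length;
    });
    numKeys.forEach(function (k) {
        if (message[k] !== undefined) size += schema.numerical[k];
    });

    var dv = new DataView(new ArrayBuffer(size));
    var offset = schema.propertiesBytes; // leave room for the bitmask
    var mask = 0;
    var bit = 0; // position of the current property in the bitmask

    schema.strings.forEach(function (k) {
        if (message[k] !== undefined) {
            dv.setUint8(offset++, message[k].length); // length prefix
            for (var i = 0; i < message[k].length; i++) {
                dv.setUint8(offset++, message[k].charCodeAt(i));
            }
            mask |= (1 << bit); // flag the property as present
        }
        bit++;
    });
    numKeys.forEach(function (k) {
        if (message[k] !== undefined) {
            if (schema.numerical[k] === 1) dv.setUint8(offset, message[k]);
            else dv.setUint16(offset, message[k]); // this sketch handles 1 or 2 bytes only
            offset += schema.numerical[k];
            mask |= (1 << bit);
        }
        bit++;
    });

    // Booleans: each bit stores the value itself (an absent boolean is written as false).
    var bools = 0;
    schema.booleans.forEach(function (k, i) {
        if (message[k]) bools |= (1 << i);
    });
    dv.setUint8(offset, bools);

    dv.setUint8(0, mask); // finally, write the bitmask at the very beginning
    return dv.buffer;
}

var buffer = encode({
    playerName: 'Mario', playerScore: 1000, coins: 700,
    x: 200, y: 100, isAlive: true, isPoweredUp: false
}, schema);
console.log(buffer.byteLength); // 14
```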

Decoding

If you have understood the encoding process, there is not much to say about the decoding. Every step mirrors a step of the encoding, using the matching read variants of the methods used previously (getUint8 instead of setUint8, and so on). The source code should be explicit enough (if not, don’t hesitate to ask in the comments).

One important aspect worth mentioning when implementing a custom binary protocol is that the order in which you encode and decode the properties has to be exactly the same! If the ‘x’ coordinate is the first property in the buffer, the decoder should not attempt to read the player’s name instead. If the order is wrong, this won’t result in an error, but you will get very strange values (abnormal numbers or unreadable strings for example).
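To illustrate how the reading order has to mirror the writing order, here is a simplified decoder together with a hand-built buffer. It is a sketch, not the decoder from the source code, and it assumes the following layout: [bitmask][length-prefixed strings][numerical values][boolean byte], with a one-byte bitmask whose lowest bit corresponds to the first string property.

```javascript
var schema = {
    propertiesBytes: 1,
    strings: ['playerName'],
    booleans: ['isAlive', 'isPoweredUp'],
    booleanBytes: 1,
    numerical: { playerScore: 2, coins: 2, x: 1, y: 1 }
};

// Simplified decoder (hypothetical): reads the properties in the assumed layout order.
function decode(buffer, schema) {
    var dv = new DataView(buffer);
    var mask = dv.getUint8(0);            // which properties are present
    var offset = schema.propertiesBytes;  // skip the bitmask
    var bit = 0;
    var message = {};

    schema.strings.forEach(function (k) {
        if (mask & (1 << bit)) {
            var length = dv.getUint8(offset++); // length prefix
            var chars = [];
            for (var i = 0; i < length; i++) chars.push(dv.getUint8(offset++));
            message[k] = String.fromCharCode.apply(null, chars);
        }
        bit++;
    });
    Object.keys(schema.numerical).forEach(function (k) {
        if (mask & (1 << bit)) {
            message[k] = schema.numerical[k] === 1 ? dv.getUint8(offset) : dv.getUint16(offset);
            offset += schema.numerical[k];
        }
        bit++;
    });
    // Booleans: each bit is the value itself.
    var bools = dv.getUint8(offset);
    schema.booleans.forEach(function (k, i) {
        message[k] = (bools & (1 << i)) !== 0;
    });
    return message;
}

// Hand-built message: bitmask 9 (playerName and x present), "Mario", x = 200, isAlive = true.
var bytes = new Uint8Array([9, 5, 77, 97, 114, 105, 111, 200, 1]);
var decoded = decode(bytes.buffer, schema);
console.log(JSON.stringify(decoded));
// {"playerName":"Mario","x":200,"isAlive":true,"isPoweredUp":false}
```

If this decoder read the numerical properties before the strings, it would take the length byte (5) for the value of x and then fail to find the name where it expects it, without ever throwing an error.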

Closing remarks

Using a custom binary protocol allows for a more compact representation of structured data. As the result is less readable (not at all readable, actually), make sure in your implementation to be able to toggle the binary encoding on or off easily, for testing and debugging purposes. Also keep in mind that if you change the structure of the data you send, you need to update the corresponding schema accordingly!

The savings achieved with that technique are not tremendous, as all the packets you send, regardless of how you represent your data, will also carry a lot of additional bytes needed for the routing and handling of the packet. An empty TCP packet weighs 64 bytes (thanks to Aaron for the link). From your point of view, this is dead weight that you cannot really act upon. Nevertheless, for games which require a lot of interactions per second between the clients and the server, saving a few dozen bytes each time can quickly add up. Combined with other ways of optimizing your data exchanges (such as using an AOI system for example), you can achieve substantial improvements in bandwidth consumption.

Implementation in Phaser Quest

Phaser Quest uses a custom binary protocol similar to what was presented here. The main difference is that it can also encode arrays, maps and custom objects. The encoding of sub-objects is achieved by making recursive calls to the encoding methods (same for the decoding). The code is a bit more complex, but the intuition remains the same. If it is unclear, don’t hesitate to ask about it in the comments!

Jerome Renaux

I'm an independent game developer, working mainly on browser-based multiplayer online games (HTML5 & Javascript). I love to discuss all aspects of game development, so don't hesitate to get in touch!
