Chapter 2: WebSocket Frame Parser

Introduction

In the last chapter, we saw how an HTTP connection is upgraded to a WebSocket connection using the WebSocket handshake.

In this chapter, we will understand the WebSocket frame format and how to parse it.

What we will cover here

WebSocket frame format
How to parse a WebSocket frame
How to send and receive messages in real-time
Note: this chapter only covers parsing text data.

Visualizing WebSocket Frames

I built a WebSocket Frame Visualiser that shows how frames are constructed at the byte level.

Using this tool, you can:

Toggle FIN, opcode, and mask bits
Experiment with payload lengths
See how extended payload lengths work
Build intuition for how frames look on the wire

Understanding the WebSocket Frame Structure

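Base header: the first 2 bytes of every frame. They carry the FIN bit, the reserved bits, the opcode, the mask bit and the 7-bit payload length field.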
Payload length (low 7 bits of the second byte)
- If the 7-bit value is <= 125, it is the payload length
- If the 7-bit value is 126, the next 2 bytes hold the payload length (16-bit unsigned, big-endian)
- If the 7-bit value is 127, the next 8 bytes hold the payload length (64-bit unsigned, big-endian). In this example we are not covering this case.
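FIN bit (bit 0, the most significant bit of the first byte)
- If 1, this is the last frame of the message
- If 0, more frames of the same message follow
Reserved bits RSV1, RSV2 and RSV3 (bits 1, 2 and 3 of the first byte)
- 0: the normal value; required unless an extension has been negotiated
- 1: only allowed when a negotiated extension defines a meaning for that bit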
Opcode (low 4 bits of the first byte)
- 0x0: continuation frame
- 0x1: text frame
- 0x2: binary frame
- 0x8: close frame
- 0x9: ping frame
- 0xA: pong frame
- 0x3-0x7 and 0xB-0xF: reserved for future use
Mask bit (most significant bit of the second byte): Indicates if payload is masked.
- 1 = masked (required for client-to-server)
- 0 = not masked (server-to-client)
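
To make the bit layout concrete, here is a minimal sketch that packs these fields into the two base header bytes. The `HeaderFields` interface and the `buildBaseHeader` function are names invented for this illustration and are not taken from the repository. The sketch assumes no extension is in use (reserved bits stay 0) and only handles payload lengths up to 125.

```typescript
// Hypothetical shape for the fields carried by the 2-byte base header.
interface HeaderFields {
  fin: boolean;
  opcode: number;        // 4-bit value, 0x0 to 0xF
  masked: boolean;
  payloadLength: number; // this sketch only supports 0 to 125
}

// Pack the fields into [byte1, byte2]. RSV1-RSV3 are left at 0.
function buildBaseHeader(fields: HeaderFields): [number, number] {
  if (!Number.isInteger(fields.opcode) || fields.opcode < 0 || fields.opcode > 0x0f) {
    throw new RangeError("opcode must fit in 4 bits");
  }
  if (
    !Number.isInteger(fields.payloadLength) ||
    fields.payloadLength < 0 ||
    fields.payloadLength > 125
  ) {
    throw new RangeError("this sketch only supports payload lengths 0 to 125");
  }

  // byte 1: FIN in the top bit, opcode in the low 4 bits.
  const byte1 = (fields.fin ? 0x80 : 0x00) | fields.opcode;
  // byte 2: MASK in the top bit, 7-bit length in the rest.
  const byte2 = (fields.masked ? 0x80 : 0x00) | fields.payloadLength;
  return [byte1, byte2];
}

// A final, masked text frame with a 5-byte payload starts with 0x81 0x85.
const textHeader = buildBaseHeader({ fin: true, opcode: 0x1, masked: true, payloadLength: 5 });
```

Toggling `fin`, `opcode` or `masked` here changes the same bits that the visualiser lets you flip by hand.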

How to parse a WebSocket frame

A WebSocket connection runs over TCP, which delivers a byte stream, not a message stream. A single read can contain part of a frame, exactly one frame, or several frames together.

Because of this, frame parsing always happens in stages.
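
As a rough illustration of why buffering is needed, the sketch below scans a receive buffer, cuts out every complete frame, and keeps the leftover bytes for the next read. `splitFrames`, `appendChunk` and `SplitResult` are hypothetical names used only here. It uses a plain `Uint8Array` instead of Node's `Buffer`, and to stay short it assumes every frame uses the 7-bit length form (payloads of at most 125 bytes).

```typescript
// Result of scanning a receive buffer: complete frames plus leftover bytes.
interface SplitResult {
  frames: Uint8Array[];
  rest: Uint8Array;
}

// Sketch only: assumes every frame uses the 7-bit length form (<= 125 bytes).
function splitFrames(buffer: Uint8Array): SplitResult {
  const frames: Uint8Array[] = [];
  let offset = 0;

  // Stage 1: we need at least the 2-byte base header to know the frame size.
  while (buffer.length - offset >= 2) {
    const byte2 = buffer[offset + 1];
    const masked = (byte2 & 0x80) !== 0;
    const payloadLength = byte2 & 0x7f;
    if (payloadLength > 125) {
      throw new Error("extended payload lengths are not handled in this sketch");
    }

    // Stage 2: header + optional 4-byte mask key + payload must all be present.
    const frameLength = 2 + (masked ? 4 : 0) + payloadLength;
    if (buffer.length - offset < frameLength) break; // partial frame, wait for more data

    frames.push(buffer.subarray(offset, offset + frameLength));
    offset += frameLength;
  }

  return { frames, rest: buffer.subarray(offset) };
}

// Append a newly received chunk to whatever was left over from the last read.
function appendChunk(pending: Uint8Array, chunk: Uint8Array): Uint8Array {
  const merged = new Uint8Array(pending.length + chunk.length);
  merged.set(pending, 0);
  merged.set(chunk, pending.length);
  return merged;
}
```

On each incoming chunk, a caller would run `splitFrames(appendChunk(rest, chunk))` and carry the returned `rest` forward to the next read.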

Full code: https://github.com/Saurabh-kayasth/websocket-from-scratch/blob/master/src/WebSocketFrameParser.ts

1. Reading the Base Header

The parser first waits until at least 2 bytes are available. These two bytes decide how the rest of the frame should be interpreted.

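const fin = (byte1 & 0x80) !== 0;     // top bit of the first byte
const opcode = byte1 & 0x0f;          // low 4 bits of the first byte
const masked = (byte2 & 0x80) !== 0;  // top bit of the second byte
const payloadLength = byte2 & 0x7f;   // low 7 bits of the second byte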

At this point, we know:

Whether the frame is final
What type of frame it is
Whether masking is applied
How payload length should be resolved
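
A worked example may help here. The sketch below wraps the same bit masks in a small function and decodes the bytes `0x81 0x85`, which are the first two bytes of the masked "Hello" sample frame in RFC 6455. `BaseHeader` and `parseBaseHeader` are illustrative names, and the extra `rsv` field is an addition that the chapter's code does not read.

```typescript
// Hypothetical container for the values decoded from the 2-byte base header.
interface BaseHeader {
  fin: boolean;
  rsv: number;           // RSV1-RSV3 as a 3-bit value; 0 when no extension is negotiated
  opcode: number;
  masked: boolean;
  payloadLength: number; // raw 7-bit field: 0 to 125, or the markers 126 / 127
}

function parseBaseHeader(byte1: number, byte2: number): BaseHeader {
  return {
    fin: (byte1 & 0x80) !== 0,
    rsv: (byte1 & 0x70) >> 4,
    opcode: byte1 & 0x0f,
    masked: (byte2 & 0x80) !== 0,
    payloadLength: byte2 & 0x7f,
  };
}

// 0x81 = 1000 0001 -> FIN set, RSV 000, opcode 0x1 (text)
// 0x85 = 1000 0101 -> MASK set, 7-bit length 5
const header = parseBaseHeader(0x81, 0x85);
```

Here `header` is a final, masked text frame whose payload is 5 bytes long, so no extended length bytes follow.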

2. Resolving the Actual Payload Length

The 7-bit payload length field can either be:

The actual payload size
Or a signal that the real size is stored next

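let actualLength = payloadLength;
if (payloadLength === 126) {
  // The next 2 bytes hold the real length as a big-endian unsigned 16-bit integer.
  actualLength = buffer.readUInt16BE(offset);
  offset += 2;
}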

This design keeps small messages efficient while still supporting large payloads.
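
For completeness, here is a sketch of how all three length forms could be resolved, including the 127 case that the chapter's code skips. `resolvePayloadLength` and `ResolvedLength` are hypothetical names. The sketch reads from a `Uint8Array` through a `DataView` rather than Node's `Buffer`, assumes the frame starts at index 0 of the buffer, and returns `null` when the extended length bytes have not arrived yet.

```typescript
interface ResolvedLength {
  length: number;     // actual payload size in bytes
  nextOffset: number; // index of the first byte after the length fields
}

// Returns null when the buffer does not yet hold the extended length bytes.
function resolvePayloadLength(buffer: Uint8Array, lengthField: number): ResolvedLength | null {
  const view = new DataView(buffer.buffer, buffer.byteOffset, buffer.byteLength);
  const offset = 2; // extended length starts right after the 2-byte base header

  if (lengthField <= 125) {
    return { length: lengthField, nextOffset: offset };
  }

  if (lengthField === 126) {
    if (buffer.length < offset + 2) return null;
    // DataView reads big-endian by default, which is what the wire format uses.
    return { length: view.getUint16(offset), nextOffset: offset + 2 };
  }

  // lengthField === 127: 64-bit big-endian length, read as two 32-bit halves.
  if (buffer.length < offset + 8) return null;
  const high = view.getUint32(offset);
  const low = view.getUint32(offset + 4);
  if (high > 0x1fffff) {
    // Anything larger would exceed Number.MAX_SAFE_INTEGER.
    throw new RangeError("payload length too large for this sketch");
  }
  return { length: high * 0x100000000 + low, nextOffset: offset + 8 };
}
```

Splitting the 64-bit value into two 32-bit reads is a choice made here to avoid `BigInt`. A real server would probably also enforce a much lower maximum frame size.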

3. Reading the Masking Key

For client-to-server frames, masking is mandatory.

If the mask bit is set, the next 4 bytes are read as the masking key.

const maskKey = buffer.subarray(offset, offset + 4);

4. Extracting and Unmasking the Payload

Once the payload length is known, the payload bytes are read.

If the frame is masked, each byte is unmasked using XOR.

payload[i] = payload[i] ^ maskKey[i % 4];

After this step, the payload represents the original application data.
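
The XOR step can be checked against the masked "Hello" sample frame from RFC 6455, section 5.7. The sketch below is one way to write it; `unmask` is a hypothetical helper, and it returns a new array instead of modifying the payload in place as the one-line snippet does.

```typescript
// XOR each payload byte with the mask key byte at position i mod 4.
// Returns a new array so the receive buffer is left untouched.
function unmask(payload: Uint8Array, maskKey: Uint8Array): Uint8Array {
  if (maskKey.length !== 4) {
    throw new RangeError("mask key must be exactly 4 bytes");
  }
  const out = new Uint8Array(payload.length);
  for (let i = 0; i < payload.length; i++) {
    out[i] = payload[i] ^ maskKey[i % 4];
  }
  return out;
}

// Sample frame from RFC 6455: 2-byte header, 4-byte mask key, 5-byte masked payload.
const frame = new Uint8Array([
  0x81, 0x85,                   // FIN + text opcode, MASK + length 5
  0x37, 0xfa, 0x21, 0x3d,       // masking key
  0x7f, 0x9f, 0x4d, 0x51, 0x58, // masked payload
]);
const maskKey = frame.subarray(2, 6);
const text = new TextDecoder().decode(unmask(frame.subarray(6), maskKey));
// text is "Hello"
```

Because XOR is its own inverse, running the same function over plain bytes with the same key produces the masked form again.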

5. Handling the Parsed Frame

Once decoded, frames are handled based on their opcode:

Text and binary frames → application data
Ping frames → must be answered with pong
Close frames → connection shutdown

Control frames (close, ping and pong) are handled by the protocol layer immediately and are not passed to the application.
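
A dispatch step along these lines could look like the sketch below. The `FrameAction` type, `handleFrame` and `buildPong` are hypothetical names, not the repository's API. The opcode values are the ones defined in RFC 6455 (close is 0x8, ping is 0x9, pong is 0xA), and the pong is built unmasked because it travels from server to client.

```typescript
const OPCODE_TEXT = 0x1;
const OPCODE_BINARY = 0x2;
const OPCODE_CLOSE = 0x8;
const OPCODE_PING = 0x9;
const OPCODE_PONG = 0xa;

// What the caller should do with a decoded frame.
type FrameAction =
  | { kind: "message"; data: Uint8Array } // hand the payload to the application
  | { kind: "reply"; frame: Uint8Array }  // write these bytes back to the socket
  | { kind: "close" }                     // start connection shutdown
  | { kind: "ignore" };

// Build an unmasked pong frame that echoes the ping payload.
function buildPong(pingPayload: Uint8Array): Uint8Array {
  if (pingPayload.length > 125) {
    throw new RangeError("control frame payloads are limited to 125 bytes");
  }
  const frame = new Uint8Array(2 + pingPayload.length);
  frame[0] = 0x80 | OPCODE_PONG;  // FIN set, opcode pong
  frame[1] = pingPayload.length;  // MASK bit clear, 7-bit length
  frame.set(pingPayload, 2);
  return frame;
}

function handleFrame(opcode: number, payload: Uint8Array): FrameAction {
  switch (opcode) {
    case OPCODE_TEXT:
    case OPCODE_BINARY:
      return { kind: "message", data: payload };
    case OPCODE_PING:
      return { kind: "reply", frame: buildPong(payload) };
    case OPCODE_CLOSE:
      return { kind: "close" };
    default:
      // Pong, continuation and reserved opcodes are out of scope for this sketch.
      return { kind: "ignore" };
  }
}
```

Returning an action instead of writing to the socket directly keeps the sketch free of I/O, so it can be exercised without a live connection.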

6. See It in Action

With the WebSocket server listening on port 5000, open the browser console and run the following code:

const ws = new WebSocket('ws://localhost:5000');

Observe the console output to see the parsed frames.

Summary

In this chapter, we have seen how to parse a WebSocket frame and understand the frame format.

We have also seen how to send and receive messages in real-time.
