Introduction

In the last chapter, we saw how an HTTP connection is upgraded to a WebSocket connection using the WebSocket handshake.

In this chapter, we will understand the WebSocket frame format and how to parse it.

What we will cover here

  • WebSocket frame format
  • How to parse a WebSocket frame
  • How to send and receive messages in real-time

Note: I have only covered how to parse text data in this chapter.

Visualizing WebSocket Frames

I built a WebSocket Frame Visualiser that shows how frames are constructed at the byte level.

Using this tool, you can:

  • Toggle FIN, opcode, and mask bits
  • Experiment with payload lengths
  • See how extended payload lengths work
  • Build intuition for how frames look on the wire

Understanding the WebSocket Frame Structure

  • Base header: every frame starts with a 2-byte header
  • Payload length (lower 7 bits of the second byte)
    • If the 7-bit value is <= 125, it is the payload length itself
    • If the 7-bit value is 126, read the next 2 bytes (16-bit unsigned, big-endian) to get the payload length
    • If the 7-bit value is 127, read the next 8 bytes (64-bit unsigned, big-endian) to get the payload length. In this example we are not covering this case.
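  • FIN bit (bit 0, the most significant bit of the first byte)
    • If 1, this is the last frame of the message
    • If 0, more frames of the same message follow
  • Reserved bits (1st, 2nd and 3rd bit)
    • Must be 0 unless an extension that defines their meaning has been negotiated
    • If a non-zero value arrives without such an extension, the receiver must fail the connection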
  • Opcode (lower 4 bits of the first byte)
    • 0x0: continuation frame
    • 0x1: text frame
    • 0x2: binary frame
    • 0x8: close frame
    • 0x9: ping frame
    • 0xA: pong frame
  • Mask bit (most significant bit of the second byte): indicates whether the payload is masked.
    • 1 = masked (required for client-to-server frames)
    • 0 = not masked (server-to-client frames must not be masked)
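
To make the layout concrete, here is a minimal sketch that assembles a small unmasked text frame by hand. `buildSmallTextFrame` is a hypothetical helper for illustration, not part of the project's code, and it assumes an ASCII payload of at most 125 bytes so that the 7-bit length field is enough.

```typescript
// Hypothetical helper: builds a single, unmasked text frame.
// Assumes an ASCII payload of at most 125 bytes.
function buildSmallTextFrame(text: string): Uint8Array {
  if (text.length > 125) {
    throw new Error("payload does not fit in the 7-bit length field");
  }
  const frame = new Uint8Array(2 + text.length);
  frame[0] = 0x80 | 0x1;   // FIN = 1, reserved bits = 0, opcode = 0x1 (text)
  frame[1] = text.length;  // mask bit = 0, 7-bit payload length
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code > 0x7f) throw new Error("this sketch only handles ASCII");
    frame[2 + i] = code;
  }
  return frame;
}

const helloFrame = buildSmallTextFrame("Hello");
// helloFrame bytes: 81 05 48 65 6c 6c 6f
```

The first byte, 0x81, carries the FIN bit and the text opcode. The second byte, 0x05, has the mask bit cleared and a payload length of 5.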

How to parse a WebSocket frame

WebSocket frames travel over TCP, which is a byte stream, not a message stream. A single read can deliver part of a frame, exactly one frame, or several frames together.

Because of this, frame parsing always happens in stages.
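
One way to support staged parsing is to keep incoming chunks in a buffer and only move to the next stage once enough bytes have arrived. The sketch below assumes a hypothetical `ByteQueue` class. It is not taken from the linked repository, and it works on `Uint8Array` (Node's `Buffer` is a subclass of it).

```typescript
// Hypothetical helper: accumulates incoming chunks until the parser
// has enough bytes for its next stage.
class ByteQueue {
  private data: Uint8Array = new Uint8Array(0);

  // Append a newly received chunk to the bytes already buffered.
  push(chunk: Uint8Array): void {
    const merged = new Uint8Array(this.data.length + chunk.length);
    merged.set(this.data, 0);
    merged.set(chunk, this.data.length);
    this.data = merged;
  }

  get length(): number {
    return this.data.length;
  }

  // Return the first n bytes without consuming them,
  // or null if fewer than n bytes are buffered.
  peek(n: number): Uint8Array | null {
    return this.data.length >= n ? this.data.subarray(0, n) : null;
  }

  // Remove and return the first n bytes.
  consume(n: number): Uint8Array {
    if (n > this.data.length) throw new Error("not enough bytes buffered");
    const head = this.data.slice(0, n);
    this.data = this.data.slice(n);
    return head;
  }
}

const queue = new ByteQueue();
queue.push(new Uint8Array([0x81]));  // only 1 byte so far
const early = queue.peek(2);         // null: the base header is incomplete
queue.push(new Uint8Array([0x05]));  // the second header byte arrives
const ready = queue.peek(2);         // now holds 0x81 0x05
```

Copying on every `push` keeps the sketch short. A real parser would likely avoid re-allocating for each chunk.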

Full code: https://github.com/Saurabh-kayasth/websocket-from-scratch/blob/master/src/WebSocketFrameParser.ts

1. Reading the Base Header

The parser first waits until at least 2 bytes are available. These two bytes decide how the rest of the frame should be interpreted.

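const fin = (byte1 & 0x80) !== 0;    // top bit of byte 1: final frame?
const opcode = byte1 & 0x0f;         // low 4 bits of byte 1: frame type
const masked = (byte2 & 0x80) !== 0; // top bit of byte 2: mask flag
const payloadLength = byte2 & 0x7f;  // low 7 bits of byte 2: length or 126/127 marker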

At this point, we know:

  • Whether the frame is final
  • What type of frame it is
  • Whether masking is applied
  • How payload length should be resolved
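
As a worked example, the sketch below wraps the same bit operations in a function and runs them on the header bytes 0x81 0x85. The `BaseHeader` interface and `parseBaseHeader` name are assumptions made for illustration; the linked repository may structure this differently.

```typescript
// Hypothetical shape for the decoded 2-byte base header.
interface BaseHeader {
  fin: boolean;
  opcode: number;
  masked: boolean;
  payloadLength: number;
}

// Returns null when fewer than 2 bytes are available, so the caller
// can wait for more data before trying again.
function parseBaseHeader(buf: Uint8Array): BaseHeader | null {
  if (buf.length < 2) return null;
  const byte1 = buf[0];
  const byte2 = buf[1];
  return {
    fin: (byte1 & 0x80) !== 0,
    opcode: byte1 & 0x0f,
    masked: (byte2 & 0x80) !== 0,
    payloadLength: byte2 & 0x7f,
  };
}

// 0x81 = 1000 0001 -> FIN set, opcode 0x1 (text)
// 0x85 = 1000 0101 -> mask bit set, payload length 5
const exampleHeader = parseBaseHeader(new Uint8Array([0x81, 0x85]));
// exampleHeader: { fin: true, opcode: 1, masked: true, payloadLength: 5 }
```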

2. Resolving the Actual Payload Length

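The 7-bit payload length field is either:

  • The actual payload size (0 to 125)
  • Or a marker (126 or 127) signalling that the real size is stored in the bytes that follow

let actualLength = payloadLength;
if (payloadLength === 126) {
  actualLength = buffer.readUInt16BE(offset);
  offset += 2; // skip the 2-byte extended length
}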

This design keeps small messages efficient while still supporting large payloads.
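
For completeness, here is a sketch of how all three cases could be resolved, including the 127 case that this chapter's parser does not handle. `resolvePayloadLength` is a hypothetical function. It uses `DataView` instead of Node's `Buffer` methods so that it works on any `Uint8Array`, and it assumes the buffer starts at the first byte of the frame.

```typescript
// Hypothetical helper: resolves the real payload length.
// Returns the length plus the offset of the first byte after the
// length fields, or null if more bytes are needed.
function resolvePayloadLength(
  buf: Uint8Array,
  lengthField: number
): { length: number; offset: number } | null {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const offset = 2; // the extended length starts right after the base header

  if (lengthField <= 125) {
    return { length: lengthField, offset };
  }
  if (lengthField === 126) {
    if (buf.length < offset + 2) return null;
    return { length: view.getUint16(offset), offset: offset + 2 }; // big-endian
  }
  // lengthField === 127: 64-bit big-endian length, read as two 32-bit halves
  if (buf.length < offset + 8) return null;
  const high = view.getUint32(offset);
  const low = view.getUint32(offset + 4);
  if (high > 0x1fffff) {
    throw new Error("length exceeds what a JavaScript number can represent safely");
  }
  return { length: high * 0x100000000 + low, offset: offset + 8 };
}

// 0x7e = 126, followed by 0x01 0x00 = 256
const resolved = resolvePayloadLength(new Uint8Array([0x81, 0x7e, 0x01, 0x00]), 126);
// resolved: { length: 256, offset: 4 }
```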

3. Reading the Masking Key

For client-to-server frames, masking is mandatory.

If the mask bit is set, the next 4 bytes are read as the masking key.

const maskKey = buffer.subarray(offset, offset + 4);

4. Extracting and Unmasking the Payload

Once the payload length is known, the payload bytes are read.

If the frame is masked, each byte is unmasked using XOR.

payload[i] = payload[i] ^ maskKey[i % 4];

After this step, the payload represents the original application data.
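
The unmasking step can be checked against the masked "Hello" example from RFC 6455, which uses the masking key 37 fa 21 3d. The `unmask` function below is a small illustrative sketch rather than the repository's implementation.

```typescript
// XOR each payload byte with the masking key byte at position i % 4.
// Returns a new array and leaves the input untouched.
function unmask(payload: Uint8Array, maskKey: Uint8Array): Uint8Array {
  const out = new Uint8Array(payload.length);
  for (let i = 0; i < payload.length; i++) {
    out[i] = payload[i] ^ maskKey[i % 4];
  }
  return out;
}

// Masking key and masked payload from the RFC 6455 "Hello" example.
const exampleKey = new Uint8Array([0x37, 0xfa, 0x21, 0x3d]);
const maskedPayload = new Uint8Array([0x7f, 0x9f, 0x4d, 0x51, 0x58]);

const plain = unmask(maskedPayload, exampleKey);
let decoded = "";
for (let i = 0; i < plain.length; i++) {
  decoded += String.fromCharCode(plain[i]); // ASCII only in this example
}
// decoded === "Hello"
```

Because XOR is its own inverse, applying the same function with the same key to the unmasked bytes produces the masked bytes again.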

5. Handling the Parsed Frame

Once decoded, frames are handled based on their opcode:

  • Text and binary frames → application data
  • Ping frames → must be answered with pong
  • Close frames → connection shutdown

Control frames (close, ping, and pong) are handled by the WebSocket layer as soon as they arrive; they are never delivered to the application as messages.
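
The dispatch described above could look roughly like the sketch below. The `FrameAction` type, `handleFrame`, and `buildPong` are hypothetical names chosen for illustration. The opcode values are the ones defined in RFC 6455, and the pong echoes the ping's payload as the RFC requires.

```typescript
const Opcode = {
  Continuation: 0x0,
  Text: 0x1,
  Binary: 0x2,
  Close: 0x8,
  Ping: 0x9,
  Pong: 0xa,
} as const;

// Hypothetical result type: what the caller should do with a decoded frame.
type FrameAction =
  | { kind: "deliver"; payload: Uint8Array } // hand over to the application
  | { kind: "reply"; frame: Uint8Array }     // write this frame back to the socket
  | { kind: "close" }                        // start connection shutdown
  | { kind: "ignore" };

// Builds an unmasked pong frame that echoes the ping payload.
// Control frame payloads are limited to 125 bytes.
function buildPong(payload: Uint8Array): Uint8Array {
  if (payload.length > 125) throw new Error("control frame payload too long");
  const frame = new Uint8Array(2 + payload.length);
  frame[0] = 0x80 | Opcode.Pong; // FIN = 1, opcode = 0xA
  frame[1] = payload.length;     // mask bit = 0 (server-to-client)
  frame.set(payload, 2);
  return frame;
}

function handleFrame(opcode: number, payload: Uint8Array): FrameAction {
  switch (opcode) {
    case Opcode.Text:
    case Opcode.Binary:
      return { kind: "deliver", payload };
    case Opcode.Ping:
      return { kind: "reply", frame: buildPong(payload) };
    case Opcode.Close:
      return { kind: "close" };
    default:
      // Pong and continuation frames are not handled in this sketch.
      return { kind: "ignore" };
  }
}
```

Returning an action instead of writing to the socket directly keeps the sketch independent of any particular socket API.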

6. See It in Action

  1. Open the browser console and run the following code:

const ws = new WebSocket('ws://localhost:5000');

  2. Observe the console output to see the parsed frames.

Summary

In this chapter, we have seen how to parse a WebSocket frame and understand the frame format.

We have also seen how to send and receive messages in real-time.