TL;DR: Money & Time
Backend is all about cost & latency. Pick wrong, and you’re burning money or making users wait.
1. Serialization & Deserialization
- What? Dump structured data (objects, maps) into something that can be stored or sent (JSON, Protobuf, Avro, MessagePack, etc.).
- Latency? JSON is slow; Protobuf is fast but needs a schema.
- Cost? JSON bloats (human-readable = wasteful), Protobuf saves bytes & speeds up transmission.
- Used in? APIs, storage, message queues, logs.
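To make the size difference concrete, here's a minimal sketch using only the Go standard library, with encoding/gob standing in for a binary, schema-aware format (Protobuf would need generated code). The User type and values are made up for illustration.

```go
// Sketch: payload size of a text format vs. a binary format, stdlib only.
// encoding/gob plays the role of the binary, schema-aware format here.
package main

import (
	"bytes"
	"encoding/gob"
	"encoding/json"
	"fmt"
	"log"
)

// User is a hypothetical record type.
type User struct {
	ID    int64
	Name  string
	Email string
	Tags  []string
}

func main() {
	users := make([]User, 1000)
	for i := range users {
		users[i] = User{ID: int64(i), Name: "Ada", Email: "ada@example.com", Tags: []string{"admin", "beta"}}
	}

	// Text: field names are repeated in every element.
	jsonBytes, err := json.Marshal(users)
	if err != nil {
		log.Fatal(err)
	}

	// Binary: type information is written once, then compact per-element data.
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(users); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("json: %d bytes, gob: %d bytes\n", len(jsonBytes), buf.Len())
}
```

For a single tiny record the binary format's one-time type overhead can dominate, so measure on a batch or a stream; that's also where the bandwidth bill actually comes from.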
2. Encoding & Decoding
- What? Convert data for representation (text/binary conversion, Base64, UTF-8, gzip, etc.).
- Latency? Adds CPU overhead. Base64 inflates data size (~33% bloat).
- Cost? Compress where possible (gzip/zstd = lower network cost), but watch CPU load.
- Used in? Sending binary in JSON (bad idea), compressing responses, file storage.
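A quick sketch of both effects at once, stdlib only: Base64 inflating a payload and gzip shrinking it. The sample payload is invented; real numbers depend on how compressible your data is.

```go
// Sketch: Base64 adds ~33% overhead; gzip trades CPU for smaller bytes on the wire.
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
	"log"
	"strings"
)

func main() {
	raw := []byte(strings.Repeat(`{"level":"info","msg":"request served"}`, 100))

	// Base64: pure overhead if the transport is already binary-safe.
	b64 := base64.StdEncoding.EncodeToString(raw)

	// gzip: smaller on the wire, but costs CPU on both ends.
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		log.Fatal(err)
	}
	if err := zw.Close(); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("raw: %d bytes, base64: %d bytes, gzip: %d bytes\n",
		len(raw), len(b64), buf.Len())
}
```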
3. Marshalling & Unmarshalling
- What? Serialization + protocol-specific packaging (headers, metadata, RPC framing).
- Latency? Adds overhead. gRPC (Protobuf + HTTP/2) is fast; XML-based SOAP is a relic.
- Cost? Wrong marshalling can kill perf (e.g., JSON API with large nested structures).
- Used in? gRPC, RPC frameworks, API requests.
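To show what the "protocol-specific packaging" adds on top of plain serialization, here's a toy sketch of length-prefixed framing, loosely modeled on gRPC's message frame (1-byte compressed flag + 4-byte length). It illustrates the idea; it is not a drop-in gRPC implementation.

```go
// Sketch: marshalling = serialize the payload, then wrap it in protocol framing.
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"log"
)

// frame wraps an already-serialized payload with a flag byte and a length prefix.
func frame(w io.Writer, payload []byte, compressed bool) error {
	var flag byte
	if compressed {
		flag = 1
	}
	if err := binary.Write(w, binary.BigEndian, flag); err != nil {
		return err
	}
	if err := binary.Write(w, binary.BigEndian, uint32(len(payload))); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// unframe reads one frame back out: the unmarshalling side.
func unframe(r io.Reader) ([]byte, bool, error) {
	var flag byte
	if err := binary.Read(r, binary.BigEndian, &flag); err != nil {
		return nil, false, err
	}
	var n uint32
	if err := binary.Read(r, binary.BigEndian, &n); err != nil {
		return nil, false, err
	}
	payload := make([]byte, n)
	_, err := io.ReadFull(r, payload)
	return payload, flag == 1, err
}

func main() {
	var buf bytes.Buffer
	if err := frame(&buf, []byte(`{"id":42}`), false); err != nil {
		log.Fatal(err)
	}
	payload, compressed, err := unframe(&buf)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("payload=%s compressed=%v\n", payload, compressed)
}
```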
4. Parsing
- What? Convert raw data (text, structured formats) into something usable.
- Latency? JSON parsing is slow (use simdjson for a speed boost); Protobuf/Avro decode faster thanks to their binary, schema-driven layout.
- Cost? High parsing overhead = wasted CPU.
- Used in? Query engines, config loaders, log ingestion.
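For the parsing cost, a small Go sketch: streaming newline-delimited JSON log entries through json.Decoder instead of slurping the whole file and unmarshalling it in one shot. The logEntry fields are hypothetical.

```go
// Sketch: stream-parse NDJSON logs one entry at a time.
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"strings"
)

type logEntry struct {
	Level string `json:"level"`
	Msg   string `json:"msg"`
}

func main() {
	input := strings.NewReader(
		`{"level":"info","msg":"started"}
{"level":"error","msg":"db timeout"}`)

	dec := json.NewDecoder(input)
	for {
		var e logEntry
		if err := dec.Decode(&e); err == io.EOF {
			break
		} else if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s: %s\n", e.Level, e.Msg)
	}
}
```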
Custom Binary Serialization: When & Why?
Standard serialization (JSON, Protobuf) works in most cases, but when you need extreme efficiency, especially for data stored on the same system or exchanged between tightly coupled backend services, custom binary serialization can be a game changer. More details in my blog post.
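As a rough illustration (not the approach from the linked post), here's what a hand-rolled fixed layout can look like when you control both the writer and the reader: 8-byte ID, 8-byte timestamp, length-prefixed name. The event type and field choices are assumptions for the sketch.

```go
// Sketch: hand-rolled binary layout for a tiny record.
// Layout: uint64 ID | int64 timestamp | uint16 name length | name bytes.
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"log"
)

type event struct {
	ID   uint64
	TS   int64
	Name string
}

func encode(e event) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, e.ID); err != nil {
		return nil, err
	}
	if err := binary.Write(&buf, binary.LittleEndian, e.TS); err != nil {
		return nil, err
	}
	if err := binary.Write(&buf, binary.LittleEndian, uint16(len(e.Name))); err != nil {
		return nil, err
	}
	buf.WriteString(e.Name)
	return buf.Bytes(), nil
}

func decode(b []byte) (event, error) {
	var e event
	r := bytes.NewReader(b)
	if err := binary.Read(r, binary.LittleEndian, &e.ID); err != nil {
		return e, err
	}
	if err := binary.Read(r, binary.LittleEndian, &e.TS); err != nil {
		return e, err
	}
	var n uint16
	if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
		return e, err
	}
	name := make([]byte, n)
	if _, err := io.ReadFull(r, name); err != nil {
		return e, err
	}
	e.Name = string(name)
	return e, nil
}

func main() {
	b, err := encode(event{ID: 7, TS: 1700000000, Name: "deploy"})
	if err != nil {
		log.Fatal(err)
	}
	e, err := decode(b)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d bytes -> %+v\n", len(b), e)
}
```

The trade-off: any layout change has to be coordinated on both sides, which is why this only pays off for same-system storage or tightly coupled services.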
Real-World Backend Cost & Latency Impact
| Scenario | Best Choice | Why? (Money & Latency Perspective) |
|---|---|---|
| API responses | Protobuf/gRPC | Smaller payloads, faster parsing, saves bandwidth |
| Config files | JSON (if human-readable), YAML (for simplicity), or binary (for speed) | JSON/YAML easy but slow, binary fast but hard to debug |
| Logs & Events | JSON (if small), Avro/Parquet for big data | JSON = human-readable but bloated, Avro/Parquet = compressed, schema-aware |
| Metrics (Remote-Write) | Snappy + Protobuf (e.g., Prometheus Remote Write) | JSON is too slow & fat for high-throughput metrics |
| File Storage | Avro, Parquet (columnar), Protobuf (row-based) | JSON/XML = waste of disk, Avro/Parquet = compression + fast reads |
| Message Queues | Protobuf, Avro, MessagePack | Smaller payloads = faster transmission & lower queue pressure |
| Large Payloads (uploads) | Multipart, chunked encoding | Avoid Base64 |
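For the last row, here's a minimal sketch of streaming a large file as a multipart upload instead of Base64-embedding it in a JSON body. The URL, form field name, and file path are placeholders; the point is that the bytes go over the wire as-is, with no 33% Base64 inflation and no giant in-memory body.

```go
// Sketch: stream a file as multipart/form-data via an io.Pipe.
package main

import (
	"io"
	"log"
	"mime/multipart"
	"net/http"
	"os"
)

func upload(url, path string) error {
	pr, pw := io.Pipe()
	mw := multipart.NewWriter(pw)

	// Write the multipart body in a goroutine so the request streams it.
	go func() {
		f, err := os.Open(path)
		if err != nil {
			pw.CloseWithError(err)
			return
		}
		defer f.Close()

		part, err := mw.CreateFormFile("file", path)
		if err != nil {
			pw.CloseWithError(err)
			return
		}
		if _, err := io.Copy(part, f); err != nil {
			pw.CloseWithError(err)
			return
		}
		pw.CloseWithError(mw.Close())
	}()

	req, err := http.NewRequest(http.MethodPost, url, pr)
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", mw.FormDataContentType())

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

func main() {
	if err := upload("https://example.com/upload", "report.bin"); err != nil {
		log.Fatal(err)
	}
}
```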
Summary
- Text-based (JSON, YAML, XML): Human-friendly, slow, large.
- Binary-based (Protobuf, Avro, MessagePack, Parquet): Compact, fast, schema-driven.
- Compression (gzip, Snappy, Zstd): Saves bandwidth but eats CPU.
- Wrong choice = wasted money (bandwidth, storage, CPU time, response time).