TL;DR: Money & Time
Backend is all about cost & latency. Pick wrong, and you’re burning money or making users wait.
1. Serialization & Deserialization
- What? Dump structured data (objects, maps) into something that can be stored or sent (JSON, Protobuf, Avro, MessagePack, etc.).
- Latency? JSON is slow; Protobuf is fast but needs a schema.
- Cost? JSON bloats (human-readable = wasteful), Protobuf saves bytes & speeds up transmission.
- Used in? APIs, storage, message queues, logs.
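To make the size difference concrete, here's a minimal sketch using only the Go standard library, with encoding/gob standing in for a binary, schema-aware format (Protobuf would need generated code). The User type and values are made up for illustration.

```go
// Sketch: payload size of a text format vs. a binary format, stdlib only.
// encoding/gob plays the role of the binary, schema-aware format here.
package main

import (
	"bytes"
	"encoding/gob"
	"encoding/json"
	"fmt"
	"log"
)

// User is a hypothetical record type.
type User struct {
	ID    int64
	Name  string
	Email string
	Tags  []string
}

func main() {
	users := make([]User, 1000)
	for i := range users {
		users[i] = User{ID: int64(i), Name: "Ada", Email: "ada@example.com", Tags: []string{"admin", "beta"}}
	}

	// Text: field names are repeated in every element.
	jsonBytes, err := json.Marshal(users)
	if err != nil {
		log.Fatal(err)
	}

	// Binary: type information is written once, then compact per-element data.
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(users); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("json: %d bytes, gob: %d bytes\n", len(jsonBytes), buf.Len())
}
```

For a single tiny record the binary format's one-time type overhead can dominate, so measure on a batch or a stream; that's also where the bandwidth bill actually comes from.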
2. Encoding & Decoding
- What? Convert data for representation (text/binary conversion, Base64, UTF-8, gzip, etc.).
- Latency? Adds CPU overhead. Base64 inflates data size (~33% bloat).
- Cost? Compress where possible (gzip/zstd = lower network cost), but watch CPU load.
- Used in? Sending binary in JSON (bad idea), compressing responses, file storage.
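A quick sketch of both effects at once, stdlib only: Base64 inflating a payload and gzip shrinking it. The sample payload is invented; real numbers depend on how compressible your data is.

```go
// Sketch: Base64 adds ~33% overhead; gzip trades CPU for smaller bytes on the wire.
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
	"log"
	"strings"
)

func main() {
	raw := []byte(strings.Repeat(`{"level":"info","msg":"request served"}`, 100))

	// Base64: pure overhead if the transport is already binary-safe.
	b64 := base64.StdEncoding.EncodeToString(raw)

	// gzip: smaller on the wire, but costs CPU on both ends.
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		log.Fatal(err)
	}
	if err := zw.Close(); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("raw: %d bytes, base64: %d bytes, gzip: %d bytes\n",
		len(raw), len(b64), buf.Len())
}
```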
3. Marshalling & Unmarshalling
- What? Serialization + protocol-specific packaging (headers, metadata, RPC framing).
- Latency? Adds overhead. gRPC (Protobuf + HTTP/2) is fast; XML-based SOAP is a relic.
- Cost? Wrong marshalling can kill perf (e.g., JSON API with large nested structures).
- Used in? gRPC, RPC frameworks, API requests.
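To show what the "protocol-specific packaging" adds on top of plain serialization, here's a toy sketch of length-prefixed framing, loosely modeled on gRPC's message frame (1-byte compressed flag + 4-byte length). It illustrates the idea; it is not a drop-in gRPC implementation.

```go
// Sketch: marshalling = serialize the payload, then wrap it in protocol framing.
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"log"
)

// frame wraps an already-serialized payload with a flag byte and a length prefix.
func frame(w io.Writer, payload []byte, compressed bool) error {
	var flag byte
	if compressed {
		flag = 1
	}
	if err := binary.Write(w, binary.BigEndian, flag); err != nil {
		return err
	}
	if err := binary.Write(w, binary.BigEndian, uint32(len(payload))); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// unframe reads one frame back out: the unmarshalling side.
func unframe(r io.Reader) ([]byte, bool, error) {
	var flag byte
	if err := binary.Read(r, binary.BigEndian, &flag); err != nil {
		return nil, false, err
	}
	var n uint32
	if err := binary.Read(r, binary.BigEndian, &n); err != nil {
		return nil, false, err
	}
	payload := make([]byte, n)
	_, err := io.ReadFull(r, payload)
	return payload, flag == 1, err
}

func main() {
	var buf bytes.Buffer
	if err := frame(&buf, []byte(`{"id":42}`), false); err != nil {
		log.Fatal(err)
	}
	payload, compressed, err := unframe(&buf)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("payload=%s compressed=%v\n", payload, compressed)
}
```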
4. Parsing
- What? Convert raw data (text, structured formats) into something usable.
- Latency? JSON parsing is slow (use simdjson for a speed boost); Protobuf/Avro decode faster thanks to their binary, schema-driven layout.
- Cost? High parsing overhead = wasted CPU.
- Used in? Query engines, config loaders, log ingestion.
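For the parsing cost, a small Go sketch: streaming newline-delimited JSON log entries through json.Decoder instead of slurping the whole file and unmarshalling it in one shot. The logEntry fields are hypothetical.

```go
// Sketch: stream-parse NDJSON logs one entry at a time.
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"strings"
)

type logEntry struct {
	Level string `json:"level"`
	Msg   string `json:"msg"`
}

func main() {
	input := strings.NewReader(
		`{"level":"info","msg":"started"}
{"level":"error","msg":"db timeout"}`)

	dec := json.NewDecoder(input)
	for {
		var e logEntry
		if err := dec.Decode(&e); err == io.EOF {
			break
		} else if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s: %s\n", e.Level, e.Msg)
	}
}
```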
Custom Binary Serialization: When & Why?
Standard serialization (JSON, Protobuf) works in most cases, but when you need extreme efficiency, especially for data stored on the same system or exchanged between tightly coupled backend services, custom binary serialization can be a game changer. More details in my blog post.
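As a rough illustration (not the approach from the linked post), here's what a hand-rolled fixed layout can look like when you control both the writer and the reader: 8-byte ID, 8-byte timestamp, length-prefixed name. The event type and field choices are assumptions for the sketch.

```go
// Sketch: hand-rolled binary layout for a tiny record.
// Layout: uint64 ID | int64 timestamp | uint16 name length | name bytes.
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"log"
)

type event struct {
	ID   uint64
	TS   int64
	Name string
}

func encode(e event) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, e.ID); err != nil {
		return nil, err
	}
	if err := binary.Write(&buf, binary.LittleEndian, e.TS); err != nil {
		return nil, err
	}
	if err := binary.Write(&buf, binary.LittleEndian, uint16(len(e.Name))); err != nil {
		return nil, err
	}
	buf.WriteString(e.Name)
	return buf.Bytes(), nil
}

func decode(b []byte) (event, error) {
	var e event
	r := bytes.NewReader(b)
	if err := binary.Read(r, binary.LittleEndian, &e.ID); err != nil {
		return e, err
	}
	if err := binary.Read(r, binary.LittleEndian, &e.TS); err != nil {
		return e, err
	}
	var n uint16
	if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
		return e, err
	}
	name := make([]byte, n)
	if _, err := io.ReadFull(r, name); err != nil {
		return e, err
	}
	e.Name = string(name)
	return e, nil
}

func main() {
	b, err := encode(event{ID: 7, TS: 1700000000, Name: "deploy"})
	if err != nil {
		log.Fatal(err)
	}
	e, err := decode(b)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d bytes -> %+v\n", len(b), e)
}
```

The trade-off: any layout change has to be coordinated on both sides, which is why this only pays off for same-system storage or tightly coupled services.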
Real-World Backend Cost & Latency Impact
| Scenario | Best Choice | Why? (Money & Latency Perspective) |
|---|---|---|
| API responses | Protobuf/gRPC | Smaller payloads, faster parsing, saves bandwidth |
| Config files | JSON (if human-readable), YAML (for simplicity), or binary (for speed) | JSON/YAML easy but slow, binary fast but hard to debug |
| Logs & Events | JSON (if small), Avro/Parquet for big data | JSON = human-readable but bloated, Avro/Parquet = compressed, schema-aware |
| Metrics (Remote-Write) | Snappy + Protobuf (e.g., Prometheus Remote Write) | JSON is too slow & fat for high-throughput metrics |
| File Storage | Avro, Parquet (columnar), Protobuf (row-based) | JSON/XML = waste of disk, Avro/Parquet = compression + fast reads |
| Message Queues | Protobuf, Avro, MessagePack | Smaller payloads = faster transmission & lower queue pressure |
| Large Payloads (uploads) | Multipart, chunked encoding | Avoid Base64 |
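For the last row, here's a minimal sketch of streaming a large file as a multipart upload instead of Base64-embedding it in a JSON body. The URL, form field name, and file path are placeholders; the point is that the bytes go over the wire as-is, with no 33% Base64 inflation and no giant in-memory body.

```go
// Sketch: stream a file as multipart/form-data via an io.Pipe.
package main

import (
	"io"
	"log"
	"mime/multipart"
	"net/http"
	"os"
)

func upload(url, path string) error {
	pr, pw := io.Pipe()
	mw := multipart.NewWriter(pw)

	// Write the multipart body in a goroutine so the request streams it.
	go func() {
		f, err := os.Open(path)
		if err != nil {
			pw.CloseWithError(err)
			return
		}
		defer f.Close()

		part, err := mw.CreateFormFile("file", path)
		if err != nil {
			pw.CloseWithError(err)
			return
		}
		if _, err := io.Copy(part, f); err != nil {
			pw.CloseWithError(err)
			return
		}
		pw.CloseWithError(mw.Close())
	}()

	req, err := http.NewRequest(http.MethodPost, url, pr)
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", mw.FormDataContentType())

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

func main() {
	if err := upload("https://example.com/upload", "report.bin"); err != nil {
		log.Fatal(err)
	}
}
```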
Summary
- Text-based (JSON, YAML, XML): Human-friendly, slow, large.
- Binary-based (Protobuf, Avro, MessagePack, Parquet): Compact, fast, schema-driven.
- Compression (gzip, Snappy, Zstd): Saves bandwidth but eats CPU.
- Wrong choice = wasted money (bandwidth, storage, CPU time, response time).