TL;DR: Money & Time

Backend engineering is all about cost & latency. Pick the wrong format, and you’re burning money or making users wait.

1. Serialization & Deserialization

  • What? Dump structured data (objects, maps) into something that can be stored or sent (JSON, Protobuf, Avro, MessagePack, etc.).
  • Latency? JSON is slow, Protobuf is fast but needs a schema.
  • Cost? JSON bloats (human-readable = wasteful), Protobuf saves bytes & speeds up transmission.
  • Used in? APIs, storage, message queues, logs.
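
To make the “what” concrete, here’s a minimal Go sketch that serializes the same batch of made-up Order records as JSON and as a binary format (Go’s gob, standing in for Protobuf/Avro so the example runs without generated code):

```go
package main

import (
	"bytes"
	"encoding/gob"
	"encoding/json"
	"fmt"
)

// Order is a made-up payload type; the fields are purely illustrative.
type Order struct {
	ID       int64
	Customer string
	Amount   float64
}

func main() {
	// A batch of records, as you'd return from an API or push onto a queue.
	orders := make([]Order, 1000)
	for i := range orders {
		orders[i] = Order{ID: int64(i), Customer: fmt.Sprintf("customer-%d", i), Amount: 19.99}
	}

	// Text serialization: every field name is repeated in every record.
	jsonBytes, _ := json.Marshal(orders)

	// Binary serialization (gob here, standing in for Protobuf/Avro): the type
	// description is written once, so field names aren't repeated per record.
	var buf bytes.Buffer
	_ = gob.NewEncoder(&buf).Encode(orders)

	fmt.Printf("json: %d bytes, binary (gob): %d bytes\n", len(jsonBytes), buf.Len())
}
```

On a batch like this the binary output is noticeably smaller; on a single tiny message the gap can vanish, since gob front-loads a type descriptor.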

2. Encoding & Decoding

  • What? Convert data for representation (text/binary conversion, Base64, UTF-8, gzip, etc.).
  • Latency? Adds CPU overhead. Base64 inflates data size (~33% bloat).
  • Cost? Compress where possible (gzip/zstd = lower network cost), but watch CPU load.
  • Used in? Sending binary in JSON (bad idea), compressing responses, file storage.
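
A quick sketch of the trade-off, using only the Go standard library (the payload is a stand-in, so the exact numbers are illustrative):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
)

func main() {
	// raw stands in for some binary blob (an image, a dump); here just repetitive bytes.
	raw := bytes.Repeat([]byte("some binary-ish payload "), 1000)

	// Base64: makes binary safe to embed in text (JSON, URLs), but inflates it by ~33%.
	b64 := base64.StdEncoding.EncodeToString(raw)

	// gzip: trades CPU time for fewer bytes on the wire.
	var gz bytes.Buffer
	w := gzip.NewWriter(&gz)
	w.Write(raw)
	w.Close()

	fmt.Printf("raw: %d, base64: %d (+%.0f%%), gzip: %d\n",
		len(raw), len(b64), 100*float64(len(b64)-len(raw))/float64(len(raw)), gz.Len())
}
```

Repetitive data like this is a best case for gzip; the Base64 inflation, though, is always ~33% regardless of content.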

3. Marshalling & Unmarshalling

  • What? Serialization + protocol-specific packaging (headers, metadata, RPC framing).
  • Latency? Adds overhead. gRPC (Protobuf + HTTP/2) is fast; XML-based SOAP is a relic.
  • Cost? Wrong marshalling can kill perf (e.g., JSON API with large nested structures).
  • Used in? gRPC, RPC frameworks, API requests.
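
Here’s an illustrative Go sketch of what “protocol-specific packaging” means: serialize a payload, then wrap it in a gRPC-style 5-byte prefix (1-byte compression flag + 4-byte big-endian length). This is a toy frame modeled on gRPC’s length-prefixed framing, not the real gRPC wire code:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// frame wraps an already-serialized payload in a minimal, gRPC-style prefix:
// 1 byte "compressed?" flag + 4-byte big-endian length, then the payload.
func frame(payload []byte) []byte {
	var buf bytes.Buffer
	buf.WriteByte(0)                                           // 0 = uncompressed
	binary.Write(&buf, binary.BigEndian, uint32(len(payload))) // message length
	buf.Write(payload)                                         // serialized body
	return buf.Bytes()
}

func main() {
	// Marshalling = serialize the object, then package it for the protocol.
	payload, _ := json.Marshal(map[string]string{"op": "ping"})
	msg := frame(payload)
	fmt.Printf("payload: %d bytes, framed message: %d bytes (5-byte overhead)\n",
		len(payload), len(msg))
}
```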

4. Parsing

  • What? Convert raw data (text, structured formats) into something usable.
  • Latency? JSON parsing is slow (use simdjson for a speed boost); Protobuf/Avro decode straight from binary, so there’s no text-parsing step at all.
  • Cost? High parsing overhead = wasted CPU.
  • Used in? Query engines, config loaders, log ingestion.
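
One common way to keep parsing cheap in Go: decode a log stream record by record into a small struct instead of slurping the whole file into map[string]interface{}. The field names below are made up:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// LogLine models only the fields we actually need; decoding into a small,
// concrete struct is cheaper than building a generic map for every record.
type LogLine struct {
	Level string `json:"level"`
	Msg   string `json:"msg"`
}

func main() {
	// Pretend this reader is a large JSON log stream coming off disk or a socket.
	input := strings.NewReader(`{"level":"info","msg":"started","extra":"ignored"}
{"level":"error","msg":"db timeout","extra":"ignored"}`)

	// json.Decoder parses the stream one record at a time,
	// so the whole file never has to sit in memory.
	dec := json.NewDecoder(input)
	for {
		var line LogLine
		if err := dec.Decode(&line); err == io.EOF {
			break
		} else if err != nil {
			break // real code would handle the error
		}
		if line.Level == "error" {
			fmt.Println("alert:", line.Msg)
		}
	}
}
```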

Custom Binary Serialization: When & Why?

Standard serialization (JSON, Protobuf) works in most cases, but when you need extreme efficiency, especially for data stored on the same system or exchanged between tightly coupled backend services, custom binary serialization can be a game changer. More details in my blog post.
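
As a sketch of what that can look like (the layout, field names, and types are all assumptions, not a recommendation): pack each record into a fixed 16-byte layout with encoding/binary. You give up schema evolution and debuggability in exchange for size and speed, and both writer and reader must agree on (and version) the layout:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Sample is a hypothetical metric point.
type Sample struct {
	Timestamp int64
	Value     float64
}

// encode packs a Sample into exactly 16 bytes: 8-byte timestamp + 8-byte value,
// both little-endian. No field names, no schema negotiation, no flexibility.
func encode(s Sample) []byte {
	var buf bytes.Buffer
	binary.Write(&buf, binary.LittleEndian, s.Timestamp)
	binary.Write(&buf, binary.LittleEndian, s.Value)
	return buf.Bytes()
}

func decode(b []byte) (Sample, error) {
	var s Sample
	r := bytes.NewReader(b)
	if err := binary.Read(r, binary.LittleEndian, &s.Timestamp); err != nil {
		return s, err
	}
	err := binary.Read(r, binary.LittleEndian, &s.Value)
	return s, err
}

func main() {
	b := encode(Sample{Timestamp: 1700000000, Value: 0.75})
	fmt.Println(len(b), "bytes") // 16 bytes, vs ~40+ for the equivalent JSON
	s, _ := decode(b)
	fmt.Printf("%+v\n", s)
}
```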

Real-World Backend Cost & Latency Impact

| Scenario | Best Choice | Why? (Money & Latency Perspective) |
| --- | --- | --- |
| API responses | Protobuf/gRPC | Smaller payloads, faster parsing, saves bandwidth |
| Config files | JSON (if human-readable), YAML (for simplicity), or binary (for speed) | JSON/YAML easy but slow; binary fast but hard to debug |
| Logs & Events | JSON (if small), Avro/Parquet for big data | JSON = human-readable but bloated; Avro/Parquet = compressed, schema-aware |
| Metrics (Remote-Write) | Snappy + Protobuf (e.g., Prometheus Remote Write) | JSON is too slow & fat for high-throughput metrics |
| File Storage | Avro, Parquet (columnar), Protobuf (row-based) | JSON/XML = waste of disk; Avro/Parquet = compression + fast reads |
| Message Queues | Protobuf, Avro, MessagePack | Smaller payloads = faster transmission & lower queue pressure |
| Large Payloads (uploads) | Multipart, chunked encoding | Avoid Base64 |
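
For the “avoid Base64” row: Go’s mime/multipart streams the raw bytes with only a small boundary overhead instead of paying the ~33% Base64 tax inside a JSON field. The field and file names here are placeholders:

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
)

func main() {
	// Pretend this is a large file we need to upload.
	file := bytes.Repeat([]byte{0xDE, 0xAD, 0xBE, 0xEF}, 1024)

	// The multipart body carries the raw bytes as-is, plus small boundary headers.
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	part, _ := w.CreateFormFile("file", "blob.bin")
	part.Write(file)
	w.Close()

	fmt.Printf("raw: %d bytes, multipart body: %d bytes, content type: %s\n",
		len(file), body.Len(), w.FormDataContentType())
}
```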

Summary

  • Text-based (JSON, YAML, XML): Human-friendly, slow, large.
  • Binary-based (Protobuf, Avro, MessagePack, Parquet): Compact, fast, schema-driven.
  • Compression (gzip, Snappy, Zstd): Saves bandwidth but eats CPU.
  • Wrong choice = wasted money (bandwidth, storage, CPU time, response time).