JSON vs YAML vs TOML vs CSV vs Protobuf: Developer's Guide

You're building an API and defaulting to JSON because everyone does. The endpoint works fine. Then it needs to handle large payloads at high throughput, and suddenly the bottleneck is payload parsing. Or you're writing a config file and three months later a new team member can't figure out what half of it means because there are no comments allowed. Or you're exporting data to finance and the CSV import broke because one field contained a comma.

Every data format is a tradeoff. JSON is not always the right answer. Neither is anything else. The question is which tradeoffs match your actual constraints.

This guide covers six formats you'll encounter in production work: what each one is, where it fits, and how to recognize when you're reaching for the wrong tool.

JSON (JavaScript Object Notation) is the lingua franca of the web. It's human-readable, supported natively in JavaScript, and has library support in every language worth using. Every REST API you interact with today almost certainly speaks JSON.

What you get: text-based, nested key-value pairs, arrays, strings, numbers, booleans, and null. No comments. No dates (you serialize them as strings). No binary data (you base64-encode it). No schema enforcement unless you add one separately.

What you lose: parsing speed at scale, compact wire size, type safety without tooling.

```typescript
// A typical API response
const order = {
  id: "0195d3a2-f8c0-7b4e-8f32-1a2b3c4d5e6f",
  customerId: "cust_01234",
  status: "pending",
  amount: 4999,
  currency: "EUR",
  items: [
    { sku: "WIDGET-001", quantity: 2, unitPrice: 2499 },
    { sku: "WIDGET-002", quantity: 1, unitPrice: 1 },
  ],
  createdAt: "2026-04-23T09:12:00Z",
};

// Serialize / parse: built into Node.js, zero dependencies
const serialized = JSON.stringify(order);
const parsed = JSON.parse(serialized) as typeof order;
```

Pros: Universal support. Browser-native. Human-readable and editable. Trivial to debug: paste it into any JSON formatter. Schema validators exist (JSON Schema, Zod, Ajv).
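The missing date type noted above, and the large-integer precision limit that JSON numbers hit in JavaScript, are easy to see in a few lines. The following is a minimal sketch, not a recommended library: the `Shipment` shape, the `reviveDates` helper, and the `ISO_DATE` pattern are hypothetical names introduced only for this illustration.

```typescript
// Hypothetical payload shape, used only for this illustration.
interface Shipment {
  orderId: string; // large integer kept as a string to avoid precision loss
  shippedAt: Date;
}

// Matches the UTC ISO-8601 strings that Date.prototype.toJSON produces.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$/;

// Reviver that turns ISO-8601 UTC strings back into Date objects.
function reviveDates(_key: string, value: unknown): unknown {
  return typeof value === "string" && ISO_DATE.test(value) ? new Date(value) : value;
}

const shipment: Shipment = {
  orderId: "9007199254740993",
  shippedAt: new Date("2026-04-23T09:12:00Z"),
};

const wire = JSON.stringify(shipment);

// Without a reviver, the Date comes back as a plain string.
const plain = JSON.parse(wire);
console.log(typeof plain.shippedAt); // "string"

// With the reviver, it is a Date again.
const revived = JSON.parse(wire, reviveDates) as Shipment;
console.log(revived.shippedAt instanceof Date); // true

// Integers above 2^53 do not survive as JSON numbers in JavaScript.
const lossy = JSON.parse('{"orderId": 9007199254740993}').orderId as number;
console.log(lossy); // 9007199254740992
```

A pattern-based reviver will also convert any ordinary string that happens to look like a timestamp, which is one reason schema validators are often preferred for this job.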
Cons: No comments. Verbose: field names repeat on every object in an array. Numbers lose precision above 2^53 (use strings for large integers). Dates are strings by convention, not by type. Parsing is CPU-intensive at high volumes.

Use it when: building REST APIs, storing documents in PostgreSQL/MongoDB, passing data between services that don't share a schema definition, any context where a human might need to read the raw payload.

Avoid it when: you're parsing large payloads (100KB+) at high req/sec and it shows up in your profiler, binary data is common, or payload size is a hard constraint (mobile clients, IoT).

YAML (YAML Ain't Markup Language) is a superset of JSON (as of YAML 1.2) designed for human-writable configuration. It replaces braces and quotes with indentation and supports comments, two things JSON deliberately omits.

You have already written a lot of YAML: GitHub Actions workflows, Docker Compose files, Kubernetes manifests, Helm charts, swagger.yaml.

```yaml
# Same order data as above. Notice: no quotes required on strings,
# no commas, and comments are allowed
order:
  id: 0195d3a2-f8c0-7b4e-8f32-1a2b3c4d5e6f
  customer_id: cust_01234
  status: pending
  amount: 4999
  currency: EUR
  items:
    - sku: WIDGET-001
      quantity: 2
      unit_price: 2499
    - sku: WIDGET-002
      quantity: 1
      unit_price: 1
  created_at: "2026-04-23T09:12:00Z"  # quoted to prevent datetime parsing
```

Pros: Comments. More readable than JSON for multi-level nesting. Multi-line strings with | (literal) and > (folded) blocks. Anchors and aliases (&anchor, *alias) for DRY config.

Cons: Indentation-sensitive: a misplaced space breaks the file silently. The type coercion rules are surprising in YAML 1.1 parsers (still widely used): yes, no, on, off, true, false are all valid booleans, and Norwegian country code NO parses as false. YAML 1.2 removed most of this, but not all parsers have caught up. Numbers with leading zeros parse as octal in YAML 1.1. Slow to parse relative to JSON.
Not suitable for machine-generated data.

```yaml
# YAML 1.1 footguns (Python's PyYAML, Ruby's Psych <4, many older tools)
country: NO        # parses as false in YAML 1.1 parsers
version: 012       # parses as octal (decimal 10) in YAML 1.1 parsers
port: 8080         # integer, not string; may surprise you
api_key: 1234e5678 # parser-dependent: YAML 1.2 core-schema parsers read this as a float, others keep a string
```

Use it when: writing configuration files that humans edit frequently, such as CI/CD pipelines, Docker Compose, infrastructure definitions, and application configs where comments add real value.

Avoid it when: the data is machine-generated, the people editing it aren't developers, or you need strict type guarantees. YAML's implicit type coercion has caused real production incidents.

TOML (Tom's Obvious, Minimal Language) was designed specifically to fix YAML's footguns while remaining human-readable. It's unambiguously typed, not indentation-sensitive, and has no surprising coercion rules.

You know TOML from Cargo.toml (Rust), pyproject.toml (Python packaging), and increasingly from application config files in Go projects.

```toml
# Same data. Notice explicit types and section headers
[order]
id = "0195d3a2-f8c0-7b4e-8f32-1a2b3c4d5e6f"
customer_id = "cust_01234"
status = "pending"
amount = 4999
currency = "EUR"
created_at = 2026-04-23T09:12:00Z  # native datetime type, no quoting needed

[[order.items]]
sku = "WIDGET-001"
quantity = 2
unit_price = 2499

[[order.items]]
sku = "WIDGET-002"
quantity = 1
unit_price = 1
```

Pros: Native datetime type. Strings always require quotes, so NO never becomes false. Integers, floats, booleans, and datetimes are distinct types. Comments. Not indentation-sensitive: sections are flat. Easier to write tooling for than YAML.

Cons: Less suitable for deeply nested structures, where the [[array.of.tables]] syntax becomes awkward. Fewer language libraries than JSON or YAML. Not useful for data interchange (only configuration use cases make sense).
```toml
# pyproject.toml: real example of TOML for project config
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "eu-vat-rates-data"
version = "2026.4.23"
description = "EU VAT rates dataset"
requires-python = ">=3.9"
dependencies = []

[project.urls]
Homepage = "https://github.com/vatnode/eu-vat-rates-data"
```

Use it when: writing application-level configuration that developers own and edit, such as database configs, server settings, and CLI tool configuration. Especially appropriate for Rust and Python projects where the ecosystem expects it.

Avoid it when: the config is deeply nested (YAML is more concise for that), or the consumers are non-developers who expect a simpler format.

CSV (Comma-Separated Values) is the lowest common denominator of data exchange. Every spreadsheet application on earth reads it. Every database can export it. It's the format that finance, operations, and non-technical stakeholders actually use.

CSV has no official specification. RFC 4180 is the closest thing to a standard, and most implementations deviate from it in some way.

```csv
id,customer_id,status,amount,currency,created_at
0195d3a2-f8c0-7b4e-8f32-1a2b3c4d5e6f,cust_01234,pending,4999,EUR,2026-04-23T09:12:00Z
0195d3a2-f8c0-7b4e-8f33-2b3c4d5e6f7a,cust_05678,completed,12000,EUR,2026-04-23T09:15:00Z
```

CSV is flat. No nesting, no arrays, no objects. One row = one record. Every value is a string unless the consuming application interprets it otherwise.
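Because CSV is flat, nested data has to be flattened before it can be exported. Here is a minimal sketch of one way to do that, assuming a one-row-per-line-item layout; the `Order`, `Item`, and `Row` types and the `flattenOrder` helper are hypothetical and modeled loosely on the order example used earlier.

```typescript
// Hypothetical shapes, modeled on the order example used earlier.
interface Item {
  sku: string;
  quantity: number;
  unitPrice: number;
}

interface Order {
  id: string;
  status: string;
  items: Item[];
}

// Every CSV cell is text, so a row is a map of column name to string.
type Row = Record<string, string>;

// Emit one row per line item and repeat the order-level columns on each row.
function flattenOrder(order: Order): Row[] {
  return order.items.map((item) => ({
    order_id: order.id,
    status: order.status,
    sku: item.sku,
    quantity: String(item.quantity),
    unit_price: String(item.unitPrice),
  }));
}

const order: Order = {
  id: "0195d3a2-f8c0-7b4e-8f32-1a2b3c4d5e6f",
  status: "pending",
  items: [
    { sku: "WIDGET-001", quantity: 2, unitPrice: 2499 },
    { sku: "WIDGET-002", quantity: 1, unitPrice: 1 },
  ],
};

const rows = flattenOrder(order);
console.log(rows.length); // 2
```

The cost of this layout is repetition: order-level fields appear on every row, and an order with no items produces no rows at all, so the consumer has to know which convention was chosen.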
```typescript
// Parsing CSV in Node.js: use a library, never hand-roll the parser
import { parse } from "csv-parse/sync";
import { stringify } from "csv-stringify/sync";
import { readFileSync, writeFileSync } from "node:fs";

interface OrderRow {
  id: string;
  customer_id: string;
  status: string;
  amount: string; // CSV has no number type, so this is always a string
  currency: string;
  created_at: string;
}

// Parse
const raw = readFileSync("orders.csv", "utf-8");
const rows = parse(raw, {
  columns: true, // use first row as headers
  skip_empty_lines: true,
  trim: true,
}) as OrderRow[];

const orders = rows.map((row) => ({
  ...row,
  amount: parseInt(row.amount, 10), // explicit conversion required
}));

// Generate
const output = stringify(orders, {
  header: true,
  columns: ["id", "customer_id", "status", "amount", "currency", "created_at"],
});
writeFileSync("orders-export.csv", output);
```

Pros: Universal tool support (Excel, Google Sheets, LibreOffice). Trivial to generate. Small file size for flat tabular data. Non-developers can open and understand it immediately.

Cons: No types. No nesting. No schema. Delimiter conflicts (a field containing a comma breaks naive parsers). Encoding issues (UTF-8 vs. Windows-1252 for European characters). No standard for null vs. empty string. Excel auto-converts strings that look like dates or numbers.

The encoding issue is particularly sharp for European projects: Finnish or German names with ä, ö, ü, and similar characters are frequently mangled when the exporting system uses UTF-8 and the importing system expects Windows-1252. Always specify encoding explicitly and add a UTF-8 BOM if Excel is in the receiving chain.

Use it when: exporting data for non-developers (finance reports, operations data, client-facing exports), importing data from third-party systems that only speak CSV, or any context involving a spreadsheet.

Avoid it when: the data is hierarchical, types matter at parse time, or the pipeline is fully automated without human review.
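The delimiter conflict and the BOM advice above can both be shown without any library. This is a sketch for illustration only (a real export should still go through a CSV library): `quoteField`, `toCsvLine`, and the `customer_name` column are hypothetical, and the quoting rule follows the RFC 4180 convention of wrapping a field in double quotes and doubling any embedded quotes.

```typescript
// Quote a single field RFC 4180-style: wrap it in double quotes when it
// contains the delimiter, a quote, or a line break, and double embedded quotes.
function quoteField(value: string, delimiter = ","): string {
  const needsQuoting =
    value.includes(delimiter) || value.includes('"') || /[\r\n]/.test(value);
  return needsQuoting ? `"${value.replace(/"/g, '""')}"` : value;
}

// Join already-stringified fields into one CSV line.
function toCsvLine(fields: string[]): string {
  return fields.map((field) => quoteField(field)).join(",");
}

// Excel uses a leading byte order mark as a hint that the file is UTF-8.
const UTF8_BOM = "\uFEFF";

const header = ["id", "customer_name", "amount"];
const row = ["cust_01234", 'Mäkinen, "Oy" Ab', "4999"];

// Naive join: the comma inside the name silently creates a fourth column.
const naive = row.join(",");
console.log(naive.split(",").length); // 4

// Quoted join: the name stays a single field.
const safe = toCsvLine(row);
console.log(safe); // cust_01234,"Mäkinen, ""Oy"" Ab",4999

// Full file body: BOM first, then CRLF-terminated lines.
const file = UTF8_BOM + [toCsvLine(header), safe].join("\r\n") + "\r\n";
```

Writing `file` to disk as UTF-8 gives Excel both the encoding hint and correctly quoted fields; whether the receiving system tolerates a BOM is something to confirm per consumer.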
Protocol Buffers (Protobuf) is a binary serialization format developed by Google and used internally across their infrastructure. It's the wire format for gRPC. You define a schema in a .proto file, compile it to code, and serialize/deserialize with generated functions.

The schema definition is not optional; it's the point. Protobuf enforces structure at compile time, not runtime.

```protobuf
// order.proto
syntax = "proto3";

package commerce;

message OrderItem {
  string sku = 1;
  int32 quantity = 2;
  int32 unit_price = 3;
}

message Order {
  string id = 1;
  string customer_id = 2;
  string status = 3;
  int32 amount = 4;
  string currency = 5;
  repeated OrderItem items = 6;
  // Unix timestamp in seconds. proto3 has no built-in datetime scalar;
  // the google.protobuf.Timestamp well-known type is the usual alternative.
  int64 created_at_unix = 7;
}
```

```typescript
// Using @bufbuild/protobuf v2 (buf's modern TypeScript runtime)
// npm install @bufbuild/protobuf
// Generate code: npx buf generate
import { create, toBinary, fromBinary } from "@bufbuild/protobuf";
import { OrderSchema } from "./generated/order_pb";

// Serialize
const order = create(OrderSchema, {
  id: "0195d3a2-f8c0-7b4e-8f32-1a2b3c4d5e6f",
  customerId: "cust_01234",
  status: "pending",
  amount: 4999,
  currency: "EUR",
  createdAtUnix: BigInt(Math.floor(Date.now() / 1000)), // seconds, not milliseconds
  items: [{ sku: "WIDGET-001", quantity: 2, unitPrice: 2499 }],
});
const bytes = toBinary(OrderSchema, order); // Uint8Array, compact binary

// Deserialize
const decoded = fromBinary(OrderSchema, bytes);
console.log(decoded.amount); // 4999 as a typed integer, not a string
```

Size comparison: the same order object serialized to JSON is roughly 280 bytes. In Protobuf, it's around 80 bytes, about 70% smaller.

Parsing speed: Protobuf deserialization is typically 2–6x faster than JSON parsing in Node.js (higher in Go or C++). The gap widens with payload size and narrows with small messages. At sustained high throughput, the difference is measurable.

Pros: Compact binary. Fast serialization/deserialization. Strict schema enforced at compile time.
Field numbers enable backward-compatible schema evolution. Language-agnostic generated code (Go, Java, Python, TypeScript, C++, and many more).

Cons: Not human-readable: you cannot curl an endpoint and read the response. Requires a .proto file and a code generation step. Schema changes must be managed carefully (never reuse field numbers). Higher operational complexity: the .proto files become an API contract that must be versioned and distributed.

Use it when: building internal service-to-service communication where both sides control the schema, performance is a hard requirement (high-frequency trading, telemetry pipelines, game servers), or you are already using gRPC.

Avoid it when: external clients need to inspect payloads, the schema changes frequently and the tooling overhead is a burden, or the team is small and the setup cost exceeds the performance benefit.

MessagePack describes itself as "like JSON but fast and small." That's accurate. It's a binary format that maps directly to JSON's data model (objects, arrays, strings, numbers, booleans, null) but uses a compact binary encoding instead of text.

The practical advantage over JSON: no parsing of text. Numbers are binary-encoded in 1 to 9 bytes depending on type and magnitude (a one-byte tag plus up to 8 bytes of value, with small integers packed into the tag itself). Strings do not need quote characters or escape sequences. There is no code generation step and no schema file: you serialize a native data structure directly.
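To make the size-dependent number encoding concrete, here is a minimal sketch of how unsigned integers are laid out according to the MessagePack specification's integer family (positive fixint, uint 8, uint 16, uint 32). The `encodeUint` function is a hypothetical helper written for this illustration, not part of any library, and it deliberately skips negative numbers, 64-bit values, and floats.

```typescript
// Encode a non-negative integer below 2^32 using the MessagePack unsigned
// integer formats. Sketch only: no negatives, no 64-bit values, no floats.
function encodeUint(n: number): Uint8Array {
  if (!Number.isInteger(n) || n < 0 || n > 0xffffffff) {
    throw new RangeError("sketch handles integers in [0, 2^32 - 1] only");
  }
  if (n <= 0x7f) {
    return Uint8Array.of(n); // positive fixint: the value is the tag byte
  }
  if (n <= 0xff) {
    return Uint8Array.of(0xcc, n); // uint 8: tag + 1 byte
  }
  if (n <= 0xffff) {
    return Uint8Array.of(0xcd, n >>> 8, n & 0xff); // uint 16: tag + 2 bytes, big-endian
  }
  return Uint8Array.of(
    0xce, // uint 32: tag + 4 bytes, big-endian
    n >>> 24,
    (n >>> 16) & 0xff,
    (n >>> 8) & 0xff,
    n & 0xff,
  );
}

console.log(Array.from(encodeUint(2)));     // [2]
console.log(Array.from(encodeUint(4999)));  // [205, 19, 135]
console.log(Array.from(encodeUint(70000))); // [206, 0, 1, 17, 112]
```

The `amount` value 4999 from the running example takes 3 bytes here versus 4 bytes as JSON text; the saving per number is small, and most of MessagePack's size advantage on typical objects comes from dropping quotes, colons, commas, and braces.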
```typescript
// npm install @msgpack/msgpack
import { encode, decode } from "@msgpack/msgpack";

const order = {
  id: "0195d3a2-f8c0-7b4e-8f32-1a2b3c4d5e6f",
  customerId: "cust_01234",
  status: "pending",
  amount: 4999,
  currency: "EUR",
  // @msgpack/msgpack serializes Date via the timestamp extension type
  createdAt: new Date("2026-04-23T09:12:00Z"),
  items: [{ sku: "WIDGET-001", quantity: 2, unitPrice: 2499 }],
};

// Serialize: returns Uint8Array
const packed = encode(order);
// ~160 bytes vs ~280 bytes JSON (for this example; actual savings depend on key lengths)
console.log(packed.byteLength);

// Deserialize
const unpacked = decode(packed) as typeof order;
```

MessagePack has a native binary type plus extension types, a mechanism for encoding values that JSON cannot represent natively, such as timestamps and custom types. The spec predefines a timestamp extension (type -1); whether that maps to a native date object is up to each library (@msgpack/msgpack maps it to Date). This solves one of JSON's most annoying limitations without requiring a separate schema system.

```typescript
// Custom extension type for Decimal values (useful for financial data)
import { encode, decode, ExtensionCodec } from "@msgpack/msgpack";
import Decimal from "decimal.js";

const extensionCodec = new ExtensionCodec();

extensionCodec.register({
  type: 1, // application-defined extension type ID
  encode: (input: unknown): Uint8Array | null => {
    if (input instanceof Decimal) {
      return new TextEncoder().encode(input.toString());
    }
    return null; // not a Decimal: let the default encoder handle it
  },
  decode: (data: Uint8Array): Decimal => {
    return new Decimal(new TextDecoder().decode(data));
  },
});

const payload = { amount: new Decimal("49.99"), currency: "EUR" };
const packed = encode(payload, { extensionCodec });
const unpacked = decode(packed, { extensionCodec }) as typeof payload;
// unpacked.amount is a Decimal instance, not a float
```

Pros: No schema required, so it is a drop-in replacement for JSON in most internal APIs. Roughly 30–50% smaller payloads than JSON for typical objects. Faster parsing than JSON. Native binary data, plus extension types for timestamps and custom types.
Cons: Not human-readable. Less universal than JSON: requires a MessagePack library on both sides. Smaller ecosystem than JSON Schema for validation. Still slower than Protobuf (no schema means no precomputed field layout).

Use it when: you need something faster and smaller than JSON without the operational overhead of Protobuf, for example internal caching layers, WebSocket message frames, session storage, or any internal API where you control both ends.

Avoid it when: external consumers need to inspect payloads, or you need strict schema validation and backward-compatibility guarantees (use Protobuf for that).

| Property | JSON | YAML | TOML | CSV | Protobuf | MessagePack |
|---|---|---|---|---|---|---|
| Human-readable | Yes | Yes | Yes | Yes | No | No |
| Comments | No | Yes | Yes | No | Yes (.proto) | No |
| Native types | Limited | Limited | Rich | None | Rich | Rich |
| Schema required | No | No | No | No | Yes | No |
| Relative payload size | 100% | ~115% | ~110% | ~70%* | ~30% | ~55% |
| Parse speed (relative) | 1x | ~0.5x | ~0.8x | ~1.2x | ~5–10x | ~3–5x |
| Nesting support | Yes | Yes | Limited | No | Yes | Yes |
| Binary data | No† | No† | No | No | Yes | Yes |
| Ecosystem maturity | Max | High | Medium | Max | High | Medium |

* CSV size advantage applies only to flat tabular data; nested data cannot be represented at all.
† JSON and YAML can represent binary as base64-encoded strings, increasing size by ~33%.

Before picking a format, answer these questions:

1. Who reads the output?

If a human reads it directly (developer, operations engineer, finance team), you need a text format. JSON for API payloads, YAML or TOML for configs, CSV for reports and exports. If it's machine-to-machine only, binary formats are worth considering.

2. Do both sides share a schema definition?

If yes, and if performance matters, Protobuf is worth the setup cost. The .proto file becomes a contract that both sides compile against, so changes are caught at build time, not in production. If no, JSON or MessagePack are more pragmatic. MessagePack if wire size matters; JSON if debuggability matters more.

3. Is the data hierarchical or flat?
Flat tabular data with a fixed set of columns: CSV. Any nesting at all: everything else.

4. What does the team already know?

A team that has never used Protobuf will spend a week on tooling setup and schema management before writing any business logic. That cost is real. For a two-person startup, JSON everywhere plus MessagePack for the one hot path is a better tradeoff than Protobuf across the board.

5. What are the performance constraints?

If parsing JSON is not showing up in your profiler, you do not have a JSON performance problem. Profile first, optimize second. I've seen teams add Protobuf to internal services handling 100 requests a day because they read a benchmark post.

6. Is the config user-editable?

If yes: TOML for application config (especially in Rust or Python projects), YAML for infrastructure config (GitHub Actions, Docker, Kubernetes). Both support comments. Neither is appropriate as a data interchange format.

| Situation | Format |
|---|---|
| REST API, external consumers | JSON |
| Internal high-throughput API, shared schema | Protobuf |
| Internal API, no schema file wanted | MessagePack |
| CI/CD pipeline, Kubernetes, Docker Compose | YAML |
| Application config, developer-editable | TOML |
| Export to Excel / non-developer stakeholders | CSV |
| Database dump for data pipeline | CSV or JSON (newline-delimited) |
| WebSocket message frames at scale | MessagePack |
| Cache storage (Redis, Memcached) | MessagePack |

The mistake I see most often is defaulting to JSON for everything and then treating the performance or size problem as a framework problem. It's usually a format problem.

The second most common mistake is introducing Protobuf prematurely, before the team has the tooling discipline to manage .proto files across multiple services.

Use JSON as your default. Add MessagePack when you measure a real parsing bottleneck or a real payload size constraint. Add Protobuf when you have multiple services that need a strict contract and the engineering time to maintain it. Use YAML for infrastructure config.
Use TOML for application config. Use CSV for anything that ends up in a spreadsheet.

The format you choose is the one both sides of your system have to live with. Pick boring over clever until boring stops working.

Format decisions compound. In the eu-vat-rates-data open-source project, I publish the same dataset to five ecosystems, and each one expects data in a different idiomatic form. Getting that right up front meant zero cross-registry bugs. If you need API integrations that bridge format boundaries cleanly, or a technical consultation on a data architecture decision, get in touch.

Further reading:

- RFC 4627: The application/json Media Type (since superseded by RFC 8259)
- RFC 4180: Common Format for CSV Files
- TOML specification
- Protocol Buffers documentation
- MessagePack specification
- @msgpack/msgpack npm package
- buf: modern Protobuf toolchain
- One Package, Five Registries: How I Maintain eu-vat-rates-data (how TOML, JSON, and Python packaging work together in a multi-ecosystem project)
- UUID v7 in Production: Why Your Database Hates v4 (identifier format decisions for database-heavy systems)
