Format Overview
TOON syntax reference with concrete examples. See Getting Started for an introduction.
Data Model
TOON models data the same way as JSON:
- Primitives: strings, numbers, booleans, and null
- Objects: mappings from string keys to values
- Arrays: ordered sequences of values
Root Forms
A TOON document can represent different root forms:
- Root object (most common): Fields appear at depth 0 with no parent key
- Root array: Begins with [N]: or [N]{fields}: at depth 0
- Root primitive: A single primitive value (string, number, boolean, or null)
Most examples in these docs use root objects, but the format supports all three forms equally (spec §5).
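As a rough illustration of how the three root forms can be told apart, the sketch below classifies a document by its first non-empty line. `detectRootForm` is a hypothetical helper, not part of the library, and the heuristics (a header regex for arrays, a key-colon regex for objects) are simplifying assumptions rather than the spec's exact rules.

```typescript
type RootForm = 'object' | 'array' | 'primitive' | 'empty-object'

// Hypothetical helper: guesses the root form from the document's lines.
function detectRootForm(doc: string): RootForm {
  const lines = doc.split('\n').filter((line) => line.trim() !== '')
  // An empty document stands for an empty root object.
  if (lines.length === 0) return 'empty-object'
  const first = lines[0]
  // Root arrays start with [N]: or [N]{fields}: at depth 0 (or [] when empty).
  if (first === '[]' || /^\[\d+[\t|]?\](?:\{[^}]*\})?:/.test(first)) return 'array'
  // A single line that is not `key: ...` is treated as a root primitive.
  const looksLikeField = /^(?:"(?:[^"\\]|\\.)*"|[^:"]+):/.test(first)
  if (lines.length === 1 && !looksLikeField) return 'primitive'
  return 'object'
}

console.log(detectRootForm('id: 123\nname: Ada'))
console.log(detectRootForm('[3]: a,b,c'))
```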
Objects
Simple Objects
Objects with primitive values use key: value syntax, with one field per line:
id: 123
name: Ada
active: true
Indentation replaces braces. One space follows the colon.
Nested Objects
Nested objects add one indentation level (default: 2 spaces):
user:
  id: 123
  name: Ada
When a key ends with : and has no value on the same line, it opens a nested object. All lines at the next indentation level belong to that object.
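To make the indentation rule concrete, here is a minimal sketch of how an encoder might emit nested objects. `encodeObject` is a hypothetical helper, not the `@toon-format/toon` API, and it assumes values are numbers, booleans, null, plain objects, or strings that never need quoting; arrays are not handled.

```typescript
type Primitive = string | number | boolean | null
type Nested = { [key: string]: Primitive | Nested }

// Hypothetical helper: emits one `key: value` line per field and indents
// the children of nested objects by `indent` spaces per level.
function encodeObject(obj: Nested, depth = 0, indent = 2): string[] {
  const pad = ' '.repeat(depth * indent)
  const lines: string[] = []
  for (const [key, value] of Object.entries(obj)) {
    if (value !== null && typeof value === 'object') {
      // A key with no value on its line opens a nested object.
      lines.push(`${pad}${key}:`)
      lines.push(...encodeObject(value, depth + 1, indent))
    } else {
      // Assumption: strings here never need quoting.
      lines.push(`${pad}${key}: ${String(value)}`)
    }
  }
  return lines
}

const nestedLines = encodeObject({ user: { id: 123, name: 'Ada' } })
console.log(nestedLines.join('\n'))
```

An empty nested object produces only the `key:` line, and an empty root object produces no lines at all.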
Empty Objects
An empty object at the root yields an empty document (no lines). A nested empty object is key: alone, with no children.
Arrays
TOON detects array structure and chooses the most efficient representation. Arrays always declare their length in brackets: [N].
Primitive Arrays (Inline)
Arrays of primitives (strings, numbers, booleans, null) are rendered inline:
tags[3]: admin,ops,dev
The delimiter (comma by default) separates values. Strings containing the active delimiter must be quoted.
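The inline form can be sketched in a few lines. `encodeInlineArray` is a hypothetical helper that applies only the delimiter rule described here (plus escaping of backslashes and quotes inside quoted values); the remaining quoting rules are deliberately left out.

```typescript
type Prim = string | number | boolean | null

// Hypothetical helper: renders `key[N]: v1,v2,...` for an array of primitives.
function encodeInlineArray(key: string, values: Prim[], delimiter = ','): string {
  const cells = values.map((value) => {
    if (typeof value !== 'string') return String(value)
    if (!value.includes(delimiter)) return value
    // Strings containing the active delimiter are quoted and escaped.
    const escaped = value.replace(/\\/g, '\\\\').replace(/"/g, '\\"')
    return `"${escaped}"`
  })
  // Non-default delimiters are declared inside the length brackets.
  const marker = delimiter === ',' ? '' : delimiter
  return `${key}[${values.length}${marker}]: ${cells.join(delimiter)}`
}

console.log(encodeInlineArray('tags', ['admin', 'ops', 'dev']))
```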
Arrays of Objects (Tabular)
When all objects in an array share the same set of primitive-valued keys, TOON uses tabular format:
items[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.5

users[2]{id,name,role}:
  1,Alice Admin,admin
  2,"Bob Smith",user
The header items[2]{sku,qty,price}: declares:
- Array length: [2] means 2 rows
- Field names: {sku,qty,price} defines the columns
- Active delimiter: comma (default)
Each row contains values in the same order as the field list. Values are encoded as primitives (strings, numbers, booleans, null) and separated by the delimiter.
NOTE
Tabular format requires identical field sets across all objects (same keys, order per object may vary), primitive values only (no nested arrays/objects), and at least one key per object. Arrays that contain an empty {} element fall back to the expanded list form below.
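The eligibility rules in this note can be expressed as a small predicate. `isTabular` below is a hypothetical check written for illustration, not the encoder's actual logic; treating an empty array as non-tabular is an assumption of this sketch.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json }

function isPrimitive(value: Json): boolean {
  return value === null || typeof value !== 'object'
}

// Hypothetical check: true when every element is an object with the same
// non-empty key set (in any order) and only primitive values.
function isTabular(items: Json[]): boolean {
  if (items.length === 0) return false
  let expected: string | null = null
  for (const item of items) {
    if (item === null || typeof item !== 'object' || Array.isArray(item)) return false
    const keys = Object.keys(item)
    if (keys.length === 0) return false // an empty {} forces list format
    if (!Object.values(item).every(isPrimitive)) return false
    // Sorting makes the comparison independent of key order.
    const signature = JSON.stringify(keys.slice().sort())
    if (expected === null) expected = signature
    else if (signature !== expected) return false
  }
  return true
}

console.log(isTabular([{ sku: 'A1', qty: 2 }, { qty: 1, sku: 'B2' }]))
```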
Mixed and Non-Uniform Arrays
Arrays that don't meet the tabular requirements use list format with hyphen markers:
items[3]:
  - 1
  - a: 1
  - text
Each element starts with - at one indentation level deeper than the parent array header.
Objects as List Items
When an array element is an object, it appears as a list item:
items[2]:
  - id: 1
    name: First
  - id: 2
    name: Second
    extra: true
When a tabular array is the first field of a list-item object, the tabular header appears on the hyphen line, with rows indented two levels deeper and other fields indented one level deeper:
items[1]:
  - users[2]{id,name}:
      1,Ada
      2,Bob
    status: active
When the object has only a single tabular field, the same pattern applies:
items[1]:
  - users[2]{id,name}:
      1,Ada
      2,Bob
This is the canonical encoding for list-item objects whose first field is a tabular array.
Arrays of Arrays
When you have arrays containing primitive inner arrays:
pairs[2]:
  - [2]: 1,2
  - [2]: 3,4
Each inner array gets its own header on the list-item line.
When the inner arrays are themselves arrays of objects or non-uniform arrays, the same - [N]: header appears on the hyphen line and the nested items follow one indent deeper:
items[3]:
  - summary
  - id: 1
    name: Ada
  - [2]:
    - id: 2
    - status: draft
Empty Arrays
Empty arrays render as key: [] for fields and [] at the root:
items: []
The legacy items[0]: form is still decoded for backward compatibility.
Array Headers
Header Syntax
Array headers follow this pattern:
key[N<delimiter?>]<{fields}>:
Where:
- N is the non-negative integer length
- delimiter (optional) explicitly declares the active delimiter:
  - Absent → comma (,)
  - \t (tab character) → tab delimiter
  - | → pipe delimiter
- fields (optional) for tabular arrays: {field1,field2,field3}
NOTE
The array length [N] helps LLMs validate structure. If you ask a model to generate TOON output, explicit lengths let you detect truncation or malformed data.
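A header of this shape can be picked apart with a single regular expression. The sketch below is a hypothetical parser that handles unquoted keys and field names only; `parseArrayHeader` and the `ArrayHeader` shape are illustrative names, not library exports.

```typescript
interface ArrayHeader {
  key: string // empty for a root array
  length: number
  delimiter: string
  fields: string[] | null // null when the header has no {fields} part
  rest: string // inline values after the colon, if any
}

// Hypothetical parser for key[N<delimiter?>]<{fields}>: headers.
function parseArrayHeader(line: string): ArrayHeader | null {
  const pattern = /^([^\[\]{}:"]*)\[(\d+)([\t|]?)\](?:\{([^}]*)\})?:(.*)$/
  const match = pattern.exec(line.trim())
  if (match === null) return null
  // An absent delimiter marker means comma.
  const delimiter = match[3] === '' ? ',' : match[3]
  return {
    key: match[1],
    length: Number(match[2]),
    delimiter,
    fields: match[4] === undefined ? null : match[4].split(delimiter),
    rest: match[5].trim(),
  }
}

console.log(parseArrayHeader('items[2|]{sku|qty}:'))
```

Comparing the parsed `length` against the number of rows or values actually present is one way to detect the truncation mentioned in the note.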
Delimiter Options
TOON supports three delimiters: comma (default), tab, and pipe. The delimiter is scoped to the array header that declares it.
Comma (default):
items[2]{sku,name,qty,price}:
  A1,Widget,2,9.99
  B2,Gadget,1,14.5
Tab:
items[2	]{sku	name	qty	price}:
  A1	Widget	2	9.99
  B2	Gadget	1	14.5
Pipe:
items[2|]{sku|name|qty|price}:
  A1|Widget|2|9.99
  B2|Gadget|1|14.5
Tab and pipe delimiters are explicitly encoded in the header brackets and field braces. Inside an array scope, only the active delimiter triggers quoting; the others are literal data. Object field values (key: value) follow the document delimiter (§11.1) regardless of any surrounding array's active delimiter.
TIP
Tab delimiters often tokenize more efficiently than commas, especially for data with few quoted strings. Use encode(data, { delimiter: '\t' }) for additional token savings.
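On the decoding side, delimiter scoping means a row is split only on the delimiter its header declared. The following sketch shows one way to do that while respecting quoted cells. `splitRow` is a hypothetical helper; it leaves quotes and escapes in place rather than decoding them.

```typescript
// Hypothetical splitter: splits one row on the active delimiter,
// ignoring delimiters that appear inside double-quoted cells.
function splitRow(row: string, delimiter: string): string[] {
  const cells: string[] = []
  let current = ''
  let inQuotes = false
  for (let i = 0; i < row.length; i++) {
    const ch = row[i]
    if (inQuotes && ch === '\\' && i + 1 < row.length) {
      // Keep an escape pair such as \" intact so it cannot close the string.
      current += ch + row[i + 1]
      i++
    } else if (ch === '"') {
      inQuotes = !inQuotes
      current += ch
    } else if (ch === delimiter && !inQuotes) {
      cells.push(current)
      current = ''
    } else {
      current += ch
    }
  }
  cells.push(current)
  return cells
}

console.log(splitRow('A1|Widget, large|2', '|'))
```

With the pipe delimiter active, the comma in `Widget, large` is plain data and needs no quoting.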
Key Folding (Optional)
Key folding is an optional encoder feature (since spec v1.5) that collapses chains of single-key objects into dotted paths, reducing tokens for deeply nested data.
Basic Folding
Standard nesting:
data:
  metadata:
    items[2]: a,b
With key folding (keyFolding: 'safe'):
data.metadata.items[2]: a,b
The three nested keys collapse into a single dotted key, data.metadata.items.
When Folding Applies
A chain of objects is foldable when:
- Each object in the chain has exactly one key (leading to the next object or a leaf value)
- The leaf value is a primitive, array, or empty object
- All segments are valid identifier segments (letters, digits, underscores only; no dots within segments)
- The resulting folded key doesn't collide with existing keys
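The four conditions above can be sketched as a single function. `foldKey` is a hypothetical helper, not the encoder's implementation: it walks a chain of single-key objects, then folds only if the leaf, the segments, and the sibling keys all pass the checks listed here. The `siblings` parameter is an assumption of this sketch for passing in the other keys at the same depth.

```typescript
type Value = string | number | boolean | null | Value[] | { [key: string]: Value }

const SEGMENT = /^[A-Za-z_][A-Za-z0-9_]*$/

function isPlainObject(v: Value): v is { [key: string]: Value } {
  return v !== null && typeof v === 'object' && !Array.isArray(v)
}

// Hypothetical helper: returns the folded key and leaf value when the chain
// is foldable, or the original key and value otherwise.
function foldKey(key: string, value: Value, siblings: string[] = []): { key: string; value: Value } {
  const segments = [key]
  let current: Value = value
  // Follow the chain while each object has exactly one key.
  while (isPlainObject(current) && Object.keys(current).length === 1) {
    const next = Object.keys(current)[0]
    segments.push(next)
    current = current[next]
  }
  // The leaf must be a primitive, an array, or an empty object.
  const leafOk = !isPlainObject(current) || Object.keys(current).length === 0
  const folded = segments.join('.')
  const foldable =
    segments.length > 1 &&
    leafOk &&
    segments.every((segment) => SEGMENT.test(segment)) &&
    !siblings.includes(folded) // collision avoidance
  return foldable ? { key: folded, value: current } : { key, value }
}

console.log(foldKey('data', { metadata: { items: ['a', 'b'] } }).key)
```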
Advanced Folding Rules
Segment Requirements (safe mode):
- All folded segments must match ^[A-Za-z_][A-Za-z0-9_]*$ (no dots, hyphens, or other special characters)
- No segment may require quoting per §7.3 of the spec
- The resulting folded key must not equal any existing sibling literal key at the same depth (collision avoidance)
Depth Limit:
- The flattenDepth option (default: Infinity) controls how many segments to fold
- flattenDepth: 2 folds only two-segment chains: {a: {b: val}} → a.b: val
- Values less than 2 have no practical effect
Round-Trip with Path Expansion: To reconstruct the original structure when decoding, use expandPaths: 'safe'. This splits dotted keys back into nested objects using the same safety rules (spec §13.4).
Round-Trip with Path Expansion
When decoding TOON that used key folding, enable path expansion to restore the nested structure:
import { decode, encode } from '@toon-format/toon'
const original = { data: { metadata: { items: ['a', 'b'] } } }
// Encode with folding
const toon = encode(original, { keyFolding: 'safe' })
// → "data.metadata.items[2]: a,b"
// Decode with expansion
const restored = decode(toon, { expandPaths: 'safe' })
// → { data: { metadata: { items: ['a', 'b'] } } }
Path expansion is off by default, so dotted keys are treated as literal keys unless explicitly enabled.
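To show what expansion does conceptually, here is a minimal sketch that turns dotted keys of an already-decoded flat object back into nested objects. `expandDottedKeys` is a hypothetical helper, not the library's `expandPaths` implementation; it assumes the identifier-segment rule from the folding section, and on conflicts it simply lets the later write win, which is a simplification of whatever the spec prescribes.

```typescript
type Tree = { [key: string]: unknown }

const SAFE_SEGMENT = /^[A-Za-z_][A-Za-z0-9_]*$/

// Hypothetical helper: splits dotted keys into nested objects when every
// segment is a safe identifier; other keys are kept as literal keys.
function expandDottedKeys(flat: Tree): Tree {
  const root: Tree = {}
  for (const [key, value] of Object.entries(flat)) {
    const segments = key.split('.')
    const safe = segments.every((s) => SAFE_SEGMENT.test(s) && s !== '__proto__')
    if (!safe) {
      root[key] = value
      continue
    }
    let node = root
    for (const segment of segments.slice(0, -1)) {
      const existing = node[segment]
      if (typeof existing !== 'object' || existing === null || Array.isArray(existing)) {
        node[segment] = {} // simplification: replaces any non-object in the way
      }
      node = node[segment] as Tree
    }
    node[segments[segments.length - 1]] = value
  }
  return root
}

console.log(JSON.stringify(expandDottedKeys({ 'data.metadata.items': ['a', 'b'] })))
```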
Quoting and Types
When Strings Need Quotes
TOON quotes strings only when necessary to maximize token efficiency. A string must be quoted if:
- It's empty ("")
- It has leading or trailing whitespace
- It equals true, false, or null (case-sensitive)
- It looks like a number (e.g., "42", "-3.14", "1e-6", "05")
- It contains special characters: colon (:), quote ("), backslash (\), brackets, braces, or any control character in U+0000–U+001F
- It contains the relevant delimiter (the active delimiter inside an array scope, or the document delimiter elsewhere)
- It equals "-" or starts with "-" followed by any character
Otherwise, strings can be unquoted. Unicode, emoji, and strings with internal (non-leading/trailing) spaces are safe unquoted:
message: Hello 世界 👋
note: This has inner spaces
Escape Sequences
In quoted strings and keys, six escape sequences are valid:
| Character | Escape |
|---|---|
| Backslash (\) | \\ |
| Double quote (") | \" |
| Newline (U+000A) | \n |
| Carriage return (U+000D) | \r |
| Tab (U+0009) | \t |
| Any other U+0000–U+001F control character | \uXXXX |
Other escapes (e.g., \x, \0, \b) are always rejected, as are lone-surrogate \uXXXX values (U+D800–U+DFFF).
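The quoting rules and the escape table can be combined into a short sketch. All three functions below are hypothetical helpers written for illustration, not the library's implementation. Two simplifications are assumed: the numeric check is a plain regex, and `unescapeString` rejects every surrogate `\uXXXX` value, paired or not, which is stricter than the lone-surrogate rule.

```typescript
// Hypothetical check mirroring the quoting rules above.
function needsQuotes(value: string, delimiter = ','): boolean {
  return (
    value === '' ||
    value !== value.trim() ||
    value === 'true' || value === 'false' || value === 'null' ||
    /^-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?$/.test(value) || // looks like a number
    /[:"\\\[\]{}\u0000-\u001f]/.test(value) ||
    value.includes(delimiter) ||
    value.startsWith('-')
  )
}

// Applies the six escapes from the table.
function escapeString(value: string): string {
  return value.replace(/[\\"\u0000-\u001f]/g, (ch) => {
    switch (ch) {
      case '\\': return '\\\\'
      case '"': return '\\"'
      case '\n': return '\\n'
      case '\r': return '\\r'
      case '\t': return '\\t'
      default: return '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')
    }
  })
}

function quoteIfNeeded(value: string, delimiter = ','): string {
  return needsQuotes(value, delimiter) ? `"${escapeString(value)}"` : value
}

const SIMPLE_ESCAPES = new Map([['\\', '\\'], ['"', '"'], ['n', '\n'], ['r', '\r'], ['t', '\t']])

// Reverses escapeString and throws on any escape outside the table.
function unescapeString(text: string): string {
  let out = ''
  for (let i = 0; i < text.length; i++) {
    if (text[i] !== '\\') {
      out += text[i]
      continue
    }
    const simple = SIMPLE_ESCAPES.get(text[i + 1])
    const hex = text.slice(i + 2, i + 6)
    if (simple !== undefined) {
      out += simple
      i += 1
    } else if (text[i + 1] === 'u' && /^[0-9a-fA-F]{4}$/.test(hex)) {
      const code = parseInt(hex, 16)
      // Simplification: any surrogate code unit is rejected.
      if (code >= 0xd800 && code <= 0xdfff) throw new Error('surrogate escape rejected')
      out += String.fromCharCode(code)
      i += 5
    } else {
      throw new Error(`invalid escape at index ${i}`)
    }
  }
  return out
}

console.log(quoteIfNeeded('a,b'), quoteIfNeeded('Ada'))
```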
Type Conversions
Numbers are emitted in canonical decimal form for values in the §2 carve-out range; exponent notation is permitted outside that range. Non-JSON types (NaN, Infinity, BigInt, Date, Set, Map, undefined, etc.) are normalized before encoding; see Type Normalization in the API Reference for the full mapping.
Decoders accept both decimal and exponent forms on input (e.g., 42, -3.14, 1e-6), and treat tokens with forbidden leading zeros (e.g., "05") as strings, not numbers.
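The decoding rule for unquoted tokens can be illustrated with a small classifier. `parseToken` is a hypothetical helper, and its number regex is an assumption that approximates the accepted forms (optional sign, no leading zeros in the integer part, optional fraction and exponent) rather than quoting the spec's grammar.

```typescript
// Hypothetical classifier for a single unquoted token.
function parseToken(token: string): string | number | boolean | null {
  if (token === 'true') return true
  if (token === 'false') return false
  if (token === 'null') return null
  // Decimal or exponent form; a leading zero such as "05" does not match.
  if (/^-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?$/.test(token)) return Number(token)
  // Anything else, including "05", stays a string.
  return token
}

console.log(parseToken('42'), parseToken('1e-6'), parseToken('05'))
```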
Custom Serialization with toJSON
Objects with a toJSON() method are serialized by calling the method and normalizing its result before encoding, similar to JSON.stringify:
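import { encode } from '@toon-format/toon'
// toJSON() decides what gets encoded, as with JSON.stringify
const obj = {
  data: 'example',
  toJSON() {
    return { info: this.data }
  }
}
encode(obj)
// info: example
The toJSON() method: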
- Takes precedence over built-in normalization (Date, Array, Set, Map)
- Has its result recursively normalized
- Is called for any object with toJSON in its prototype chain
For complete rules on quoting, escaping, type conversions, and strict-mode decoding, see spec §2–4 (data model), §7 (strings and keys), and §14 (strict mode).