Advanced Topics: Performance Comparison and Optimization
In the realm of AI, particularly with Large Language Models (LLMs), “performance” isn’t just about speed; it’s crucially about token efficiency and accuracy. Every token processed by an LLM incurs a cost (monetary and computational) and consumes context window space. This chapter provides a detailed comparison of JSON and TOON’s performance, analyzes real-world benchmarks, and offers advanced strategies for optimizing your AI data pipelines.
6.1 Token Cost Analysis: The Hidden Expense
The fundamental driver for TOON’s existence is the token cost associated with LLM interactions. LLMs tokenize input text into discrete units, and their pricing models are typically based on the number of input and output tokens.
- JSON’s Verbosity: JSON, designed for human readability and machine parsing, includes many characters that serve structural purposes (curly braces `{}`, square brackets `[]`, double quotes `""`, commas `,`, colons `:`) but convey minimal semantic information to an LLM once the pattern is established. When these characters are repeated across many data points (e.g., in an array of objects), they quickly inflate the token count.
- TOON’s Conciseness: TOON specifically targets this verbosity. By using:
- Indentation instead of braces.
- Declaring array field names once.
- Smart quoting (only when necessary).
- Explicit array lengths.

Together, these choices significantly reduce the number of characters required to represent the same structured data, which translates directly into fewer tokens.
Example Comparison (revisiting):
Consider a list of two users.
JSON (compacted):
{"users":[{"id":1,"name":"Alice","role":"admin"},{"id":2,"name":"Bob","role":"user"}]}
This is around 49 tokens with a common tokenizer.
TOON:
users[2]{id,name,role}:
1,Alice,admin
2,Bob,user
This is around 26 tokens.
Savings: ~47% fewer tokens for the same data. This is the core economic argument for TOON.
6.2 Benchmarks: Real-World Token Savings
Numerous benchmarks confirm TOON’s token efficiency. The savings vary depending on the data structure:
- Uniform Tabular Data (TOON’s Sweet Spot): For arrays of objects where all objects have identical, primitive-valued fields (e.g., user records, product catalogs, time-series data), TOON consistently achieves 30-60% token reduction compared to compact JSON. This is due to declaring keys only once and streaming data as CSV-like rows.
- Small Datasets (<10 items): Savings might be lower (e.g., 20-30%) because the overhead of TOON’s structural elements (like field headers and array lengths) is amortized over fewer data points.
- Deeply Nested Objects: For data with many levels of nesting, TOON’s indentation-based structure replaces JSON’s braces. While it still saves some tokens, the percentage reduction might be less dramatic (10-20%) compared to tabular data, and compact JSON might sometimes even be more efficient for very deep, non-uniform nesting.
- Mixed/Non-Uniform Arrays: When an array contains objects with varying keys or nested structures, TOON falls back to a list-like format using hyphens. Savings are typically minimal (0-10%) compared to compact JSON because much of the structural information still needs to be explicit.
Key Insights from Benchmarks:
- Average Savings: Across diverse datasets, average token savings often hover around 40-50% when TOON is used appropriately.
- Impact on Cost: For applications making millions of LLM API calls, these percentage savings translate to substantial monetary savings, potentially tens or hundreds of thousands of dollars annually.
- Impact on Context Window: Fewer tokens mean more room in the LLM’s context window, allowing for larger datasets, longer conversation histories, or more complex instructions within a single prompt.
6.3 Accuracy and Reliability with LLMs
A surprising finding from TOON benchmarks is that it often improves LLM comprehension and data retrieval accuracy compared to JSON. This might seem counter-intuitive as JSON is considered “structured,” but there are several reasons:
- Explicit Structural Cues: TOON’s explicit array lengths (e.g., `[N]`) and clear field headers (`{field1,field2}:`) provide stronger “guardrails” for LLMs. The model knows exactly how many items to expect and what fields each item should have. This reduces ambiguity and the chance of hallucinating or misinterpreting the structure.
- Reduced Syntactic Noise: By removing redundant punctuation, TOON presents a cleaner, more focused representation of the actual data. This can make it easier for LLMs to extract the relevant information without getting distracted by unnecessary characters.
- Pattern Recognition: The tabular format in TOON, resembling CSV, might align better with how LLMs process structured sequences, enabling more efficient pattern recognition for data extraction.
Benchmark Results for Accuracy (Illustrative):
- Retrieval Accuracy: TOON often shows a 3-7% improvement in data retrieval accuracy compared to JSON across various models and datasets.
- Model-Specific: Performance varies by LLM. Some models (e.g., GPT-5 Nano) show significant accuracy boosts with TOON, while others might show equal accuracy but with substantial token savings.
- Validation: TOON’s structural guarantees (like explicit lengths) can lead to higher accuracy in validation tasks compared to JSON where structural errors might be less immediately obvious to an LLM.
6.4 Optimization Strategies
Beyond simply converting JSON to TOON, here are advanced strategies for optimizing data transfer and processing for AI:
6.4.1 Pre-processing and Filtering Data
Before converting to TOON (or even sending JSON), meticulously curate your data:
- Remove Irrelevant Fields: Only include the data absolutely necessary for the LLM’s task. Every field, even if null, consumes tokens.
- Simplify Complex Structures: If a nested object or array isn’t critical for the LLM’s reasoning, consider flattening it or summarizing it into a single string.
- Aggregate Data: Instead of sending raw, granular data, pre-aggregate it to higher-level summaries if that meets the LLM’s needs.
- Filter Rows/Items: Send only the most relevant subset of data, especially for large lists. Techniques like vector search or keyword matching can help identify top-k relevant items.
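The pruning steps above can be sketched in a few lines. This is a minimal field-projection example using only the standard library; the `users` records and the choice of `id`, `name`, and `role` as the "needed" fields are hypothetical stand-ins for your own data:

```python
import json

def project_fields(items, keep):
    """Keep only the fields the LLM actually needs for its task."""
    return [{k: item[k] for k in keep if k in item} for item in items]

users = [
    {"id": 1, "name": "Alice", "role": "admin",
     "created_at": "2021-04-02T09:13:00Z", "avatar_url": "https://example.com/a.png"},
    {"id": 2, "name": "Bob", "role": "user",
     "created_at": "2022-11-19T17:45:00Z", "avatar_url": "https://example.com/b.png"},
]

# Hypothetical task: only id, name, and role are relevant.
trimmed = project_fields(users, keep=["id", "name", "role"])

full = json.dumps(users, separators=(",", ":"))
lean = json.dumps(trimmed, separators=(",", ":"))
print(f"{len(full)} chars -> {len(lean)} chars before tokenization")
```

Dropping the two unused fields shrinks the payload before any format conversion happens; the same projection pays off whether you then emit JSON or TOON.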
6.4.2 Batching and Chunking
- Batching: If you have multiple independent tasks for an LLM that involve similar data structures, consider batching them into a single, larger TOON request (if your context window allows). This amortizes the prompt overhead.
- Chunking: For extremely large datasets that exceed the LLM’s context window, you’ll need to chunk them. Each chunk can be converted to TOON and sent separately. Be mindful of maintaining context across chunks.
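A chunking pass can be sketched as a greedy packer that fills each chunk up to a token budget. The chars-divided-by-four estimate below is a deliberately crude assumption; swap in your model's real tokenizer for production use:

```python
def chunk_by_budget(rows, encode_row, token_budget):
    """Greedily pack rows into chunks whose estimated token cost stays
    under token_budget. encode_row maps one row to its serialized text.
    NOTE: the chars/4 estimate is a crude stand-in for a real tokenizer."""
    estimate = lambda text: max(1, len(text) // 4)
    chunks, current, used = [], [], 0
    for row in rows:
        cost = estimate(encode_row(row))
        if current and used + cost > token_budget:
            chunks.append(current)   # close the full chunk
            current, used = [], 0
        current.append(row)
        used += cost
    if current:
        chunks.append(current)
    return chunks

rows = [{"id": i, "name": f"user-{i}"} for i in range(100)]
chunks = chunk_by_budget(rows, encode_row=str, token_budget=50)
print(len(chunks), "chunks")
```

Each chunk can then be converted to TOON and sent as its own request; remember to carry any shared context (task instructions, running summaries) into every chunk's prompt.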
6.4.3 Choosing the Right Format Dynamically
Develop logic in your application to dynamically choose between JSON (compacted) and TOON based on the data’s characteristics:
- Tabular Eligibility Check: If your data is a uniform array of primitive objects, TOON is almost always superior.
- Nesting Depth and Uniformity: For deeply nested, highly non-uniform data, benchmark compact JSON against TOON. There might be cases where compact JSON is better, or the difference is negligible.
- Hybrid Data: For data containing a mix of tabular and non-tabular parts, you might convert the tabular parts to TOON and keep other parts as JSON, or vice-versa, embedded within a larger structure. (This requires LLM guidance on parsing hybrid inputs).
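The eligibility check above can be sketched as a small router. This is one possible heuristic, not a definitive rule; the primitive-type list and the routing policy are assumptions you should tune against your own benchmarks:

```python
def is_uniform_tabular(items):
    """True when items is a non-empty list of dicts sharing identical
    keys with primitive (non-container) values -- TOON's sweet spot."""
    if not items or not all(isinstance(it, dict) for it in items):
        return False
    keys = set(items[0])
    primitive = (str, int, float, bool, type(None))
    return all(
        set(it) == keys and all(isinstance(v, primitive) for v in it.values())
        for it in items
    )

def choose_format(items):
    """Route uniform tabular data to TOON, everything else to compact JSON."""
    return "toon" if is_uniform_tabular(items) else "json"
```

For example, `choose_format([{"id": 1, "name": "A"}, {"id": 2, "name": "B"}])` returns `"toon"`, while a list whose items have mismatched keys or nested values falls back to `"json"`.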
6.4.4 LLM-Specific Tokenizer Awareness
Use the official tokenizer for your target LLM (e.g., tiktoken for OpenAI models, or corresponding tools for Claude, Gemini, Llama) to precisely measure token counts.
- Delimiter Experimentation: Test different TOON delimiters (`\t`, `|`, `,`) with your actual data and tokenizer to find the most efficient one. Some tokenizers treat `\t` as a single token while a comma-space pair is two tokens.
- Context Length Planning: Knowing the exact token count helps you manage the LLM’s context window effectively and avoid truncation.
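The delimiter experiment can be automated with a small helper. `count_tokens` here is whatever tokenizer you use in production (e.g., a tiktoken encoding); the character-count stand-in in the usage line is only for illustration:

```python
def cheapest_delimiter(rows, count_tokens, candidates=(",", "\t", "|")):
    """Render the same rows with each candidate delimiter and return the
    one your tokenizer scores cheapest. rows is a list of lists of
    already-stringified cells; count_tokens maps text -> token count."""
    def render(delim):
        return "\n".join(delim.join(cells) for cells in rows)
    return min(candidates, key=lambda d: count_tokens(render(d)))

rows = [["1", "Alice", "admin"], ["2", "Bob", "user"]]
# With a character-count stand-in, every single-char delimiter ties,
# so min() returns the first candidate:
print(cheapest_delimiter(rows, count_tokens=len))
```

With a real tokenizer the candidates frequently do not tie, which is exactly why measuring against your own data is worth the few extra lines.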
6.4.5 Instruction Tuning and Few-Shot Examples
- Explicit TOON Instructions: Clearly instruct the LLM on how to parse TOON input and how to generate TOON output.
- Few-Shot Examples: Provide one or two small, correctly formatted TOON examples in your prompt (both for input parsing and desired output generation). This greatly helps the LLM learn the expected structure and reduces formatting errors.
- Error Handling in Prompts: Instruct the LLM on how to handle missing data or validation errors when processing TOON input.
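Putting these three practices together, a prompt builder might look like the following sketch. The wording of the instructions and the `build_prompt` helper are illustrative, not a prescribed template; the few-shot example reuses the users table from earlier in this chapter:

```python
TOON_HINT = (
    "The data below is in TOON format. A header like `name[N]{f1,f2}:` "
    "declares an array of N items with fields f1 and f2; each following "
    "line is one delimiter-separated row of values in that field order."
)

FEW_SHOT = "users[2]{id,name,role}:\n1,Alice,admin\n2,Bob,user"

def build_prompt(toon_payload, question):
    """Assemble a prompt with parsing instructions, one few-shot TOON
    example, and an explicit error-handling rule."""
    return (
        f"{TOON_HINT}\n\n"
        f"Example:\n{FEW_SHOT}\n\n"
        "If a row is missing fields or the row count does not match [N], "
        "say so instead of guessing.\n\n"
        f"Data:\n{toon_payload}\n\n"
        f"Task: {question}"
    )

prompt = build_prompt(FEW_SHOT, "Which users have the admin role?")
```

The same pattern works in reverse for output: show the model one correctly formatted TOON block and ask it to reply in that exact shape.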
6.5 Example: Token Counting with Python (tiktoken)
Let’s put the token counting into practice.
import json
from toon import encode as toon_encode
import tiktoken
def count_tokens(text: str, model_name: str = "gpt-4o-mini") -> int:
    """Counts tokens using the tiktoken library for OpenAI models."""
    try:
        # Prefer the model-specific encoding; fall back to 'cl100k_base'
        # (used by gpt-4, gpt-3.5-turbo, text-embedding-ada-002, etc.)
        # if tiktoken does not recognize the model name.
        try:
            encoding = tiktoken.encoding_for_model(model_name)
        except KeyError:
            encoding = tiktoken.get_encoding("cl100k_base")
        return len(encoding.encode(text))
    except Exception as e:
        print(f"Error counting tokens: {e}")
        return -1
# Sample data: A list of products (tabular candidate)
products_data = {
"products": [
{"id": 101, "name": "Laptop Pro", "price": 1200.00, "inStock": True, "category": "Electronics"},
{"id": 102, "name": "Wireless Mouse", "price": 25.50, "inStock": True, "category": "Accessories"},
{"id": 103, "name": "USB-C Hub", "price": 49.99, "inStock": False, "category": "Accessories"},
{"id": 104, "name": "External SSD 1TB", "price": 150.00, "inStock": True, "category": "Storage"},
{"id": 105, "name": "Gaming Keyboard", "price": 99.99, "inStock": True, "category": "Peripherals"}
]
}
# --- 1. JSON (pretty-printed) ---
json_pretty = json.dumps(products_data, indent=2)
json_pretty_tokens = count_tokens(json_pretty)
print("--- JSON (Pretty) ---")
print(json_pretty)
print(f"Tokens: {json_pretty_tokens}\n")
# --- 2. JSON (compacted) ---
json_compact = json.dumps(products_data, separators=(',', ':'))
json_compact_tokens = count_tokens(json_compact)
print("--- JSON (Compact) ---")
print(json_compact)
print(f"Tokens: {json_compact_tokens}\n")
# --- 3. TOON (default comma delimiter) ---
toon_default = toon_encode(products_data)
toon_default_tokens = count_tokens(toon_default)
print("--- TOON (Default Delimiter) ---")
print(toon_default)
print(f"Tokens: {toon_default_tokens}\n")
# --- 4. TOON (tab delimiter) ---
toon_tab_delimiter = toon_encode(products_data, delimiter='\t')
toon_tab_tokens = count_tokens(toon_tab_delimiter)
print("--- TOON (Tab Delimiter) ---")
print(toon_tab_delimiter)
print(f"Tokens: {toon_tab_tokens}\n")
print("\n=== Performance Summary ===")
print(f"JSON (Pretty) : {json_pretty_tokens} tokens")
print(f"JSON (Compact) : {json_compact_tokens} tokens")
print(f"TOON (Comma) : {toon_default_tokens} tokens (Savings vs JSON Compact: {((json_compact_tokens - toon_default_tokens) / json_compact_tokens * 100):.2f}%)")
print(f"TOON (Tab) : {toon_tab_tokens} tokens (Savings vs JSON Compact: {((json_compact_tokens - toon_tab_tokens) / json_compact_tokens * 100):.2f}%)")
Running this code will give you concrete numbers, demonstrating the token savings of TOON compared to both pretty-printed and compact JSON. You might even observe differences between comma and tab delimiters.
Exercise 6.5.1: Benchmark Your Own Data
- Take a reasonably sized JSON dataset (e.g., from a previous exercise, or a sample you find online for things like user lists, articles, log entries – ensure it’s primarily tabular).
- Create a Python script similar to the example above.
- Represent your data in a Python dictionary.
- Calculate and print the token counts for:
- JSON (pretty-printed, `indent=2`)
- JSON (compact, `separators=(',', ':')`)
- TOON (default comma delimiter)
- TOON (tab delimiter)
- Analyze the results:
- What are the absolute token counts for each format?
- What are the percentage savings of TOON (both comma and tab) compared to compact JSON?
- Did the tab delimiter provide additional savings over the comma delimiter for your specific data and tokenizer?
- Reflect on why TOON performed the way it did for your data structure (e.g., was it highly tabular? deeply nested? very short?).
By diligently applying these optimization strategies and continuously benchmarking, you can build AI applications that are not only effective but also cost-efficient and scalable.