Benchmarks

The benchmark is a repetitive knowledge-graph edge-extraction task over a 10,000-row corpus with a fixed prompt; token counts use OpenAI's o200k_base tokenizer. Each format is scored on payload size, output-token cost, malformed-row rate, and downstream mechanical parse throughput.

Format        Avg payload   Approx. output tokens   Malformed row rate   Parse speed
Verbose JSON  507 B         ~127                    0.2%                 ~45 k rows/s
YAML          312 B         ~85                     1.1%                 ~22 k rows/s
NDJSON        410 B         ~105                    0.1%                 ~60 k rows/s
ASHRU          99 B         ~25                     0.8%                 ~115 k rows/s
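The malformed-row column is the fraction of model output rows the scorer rejects. A minimal sketch of that kind of scorer, shown here with NDJSON (one `json.loads` per line) as the parser; the function name and row shapes are illustrative, not the SDK's actual API:

```python
import json

def malformed_rate(output_lines, parse):
    # A row counts as malformed if the parser rejects it;
    # the rate is rejected rows / total rows.
    bad = 0
    for line in output_lines:
        try:
            parse(line)
        except Exception:
            bad += 1
    return bad / len(output_lines)

# NDJSON example: one broken row (missing comma) out of four.
rows = [
    '{"head": "Ada", "rel": "wrote", "tail": "Notes"}',
    '{"head": "Ada", "rel": "met", "tail": "Babbage"}',
    '{"head": "Ada" "rel": "born", "tail": "1815"}',
    '{"head": "Ada", "rel": "died", "tail": "1852"}',
]
rate = malformed_rate(rows, json.loads)  # 0.25
```

The same loop works for any format: swap in a YAML or ASHRU parser for `parse` and feed it that format's output lines.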

Reading the table

ASHRU rows average roughly one fifth the payload of verbose JSON and about one fifth the output tokens, and they parse nearly twice as fast as the next-fastest format, NDJSON. NDJSON is the most reliable at 0.1% malformed rows; ASHRU's 0.8% sits between NDJSON and YAML. YAML trails on both reliability and parse speed.

Methodology

Each format used the same prompt structure (header instructions + one example row + the input text). Model output was scored against a canonical ASHRU row set generated from the same source documents to determine the malformed-row rate. Parse-speed numbers are single-threaded on an M3-class Apple Silicon core.
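The parse-speed numbers reduce to a single-threaded timing loop over pre-generated rows. A sketch of that kind of harness (names and the NDJSON corpus are illustrative; this is not the actual benchmark script):

```python
import json
import time

def parse_throughput(lines, parse):
    # Rows per second for one single-threaded pass over the corpus.
    start = time.perf_counter()
    for line in lines:
        parse(line)
    elapsed = time.perf_counter() - start
    return len(lines) / elapsed

# 10,000 identical NDJSON rows stand in for the real corpus.
corpus = ['{"head": "a", "rel": "r", "tail": "b"}'] * 10_000
rows_per_sec = parse_throughput(corpus, json.loads)
```

Real runs should repeat the loop and take the best of several passes to smooth out timer and cache noise.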

The benchmark scripts, corpus, and scorers will be open-sourced alongside the SDK at github.com/sumaproai/ashru. Reproducible runs welcome.

Full whitepaper with the complete methodology + analysis: /whitepaper.