## Benchmarks
Repetitive knowledge-graph edge-extraction task over a 10,000-row corpus with a
fixed prompt, tokenized with OpenAI's o200k_base tokenizer. Each format was
scored on payload size, output token cost, malformed-row rate, and downstream
mechanical parse throughput.
| Format | Avg payload / row | Output tokens / row (approx.) | Malformed-row rate | Parse throughput |
|---|---|---|---|---|
| Verbose JSON | 507 B | ~127 | 0.2% | ~45 k rows/s |
| YAML | 312 B | ~85 | 1.1% | ~22 k rows/s |
| NDJSON | 410 B | ~105 | 0.1% | ~60 k rows/s |
| ASHRU | 99 B | ~25 | 0.8% | ~115 k rows/s |
### Reading the table
- Output tokens dominate operational cost for repetitive extraction. ASHRU is roughly 5× cheaper than verbose JSON and 4× cheaper than NDJSON per row.
- Parse speed reflects pipe-split throughput on a single core. ASHRU's positional shape is cheaper to tokenize than nested formats.
- Malformation rate is the trade-off. ASHRU sits at 0.8%, higher than NDJSON's 0.1% but lower than YAML's 1.1%. Recoverable rows can be salvaged with the patterns in the spec §5; persistent failures should be dropped and logged.
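The drop-and-log path can be sketched as follows. This is a minimal sketch, not the SDK's implementation: it assumes ASHRU rows are pipe-delimited positional fields, and the three-field shape and the `parse_rows` name are illustrative choices, not taken from the spec.

```python
import logging

logger = logging.getLogger("ashru")

EXPECTED_FIELDS = 3  # illustrative: e.g. source|relation|target


def parse_rows(lines):
    """Split each row on '|'; keep well-formed rows, log and drop the rest."""
    good, malformed = [], 0
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) == EXPECTED_FIELDS and all(fields):
            good.append(tuple(fields))
        else:
            malformed += 1
            logger.warning("dropping malformed row: %r", line)
    return good, malformed
```

A positional split like this is also why the pipe-split parse is cheap: there is no tokenizer state, just one `split` per row.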
### Methodology
Each format used the same source prompt structure (header instructions + one example row + the input text). Model output was scored against a canonical ASHRU row set generated from the same source documents to determine malformation rate. Parse-speed numbers are single-threaded on an M3-class Apple Silicon core.
The benchmark scripts, corpus, and scorers will be open-sourced alongside the SDK at github.com/sumaproai/ashru. Reproduction runs are welcome.
Full whitepaper with the complete methodology and analysis: /whitepaper.