v1.0 · CC0 public domain
The ASHRU Specification
1. Row shape
A single ASHRU row consists of exactly eleven fields, indexed 0 through 10
and separated by the pipe character |. Fields are positional:
the meaning of each cell is determined by its index, not by a label.
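Example: V|deploy|engineer|api||||staging|p|0|
Here fields 4 to 6 and field 10 are empty; the final pipe separates field 9 from the empty attributes field, so the row still has eleven fields.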
2. Field semantics
| # | Field | Required | Notes |
|---|---|---|---|
| 0 | version | yes | Schema version. V or V1 for this spec. |
| 1 | action | optional | The verb in base form. |
| 2 | subject | optional | The agent. |
| 3 | object | optional | The direct object. |
| 4 | instrument | optional | Means / how. |
| 5 | recipient | optional | Indirect object. |
| 6 | source | optional | Origin. |
| 7 | location | optional | Where the action takes place. |
| 8 | tense | optional | Single char: p past, n present, f future, h habitual, c conditional. |
| 9 | negated | yes | 0 or 1. |
| 10 | attributes | optional | Free-form annotation slot. |
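As an illustration of the positional layout, the following minimal Python sketch maps eleven cells to the field names in the table and checks the constrained fields. The names FIELD_NAMES, TENSES, and label_cells are hypothetical and not part of the specification; only the field order and the allowed values are taken from the table.

```python
# Hypothetical helper names; only the field order and allowed values come from the table.
FIELD_NAMES = (
    "version", "action", "subject", "object", "instrument", "recipient",
    "source", "location", "tense", "negated", "attributes",
)
TENSES = {"p": "past", "n": "present", "f": "future", "h": "habitual", "c": "conditional"}


def label_cells(cells):
    """Map eleven positional cells to field names, checking the constrained ones."""
    if len(cells) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} cells, got {len(cells)}")
    row = dict(zip(FIELD_NAMES, cells))
    if row["version"] not in ("V", "V1"):
        raise ValueError(f"unrecognized version: {row['version']!r}")
    if row["negated"] not in ("0", "1"):
        raise ValueError(f"negated must be 0 or 1, got {row['negated']!r}")
    if row["tense"] and row["tense"] not in TENSES:
        raise ValueError(f"unknown tense code: {row['tense']!r}")
    return row


# The example row contains no escaped pipes, so a plain split is enough here.
row = label_cells("V|deploy|engineer|api||||staging|p|0|".split("|"))
print(row["action"], row["location"], TENSES[row["tense"]])
```

Rows whose values may contain escaped pipes need an escape-aware splitter instead of a plain split.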
3. Encoding rules
- Fields are separated by a single | character.
- Empty fields are sequential pipes (||).
- Literal pipe inside a value: \|. Literal backslash: \\.
- No leading whitespace inside fields (parsers may or may not trim; emitters should not rely on it).
- Rows are line-oriented. Newlines (\n) are row terminators, never embedded inside fields.
- UTF-8 throughout. No BOM.
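The escaping rules above can be sketched in Python as a splitter and a matching joiner. The function names split_row and join_row are hypothetical. The sketch also assumes that a backslash followed by any other character is kept as a literal backslash, a case the rules above do not address.

```python
def split_row(line):
    """Split one row on unescaped pipes, decoding escaped pipes and backslashes."""
    cells, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] in "|\\":
            current.append(line[i + 1])  # escaped pipe or backslash: keep the literal
            i += 2
            continue
        if ch == "|":
            cells.append("".join(current))  # unescaped pipe ends the current cell
            current = []
        else:
            current.append(ch)  # assumption: a lone backslash stays as-is
        i += 1
    cells.append("".join(current))
    return cells


def join_row(cells):
    """Escape backslashes first, then pipes, and join with the separator."""
    return "|".join(c.replace("\\", "\\\\").replace("|", "\\|") for c in cells)


# Round trip with a value containing a pipe and one containing a backslash.
cells = ["V", "run", "user", "a|b", "", "", "", "C:\\tmp", "n", "0", ""]
line = join_row(cells)
assert split_row(line) == cells
```

Callers are expected to strip the terminating newline (for example with line.rstrip("\n")) before splitting, since newlines never appear inside fields.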
4. Version semantics
Field 0 is reserved for protocol versioning. Future revisions of the
ASHRU spec bump the version string (V2, V3, …) to ensure
downstream parsers can detect and route by schema generation. Parsers
SHOULD reject unrecognized version strings; producers SHOULD emit only
versions they know how to populate.
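A minimal sketch of the version check, in Python, might look like the following. The names KNOWN_VERSIONS, UnknownVersion, and check_version are hypothetical; the accepted strings V and V1 come from the field table.

```python
KNOWN_VERSIONS = {"V", "V1"}  # the two spellings this spec accepts


class UnknownVersion(ValueError):
    """Raised when field 0 names a schema generation this parser does not handle."""


def check_version(line):
    """Return the version string of a row, or raise if it is unrecognized."""
    # Field 0 ends at the first pipe. If that pipe were escaped, the prefix would
    # end in a backslash and could not match a known version anyway.
    version = line.split("|", 1)[0]
    if version not in KNOWN_VERSIONS:
        raise UnknownVersion(f"unrecognized ASHRU version {version!r}")
    return version


print(check_version("V|deploy|engineer|api||||staging|p|0|"))
```

A parser that later learns a new schema generation could extend KNOWN_VERSIONS and dispatch on the returned string.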
5. Parser recovery
Rows are often emitted by LLMs, so extraction is probabilistic and production parsers should expect occasional malformed rows. Recommended recovery patterns:
- Strict default. Reject rows with the wrong field count.
- Trailing-pipe stripping. If a row has 12 fields and the last is empty, drop the last field and treat the row as 11 fields. (LLMs commonly add an extra trailing pipe.)
- Short-row padding. Conservatively pad 9- or 10-field rows to 11 by appending empty cells. Reject rows shorter than 9; guessing past that point is unsafe.
- Drop and log. Persistently malformed rows belong in a side queue, not the main ingestion stream.
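The recovery patterns above could be combined roughly as in the Python sketch below. The names recover, ingest, lenient, and side_queue are hypothetical, as is the logger name. The sketch assumes the lenient patterns are opt-in, with strict rejection as the default, and uses a plain split for brevity.

```python
import logging

logger = logging.getLogger("ashru.recovery")  # hypothetical logger name
EXPECTED_FIELDS = 11


def recover(cells, lenient=False):
    """Return an 11-cell list, or None when the row should be dropped."""
    n = len(cells)
    if n == EXPECTED_FIELDS:
        return list(cells)
    if not lenient:
        return None  # strict default: any other field count is rejected
    if n == EXPECTED_FIELDS + 1 and cells[-1] == "":
        return list(cells[:-1])  # trailing-pipe stripping
    if n in (9, 10):
        return list(cells) + [""] * (EXPECTED_FIELDS - n)  # short-row padding
    return None  # shorter than 9, or too long: do not guess


def ingest(lines, side_queue, lenient=False):
    """Split each line, keep recoverable rows, divert the rest to side_queue."""
    accepted = []
    for line in lines:
        raw = line.rstrip("\n")
        # Plain split for brevity; an escape-aware splitter is needed if values
        # may contain escaped pipes.
        cells = recover(raw.split("|"), lenient=lenient)
        if cells is None:
            logger.warning("dropping malformed ASHRU row: %r", raw)
            side_queue.append(raw)
        else:
            accepted.append(cells)
    return accepted
```

Padding a 9-field row leaves the required negated field empty, so a caller may still want to validate that field after recovery; how to treat that case is left open here.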
6. Relationship to richer schemas
ASHRU is a transport protocol. Downstream systems may extend rows with additional metadata at storage time — for instance, the KOSHA engine layers retrieval factors (embeddings, recency, persona weights) on top of the ASHRU base. Those extensions live in the storage system, not in the wire format.
Specification text is CC0 / public domain. Reference parsers are MIT-licensed. Bug reports and protocol proposals: github.com/sumaproai/ashru.