Dataset note
A plain-language summary of what a dataset contains and how its current results were produced — auto-generated, owner-editable.
A plain-language summary of what a dataset contains and how its current results were produced. Auto-generated at analysis time and editable by the owner. The dataset note is rendered prominently on the dataset page so readers see provenance before they see results.
Why it matters for your research. This is where the corpus’s scope, the OCR engine, the embedding model, and any caveats live. For a public dataset the note is the first thing a colleague reads; for a private one it’s the record that future-you will be grateful for.
In Archēglyph. Generated by an analysis plugin and refreshed whenever upstream stages change their outputs, so the note never lies about which model produced the current results.
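The refresh-on-change behaviour can be pictured with a short sketch. This is not Archēglyph's plugin code: `StageOutput`, `fingerprint`, `render_note`, and `refresh_note` are hypothetical names chosen for illustration, and the sketch assumes each upstream stage can report a model name and a content digest of its output. It also ignores owner edits, which a real implementation would need to preserve.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class StageOutput:
    """Hypothetical record of what one upstream stage produced."""
    stage: str   # e.g. "ocr" or "embedding"
    model: str   # engine or model name reported by the stage
    digest: str  # content hash of the stage's output


def fingerprint(outputs):
    """Hash the stage outputs in a stable order so any change is detectable."""
    payload = sorted((o.stage, o.model, o.digest) for o in outputs)
    return hashlib.sha256(json.dumps(payload).encode("utf-8")).hexdigest()


def render_note(scope, outputs):
    """Build the plain-language note text from the current stage outputs."""
    lines = [f"Scope: {scope}"]
    for o in sorted(outputs, key=lambda o: o.stage):
        lines.append(f"{o.stage}: {o.model}")
    return "\n".join(lines)


def refresh_note(scope, outputs, cached):
    """Return (fingerprint, text), re-rendering only if upstream outputs changed.

    `cached` is the previous (fingerprint, text) pair, or None on the first run.
    """
    current = fingerprint(outputs)
    if cached is not None and cached[0] == current:
        return cached
    return current, render_note(scope, outputs)
```

Because the note is keyed on a fingerprint of the stage outputs rather than on a timestamp, swapping the OCR engine or embedding model always produces a new note, while a run that changes nothing reuses the cached one.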
Not to be confused with. A README in an exported bundle is static and drifts from reality; the dataset note is regenerated from the live dataset state every time the pipeline runs.