This is a 'sharp-format'-based format for representing RDF data along with 'authority' information. The purpose is to represent a collection of statements of the form "$subject $predicate $object, according to $authority at $location" in a way that is easy to generate and parse, for interchange between different tools.
The first line should be #format urn:uuid:b783bac7-58e9-4340-93ef-7973914732d5.
Following lines are of the form:
source location subject predicate object
Columns are separated by any amount of whitespace (tabs and/or spaces).
Blank lines and lines beginning with optional whitespace + "#" + whitespace (or end of line) can be ignored.
Lines beginning with "#" + not-whitespace-and-not-end-of-line are special directives.
Only "#format" and "#alias" are defined.
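The line rules above can be sketched as a small classifier. This is only an illustration in Python, not a reference implementation: the function name `classify_line` and the returned labels are made up here, and allowing leading whitespace before a directive is an assumption the text does not settle.

```python
def classify_line(line):
    """Classify one line as 'blank', 'comment', 'directive', or 'statement'.

    Returns (kind, fields), where fields are the whitespace-separated columns.
    """
    stripped = line.strip()
    if stripped == "":
        return ("blank", [])
    if stripped.startswith("#"):
        rest = stripped[1:]
        # "#" followed by whitespace or end of line: an ignorable comment.
        if rest == "" or rest[0].isspace():
            return ("comment", [])
        # "#" followed by anything else: a directive such as #format or #alias.
        # (Assumption: leading whitespace before a directive is tolerated.)
        return ("directive", stripped.split())
    # Everything else is a statement; split() handles any mix of tabs and spaces.
    return ("statement", stripped.split())
```

A statement line should yield five fields (source, location, subject, predicate, object); checking that count is left to the caller.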
Aliases can be defined as a simple compression scheme.
Aliases whose new name ends with ':' or '#' or '/'
can be used as namespaces,
and aliases can be defined in terms of earlier-defined aliases.
#format urn:uuid:b783bac7-58e9-4340-93ef-7973914732d5
# This line is a comment
#alias the-doc tag:nuke24.net,2025-12-08:Example#Doc
#alias bob tag:nuke24.net,2025-12-08:Example#BobHope
#alias name http://purl.org/dc/terms/title
#alias dcterms: http://purl.org/dc/terms/
#alias abstract dcterms:abstract
the-doc l0c13 bob name data:,Bob%20Hope
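One way alias expansion could work, as a rough Python sketch. The names `expand` and `define_alias` are hypothetical, and two details are assumptions rather than anything stated here: an exact alias match wins over a namespace match, and when several namespace aliases match, the longest one is used.

```python
NAMESPACE_ENDINGS = (":", "#", "/")

def expand(token, aliases):
    """Expand a token using the alias table (a dict of name -> full value)."""
    # Assumption: an exact alias match takes priority.
    if token in aliases:
        return aliases[token]
    # Otherwise look for a namespace alias (name ending in ':', '#' or '/')
    # that prefixes the token. Assumption: the longest match wins.
    best = None
    for name in aliases:
        if name.endswith(NAMESPACE_ENDINGS) and token.startswith(name):
            if best is None or len(name) > len(best):
                best = name
    if best is not None:
        return aliases[best] + token[len(best):]
    return token

def define_alias(aliases, name, target):
    """Handle '#alias name target'; target may use earlier-defined aliases."""
    aliases[name] = expand(target, aliases)
```

With the aliases from the example, `abstract` is stored as http://purl.org/dc/terms/abstract, because `dcterms:` was defined before it.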
To start over within a stream (clearing all aliases), just do the format line again:
#format urn:uuid:b783bac7-58e9-4340-93ef-7973914732d5
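A reader might handle the reset like this. The sketch is in Python; `apply_directive` is a hypothetical name, aliases are stored unexpanded to keep it short, and rejecting unknown directives (rather than ignoring them) is an assumption.

```python
FORMAT_URI = "urn:uuid:b783bac7-58e9-4340-93ef-7973914732d5"

def apply_directive(fields, aliases):
    """Update the alias table in place for one directive line.

    fields is the whitespace-split line, e.g. ["#alias", "bob", "tag:..."].
    """
    if fields[0] == "#format":
        if len(fields) < 2 or fields[1] != FORMAT_URI:
            raise ValueError("unrecognized format line")
        # A format line, first or repeated, starts over with no aliases.
        aliases.clear()
    elif fields[0] == "#alias":
        if len(fields) != 3:
            raise ValueError("#alias takes a name and a value")
        aliases[fields[1]] = fields[2]  # stored unexpanded in this sketch
    else:
        # Assumption: directives other than #format and #alias are errors.
        raise ValueError("unknown directive: " + fields[0])
```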
All columns are URIs or pseudo-URIs, except for location, which can be like:
l<line>c<column> - zero-indexed line/column
L<line>C<column> - one-indexed line/column
b<offset> - byte offset
B<offset> - would be a 1-based byte offset, but disallowed because that's dumb
Two positions joined by "...", indicating the span between those positions
e.g. "l0c0", "L1C1", and "b0" all mean beginning of file.
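The location forms can be parsed along these lines. This Python sketch normalizes everything to zero-indexed values. The function names and tuple shapes are made up for illustration, and it assumes a span is written as two positions with "..." between them and that "-" stands for an unknown location.

```python
import re

_ZERO_BASED = re.compile(r"l(\d+)c(\d+)")
_ONE_BASED = re.compile(r"L(\d+)C(\d+)")
_BYTE_OFFSET = re.compile(r"b(\d+)")

def parse_position(text):
    """Parse a single position into a zero-indexed tuple."""
    m = _ZERO_BASED.fullmatch(text)
    if m:
        return ("linecol", int(m.group(1)), int(m.group(2)))
    m = _ONE_BASED.fullmatch(text)
    if m:
        line, col = int(m.group(1)), int(m.group(2))
        if line < 1 or col < 1:
            raise ValueError("one-indexed positions start at 1: " + text)
        return ("linecol", line - 1, col - 1)
    m = _BYTE_OFFSET.fullmatch(text)
    if m:
        return ("byte", int(m.group(1)))
    # Anything else, including the disallowed "B<offset>" form, is rejected.
    raise ValueError("unrecognized position: " + text)

def parse_location(text):
    """Return None for "-", else (start, end); end is None for a single position."""
    if text == "-":
        return None
    start, sep, end = text.partition("...")
    if sep:
        return (parse_position(start), parse_position(end))
    return (parse_position(text), None)
```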
'Source' references either a document, or some abstract object (possibly by an x-rdf-subject or UUID URI) that generated the information. The object could be a 'loading' of a document, which might reference a 'batch' (information about what program was used and when on what input dataset) along with the specific document from which this bit of information was derived (which is presumably what the 'location' refers to).
That information may also be included in the same format, and in the same stream, possibly with the batch indicating itself as the authority in the statements about itself.
Later batches might indicate that previous batches were bad info and should be ignored, though a better approach would probably be to do the inverse; vouch for documents containing information that is known to be good.
A TS34-like encoding can be used to represent encoded values:
follow the URI with a pipe ("|") and the URI of the first encoding to be unapplied,
then another pipe and the URI of the next encoding to be unapplied, and so on.
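A literal reading of the pipe syntax, sketched in Python. This only shows the splitting and the order in which encodings are unapplied; it is not an implementation of TS34 itself. The function names, the `decoders` table, and the `urn:example:` URIs in the usage are all hypothetical.

```python
def split_encoded(token):
    """Split 'uri|enc1|enc2' into (uri, [enc1, enc2]).

    A raw "|" is not a legal URI character, so a plain split is enough here.
    """
    uri, *encodings = token.split("|")
    return uri, encodings

def unapply_encodings(data, encodings, decoders):
    """Unapply encodings left to right.

    decoders is a caller-supplied dict mapping encoding URIs to functions.
    """
    for encoding in encodings:
        data = decoders[encoding](data)
    return data
```

For example, `split_encoded("urn:example:blob|urn:example:enc-a|urn:example:enc-b")` gives the base URI plus the two encoding URIs, with `enc-a` unapplied first.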
"-" can be used to indicate 'none' or 'not applicable',
e.g. if source or location is unknown.
Anonymous objects can be named "_:" + internal name,
same as in Turtle.
TODO: Some way to refer to this document, and to name an otherwise-anonymous node that is its primary subject. Some ideas:
"THISDOC" refers to the document itself,
and "THISDOC#" or "x-rdf-subject:THISDOC" to its subject.
"_:main" can refer to the 'primary subject'
of the document, and "_:doc" to the document itself?
In which case "_:main", "_:doc#", and "x-rdf-subject:_:doc"
would all mean the same thing.
I think I like the THISDOC idea better.