TOGoS Binary Blocks

TOGoS Binary Blocks (TBB) is a simple, extensible, and compact way to associate arbitrarily encoded objects with encoding information.

A TBB packet is encoded as follows:

The TBB format does not provide any way to indicate the length or checksum of the packet; that is left to the container (such as a UDP packet, a file, a database record, etc).

Schema schema

A schema should indicate a format for the payload and a prototype. How to interpret these properties is up to the application, and they can be either rdf:resource links or inline RDF+XML. It's expected that format will often be an opaque resource URI identifying a predefined format, but it may also point to a machine-readable format description (such as ASN.1 encoding rules), or simply be a block of human-readable text describing the format.

As an example, say there are QQ objects to be described (never mind what a QQ represents) that have, among other properties, foofiness and color, and that 23 and purple are such common values for those properties that we don't want to repeat them in each serialization of a QQ object. In that case, the schema might be:

<Schema xmlns="http://ns.nuke24.net/TBB/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <format rdf:resource="http://example.com/QQ/CompactQFormat1"/>
    <prototype>
        <qq:QQObject xmlns:qq="http://example.com/QQ/">
            <qq:foofiness>23</qq:foofiness>
            <qq:color>purple</qq:color>
        </qq:QQObject>
    </prototype>
</Schema>

In addition to format and prototype, a schema may specify other, format-specific information. When represented as RDF, format-specific properties should be namespaced appropriately (not in the http://ns.nuke24.net/TBB/ namespace).
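To make the shape of such a schema concrete, here is a rough Python sketch that reads the example above and pulls out the format URI, the prototype's default property values, and any format-specific extras. The function name read_schema and the choice to return defaults as a plain dict are assumptions made for illustration; nothing here is prescribed by TBB, and it only handles the case where format is an rdf:resource link.

```python
import xml.etree.ElementTree as ET

TBB_NS = "http://ns.nuke24.net/TBB/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The example schema from the text, verbatim.
SCHEMA_XML = """<Schema xmlns="http://ns.nuke24.net/TBB/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <format rdf:resource="http://example.com/QQ/CompactQFormat1"/>
    <prototype>
        <qq:QQObject xmlns:qq="http://example.com/QQ/">
            <qq:foofiness>23</qq:foofiness>
            <qq:color>purple</qq:color>
        </qq:QQObject>
    </prototype>
</Schema>"""


def read_schema(xml_text):
    """Hypothetical helper: return (format_uri, prototype_defaults, extra_tags)."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{TBB_NS}}}Schema":
        raise ValueError("not a TBB Schema document")

    # Only the rdf:resource form of <format> is handled in this sketch.
    fmt = root.find(f"{{{TBB_NS}}}format")
    format_uri = fmt.get(f"{{{RDF_NS}}}resource") if fmt is not None else None

    # Treat each child of the prototype object as a default property value.
    defaults = {}
    proto = root.find(f"{{{TBB_NS}}}prototype")
    if proto is not None and len(proto) > 0:
        for prop in proto[0]:
            defaults[prop.tag] = (prop.text or "").strip()

    # Anything outside the TBB namespace is format-specific information.
    extras = [c.tag for c in root if not c.tag.startswith("{" + TBB_NS + "}")]
    return format_uri, defaults, extras


format_uri, defaults, extras = read_schema(SCHEMA_XML)
```

An application decoding a QQ object would then start from defaults and overwrite only the properties actually present in the payload.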

As an alternative to RDF+XML, the schema itself may be encoded in the TBB format, or any other encoding, so long as it's distinguishable and understood by the target application.

Applications whose objects have only a few predetermined schemas may have interpreters pre-loaded, so that they do not need to load schemas at all, but simply look them up by their hash. For future compatibility, those hashes should still be based on RDF+XML documents that actually describe the schemas.
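A pre-loaded interpreter table along those lines might look something like the following Python sketch. It assumes the schema ID is the raw 20-byte SHA-1 digest of the exact bytes of the RDF+XML schema document; which bytes get hashed is an assumption here, and InterpreterRegistry and its methods are made-up names.

```python
import hashlib


def schema_id(schema_bytes):
    """Assumed ID scheme: SHA-1 digest of the schema document's bytes."""
    return hashlib.sha1(schema_bytes).digest()


class InterpreterRegistry:
    """Hypothetical table mapping schema IDs to payload interpreters."""

    def __init__(self):
        self._by_id = {}

    def register(self, schema_bytes, interpreter):
        # The ID is still derived from a real schema document, even though
        # the application never parses that document at runtime.
        sid = schema_id(schema_bytes)
        self._by_id[sid] = interpreter
        return sid

    def lookup(self, sid):
        # Returns None for schemas this application was not built to handle.
        return self._by_id.get(sid)


registry = InterpreterRegistry()
qq_id = registry.register(b"<Schema>...</Schema>", lambda payload: payload.hex())
```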

Fetching schemas

How to go about this is left completely up to the application. Since the SHA-1 sum of the schema is known, you are free to fetch it from untrusted sources and check the result against that hash. And because many objects will share a single schema, and schemas are uniquely identified by their hash, applications can easily cache an efficient representation.
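One way this could play out is sketched below in Python: try each source in turn, accept the first response whose SHA-1 matches the wanted ID, and cache it. The SchemaCache class, the idea of sources as callables taking an ID and returning bytes or None, and hashing the raw document bytes are all assumptions for the sake of the example.

```python
import hashlib


class SchemaCache:
    """Hypothetical fetch-verify-cache helper for schemas."""

    def __init__(self, sources):
        # Each source is a callable: schema ID (20 bytes) -> bytes or None.
        self._sources = list(sources)
        self._cache = {}

    def get(self, sid):
        if sid in self._cache:
            return self._cache[sid]
        for fetch in self._sources:
            try:
                data = fetch(sid)
            except Exception:
                continue  # an unreachable source is not fatal; try the next
            # Sources are untrusted, so only a matching hash is accepted.
            if data is not None and hashlib.sha1(data).digest() == sid:
                self._cache[sid] = data
                return data
        raise LookupError("no source returned a schema matching " + sid.hex())


real_schema = b"<Schema>...</Schema>"
wanted = hashlib.sha1(real_schema).digest()
calls = []


def lying_source(sid):
    calls.append("lying")
    return b"something else entirely"


def honest_source(sid):
    calls.append("honest")
    return real_schema


cache = SchemaCache([lying_source, honest_source])
first = cache.get(wanted)
second = cache.get(wanted)  # served from the cache, no sources consulted
```

A real application would probably cache a parsed or compiled form of the schema rather than the raw bytes.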

TOGoS Text Blocks

A format with semantics identical to those of TOGoS Binary Blocks, but with a text header of the form "#TTB " + datatype URI + newline. The header may be preceded by a shebang line (i.e. one starting with "#!"), which is ignored.
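Reading that header could be sketched in Python roughly as follows. The function name parse_ttb is made up, and decoding the URI as UTF-8 and trimming trailing whitespace (such as a stray carriage return) are assumptions rather than anything TTB specifies.

```python
def parse_ttb(data):
    """Hypothetical parser: split a TTB document into (datatype URI, content)."""
    line, newline, rest = data.partition(b"\n")

    # An optional shebang line comes first and is simply skipped.
    if line.startswith(b"#!"):
        line, newline, rest = rest.partition(b"\n")

    if not newline or not line.startswith(b"#TTB "):
        raise ValueError("missing '#TTB <datatype URI>' header line")

    # Assumption: the URI is UTF-8 text and surrounding whitespace is noise.
    uri = line[len(b"#TTB "):].decode("utf-8").strip()
    if not uri:
        raise ValueError("empty datatype URI")
    return uri, rest


plain = parse_ttb(b"#TTB http://example.com/QQ/TextFormat\nfoofiness=7\n")
script = parse_ttb(b"#!/usr/bin/qq-run\n#TTB http://example.com/QQ/TextFormat\nfoofiness=7\n")
```

The example.com datatype URI and the qq-run interpreter path are placeholders, not real resources.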

Rather than opaque 20-byte format identifiers, XML datatype URIs can be used. To refer to a format that is in turn described by another document, use a URI of the form document URI + "#" + fragment ID, where fragment ID may be blank for cases where the entire document unambiguously describes a single concept, such as RDF+XML documents where the root node is a description.

TOGoS Text Blocks is most useful as an alternative to TOGoS Binary Blocks when the content is also a text-based format, but that is not required. So long as a format has both a 20-byte ID and a datatype URI defined, it could be encapsulated in either a TBB or a TTB document. They can even embed each other! Or at least will be able to once I define a 20-byte ID corresponding to http://ns.nuke24.net/Datatypes/Subject

TODO: Define a standard, repeatable method for converting XML datatype URIs to TBB schema IDs, maybe using v5 UUIDs. Embed a JS form on this page to do the calculation.

Referring to the subject of a TBB or TTB document from RDF

A URI of the form document URI + "#", where document URI is the URI of a TBB or TTB document, refers to the object described by that document.

You can also use the http://ns.nuke24.net/Datatypes/Subject XML datatype to indicate the lexical-to-value mapping of an RDF literal, or in other cases where such datatype URIs are used, such as TTB itself. As a silly example, you could have a TTB document prefixed with any number of "#TTB http://ns.nuke24.net/Datatypes/Subject" lines, which would act as no-ops, as they would essentially say "interpret this document the same way you were already about to, but start at the next line".
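Both ideas in this section are small enough to sketch in a few lines of Python. The helper names are invented, and refusing document URIs that already carry a fragment is an assumption made to keep the example unambiguous.

```python
SUBJECT_DATATYPE = "http://ns.nuke24.net/Datatypes/Subject"


def subject_uri(document_uri):
    """URI for the object described by a TBB or TTB document."""
    if "#" in document_uri:
        # Assumption: a document URI should not already have a fragment.
        raise ValueError("document URI already contains a fragment")
    return document_uri + "#"


def strip_subject_wrappers(text):
    """Drop leading no-op '#TTB .../Subject' lines from a TTB document."""
    prefix = "#TTB " + SUBJECT_DATATYPE + "\n"
    while text.startswith(prefix):
        # Each such line only says "carry on from the next line".
        text = text[len(prefix):]
    return text


inner = "#TTB http://example.com/QQ/TextFormat\nfoofiness=7\n"
wrapped = ("#TTB " + SUBJECT_DATATYPE + "\n") * 3 + inner
unwrapped = strip_subject_wrappers(wrapped)
qq_subject = subject_uri("http://example.com/objects/qq1.ttb")
```

Here the example.com URIs are placeholders; the three wrapper lines are removed and the inner document is left exactly as it was.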