I've been contemplating a URI scheme for referencing objects in Git repositories.


I would want this scheme to be able to:

It would be nice if it could also:


References the object stored in the Git object file with hash 'asdf1234' (a placeholder for a hex-encoded SHA-1, which is how Git normally presents hashes). If the object is a blob, this URI resolves to the blob itself (the same sequence of bytes that was at one point `git add`ed). If the referenced object is not a blob (for example, because it is a tree or commit object), it is not directly representable as a byte stream, and a strict resolver would return an error when asked to represent it as one (e.g. by a request for http://some-resolver/uri-res/N2R?x-git-object:somecommithash).
References the object identified by 'asdf1234' and provides a repository from which to fetch it. More than one 'repository' parameter could be specified.
Assuming asdf1234 identifies a commit or tree object, #path/to/file would find the object called 'path/to/file' referenced by that commit.
References the latest commit on branch 'master' in any of the listed repositories.
References the latest commit on branch 'master' signed using the specified public key. I'm not sure what format the public key would be in; possibly the one output by 'gpg --export' (which you could fingerprint using 'gpg --with-fingerprint <file>').
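
The '#path/to/file' lookup described above could work roughly as follows. This is only a sketch under assumptions: the in-memory `store` dictionary and the function names `parse_tree` and `resolve_path` are hypothetical, and a real resolver would read objects from a repository instead. The tree entry layout ("<mode> <name>", a NUL byte, then a 20-byte binary SHA-1) and the leading "tree <id>" line of a commit are Git's own formats.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory object store: hex object ID -> (type, body bytes).
ObjectStore = Dict[str, Tuple[str, bytes]]


def parse_tree(body: bytes) -> List[Tuple[str, str, str]]:
    """Split a tree body into (mode, name, hex ID) entries.

    Each entry is "<mode> <name>" + NUL + a 20-byte binary SHA-1.
    """
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\x00", space)
        mode = body[pos:space].decode("ascii")
        name = body[space + 1:nul].decode("utf-8")
        entries.append((mode, name, body[nul + 1:nul + 21].hex()))
        pos = nul + 21
    return entries


def resolve_path(store: ObjectStore, object_id: str, path: str) -> str:
    """Return the ID of the object found at `path` under a commit or tree."""
    obj_type, body = store[object_id]
    if obj_type == "commit":
        # A commit body starts with the line "tree <hex ID>".
        keyword, _, tree_id = body.split(b"\n", 1)[0].partition(b" ")
        if keyword != b"tree":
            raise ValueError("malformed commit: no tree line")
        object_id = tree_id.decode("ascii")
    for segment in path.split("/"):
        obj_type, body = store[object_id]
        if obj_type != "tree":
            raise ValueError(f"{object_id} is a {obj_type}, not a tree")
        matches = [oid for _mode, name, oid in parse_tree(body) if name == segment]
        if not matches:
            raise KeyError(f"no entry named {segment!r} in tree {object_id}")
        object_id = matches[0]
    return object_id


# IDs taken from the test vectors in this document.
BLOB_ID = "af5626b4a114abcb82d63db7c8082c3c4756e51b"
TREE_ID = "50318d4d5ad8a79c84b56ff54861af91b2111c8e"
store = {
    BLOB_ID: ("blob", b"Hello, world!\n"),
    TREE_ID: ("tree", b"100644 hello-world.txt\x00" + bytes.fromhex(BLOB_ID)),
}
print(resolve_path(store, TREE_ID, "hello-world.txt"))  # prints BLOB_ID
```

Walking one path segment at a time means nested paths only need the intermediate trees to be present in the store.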


The URI of a repository where the named object might be found. Multiple repositories may be specified.
The name of a branch on which to find a commit when the ID is 'latest' rather than a hash.
The URI of a public key that the object must be signed by in order for this URI to resolve.
Assert that the named object is of a certain type: blob, tree, commit, or tag.
How the identified object should be encoded in the resolved resource. The default is no encoding, which means non-blob objects cannot be represented. Using "git-object" as the encoding would return objects in the same format that Git uses for storing them in .git/objects, before zlib compression (i.e. including a short header of the form "<type> <content length (bytes)><NUL byte>").
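
To make the interaction between the type assertion and the encoding parameter concrete, here is a minimal sketch of how a strict resolver might decide what bytes to return once it has already located an object. The function names `encode_git_object` and `represent` and the argument name `expected_type` are made up for illustration; only the header format and the "git-object" encoding name come from the text above.

```python
from typing import Optional

VALID_TYPES = ("blob", "tree", "commit", "tag")


def encode_git_object(obj_type: str, body: bytes) -> bytes:
    """Prefix the body with Git's "<type> <length><NUL>" header."""
    return f"{obj_type} {len(body)}".encode("ascii") + b"\x00" + body


def represent(obj_type: str, body: bytes,
              encoding: Optional[str] = None,
              expected_type: Optional[str] = None) -> bytes:
    """Return the byte stream a strict resolver would serve, or raise."""
    if obj_type not in VALID_TYPES:
        raise ValueError(f"unknown object type: {obj_type}")
    if expected_type is not None and expected_type != obj_type:
        raise ValueError(f"expected a {expected_type}, found a {obj_type}")
    if encoding == "git-object":
        return encode_git_object(obj_type, body)
    if encoding is None:
        # With no encoding, only blobs have a byte-stream representation.
        if obj_type != "blob":
            raise ValueError(f"a {obj_type} cannot be returned without an encoding")
        return body
    raise ValueError(f"unsupported encoding: {encoding}")


print(represent("blob", b"Hello, world!\n"))
print(represent("blob", b"Hello, world!\n", encoding="git-object"))
```

A more lenient resolver could pick some default representation for trees and commits instead of raising, but that choice is outside what is proposed here.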


x-git-object:0d187173d04e67fef5e9178ee761786df04905f0?repository=https://github.com/TOGoS/TFMPM.git references this commit and hints to look for it in that GitHub repository.

x-git-object:669ac7c32292798644b21dbb5a0dc657125f444d is a specific snapshot of Linux's README file. Since this blob is part of commit a7ddce, we could also call it x-git-object:a7ddcea58ae22d85d94eabfdd3de75c3742e376b#README.
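
URIs like the ones above could be taken apart with a few string splits. The sketch below is one possible reading of the syntax, not a finished grammar: the `GitObjectRef` class and `parse_x_git_object` function are hypothetical names, and it only handles the `repository`, `type`, and `encoding` parameters. It also assumes that a repository URL containing '&' or '#' would have to be percent-encoded.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import parse_qs, unquote


@dataclass
class GitObjectRef:
    object_id: str
    repositories: List[str] = field(default_factory=list)
    object_type: Optional[str] = None
    encoding: Optional[str] = None
    path: Optional[str] = None


def parse_x_git_object(uri: str) -> GitObjectRef:
    """Split an x-git-object URI into its ID, parameters, and path."""
    scheme, colon, rest = uri.partition(":")
    if not colon or scheme.lower() != "x-git-object":
        raise ValueError(f"not an x-git-object URI: {uri}")
    # The fragment (path within a commit or tree) follows the first '#'.
    rest, _, fragment = rest.partition("#")
    object_id, _, query = rest.partition("?")
    params = parse_qs(query)
    return GitObjectRef(
        object_id=object_id,
        repositories=params.get("repository", []),
        object_type=params.get("type", [None])[0],
        encoding=params.get("encoding", [None])[0],
        path=unquote(fragment) or None,
    )


ref = parse_x_git_object(
    "x-git-object:0d187173d04e67fef5e9178ee761786df04905f0"
    "?repository=https://github.com/TOGoS/TFMPM.git"
)
print(ref.object_id, ref.repositories)

ref = parse_x_git_object(
    "x-git-object:a7ddcea58ae22d85d94eabfdd3de75c3742e376b#README"
)
print(ref.object_id, ref.path)
```

Because `parse_qs` collects repeated keys into a list, specifying `repository` more than once falls out naturally.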

Test vectors

URIs on the same line are equivalent. You should be able to copy-paste the data: URIs into your browser to see the actual data.

# Empty string
x-git-object:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	urn:sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ	data:,

# The string "Hello, world!"
x-git-object:5dd01c177f5d7d1be5346a5bc18a569a7410c2ef	urn:sha1:SQ5HALIG6NCZTLXB7DNI56PXFFQDDVUZ	data:,Hello,%20world!

# The string "Hello, world!" followed by a newline
x-git-object:af5626b4a114abcb82d63db7c8082c3c4756e51b	urn:sha1:BH5MRW75E66ZWTJDUAHLMSFKOULYSU3N	data:,Hello,%20world!%0a

# README from the Linux repo at commit aa50faff
x-git-object:669ac7c32292798644b21dbb5a0dc657125f444d	urn:sha1:ZIO4GZICFXFKOKG7WEN43ZAK2PGOAV2L	https://raw.githubusercontent.com/torvalds/linux/aa50faff4416c869b52dff68a937c84d29e12f4b/README

# Same, but by referencing the file relative to the commit
x-git-object:aa50faff4416c869b52dff68a937c84d29e12f4b#README	urn:sha1:ZIO4GZICFXFKOKG7WEN43ZAK2PGOAV2L
x-git-object:aa50faff4416c869b52dff68a937c84d29e12f4b?repository=https://github.com/torvalds/linux.git#README	urn:sha1:ZIO4GZICFXFKOKG7WEN43ZAK2PGOAV2L

# blob header + "Hello, world!\n", as stored (zlib-compressed) by Git in .git/objects/af/5626b4a114abcb82d63db7c8082c3c4756e51b
# (note that the hex-encoded SHA-1 of this is "af5626b4a114abcb82d63db7c8082c3c4756e51b")
x-git-object:af5626b4a114abcb82d63db7c8082c3c4756e51b?encoding=git-object	urn:sha1:V5LCNNFBCSV4XAWWHW34QCBMHRDVNZI3	data:,blob%2014%00Hello,%20world!%0a

# Encoded tree object containing simply hello-world.txt = x-git-object:af5626b4a114abcb82d63db7c8082c3c4756e51b
x-git-object:50318d4d5ad8a79c84b56ff54861af91b2111c8e?encoding=git-object	urn:sha1:KAYY2TK23CTZZBFVN72UQYNPSGZBCHEO	data:,tree%2043%00100644%20hello-world.txt%00%AFV%26%B4%A1%14%AB%CB%82%D6%3D%B7%C8%08%2C%3CGV%E5%1B

# The tree itself, which is an abstract data structure, hence no equivalent SHA1 URI
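
The equivalences in these test vectors can be recomputed with a short script. This is a sketch with made-up helper names (`git_object_bytes`, `x_git_object_uri`, `sha1_urn`, `data_uri`); it relies on the facts used above: a Git object ID is the hex SHA-1 of the header plus content, and a urn:sha1 name is the Base32-encoded SHA-1 of the raw bytes. Python emits uppercase percent-escapes (`%0A` rather than `%0a`), which are equivalent.

```python
import base64
import hashlib
from urllib.parse import quote


def git_object_bytes(obj_type: str, body: bytes) -> bytes:
    """Body prefixed with Git's "<type> <length><NUL>" header."""
    return f"{obj_type} {len(body)}".encode("ascii") + b"\x00" + body


def x_git_object_uri(obj_type: str, body: bytes) -> str:
    """The object ID is the hex SHA-1 of the header plus the body."""
    return "x-git-object:" + hashlib.sha1(git_object_bytes(obj_type, body)).hexdigest()


def sha1_urn(data: bytes) -> str:
    """urn:sha1 uses the Base32 form of the SHA-1 of the bytes as given."""
    return "urn:sha1:" + base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")


def data_uri(data: bytes) -> str:
    """Percent-encode the bytes into a data: URI."""
    return "data:," + quote(data, safe=",!")


for content in (b"", b"Hello, world!", b"Hello, world!\n"):
    print(x_git_object_uri("blob", content), sha1_urn(content), data_uri(content), sep="\t")

# With ?encoding=git-object the resolved bytes include the header, so the
# urn:sha1 of those bytes is the Base32 form of the object ID itself.
encoded = git_object_bytes("blob", b"Hello, world!\n")
print(x_git_object_uri("blob", b"Hello, world!\n") + "?encoding=git-object",
      sha1_urn(encoded), data_uri(encoded), sep="\t")
```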

De-proposed bits

I have decided it's probably better to have just one scheme and parameterize some options, but I previously suggested these:

References the object file with hash 'asdf1234', including the header. This allows one to reference non-blob objects and parse them using Git's native encoding.

Parameterized alternative: x-git-object:asdf1234?encoding=git-object
Equivalent to x-git-object, but asserts that the named object is a commit. Maybe x-git-tree and x-git-blob could be their own schemes, too.

Parameterized alternative: x-git-object:asdf1234?type=commit (or 'blob', or 'tree', etc)

See Also