I've been contemplating a URI scheme for referencing objects in Git repositories.

Requirements

I would want this scheme to be able to:

It would be nice if it could also:

Proposal

x-git-object:asdf1234
References the object stored in the object file with hash 'asdf1234' (a placeholder for a hex-encoded SHA-1, which is how Git normally presents hashes). If the object is a blob, this URI resolves to the blob itself (the same sequence of bytes that was at one point `git add`ed). If the referenced object is not a blob (for example, a tree or commit object), it is not directly representable as a byte stream, and a strict resolver would return an error when asked for one (e.g. by a request for http://some-resolver/uri-res/N2R?x-git-object:somecommithash).
x-git-object:asdf1234?repository=http://github.com/TOGoS/PHPN2R.git
References the object identified by 'asdf1234' and provides a repository from which to fetch it. More than one 'repository' parameter could be specified.
x-git-object:asdf1234#path/to/file
Assuming asdf1234 identifies a commit or tree object, #path/to/file would find the object called 'path/to/file' referenced by that commit or tree.
x-git-object:latest?branch=master&repository=http://github.com/TOGoS/PHPN2R.git
References the latest commit on branch 'master' in any of the listed repositories.
x-git-object:latest?branch=master&signedby=urn:bitprint:ABCDEFG
References the latest commit on branch 'master' signed using the specified public key. I'm not sure what format the public key would be in; possibly that output by 'gpg --export' (which you could fingerprint using 'gpg --with-fingerprint <file>').
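
To make the shape of these URIs concrete, here is a minimal Python sketch of how a resolver might take one apart. The function name parse_git_object_uri and its return shape are my own assumptions for illustration; nothing in the proposal mandates them. It splits off the fragment first, then the query, and keeps repeated parameters so that several 'repository' values survive.

```python
from urllib.parse import parse_qs, unquote

SCHEME = "x-git-object"


def parse_git_object_uri(uri):
    """Split an x-git-object URI into (object_id, params, path).

    Hypothetical helper: object_id is the hash (or 'latest'), params maps
    each parameter name to a list of values, path is the fragment or None.
    """
    # The fragment always comes last, so peel it off first.
    rest, _, fragment = uri.partition("#")
    rest, _, query = rest.partition("?")
    # Only the first ':' separates the scheme; later ones belong to the ID.
    scheme, sep, object_id = rest.partition(":")
    if not sep or scheme.lower() != SCHEME:
        raise ValueError("not an x-git-object URI: " + uri)
    # parse_qs keeps repeated keys as lists, e.g. several repositories.
    params = parse_qs(query)
    path = unquote(fragment) if fragment else None
    return object_id, params, path


if __name__ == "__main__":
    print(parse_git_object_uri(
        "x-git-object:asdf1234?repository=http://github.com/TOGoS/PHPN2R.git"
    ))
    print(parse_git_object_uri("x-git-object:asdf1234#path/to/file"))
```

Returning the parameters as lists is a design choice that keeps the "more than one 'repository'" case from needing special handling.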

Parameters

repository
The URI of a repository where the named object might be found. Multiple repositories may be specified.
branch
The name of a branch on which to find a commit when the ID is 'latest' rather than a hash.
signedby
The URI of a public key that the object must be signed by in order for this URI to resolve.
type
Assert that the named object is of a certain type: blob, tree, commit, or tag.
encoding
How the identified object should be encoded in the resolved resource. The default is no encoding, which means non-blob objects cannot be represented. Using "git-object" as the encoding would return objects in the same format that Git uses for storing them in .git/objects (before zlib compression), i.e. including a short header of the form "<type> <content length (bytes)><NUL byte>".
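
As a rough illustration of the "git-object" encoding, the following Python sketch builds the header plus content and hashes the result, which is how Git derives an object's ID. The function names are mine, chosen for this example only.

```python
import hashlib


def encode_git_object(obj_type, content):
    """Return '<type> <length>NUL' followed by the content bytes."""
    header = obj_type.encode("ascii") + b" " + str(len(content)).encode("ascii") + b"\x00"
    return header + content


def git_object_id(obj_type, content):
    """Hex SHA-1 of the encoded object, i.e. the ID Git would assign."""
    return hashlib.sha1(encode_git_object(obj_type, content)).hexdigest()


if __name__ == "__main__":
    data = b"Hello, world!\n"
    print(encode_git_object("blob", data))
    print("x-git-object:" + git_object_id("blob", data))
```

Note that this is the uncompressed form; what Git actually writes to disk is this byte sequence run through zlib.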

Examples

x-git-object:0d187173d04e67fef5e9178ee761786df04905f0?repository=https://github.com/TOGoS/TFMPM.git references this commit and hints to look for it in that GitHub repository.

x-git-object:669ac7c32292798644b21dbb5a0dc657125f444d is a specific snapshot of Linux's README file. Since this blob is part of commit a7ddce, we could also call it x-git-object:a7ddcea58ae22d85d94eabfdd3de75c3742e376b#README.
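
The '#path' lookup and the strict "blobs only" rule can be sketched in Python against an in-memory object store. Everything here is an assumption made for illustration: the store is a plain dict, the helper names (put, tree_entries, resolve) are invented, and the commit in the usage example is a stripped-down stand-in rather than a complete Git commit.

```python
import hashlib


def put(store, obj_type, content):
    """Add an object to the in-memory store and return its hex ID."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    object_id = hashlib.sha1(header + content).hexdigest()
    store[object_id] = (obj_type, content)
    return object_id


def tree_entries(content):
    """Yield (name, object_id) pairs from a raw tree object body."""
    pos = 0
    while pos < len(content):
        space = content.index(b" ", pos)      # end of the mode field
        nul = content.index(b"\x00", space)   # end of the entry name
        name = content[space + 1:nul].decode("utf-8")
        yield name, content[nul + 1:nul + 21].hex()  # 20-byte binary SHA-1
        pos = nul + 21


def resolve(store, object_id, path=None):
    """Return the blob bytes named by object_id and an optional '#path'."""
    obj_type, content = store[object_id]
    for name in (path.split("/") if path else []):
        if obj_type == "commit":
            # A commit body starts with the line "tree <hex id>".
            object_id = content.split(b"\n", 1)[0].split(b" ")[1].decode("ascii")
            obj_type, content = store[object_id]
        if obj_type != "tree":
            raise ValueError(f"cannot look up {name!r} inside a {obj_type}")
        object_id = dict(tree_entries(content))[name]  # KeyError if absent
        obj_type, content = store[object_id]
    # Strict behaviour: only blobs have a byte-stream representation.
    if obj_type != "blob":
        raise ValueError(f"{obj_type} objects have no byte-stream form")
    return content


if __name__ == "__main__":
    store = {}
    blob_id = put(store, "blob", b"Hello, world!\n")
    tree_id = put(store, "tree", b"100644 hello-world.txt\x00" + bytes.fromhex(blob_id))
    # Simplified, hypothetical commit: real ones also carry author lines.
    commit_id = put(store, "commit", b"tree " + tree_id.encode("ascii") + b"\n\nexample\n")
    print(resolve(store, commit_id, "hello-world.txt"))
```

A real resolver would fetch objects from one of the listed repositories instead of a dict, but the walk from commit to tree to blob would look much the same.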

Test vectors

URIs on the same line are equivalent. You should be able to copy-paste the data: URIs into your browser to see the actual data.

# Empty string
x-git-object:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	urn:sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ	data:,

# The string "Hello, world!"
x-git-object:5dd01c177f5d7d1be5346a5bc18a569a7410c2ef	urn:sha1:SQ5HALIG6NCZTLXB7DNI56PXFFQDDVUZ	data:,Hello,%20world!

# The string "Hello, world!" followed by a newline
x-git-object:af5626b4a114abcb82d63db7c8082c3c4756e51b	urn:sha1:BH5MRW75E66ZWTJDUAHLMSFKOULYSU3N	data:,Hello,%20world!%0a

# README from the Linux repo at commit aa50faff
x-git-object:669ac7c32292798644b21dbb5a0dc657125f444d	urn:sha1:ZIO4GZICFXFKOKG7WEN43ZAK2PGOAV2L	https://raw.githubusercontent.com/torvalds/linux/aa50faff4416c869b52dff68a937c84d29e12f4b/README

# Same, but by referencing the file relative to the commit
x-git-object:aa50faff4416c869b52dff68a937c84d29e12f4b#README	urn:sha1:ZIO4GZICFXFKOKG7WEN43ZAK2PGOAV2L
x-git-object:aa50faff4416c869b52dff68a937c84d29e12f4b?repository=https://github.com/torvalds/linux.git#README	urn:sha1:ZIO4GZICFXFKOKG7WEN43ZAK2PGOAV2L

# blob header + "Hello, world!\n", as stored (zlib-compressed) by Git in .git/objects/af/5626b4a114abcb82d63db7c8082c3c4756e51b
# (note that the hex-encoded SHA-1 of this is "af5626b4a114abcb82d63db7c8082c3c4756e51b")
x-git-object:af5626b4a114abcb82d63db7c8082c3c4756e51b?encoding=git-object	urn:sha1:V5LCNNFBCSV4XAWWHW34QCBMHRDVNZI3	data:,blob%2014%00Hello,%20world!%0a

# Encoded tree object containing simply hello-world.txt = x-git-object:af5626b4a114abcb82d63db7c8082c3c4756e51b
x-git-object:50318d4d5ad8a79c84b56ff54861af91b2111c8e?encoding=git-object	urn:sha1:KAYY2TK23CTZZBFVN72UQYNPSGZBCHEO	data:,tree%2043%00100644%20hello-world.txt%00%AFV%26%B4%A1%14%AB%CB%82%D6%3D%B7%C8%08%2C%3CGV%E5%1B

# The tree itself, which is an abstract data structure, hence no equivalent SHA1 URI
x-git-object:50318d4d5ad8a79c84b56ff54861af91b2111c8e
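
The blob rows above can be regenerated with a few lines of Python. This is only a sketch: the function name make_vector is made up, and I am assuming that urn:sha1 uses the Base32 form of the SHA-1 of the raw bytes, which is what the values in the vectors correspond to. Python's quote() emits uppercase hex digits, so the data: URI may differ in case from the ones written above while still being equivalent.

```python
import base64
import hashlib
from urllib.parse import quote


def make_vector(content):
    """Return the x-git-object, urn:sha1 and data: URIs for a blob's bytes."""
    # Git hashes the content behind a "blob <length>NUL" header.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    git_id = hashlib.sha1(header + content).hexdigest()
    # urn:sha1 hashes the raw bytes only; 20 bytes encode to 32 Base32 chars.
    sha1_b32 = base64.b32encode(hashlib.sha1(content).digest()).decode("ascii")
    return (
        "x-git-object:" + git_id,
        "urn:sha1:" + sha1_b32,
        "data:," + quote(content, safe=",!"),
    )


if __name__ == "__main__":
    for sample in (b"", b"Hello, world!", b"Hello, world!\n"):
        print("\t".join(make_vector(sample)))
```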

De-proposed bits

I have decided it's probably better to just have one scheme and parameterize some options, but previously suggested these:

x-git-object-encoded:asdf1234
References the object file with hash 'asdf1234', including the header. This allows one to reference non-blob objects and parse them using Git's native encoding.

Parameterized alternative: x-git-object:asdf1234?encoding=git-object
x-git-commit
Equivalent to x-git-object, but asserts that the named object is a commit. Maybe x-git-tree and x-git-blob could be their own schemes, too.

Parameterized alternative: x-git-object:asdf1234?type=commit (or 'blob', or 'tree', etc)
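
If anything had ever used the de-proposed schemes, translating them to the parameterized form would be mechanical. The Python sketch below shows one way; the function name modernize and the mapping table are hypothetical, and the x-git-tree and x-git-blob entries only cover the "maybe" schemes mentioned above.

```python
# Hypothetical mapping from each de-proposed scheme to its parameter.
LEGACY_SCHEMES = {
    "x-git-object-encoded": "encoding=git-object",
    "x-git-commit": "type=commit",
    "x-git-tree": "type=tree",
    "x-git-blob": "type=blob",
}


def modernize(uri):
    """Rewrite a de-proposed scheme as x-git-object plus a parameter."""
    scheme, sep, rest = uri.partition(":")
    param = LEGACY_SCHEMES.get(scheme)
    if not sep or param is None:
        return uri  # already x-git-object, or not one of these schemes
    # Keep any fragment at the end and append to an existing query.
    rest, hash_sep, fragment = rest.partition("#")
    joiner = "&" if "?" in rest else "?"
    return "x-git-object:" + rest + joiner + param + hash_sep + fragment


if __name__ == "__main__":
    print(modernize("x-git-object-encoded:asdf1234"))
    print(modernize("x-git-commit:asdf1234"))
```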

See Also

Changelog

2014-11-18
2018-09-21
2021-12-11
2021-12-13