These are the slides for a tech talk I did in 2015.

You may also be interested in HashURNs.html, an article specifically about representing hashes as URNs and some related topics.

A note on the medium:

Opera used to support @media projection, which was good for making slide shows.

Nobody supports that any more, including Opera, which is lame.

So I hacked together some JavaScript to simulate slideshow mode.


Hit n to see the first slide.

Content-Addressable Storage

Naming Stuff using Hashes

Other Approaches?

[AUDIENCE PARTICIPATION]

Some Other Approaches

Problems with Those Approaches

Names can be ambiguous

Hard to verify that what you asked for is what you got.

Cache invalidation is haaaard.

(Especially in distributed systems.)

A Naming Convention To Solve All?

Cryptographic Hashes

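The slide content under this heading didn't survive, but the properties the rest of the talk leans on are easy to demonstrate. A minimal sketch using Python's standard `hashlib`; picking SHA-256 here is an assumption (the Git and BitTorrent examples later on use SHA-1):

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


a = sha256_hex(b"Hello!\n")
b = sha256_hex(b"Hello!\n")
c = sha256_hex(b"Hello?\n")  # differs from a's input by one byte

print(a == b)  # True: same bytes, same name, every time
print(len(a), len(sha256_hex(b"x" * 1_000_000)))  # 64 64: fixed-size output
print(a == c)  # False: a tiny change gives a completely different digest
```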

A Parable

That solution's kind of ugly because you pass the location of the file (the URL) and the data to verify it (the hash) separately. And Bob's probably not going to actually check the hash, because people are lazy.
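For comparison, here is roughly what Bob would have to do by hand: fetch the bytes from the URL, then hash them and compare against the separately published digest. A minimal sketch; the function name and the choice of SHA-256 are assumptions, and an in-memory buffer stands in for the downloaded file:

```python
import hashlib
import hmac
import io
from typing import BinaryIO


def stream_matches_hash(stream: BinaryIO, expected_sha256_hex: str) -> bool:
    """Hash a binary stream in chunks and compare it with a published SHA-256 hex digest."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(65536), b""):
        h.update(chunk)
    # compare_digest does a constant-time comparison; harmless here, and a good habit.
    return hmac.compare_digest(h.hexdigest(), expected_sha256_hex.lower())


# Usage: an in-memory buffer stands in for the file Bob downloaded.
published = hashlib.sha256(b"Hello!\n").hexdigest()
print(stream_matches_hash(io.BytesIO(b"Hello!\n"), published))    # True
print(stream_matches_hash(io.BytesIO(b"Tampered\n"), published))  # False
```

Nothing forces Bob to run this, and the URL and the hash still travel separately, which is the point of the parable.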

Here's an Idea

Let's just identify the file by its hash.

As long as the hash matches, we don't care where it came from!
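A toy version of that idea: the name of a piece of data is its hash, and a fetch accepts bytes from any source, trusted or not, as long as they hash to the requested name. The `sha256:` prefix and the function names are hypothetical, chosen for illustration:

```python
import hashlib
from typing import Callable, Iterable, Optional

# A "source" is anything that, given an address, may return some bytes (or None).
Source = Callable[[str], Optional[bytes]]


def address_of(data: bytes) -> str:
    """Hypothetical naming format: 'sha256:' followed by the hex digest."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def fetch_by_address(address: str, sources: Iterable[Source]) -> bytes:
    """Try each source in turn; keep the first response whose hash matches the address."""
    for source in sources:
        data = source(address)
        if data is not None and address_of(data) == address:
            return data
        # Missing or wrong data: ignore this source and try the next one.
    raise LookupError(f"no source returned content for {address}")


# Usage: one lying mirror and one honest one, both plain dicts here.
content = b"Hello!\n"
addr = address_of(content)
evil_mirror = {addr: b"malware"}
good_mirror = {addr: content}
print(fetch_by_address(addr, [evil_mirror.get, good_mirror.get]) == content)  # True
```

The caller never has to decide which mirror to trust; the address itself does the verifying.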

Systems that do this


Bittorrent

Pirated-Movie.torrent:

announce: http://9.rarbg.com:2710/announce
info:
    name: Pirated-Movie.mkv
    piece length: 262144
    length: 12345678
    pieces:
        ac9e79797f8208a6c1b5c7f79a6bafdd07bd0bb2
        be2e9400d39a60b75fed2bb25851ccd48b6151c8
        7f0a9dfe35531f7eee14112ef34ac4434598b2de
        1a7f3107f9b92ea07dbda6f7cc07f17eaa1abdaa
        b8c9bdec5a025eaeca6584572b88def2ce55cb76
        636ac031b749e214f1e5254211590eb7e4a3bbe2
        ...
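The `pieces` value is where the hashing happens: the file is cut into `piece length`-sized chunks and each chunk's SHA-1 digest is stored as 20 raw bytes, all concatenated (the slide shows them in hex, one per line). A minimal single-file sketch; multi-file torrents hash the concatenation of all their files, which this ignores:

```python
import hashlib


def torrent_pieces(data: bytes, piece_length: int = 262144) -> bytes:
    """Concatenate the 20-byte SHA-1 digest of each piece; the last piece may be short."""
    return b"".join(
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    )


def pieces_as_hex_lines(pieces: bytes) -> list:
    """Split the pieces string into 40-character hex lines, as displayed on the slide."""
    return [pieces[i:i + 20].hex() for i in range(0, len(pieces), 20)]


# Usage with a tiny piece length so the output stays short.
for line in pieces_as_hex_lines(torrent_pieces(b"Hello, BitTorrent!", piece_length=8)):
    print(line)
```

With the slide's numbers, a 12345678-byte file in 262144-byte pieces gives 48 pieces (47 full ones plus a 24910-byte tail), so `pieces` would be 960 bytes.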

Bittorrent

magnet:?xt=urn:btih:103863b4749ab394f0235ef2a4d988663d7d8579

(See Magnet URI Scheme [Wikipedia])
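The `btih` in the magnet link is the torrent's info-hash: the SHA-1 of the bencoded `info` dictionary. A minimal sketch of a bencode encoder plus the info-hash; the `info` dict below is made up (its `pieces` is a placeholder), so it will not reproduce the hash in the link above. When hashing an existing .torrent, hash the exact bytes of its `info` entry as they appear in the file; re-encoding only gives the same answer if the original encoding was canonical.

```python
import hashlib
from typing import Union

Bencodable = Union[int, bytes, str, list, dict]


def bencode(value: Bencodable) -> bytes:
    """Encode ints, byte strings, text, lists and dicts in BitTorrent's bencode format."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings, emitted in sorted (raw byte) order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(info: dict) -> str:
    """SHA-1 of the bencoded info dictionary, as hex (v1 torrents)."""
    return hashlib.sha1(bencode(info)).hexdigest()


def magnet_link(info: dict) -> str:
    return "magnet:?xt=urn:btih:" + info_hash(info)


# Made-up info dict; the pieces value is a placeholder, not real piece hashes.
info = {
    "name": "Pirated-Movie.mkv",
    "piece length": 262144,
    "length": 12345678,
    "pieces": b"\x00" * 20 * 48,
}
print(magnet_link(info))
```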

Git


3 primary object types: commits, trees, and blobs

Objects are identified by the hash of their serialized form (with a header)
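The header is concrete: the object type, a space, the body length in bytes, and a NUL byte; the object name is the SHA-1 of header plus body. A minimal sketch for blobs (this covers Git's classic SHA-1 object format, not the newer SHA-256 repository format):

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <length>\\0' followed by the raw body bytes."""
    header = f"{obj_type} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def git_blob_id(content: bytes) -> str:
    """A blob's body is just the file contents."""
    return git_object_id("blob", content)


print(git_blob_id(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

In a repository without content filters, `git_blob_id(b"Hello!\n")` should match what `git hash-object hello.txt` prints for the file created in the session that follows.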

Git

$ echo 'Hello!' > hello.txt
$ git add hello.txt
$ git commit -m "Hello."

[master (root-commit) 49d177e] Hello.
 1 file changed, 1 insertion(+)
 create mode 100644 hello.txt

$ echo 'Goodbye!' > goodbye.txt
$ git add goodbye.txt
$ git commit -m "Add a goodbye file."

[master 83b2093] Add a goodbye file.
 1 file changed, 1 insertion(+)
 create mode 100644 goodbye.txt

Git


83b209323ce9c03bd3c9e8fa9e3139c5acc5f54c:

commit 234 
tree 503333de279da4cef9c359be02651fc285dfcc6f
parent 49d177e8179e0b3d5c21827e181811f4c1c7839c
author Dan Stevens <stevens@earthit.com> 1447783213 -0600
committer Dan Stevens <stevens@earthit.com> 1447783213 -0600

Add a goodbye file.

503333de279da4cef9c359be02651fc285dfcc6f:

tree 76 
100644 goodbye.txt b04c55e39ec0138a2f04ffa29191457abc658bac
100644 hello.txt 10ddd6d257e01349d514541981aeecea6b2e741d

b04c55e39ec0138a2f04ffa29191457abc658bac:

blob 9 
Goodbye!

10ddd6d257e01349d514541981aeecea6b2e741d:

blob 7 
Hello!

49d177e8179e0b3d5c21827e181811f4c1c7839c:

commit 173 
tree 04541bff04d1d03bae56102f80bd68cb0be0e167
author Dan Stevens <stevens@earthit.com> 1447778840 -0600
committer Dan Stevens <stevens@earthit.com> 1447778840 -0600

Hello.

04541bff04d1d03bae56102f80bd68cb0be0e167:

tree 37 
100644 hello.txt 10ddd6d257e01349d514541981aeecea6b2e741d
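The sizes in those headers can be checked by rebuilding the bodies. In the raw format a tree entry is `<mode> <name>` plus a NUL byte, followed by the 20-byte binary hash (the listing shows it in hex for readability), and a commit is plain text. A minimal sketch that handles files only (Git sorts a subdirectory as if its name ended in `/`, which this ignores); with the values from the listing, the bodies come out to 37, 76, 173 and 234 bytes, matching the headers:

```python
import hashlib
from typing import List, Tuple

Entry = Tuple[str, str, str]  # (mode, name, hex object id)


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <length>\\0' followed by the body (same rule as for blobs)."""
    header = f"{obj_type} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: List[Entry]) -> bytes:
    """Raw tree body: '<mode> <name>\\0' + 20-byte binary id per entry, sorted by name.

    Files only: Git sorts subdirectories as if their names ended in '/', not handled here.
    """
    out = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode("utf-8")):
        out += f"{mode} {name}\0".encode("utf-8") + bytes.fromhex(hex_id)
    return out


def commit_body(tree: str, parents: List[str], author: str, committer: str, message: str) -> bytes:
    """Raw commit body: header lines, a blank line, then the message."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}"]
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


hello = ("100644", "hello.txt", "10ddd6d257e01349d514541981aeecea6b2e741d")
goodbye = ("100644", "goodbye.txt", "b04c55e39ec0138a2f04ffa29191457abc658bac")
print(len(tree_body([hello])), len(tree_body([hello, goodbye])))  # 37 76

ident = "Dan Stevens <stevens@earthit.com> 1447778840 -0600"
first = commit_body("04541bff04d1d03bae56102f80bd68cb0be0e167", [], ident, ident, "Hello.\n")
print(len(first))  # 173
print(git_object_id("commit", first))
```

If the listing is byte-exact, the last line should print the first commit's ID; the sketch itself only vouches for the sizes.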

Features of Git's Data Model

Git Weirdnesses

Git-like Systems

Merkle Trees

Hashes of pairs of hashes of pairs of hashes ... of chunks of your data
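A minimal sketch of computing a Merkle root over a list of chunks with SHA-256. How to handle an odd node out varies by system (Bitcoin duplicates it; this sketch carries it up unchanged), and careful designs such as RFC 6962 hash leaves and interior nodes with different prefixes; the choices here are assumptions for illustration:

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(chunks: List[bytes]) -> bytes:
    """Hash each chunk, then repeatedly hash adjacent pairs until one hash remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    level = [sha256(chunk) for chunk in chunks]
    while len(level) > 1:
        next_level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # odd node out is carried up unchanged
        level = next_level
    return level[0]


chunks = [b"chunk 0", b"chunk 1", b"chunk 2", b"chunk 3"]
print(merkle_root(chunks).hex())
print(merkle_root(chunks) == merkle_root([b"chunk 0", b"chunk 1", b"chunk X", b"chunk 3"]))  # False
```

Changing any one chunk changes its leaf hash and every hash on the path up to the root, so a single root hash names (and lets you verify) arbitrarily large data, chunk by chunk.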

Merkle Trees

Benefits of Hash-Based Addressing

Limitations of Hash-Based Addressing

Signature-Based Addresses

One solution to the "Can't update" problem.

Identifier includes public key of trusted author.

This scheme could be used by fully distributed systems, in conjunction with hash-based keys, to construct updatable entities. This is how Freenet implements SSKs (Signed Subspace Keys).

Query might be something like: "Give me the latest commit signed by the guy with public key urn:sha1:W6QJK24LSJ64HSIYWUQXXLA7RPKB43YY"
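A sketch of how such a query could be answered: the address is a hash of the author's public key, and a resolver accepts only records whose key hashes to that address and whose signature checks out, then picks the newest. The record layout, the sequence number, the `urn:sha1:` plus base32 encoding of the key hash, and the function names are all assumptions for illustration, not Freenet's actual SSK format. Signature verification is passed in as a function so a real scheme (for example Ed25519) can be plugged in:

```python
import base64
import hashlib
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class SignedRecord:
    public_key: bytes  # author's public key, in whatever encoding the scheme uses
    sequence: int      # hypothetical version counter; higher means newer
    payload: bytes     # e.g. the hash of the latest commit
    signature: bytes   # signature over signed_message(sequence, payload)


# (public_key, message, signature) -> True if the signature is valid
Verifier = Callable[[bytes, bytes, bytes], bool]


def key_address(public_key: bytes) -> str:
    """Hypothetical address format: 'urn:sha1:' + base32(SHA-1(public key))."""
    return "urn:sha1:" + base64.b32encode(hashlib.sha1(public_key).digest()).decode("ascii")


def signed_message(sequence: int, payload: bytes) -> bytes:
    """The bytes the author signs: an 8-byte big-endian sequence number, then the payload."""
    return sequence.to_bytes(8, "big") + payload


def resolve_latest(address: str, candidates: Iterable[SignedRecord],
                   verify: Verifier) -> Optional[SignedRecord]:
    """Return the newest record signed by the key this address names, or None."""
    best = None
    for rec in candidates:
        if key_address(rec.public_key) != address:
            continue  # signed by somebody else
        if not verify(rec.public_key, signed_message(rec.sequence, rec.payload), rec.signature):
            continue  # forged or corrupted
        if best is None or rec.sequence > best.sequence:
            best = rec
    return best
```

With the `cryptography` package, for instance, `Ed25519PublicKey.verify` raises `InvalidSignature` instead of returning a boolean, so it would need a small wrapper to fit the `Verifier` shape.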

Conclusion