date: 2021-12-17
tags: ontology, typescript
title: Overthinking TypeScript Types

Overthinking TypeScript Types - 2021-12-17 - Entry 32 - TOGoS's Project Log

I've been up to some weird stuff lately, like:

TODO: Pix of goat shacks and snow trails

And also

Overthinking Types In TypeScript

Today's detour into the ontological wilderness started because I was trying to solve the following TypeScript interface for WhatGoesHere:


interface Resolver {
	resolve( reference:string|Expression ) : Promise<SimplifiedExpression>;
	getBlob( expr:SimplifiedExpression ) : Promise<Blob>;
	getObject( expr:SimplifiedExpression ) : Promise<WhatGoesHere>;
}

Some context: This is part of a build system that I'm writing in TypeScript, similar in idea to NetKernel, which is all about encoding processes in URIs so that the results can be easily cached ("Resource-oriented computing", I guess they call it). Sometimes the result of a process is a sequence of bytes. But other times it is a more abstract concept, like 'a directory'. Sure, you can come up with encodings for those things, but sometimes you want to refer to the idea itself. For some URI schemes (such as x-git-object), you don't know from the URI whether the result will be a blob or something else. In the case of x-git-object, it could be a commit, tree, or tag object.

There are some different ways I could solve this. One would be to forget about getObject entirely, and instead have a sub-type of SimplifiedExpression that includes information on how to interpret some other value. I'm currently leaning towards this approach, since it avoids adding more concepts to the system.

But another option would be to have getObject return an RDF node representing the object! I like RDF because it can represent anything. Just make up new types and predicates as needed. But there's no obvious canonical way to represent a node of an RDF graph as a JavaScript object. Off the top of my head, there are a couple of different ways one might do it. One in which you model the 'RDF node' type:


interface RDFObject1 {
	uri?: string,
	simpleValue?: string,
	attributes?: {[attr:string] : RDFObject1[]},	
}

and another in which you blur the distinction between the JavaScript object and the RDF node that it represents by cramming the predicates in as regular attributes:


type RDFObject2 = {[predicate:string]: RDFValue2[]}

type RDFValue2 = RDFObject2|string;
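
To make the difference concrete, here is a rough sketch of the same fact ("this thing has the name 'src'") written in both shapes. The example.org URIs and the two reader functions are made up for illustration; they are not part of any real vocabulary or of my build system.

```typescript
// Shape 1: the JavaScript object explicitly models an RDF node.
interface RDFObject1 {
	uri?: string;
	simpleValue?: string;
	attributes?: { [attr: string]: RDFObject1[] };
}

// Shape 2: predicates are crammed in as regular properties.
type RDFObject2 = { [predicate: string]: RDFValue2[] };
type RDFValue2 = RDFObject2 | string;

// Hypothetical predicate URI, for illustration only.
const NAME_PREDICATE = "http://example.org/ns/name";

const asShape1: RDFObject1 = {
	uri: "http://example.org/things/some-directory",
	attributes: {
		[NAME_PREDICATE]: [{ simpleValue: "src" }],
	},
};

// Note: as typed, shape 2 has no slot for the node's own URI,
// because every property has to be a list of values.
const asShape2: RDFObject2 = {
	[NAME_PREDICATE]: ["src"],
};

// Reading the first simple value of a predicate out of each shape:
function firstSimpleValue1(obj: RDFObject1, predicate: string): string | undefined {
	return obj.attributes?.[predicate]?.[0]?.simpleValue;
}

function firstSimpleValue2(obj: RDFObject2, predicate: string): string | undefined {
	const value = obj[predicate]?.[0];
	return typeof value === "string" ? value : undefined;
}
```

Both readers give back "src" for the objects above, but the code that consumes each shape looks quite different, which is why the caller needs to know which one it has been handed.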

Different applications might prefer different representations, quite possibly non-RDF-based ones. So I didn't want to lock my Resolver interface into any one representation of non-blob objects. But if WhatGoesHere is any, then I need some way to indicate what representation is being used!

The solution I spent this afternoon exploring is to add an attribute to the object. Since symbols are now a thing, I went with:


const SCHEMA_SYMBOL = Symbol.for("http://ns.nuke24.net/Synx/schema");

interface RDFObject1 {
	[SCHEMA_SYMBOL]?: "http://ns.nuke24.net/Synx/Schema/RDFObject1",
	uri?: string,
	simpleValue?: string,
	attributes?: {[attr:string] : RDFObject1[]},	
}

Why a symbol? Why a long name? Short answer: to avoid being confused with the rest of the stuff in the object.

Long, rambly answer:

Something that I think is a bit of a misfeature of JavaScript is that its objects mix data and metadata. In a language that conventionally separates these, like Java, you might have an object that implements the Map interface, which can map any key to any value (someMap.get("foo")), and also provide information about itself, e.g. to ask how many items are in it (someMap.size()). JavaScript eventually added a Map type, but from years of not having it (and it not being representable in JSON), objects in JavaScript are often used as collections that include arbitrary keys. x.size and x["size"] mean exactly the same thing.
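
A small sketch of what I mean, using a made-up word-count collection (the variable names are just for this example):

```typescript
// A plain object used as a collection: keys and "attributes" share one namespace.
const wordCounts: { [word: string]: number } = {};
wordCounts["size"] = 3;          // meant as data: the word "size" showed up 3 times
const viaDot = wordCounts.size;  // ...but it reads exactly like an attribute

// A Map keeps the two apart: .size is metadata, .get("size") is data.
const wordCountMap = new Map<string, number>();
wordCountMap.set("size", 3);
wordCountMap.set("then", 1);

const howManyEntries = wordCountMap.size;        // number of entries in the map
const countForSize = wordCountMap.get("size");   // the value stored under "size"

// The catch: JSON.stringify doesn't know what to do with a Map's contents.
const mapAsJson = JSON.stringify(wordCountMap);
const objectAsJson = JSON.stringify(wordCounts);
```

Here viaDot and wordCounts["size"] are the same 3, while the Map reports 2 entries and still returns 3 for the "size" key. The Map serializes as an empty object, and the plain object round-trips fine.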

Putting attributes and collection elements into the same space presents some issues:

I'd say "never use arbitrarily keyed objects; always use Map for that kind of thing", except there's no way to represent Maps in JSON. So if you're serializing your data as JSON you'll need those objects-as-collections anyway.

One method to differentiate data from metadata attributes is to put metadata in a prototype, and use Object.hasOwn to determine if a given attribute is meant as data.
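
Roughly like this. The inventory object and the schemaName property are hypothetical, and I've spelled the check as Object.prototype.hasOwnProperty.call, which does the same job as Object.hasOwn for this purpose but also works on older targets:

```typescript
// Metadata lives on the prototype; data lives in own properties.
const inventoryMeta = { schemaName: "Inventory" }; // hypothetical metadata

const inventory: { [key: string]: unknown } = Object.create(inventoryMeta);
inventory["apples"] = 3;
inventory["pears"] = 7;

function isDataKey(obj: object, key: string): boolean {
	// Equivalent in effect to Object.hasOwn(obj, key)
	return Object.prototype.hasOwnProperty.call(obj, key);
}

const metaStillReadable = inventory["schemaName"];  // found via the prototype
const applesIsData = isDataKey(inventory, "apples");
const schemaNameIsData = isDataKey(inventory, "schemaName");
const ownKeys = Object.keys(inventory);             // own enumerable keys only
```

The metadata is still readable through normal property access, but the own-property check (and Object.keys) only sees "apples" and "pears".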

Another is to use symbols for the metadata (I really wish they had done this with .then), hence my SCHEMA_SYMBOL symbol. This still has the problem of not being representable in JSON, but allows you to differentiate between different kinds of properties without involving a prototype. It also allows you to add metadata on objects used by code that, for better or worse, uses Objects as keyed collections (perhaps because it reads them straight from JSON!).
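
A quick sketch of that, assuming a made-up symbol key and a score table that came straight out of JSON (META_SYMBOL, Tagged, and the data are all invented for this example):

```typescript
// Hypothetical metadata key; any globally agreed-upon string would do.
const META_SYMBOL = Symbol.for("http://example.org/ns/meta");

type Tagged<T> = T & { [META_SYMBOL]?: string };

// An object used as a keyed collection, read straight from JSON.
const scores: Tagged<Record<string, number>> = JSON.parse('{"alice":3,"bob":5}');

// Attach metadata without touching the string-keyed data.
scores[META_SYMBOL] = "score table";

const stringKeys = Object.keys(scores);                   // symbols are skipped
const symbolKeys = Object.getOwnPropertySymbols(scores);  // the metadata lives here
const serialized = JSON.stringify(scores);                // symbol keys are dropped
```

Code that loops over the string keys never notices the tag, and the tag silently disappears when the object is serialized, which is both the feature and the limitation.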

Being almost representable as JSON is a pretty compelling feature to me, because writing translators between serialized and in-memory forms is a pain. This might be a lousy reason for preferring prototype-less APIs, though, since Object.create(prototype, ownProperties) is a thing. Well, it might still be annoying to do for deeply nested structures.
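
For a flat object, the Object.create route might look something like the following. PlainPoint and pointProto are invented for this sketch; the point is mostly to show how much ceremony there is even for two properties.

```typescript
type PlainPoint = { x: number; y: number };

// Hypothetical prototype carrying behavior (or metadata) for points.
const pointProto = {
	describe(this: PlainPoint): string {
		return `(${this.x}, ${this.y})`;
	},
};

// What you get back from JSON: data, no prototype of interest.
const plainPoint: PlainPoint = JSON.parse('{"x":1,"y":2}');

// Re-home the data under the prototype, one property descriptor at a time.
const revivedPoint: PlainPoint & typeof pointProto = Object.create(pointProto, {
	x: { value: plainPoint.x, enumerable: true, writable: true, configurable: true },
	y: { value: plainPoint.y, enumerable: true, writable: true, configurable: true },
});

const description = revivedPoint.describe();
const backToJson = JSON.stringify(revivedPoint); // own enumerable properties only
```

That works, and the result still serializes to the same JSON, but every nested object would need the same treatment.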

Another reason I prefer to use prototype-less (prototype-agnostic, really) objects is that it allows different libraries to agree on data types without having to reference some shared object. instanceof locks you into a specific implementation, but anyone can create an Object with some properties. This is also why I like that TypeScript interfaces are entirely structural. As long as you can construct an object of the matching shape, it is a valid instance of that interface, and that means I can write libraries that use compatible data types without having to 'know' about each other. My type RationalNumber = { numerator: number, denominator: number } objects will be accepted in any library that expects rational numbers to be that shape. And moreover, the TypeScript compiler can verify that this is all hunky dory.
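
As a sketch, pretend the two halves below come from two libraries that have never heard of each other. The function names and the Fraction interface are made up for the example.

```typescript
// "Library A" declares its own rational type...
type RationalNumber = { numerator: number; denominator: number };

function multiplyRationals(a: RationalNumber, b: RationalNumber): RationalNumber {
	return {
		numerator: a.numerator * b.numerator,
		denominator: a.denominator * b.denominator,
	};
}

// ..."Library B" independently declares one with the same shape.
interface Fraction {
	numerator: number;
	denominator: number;
}

function fractionToDecimal(f: Fraction): number {
	return f.numerator / f.denominator;
}

// Values flow between them with no shared class, import, or conversion.
const oneHalf: RationalNumber = { numerator: 1, denominator: 2 };
const oneQuarterAsDecimal = fractionToDecimal(multiplyRationals(oneHalf, oneHalf));
```

The compiler accepts a RationalNumber wherever a Fraction is expected because the shapes match; neither declaration references the other.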

So yeah, all that is to say that [Symbol.for("http://ns.nuke24.net/Synx/schema")]: "whatever" struck me as a reasonable way to add a bit of metadata to objects without messing with prototypes or having that tag mistaken as part of the object's data.
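
Here's a rough sketch of writing the tag and reading it back. RDFObject1 is the interface from earlier; OtherObject, its example.org schema URI, and the schemaOf helper are made up for illustration.

```typescript
const SCHEMA_SYMBOL = Symbol.for("http://ns.nuke24.net/Synx/schema");

interface RDFObject1 {
	[SCHEMA_SYMBOL]?: "http://ns.nuke24.net/Synx/Schema/RDFObject1";
	uri?: string;
	simpleValue?: string;
	attributes?: { [attr: string]: RDFObject1[] };
}

// A second, hypothetical schema with an overlapping visible shape.
interface OtherObject {
	[SCHEMA_SYMBOL]?: "http://example.org/Schema/Other";
	uri?: string;
}

const taggedNode: RDFObject1 = {
	[SCHEMA_SYMBOL]: "http://ns.nuke24.net/Synx/Schema/RDFObject1",
	uri: "http://example.org/thing",
};
const untaggedNode: RDFObject1 = { uri: "http://example.org/thing" };

// const oops: OtherObject = untaggedNode; // Compile error! Schema types don't match.

// For callers that only have an 'any' or 'object' in hand:
function schemaOf(obj: object): string | undefined {
	const tag = (obj as { [SCHEMA_SYMBOL]?: unknown })[SCHEMA_SYMBOL];
	return typeof tag === "string" ? tag : undefined;
}
```

Because Symbol.for looks the symbol up in the global registry by its string key, any other module that calls Symbol.for with the same URI gets the same symbol and can read the tag without importing anything from me.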

Note that the property is optional (as indicated by the question mark in { [SCHEMA_SYMBOL]?: ... }). If you're building a library where there's no ambiguity (nobody's returning an RDFObject as any and hoping the caller is smart enough to differentiate objects from meta-objects), you can skip actually assigning that attribute, and the TypeScript compiler will still be able to enforce that no objects with incompatible schemas can be assigned. What the above definition of RDFObject1 is really saying is: if the object does have the SCHEMA_SYMBOL property, the value needs to be "http://ns.nuke24.net/Synx/Schema/RDFObject1"! And by declaring the type of a property that's not even there, we've effectively turned a structural type into a nominal type! Hey, what else can we do with this?

Well, there's the old enum FooMarker { }; type Foo = string & FooMarker; trick, which allows you to differentiate different types of strings. I never much liked it. Maybe because it requires defining two types. And also because then you have to do some trickery to assign a string to your foo. But that business with optional properties led me to this solution:


type Feet = number & { unitName?: "foot" };
type Meters = number & { unitName?: "meter" };

function toMeters(feet: Feet) : Meters {
	return feet * 0.3048; // Okay because converted to number and back.
}

toMeters(3); // Okay, but don't do this.
toMeters(3 as Feet); // Better!
toMeters(5 as Meters); // Compile error!

This provides stronger typing than simply aliasing primitive types (type Feet = number), but weaker than if you intersect the type with an enum or an interface with non-optional properties, which you can still fake by explicitly casting. For primitive types the stronger version might actually be preferable, since it would prevent the above toMeters(3) from compiling, but if using this pattern to differentiate types of actual objects, it might be a good idea to make such 'fake' marker properties optional so as not to be confused with properties that you could expect to actually be present at runtime! [1][2]
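
For comparison, a sketch of the stronger variant with a non-optional marker. The Strict names are made up here so they don't collide with Feet and Meters above.

```typescript
type StrictFeet = number & { readonly unitName: "foot" };
type StrictMeters = number & { readonly unitName: "meter" };

function strictToMeters(feet: StrictFeet): StrictMeters {
	// The result is a plain number, so now an explicit cast is required.
	return (feet * 0.3048) as StrictMeters;
}

// strictToMeters(3); // Compile error! A bare number has no unitName.

const tenFeet = 10 as StrictFeet;
const inMeters = strictToMeters(tenFeet);

// The downside: the type system now claims this property exists...
const claimedUnit: "foot" = tenFeet.unitName;
// ...but at runtime tenFeet is just the number 10, so claimedUnit is undefined.
```

That last bit is exactly the confusion I'd want to avoid on real objects: the compiler is happy to let you read a marker property that was never there.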

You can go kind of nuts with your fake types, safe in the knowledge that there's no runtime cost. Like, how about we distinguish between different dimensions? Just slap a fake dimension property on our fake unit objects! Or even reify them somewhat, but keeping the connection between quantities and their values compile-time only:


interface Dimension<DimName> {
	name: DimName,
}

interface Unit<UnitName, Dim> {
	name: UnitName,
	dimension: Dim,
	unitValue: number,
}

function dimension<DimName>(name:DimName) : Dimension<DimName> {
	return { name };
}
function unit<UnitName, DimName>(name:UnitName, dimension:Dimension<DimName>, unitValue:number) : Unit<UnitName, Dimension<DimName>> {
	return {
		name,
		dimension,
		unitValue,
	}
}

type Quantity<U> = number & { unit: U };

const LENGTH = dimension<"length">("length");
// Must make unit and dimension name types explicit
// or TypeScript deduces them to be 'string', undermining our type checking!
const METER = unit<"meter","length">("meter", LENGTH, 1.0);
const INCH = unit<"inch","length">("inch", LENGTH, 0.0254);

type Meters = Quantity<Unit<"meter", Dimension<"length">>>;
type Inches = Quantity<Unit<"inch", Dimension<"length">>>;

const DURATION = dimension<"duration">("duration");
const SECOND = unit<"second","duration">("second", DURATION, 1);
const HOUR = unit<"hour","duration">("hour", DURATION, 3600);

type Seconds = Quantity<Unit<"second", Dimension<"duration">>>;
type Hours = Quantity<Unit<"hour", Dimension<"duration">>>;

// I've had better luck using the unit name types as template parameters
// rather than the unit types themselves.
// Must be careful or TypeScript deduces the types to be looser than what I want!
// I use 'ValueUnitName extends FromUnitName' instead of just using FromUnitName
// in both the value and fromUnit types because the latter can expand to
// a union type like "meter"|"inch" instead of an error, and I want the error!
function convert<
	FromUnitName,
	ToUnitName,
	ValueUnitName extends FromUnitName, // FromUnitName is authoritative
	// Separating 'ToDimension' from 'FromDimension' isn't necessary,
	// I guess because the dimensions were made explicit in the unit types.
	// But I separate them anyway for clarity:
	FromDimension,
	ToDimension extends FromDimension, // FromDimension is authoritative
>(
	value: Quantity<Unit<ValueUnitName, FromDimension>>,
	fromUnit: Unit<FromUnitName, FromDimension>,
	toUnit: Unit<ToUnitName, ToDimension>,
) : Quantity<Unit<ToUnitName, ToDimension>> {
	return (value * fromUnit.unitValue / toUnit.unitValue) as Quantity<Unit<ToUnitName, ToDimension>>;
}

console.log( "3 inches is " + convert(3 as Inches, INCH, METER) + " meters" );
console.log( "3 inches is " + convert(3 as Inches, INCH, INCH) + " inches" );

// Compile error!  type "inch" is not assignable to type "meter":
console.log( "3 inches is " + convert(3 as Inches, METER, INCH) + " inches" );

// Compile error!  "duration" is not assignable to type "length":
console.log( "3 inches is " + convert(3 as Inches, INCH, HOUR) + " hours" );

Okay, maybe "just" was a bit of a stretch. I always find that things get tricky when there's template arguments, and I had to try a few different ways of writing them in order to get the compiler to error as I wanted.

I probably won't try to use such complicated types in real code. In projects where I model physical properties, I do track units at runtime using type ComplexAmount = {[unitName:string]: RationalNumber}, but I've never run into trouble mixing units of different dimensions.
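
For the curious, the runtime version is about this simple. The two type definitions are the ones I just mentioned; addRationals and addAmounts are a hypothetical sketch of the kind of helper that goes with them, not code lifted from a real project, and it doesn't bother reducing fractions.

```typescript
type RationalNumber = { numerator: number; denominator: number };
type ComplexAmount = { [unitName: string]: RationalNumber };

function addRationals(a: RationalNumber, b: RationalNumber): RationalNumber {
	return {
		numerator: a.numerator * b.denominator + b.numerator * a.denominator,
		denominator: a.denominator * b.denominator,
	};
}

// Adds amounts unit-by-unit; different units just sit side by side.
function addAmounts(a: ComplexAmount, b: ComplexAmount): ComplexAmount {
	const result: ComplexAmount = { ...a };
	for (const unitName of Object.keys(b)) {
		const addend = b[unitName];
		// Own-property check, since this is one of those objects-as-collections!
		result[unitName] = Object.prototype.hasOwnProperty.call(result, unitName)
			? addRationals(result[unitName], addend)
			: addend;
	}
	return result;
}

const boardLength: ComplexAmount = {
	foot: { numerator: 1, denominator: 1 },
	inch: { numerator: 6, denominator: 1 },
};
const extraBit: ComplexAmount = { foot: { numerator: 1, denominator: 2 } };
const totalLength = addAmounts(boardLength, extraBit);
```

Here totalLength ends up with 3/2 under "foot" and 6/1 under "inch". Nothing stops you from adding seconds into the same object, and the types won't complain.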

Alright, I've satisfied the part of me that was anxious about not having written this all down, and now I'm pretty tired of thinking about it. JavaScript's object model is a mess. Time to go feed the cows.