Inside the .upg File
One product graph, one self-describing JSON file: canonical, diffable, and readable by any tool.
Key Takeaways
- A .upg file is one product graph written to disk as plain JSON, so it travels to any tool and reads cleanly to any person.
- The $upg header makes the file self-describing: format version, spec version, node and edge counts, provenance, and a body integrity checksum.
- Canonical serialisation means the same graph always produces byte-identical output, so a git diff shows the change in meaning rather than formatting noise.
- One read path (parseUpg / normalizeDocument) accepts both canonical files and older flat files, so tools stay forward and backward compatible.
- upg fmt enforces the canonical shape, the way gofmt or Prettier settle formatting for code.
A .upg file is one product graph, written to disk as one file. It is plain JSON, so any tool that can read JSON can read it, and any person can open it in an editor and follow it. What makes it a format rather than some JSON we happened to write is a small set of rules about how that JSON is structured and serialised. UPG-577 made those rules canonical. Here is what they are, and why they matter.
One file, one graph
The body of a .upg file is the graph: a list of nodes (the entities, such as personas, problems, features, metrics, and decisions) and a list of edges (the typed relationships between them). A product's whole structured memory lives in this one file. You can commit it to a repository, attach it to a pull request, hand it to a colleague, or pass it to an AI agent. It travels as a unit.
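The nodes-and-edges body can be pictured with a few TypeScript types. This is a minimal sketch, not the UPG specification's schema: the field names (id, type, properties, from, to) and the related helper are hypothetical, chosen only to show how typed edges tie entities together.

```typescript
// Hypothetical shapes: field names are illustrative, not taken from the UPG spec.
interface UpgNode {
  id: string;
  type: string; // e.g. "persona", "problem", "feature", "metric", "decision"
  properties?: Record<string, unknown>;
}

interface UpgEdge {
  from: string; // id of the source node
  to: string;   // id of the target node
  type: string; // the typed relationship between the two nodes
}

interface UpgGraph {
  nodes: UpgNode[];
  edges: UpgEdge[];
}

// Hypothetical helper: list the nodes directly connected to `id`,
// together with the type of the edge that links them.
function related(graph: UpgGraph, id: string): { node: UpgNode; via: string }[] {
  const byId = new Map(graph.nodes.map((n) => [n.id, n] as const));
  const out: { node: UpgNode; via: string }[] = [];
  for (const e of graph.edges) {
    // Follow the edge in either direction; skip edges that do not touch `id`.
    const otherId = e.from === id ? e.to : e.to === id ? e.from : undefined;
    const other = otherId === undefined ? undefined : byId.get(otherId);
    if (other) out.push({ node: other, via: e.type });
  }
  return out;
}
```

Because the whole graph sits in one value, a query like this needs nothing beyond the file itself.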
A self-describing header
Every canonical .upg file opens with a single $upg header object. It carries the metadata a reader needs before touching the graph itself:
- format_version: which version of the on-disk file format this is (currently 1.0.0).
- spec_version: which version of the UPG specification the entities conform to.
- A product summary plus node and edge counts, so a tool can describe the file without parsing all of it.
- provenance: which tool wrote the file, and when.
- integrity: a checksum over the body, so tampering or corruption is detectable.
The header means a .upg file explains itself. You do not need out-of-band knowledge to understand what you are holding.
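A sketch of the header as a TypeScript type, with a summary built from the header alone. Only format_version, spec_version, provenance, integrity and kind are named in the text; product, node_count, edge_count, the provenance sub-fields and every value type are assumptions, and describeFile is a hypothetical helper.

```typescript
// Hypothetical layout of the $upg header. Field names beyond format_version,
// spec_version, provenance, integrity and kind are assumptions.
interface UpgHeader {
  format_version: string; // on-disk format version, e.g. "1.0.0"
  spec_version: string;   // UPG specification version the entities conform to
  kind?: "portfolio";     // set on portfolio files
  product?: { name: string; summary?: string };
  node_count: number;
  edge_count: number;
  provenance: { tool: string; written_at: string }; // which tool, and when
  integrity: string;      // checksum over the body
}

// Describe a file from its header only, without walking the graph body.
function describeFile(doc: { $upg: UpgHeader }): string {
  const h = doc.$upg;
  const name = h.product?.name ?? "(unnamed)";
  return (
    `${name}: ${h.node_count} nodes, ${h.edge_count} edges ` +
    `(format ${h.format_version}, spec ${h.spec_version}, written by ${h.provenance.tool})`
  );
}
```

Reading just this object is enough for a file browser or an agent to decide whether the full graph is worth loading.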
Canonical serialisation: diffs that mean something
The core rule is this: the same logical graph always serialises to byte-identical output, regardless of which tool wrote it. The SDK, the CLI, the MCP server, a cloud export, and an AI agent writing through MCP all produce the same bytes for the same graph.
This matters because .upg files live in version control. If two tools serialise the same graph two different ways, every save produces noise in the diff: reordered keys, shifted whitespace, resorted lists. You lose the ability to see what actually changed. Canonical serialisation removes that noise. A git diff on a .upg file shows you the change in meaning: the node you added, the edge you redirected, the property you updated. Nothing else moves.
The rules are anchored on RFC 8785, the JSON Canonicalization Scheme, with two deliberate deviations for the review lifecycle:
- Pretty-printed, not compact. Two-space indentation, one element per line, Unix line endings. A .upg file is meant to be read and reviewed, not only parsed.
- Set-like arrays are sorted by content. The nodes, edges, cross_edges, and tags arrays are sets, not sequences, so their on-disk order is fixed by what they contain rather than the order in which they were added. The one exception is aliases, whose order is preserved because there it carries meaningful history.
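The rules above can be sketched as a small serialiser. It sorts object keys by UTF-16 code units (as RFC 8785 does), sorts the named set-like arrays, leaves aliases and all other arrays in order, and pretty-prints with two spaces and "\n" line endings. It is an illustration, not the SDK's implementation: sorting set elements by their compact canonical JSON and ending the file with a newline are assumptions. Keys are written out by hand because JavaScript objects iterate integer-like keys numerically, which would break RFC 8785 ordering if the output relied on JSON.stringify for objects.

```typescript
// Arrays treated as unordered sets (named in the text); "aliases" is not listed,
// so its order is preserved like any other array.
const SET_KEYS = new Set(["nodes", "edges", "cross_edges", "tags"]);

// Serialise `value`; `step` is the indent unit ("" yields compact output).
function render(value: unknown, key: string | null, pad: string, step: string): string {
  const nl = step === "" ? "" : "\n";
  const inner = pad + step;
  const sep = step === "" ? ":" : ": ";
  if (Array.isArray(value)) {
    let items = value;
    if (key !== null && SET_KEYS.has(key)) {
      // Assumption: order set elements by their compact canonical encoding.
      items = value
        .map((v) => [render(v, null, "", ""), v] as const)
        .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
        .map(([, v]) => v);
    }
    if (items.length === 0) return "[]";
    const parts = items.map((v) => inner + render(v, null, inner, step));
    return "[" + nl + parts.join("," + nl) + nl + pad + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    // Default sort compares UTF-16 code units, the RFC 8785 key order.
    const keys = Object.keys(obj).sort();
    if (keys.length === 0) return "{}";
    const parts = keys.map(
      (k) => inner + JSON.stringify(k) + sep + render(obj[k], k, inner, step),
    );
    return "{" + nl + parts.join("," + nl) + nl + pad + "}";
  }
  // Strings, numbers, booleans and null: RFC 8785 defines these via JSON.stringify.
  return JSON.stringify(value);
}

// Canonical text: two-space indentation, LF line endings, trailing newline (assumed).
function canonicalize(doc: unknown): string {
  return render(doc, null, "", "  ") + "\n";
}
```

Two documents that differ only in key order or set order serialise to the same bytes, and running the output back through canonicalize leaves it unchanged.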
One read path
Reading a .upg file always goes through the same function. parseUpg, and its lower-level companion normalizeDocument, accepts both the canonical $upg header envelope and older flat files, and returns one in-memory shape. It also repairs minor serialisation drift on the way in, so a read-then-write round trip is always clean.
The practical consequence is that tools are forward and backward compatible by construction. A reader written today handles files written before the canonical format existed, and a file written today is understood by any reader that uses the standard path.
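A single entry point that folds both layouts into one shape might look like this. readUpg and Graph are hypothetical names, not the real parseUpg or normalizeDocument. It assumes a canonical file keeps nodes and edges at the top level beside $upg, that an older flat file is the same object without the header, and it uses a missing array as a stand-in for "minor serialisation drift".

```typescript
// Hypothetical in-memory shape; not the SDK's actual types.
interface Graph {
  header: Record<string, unknown> | null; // null when read from a flat file
  nodes: unknown[];
  edges: unknown[];
}

// Hypothetical reader in the spirit of parseUpg / normalizeDocument.
// Assumption: both layouts keep nodes and edges at the top level.
function readUpg(text: string): Graph {
  const raw = JSON.parse(text) as Record<string, unknown>;
  const h = raw["$upg"];
  const header = h !== null && typeof h === "object" ? (h as Record<string, unknown>) : null;
  // Tolerate drift such as a missing array by defaulting it to empty.
  const list = (v: unknown): unknown[] => (Array.isArray(v) ? v : []);
  return { header, nodes: list(raw["nodes"]), edges: list(raw["edges"]) };
}
```

Every caller sees the same Graph shape, so code downstream of the reader never has to branch on file age.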
Single graphs and portfolios
The same envelope describes two scales. A single-product file holds one product's graph. A portfolio file, marked $upg.kind: "portfolio", holds several products plus the cross-product edges between them, with the organisation and its collections at the top level. One format and one read path, whether you are tracking a single product or a company's worth of them.
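A discriminated shape shows how one code path can serve both scales. Only $upg.kind, the "portfolio" value and cross_edges come from the text. The products, organisation and collections field names, along with isPortfolio, productGraphs and totalNodes, are hypothetical.

```typescript
// Hypothetical shapes for the two scales of .upg file.
interface ProductGraph {
  nodes: unknown[];
  edges: unknown[];
}

interface SingleFile extends ProductGraph {
  $upg: { kind?: undefined };
}

interface PortfolioFile {
  $upg: { kind: "portfolio" };
  organisation?: unknown;
  collections?: unknown[];
  products: ProductGraph[]; // assumed field name
  cross_edges: unknown[];   // edges between products
}

type UpgFile = SingleFile | PortfolioFile;

// The header's kind decides which layout the rest of the file uses.
function isPortfolio(file: UpgFile): file is PortfolioFile {
  return file.$upg.kind === "portfolio";
}

// Every product graph in the file, whichever scale it is written at.
function productGraphs(file: UpgFile): ProductGraph[] {
  return isPortfolio(file) ? file.products : [file];
}

function totalNodes(file: UpgFile): number {
  return productGraphs(file).reduce((n, g) => n + g.nodes.length, 0);
}
```

Once the header has been checked, callers can treat a single product as a one-item portfolio.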
Formatting on demand
Because the canonical form is defined, it can be checked and enforced. upg fmt rewrites any .upg file into canonical form. upg fmt --check verifies that a file is already canonical without changing it, which is what you run in CI. The same way gofmt or Prettier give a codebase one settled shape, upg fmt gives a product graph one.
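In principle, check mode is a byte comparison between a file and its re-serialised form. This sketch is not the upg CLI's code: fmt and checkExitCode are hypothetical names, the canonical serialiser is passed in, and the exit codes (0 when canonical, 1 when formatting is needed) are an assumed convention.

```typescript
// Any function that produces the canonical text of a parsed document.
type Serialise = (doc: unknown) => string;

interface FmtResult {
  output: string;   // the canonical text
  changed: boolean; // true when the input was not already canonical
}

// Format: parse, re-serialise canonically, compare the bytes.
function fmt(text: string, serialise: Serialise): FmtResult {
  const output = serialise(JSON.parse(text));
  return { output, changed: output !== text };
}

// Check mode leaves the file untouched and only reports.
// Assumed convention: 0 = already canonical, 1 = needs formatting.
function checkExitCode(text: string, serialise: Serialise): number {
  return fmt(text, serialise).changed ? 1 : 0;
}
```

Because the comparison is on exact bytes, CRLF line endings or stray whitespace fail the check just as reordered keys do, which is what makes it useful as a CI gate.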
Why the file earns its extension
A format is a promise. The .upg extension now means something specific: a self-describing, integrity-checked, byte-stable JSON file that any tool can read and any reviewer can follow, holding one product's structured knowledge or a portfolio of them. It is portable because it is plain JSON. It is diffable because it is canonical. It explains itself because of the header. Those three properties are what let product knowledge accumulate in a file instead of evaporating between tools.
Explore the UPG
Dive deeper into the Unified Product Graph, the open specification for product knowledge.