Open-data capability demonstration · Heritage & Culture
From Digitised to Computable: an Open Standard for Aerial Photography Heritage
Most of the UK's heritage is digitised but not computable. It is scanned, catalogued and on a map, yet a researcher still cannot query it at scale. This is a worked, fully open demonstration of how to close that gap properly for one collection type, tested against real national archives in the UK, Canada and the United States.

| Measure | Value | What it means |
|---|---|---|
| Real frames | 292 | Harvested live from the public catalogue |
| Substrate present | 100% | Already carry a footprint and an ISO date |
| Validation | 0 | SHACL violations after lift to Baseline |
The Challenge
For twenty years, heritage funding has been measured in things scanned. By that metric the work is a success. By the metric that now matters it is not finished: a photograph on an interactive map is legible to a human with a browser but not to a machine. A researcher cannot ask a collection of thirty million aerial images to return every frame over a given city between 1943 and 1946, run an image-similarity search, or cross-reference a reconnaissance sortie to another archive, without clicking through by hand. This is the gap the Towards a National Collection programme (AHRC / UKRI) and its N-RICH work set out to close, and the gap we address for aerial photography specifically.
We built NAPH, a computation-ready digitisation standard for aerial photography heritage. It is synthesis, not invention: it is assembled entirely from existing standards (GeoSPARQL, PROV-O, Dublin Core, IIIF, Records in Contexts), and it defines three tiers (Baseline, Enhanced, Aspirational) that an institution can adopt incrementally without rebuilding what it already has.
We measured a real collection, not a hypothetical one
We took the National Collection of Aerial Photography (NCAP), one of the world's largest aerial archives at over thirty million images and part of Historic Environment Scotland, and measured a sample of 300 real records from its public catalogue. The harvest was metadata-only, read-only, rate-limited and made in good faith. The result is a compliment to the collection: the hard part is already done.

| # | Field | What the real data shows |
|---|---|---|
| 1 | Machine-readable footprint | A WKT polygon is present for 100% of records, in EPSG:3857. Reaching a geographic CRS is a reprojection, not new data. |
| 2 | ISO-8601 capture date | Every record already carries an ISO-8601 date, with a self-declared precision flag distinguishing day-level from year-level dates. |
| 3 | Stable identifier | A stable unique identifier is present for 100% of records, ready to be minted into a resolvable URI. |
| 4 | Archival reference | An ISAD(G) archival reference is present for 86% of records, anchoring each frame to its finding aid. |
| 5 | Machine-readable rights | Absent from the payload. This is the one genuine baseline gap, and the one that matters most for a collection that licenses its imagery. |
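The reprojection named in row 1 can be sketched in a few lines. This is a minimal illustration, not the project's harvester: it uses the standard closed-form inverse of spherical Web Mercator, a hypothetical single-ring WKT parser, and a made-up sample footprint rather than a real NCAP record. A production pipeline would more likely rely on an established projection library.

```python
import math

# Sphere radius used by EPSG:3857 (equal to the WGS84 semi-major axis).
EARTH_RADIUS_M = 6378137.0


def web_mercator_to_wgs84(x, y):
    """Inverse spherical Mercator: EPSG:3857 metres -> (lon, lat) in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat


def parse_wkt_polygon(wkt):
    """Parse a single-ring 'POLYGON((x y, ...))' string into [(x, y), ...].

    Hypothetical helper: holes and multipolygons are deliberately not handled.
    """
    body = wkt.strip()
    if not body.upper().startswith("POLYGON"):
        raise ValueError("expected a WKT POLYGON")
    inner = body[body.index("((") + 2 : body.rindex("))")]
    return [tuple(float(v) for v in pair.split()) for pair in inner.split(",")]


def reproject_ring(ring):
    """Reproject every vertex of a ring from EPSG:3857 to WGS84."""
    return [web_mercator_to_wgs84(x, y) for x, y in ring]


def to_wkt_polygon(ring):
    """Serialise a ring of (lon, lat) pairs back to WKT."""
    coords = ", ".join(f"{lon:.6f} {lat:.6f}" for lon, lat in ring)
    return f"POLYGON (({coords}))"


# Made-up footprint in EPSG:3857 metres, for illustration only.
sample_wkt = (
    "POLYGON((12709000 2545000, 12711000 2545000, "
    "12711000 2547000, 12709000 2547000, 12709000 2545000))"
)
footprint_wgs84 = reproject_ring(parse_wkt_polygon(sample_wkt))
print(to_wkt_polygon(footprint_wgs84))
```

No new data is introduced at any point: the output polygon has the same vertices as the input, expressed in degrees instead of projected metres.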
292 real frames, on one map
Once the footprints are reprojected and the records are linked data, the collection becomes something you can see and query. Every frame below is a real record, positioned by its own footprint and coloured by decade, from a 1924 Royal Navy sortie over Hong Kong to post-war surveys across the Caribbean. Click any footprint in the live demo and its full NAPH metadata appears.

Zoom to a single sortie and the value of computable footprints becomes obvious. Below are the frames of a 1924 Royal Navy reconnaissance run over Hong Kong. Reprojected and overlapped, the individual footprints trace the aircraft's flight line across Victoria Harbour, a hundred-year-old survey you can now select, query and cross-reference frame by frame.
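With footprints in WGS84 and dates in ISO-8601, a space-and-time question reduces to a filter. The sketch below is an illustration under stated assumptions: the record fields (`id`, `captured`, `footprint`) and the two sample frames are made up, and the spatial test is a coarse bounding-box overlap rather than a true polygon intersection. A real deployment would more plausibly answer this through GeoSPARQL or a STAC API.

```python
def ring_bbox(ring):
    """Bounding box (min_lon, min_lat, max_lon, max_lat) of a ring of (lon, lat) pairs."""
    lons = [lon for lon, _ in ring]
    lats = [lat for _, lat in ring]
    return (min(lons), min(lats), max(lons), max(lats))


def bboxes_intersect(a, b):
    """True when two (min_lon, min_lat, max_lon, max_lat) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def frames_over(frames, query_bbox, first_year, last_year):
    """Return ids of frames whose footprint box overlaps query_bbox within the year range."""
    hits = []
    for frame in frames:
        # ISO-8601 dates start with the four-digit year at day or year precision.
        year = int(frame["captured"][:4])
        in_range = first_year <= year <= last_year
        if in_range and bboxes_intersect(ring_bbox(frame["footprint"]), query_bbox):
            hits.append(frame["id"])
    return hits


# Made-up records for illustration; field names and values are hypothetical.
frames = [
    {
        "id": "demo-001",
        "captured": "1924-11-03",
        "footprint": [(114.15, 22.27), (114.19, 22.27), (114.19, 22.30),
                      (114.15, 22.30), (114.15, 22.27)],
    },
    {
        "id": "demo-002",
        "captured": "1946",
        "footprint": [(-61.6, 10.6), (-61.4, 10.6), (-61.4, 10.8),
                      (-61.6, 10.8), (-61.6, 10.6)],
    },
]

hong_kong = (114.10, 22.20, 114.30, 22.35)
matches = frames_over(frames, hong_kong, 1920, 1930)
```

Because the filter reads only the year prefix, it treats day-level and year-level dates alike, which is adequate for a year-range query but not for anything finer.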

One standard, three national collections, three continents
A standard tested against a single archive risks being quietly shaped around it. So we put the same question to two more national collections on two more continents, and changed nothing but the thin harvester that reads each catalogue. The ontology, the SHACL shapes and the RiC-O to STAC crosswalk stayed identical. All three lift to the same NAPH Baseline at zero SHACL violations, and the interesting result is that each collection is missing a different Baseline piece, which is exactly what a shared standard exists to normalise.

| Collection | Real records tested | The one Baseline piece it lacks | Single closing transform |
|---|---|---|---|
| NCAP, United Kingdom (Historic Environment Scotland) | 292 frames (1924–1956), frame-level | Machine-readable rights (0%) | Reproject footprint EPSG:3857 to WGS84 |
| NAPL, Canada (Natural Resources Canada) | 40 dated mosaics across 8 regions (1932–2004) | Frame-level granularity (publishes regional mosaics) | None for geometry: footprints are native WGS84, rights already present (OGL-Canada) |
| WHAIFinder, United States (UW-Madison / Wisconsin SCO) | 225 frames (1937–1967, public-domain USDA) | Polygon geometry (publishes a centerpoint, not an area) | Reconstruct footprint closed-form from centerpoint plus map scale |
The UK collection has frame-level detail but no machine-readable rights; Canada's open subset has rights and native WGS84 but publishes mosaics; the US index has rights and frames but only a centerpoint. Three collections, three different gaps, one unchanged standard that makes each gap precise and automatable instead of leaving every archive to describe itself in its own vocabulary. That is the portability claim demonstrated, not asserted.
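The closed-form reconstruction used for the US index can be illustrated as follows. This is a sketch of one plausible approach, not necessarily the one the harvester uses. It assumes a vertical photograph on square 9 x 9 inch film, a north-aligned frame, flat terrain, and a local spherical approximation for converting metres to degrees. The film size, the function name and the sample coordinates are all assumptions.

```python
import math

MEAN_EARTH_RADIUS_M = 6371008.8
# Assumption: 9 x 9 inch film, a common aerial survey format (0.2286 m per side).
FILM_SIDE_M = 0.2286


def footprint_from_center(lon, lat, scale_denominator, film_side_m=FILM_SIDE_M):
    """Approximate a square, north-aligned footprint ring in WGS84 degrees.

    The ground side is the film side multiplied by the scale denominator,
    e.g. 0.2286 m at 1:20,000 covers 4,572 m on the ground.
    """
    if abs(lat) >= 89.0:
        raise ValueError("approximation is not valid near the poles")
    half_side_m = film_side_m * scale_denominator / 2.0
    # Convert the half side to angular offsets on a sphere.
    dlat = math.degrees(half_side_m / MEAN_EARTH_RADIUS_M)
    dlon = math.degrees(
        half_side_m / (MEAN_EARTH_RADIUS_M * math.cos(math.radians(lat)))
    )
    # Closed ring, counter-clockwise, first vertex repeated at the end.
    return [
        (lon - dlon, lat - dlat),
        (lon + dlon, lat - dlat),
        (lon + dlon, lat + dlat),
        (lon - dlon, lat + dlat),
        (lon - dlon, lat - dlat),
    ]


# Made-up centerpoint and scale, for illustration only.
ring = footprint_from_center(-89.40, 43.07, 20000)
```

Without a flight heading the square cannot be rotated to match the true frame orientation, so the result is best read as an approximate coverage area rather than a surveyed boundary.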
The expert ground: binding two worlds that never met
The archival community describes these collections with Records in Contexts and PROV-O, capturing custody and provenance but not spatial computability. The geospatial community indexes imagery with STAC and GeoSPARQL, delivering search by space and time but no archival provenance. Nobody had crosswalked the two for historic aerial photography. NAPH publishes that bridge, wrapped in FAIR and CARE, and every term mapped to Records in Contexts was verified against the published RiC-O 1.1 ontology rather than assumed.
Outcome
We harvested 292 real frames spanning 1924 to 1956, from Hong Kong to the Caribbean, reprojected their footprints to WGS84, and validated them against the standard at zero SHACL violations. Testing against real holdings also earned its keep: year-only dates in the sample exposed a defect in the standard's own date-precision shape, which we then corrected. The same records export cleanly to a STAC 1.0 catalogue, to GeoJSON and to IIIF, so a collection adopting the standard gains the entire geospatial and viewer ecosystem without bespoke integration.
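To make the STAC export concrete, the sketch below builds a minimal STAC 1.0.0 Item from one record. The record fields (`id`, `captured`, `precision`, `footprint_wgs84`) and the sample values are hypothetical, and the treatment of year-only dates is an assumption: it uses the option STAC allows of a null `datetime` with `start_datetime` and `end_datetime` covering the year. How the published exporter handles this case is not stated here.

```python
import json


def stac_time_properties(captured, precision):
    """Map an ISO-8601 date plus a precision flag to STAC time properties."""
    if precision == "day":
        return {"datetime": f"{captured}T00:00:00Z"}
    if precision == "year":
        year = captured[:4]
        # STAC permits a null datetime when a start/end range is supplied.
        return {
            "datetime": None,
            "start_datetime": f"{year}-01-01T00:00:00Z",
            "end_datetime": f"{year}-12-31T23:59:59Z",
        }
    raise ValueError(f"unsupported precision: {precision!r}")


def to_stac_item(record):
    """Build a minimal STAC 1.0.0 Item dict from a hypothetical record."""
    ring = record["footprint_wgs84"]
    lons = [lon for lon, _ in ring]
    lats = [lat for _, lat in ring]
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": record["id"],
        "geometry": {"type": "Polygon", "coordinates": [[list(p) for p in ring]]},
        "bbox": [min(lons), min(lats), max(lons), max(lats)],
        "properties": stac_time_properties(record["captured"], record["precision"]),
        "links": [],   # left empty in this sketch
        "assets": {},  # image assets (for example IIIF) would be attached here
    }


# Made-up record for illustration; not a real catalogue entry.
sample = {
    "id": "example-frame-0001",
    "captured": "1924",
    "precision": "year",
    "footprint_wgs84": [(114.15, 22.27), (114.19, 22.27), (114.19, 22.30),
                        (114.15, 22.30), (114.15, 22.27)],
}
item = to_stac_item(sample)
print(json.dumps(item, indent=2))
```

Since a STAC Item is also a GeoJSON Feature, the same dictionary can serve as the GeoJSON export.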
The full ontology, the SHACL shapes, the live harvester, the RiC-O crosswalk and an interactive map are published open source as a case study in Open Ontologies, our open data-validation platform.
"Computation-readiness for heritage is not a request to start over. The substrate is usually already in the data. The work is a thin, automatable layer: reproject the geometry, mint a stable URI, attach machine-readable rights. Do that, and twenty years of digitisation becomes twenty years of computable research."
Tesseract Academy
Explore the open-source case study
Standard, ontology, SHACL shapes, harvester, STAC and RiC-O crosswalk on GitHub. MIT + CC BY 4.0.
