ScreenJSON for Archivists

The preservation problem, briefly

Any moving-image archive today has, somewhere in its holdings, screenplays. Feature film scripts, TV bibles, episode outlines, shooting drafts, continuity scripts. Some are original typescripts. Some are faxed rewrites. Most of those produced after 1995 live in Final Draft files: the XML-based .fdx format, or the older binary .fdr.

Final Draft is a fine tool. Its file format, however, is proprietary and maintained by a single vendor, and its long-term availability is not under the archive’s control. The same is true, to varying degrees, of Fade In, Movie Magic Screenwriter, Celtx, and every web-based writing tool that has come and gone in the past decade.

An archival profession that spent thirty years migrating audio and video from proprietary containers into open, documented formats (WAV, FLAC, Matroska, FFV1) should be able to recognise the situation for what it is. The screenplay file is the next migration.

Why ScreenJSON, for an archive

Three properties matter more for preservation than they do for any other use case:

Openness. The schema is published, versioned, and available as a JSON Schema document at a stable URL. A conforming validator can be written from the specification alone. No vendor has unique insight into what a ScreenJSON file means; if everyone who works on the format disappeared tomorrow, the specification would still describe the files.

Text, not binary. A ScreenJSON file is plain UTF-8 JSON. It opens in any text editor. It survives every kind of archival operation — copy, checksum, diff, compression, bit-rot detection, format migration — that archives have spent decades getting right.

Structural, not presentational. The preserved artifact is “the screenplay”, not “a rendering of the screenplay”. Rendering can be reconstructed at any future date from the document plus a rendering convention. The opposite — reconstructing a structured document from a rendered PDF — is lossy, manual, and expensive.

A migration workflow

The pattern is familiar to any archive that has ingested legacy media:

Characterise the source. Inventory what you have: how many .fdx, how many .fadein, how many .fountain, how many PDFs. Establish which are originals, which are intermediate drafts, and which are duplicates.
Convert to the open format. Run each source through screenjson-cli (or screenjson-export for the free reference subset). Each yields a ScreenJSON document.
Validate. Every output goes through screenjson validate --strict. Any failure is held back for manual review.
Catalogue. Derive catalogue records from the document’s metadata: authors, title, logline, characters, registration. Promote any free-text headers in the source to meta entries rather than discarding them.
Store both. Keep the original source file alongside the ScreenJSON output. An archive preserves evidence; the original is the evidence, the ScreenJSON is the accessible rendition.
Fixity. Checksum both the source file and the ScreenJSON output, and monitor them for bit rot, as you would any other digital archival object.
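Most of this workflow is tooling an archive already has. As a rough sketch in Python, the characterise, validate and fixity steps might look like the block below; conversion is left out because it depends on how you invoke screenjson-cli. The extension map and all function names are illustrative, and we assume that `screenjson validate --strict` takes the file path as its last argument and signals failure with a non-zero exit status. Check the tool’s own documentation before relying on that.

```python
import hashlib
import subprocess
from collections import Counter
from pathlib import Path

# Illustrative extension-to-format map; extend it to match your holdings.
SOURCE_FORMATS = {".fdx": "Final Draft", ".fadein": "Fade In",
                  ".fountain": "Fountain", ".pdf": "PDF"}

# ASSUMPTION: the validator takes the file path as its final argument and
# exits non-zero on failure. Confirm against the screenjson-cli documentation.
VALIDATE_CMD = ("screenjson", "validate", "--strict")


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large PDFs stay cheap."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def characterise(root: Path):
    """Characterise: count files per format and group byte-identical duplicates."""
    counts = Counter()
    by_digest = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        counts[SOURCE_FORMATS.get(path.suffix.lower(), "other")] += 1
        by_digest.setdefault(sha256_of(path), []).append(path)
    duplicates = {d: ps for d, ps in by_digest.items() if len(ps) > 1}
    return counts, duplicates


def triage(outputs, cmd=VALIDATE_CMD):
    """Validate: split converted files into (passed, held_for_manual_review)."""
    passed, held = [], []
    for path in outputs:
        result = subprocess.run([*cmd, str(path)], capture_output=True, text=True)
        (passed if result.returncode == 0 else held).append(path)
    return passed, held


def fixity_manifest(paths, root: Path) -> str:
    """Fixity: sha256sum-style lines ("<digest>  <relative path>")."""
    rows = [f"{sha256_of(p)}  {p.relative_to(root).as_posix()}" for p in sorted(paths)]
    return "\n".join(rows) + "\n"
```

Duplicates are grouped by exact SHA-256, so two faxed rewrites that differ by a single byte will not be flagged; spotting near-duplicates is a cataloguing job, not a fixity one. If `screenjson` is not on the PATH, `subprocess.run` raises `FileNotFoundError` rather than quietly holding files back.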

Metadata discipline

ScreenJSON is generous about metadata, but an archive wants some of it populated to a consistent standard. A few things we suggest treating as non-negotiable in any archival ingestion:

id — a new UUID per ingested document, minted at ingest, even if the source contains one.
title — populated in at least one language.
authors — every credited author, with a stable UUID per person across the collection.
generator — record the ingestion tool’s name and version. This is the closest ScreenJSON has to a PREMIS Agent.
registrations — record WGA / guild / national registry data if available.
license — a named license descriptor for anything open-licensed; otherwise the rights statement your archive uses for unresolved rights.
meta — anything else you’d put in a PREMIS intellectualEntity or a Dublin Core element, keyed consistently across your collection.

Authority control

Characters, authors, and contributors all carry UUIDs, which makes cross-collection authority control tractable. A single writer working across fifty screenplays in the archive is one UUID, not fifty hand-typed strings. Whether you reconcile against ORCID, VIAF, or an internal authority file is your call; ScreenJSON doesn’t mandate one.
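A minimal illustration of the idea, assuming nothing more than an in-memory lookup: each normalised name form maps to one person UUID, minted the first time that form is seen. The `AuthorityFile` class is hypothetical, and matching on normalised strings is deliberately naive. Two different writers can share a name, so anything beyond known spelling variants should go to a human or to your external authority source.

```python
import unicodedata
import uuid


def normalise(name: str) -> str:
    """Crude match key: Unicode NFKC, casefolded, whitespace collapsed."""
    return " ".join(unicodedata.normalize("NFKC", name).casefold().split())


class AuthorityFile:
    """Hypothetical in-memory authority file: name forms -> person UUID."""

    def __init__(self) -> None:
        self._by_key: dict[str, uuid.UUID] = {}

    def add_variant(self, variant: str, person: uuid.UUID) -> None:
        """Record that a spelling variant refers to an already-known person."""
        self._by_key[normalise(variant)] = person

    def resolve(self, name: str) -> uuid.UUID:
        """Return the person's UUID, minting one the first time a form is seen."""
        key = normalise(name)
        if key not in self._by_key:
            self._by_key[key] = uuid.uuid4()
        return self._by_key[key]
```

The UUID returned is what goes into each document’s author entry, so the same writer appears under one identifier across every screenplay that credits them.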

Revisions and provenance

The revisions array, at the document level and on individual elements, is the canonical place to record authorial revision history. For archival purposes, treat it as part of the evidentiary record: never squash or rewrite revisions on ingestion, and, if your archive needs that discipline, record your own ingestion as a final, clearly labelled revision.
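Appending rather than rewriting is the whole discipline. A sketch, assuming the document-level revisions array holds plain objects; the `id`, `label` and `created` fields on the new entry are hypothetical, so map them onto whatever the schema actually defines for a revision. Element-level revisions are carried through the deep copy untouched.

```python
import copy
import uuid
from datetime import datetime, timezone
from typing import Optional


def with_ingest_revision(doc: dict, agent: str, when: Optional[datetime] = None) -> dict:
    """Return a copy of doc with one archival-ingest revision appended.

    Existing revisions are copied through as-is: never merged, reordered
    or squashed. The new entry's fields are ASSUMED, not taken from the schema.
    """
    out = copy.deepcopy(doc)
    revisions = out.setdefault("revisions", [])
    stamp = (when or datetime.now(timezone.utc)).isoformat()
    revisions.append({
        "id": str(uuid.uuid4()),                 # hypothetical field
        "label": f"Archival ingest by {agent}",  # hypothetical field
        "created": stamp,                        # hypothetical field
    })
    return out
```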

Open questions the schema doesn’t fix

The schema doesn’t solve every archival problem, and we’re explicit about that. A few things remain your institution’s policy call:

Rights metadata. The schema has a license descriptor but doesn’t mandate a rights vocabulary. Use RightsStatements.org, Europeana Rights, or your internal taxonomy, recorded in meta.
Physical provenance. If the source was a paper typescript scanned and OCR’d into a PDF, that history belongs in your repository’s provenance metadata, not in the ScreenJSON file.
Contextual access. Some material will be restricted. ScreenJSON’s content encryption gives you a technical layer; your access control is still your repository’s job.
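For the rights item, recording a RightsStatements.org URI in meta might look like the following. The three URIs are from the published RightsStatements.org vocabulary; the `rights_statement` key and the assumption that meta is a flat key/value object are ours, for illustration only.

```python
import copy

# Three statements from the RightsStatements.org vocabulary.
RIGHTS_STATEMENTS = {
    "InC": "http://rightsstatements.org/vocab/InC/1.0/",  # In Copyright
    "CNE": "http://rightsstatements.org/vocab/CNE/1.0/",  # Copyright Not Evaluated
    "UND": "http://rightsstatements.org/vocab/UND/1.0/",  # Copyright Undetermined
}


def with_rights(doc: dict, code: str) -> dict:
    """Return a copy of doc with a RightsStatements.org URI recorded in meta.

    Raises KeyError for codes not in RIGHTS_STATEMENTS.
    """
    uri = RIGHTS_STATEMENTS[code]
    out = copy.deepcopy(doc)
    # Hypothetical key; meta is ASSUMED to be a flat key/value object.
    out.setdefault("meta", {})["rights_statement"] = uri
    return out
```

The license descriptor is left alone, so an open license and a rights statement can sit side by side.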

On versioning the schema itself

ScreenJSON uses semantic versioning. The specification commits to:

Major version bumps when a change is backwards-incompatible.
Minor version bumps when fields are added.
Patch version bumps for clarifications.

An archive should pin to a known schema version for each ingestion project and migrate deliberately when upgrading, the same way it would pin any other ingestion contract.
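One way to make the pin mechanical is to check each document’s declared schema version against the project’s pin before ingest. This sketch assumes the version is available as a strict MAJOR.MINOR.PATCH string (where it lives in the document is not specified here) and applies the commitments above: a different major version is rejected, a newer minor version is held for review because it may carry fields your pipeline does not know, and patch differences are accepted because they only clarify.

```python
import re

_SEMVER = re.compile(r"(\d+)\.(\d+)\.(\d+)")  # strict: no pre-release or build tags


def parse_version(text: str) -> tuple:
    """Parse a MAJOR.MINOR.PATCH string into a tuple of ints."""
    match = _SEMVER.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return tuple(int(part) for part in match.groups())


def check_pin(document_version: str, pinned_version: str) -> str:
    """Return "accept", "hold" or "reject" for a document against the project pin."""
    doc_major, doc_minor, _ = parse_version(document_version)
    pin_major, pin_minor, _ = parse_version(pinned_version)
    if doc_major != pin_major:
        return "reject"  # backwards-incompatible: plan a deliberate migration
    if doc_minor > pin_minor:
        return "hold"    # may carry fields this project was not built for
    return "accept"      # same or older minor; patch level only clarifies
```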

Tool: screenjson-cli
Tool: Greenlight — for batch migration of large collections.
How-to: Validate a ScreenJSON document
How-to: Migrate from FDX archives to ScreenJSON
Specification: versioning & conformance