Section B — The Method: Ontology Pipeline
To answer Q1–Q3 rigorously at corpus scale, the enforcement record cannot remain only in relational tables and regex-derived tags. It must be expressed in a structured, graph-queryable form so patterns can be queried, validated, and compared across cases. CaseLinker already extracts consistent features from each narrative; the Ontology Pipeline maps those features into a standard investigation vocabulary and builds a validated knowledge graph as the mechanism for cross-case analysis.
The vocabulary in use is the CAC Ontology (Crimes Against Children Ontology), developed by Project VIC International as an interoperable standard for structuring crimes-against-children investigations.
What is CAC Ontology
The stack is layered. gUFO provides foundational ontology primitives. UCO (Unified Cyber Ontology) models cyber investigation objects and relationships. CASE (Cyber-investigation Analysis Standard Expression) defines how investigation narratives are expressed as interoperable graphs. CAC extends that stack for crimes against children specifically — platforms, victims, offenders, investigations, and outcomes as typed entities rather than free text.
CAC is shepherded by Project VIC International and built on the Linux Foundation's Cyber Domain Ontology stack (UCO and CASE). CaseLinker case data is being aligned to this vocabulary so graphs can be shared, validated, and queried with the same tools used in forensic and intelligence workflows.
The pipeline
The planned flow from CaseLinker features to research queries:
- CaseLinker case features — already extracted and stored (platforms, topics, investigation signals, prosecution outcomes, and related structured fields).
- Mapping layer — deterministic translation from CaseLinker fields to CAC Ontology entities and relationships.
- RDF emission — per-case graphs serialized as Turtle and JSON-LD.
- SHACL validation — each graph checked against CAC shape rules so only conformant assertions enter the corpus graph.
- SPARQL-queryable corpus — merged validated graphs become the substrate for Q1–Q3 analyses.
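The steps above can be sketched in a few lines of Python. This is a minimal, standard-library-only illustration, not the CaseLinker implementation. The record fields (`case_id`, `platforms`, `agency`), the `EX` namespace, and the class and property names (`Investigation`, `Platform`, `Agency`, `involvesPlatform`, `investigatedBy`) are hypothetical placeholders rather than actual CAC Ontology terms. The `conforms` function is a hand-written stand-in for a single SHACL minimum-cardinality constraint; a real pipeline would more likely use an RDF library and a SHACL engine.

```python
import re

# Hypothetical namespace; the real CAC Ontology IRIs are not reproduced here.
EX = "https://example.org/cac-sketch#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def slug(text):
    """Turn a free-text label into a stable IRI fragment."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def map_case(case):
    """Mapping layer: deterministically translate one case record into triples."""
    inv = f"{EX}investigation/{case['case_id']}"
    triples = [(inv, RDF_TYPE, f"{EX}Investigation")]
    for name in sorted(case.get("platforms", [])):
        node = f"{EX}platform/{slug(name)}"
        triples.append((node, RDF_TYPE, f"{EX}Platform"))
        triples.append((inv, f"{EX}involvesPlatform", node))
    if case.get("agency"):
        node = f"{EX}agency/{slug(case['agency'])}"
        triples.append((node, RDF_TYPE, f"{EX}Agency"))
        triples.append((inv, f"{EX}investigatedBy", node))
    return triples


def to_ntriples(triples):
    """RDF emission: N-Triples lines, which are also valid Turtle."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def conforms(triples):
    """Stand-in for a SHACL shape: every Investigation needs >= 1 platform."""
    investigations = {s for s, p, o in triples
                      if p == RDF_TYPE and o == f"{EX}Investigation"}
    with_platform = {s for s, p, o in triples if p == f"{EX}involvesPlatform"}
    return investigations <= with_platform


def build_corpus(cases):
    """Merge only the per-case graphs that pass validation."""
    corpus = set()
    for case in cases:
        graph = map_case(case)
        if conforms(graph):
            corpus.update(graph)
    return corpus


# Hypothetical records, for illustration only.
cases = [
    {"case_id": "c1", "platforms": ["Platform A", "Platform B"], "agency": "Agency X"},
    {"case_id": "c2", "platforms": ["Platform A"], "agency": "Agency X"},
    {"case_id": "c3", "platforms": []},  # fails the stand-in shape, so it is excluded
]
corpus = build_corpus(cases)
print(to_ntriples(sorted(corpus)))
```

Because the mapping is a pure function of the case record, rerunning it yields the same triples. Storing the merged corpus as a set means platform and agency nodes shared between cases collapse into a single node instead of being duplicated.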
Pipeline status
| Metric | Value |
| --- | --- |
| Cases expressed in CAC | 974 (Big Bang demo corpus) · 200 stratified compare pool |
| SHACL validation pass rate | 100% (974/974) |
| CAC classes covered | 49 |
| Merged graph nodes | 18,957 |
| Cross-case bridges (shared ≥2 cases) | 443 agency/platform bridges |
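One plausible reading of the cross-case bridges metric is a count of agency or platform nodes that are referenced by two or more cases. The sketch below computes bridges under that assumption. The input shape (a mapping from case identifier to the set of entity identifiers it references) and all identifiers are hypothetical, and the actual pipeline may define bridges differently.

```python
from collections import defaultdict


def find_bridges(case_entities, min_cases=2):
    """Return entities referenced by at least `min_cases` distinct cases."""
    cases_by_entity = defaultdict(set)
    for case_id, entities in case_entities.items():
        for entity in entities:
            cases_by_entity[entity].add(case_id)
    return {
        entity: sorted(case_ids)
        for entity, case_ids in cases_by_entity.items()
        if len(case_ids) >= min_cases
    }


# Hypothetical input, for illustration only.
case_entities = {
    "c1": {"platform/a", "agency/x"},
    "c2": {"platform/a", "agency/y"},
    "c3": {"platform/b", "agency/x"},
}
bridges = find_bridges(case_entities)
print(len(bridges), "bridges")
```

Here `platform/a` and `agency/x` each connect two cases, while `platform/b` and `agency/y` appear in only one case and are not counted.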
Phase 2: Patterns is in progress and will be released as an arXiv pre-print. Last updated: June 6, 2026.