Section A — The Research Questions

Q1 — Platform Harm Analysis

What is the platform and what capabilities does it have? How has it been misused in practice, and what surfaces, vectors, and avenues for exploitation does it expose? What specifically about this medium makes it usable for exploitation? Can those properties be generalized into a framework that stress-tests future platforms before abuse patterns consolidate?

Why it matters

Press releases name platforms. They rarely explain what about a platform enabled the offense. Across thousands of cases, platforms play different roles: grooming surface, distribution channel, payment mechanism, communication layer. An affordance-level analysis turns those scattered mentions into a transferable framework for understanding how a medium enables harm.

Current state

In progress.

Q2 — Exploitation Life Cycle

What does enforcement and offending actually look like at scale across the ICAC/CAC offense landscape — familial and custodial abuse, grooming, sextortion, trafficking, production, possession and distribution, hands-on possessors, and AI-generated CSAM? What platforms, methods, and patterns recur within each subset? What is the structure of the lifecycle from contact through investigation through prosecution?

Why it matters

Most ICAC research focuses on hashing or on aggregate counts. Neither reveals how offenses unfold, how investigators respond to them, or how individual cases compare with the broader exploitation landscape. The lifecycle view, stratified by offense subset, is the layer between a single press release and a corpus-wide statistic.

Current state

In progress.

Q3 — Kill Chain Interventions

Given Q1 and Q2, where can technology, investigation, or enforcement intervene in the lifecycle to disrupt exploitation? Which intervention points carry the most leverage given the lifecycle and platform affordances the data actually shows?

Why it matters

Platform mapping and lifecycle analysis are not ends in themselves. The point is identifying where intervention is most plausibly effective: at detection, at reporting, at warrant execution, or at prevention. Q3 connects Q1 and Q2 to the operational question practitioners and platform safety teams actually face.

Current state

In progress.

Section B — The Method: Ontology Pipeline

To answer Q1–Q3 rigorously at corpus scale, the enforcement record cannot remain only in relational tables and regex-derived tags. It must be expressed in a structured, graph-queryable form so patterns can be queried, validated, and compared across cases. CaseLinker already extracts consistent features from each narrative; the Ontology Pipeline maps those features into a standard investigation vocabulary and builds a validated knowledge graph as the mechanism for cross-case analysis.

The vocabulary in use is the CAC Ontology (Crimes Against Children Ontology), developed by Project VIC International as an interoperable standard for structuring crimes-against-children investigations.

What is CAC Ontology

The stack is layered. gUFO provides foundational ontology primitives. UCO (Unified Cyber Ontology) models cyber investigation objects and relationships. CASE (Cyber-investigation Analysis Standard Expression) defines how investigation narratives are expressed as interoperable graphs. CAC extends that stack for crimes against children specifically — platforms, victims, offenders, investigations, and outcomes as typed entities rather than free text.

CAC is shepherded by Project VIC International and built on the Linux Foundation's Cyber Domain Ontology stack (UCO and CASE). CaseLinker case data is being aligned to this vocabulary so graphs can be shared, validated, and queried with the same tools used in forensic and intelligence workflows.

The pipeline

The planned flow from CaseLinker features to research queries:

  1. CaseLinker case features — already extracted and stored (platforms, topics, investigation signals, prosecution outcomes, and related structured fields).
  2. Mapping layer — deterministic translation from CaseLinker fields to CAC Ontology entities and relationships.
  3. RDF emission — per-case graphs serialized as Turtle and JSON-LD.
  4. SHACL validation — each graph checked against CAC shape rules so only conformant assertions enter the corpus graph.
  5. SPARQL-queryable corpus — merged validated graphs become the substrate for Q1–Q3 analyses.
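
Steps 2 through 4 can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not the project's implementation. The namespaces, the field names (`platforms`, `agencies`), the terms (`involvesPlatform`, `investigatedBy`, `Investigation`), and the single conformance rule are all hypothetical placeholders rather than real CaseLinker fields or CAC Ontology terms. The sketch emits N-Triples, which is also valid Turtle, and uses a hand-written check as a stand-in for SHACL validation.

```python
# Hypothetical namespaces: placeholders, not real CaseLinker or CAC IRIs.
EX = "https://example.org/caselinker/"
CACX = "https://example.org/cac-placeholder#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Deterministic field-to-term table: field -> (property, class). Names are assumed.
FIELD_MAP = {
    "platforms": ("involvesPlatform", "Platform"),
    "agencies": ("investigatedBy", "Agency"),
}


def slug(text):
    """Lower-case the text and replace non-alphanumerics so it is IRI-safe."""
    return "".join(c if c.isalnum() else "-" for c in text.strip().lower())


def map_case(case):
    """Step 2: translate one feature record into sorted (s, p, o) IRI triples."""
    case_iri = EX + "case/" + slug(case["case_id"])
    triples = {(case_iri, RDF_TYPE, CACX + "Investigation")}
    for field, (prop, cls) in FIELD_MAP.items():
        for value in case.get(field, []):
            node = EX + cls.lower() + "/" + slug(value)
            triples.add((node, RDF_TYPE, CACX + cls))
            triples.add((case_iri, CACX + prop, node))
    return sorted(triples)


def to_ntriples(triples):
    """Step 3: serialize triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def conforms(triples):
    """Step 4 stand-in: every Investigation must have an investigatedBy link."""
    cases = {s for s, p, o in triples
             if p == RDF_TYPE and o == CACX + "Investigation"}
    linked = {s for s, p, o in triples if p == CACX + "investigatedBy"}
    return cases <= linked


def build_corpus(cases):
    """Merge only conformant per-case graphs; report rejected case ids."""
    corpus, rejected = set(), []
    for case in cases:
        graph = map_case(case)
        if conforms(graph):
            corpus.update(graph)
        else:
            rejected.append(case["case_id"])
    return corpus, rejected


# Usage with made-up records.
cases = [
    {"case_id": "A1", "platforms": ["ExampleChat"],
     "agencies": ["Example Task Force"]},
    {"case_id": "B2", "platforms": ["ExampleChat"]},  # no agency: rejected
]
corpus, rejected = build_corpus(cases)
print(to_ntriples(sorted(corpus)))
print("rejected:", rejected)
```

Because the mapping is a fixed lookup table and the output is sorted, the same input record always yields the same graph, which is the property the word "deterministic" in step 2 implies. A production pipeline would presumably use an RDF library and real SHACL shapes in place of `conforms`.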

Pipeline status

Metric                                    Value
Cases expressed in CAC                    974 (Big Bang demo corpus) · 200 (stratified compare pool)
SHACL validation pass rate                100% (974/974)
CAC classes covered                       49
Merged graph nodes                        18,957
Cross-case bridges (shared by ≥2 cases)   443 agency/platform bridges
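
The cross-case bridge count can be illustrated with a short sketch. It assumes that a bridge is an agency or platform node referenced by at least two cases; the source does not define the term more precisely than "shared ≥2 cases". The function name `find_bridges` and the sample records are hypothetical.

```python
from collections import defaultdict


def find_bridges(case_entities, min_cases=2):
    """Return {entity: sorted case ids} for entities seen in >= min_cases cases."""
    seen = defaultdict(set)
    for case_id, entities in case_entities.items():
        for entity in entities:
            seen[entity].add(case_id)
    return {e: sorted(ids) for e, ids in seen.items() if len(ids) >= min_cases}


# Made-up sample: each case maps to a set of (kind, name) entity keys.
sample = {
    "case-a": {("Platform", "ExampleChat"), ("Agency", "Example Task Force")},
    "case-b": {("Platform", "ExampleChat")},
    "case-c": {("Agency", "Other Agency")},
}
bridges = find_bridges(sample)
print(bridges)
```

In this sample only the platform node is shared by two cases, so it is the single bridge. In a merged graph the same count could be obtained with a grouped query over case-to-entity links.
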

Section C — Open Records and Field Work

Open records requests are in progress to extend the corpus with state-prosecuted cases beyond federal and ICAC press-release coverage. This section will track request status, jurisdictions, and corpus additions as the work develops.

Phase 2: Patterns is in progress and will be released as an arXiv preprint. Last updated: June 6, 2026.