01 Doctoral Research · SMU

Multimodal AI for human-trafficking intelligence.

A modular agentic pipeline that ingests trafficking evidence in every format — court records, text, images, audio, structured data — and turns it into an investigator-accessible knowledge graph.

Research vision

Trafficking evidence arrives in every format — court records, online posts and ads, images, audio, case-management data — but it's fragmented, siloed, and rarely analyzable across sources. My goal is one system that ingests all of it, organizes it semantically, and makes it queryable at scale.

The architecture is modality-aware: a router classifies each incoming item and sends it to the right open-source model — legal text to a legal-domain extractor, general text to an instruction-tuned LLM, audio through speech recognition, images and audio through ImageBind for cross-modal embedding, structured records through schema mapping.
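A minimal sketch of how such a router could classify incoming items, assuming a simple extension lookup plus a text-preview check. The extension map, the cue phrases, and the function name `route` are illustrative assumptions; only the modality labels come from the routing table further down this page.

```python
from enum import Enum
from pathlib import Path


class Modality(Enum):
    LEGAL_TEXT = "LEGAL_TEXT"
    GENERAL_TEXT = "GENERAL_TEXT"
    AUDIO = "AUDIO"
    IMAGE_VIDEO = "IMAGE/VIDEO"
    STRUCTURED_DATA = "STRUCTURED_DATA"


# Hypothetical extension map; a real router would also inspect content.
_BY_EXTENSION = {
    ".mp3": Modality.AUDIO,
    ".wav": Modality.AUDIO,
    ".jpg": Modality.IMAGE_VIDEO,
    ".png": Modality.IMAGE_VIDEO,
    ".mp4": Modality.IMAGE_VIDEO,
    ".csv": Modality.STRUCTURED_DATA,
    ".json": Modality.STRUCTURED_DATA,
}

# Hypothetical cue phrases that suggest a court filing.
_LEGAL_CUES = ("united states district court", "plaintiff", "defendant", "docket")


def route(filename: str, text_preview: str = "") -> Modality:
    """Pick a modality from the file extension, then from a text preview."""
    suffix = Path(filename).suffix.lower()
    if suffix in _BY_EXTENSION:
        return _BY_EXTENSION[suffix]
    preview = text_preview.lower()
    if any(cue in preview for cue in _LEGAL_CUES):
        return Modality.LEGAL_TEXT
    # Anything textual that does not look like a filing falls through here.
    return Modality.GENERAL_TEXT
```

Keeping the decision in one function is what would let new modalities be added without touching downstream agents.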

Every modality converges on two shared structures: an embedding space indexed and searched with FAISS, and a unified knowledge graph in Neo4j. Both are exposed through a retrieval-augmented interface, so an audio clip, a court-exhibit image, and a paragraph of an indictment can be retrieved and reasoned over in the same query.
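To illustrate what a shared embedding space buys, here is a small sketch in which items from different modalities sit in one index and a single query ranks them together. It uses a brute-force cosine search in plain Python as a stand-in for a FAISS index, and the three-dimensional vectors and source ids are invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    source_id: str        # hypothetical pointer back to the original evidence
    modality: str
    vector: list[float]   # toy embedding; real ones would be far wider


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index: list[Item], query: list[float], k: int = 2) -> list[tuple[float, Item]]:
    """Rank every item, regardless of modality, by similarity to the query."""
    scored = [(cosine(item.vector, query), item) for item in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


index = [
    Item("audio-001", "AUDIO", [1.0, 0.0, 0.0]),
    Item("exhibit-014", "IMAGE/VIDEO", [0.9, 0.1, 0.0]),
    Item("indictment-p3", "LEGAL_TEXT", [0.0, 0.0, 1.0]),
]
hits = search(index, [1.0, 0.0, 0.0])
```

Because every hit keeps its `source_id`, a result can always be traced back to the item it came from.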

Why legal dockets first

A system this broad can't be validated all at once, and the wrong starting modality stalls everything downstream. The dissertation deliberately begins with the most readily available, most structured, highest-signal source: federal trafficking legal dockets.

Court filings are public record, retrievable at scale, and dense with the exact entities the graph is built to hold — defendants, charges, parties, outcomes, jurisdictions. Proving the full ingestion → extraction → graph → RAG path on legal text first means the messier modalities extend a foundation that already works, rather than fighting an unproven architecture.

What I'm building

A modular, agentic pipeline that ingests dockets across both the civil and criminal strands, derives its schema from the real docket population, assembles structured records into a queryable Neo4j knowledge graph, and exposes it through a RAG interface for non-technical analysts.

The dissertation evaluates the legal-text path of the broader multimodal architecture end-to-end. The contribution is the working system itself, judged by whether it is operationally useful to investigators.

Research questions

RQ1

Reliability

Can the pipeline reliably convert heterogeneous federal trafficking dockets — civil and criminal alike — into faithful, schema-consistent structured data?

RQ2

Knowledge graph value

Does the resulting knowledge graph let investigators surface cross-case and cross-strand patterns that document-level tools cannot?

RQ3 · central

Operational usability

Can non-technical analysts actually retrieve, reason about, and act on case information better than with their current tools? This is the central validation question, answered through a formal user study.

Approach

Applied computer science: success is operational utility for investigators, not benchmark superiority. The model bake-offs and embedding evaluations inform design decisions, and those decisions are justified by their effect on the working system.

Multimodal by design, legal-first in practice

The dissertation pipeline is the legal-text path of a system designed to ingest every modality of trafficking evidence into a shared graph and embedding space.

An out-of-distribution domain

Federal trafficking dockets are absent from every major legal NLP corpus — genuinely new territory rather than a re-benchmarking of well-trodden appellate text.

The civil § 1595 strand, first-class

The under-studied, faster-growing civil strand — including third-party corporate liability theories developing in real time — is treated as a first-class case type.

Parallel civil/criminal architecture

Schema and graph carry both strands as equals, with a parallel_proceeding edge linking a civil suit to its corresponding criminal case so cross-strand queries are first-class.
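A small in-memory sketch of the kind of cross-strand query the parallel_proceeding edge is meant to enable, written in plain Python rather than Cypher. The case ids, party ids, and the `names_defendant` relation are hypothetical; only `parallel_proceeding` comes from the description above.

```python
from collections import defaultdict

# (subject, relation, object) edges; every identifier here is invented.
EDGES = [
    ("civil:1", "parallel_proceeding", "criminal:9"),
    ("civil:1", "names_defendant", "entity:hotel_co"),
    ("criminal:9", "names_defendant", "person:doe"),
    ("civil:2", "names_defendant", "entity:hotel_co"),
]


def defendants_across_strands(edges, civil_case):
    """List defendants in a civil suit and in any criminal case linked to it."""
    by_subject = defaultdict(list)
    for subject, relation, obj in edges:
        by_subject[subject].append((relation, obj))

    linked = [o for r, o in by_subject[civil_case] if r == "parallel_proceeding"]
    result = {}
    for case in [civil_case, *linked]:
        result[case] = sorted(o for r, o in by_subject[case] if r == "names_defendant")
    return result
```

Calling `defendants_across_strands(EDGES, "civil:1")` walks one hop over the link and returns the defendants on both sides, which is the shape of question a document-level tool cannot answer directly.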

A data-derived schema

The extraction schema is itself a research deliverable, derived from stratified analysis of the real docket distribution rather than assumed in advance.
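One way the stratified analysis could draw its review sample is sketched below, assuming each docket carries metadata fields to stratify on. The field names (`case_type`, `jurisdiction`) and the per-stratum quota are assumptions, not the dissertation's actual sampling plan.

```python
import random
from collections import defaultdict


def stratified_sample(dockets, keys, per_stratum, seed=0):
    """Draw up to `per_stratum` dockets from each combination of `keys`."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    strata = defaultdict(list)
    for docket in dockets:
        strata[tuple(docket[k] for k in keys)].append(docket)

    sample = []
    for stratum in sorted(strata):
        members = strata[stratum]
        # Small strata are taken whole rather than skipped.
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample
```

Sampling per stratum keeps rare case types visible when the schema is drafted, instead of letting the most common docket shape dominate.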

The system · five modular components

1 Ingestion
2 Embedding
3 Relation Assembly
4 Graph Construction
5 Search & RAG

1 Ingestion Agent
Handles modality routing, acquisition, role-aware PII scrubbing, and case-type classification, and emits modality-tagged structured JSON. The router is the extension point through which the system grows from legal text to the full multimodal vision.
2 Embedding Agent
Builds FAISS indexes for hybrid retrieval — text encoders for the dissertation, ImageBind cross-modal embeddings for the broader system.
3 Relation Assembly Agent
Converts structured records into (subject, relation, object) triples with confidence scores and source-chunk provenance — case-type-aware, including parallel-proceeding linkage.
4 Graph Construction Agent
Writes triples into Neo4j under enforced schema constraints, with parallel civil and criminal node and edge types and full provenance.
5 Search & RAG Agent
Interprets natural-language investigative queries, retrieves subgraphs via FAISS + Neo4j Cypher, and synthesizes answers with source citations, a confidence score, and a reasoning chain.
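For components 3 and 4 above, the sketch below shows one possible shape for a provenance-carrying triple and a schema check that could run before anything is written to the graph. The type names, the allowed-edge set, and the relation names other than `parallel_proceeding` are assumptions for illustration; a real deployment would express such rules as constraints in the graph database rather than in a Python set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triple:
    subject: str
    subject_type: str
    relation: str
    object: str
    object_type: str
    confidence: float   # extractor's confidence, expected in [0, 1]
    source_chunk: str   # provenance: id of the chunk supporting the claim


# Hypothetical schema: permitted (subject type, relation, object type) combinations.
ALLOWED_EDGES = {
    ("CivilCase", "parallel_proceeding", "CriminalCase"),
    ("CivilCase", "names_defendant", "Party"),
    ("CriminalCase", "names_defendant", "Party"),
    ("CriminalCase", "charges_under", "Statute"),
}


def rejection_reason(t: Triple) -> Optional[str]:
    """Return None if the triple may be written, else a short reason."""
    if (t.subject_type, t.relation, t.object_type) not in ALLOWED_EDGES:
        return "edge type not in schema"
    if not 0.0 <= t.confidence <= 1.0:
        return "confidence out of range"
    if not t.source_chunk:
        return "missing provenance"
    return None


def partition(triples):
    """Split triples into those to write and those to hold back with a reason."""
    accepted, rejected = [], []
    for t in triples:
        reason = rejection_reason(t)
        if reason is None:
            accepted.append(t)
        else:
            rejected.append((t, reason))
    return accepted, rejected
```

Rejected triples keep their reason, so a failed write can be audited rather than silently dropped.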

Modality routing · the full system

Modality	Input	Model / method	Output	Status
LEGAL_TEXT	Federal court filings, dockets (PDF)	Legal-domain structured extraction (RQ1 bake-off)	Structured JSON → KG	Active — dissertation scope
GENERAL_TEXT	Social posts, ads, NGO narrative reports	Instruction-tuned LLM (Llama 3)	Entities, trafficking indicators, URLs	Designed; future extension
AUDIO	Interviews, intercepted calls (MP3/WAV)	Whisper ASR + LLM extraction + ImageBind	Structured payload + 1024-dim embedding	Designed; future extension
IMAGE/VIDEO	Ads, location/vehicle images, keyframes	ImageBind + OCR	1024-dim embedding + descriptors + OCR text	Designed; future extension
STRUCTURED_DATA	CSV/JSON from case systems, open data	Schema mapping (YAML, no LLM)	Structured JSON → KG	Designed; future extension

All routes pass through the PII scrubbing gate before any model is invoked. ImageBind's cross-modal alignment is what lets audio, image, and text be retrieved in a single embedding space.
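For the STRUCTURED_DATA route in the table above, schema mapping without an LLM can be as simple as a declarative column map. The sketch below uses a Python dict as a stand-in for the YAML file, and the column names, target field names, and sample row are invented for illustration.

```python
import csv
import io

# Stand-in for a YAML mapping file: source column -> graph schema field.
MAPPING = {
    "Case No": "case_id",
    "Filed": "filed_date",
    "Court": "jurisdiction",
}


def map_rows(csv_text: str, mapping: dict) -> list[dict]:
    """Rename mapped columns, trim whitespace, and drop unmapped columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        records.append(
            {target: row[source].strip() for source, target in mapping.items() if source in row}
        )
    return records


sample = "Case No,Filed,Court,Notes\nEX-001, 2020-01-02 ,Example District,internal note\n"
records = map_rows(sample, MAPPING)
```

Because the mapping is data rather than code, adding a new case-management export would mean writing a new map, not a new extractor.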

Core technical experience

Developed across the broader multimodal program of work.

ImageBind & cross-modal retrieval

Encoding image and audio into a shared embedding space aligned with text, enabling queries that cross modality boundaries.

FAISS vector search

Index construction and tuning (Flat / IVF / IVFPQ) for scalable semantic retrieval, with embedding-to-source traceability.

Knowledge graph engineering

Neo4j schema design, ontology constraints, entity resolution via embedding-similarity merges, and provenance-tagged nodes and edges.
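A minimal sketch of entity resolution by embedding similarity, assuming each extracted entity already has a vector: pairs above a cosine threshold are merged with union-find. The threshold value, the two-dimensional toy vectors, and the entity names are assumptions, and a production merge would add safeguards such as type checks and human review.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge_entities(entities, threshold=0.95):
    """entities: list of (name, vector). Returns clusters of names to merge."""
    parent = list(range(len(entities)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force pairwise comparison; fine for a sketch, quadratic at scale.
    for i in range(len(entities)):
        for j in range(i + 1, len(entities)):
            if cosine(entities[i][1], entities[j][1]) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i, (name, _) in enumerate(entities):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())


clusters = merge_entities([
    ("Acme Hotels LLC", [1.0, 0.0]),
    ("ACME Hotels, L.L.C.", [0.99, 0.05]),
    ("Jane Roe", [0.0, 1.0]),
])
```

Union-find makes merges transitive, so three spellings of one entity end up in a single cluster even if only adjacent pairs clear the threshold.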

Retrieval-augmented generation

Hybrid vector + graph-traversal retrieval feeding an LLM synthesis layer that returns cited, auditable answers with reasoning chains.
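The hybrid retrieval step can be sketched as follows, assuming vector search has already returned seed nodes and the graph is available as an adjacency map. The function names, the graph layout, and the prompt format are hypothetical; the point is only that every fact handed to the LLM carries a source it can cite.

```python
def expand(seed_nodes, graph, hops=1):
    """Collect provenance-tagged facts within `hops` edges of the seed nodes.

    graph maps a node id to a list of (relation, neighbor, source_chunk).
    """
    seen = set(seed_nodes)
    frontier = list(seed_nodes)
    facts = []
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for relation, neighbor, chunk in graph.get(node, []):
                facts.append({"triple": (node, relation, neighbor), "source": chunk})
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return facts


def build_prompt(question, facts):
    """Number each fact so the answer can cite it as [n]."""
    lines = [
        f"[{i}] {s} {r} {o} (source: {fact['source']})"
        for i, fact in enumerate(facts, start=1)
        for s, r, o in [fact["triple"]]
    ]
    return "Facts:\n" + "\n".join(lines) + f"\n\nQuestion: {question}\nCite facts as [n]."


graph = {
    "case:A": [("names_defendant", "party:X", "chunk-1")],
    "party:X": [("named_in", "case:B", "chunk-7")],
}
prompt = build_prompt("Where else does party X appear?", expand(["case:A"], graph, hops=2))
```

Numbering the facts in the prompt is one simple way to make the synthesized answer auditable against its sources.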

Ethics as design requirements

In a criminal-justice or civil-litigation context, a system that fails any of these requirements should not be deployed, regardless of technical performance.

Public-record corpus only

Filed court documents — never open investigations or survivor service records.

Role-aware PII scrubbing

Defendants retained as public record; victims, witnesses, minors, informants, and survivor-capacity plaintiffs tokenized with within-docket-consistent, cross-docket-unlinkable tokens; ambiguous cases flagged for review.
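One way to obtain tokens that are consistent within a docket but not linkable across dockets is a keyed hash with a per-docket key, sketched below with the standard-library `hmac` module. The role labels, token format, and key-derivation scheme are assumptions, and unlinkability here holds only for parties who do not hold the master secret.

```python
import hashlib
import hmac

PROTECTED_ROLES = {"victim", "witness", "minor", "informant", "survivor_plaintiff"}


def docket_key(master_secret: bytes, docket_id: str) -> bytes:
    """Derive a separate key per docket so tokens differ across dockets."""
    return hmac.new(master_secret, docket_id.encode(), hashlib.sha256).digest()


def scrub(name: str, role: str, docket_id: str, master_secret: bytes) -> str:
    if role == "defendant":
        return name                      # retained as public record
    if role not in PROTECTED_ROLES:
        return "[REVIEW_REQUIRED]"       # ambiguous role: flag for a human
    key = docket_key(master_secret, docket_id)
    # Normalize so trivially different spellings map to the same token.
    normalized = name.strip().lower().encode()
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return f"[{role.upper()}_{digest[:10]}]"
```

The same person named twice in one docket receives the same token, so coreference survives scrubbing, while the token changes in any other docket.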

Bias transparency

Corpus composition documented, with metrics disaggregated by case type, statute, jurisdiction, and document type.
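Disaggregation itself is simple to sketch: the same accuracy figure is computed per group rather than once overall. The record fields (`case_type`, `correct`) and the choice of accuracy as the metric are assumptions for illustration.

```python
from collections import defaultdict


def accuracy_by(results, key):
    """Return accuracy per value of `key`, e.g. per case type or statute."""
    totals = defaultdict(lambda: [0, 0])  # group -> [correct, total]
    for record in results:
        bucket = totals[record[key]]
        bucket[0] += int(record["correct"])
        bucket[1] += 1
    return {group: correct / total for group, (correct, total) in totals.items()}


results = [
    {"case_type": "civil", "correct": True},
    {"case_type": "civil", "correct": False},
    {"case_type": "criminal", "correct": True},
    {"case_type": "criminal", "correct": True},
]
per_type = accuracy_by(results, "case_type")
```

In this toy data the overall accuracy of 0.75 would hide the gap between the two strands, which is exactly what a disaggregated report is meant to expose.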

Dual-use mitigation

Resolved court documents only; legal constructs rather than operational detail; a research tool, not a live feed. Extension to live multimodal data requires additional dual-use assessment.

Survivor agency

Survivor-informed extensions to schema, query design, and governance are scoped as required steps before any operational deployment.