file PR-2026 · portfolio of record

Philip Roy

co-founder · Dicta

BC, Canada · UTC−7

systems in production since february 2026

Applied AI,
shipped daily.

I build and operate AI systems inside real businesses: agent pipelines that ship client work around the clock, memory infrastructure measured across months of daily use, and the instruments that say what actually works. Every number on this page carries its source.

forward-deployed / applied AI · evals & agent reliability · memory & personality systems

34,782

memory entries in production

SOURCE: live store, queried 2026-06-10

3,300+

logged agent sessions, one deployment

SOURCE: live store, June 2026 · includes scheduled automation runs

80→100%

retrieval precision@3

SOURCE: 4 repair rounds; a 40% regression reported en route

112

research cycles, hypothesis to verdict

SOURCE: Deep Applied Research live store · 18 focus areas

file 01 — systems

In production

Not demos. Systems with operators, paying clients, failure logs, and uptime to answer for.

01

Multi-agent content production

A production pipeline that drafts, reviews, and ships voice-matched content for a real client roster 24/7: dispatcher, drafter, a blind cold-read agent, a five-dimension review gate, and bounded auto-revision with named repair strategies. Every gate flag becomes a learning signal that must be dispositioned before the next batch runs.

3-stage QC chain · append-only signal ledger · 22 major prompt versions

forward-deployed · paying clients · 24/7
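The rule that every gate flag must be dispositioned before the next batch runs can be sketched as a small append-only ledger. This is an illustrative sketch only, not the production code: the class name `SignalLedger`, its methods, and the event tuple layout are all assumptions made for the example.

```python
class SignalLedger:
    """Hypothetical append-only ledger: events are added, never edited or removed."""

    def __init__(self):
        # Each event is (kind, signal_id, detail); kind is "flag" or "disposition".
        self._events = []

    def flag(self, detail):
        """Record a gate flag as a new open signal and return its id."""
        signal_id = sum(1 for kind, _, _ in self._events if kind == "flag")
        self._events.append(("flag", signal_id, detail))
        return signal_id

    def disposition(self, signal_id, verdict):
        """Close an open signal by appending a disposition event."""
        if signal_id not in self.open_signals():
            raise ValueError(f"signal {signal_id} is not open")
        self._events.append(("disposition", signal_id, verdict))

    def open_signals(self):
        """Ids that were flagged but have no disposition event yet."""
        flagged = {sid for kind, sid, _ in self._events if kind == "flag"}
        closed = {sid for kind, sid, _ in self._events if kind == "disposition"}
        return flagged - closed

    def can_run_next_batch(self):
        """The next batch is allowed only when no signal is left open."""
        return not self.open_signals()


ledger = SignalLedger()
sid = ledger.flag("cold-read: opening paragraph off-voice")
blocked = not ledger.can_run_next_batch()   # one open signal blocks the batch
ledger.disposition(sid, "fixed in prompt revision")
unblocked = ledger.can_run_next_batch()     # all signals dispositioned
```

Because dispositions are appended as new events rather than written over the original flag, the full history of what was flagged and how it was resolved stays intact.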
02

Solitaire / persistent memory + personality

Memory infrastructure for LLM agents that goes past fact retrieval: verbatim knowledge graph, behavioral drift tracking, experiential encodings, and a personality layer measured across hundreds of sessions. Persona trait alignment improved from 2.25 to 4.43 out of 5 as the identity graph accumulated evidence.

34,782 entries live · P@3 80→100% · trait alignment 2.25→4.43

public release in progress · since Feb 2026
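For readers unfamiliar with the P@3 figure, a minimal sketch of precision@k under its standard definition follows: the fraction of the top k retrieved items that are relevant, averaged over queries. The page does not state exactly how Solitaire's scorer computes it, so the function names and the toy ids below are assumptions for illustration, not real results.

```python
def precision_at_k(retrieved, relevant, k=3):
    """Fraction of the top-k retrieved ids that appear in the relevant set."""
    if k < 1:
        raise ValueError("k must be at least 1")
    top_k = retrieved[:k]
    hits = sum(1 for item in top_k if item in relevant)
    # Dividing by k (not len(top_k)) penalizes returning fewer than k results.
    return hits / k


def mean_precision_at_k(runs, k=3):
    """Average precision@k over (retrieved, relevant) pairs, one per query."""
    if not runs:
        raise ValueError("no queries to score")
    return sum(precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


# Toy ids, purely illustrative.
perfect = precision_at_k(["m1", "m2", "m3"], {"m1", "m2", "m3"})        # 1.0
partial = precision_at_k(["m1", "x9", "m2", "m3"], {"m1", "m2", "m3"})  # 2 of top 3
```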
03

Deep Applied Research

A hypothesis-and-verdict protocol that wraps any research agent in structured cycles with cross-session memory. Its defining feature is self-audit: it measured a 77% adopt-rate bias across 96 of its own reports, shipped corrections, and re-measured. The audit trail is the product.

1.9k LOC · ~250 passing tests · bias caught on the record

research protocol · ~250 tests

file 02 — instruments

Measurement

Instruments built to survive scrutiny, including the author's own results.

Persona stability under behavioral gravity

2.25→4.43 / 5

Five trait scenarios scored before and after identity-graph accumulation, with the largest gains exactly where the base model's compliance gravity had to be overridden.

SOURCE: From Memory to Partnership, Track 2

33-category interaction-quality taxonomy

33 categories

Surface, structural, and interactional failure modes of AI text and behavior, with detection heuristics per category. Published at 22; production use grew it to 33. Runs daily as a live gate.

SOURCE: From Memory to Partnership, Appendix A; expanded in production

Narrative Continuity Test, operationalized

First runnable implementation of the NCT (Natangelo 2025): 25 behavioral cases across five axes of agent identity. The harness verifies that seeded facts actually reached the model's context and runs a bare-model baseline arm, so every point in a score is attributed: retrieval, behavior, or base-model credit. The instrument is the contribution; it measures its own author without flinching.

v1 run, June 2026 · full results and attribution ship with the benchmark

file 03 — papers

Research

Longitudinal evidence from a deployment most labs can't run: one dyad, four months, every session logged.

preprint ready

From Memory to Partnership

Systems paper: how evolving persistent context transforms human-AI interaction. Four evaluation tracks over 400+ sessions, a triple-confound caveat stated up front, and a 40% retrieval regression reported instead of buried.

Read the paper (PDF)

preprint ready

Artificial Relational Intelligence

Position paper: why the AGI threshold is the wrong target. Intelligence as a property of the human-AI dyad, with falsification criteria and the honest limits of a single-dyad case study.

Read the paper (PDF)

working paper

Bounded Context with Structured Offload

A case against compaction: rolling eviction plus structured offload as the alternative to ever-larger context windows. Testable, partially validated, openly unfinished.
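The idea can be sketched in a few lines: keep a fixed-size window of recent turns, and when it overflows, move the oldest turn verbatim into a structured store instead of summarizing it. This is a minimal sketch under stated assumptions, not the design from the working paper: `BoundedContext`, its record fields, and the keyword-based `recall` are hypothetical stand-ins.

```python
from collections import deque


class BoundedContext:
    """Hypothetical rolling window with structured offload of evicted turns."""

    def __init__(self, max_turns):
        if max_turns < 1:
            raise ValueError("max_turns must be at least 1")
        self.max_turns = max_turns
        self.window = deque()   # turns currently in the model's context
        self.offload = []       # evicted turns, kept verbatim as records

    def add(self, role, text):
        """Append a turn; evict the oldest turns once the window is full."""
        self.window.append({"role": role, "text": text})
        while len(self.window) > self.max_turns:
            evicted = self.window.popleft()
            # Structured offload: the turn is stored whole, with a sequence
            # number, rather than being compacted into a summary.
            self.offload.append({"seq": len(self.offload), **evicted})

    def recall(self, keyword):
        """Naive keyword lookup over offloaded records (a placeholder for real retrieval)."""
        needle = keyword.lower()
        return [r for r in self.offload if needle in r["text"].lower()]


ctx = BoundedContext(max_turns=2)
ctx.add("user", "Client prefers British spelling.")
ctx.add("assistant", "Noted.")
ctx.add("user", "Draft the newsletter intro.")
found = ctx.recall("british")   # the evicted first turn is still recoverable
```

The contrast with compaction is that nothing is rewritten on eviction: the window stays bounded while the original wording remains available for later retrieval.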

file 04 — method

How this gets built

Disclosed because it's the point, and increasingly the job.

“This paper is, in a sense, its own evidence: an artifact of the partnership it argues for.”

acknowledgments, From Memory to Partnership (2026)

Human direction, AI implementation. Every system here was designed and operated by me, built in disciplined partnership with the AI it studies. The discipline structure runs to hundreds of documented rules, each one traced to a caught failure.

QC gates in both directions. The AI's output passes through independent review agents before it ships. My claims pass through the same standard: a result gets published with its baseline, its scorer repairs, and its regressions.

Receipts over adjectives. Where a number appears on this page, the artifact behind it exists: result files with provenance fields, append-only signal ledgers, papers with the confounds stated in the abstract.