Building dialogue data that knows why people say things
Technical articles on synthetic dialogue generation, belief state modeling,
deterministic evaluation, and the infrastructure behind psychologically coherent AI training data.
Every system producing synthetic personas claims they are realistic. There has been no standard way to verify that claim. PsycheBench is the first open evaluation suite for synthetic identity quality — three dimensions, deterministic scoring, no LLM judges.
We built Sofía Martínez Rojas without sales scripts, negotiation tactics, or objection trees. What we got back was something we didn't program: identity under pressure.
(Spanish-language version.) We built Sofía Martínez Rojas without sales scripts, negotiation tactics, or objection trees. What we got back was something we didn't program: identity under pressure.
Most synthetic personas behave consistently — until they don't. We analyze a structured seven-phase interaction with a synthetic persona to explore whether an identity can remain coherent and still adapt under sustained pressure.
Most synthetic personas are costumes. Synthetic Identity Engineering is the practice of building the person underneath — belief systems, defense mechanisms, and identity stability that hold under pressure.
We released four dialogue datasets where the cognitive ground truth — intent, goal, belief state, relationship dynamics — was computed before the text was generated, not inferred afterward. 400 conversations, 8,404 turns, 23 columns.
María del Carmen Ruiz is a 50-year-old Spanish lawyer generated by StrataSynth. This is a record of a conversation in which she held her position under sustained philosophical and emotional pressure, and an account of what that reveals about how the system works.
Most dialogue datasets give you the words. This article explains the four belief dimensions StrataSynth records per turn — trust, hostility, self-worth, resolution — how they update deterministically, and why causal ground truth changes what you can train on.
Evaluating AI-generated dialogue with another AI creates a circular system in which the generator and the judge share the same failure modes. Here is a concrete failure that such evaluation misses, and the 12 deterministic metrics we use instead.