Internal Technical Review
Synilly: Data Grounding & Improvement Paths
How our synthetic personas are currently grounded, where the gaps are, and what we can build to close them.
50 Personas
5 Improvement Options
Phased Roadmap
Part 1 — Current System
Architecture Overview
The end-to-end flow from research brief to final report.
Research Brief
Topic, objectives, target audience
↓
Panel Matching
Relevance scoring against 50 personas
↓
Pre-Session Briefing
Study history review, angle identification
↓
Discussion Simulation
4-phase moderated conversation
↓
Report Generation
Findings, confidence scores, recommendations
Key point: Every step currently uses pre-authored static data. No LLM calls happen during a session — the entire flow is deterministic and repeatable.
Part 1 — Current System
Persona Data Model
The MaintainedPersona interface — what we store per person.
interface MaintainedPersona {
  // NOTE: the slide lists field names only; the concrete types below
  // are illustrative assumptions added for readability.
  id: string; name: string; age: number;
  avatar: string; occupation: string; location: string;
  memberSince: string; studyCount: number; consistencyScore: number; // 0-100
  lastActiveDate: string; keyTraits: string[]; communicationStyle: string;
  demographics: {
    income: string; household: string; education: string;
    career: string; pets: string; living: string;
  };
  personality: {
    summary: string; lifeStory: string;
    keyLifeEvents: string[]; // each event carries an explicit impact statement
    values: string[]; contradictions: string[];
  };
  beliefs: {
    brandRelationships: { brand: string; relationship: string; note: string }[];
    decisionMakingStyle: string;
    techAttitude: string; priceProcessing: string;
    mediaChannels: string[];
  };
  studyHistory: StudyRecord[];
}
50+
Total structured fields
Part 1 — Current System
Persona Depth: Maya vs Derek
Same interface, radically different behavior profiles.
Maya Chen, 31
Decision style: Emotional first, rational justification second
Price processing: Anchors to perceived value. $50/mo feels fine for "Mochi's health"
Contradiction: Claims to be skeptical of "premium" marketing but consistently buys premium
Consistency: 82%
Derek Thompson, 29
Decision style: Research-intensive with comparison spreadsheets. Needs 3-5 data points
Price processing: Price-insensitive when value is proven. Calculates cost-per-benefit ratios
Contradiction: Demands quantitative proof but bought a $3k espresso machine on vibes
Consistency: 91%
Why this matters: The same concept test question ("Would you pay $49/mo?") produces authentically different responses because each persona's grounding data creates distinct reasoning paths.
Part 1 — Current System
Panel Matching Flow
How we select the right personas for a study.
Research Brief Analyzed
Extract topic, product category, target demo
↓
Scan 50 Personas
Match against demographics, beliefs, brand history
↓
Score & Rank
relevanceScore (0-100) + matchReasons[]
↓
User Selects Panel
Review scores, approve/swap panelists
interface PanelCandidate {
persona: MaintainedPersona;
relevanceScore: number;
matchReasons: string[];
isSelected: boolean;
}
Filtering supports search, age range, location, traits, and sort by name/studies/consistency. All computed client-side against the full persona dataset.
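A minimal sketch of that client-side pass, assuming hypothetical names (PanelFilters, filterPanel) on top of the PanelCandidate interface above:

// Hypothetical client-side filter + sort; only PanelCandidate and the
// MaintainedPersona fields come from the actual data model.
interface PanelFilters {
  search?: string;
  ageRange?: [number, number];
  location?: string;
  sortBy: 'name' | 'studies' | 'consistency';
}

function filterPanel(candidates: PanelCandidate[], f: PanelFilters): PanelCandidate[] {
  return candidates
    .filter(c => !f.search ||
      c.persona.name.toLowerCase().includes(f.search.toLowerCase()))
    .filter(c => !f.ageRange ||
      (c.persona.age >= f.ageRange[0] && c.persona.age <= f.ageRange[1]))
    .filter(c => !f.location || c.persona.location === f.location)
    .sort((a, b) => {
      switch (f.sortBy) {
        case 'name': return a.persona.name.localeCompare(b.persona.name);
        case 'studies': return b.persona.studyCount - a.persona.studyCount;
        default: return b.persona.consistencyScore - a.persona.consistencyScore;
      }
    });
}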
Part 1 — Current System
Discussion Simulation
Pre-authored conversation with 4 phases and live UX affordances.
Warmup
Personas introduce themselves and their context. Establishes character voice.
Deep Dive
Probing questions on concerns, fears, desires. Emotional territory.
Concept Test
Present the product concept. Capture reactions, objections, pricing feedback.
Wrap-up
"One thing that would make you sign up?" Final asks for conviction testing.
UX simulation: Messages appear sequentially with typing indicators and variable delays (600-2000ms). Live "insights" sidebar updates as themes emerge. All pre-scripted — no LLM in the loop.
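A minimal sketch of the scripted playback loop under those timings; ScriptedMessage and the two UI hooks are illustrative stand-ins, not the real rendering code:

interface ScriptedMessage { speaker: string; text: string; }

// Placeholder UI hooks; the app's actual rendering functions are not shown here.
const showTypingIndicator = (speaker: string) => console.log(`${speaker} is typing...`);
const renderMessage = (m: ScriptedMessage) => console.log(`${m.speaker}: ${m.text}`);

const delay = (ms: number) => new Promise<void>(res => setTimeout(res, ms));

async function playDiscussion(script: ScriptedMessage[]) {
  for (const msg of script) {
    showTypingIndicator(msg.speaker);
    await delay(600 + Math.random() * 1400); // variable 600-2000ms pause
    renderMessage(msg);                      // insight sidebar would refresh here too
  }
}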
Part 1 — Current System
Longitudinal Consistency
Personas remember past studies and maintain behavioral coherence.
interface StudyRecord {
id: string;
date: string;
topic: string;
role: string;
keyTakeaways: string[];
notableQuotes: string[];
}
Maya Chen's Study History (4 studies)
- Oct '24: Dog food concept — flagged $49 as borderline, wanted guarantee
- Jan '25: Meal kit study — drew parallels to pet food subscription fatigue
Cross-study signal: Her price sensitivity was consistent across both studies, and she connected the two topics unprompted. Consistency score: 82%
Derek Thompson (5 studies, 91% consistency)
- Always demands published data / clinical evidence
- Always calculates ROI before committing
- Pet monitoring study: "If I can't export the data, what's the point?"
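As a concrete instance of StudyRecord, Maya's Oct '24 entry could look like this (values condensed from the bullets above; the id and role label are illustrative):

const mayaOct24: StudyRecord = {
  id: 'study-2024-10-dogfood',   // illustrative id
  date: '2024-10',
  topic: 'Premium dog food concept',
  role: 'Panelist',              // assumed role label
  keyTakeaways: [
    'Flagged $49/mo as borderline',
    'Wanted a money-back guarantee',
  ],
  notableQuotes: ["At $49 I'd try it once and probably cancel"],
};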
Part 1 — Current System
Report Generation
Structured findings with confidence scores, quotes, and prioritized recommendations.
Finding: Price sensitivity is #1 barrier
80% confidence
4/5 panelists flagged $49/mo as too high. Acceptable range: $29-39/mo.
"At $49 I'd try it once and probably cancel"
— Maya Chen, Enthusiast Owner
Finding: Trust requires proof, not promises
90% confidence
Every panelist demanded concrete evidence. Type of proof varies by segment.
Recommendation (High Priority)
Introduce a "Guidance Only" tier at $15-20/mo — separating nutritionist from food delivery.
Part 1 — Current System
What Makes It "Grounded"
The structural properties that prevent personas from being shallow caricatures.
50+ structured fields per persona
Not just age/income — life stories, values, contradictions, brand histories, media habits, communication styles.
Psychological contradictions
Each persona has 3+ built-in inconsistencies that mirror real human behavior. Maya is "premium-skeptical" but buys premium. Derek is "purely rational" but bought a $3k espresso machine on vibes.
Brand relationship topology
Loyal / lapsed / curious / hostile: not just "uses Brand X" but why and how they feel about it (sketched as a type after this list).
Study memory & consistency tracking
Cross-study behavioral coherence scored 0-100. Contradictions across studies are flagged, not hidden.
Life events with causal impact
Events have explicit impact statements: "Pixel's health scare → triggered deep dive into nutrition research."
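That brand relationship shape, reconstructed from the fields buildAgentPrompt reads later in this deck (b.brand, b.relationship, b.note); the union type is inferred from the four labels above:

type RelationshipKind = 'loyal' | 'lapsed' | 'curious' | 'hostile';

interface BrandRelationship {
  brand: string;
  relationship: RelationshipKind;
  note: string;   // the "why and how they feel about it"
}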
Part 1 — Current System
Current Limitations
Honest assessment of where the system falls short.
1
Static personas. All 50 personas are hand-authored TypeScript objects. No generation, no variation, no adaptation to new domains.
2
Pre-scripted discussions. Every conversation is deterministic. Same brief = same discussion = same report. No emergent insights.
3
No real-time data. Personas can't reference current prices, trends, or market conditions. Their world is frozen at authoring time.
4
No statistical validation. Confidence scores are hand-assigned (80%, 90%). No sample size warnings, no margin of error, no bias detection.
5
No LLM reasoning. Zero AI calls during a session. The "AI-powered" claim currently refers to the concept, not the implementation.
6
Domain-locked. Personas are deeply grounded in pet products. Expanding to fintech or healthcare requires authoring 50 new personas from scratch.
Part 2 — Option A: RAG-Grounded Personas
Concept: Real Data In, Realistic Personas Out
Replace hand-authored personas with AI-generated ones grounded in real demographic data.
The Problem
Our 50 personas are plausible but not validated. Maya's $95k income, Portland location, and UX career were authored by a human guessing at realistic combinations. We have no way to know if this profile actually exists in meaningful numbers.
The Solution
Feed real data sources (census, BLS, market research) into an LLM persona generator. Each generated persona comes with citations back to the source data. Demographics are statistically representative, not imagined.
Data Sources
- US Census / ACS — demographics, income, geography
- Bureau of Labor Statistics — occupations, salary bands
- APPA National Pet Owners Survey — pet spending data
- Pew Research — media habits, tech adoption
Part 2 — Option A: RAG-Grounded Personas
Data Flow
From raw data to validated, citable personas.
Census API
BLS Data
APPA Survey
Pew Research
↓
Chunk & Embed
Split data into demographic facts, embed into vector DB
↓
Persona Generator (LLM)
System prompt + retrieved context = persona
↓
Validation Layer
Check: does income match occupation for this region?
↓
Validated Persona + Citations
Every field traceable to source data
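A sketch of the first validation check named above (income within the BLS band for occupation and region); blsSalaryBands and validateIncome are assumed names:

// Hypothetical validation step; the lookup table and its key format are assumptions.
const blsSalaryBands: Record<string, [number, number]> = {
  // From the retrieved context: UX Designers, Portland: $85k-$110k (BLS)
  'UX Designer|Portland, OR': [85_000, 110_000],
};

function validateIncome(
  p: { occupation: string; location: string; income: number },
): string[] {
  const band = blsSalaryBands[`${p.occupation}|${p.location}`];
  if (!band) return [`no BLS band on file for ${p.occupation} in ${p.location}`];
  const [lo, hi] = band;
  return p.income >= lo && p.income <= hi
    ? []
    : [`income $${p.income} outside BLS range $${lo}-$${hi}`];
}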
Part 2 — Option A: RAG-Grounded Personas
System Prompt Example
How retrieved real-world data gets injected into persona generation.
const systemPrompt = `You are a persona generator for
user research. Generate a realistic persona using
ONLY the provided demographic data as grounding.
RETRIEVED CONTEXT:
- Portland, OR median income: $73,340 (ACS 2024)
- UX Designers, Portland: $85k-$110k (BLS)
- 67% of millennials own pets (APPA 2024)
- Pet owners 25-34 spend avg $1,480/yr (APPA)
RULES:
1. Income MUST fall within BLS range for occupation
2. Pet spending must align with APPA demographics
3. Every demographic field must cite its source
4. Personality can be creative but must not
contradict demographic grounding`;
// llm() here stands in for whichever model client is used.
const result = await llm({
  system: systemPrompt,
  prompt: "Generate: 30s UX designer, Portland, dog owner"
});
Part 2 — Option A: RAG-Grounded Personas
Before / After
Current: Hand-Authored
income: "~$95k"
education: "BFA + MFA"
monthlySpend: "~$180/mo"
- Plausible but unvalidated
- No citations
- Author's intuition
- Fixed at write time
RAG-Grounded
income: "$97k"
education: "BFA + MFA"
monthlySpend: "$123/mo"
- Statistically grounded
- Every field has a source
- Can regenerate for any domain
- Update when data updates
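To make "every field has a source" concrete, a sketch of a source-traceable field; the shape is an assumption, and the spend figure follows from the APPA context two slides back ($1,480/yr ÷ 12 ≈ $123/mo):

// Assumed shape for source-traceable fields; not the current data model.
interface CitedField<T> {
  value: T;
  source: string;   // dataset + vintage
}

const monthlySpend: CitedField<string> = {
  value: '$123/mo',
  source: 'APPA 2024: owners 25-34 average $1,480/yr',  // 1480 / 12 ≈ 123
};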
Key gain: Clients can ask "is this persona representative?" and we can point to real data instead of saying "we think so."
Part 2 — Option B: Multi-Agent Discussions
Concept: Personas as Independent Agents
Replace pre-scripted conversations with live multi-agent LLM discussions.
The Problem
Current discussions are fully deterministic. Same brief always produces the same conversation. No follow-up questions, no emergent insights, no ability to probe deeper on surprising responses.
The Solution
Each persona becomes an independent LLM agent with:
- A system prompt built from their full data model (50+ fields)
- Conversation memory (within session + cross-study)
- A Moderator Agent that manages turn-taking, probes, and phase transitions
What Changes
Same brief, different run = different conversation. Personas can disagree, build on each other's points, or reveal unexpected connections. The moderator can follow up on surprising responses in real time.
Part 2 — Option B: Multi-Agent Discussions
Architecture
Moderator orchestrating N persona agents with shared context.
Moderator Agent
Controls phase flow, selects next speaker, probes deeper
↓ assigns turn
Maya Agent
Derek Agent
Priya Agent
Tom Agent
↓ responses feed back
Shared Conversation Context
All agents see full discussion history
↓
Live Insight Extraction
Separate LLM call: theme detection, sentiment, contradictions
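A minimal sketch of the orchestration loop, assuming a generic llm client and round-robin turn-taking as a stand-in for the moderator's speaker selection:

// Hypothetical orchestration loop; Turn, Llm, and runPhase are illustrative names.
interface Turn { speaker: string; text: string; }
type Llm = (args: { system: string; prompt: string }) => Promise<string>;

async function runPhase(
  phase: string,
  agents: { name: string; systemPrompt: string }[],
  history: Turn[],
  llm: Llm,
  maxTurns = 8,
): Promise<Turn[]> {
  for (let i = 0; i < maxTurns; i++) {
    // Moderator reads the shared history and produces the next probe.
    const probe = await llm({
      system: 'You are the moderator: pick the next speaker and ask one question.',
      prompt: `Phase: ${phase}\n` +
        history.map(t => `${t.speaker}: ${t.text}`).join('\n'),
    });
    // Simplistic round-robin stand-in for the moderator's speaker selection.
    const agent = agents[i % agents.length];
    const reply = await llm({ system: agent.systemPrompt, prompt: probe });
    history.push({ speaker: agent.name, text: reply }); // shared context grows
  }
  return history;
}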
Part 2 — Option B: Multi-Agent Discussions
Agent System Prompt Construction
How persona data becomes agent behavior.
function buildAgentPrompt(p: MaintainedPersona) {
return `You are ${p.name}, ${p.age}, ${p.occupation}.
Location: ${p.location}
PERSONALITY: ${p.personality.summary}
LIFE STORY: ${p.personality.lifeStory}
CONTRADICTIONS (embody these naturally):
${p.personality.contradictions.map(c => '- ' + c).join('\n')}
BRAND RELATIONSHIPS:
${p.beliefs.brandRelationships.map(b =>
'- ' + b.brand + ': ' + b.relationship + ' — ' + b.note
).join('\n')}
COMMUNICATION STYLE: ${p.communicationStyle}
RULES:
1. Stay in character at all times
2. Reference your life events naturally
3. React to other panelists' statements
4. Your contradictions should surface organically
5. Never break character or acknowledge being AI`;
}
The full persona data model becomes the agent's behavioral constitution. Contradictions aren't bugs — they're features that make responses human-like.
Part 2 — Option B: Multi-Agent Discussions
Before / After
Current: Pre-Scripted
- 24 messages, fixed order
- Same output every time
- Can't follow up on surprises
- Host questions pre-authored
- No persona-to-persona reactions
- Insights are predetermined
Multi-Agent: Dynamic
- Variable length, organic flow
- Different output each run
- Moderator probes emergent themes
- Questions adapt to responses
- Personas react to each other
- Insights emerge from conversation
Example: What Multi-Agent Enables
Tom says "$49 is absurd." Maya responds: "Wait, I spend $50/mo on Farmer's Dog and it doesn't bother me... why does this feel different?" The moderator picks up on this and probes: "Maya, can you unpack that? What makes one $50 feel okay and another feel wrong?" This exchange can't happen in a pre-scripted system.
Part 2 — Option C: Calibration
Concept: Calibrate Against Real Research
Run synthetic studies alongside real ones. Measure divergence. Adjust.
The Problem
We have no way to measure if synthetic research outputs are accurate. Our confidence scores (80%, 90%) are hand-assigned, not empirically validated. A client has no reason to trust them over their own intuition.
The Solution
Partner with clients running real focus groups. Run the same brief through Synilly. Compare outputs. Build a calibration dataset that measures where synthetic research agrees with real research — and where it diverges.
What This Unlocks
- Empirically-backed confidence scores
- Known weak spots (e.g., "synthetic personas underestimate price sensitivity by 15%")
- Correction weights we can apply to future outputs
- A credibility story for enterprise sales
Part 2 — Option C: Calibration
Calibration Pipeline
How the feedback loop works.
Real Focus Group
5-8 real participants
Synilly Session
Same brief, synthetic panel
↓ both produce
Real Findings
Synthetic Findings
↓ compare
Delta Analysis
Theme overlap, sentiment alignment, price sensitivity match, missed insights
↓
Weight Adjustment
Update persona behavior parameters, adjust confidence calculation
Metric examples: Theme recall (did synthetic find all real themes?), sentiment polarity match, price band accuracy, recommendation overlap.
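Theme recall, the first metric listed, reduces to a set intersection; a sketch assuming theme labels are normalized upstream:

// Assumed metric helper: fraction of real-study themes the synthetic session
// also surfaced.
function themeRecall(realThemes: string[], syntheticThemes: string[]): number {
  const synth = new Set(syntheticThemes.map(t => t.toLowerCase()));
  const hits = realThemes.filter(t => synth.has(t.toLowerCase())).length;
  return realThemes.length === 0 ? 1 : hits / realThemes.length;
}

// themeRecall(['price', 'trust', 'convenience'], ['Price', 'Trust']) → ~0.67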
Part 2 — Option C: Calibration
Before / After
Current: Uncalibrated
finding: "Price is #1 barrier"
confidence: 80
basis: "4/5 panelists said so"
No empirical basis for the 80% number. Could be 50% or 95% in reality.
Calibrated
finding: "Price is #1 barrier"
confidence: 74
calibrationNote: "Synilly tends to
overweight price sensitivity by ~12%
vs real panels (n=23 studies)"
Confidence is earned, not declared. Known biases are disclosed upfront.
Trust equation: A tool that says "we're 74% confident, and here's why" is more trustworthy than one that says "we're 90% confident" with no backing.
Part 2 — Option D: Real-Time Data
Concept: Ground Responses in Live Data
Pull current market data, pricing, reviews, and social sentiment into persona responses.
The Problem
Our personas exist in a frozen world. Maya's opinions about The Farmer's Dog were written months ago. She can't reference a recent price increase, a viral TikTok controversy, or a competitor's new product launch.
The Solution
Before each discussion, pull live context relevant to the research topic:
- Pricing: Current subscription prices from competitor sites
- Reviews: Recent sentiment from Reddit, Amazon, Trustpilot
- Social: Trending conversations on TikTok, Instagram
- News: Relevant industry developments
Inject this as additional context into each agent's prompt. Maya doesn't just have opinions — she has opinions informed by what's actually happening right now.
Part 2 — Option D: Real-Time Data
Data Flow
Live context injection pipeline.
Research Brief
Topic: "Premium dog food subscription at $49/mo"
↓ extract entities
Reddit API
Price Scraper
News API
Social API
↓ aggregate
Market Context Document
Summarized, timestamped, source-linked
↓ inject into agent prompts
Grounded Agent Responses
Personas reference current data naturally
Per-persona filtering: Maya sees Instagram/TikTok content (her media channels). Derek sees Reddit/YouTube reviews (his channels). Each persona gets context filtered through their own media diet.
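A sketch of that per-persona filter, assuming each scraped item arrives tagged with its source channel (ContextItem and contextFor are illustrative names):

// Intersect scraped items with the persona's own mediaChannels before
// prompt injection.
interface ContextItem { channel: string; summary: string; url: string; }

function contextFor(
  persona: { beliefs: { mediaChannels: string[] } },
  items: ContextItem[],
): ContextItem[] {
  const diet = new Set(persona.beliefs.mediaChannels.map(c => c.toLowerCase()));
  return items.filter(i => diet.has(i.channel.toLowerCase()));
}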
Part 2 — Option D: Real-Time Data
Before / After
Current: Static Context
"At $49 I'd try it for a month to see if Mochi likes it."
— Maya (pre-scripted, context-free)
No reference to current market. No awareness of competitor moves. Generic price reaction.
Real-Time Grounded
"I saw Farmer's Dog just raised to $52/mo and people on Reddit are furious. So $49 actually feels competitive now — but I've been looking at Spot & Tango since they dropped to $39."
— Maya (agent + live market data)
Response anchored to real pricing and real conversations.
Research value: The insight shifts from "people think $49 is a lot" (obvious) to "people think $49 is competitive vs Farmer's Dog but weak vs Spot & Tango" (actionable).
Part 2 — Option E: Statistical Validation
Concept: Add Statistical Rigor
Confidence intervals, sample size warnings, and bias detection.
The Problem
We present findings as if 5 synthetic personas constitute a valid sample. "4/5 panelists agree" sounds compelling, but N=5 has no statistical power. Experienced researchers will immediately question this.
The Solution
- Run multiple sessions: Same brief, 10 different panels of 5 = 50 data points
- Confidence intervals: "Price concern appeared in 78% of sessions (CI: 65-88%)"
- Sample size warnings: Flag when N is too low for claim strength
- Bias detection: Check if persona demographics over/underrepresent populations
- Sensitivity analysis: Would the finding change with different persona selection?
Key insight: The cost of running 10 synthetic sessions is trivial (minutes + API cost). We should exploit this advantage over real research where N is expensive.
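One standard way to produce such intervals at small n is the Wilson score interval; a sketch (the 78% / CI figures on the next slide are illustrative, not outputs of this exact code):

// Wilson score interval for a proportion; a common choice for small samples.
// Returns [lo, hi] as whole percentages.
function wilsonInterval(successes: number, n: number, z = 1.96): [number, number] {
  const p = successes / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const margin = (z * Math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))) / denom;
  return [Math.round((center - margin) * 100), Math.round((center + margin) * 100)];
}

// e.g. price concern in 39 of 50 panelists: wilsonInterval(39, 50) → [65, 87]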
Part 2 — Option E: Statistical Validation
Before / After
Current Output
title: "Price is #1 barrier"
confidence: 80
label: "4/5 panelists"
Single run, single panel, hand-scored confidence. No margin of error. No acknowledgment that N=5 is tiny.
Statistically Validated
title: "Price is #1 barrier"
confidence: 78
ci: [65, 88]
sessions: 10
totalPanelists: 50
biasWarning: "Panel skews
urban, high-income. Finding may
not hold for rural demographics."
Multi-run aggregation with honest uncertainty bounds.
The paradox: Showing less certainty (confidence intervals, bias warnings) actually increases trust with sophisticated research buyers.
Part 3 — Comparison & Roadmap
Options Compared
| Option | Effort | Impact | Data Needs | Risk |
| --- | --- | --- | --- | --- |
| A. RAG Personas | Medium | High | Census, BLS, surveys | Low |
| B. Multi-Agent | High | Very High | Existing persona data | Medium |
| C. Calibration | Medium | Very High | Real study partners | High |
| D. Real-Time Data | Medium | Medium | APIs, scraping infra | Medium |
| E. Statistical | Low | High | None (multi-run logic) | Low |
Effort = engineering time to MVP. Impact = how much it improves research fidelity. Risk = dependency on external factors (partners, APIs, data access).
Part 3 — Comparison & Roadmap
Recommended Phased Approach
Build credibility first, then capability, then differentiation.
Phase 1: Foundation (Weeks 1-4)
B. Multi-Agent Discussions — This is the single biggest unlock. Moving from pre-scripted to dynamic conversations transforms the product from a demo into an actual research tool. Use existing persona data.
E. Statistical Validation — Low effort, high credibility gain. Run multiple sessions per brief and aggregate results. Can ship alongside multi-agent.
Phase 2: Grounding (Weeks 5-10)
A. RAG-Grounded Personas — Now that discussions are dynamic, make personas statistically representative. Build the embedding pipeline and citation system.
D. Real-Time Data — Add live market context to agent prompts. Personas become time-aware.
Phase 3: Validation (Weeks 11-16+)
C. Calibration — Requires real research partners. Start collecting calibration data from Phase 1. Full pipeline when enough data exists. This is the long game for enterprise credibility.
Part 3 — Comparison & Roadmap
Target Architecture
How all improvements fit together.
Research Brief
↓
RAG Persona Gen (Option A)
Live Market Data (Option D)
↓
Multi-Agent Discussion Engine
Option B: Moderator + N persona agents
↓
Statistical Validation (Option E)
Calibration Layer (Option C)
↓
Validated Research Report
Grounded findings + confidence intervals + bias warnings + citations
Each layer adds a measurable improvement to research fidelity. They compose: RAG-grounded personas in multi-agent discussions with live market data produce fundamentally different output than any single improvement alone.
Part 3 — Summary
Key Takeaways
1. The foundation is strong
50+ structured fields per persona, psychological contradictions, longitudinal memory, brand relationship topology. This data model is genuinely deeper than most competitors'.
2. The critical gap is execution
The data exists but isn't being used by an LLM at runtime. Moving from pre-scripted to multi-agent discussions is the single highest-impact change.
3. Credibility requires honesty
Statistical validation and calibration are less exciting than AI agents, but they're what turns "interesting demo" into "tool a researcher would trust." Show uncertainty to earn trust.
4. The moat is the data pipeline
Anyone can prompt an LLM to role-play a persona. RAG grounding + calibration against real studies + statistical validation creates a defensible system that improves with every use.
Phase 1: Multi-Agent + Stats
Phase 2: RAG + Live Data
Phase 3: Calibration