Internal Technical Review
Synilly: Data Grounding & Improvement Paths
How our synthetic personas are currently grounded, where the gaps are, and what we can build to close them.
50 Personas
5 Improvement Options
Phased Roadmap
Part 1 — Current System
Architecture Overview
The end-to-end flow from research brief to final report.
Research Brief
Topic, objectives, target audience
↓
Panel Matching
Relevance scoring against 50 personas
↓
Pre-Session Briefing
Study history review, angle identification
↓
Discussion Simulation
4-phase moderated conversation
↓
Report Generation
Findings, confidence scores, recommendations
Key point: Every step currently uses pre-authored static data. No LLM calls happen during a session — the entire flow is deterministic and repeatable.
Part 1 — Current System
Persona Data Model
The MaintainedPersona interface — what we store per person.
interface MaintainedPersona {
  // NOTE: the slide lists field names only; the concrete types below
  // are illustrative assumptions added for readability.
  id: string; name: string; age: number;
  avatar: string; occupation: string; location: string;
  memberSince: string; studyCount: number; consistencyScore: number; // 0-100
  lastActiveDate: string; keyTraits: string[]; communicationStyle: string;
  demographics: {
    income: string; household: string; education: string;
    career: string; pets: string; living: string;
  };
  personality: {
    summary: string; lifeStory: string;
    keyLifeEvents: string[]; // each event carries an explicit impact statement
    values: string[]; contradictions: string[];
  };
  beliefs: {
    brandRelationships: { brand: string; relationship: string; note: string }[];
    decisionMakingStyle: string;
    techAttitude: string; priceProcessing: string;
    mediaChannels: string[];
  };
  studyHistory: StudyRecord[];
}
50+
Total structured fields
Part 1 — Current System
Persona Depth: Maya vs Derek
Same interface, radically different behavior profiles.
Maya Chen, 31
Decision style: Emotional first, rational justification second
Price processing: Anchors to perceived value. $50/mo feels fine for "Mochi's health"
Contradiction: Claims to be skeptical of "premium" marketing but consistently buys premium
Consistency: 82%
Derek Thompson, 29
Decision style: Research-intensive with comparison spreadsheets. Needs 3-5 data points
Price processing: Price-insensitive when value is proven. Calculates cost-per-benefit ratios
Contradiction: Demands quantitative proof but bought a $3k espresso machine on vibes
Consistency: 91%
Why this matters: The same concept test question ("Would you pay $49/mo?") produces authentically different responses because each persona's grounding data creates distinct reasoning paths.
Part 1 — Current System
Panel Matching Flow
How we select the right personas for a study.
Research Brief Analyzed
Extract topic, product category, target demo
↓
Scan 50 Personas
Match against demographics, beliefs, brand history
↓
Score & Rank
relevanceScore (0-100) + matchReasons[]
↓
User Selects Panel
Review scores, approve/swap panelists
interface PanelCandidate {
persona: MaintainedPersona;
relevanceScore: number;
matchReasons: string[];
isSelected: boolean;
}
Filtering supports search, age range, location, traits, and sort by name/studies/consistency. All computed client-side against the full persona dataset.
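A minimal sketch of that client-side pass, assuming hypothetical names (PanelFilters, filterPanel) on top of the PanelCandidate interface above:

// Hypothetical client-side filter + sort; only PanelCandidate and the
// MaintainedPersona fields come from the actual data model.
interface PanelFilters {
  search?: string;
  ageRange?: [number, number];
  location?: string;
  sortBy: 'name' | 'studies' | 'consistency';
}

function filterPanel(candidates: PanelCandidate[], f: PanelFilters): PanelCandidate[] {
  return candidates
    .filter(c => !f.search ||
      c.persona.name.toLowerCase().includes(f.search.toLowerCase()))
    .filter(c => !f.ageRange ||
      (c.persona.age >= f.ageRange[0] && c.persona.age <= f.ageRange[1]))
    .filter(c => !f.location || c.persona.location === f.location)
    .sort((a, b) => {
      switch (f.sortBy) {
        case 'name': return a.persona.name.localeCompare(b.persona.name);
        case 'studies': return b.persona.studyCount - a.persona.studyCount;
        default: return b.persona.consistencyScore - a.persona.consistencyScore;
      }
    });
}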
Part 1 — Current System
Discussion Simulation
Pre-authored conversation with 4 phases and live UX affordances.
Warmup
Personas introduce themselves and their context. Establishes character voice.
Deep Dive
Probing questions on concerns, fears, desires. Emotional territory.
Concept Test
Present the product concept. Capture reactions, objections, pricing feedback.
Wrap-up
"One thing that would make you sign up?" Final asks for conviction testing.
UX simulation: Messages appear sequentially with typing indicators and variable delays (600-2000ms). Live "insights" sidebar updates as themes emerge. All pre-scripted — no LLM in the loop.
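A minimal sketch of the scripted playback loop under those timings; ScriptedMessage and the two UI hooks are illustrative stand-ins, not the real rendering code:

interface ScriptedMessage { speaker: string; text: string; }

// Placeholder UI hooks; the app's actual rendering functions are not shown here.
const showTypingIndicator = (speaker: string) => console.log(`${speaker} is typing...`);
const renderMessage = (m: ScriptedMessage) => console.log(`${m.speaker}: ${m.text}`);

const delay = (ms: number) => new Promise<void>(res => setTimeout(res, ms));

async function playDiscussion(script: ScriptedMessage[]) {
  for (const msg of script) {
    showTypingIndicator(msg.speaker);
    await delay(600 + Math.random() * 1400); // variable 600-2000ms pause
    renderMessage(msg);                      // insight sidebar would refresh here too
  }
}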
Part 1 — Current System
Longitudinal Consistency
Personas remember past studies and maintain behavioral coherence.
interface StudyRecord {
id: string;
date: string;
topic: string;
role: string;
keyTakeaways: string[];
notableQuotes: string[];
}
Maya Chen's Study History (4 studies)
- Oct '24: Dog food concept — flagged $49 as borderline, wanted guarantee
- Jan '25: Meal kit study — drew parallels to pet food subscription fatigue
Cross-study signal: Her price sensitivity was consistent across both studies, and she connected the two topics unprompted. Consistency score: 82%
Derek Thompson (5 studies, 91% consistency)
- Always demands published data / clinical evidence
- Always calculates ROI before committing
- Pet monitoring study: "If I can't export the data, what's the point?"
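As a concrete instance of StudyRecord, Maya's Oct '24 entry could look like this (values condensed from the bullets above; the id and role label are illustrative):

const mayaOct24: StudyRecord = {
  id: 'study-2024-10-dogfood',   // illustrative id
  date: '2024-10',
  topic: 'Premium dog food concept',
  role: 'Panelist',              // assumed role label
  keyTakeaways: [
    'Flagged $49/mo as borderline',
    'Wanted a money-back guarantee',
  ],
  notableQuotes: ["At $49 I'd try it once and probably cancel"],
};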
Part 1 — Current System
Report Generation
Structured findings with confidence scores, quotes, and prioritized recommendations.
Finding: Price sensitivity is #1 barrier
80% confidence
4/5 panelists flagged $49/mo as too high. Acceptable range: $29-39/mo.
"At $49 I'd try it once and probably cancel"
— Maya Chen, Enthusiast Owner
Finding: Trust requires proof, not promises
90% confidence
Every panelist demanded concrete evidence. Type of proof varies by segment.
Recommendation (High Priority)
Introduce a "Guidance Only" tier at $15-20/mo — separating nutritionist from food delivery.
Part 1 — Current System
What Makes It "Grounded"
The structural properties that prevent personas from being shallow caricatures.
50+ structured fields per persona
Not just age/income — life stories, values, contradictions, brand histories, media habits, communication styles.
Psychological contradictions
Each persona has 3+ built-in inconsistencies that mirror real human behavior. Maya is "premium-skeptical" but buys premium. Derek is "purely rational" but bought a $3k espresso machine on vibes.
Brand relationship topology
Loyal / lapsed / curious / hostile: not just "uses Brand X" but why and how they feel about it (sketched as a type after this list).
Study memory & consistency tracking
Cross-study behavioral coherence scored 0-100. Contradictions across studies are flagged, not hidden.
Life events with causal impact
Events have explicit impact statements: "Pixel's health scare → triggered deep dive into nutrition research."
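That brand relationship shape, reconstructed from the fields buildAgentPrompt reads later in this deck (b.brand, b.relationship, b.note); the union type is inferred from the four labels above:

type RelationshipKind = 'loyal' | 'lapsed' | 'curious' | 'hostile';

interface BrandRelationship {
  brand: string;
  relationship: RelationshipKind;
  note: string;   // the "why and how they feel about it"
}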
Part 1 — Current System
Current Limitations
Honest assessment of where the system falls short.
1
Static personas. All 50 personas are hand-authored TypeScript objects. No generation, no variation, no adaptation to new domains.
2
Pre-scripted discussions. Every conversation is deterministic. Same brief = same discussion = same report. No emergent insights.
3
No real-time data. Personas can't reference current prices, trends, or market conditions. Their world is frozen at authoring time.
4
No statistical validation. Confidence scores are hand-assigned (80%, 90%). No sample size warnings, no margin of error, no bias detection.
5
No LLM reasoning. Zero AI calls during a session. The "AI-powered" claim currently refers to the concept, not the implementation.
6
Domain-locked. Personas are deeply grounded in pet products. Expanding to fintech or healthcare requires authoring 50 new personas from scratch.
Part 2 — Option A: RAG-Grounded Personas
Concept: Real Data In, Realistic Personas Out
Replace hand-authored personas with AI-generated ones grounded in real demographic data.
The Problem
Our 50 personas are plausible but not validated. Maya's $95k income, Portland location, and UX career were authored by a human guessing at realistic combinations. We have no way to know if this profile actually exists in meaningful numbers.
The Solution
Feed real data sources (census, BLS, market research) into an LLM persona generator. Each generated persona comes with citations back to the source data. Demographics are statistically representative, not imagined.
Data Sources
- US Census / ACS — demographics, income, geography
- Bureau of Labor Statistics — occupations, salary bands
- APPA National Pet Owners Survey — pet spending data
- Pew Research — media habits, tech adoption
Part 2 — Option A: RAG-Grounded Personas
Data Flow
From raw data to validated, citable personas.
Census API
BLS Data
APPA Survey
Pew Research
↓
Chunk & Embed
Split data into demographic facts, embed into vector DB
↓
Persona Generator (LLM)
System prompt + retrieved context = persona
↓
Validation Layer
Check: does income match occupation for this region?
↓
Validated Persona + Citations
Every field traceable to source data
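A sketch of the first validation check named above (income within the BLS band for occupation and region); blsSalaryBands and validateIncome are assumed names:

// Hypothetical validation step; the lookup table and its key format are assumptions.
const blsSalaryBands: Record<string, [number, number]> = {
  // From the retrieved context: UX Designers, Portland: $85k-$110k (BLS)
  'UX Designer|Portland, OR': [85_000, 110_000],
};

function validateIncome(
  p: { occupation: string; location: string; income: number },
): string[] {
  const band = blsSalaryBands[`${p.occupation}|${p.location}`];
  if (!band) return [`no BLS band on file for ${p.occupation} in ${p.location}`];
  const [lo, hi] = band;
  return p.income >= lo && p.income <= hi
    ? []
    : [`income $${p.income} outside BLS range $${lo}-$${hi}`];
}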
Part 2 — Option A: RAG-Grounded Personas
System Prompt Example
How retrieved real-world data gets injected into persona generation.
const systemPrompt = `You are a persona generator for
user research. Generate a realistic persona using
ONLY the provided demographic data as grounding.
RETRIEVED CONTEXT:
- Portland, OR median income: $73,340 (ACS 2024)
- UX Designers, Portland: $85k-$110k (BLS)
- 67% of millennials own pets (APPA 2024)
- Pet owners 25-34 spend avg $1,480/yr (APPA)
RULES:
1. Income MUST fall within BLS range for occupation
2. Pet spending must align with APPA demographics
3. Every demographic field must cite its source
4. Personality can be creative but must not
contradict demographic grounding`;
// llm() here stands in for whichever model client is used.
const result = await llm({
  system: systemPrompt,
  prompt: "Generate: 30s UX designer, Portland, dog owner"
});
Part 2 — Option A: RAG-Grounded Personas
Before / After
Current: Hand-Authored
income: "~$95k"
education: "BFA + MFA"
monthlySpend: "~$180/mo"
- Plausible but unvalidated
- No citations
- Author's intuition
- Fixed at write time
RAG-Grounded
income: "$97k"
education: "BFA + MFA"
monthlySpend: "$123/mo"
- Statistically grounded
- Every field has a source
- Can regenerate for any domain
- Update when data updates
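To make "every field has a source" concrete, a sketch of a source-traceable field; the shape is an assumption, and the spend figure follows from the APPA context two slides back ($1,480/yr ÷ 12 ≈ $123/mo):

// Assumed shape for source-traceable fields; not the current data model.
interface CitedField<T> {
  value: T;
  source: string;   // dataset + vintage
}

const monthlySpend: CitedField<string> = {
  value: '$123/mo',
  source: 'APPA 2024: owners 25-34 average $1,480/yr',  // 1480 / 12 ≈ 123
};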
Key gain: Clients can ask "is this persona representative?" and we can point to real data instead of saying "we think so."
Part 2 — Option B: Multi-Agent Discussions
Concept: Personas as Independent Agents
Replace pre-scripted conversations with live multi-agent LLM discussions.
The Problem
Current discussions are fully deterministic. Same brief always produces the same conversation. No follow-up questions, no emergent insights, no ability to probe deeper on surprising responses.
The Solution
Each persona becomes an independent LLM agent with:
- A system prompt built from their full data model (50+ fields)
- Conversation memory (within session + cross-study)
- A Moderator Agent that manages turn-taking, probes, and phase transitions
What Changes
Same brief, different run = different conversation. Personas can disagree, build on each other's points, or reveal unexpected connections. The moderator can follow up on surprising responses in real time.
Part 2 — Option B: Multi-Agent Discussions
Architecture
Moderator orchestrating N persona agents with shared context.
Moderator Agent
Controls phase flow, selects next speaker, probes deeper
↓ assigns turn
Maya Agent
Derek Agent
Priya Agent
Tom Agent
↓ responses feed back
Shared Conversation Context
All agents see full discussion history
↓
Live Insight Extraction
Separate LLM call: theme detection, sentiment, contradictions
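A minimal sketch of the orchestration loop, assuming a generic llm client and round-robin turn-taking as a stand-in for the moderator's speaker selection:

// Hypothetical orchestration loop; Turn, Llm, and runPhase are illustrative names.
interface Turn { speaker: string; text: string; }
type Llm = (args: { system: string; prompt: string }) => Promise<string>;

async function runPhase(
  phase: string,
  agents: { name: string; systemPrompt: string }[],
  history: Turn[],
  llm: Llm,
  maxTurns = 8,
): Promise<Turn[]> {
  for (let i = 0; i < maxTurns; i++) {
    // Moderator reads the shared history and produces the next probe.
    const probe = await llm({
      system: 'You are the moderator: pick the next speaker and ask one question.',
      prompt: `Phase: ${phase}\n` +
        history.map(t => `${t.speaker}: ${t.text}`).join('\n'),
    });
    // Simplistic round-robin stand-in for the moderator's speaker selection.
    const agent = agents[i % agents.length];
    const reply = await llm({ system: agent.systemPrompt, prompt: probe });
    history.push({ speaker: agent.name, text: reply }); // shared context grows
  }
  return history;
}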
Part 2 — Option B: Multi-Agent Discussions
Agent System Prompt Construction
How persona data becomes agent behavior.
function buildAgentPrompt(p: MaintainedPersona) {
return `You are ${p.name}, ${p.age}, ${p.occupation}.
Location: ${p.location}
PERSONALITY: ${p.personality.summary}
LIFE STORY: ${p.personality.lifeStory}
CONTRADICTIONS (embody these naturally):
${p.personality.contradictions.map(c => '- ' + c).join('\n')}
BRAND RELATIONSHIPS:
${p.beliefs.brandRelationships.map(b =>
'- ' + b.brand + ': ' + b.relationship + ' — ' + b.note
).join('\n')}
COMMUNICATION STYLE: ${p.communicationStyle}
RULES:
1. Stay in character at all times
2. Reference your life events naturally
3. React to other panelists' statements
4. Your contradictions should surface organically
5. Never break character or acknowledge being AI`;
}
The full persona data model becomes the agent's behavioral constitution. Contradictions aren't bugs — they're features that make responses human-like.
Part 2 — Option B: Multi-Agent Discussions
Before / After
Current: Pre-Scripted
- 24 messages, fixed order
- Same output every time
- Can't follow up on surprises
- Host questions pre-authored
- No persona-to-persona reactions
- Insights are predetermined
Multi-Agent: Dynamic
- Variable length, organic flow
- Different output each run
- Moderator probes emergent themes
- Questions adapt to responses
- Personas react to each other
- Insights emerge from conversation
Example: What Multi-Agent Enables
Tom says "$49 is absurd." Maya responds: "Wait, I spend $50/mo on Farmer's Dog and it doesn't bother me... why does this feel different?" The moderator picks up on this and probes: "Maya, can you unpack that? What makes one $50 feel okay and another feel wrong?" This exchange can't happen in a pre-scripted system.
Part 2 — Option C: Calibration
Concept: Calibrate Against Real Research
Run synthetic studies alongside real ones. Measure divergence. Adjust.
The Problem
We have no way to measure if synthetic research outputs are accurate. Our confidence scores (80%, 90%) are hand-assigned, not empirically validated. A client has no reason to trust them over their own intuition.
The Solution
Partner with clients running real focus groups. Run the same brief through Synilly. Compare outputs. Build a calibration dataset that measures where synthetic research agrees with real research — and where it diverges.
What This Unlocks
- Empirically-backed confidence scores
- Known weak spots (e.g., "synthetic personas underestimate price sensitivity by 15%")
- Correction weights we can apply to future outputs
- A credibility story for enterprise sales
Part 2 — Option C: Calibration
Calibration Pipeline
How the feedback loop works.
Real Focus Group
5-8 real participants
Synilly Session
Same brief, synthetic panel
↓ both produce
Real Findings
Synthetic Findings
↓ compare
Delta Analysis
Theme overlap, sentiment alignment, price sensitivity match, missed insights
↓
Weight Adjustment
Update persona behavior parameters, adjust confidence calculation
Metric examples: Theme recall (did synthetic find all real themes?), sentiment polarity match, price band accuracy, recommendation overlap.
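Theme recall, the first metric listed, reduces to a set intersection; a sketch assuming theme labels are normalized upstream:

// Assumed metric helper: fraction of real-study themes the synthetic session
// also surfaced.
function themeRecall(realThemes: string[], syntheticThemes: string[]): number {
  const synth = new Set(syntheticThemes.map(t => t.toLowerCase()));
  const hits = realThemes.filter(t => synth.has(t.toLowerCase())).length;
  return realThemes.length === 0 ? 1 : hits / realThemes.length;
}

// themeRecall(['price', 'trust', 'convenience'], ['Price', 'Trust']) → ~0.67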
Part 2 — Option C: Calibration
Before / After
Current: Uncalibrated
finding: "Price is #1 barrier"
confidence: 80
basis: "4/5 panelists said so"
No empirical basis for the 80% number. Could be 50% or 95% in reality.
Calibrated
finding: "Price is #1 barrier"
confidence: 74
calibrationNote: "Synilly tends to
overweight price sensitivity by ~12%
vs real panels (n=23 studies)"
Confidence is earned, not declared. Known biases are disclosed upfront.
Trust equation: A tool that says "we're 74% confident, and here's why" is more trustworthy than one that says "we're 90% confident" with no backing.
Part 2 — Option D: Real-Time Data
Concept: Ground Responses in Live Data
Pull current market data, pricing, reviews, and social sentiment into persona responses.
The Problem
Our personas exist in a frozen world. Maya's opinions about The Farmer's Dog were written months ago. She can't reference a recent price increase, a viral TikTok controversy, or a competitor's new product launch.
The Solution
Before each discussion, pull live context relevant to the research topic:
- Pricing: Current subscription prices from competitor sites
- Reviews: Recent sentiment from Reddit, Amazon, Trustpilot
- Social: Trending conversations on TikTok, Instagram
- News: Relevant industry developments
Inject this as additional context into each agent's prompt. Maya doesn't just have opinions — she has opinions informed by what's actually happening right now.
Part 2 — Option D: Real-Time Data
Data Flow
Live context injection pipeline.
Research Brief
Topic: "Premium dog food subscription at $49/mo"
↓ extract entities
Reddit API
Price Scraper
News API
Social API
↓ aggregate
Market Context Document
Summarized, timestamped, source-linked
↓ inject into agent prompts
Grounded Agent Responses
Personas reference current data naturally
Per-persona filtering: Maya sees Instagram/TikTok content (her media channels). Derek sees Reddit/YouTube reviews (his channels). Each persona gets context filtered through their own media diet.
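A sketch of that per-persona filter, assuming each scraped item arrives tagged with its source channel (ContextItem and contextFor are illustrative names):

// Intersect scraped items with the persona's own mediaChannels before
// prompt injection.
interface ContextItem { channel: string; summary: string; url: string; }

function contextFor(
  persona: { beliefs: { mediaChannels: string[] } },
  items: ContextItem[],
): ContextItem[] {
  const diet = new Set(persona.beliefs.mediaChannels.map(c => c.toLowerCase()));
  return items.filter(i => diet.has(i.channel.toLowerCase()));
}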
Part 2 — Option D: Real-Time Data
Before / After
Current: Static Context
"At $49 I'd try it for a month to see if Mochi likes it."
— Maya (pre-scripted, context-free)
No reference to current market. No awareness of competitor moves. Generic price reaction.
Real-Time Grounded
"I saw Farmer's Dog just raised to $52/mo and people on Reddit are furious. So $49 actually feels competitive now — but I've been looking at Spot & Tango since they dropped to $39."
— Maya (agent + live market data)
Response anchored to real pricing and real conversations.
Research value: The insight shifts from "people think $49 is a lot" (obvious) to "people think $49 is competitive vs Farmer's Dog but weak vs Spot & Tango" (actionable).
Part 2 — Option E: Statistical Validation
Concept: Add Statistical Rigor
Confidence intervals, sample size warnings, and bias detection.
The Problem
We present findings as if 5 synthetic personas constitute a valid sample. "4/5 panelists agree" sounds compelling, but N=5 has no statistical power. Experienced researchers will immediately question this.
The Solution
- Run multiple sessions: Same brief, 10 different panels of 5 = 50 data points
- Confidence intervals: "Price concern appeared in 78% of sessions (CI: 65-88%)"
- Sample size warnings: Flag when N is too low for claim strength
- Bias detection: Check if persona demographics over/underrepresent populations
- Sensitivity analysis: Would the finding change with different persona selection?
Key insight: The cost of running 10 synthetic sessions is trivial (minutes + API cost). We should exploit this advantage over real research where N is expensive.
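One standard way to produce such intervals at small n is the Wilson score interval; a sketch (the 78% / CI figures on the next slide are illustrative, not outputs of this exact code):

// Wilson score interval for a proportion; a common choice for small samples.
// Returns [lo, hi] as whole percentages.
function wilsonInterval(successes: number, n: number, z = 1.96): [number, number] {
  const p = successes / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const margin = (z * Math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))) / denom;
  return [Math.round((center - margin) * 100), Math.round((center + margin) * 100)];
}

// e.g. price concern in 39 of 50 panelists: wilsonInterval(39, 50) → [65, 87]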
Part 2 — Option E: Statistical Validation
Before / After
Current Output
title: "Price is #1 barrier"
confidence: 80
label: "4/5 panelists"
Single run, single panel, hand-scored confidence. No margin of error. No acknowledgment that N=5 is tiny.
Statistically Validated
title: "Price is #1 barrier"
confidence: 78
ci: [65, 88]
sessions: 10
totalPanelists: 50
biasWarning: "Panel skews
urban, high-income. Finding may
not hold for rural demographics."
Multi-run aggregation with honest uncertainty bounds.
The paradox: Showing less certainty (confidence intervals, bias warnings) actually increases trust with sophisticated research buyers.
Part 3 — Comparison & Roadmap
Options Compared
| Option | Effort | Impact | Data Needs | Risk |
| --- | --- | --- | --- | --- |
| A. RAG Personas | Medium | High | Census, BLS, surveys | Low |
| B. Multi-Agent | High | Very High | Existing persona data | Medium |
| C. Calibration | Medium | Very High | Real study partners | High |
| D. Real-Time Data | Medium | Medium | APIs, scraping infra | Medium |
| E. Statistical | Low | High | None (multi-run logic) | Low |
Effort = engineering time to MVP. Impact = how much it improves research fidelity. Risk = dependency on external factors (partners, APIs, data access).
Part 3 — Comparison & Roadmap
Recommended Phased Approach
Build credibility first, then capability, then differentiation.
Phase 1: Foundation (Weeks 1-4)
B. Multi-Agent Discussions — This is the single biggest unlock. Moving from pre-scripted to dynamic conversations transforms the product from a demo into an actual research tool. Use existing persona data.
E. Statistical Validation — Low effort, high credibility gain. Run multiple sessions per brief and aggregate results. Can ship alongside multi-agent.
Phase 2: Grounding (Weeks 5-10)
A. RAG-Grounded Personas — Now that discussions are dynamic, make personas statistically representative. Build the embedding pipeline and citation system.
D. Real-Time Data — Add live market context to agent prompts. Personas become time-aware.
Phase 3: Validation (Weeks 11-16+)
C. Calibration — Requires real research partners. Start collecting calibration data from Phase 1. Full pipeline when enough data exists. This is the long game for enterprise credibility.
Part 3 — Comparison & Roadmap
Target Architecture
How all improvements fit together.
Research Brief
↓
RAG Persona Gen (Option A)
Live Market Data (Option D)
↓
Multi-Agent Discussion Engine
Option B: Moderator + N persona agents
↓
Statistical Validation (Option E)
Calibration Layer (Option C)
↓
Validated Research Report
Grounded findings + confidence intervals + bias warnings + citations
Each layer adds a measurable improvement to research fidelity. They compose: RAG-grounded personas in multi-agent discussions with live market data produce fundamentally different output than any single improvement alone.
Part 3 — Summary
Key Takeaways
1. The foundation is strong
50+ structured fields per persona, psychological contradictions, longitudinal memory, brand relationship topology. This data model is genuinely deeper than most competitors'.
2. The critical gap is execution
The data exists but isn't being used by an LLM at runtime. Moving from pre-scripted to multi-agent discussions is the single highest-impact change.
3. Credibility requires honesty
Statistical validation and calibration are less exciting than AI agents, but they're what turns "interesting demo" into "tool a researcher would trust." Show uncertainty to earn trust.
4. The moat is the data pipeline
Anyone can prompt an LLM to role-play a persona. RAG grounding + calibration against real studies + statistical validation creates a defensible system that improves with every use.
Phase 1: Multi-Agent + Stats
Phase 2: RAG + Live Data
Phase 3: Calibration