Funded by the Wellcome Trust

Digital Twins for Mental Health Clinician Training

STELLAR — Steering-Vector Enhanced LLM Agents for Realistic Digital Twins in Mental Health

AI-powered patient simulations for scalable, equitable clinician training worldwide, built on large language models steered by Conceptors: a matrix-based generalization of steering vectors that gives precise control over how psychiatric symptoms are presented.

New York University · University of Pennsylvania · Linguistic Data Consortium

The Problem & Our Approach

Mental health training faces a fundamental bottleneck: trainees need diverse clinical exposure, but access to patients with rare symptom combinations or culturally varied presentations is limited by ethical, logistical, and cost constraints. STELLAR addresses this through controllable digital patient twins.

Conceptor-Steered Digital Twins

We use Conceptors, a mathematical framework originally developed for recurrent neural networks that captures a pattern of internal activations as a soft projection matrix, to steer LLM representations and create patient simulations with precisely calibrated psychiatric symptoms across the anxiety, depression, psychosis, and fear spectra. Unlike prompting, Conceptors allow symptom intensity to be adjusted continuously and can be composed through Boolean operations (AND, OR, NOT).
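For readers new to Conceptors, the sketch below follows the standard definition: given activations X recorded while a model processes examples of a concept, the conceptor is C = R(R + α⁻²I)⁻¹, where R is the activation correlation matrix and α is the aperture. NOT, AND, and OR are closed-form matrix operations on C. The steering update h' = βCh, the toy hidden size, and the random stand-in activations are assumptions for illustration; this is a minimal sketch, not STELLAR's implementation.

```python
import numpy as np


def conceptor(X: np.ndarray, aperture: float) -> np.ndarray:
    """Conceptor C = R (R + aperture^-2 I)^-1 for activations X of shape (n, d)."""
    n, d = X.shape
    R = X.T @ X / n  # d x d correlation matrix of the activations
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(d))


def c_not(C: np.ndarray) -> np.ndarray:
    """NOT C = I - C."""
    return np.eye(C.shape[0]) - C


def c_and(C: np.ndarray, B: np.ndarray) -> np.ndarray:
    """C AND B = (C^-1 + B^-1 - I)^-1; this simple form assumes C and B are invertible."""
    eye = np.eye(C.shape[0])
    return np.linalg.inv(np.linalg.inv(C) + np.linalg.inv(B) - eye)


def c_or(C: np.ndarray, B: np.ndarray) -> np.ndarray:
    """C OR B via De Morgan's law: NOT(NOT C AND NOT B)."""
    return c_not(c_and(c_not(C), c_not(B)))


def steer(h: np.ndarray, C: np.ndarray, beta: float) -> np.ndarray:
    """Assumed steering update h' = beta * C h; beta scales intensity continuously."""
    return beta * (C @ h)


rng = np.random.default_rng(0)
d = 8  # toy hidden size; real LLM layers are far larger
# Hypothetical stand-ins for activations recorded on "anxious" and "low mood" examples.
X_anx = rng.normal(size=(200, d))
X_dep = rng.normal(size=(200, d))

C_anx = conceptor(X_anx, aperture=2.0)
C_dep = conceptor(X_dep, aperture=2.0)
C_mixed = c_or(C_anx, C_dep)  # composite presentation spanning both patterns

h = rng.normal(size=d)  # a hidden state to be steered
h_mild = steer(h, C_mixed, beta=0.5)
h_strong = steer(h, C_mixed, beta=1.5)
```

Because OR yields a conceptor that contains each operand (C_anx ≤ C_anx OR C_dep in the Loewner order), composition widens the steered subspace rather than averaging away either pattern.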

Philadelphia Neurodevelopmental Cohort

STELLAR draws on the Philadelphia Neurodevelopmental Cohort (PNC), a community sample of ~10,000 genotyped youths with structured clinical interviews, neurocognitive assessments, and neuroimaging. Its recorded clinical interviews provide both ground-truth diagnostic labels and natural-language examples of how symptoms are expressed.

Diagnostic Tool Robustness

Controlled digital twins serve as reproducible testbeds for evaluating emerging language-based diagnostic tools before clinical deployment: they can measure whether a tool's outputs stay invariant across demographic groups and expose bias related to sex/gender, age, ethnicity, socioeconomic status (SES), and language proficiency.
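One simple way to quantify demographic invariance, sketched here under stated assumptions: hold a twin's symptom profile fixed, vary only one demographic attribute, and compare the diagnostic tool's mean score per group. The tool interface, the deliberately biased toy scorer, the example transcripts, and the max-minus-min gap metric are all hypothetical placeholders, not STELLAR's evaluation protocol.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical interface: a diagnostic tool maps a transcript to a score in [0, 1].
Tool = Callable[[str], float]


def invariance_gap(tool: Tool, variants: Dict[str, List[str]]) -> float:
    """Largest difference in mean tool score between demographic groups.

    `variants` maps a group label to transcripts from twins that share one
    symptom profile and differ only in the demographic attribute under test.
    A gap near 0 is consistent with invariance; a large gap flags possible bias.
    """
    group_means = {group: mean(tool(t) for t in texts) for group, texts in variants.items()}
    return max(group_means.values()) - min(group_means.values())


def toy_tool(transcript: str) -> float:
    """Deliberately biased toy scorer: longer answers receive higher scores."""
    return min(1.0, len(transcript.split()) / 20)


# Same symptoms, different (hypothetical) language-proficiency presentations.
variants = {
    "fluent": ["I have not been sleeping and I feel hopeless most days"],
    "limited_proficiency": ["No sleep. Feel hopeless most days"],
}
gap = invariance_gap(toy_tool, variants)
```

Here the toy scorer assigns 0.55 to the fluent transcript and 0.30 to the shorter one, a gap of 0.25 even though both describe the same symptoms, which is exactly the kind of proficiency bias a controlled twin makes visible.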

Shareability & Global Reach

Once validated, symptom-specific Conceptors can be distributed to other institutions without retraining, accelerating adoption worldwide. All outputs — Conceptor libraries, code, and evaluation frameworks — will be released under open-source licenses.
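Because a Conceptor is a d × d matrix defined in one model's activation space, sharing it amounts to shipping the matrix plus enough metadata to apply it to the same base model and layer; we assume recipients would use that same model. The file layout, key names, and metadata fields below are hypothetical, meant only to illustrate why no retraining is needed on the receiving end.

```python
import json
import os
import tempfile
from typing import Dict, Tuple

import numpy as np


def save_library(path: str, conceptors: Dict[str, np.ndarray], meta: dict) -> None:
    """Write named conceptor matrices and JSON metadata to one .npz file."""
    np.savez_compressed(path, __meta__=np.array(json.dumps(meta)), **conceptors)


def load_library(path: str) -> Tuple[Dict[str, np.ndarray], dict]:
    """Read conceptors back; no model training or gradient updates are involved."""
    with np.load(path) as data:
        meta = json.loads(data["__meta__"].item())
        conceptors = {name: data[name] for name in data.files if name != "__meta__"}
    return conceptors, meta


# Hypothetical library: toy 4 x 4 matrices and illustrative metadata fields.
library = {"anxiety": 0.6 * np.eye(4), "low_mood": 0.3 * np.eye(4)}
meta = {"base_model": "example-llm", "layer": 16, "aperture": 2.0}

path = os.path.join(tempfile.mkdtemp(), "conceptor_library.npz")
save_library(path, library, meta)
loaded, loaded_meta = load_library(path)
```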

Investigators & Collaborators

A collaboration uniting machine learning, psychiatry, speech science, and lived experience across NYU, Penn, and the Linguistic Data Consortium.

Lead PI

João Sedoc

New York University
ML researcher with expertise in Conceptor methods for bias mitigation and conversational AI in healthcare. Leads overall project coordination and Conceptor-based steering vector development.

Lead PI, UPenn & LDC

Neville Ryant

Linguistic Data Consortium
Leads all LDC annotation and assessment: manual transcription, human preference judgments, digital twin evaluation, and quality control across speech and language data.

Local PI, Penn Medicine

Raquel Gur

University of Pennsylvania · Psychiatry
Founder of the Philadelphia Neurodevelopmental Cohort. Provides psychiatric expertise in symptom presentation and maintains connections with lived experience communities.

Local PI, Penn Engineering

Sharath Chandra Guntuku

University of Pennsylvania · CIS
Directs the Computational Social Listening Lab. Leads the integration of behavioral and social-media-derived patterns into Conceptor development, clinical validation, and technical infrastructure.

Co-PI

Monica Calkins

University of Pennsylvania · Psychiatry
Expert in research program management, psychiatric assessment, and the PNC clinical dataset. Co-implements the Lived Experience Integration Team and leads stakeholder surveys and focus groups.

Co-PI

Tyler Moore

University of Pennsylvania · Psychiatry
Quantitative methodology expert. Establishes psychometric benchmarks for clinical fidelity, develops validation protocols, and optimizes Conceptor composition for realistic multi-dimensional presentations.

Co-PI

Sunghye Cho

Linguistic Data Consortium
Extracts speech-based digital biomarkers from clinical interview recordings and validates automatic speech recognition (ASR) outputs. Leads acoustic-linguistic integration for multimodal Conceptor development.

Co-PI

Rachel Gordon

Filmmaker & Data Scientist
Black filmmaker and former data scientist whose work centers stigma, race, motherhood, and mental health. Co-chairs the Lived Experience Integration Team, bringing deep expertise on barriers to care.

Collaborator

Mark Liberman

University of Pennsylvania · Linguistics
Christopher H. Browne Distinguished Professor of Linguistics and founding director of the Linguistic Data Consortium. Research spans phonetics, speech technology, and computational linguistics.

Collaborator

Ruben Gur

University of Pennsylvania · Psychiatry
Director of the Brain Behavior Laboratory and Center for Neuroimaging in Psychiatry. Developed the computerized neurocognitive battery used in the PNC and deep phenotyping tools deployed worldwide.

Collaborator

Dominic A. Sisti

University of Pennsylvania · Medical Ethics & Health Policy
Directs the Scattergood Program for the Applied Ethics of Behavioral Health Care at Penn Medicine. Brings expertise in the ethics of mental health care, psychiatric classification, and equitable access to care — guiding the ethical framework for STELLAR's clinical deployment.

Foundational Work

Key publications by team members that underpin STELLAR's approach.

2025

Unique Signatures in Verbal Fluency Task Performance in Schizophrenia and Depression

Cho S, Cong Y, Mehta A, et al.
Schizophrenia Research: Cognition, 43, 100407
doi.org/10.1016/j.scog.2025.100407

2025

The Philadelphia Neurodevelopmental Cohort: Perspective, Lessons, and Future Directions

Gur RE, Calkins ME, Gur RC
Schizophrenia Bulletin, 51(4), 852–857
doi.org/10.1093/schbul/sbaf029

2024

Large Language Models Could Change the Future of Behavioral Healthcare: A Proposal for Responsible Development and Evaluation

Stade EC, Stirman SW, Ungar LH, et al.
npj Mental Health Research, 3(1), 12
doi.org/10.1038/s44184-024-00056-z

2024

Key Language Markers of Depression on Social Media Depend on Race

Rai S, Stade EC, Giorgi S, et al.
Proceedings of the National Academy of Sciences, 121(14), e2319837121
doi.org/10.1073/pnas.2319837121

2024

Speech Markers of Depression Dimensions Across Cognitive Status

Soleimani L, Ouyang Y, Cho S, et al.
Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring, 16(3)
doi.org/10.1002/dad2.12604

2023

An Integrative Survey on Mental Health Conversational Agents to Bridge Computer Science and Medical Perspectives

Cho Y, Rai S, Ungar L, Sedoc J, Guntuku SC
Proceedings of EMNLP 2023, 11346–11369
doi.org/10.18653/v1/2023.emnlp-main.698

2023

Conceptor-Aided Debiasing of Large Language Models

Li Y, Ungar L, Sedoc J
Proceedings of EMNLP 2023, 10703–10727
doi.org/10.18653/v1/2023.emnlp-main.661

2021

The Third DIHARD Diarization Challenge

Ryant N, Singh P, Krishnamohan V, et al.
Proceedings of Interspeech 2021, 3570–3574
doi.org/10.21437/interspeech.2021-1208

2021

Natural Language Processing Methods Are Sensitive to Sub-Clinical Linguistic Differences in Schizophrenia Spectrum Disorders

Tang SX, Kriz R, Cho S, et al.
npj Schizophrenia, 7(1), 25
doi.org/10.1038/s41537-021-00154-3

2019

Burden of Environmental Adversity Associated With Psychopathology, Maturation, and Brain Behavior Parameters in Youths

Gur RE, Moore TM, Rosen AFG, et al.
JAMA Psychiatry, 76(9), 966
doi.org/10.1001/jamapsychiatry.2019.0943

2019

The Second DIHARD Diarization Challenge: Dataset, Task, and Baselines

Ryant N, Church K, Cieri C, et al.
Proceedings of Interspeech 2019, 978–982
doi.org/10.21437/interspeech.2019-1268

2015

The Philadelphia Neurodevelopmental Cohort: Constructing a Deep Phenotyping Collaborative

Calkins ME, Merikangas KR, Moore TM, et al.
Journal of Child Psychology and Psychiatry, 56(12), 1356–1369
doi.org/10.1111/jcpp.12416

Open Positions

We're hiring across both sites. STELLAR is a two-year project spanning NLP, speech processing, clinical psychiatry, and human-centered AI. We're looking for people who are excited about building tools that make mental health training more equitable and scalable.

Postdoctoral Researcher — NLP & Steering Vectors

NYU or Penn

Develop and validate Conceptor-based steering methods for LLMs. Strong background in representation learning, mechanistic interpretability, or activation engineering. Experience with clinical NLP a plus.

Research Scientist — Speech & Multimodal AI

Penn / Linguistic Data Consortium

Work on acoustic biomarker extraction, speaker diarization, and multimodal Conceptor integration from clinical interview recordings. Experience with wav2vec, Whisper, or clinical speech analysis preferred.

Research Engineer — Platform Development

NYU or Penn

Build the digital twin training platform, integrating Conceptor-steered LLMs with avatar rendering (Soul Machines), session recording, and automated feedback systems. Full-stack engineering experience, including ML deployment.

Research Coordinator — Clinical Evaluation

University of Pennsylvania

Coordinate stakeholder surveys, focus groups, and the trainee-tool-trainer implementation readiness study. Experience with IRB protocols, mixed-methods research, and mental health populations.

Interested? Fill out this brief interest form with your CV and a note on what draws you to this work. We'll be in touch.