Audio surveillance of MADRS interviews: what 3,736 ratings reveal

Jun 19, 2026

Audio-digital surveillance of site-based MADRS interviews is a validated strategy for quality assurance in MDD clinical trials. Across 3,736 paired ratings from 5 Phase II/III studies, site-independent raters achieved an intraclass correlation of 0.947 with site-based scores, and blinded remote ratings predicted treatment response with 91.2% accuracy.

What this article covers

Across 3,736 paired MADRS interviews in 5 MDD trials, site-independent ratings correlated at ICC 0.947 with site-based scores, confirming audio surveillance as a reliable quality assurance method.
Only 6.7% of paired interviews showed scoring deviations greater than 6 points, and telephone remediation resolved the majority of those discordances in subsequent visits.
Rater outliers were identifiable through paired scoring patterns. Remediation improved concordance in almost every case; three raters were removed when improvement was not achievable.
Blinded remote ratings correctly predicted site-based treatment response in 91.2% of 215 paired baseline-to-endpoint cases, suggesting a potential secondary role beyond quality assurance.
The full peer-reviewed analysis, published in Contemporary Clinical Trials Communications (2019), contains item-level ICC data, visit-type breakdowns, and interview length effects not covered in this summary.

Why rater quality failure has consequences for MDD endpoints

For a participant in an MDD trial, the rating administered at each visit is not an administrative task. It is the primary record of whether the treatment is working. When a rater rushes an interview, misapplies a scoring convention, or inflates severity to meet eligibility criteria, that participant's data no longer reflects their clinical reality. Their contribution to the study is compromised before the visit ends.

Rater inconsistency in psychiatry trials is well documented, yet rarely measured within the trials themselves. An analysis of 63 published papers found that few examined the reliability of ratings conducted during a study (Kobak et al., 2002). Inter-rater scoring differences in MADRS assessments arise from clinical judgement variation, incomplete interview technique, and failure to apply instrument conventions consistently. Without a surveillance mechanism, these errors accumulate undetected across sites and visits.

This is the gap that audio-digital surveillance is designed to address. The FDA's guidance on clinical outcome assessments in drug development has consistently emphasized the importance of rater training, qualification, and ongoing monitoring as safeguards for endpoint integrity. That expectation applies to trials in design today, not only to studies already underway.

What the paired scoring data shows about site-based rater performance

Sponsors have historically been cautious about surveillance programs, for defensible reasons. Adding a remote scoring layer requires infrastructure, introduces a new class of rater, and raises questions about how discordances should be handled operationally without compromising blinding.

Targum and Catania address these barriers directly. The analysis, conducted using data from 5 Phase II/III MDD studies run between 2009 and 2017 under vendor grants to Bracket Global LLC (now part of Signant Health), examined 3,736 MADRS interviews scored in parallel by 397 certified site-based raters and 42 site-independent raters. The site-independent raters were blinded to visit type, trial site, and site-based scores.

The overall ICC was 0.947. Scoring deviations greater than 6 points occurred in only 6.7% of interviews. Total MADRS severity affected the direction of deviations systematically: site-based raters scored higher-severity participants (MADRS 40 or above) higher than site-independent raters, and lower-severity participants (MADRS below 20) lower. This pattern held across all 5 studies.
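
The two headline metrics in this paragraph, the intraclass correlation and the share of paired interviews deviating by more than 6 points, can be illustrated with a minimal Python sketch. This summary does not state which ICC form the published analysis used, so the one-way random-effects ICC(1,1) below is an assumption chosen for simplicity. The function names and the sample scores are hypothetical and are not trial data.

```python
from statistics import mean


def icc_oneway(pairs):
    """One-way random-effects ICC(1,1) for (site_score, remote_score) pairs.

    Assumption: the ICC form is not specified in this summary; ICC(1,1)
    is used here purely for illustration.
    """
    n = len(pairs)
    k = 2  # two ratings per interview: site-based and site-independent
    grand_mean = mean(score for pair in pairs for score in pair)
    row_means = [mean(pair) for pair in pairs]
    # Between-interview and within-interview mean squares
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (score - m) ** 2 for pair, m in zip(pairs, row_means) for score in pair
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def deviation_rate(pairs, threshold=6):
    """Return the share of pairs whose absolute difference exceeds threshold,
    together with the flagged pairs themselves."""
    flagged = [p for p in pairs if abs(p[0] - p[1]) > threshold]
    return len(flagged) / len(pairs), flagged


# Hypothetical paired MADRS totals (site-based, site-independent)
pairs = [(32, 30), (18, 19), (41, 33), (25, 25), (10, 12), (28, 27)]

rate, flagged = deviation_rate(pairs)
print(f"ICC(1,1): {icc_oneway(pairs):.3f}")
print(f"Pairs deviating by more than 6 points: {rate:.1%} {flagged}")
```

In a surveillance program, the flagged pairs would be the starting point for reviewing the recording and, where warranted, remediating the rater. The 6-point threshold is the one reported in the analysis; everything else in the sketch is illustrative.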

Telephone remediation of identified outliers produced improved concordance in almost every case. Three raters were removed when remediation did not succeed. The predictive accuracy of blinded remote scores for treatment response, 91.2% across 215 paired cases, raises a further question the authors identify for future exploration: whether remote ratings could serve as a confirmatory measure in studies where functional unblinding is a risk.
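
The predictive accuracy figure is a simple agreement rate: the share of paired baseline-to-endpoint cases in which the blinded remote scores and the site-based scores lead to the same responder classification. The sketch below shows that arithmetic under a stated assumption. This summary does not give the response definition used in the published analysis, so the common convention of a 50% or greater reduction in MADRS total from baseline is assumed here. Function names and sample values are hypothetical.

```python
def is_responder(baseline, endpoint, min_reduction=0.5):
    """Classify response as a proportional drop from baseline.

    Assumption: a reduction of at least 50% counts as response; the
    summary does not state the definition used in the analysis.
    """
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (baseline - endpoint) / baseline >= min_reduction


def response_agreement(cases):
    """Share of cases where remote and site-based classifications match.

    Each case is (site_baseline, site_endpoint,
                  remote_baseline, remote_endpoint).
    """
    matches = sum(
        is_responder(site_b, site_e) == is_responder(remote_b, remote_e)
        for site_b, site_e, remote_b, remote_e in cases
    )
    return matches / len(cases)


# Hypothetical paired baseline-to-endpoint MADRS totals
cases = [
    (30, 12, 29, 13),  # both classify as responder
    (28, 20, 27, 21),  # both classify as non-responder
    (34, 16, 33, 18),  # site: responder, remote: non-responder
    (26, 10, 27, 11),  # both classify as responder
]

print(f"Response agreement: {response_agreement(cases):.1%}")
```

Cases near the response cutoff, like the third one above, are where small scoring differences change the classification, which is why agreement on response can fall below what a high ICC alone might suggest.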

Sponsors designing MDD studies with MADRS as a primary endpoint, and CRO program managers evaluating quality monitoring approaches, will find the full item-level analysis and visit-type breakdowns in the published paper at https://doi.org/10.1016/j.conctc.2019.100317.

"This analysis affirms the utility of audio-digital recording of site-based interviews as a surveillance strategy for quality assurance, monitoring and remediation. The high predictive value of blinded remote ratings to replicate site-based treatment outcomes may be useful to affirm primary site-based results when there is a potential of functional unblinding." - Steven D. Targum, MD, Scientific Director, Bracket Global LLC (at time of publication); Contemporary Clinical Trials Communications, 2019

Does audio surveillance of MADRS interviews improve rater concordance in MDD trials?

Yes. Across 3,736 paired MADRS interviews in 5 MDD trials, audio-digital surveillance achieved an overall ICC of 0.947 between site-based and site-independent raters. Rater outliers identified through scoring deviation patterns showed improved concordance following telephone remediation in almost every case. Three raters were removed when improvement was not achieved. (Targum & Catania, 2019)

Can blinded remote MADRS ratings predict treatment response in MDD studies?

Evidence from 5 MDD trials indicates yes. Blinded site-independent MADRS scores, derived from audio recordings of site-based interviews, correctly predicted site-based treatment response in 91.2% of 215 paired baseline-to-endpoint cases. The authors note this raises the possibility of using remote ratings as a confirmatory measure where functional unblinding is a concern. (Targum & Catania, 2019)

AUTHOR BIO

Name: Steven D. Targum
Title and Credentials: MD, Psychiatrist
Bio: Steven D. Targum, MD, is a psychiatrist with extensive experience in CNS clinical trial methodology, rater training, and endpoint quality assurance. He has published on trial design across depression, schizophrenia, and bipolar disorder. This analysis used data from 5 Phase II/III MDD studies conducted under vendor grants to Bracket Global LLC, a predecessor organization within the current Signant Health group.
