Novoic Announces AccuRater™: Using Clinical AI to Automate Central Rating and Central Review in CNS Trials
Today Novoic announces AccuRater™, an AI-based system for fully automated rating and review of clinical assessments, for use in CNS clinical trials and clinical care. Data presented today on the main stage at the Clinical Trials on Alzheimer’s Disease (CTAD) conference in Boston showed that AccuRater™ performs at or above the level of trained human raters when rating common performance-based outcome assessments. AccuRater™ is another step towards Novoic’s mission of providing gold-standard brain assessment to a billion people.
Rating of clinical trial outcomes is time-consuming and error-prone
CNS clinical trials rely on clinical assessments as outcome measures. These assessments are typically administered and scored (“rated”) manually by trained human raters. But human raters make errors, usually unintentional but sometimes intentional (e.g. inflating scores to get a patient into a trial), and different raters score the same assessment differently (inter-rater variability). Both introduce measurement error into study outcomes.
Current solutions include Central Review and Central Rating, in which an independent group of raters reviews audio recordings of the assessment administrations and makes an independent assessment. But this approach is slow and costly: even the largest, best-funded studies cannot afford Central Review of 100% of assessments, and for smaller trials and biotech companies the cost is even more prohibitive.
Enabling broad access to gold-standard assessment
In clinical care the problem is even more pronounced. Even in the most prosperous countries there are far too few trained neuropsychologists and neurologists: twenty U.S. states have been termed "dementia neurology deserts", meaning they are projected to have fewer than 10 neurologists per 10,000 people with dementia in 2025 (source). Primary care physicians typically have neither the time nor the expertise to administer the more advanced clinical assessments required for a high-quality diagnosis. This is a major barrier to broad and equitable access to healthcare and prevention, even in the richest countries, and more so in developing countries.
In Alzheimer’s disease, for example, the detection rate of MCI (mild cognitive impairment) in the US today has been reported to be only around 8% (source). Put differently, about 7.4 million of the 8 million expected MCI cases in the US (92%) go undetected. One in five people born today is expected to develop Alzheimer’s disease, and the estimated US health care cost of Alzheimer’s is $305 billion today, projected to rise to more than $1.1 trillion by 2050 (source).
We believe that addressing this problem at the scale required, and at a cost that will not break healthcare systems globally, will require technological breakthroughs that automate much of this time-consuming process. Novoic’s first product, Storyteller, was an example of this approach: it was the first test shown to detect early signs of Alzheimer’s disease that could be self-administered at home in about 10 minutes, without the need for a trained clinician.
A New Paradigm: AccuRater™ Automated Rating Using Clinical AI
Building on this work, Novoic has been developing AccuRater™, an advanced, general-purpose system for automating the rating and review of clinical assessments. AccuRater™ performs ratings directly from raw audio recordings of clinical assessments.
It uses Clinical AI to automatically transcribe and rate these recordings, producing output analogous to that of a human rater. A key design constraint was that AccuRater™ not only had to have high intra-rater reliability (consistent ratings when the same recording is rated repeatedly) and be faster and cheaper than human rating; it also had to provide ratings of similar quality.
A common saying about predictive models is “garbage in, garbage out”. Making sure that the input to the AI system is of the highest possible quality is a key part of the quality we’ve been able to achieve with AccuRater™ Rating. An important part of this was developing better audio and text processing to produce high-quality, segmented transcripts of the audio recordings. Most prior clinical work involving speech analysis relies either on manual transcription, which is costly and slow, or on the automatic speech recognition (ASR) models of large cloud computing providers. We found the quality of the latter inadequate and instead developed a custom clinical transcription model.
In other work presented at CTAD, we compared Novoic’s Clinical Automated Transcription with Google Speech-to-Text. Excerpts from four participants:
| Participant | Manual transcription | Off-the-shelf automated transcription (Google) | Custom automated transcription |
|---|---|---|---|
| Male, age 67, cognitively unimpaired | “Maria lives in Glasgow in Scotland and runs a pottery shop. Over time, the footfall in that part of Glasgow had reduced. The number of visitors have- to her shop had gone down” | “Maria lives in Plaza, sculpture and runs the country show. Over time, the footfall in that part of guys go had reduced the number of visitors, have to her shop and gone down” | “Maria lives in Glasgow, in Scotland, and runs a pottery shop. Over time, the footfall in that part of Glasgow had reduced. The number of visitors had... to her shop had gone down” |
| Female, age 77, cognitively unimpaired | “Um, Glasgow. This lady was from Glasgow and I've lost her- her name. But she- she had a- a pottery store making cups and other things in ceramics, stoneware, and, um, other pottery.” | “Glasgow. This lady was from Glasgow and I've lost her and her name, but she she had a pottery store making cups , and other things in ceramic stoneware and other Pottery” | “Um... Glasgow. This lady was from Glasgow and I've lost her, her name. But she, she had a, a pottery store making cups and other things in ceramics, stoneware, and, um, other pottery.” |
| Male, age 77, MCI | “Um, Maria of Glasgow had a pottery shop, um, selling coffee. Ah, um, I can't remember that bit. Uh, I think she started giving them away, the cups or se- selling the cups and, uh, she- she probably start to giving them away” | “Maria of Glasgow had a pottery shop selling coffee. I can remember that bit. I think she started giving them away the cops or selling the cops and probably started giving them away” | “Uh, Maria of Glasgow had a pottery shop, um, selling coffee. Um... Um... I can remember that bit. Uh, I think she started giving them away, the cups, or sell-selling the cups and, uh, she probably started giving them away” |
| Female, age 57, MCI | “Allison ma- likes making pottery. She did different kind of pottery and she was doing very well. She did ceramics, porcelain, and there is a third one that she did. Ceramic, porcelain.” | “Alison make likes making Pottery. She did different kind of poetry and she was doing very well. She did Sarah makes puzzling. and, There is a third one that she did ceramic puzzling.” | “Alison m-m-makes... likes making pottery. She did different kind of pottery and she was doing very well. She did ceramics, um, porcelain, and, um, there is a third one that she did, ceramic porcelain.” |
The custom transcription system produces higher-quality transcripts with few nonsensical errors (unlike ASR systems from large cloud computing providers, which in the first example rendered “Glasgow in Scotland” as “Plaza, sculpture”), and it better captures disfluencies and pauses.
This system builds on Novoic’s expertise and prior work combining large-scale AI models, in particular discriminative and generative language models (such as our ACL 2021 work, the first model shown to predict underlying Alzheimer’s pathology in early-stage patients purely from speech). Large general-purpose AI models can be powerful tools but, as others have broadly reported, are inadequate on their own for high-quality clinical applications. Consumer applications of large language models have opened many people’s eyes to both the capabilities of these models and their limitations: hallucinations, lack of output verification, and lack of quality assurance.
Underlying AccuRater™ is an advanced software infrastructure that addresses these issues. It uses AI models in some processing pipelines but combines them with clinical domain knowledge, classic natural language processing (NLP), and an extensive set of systems for enforcing robustness, verifying outputs, assuring quality, and providing explainability.
In our work, we found that neither the most advanced AI models, nor classic NLP methods, nor hardcoded algorithms could, on their own, solve automated rating to our quality standards. Only a clinically informed meta-system exploiting all of these methods could.
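Transcript quality of the kind shown in the table is commonly quantified with word error rate (WER): the word-level edit distance (substitutions, deletions, insertions) between a hypothesis transcript and a reference transcript, divided by the number of reference words. The sketch below is a minimal WER implementation in plain Python. It is not Novoic’s evaluation method, which is not described here. The lowercasing and punctuation stripping in `normalize` are our own simplifying assumptions, and the example strings are shortened from the first row of the table.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces (a simplifying assumption), split into words."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    manual = "Maria lives in Glasgow in Scotland and runs a pottery shop."
    google = "Maria lives in Plaza, sculpture and runs the country show."
    custom = "Maria lives in Glasgow, in Scotland, and runs a pottery shop."
    print(f"off-the-shelf WER: {word_error_rate(manual, google):.2f}")
    print(f"custom WER:        {word_error_rate(manual, custom):.2f}")
```

WER treats every word equally. It therefore cannot capture what the table shows qualitatively: some errors, such as “Plaza, sculpture” for “Glasgow in Scotland”, change the clinical content far more than a dropped filler word.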
Validation against committee ratings on common performance-based outcome assessments
Data presented today on the main stage at the Clinical Trials on Alzheimer’s Disease (CTAD) conference in Boston compared AccuRater™ with trained human raters in rating common performance-based outcome assessments. AccuRater™ performed at or above the level of the trained human raters.
One of the evaluated tests was the Wechsler Logical Memory test, a story recall task in which the rater needs to score which elements of a story the patient correctly remembers.
Measured against ground-truth committee ratings, AccuRater™ performed statistically significantly better than highly trained human site raters. The correlation coefficient between site ratings and committee ratings was 0.937 (95% CI: 0.916-0.953); for AccuRater™ it was 0.982 (95% CI: 0.976-0.987).
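The sketch below scores a story-recall transcript against a list of story units using naive phrase matching, to make the rating task concrete. Everything in it is hypothetical. The units are drawn from the example story in the transcription table above, not from the Wechsler Logical Memory stories or their scoring criteria. `STORY_UNITS`, `normalize`, and `score_recall` are illustrative names and are not part of AccuRater™.

```python
import re

# HYPOTHETICAL story units for illustration only; these are NOT the Wechsler
# Logical Memory scoring criteria. Each unit lists phrases assumed to count
# as evidence that the unit was recalled.
STORY_UNITS: dict[str, list[str]] = {
    "name Maria": ["maria"],
    "lives in Glasgow": ["glasgow"],
    "in Scotland": ["scotland"],
    "runs a pottery shop": ["pottery shop", "pottery store"],
    "fewer visitors": ["visitors", "footfall"],
}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())


def score_recall(transcript: str, units: dict[str, list[str]]) -> dict[str, bool]:
    """Mark a unit as recalled if any accepted phrase occurs as whole words in the transcript."""
    padded = f" {normalize(transcript)} "
    return {
        unit: any(f" {normalize(phrase)} " in padded for phrase in phrases)
        for unit, phrases in units.items()
    }


if __name__ == "__main__":
    transcript = ("Um, Maria of Glasgow had a pottery shop, um, selling coffee. "
                  "I think she started giving them away")
    result = score_recall(transcript, STORY_UNITS)
    for unit, recalled in result.items():
        print(f"{unit:22s} {'recalled' if recalled else 'missed'}")
    print("total:", sum(result.values()), "/", len(result))
```

A matcher this naive credits any mention of a listed phrase regardless of context and misses paraphrases it was not told about. That limitation is consistent with the earlier observation that hardcoded algorithms alone could not meet the required rating quality.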
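Confidence intervals for a correlation coefficient are commonly computed with the Fisher z-transformation: z = atanh(r) is approximately normal with standard error 1/sqrt(n - 3). The text does not say how the intervals above were computed or how many assessments were compared. The sketch below is therefore a generic illustration of that standard method using made-up scores, not a reproduction of the CTAD analysis. Showing that one correlation is significantly higher than another, when both are measured against the same committee ratings, requires a test for dependent correlations (for example Steiger’s test). This sketch does not implement such a test.

```python
import math
from statistics import NormalDist, fmean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for Pearson r via the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("Fisher interval needs n > 3")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                              # Fisher z-transform of r
    se = 1 / math.sqrt(n - 3)                      # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


if __name__ == "__main__":
    # Made-up scores for illustration only (not CTAD data).
    committee = [12, 8, 15, 20, 5, 10, 17, 9, 14, 11]
    rater = [13, 8, 14, 19, 6, 11, 16, 10, 15, 10]
    r = pearson_r(committee, rater)
    lo, hi = fisher_ci(r, len(committee))
    print(f"r = {r:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

The interval is asymmetric around r because the tanh back-transformation compresses values near 1. This is why the reported intervals for correlations close to 1 extend further below the point estimate than above it.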
Application, implementation and use cases – CNS Trials
The general-purpose nature of the system underlying AccuRater™ means it can be deployed for most existing clinical assessments and in many languages. The system is HIPAA compliant, available via an API, and can be integrated with existing software systems such as eCOA (electronic clinical outcome assessment) platforms.
Use cases for AccuRater™ include:
- Automated Quality Assurance, for example as part of Database Monitoring.
- Lower-cost alternative or supplement to Central Review, allowing users to get 100% data coverage.
- Automated Central Rating during screening.
Scaling access to gold-standard assessment to a billion people
More broadly, AccuRater™ is a step towards broad, scalable access to gold-standard clinical assessment that can be deployed cheaply. We think this is necessary for broad and equitable access, and today we’re taking another step towards it.
To learn more, click “Book Demo” in the top right-hand corner of our website (https://novoic.com).