How are editorial classification labels developed and what insight do they provide?
Rather than taking a snapshot view of clinical trials, we track thousands of daily changes across multiple data sources to find the most crucial updates. Each of those events is classified and ranked against noteworthiness criteria defined by STAT journalists and other expert sources.
By deriving and aggregating proprietary events, and by analyzing these new data points longitudinally, our event detection algorithms surface trends that provide an up-to-date view of emerging activity that others may miss.
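As a rough sketch of this change-tracking step, the example below diffs two daily snapshots of trial records and emits one event per changed field. The `Event` structure, the field name `overall_status`, and the snapshot format are illustrative assumptions, not the actual STAT Trials Pulse schema.

```python
# Illustrative sketch only: the Event structure, the "overall_status" field,
# and the snapshot format are assumptions, not the actual platform schema.
from dataclasses import dataclass


@dataclass
class Event:
    trial_id: str
    field: str
    old_value: str
    new_value: str


def detect_events(yesterday: dict, today: dict) -> list[Event]:
    """Diff two daily snapshots ({trial_id: {field: value}}) and emit
    one Event for every field whose value changed overnight."""
    events = []
    for trial_id, record in today.items():
        previous = yesterday.get(trial_id, {})
        for field, new_value in record.items():
            old_value = previous.get(field)
            if old_value is not None and old_value != new_value:
                events.append(Event(trial_id, field, old_value, new_value))
    return events


# Example: one trial whose overall status changed since yesterday's snapshot.
yesterday = {"NCT00000001": {"overall_status": "Enrolling by Invitation"}}
today = {"NCT00000001": {"overall_status": "Recruiting"}}
print(detect_events(yesterday, today))
```

Events surfaced this way are what the classification and ranking steps described above then operate on.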
“The new tool gathers vital information and context on clinical trials far more efficiently than I can do myself — and the algorithm’s ability to surface otherwise invisible trends is truly remarkable,” said STAT News’ Kate Sheridan, one of the journalists who contributed to the effort.
For example, consider the two trials below, both with recent status updates to ‘Recruiting’:
You’ll see that the same status update received two different classifications in these trials. That’s because the editorial classification takes into account each trial’s prior status and overall pattern.
The first trial, which received a ‘Recruitment Strategy’ event, changed from ‘Enrolling by Invitation’ to ‘Recruiting.’ A shift from a specific, invitation-based strategy to broader, more general recruiting may indicate that the trial is struggling to meet its enrollment goals, or that the sponsors have expanded the intended population to a larger group.
The second trial, which received a ‘Re-Entering Recruitment’ event, changed from ‘Active, not recruiting’ to ‘Recruiting.’ This may indicate that the data collected so far aren’t sufficient to demonstrate statistically significant efficacy. In that case, the trial enrolls additional patients in the hopes of improving those results.
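A minimal sketch of this prior-status-aware labeling, assuming a simple lookup keyed on the status transition: the table below encodes only the two examples above, and the generic fallback label is an assumption; the real editorial rules are far more extensive.

```python
# Minimal sketch: only the two transitions discussed above are encoded, and
# the generic fallback label is an assumption, not part of the real taxonomy.
TRANSITION_LABELS = {
    ("Enrolling by Invitation", "Recruiting"): "Recruitment Strategy",
    ("Active, not recruiting", "Recruiting"): "Re-Entering Recruitment",
}


def label_status_change(old_status: str, new_status: str) -> str:
    """Assign an editorial label based on where the trial is coming from,
    not just the status it landed on."""
    return TRANSITION_LABELS.get((old_status, new_status), "Status Update")


print(label_status_change("Enrolling by Invitation", "Recruiting"))  # Recruitment Strategy
print(label_status_change("Active, not recruiting", "Recruiting"))   # Re-Entering Recruitment
```

The point is that the same new status maps to different labels depending on the prior status.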
This is how we use aggregate patterns to assign editorial labels. We can also mine unstructured data from clinical trial events to assign classifications. For example, a trial stopping early can have different implications depending on the reason it stopped:
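As a rough illustration of that kind of mining, a stop-reason classifier might key off phrases in the trial’s free-text “why stopped” field. The keyword lists and label names below are hypothetical, not the platform’s actual editorial taxonomy.

```python
# Hypothetical sketch: the labels and keyword lists are illustrative only.
STOP_REASON_RULES = [
    ("Stopped for Safety", ["adverse event", "safety", "toxicity"]),
    ("Stopped for Futility", ["futility", "lack of efficacy"]),
    ("Stopped for Business Reasons", ["funding", "business decision", "strategic"]),
    ("Stopped for Enrollment", ["slow accrual", "low enrollment", "recruitment"]),
]


def classify_stop_reason(why_stopped: str) -> str:
    """Map the free-text reason a trial stopped early to an editorial label."""
    text = why_stopped.lower()
    for label, keywords in STOP_REASON_RULES:
        if any(keyword in text for keyword in keywords):
            return label
    return "Stopped, Reason Unclear"


print(classify_stop_reason("Terminated due to slow accrual"))        # Stopped for Enrollment
print(classify_stop_reason("Halted after a serious adverse event"))  # Stopped for Safety
```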
Once we’ve defined the parameters of an editorial classification or signal to be surfaced, models are trained to detect and apply these editorial definitions at scale on an ongoing basis.
How are editorial frameworks implemented at scale with real-time data?
How do we assign these labels consistently and reliably as new data comes in each day? Machine learning. We train our algorithms on domain-specific datasets through an iterative process with our expert network, so they can identify the nuances of each event and its context within the broader landscape.
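As a sketch of the general shape of such a workflow, the example below trains a small text classifier on expert-labeled event descriptions. The choice of scikit-learn, the TF-IDF features, and the toy labels are assumptions for illustration, not a description of the production models or training data.

```python
# Sketch of the general approach: scikit-learn, TF-IDF features, and the toy
# labeled examples are assumptions, not the production models or data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Stand-ins for expert-labeled event descriptions curated with the expert network.
texts = [
    "status changed from Enrolling by Invitation to Recruiting",
    "status changed from Active, not recruiting to Recruiting",
    "trial terminated due to slow accrual",
    "trial terminated after a serious adverse event",
]
labels = [
    "Recruitment Strategy",
    "Re-Entering Recruitment",
    "Stopped for Enrollment",
    "Stopped for Safety",
]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# Classify a newly detected event; in an iterative loop, low-confidence
# predictions can be routed to domain experts and their corrections added
# to the training set before the next retraining cycle.
print(model.predict(["status changed from Suspended to Recruiting"]))
```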
We strongly believe in transparency and human involvement in the implementation of artificial intelligence. For that reason, you’ll see this new icon throughout the tool whenever something is machine-generated:
As part of the ongoing process of training our algorithms for accuracy and keeping the ever-important human touch in AI, you will always have the opportunity to provide feedback on machine-generated classifications. This feedback improves the accuracy of the system and helps personalize how content is curated for you in the tool.
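As a sketch of how that feedback might be captured, the record below pairs a machine-generated label with a user’s response so confirmations and corrections can be folded into later retraining. The field names and workflow are hypothetical, not the tool’s actual data model.

```python
# Hypothetical feedback record: field names and workflow are illustrative only.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ClassificationFeedback:
    event_id: str
    predicted_label: str            # the machine-generated classification shown in the tool
    is_correct: bool                # whether the user agreed with the label
    suggested_label: Optional[str]  # the user's correction, if any
    submitted_at: datetime


feedback = ClassificationFeedback(
    event_id="evt-12345",
    predicted_label="Re-Entering Recruitment",
    is_correct=False,
    suggested_label="Recruitment Strategy",
    submitted_at=datetime.now(timezone.utc),
)

# Confirmed labels can be appended to the training set, while corrections can
# be queued for expert review before the next retraining cycle.
print(feedback)
```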
Where can I learn about the meaning of each type of update included on the platform?
Our team of data scientists and biotech journalists has published an exclusive, detailed guide that defines each event type and contextualizes it with real-world use cases. This publication is available only to paid STAT Trials Pulse subscribers.