Machine learning algorithms for systematic review: reducing workload in a preclinical review of animal studies and reducing human screening error

Alexandra Bannach-Brown, Piotr Przybyła, James Thomas, Andrew S. C. Rice, Sophia Ananiadou, Jing Liao, Malcolm Robert Macleod

Research output: Contribution to journal › Review article › peer-review


Abstract

Background
Here, we outline a method of applying existing machine learning (ML) approaches to aid citation screening in an ongoing broad and shallow systematic review of preclinical animal studies. The aim is to achieve a high-performing algorithm, comparable to human screening, that can reduce the human resources required for this step of a systematic review.

Methods
We applied ML approaches to a broad systematic review of animal models of depression at the citation screening stage. We tested two independently developed ML approaches that used different classification models and feature sets. We recorded the performance of the ML approaches on an unseen validation set of papers using sensitivity, specificity and accuracy. We aimed to achieve 95% sensitivity and to maximise specificity. The classification model providing the most accurate predictions was applied to the remaining unseen records in the dataset and will be used in the next stage of the preclinical biomedical sciences systematic review. We used a cross-validation technique to assign ML inclusion likelihood scores to the human-screened records, to identify potential errors made during the human screening process (error analysis).
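The threshold-tuning step described above (target at least 95% sensitivity, then maximise specificity) can be sketched in a few lines. This is an illustrative sketch with made-up data and function names, not the classifiers or feature sets actually used in the review:

```python
# Hypothetical sketch of the screening-evaluation step: compute sensitivity and
# specificity for binary include/exclude labels (1 = include), then pick the
# highest score cut-off that still meets a sensitivity target.

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def sensitivity_specificity(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)  # recall on included records
    specificity = tn / (tn + fp)  # recall on excluded records
    return sensitivity, specificity

def threshold_for_sensitivity(y_true, scores, target=0.95):
    """Return the highest score cut-off whose sensitivity is >= target.
    Scanning cut-offs from high to low, the first one that meets the target
    maximises specificity subject to the sensitivity constraint."""
    for cut in sorted(set(scores), reverse=True):
        pred = [1 if s >= cut else 0 for s in scores]
        sens, _ = sensitivity_specificity(y_true, pred)
        if sens >= target:
            return cut
    return 0.0  # no cut-off meets the target; include everything
```

In this framing, lowering the cut-off trades specificity for sensitivity, which is why the review fixes sensitivity first and accepts whatever specificity remains.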

Results
ML approaches reached 98.7% sensitivity based on learning from a training set of 5749 records, with an inclusion prevalence of 13.2%. The highest specificity reached was 86%. Performance was assessed on an independent validation dataset. Human errors in the training and validation sets were successfully identified by using the inclusion likelihoods assigned by the ML model to highlight discrepancies. Retraining the ML algorithm on the corrected dataset improved its specificity without compromising sensitivity. Error analysis correction led to a 3% improvement in sensitivity and specificity, which increased the precision and accuracy of the ML algorithm.
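The error-analysis idea above amounts to flagging records where a confident machine inclusion likelihood disagrees with the human decision, so those records can be re-checked. A minimal sketch, with hypothetical confidence cut-offs (the review itself assigns likelihoods via cross-validation over the human-screened set):

```python
# Hypothetical sketch of the error-analysis step: surface human-screened
# records whose label conflicts with a confident ML inclusion likelihood.

def flag_discrepancies(records, low=0.1, high=0.9):
    """Return indices of records needing human re-checking.
    Each record is (human_label, ml_likelihood), with label 1 = include.
    `low` and `high` are illustrative confidence cut-offs, not the paper's."""
    flagged = []
    for i, (label, likelihood) in enumerate(records):
        if label == 1 and likelihood <= low:
            flagged.append(i)   # included by human, but ML is confident it's out
        elif label == 0 and likelihood >= high:
            flagged.append(i)   # excluded by human, but ML is confident it's in
    return flagged
```

Only the confidently contradicted records are re-screened, which keeps the human re-checking workload small relative to the full dataset.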

Conclusions
This work has confirmed the performance and application of ML algorithms for screening in systematic reviews of preclinical animal studies. It has highlighted the novel use of ML algorithms to identify human error. This needs to be confirmed in other reviews with different inclusion prevalence levels, but represents a promising approach to integrating human decisions and automation in systematic review methodology.
Original language: English
Article number: 23
Number of pages: 12
Journal: Systematic Reviews
Volume: 8
Issue number: 1
DOI: 10.1186/s13643-019-0942-7
Publication status: Published - 15 Jan 2019

Cite this

@article{df9aa11c1ebb4625babcf54aef9fe363,
title = "Machine learning algorithms for systematic review: reducing workload in a preclinical review of animal studies and reducing human screening error",
abstract = "Background: Here, we outline a method of applying existing machine learning (ML) approaches to aid citation screening in an ongoing broad and shallow systematic review of preclinical animal studies. The aim is to achieve a high-performing algorithm, comparable to human screening, that can reduce the human resources required for this step of a systematic review. Methods: We applied ML approaches to a broad systematic review of animal models of depression at the citation screening stage. We tested two independently developed ML approaches that used different classification models and feature sets. We recorded the performance of the ML approaches on an unseen validation set of papers using sensitivity, specificity and accuracy. We aimed to achieve 95{\%} sensitivity and to maximise specificity. The classification model providing the most accurate predictions was applied to the remaining unseen records in the dataset and will be used in the next stage of the preclinical biomedical sciences systematic review. We used a cross-validation technique to assign ML inclusion likelihood scores to the human-screened records, to identify potential errors made during the human screening process (error analysis). Results: ML approaches reached 98.7{\%} sensitivity based on learning from a training set of 5749 records, with an inclusion prevalence of 13.2{\%}. The highest specificity reached was 86{\%}. Performance was assessed on an independent validation dataset. Human errors in the training and validation sets were successfully identified by using the inclusion likelihoods assigned by the ML model to highlight discrepancies. Retraining the ML algorithm on the corrected dataset improved its specificity without compromising sensitivity. Error analysis correction led to a 3{\%} improvement in sensitivity and specificity, which increased the precision and accuracy of the ML algorithm. Conclusions: This work has confirmed the performance and application of ML algorithms for screening in systematic reviews of preclinical animal studies. It has highlighted the novel use of ML algorithms to identify human error. This needs to be confirmed in other reviews with different inclusion prevalence levels, but represents a promising approach to integrating human decisions and automation in systematic review methodology.",
author = "Alexandra Bannach-Brown and Piotr Przybyła and James Thomas and Rice, {Andrew S. C.} and Sophia Ananiadou and Jing Liao and Macleod, {Malcolm Robert}",
year = "2019",
month = "1",
day = "15",
doi = "10.1186/s13643-019-0942-7",
language = "English",
volume = "8",
journal = "Systematic Reviews",
issn = "2046-4053",
publisher = "BMC",
number = "1",

}

Machine learning algorithms for systematic review: reducing workload in a preclinical review of animal studies and reducing human screening error. / Bannach-Brown, Alexandra; Przybyła, Piotr; Thomas, James; Rice, Andrew S. C.; Ananiadou, Sophia; Liao, Jing; Macleod, Malcolm Robert.

In: Systematic Reviews, Vol. 8, No. 1, 23, 15.01.2019.


TY - JOUR

T1 - Machine learning algorithms for systematic review: reducing workload in a preclinical review of animal studies and reducing human screening error

AU - Bannach-Brown, Alexandra

AU - Przybyła, Piotr

AU - Thomas, James

AU - Rice, Andrew S. C.

AU - Ananiadou, Sophia

AU - Liao, Jing

AU - Macleod, Malcolm Robert

PY - 2019/1/15

Y1 - 2019/1/15

AB - Background: Here, we outline a method of applying existing machine learning (ML) approaches to aid citation screening in an ongoing broad and shallow systematic review of preclinical animal studies. The aim is to achieve a high-performing algorithm, comparable to human screening, that can reduce the human resources required for this step of a systematic review. Methods: We applied ML approaches to a broad systematic review of animal models of depression at the citation screening stage. We tested two independently developed ML approaches that used different classification models and feature sets. We recorded the performance of the ML approaches on an unseen validation set of papers using sensitivity, specificity and accuracy. We aimed to achieve 95% sensitivity and to maximise specificity. The classification model providing the most accurate predictions was applied to the remaining unseen records in the dataset and will be used in the next stage of the preclinical biomedical sciences systematic review. We used a cross-validation technique to assign ML inclusion likelihood scores to the human-screened records, to identify potential errors made during the human screening process (error analysis). Results: ML approaches reached 98.7% sensitivity based on learning from a training set of 5749 records, with an inclusion prevalence of 13.2%. The highest specificity reached was 86%. Performance was assessed on an independent validation dataset. Human errors in the training and validation sets were successfully identified by using the inclusion likelihoods assigned by the ML model to highlight discrepancies. Retraining the ML algorithm on the corrected dataset improved its specificity without compromising sensitivity. Error analysis correction led to a 3% improvement in sensitivity and specificity, which increased the precision and accuracy of the ML algorithm. Conclusions: This work has confirmed the performance and application of ML algorithms for screening in systematic reviews of preclinical animal studies. It has highlighted the novel use of ML algorithms to identify human error. This needs to be confirmed in other reviews with different inclusion prevalence levels, but represents a promising approach to integrating human decisions and automation in systematic review methodology.

UR - https://doi.org/10.1101/255760

U2 - 10.1186/s13643-019-0942-7

DO - 10.1186/s13643-019-0942-7

M3 - Review article

VL - 8

JO - Systematic Reviews

JF - Systematic Reviews

SN - 2046-4053

IS - 1

M1 - 23

ER -