Better duplicate detection for systematic reviewers: Evaluation of Systematic Review Assistant-Deduplication Module

John Rathbone, Matt Carter, Tammy Hoffmann, Paul Glasziou

Research output: Contribution to journal › Article › Research › peer-review


Abstract

Background: A major problem arising from searching across bibliographic databases is the retrieval of duplicate citations. Removing such duplicates is an essential task to ensure systematic reviewers do not waste time screening the same citation multiple times. Although reference management software uses algorithms to remove duplicate records, these are only partially successful, and the remaining duplicates must be removed manually. This time-consuming task wastes resources. We sought to evaluate the effectiveness of a newly developed deduplication program against EndNote. Methods: A literature search of 1,988 citations was manually inspected, and duplicate citations were identified and coded to create a benchmark dataset. The Systematic Review Assistant-Deduplication Module (SRA-DM) was iteratively developed and tested using the benchmark dataset, and compared with EndNote's default one-step auto-deduplication process, which matches on ('author', 'year', 'title'). Deduplication accuracy was reported as sensitivity and specificity. Further validation tests, using three additional benchmarked literature searches comprising a total of 4,563 citations, were performed to determine the reliability of the SRA-DM algorithm. Results: The sensitivity (84%) and specificity (100%) of SRA-DM were superior to those of EndNote (sensitivity 51%, specificity 99.83%). Validation testing on the three additional biomedical literature searches showed that SRA-DM consistently achieved higher sensitivity than EndNote (90% vs 63%, 84% vs 73%, and 84% vs 64%). Furthermore, the specificity of SRA-DM was 100%, whereas the specificity of EndNote was imperfect (average 99.75%), with some unique records wrongly flagged as duplicates. Overall, SRA-DM detected 42.86% more duplicate records than EndNote auto-deduplication.
Conclusions: The Systematic Review Assistant-Deduplication Module offers users a reliable program to remove duplicate records with greater sensitivity and specificity than EndNote. This application will save researchers and information specialists time and avoid research waste. The deduplication program is freely available online.

Original language: English
Article number: 6
Journal: Systematic Reviews
Volume: 4
Issue number: 1
DOI: 10.1186/2046-4053-4-6
Publication status: Published 14 Jan 2015


Cite this

@article{8e14fbfc3a304ebdb6d2c1ccbe8bf5a8,
title = "Better duplicate detection for systematic reviewers: Evaluation of Systematic Review Assistant-Deduplication Module",
abstract = "Background: A major problem arising from searching across bibliographic databases is the retrieval of duplicate citations. Removing such duplicates is an essential task to ensure systematic reviewers do not waste time screening the same citation multiple times. Although reference management software uses algorithms to remove duplicate records, these are only partially successful, and the remaining duplicates must be removed manually. This time-consuming task wastes resources. We sought to evaluate the effectiveness of a newly developed deduplication program against EndNote. Methods: A literature search of 1,988 citations was manually inspected, and duplicate citations were identified and coded to create a benchmark dataset. The Systematic Review Assistant-Deduplication Module (SRA-DM) was iteratively developed and tested using the benchmark dataset, and compared with EndNote's default one-step auto-deduplication process, which matches on ('author', 'year', 'title'). Deduplication accuracy was reported as sensitivity and specificity. Further validation tests, using three additional benchmarked literature searches comprising a total of 4,563 citations, were performed to determine the reliability of the SRA-DM algorithm. Results: The sensitivity (84{\%}) and specificity (100{\%}) of SRA-DM were superior to those of EndNote (sensitivity 51{\%}, specificity 99.83{\%}). Validation testing on the three additional biomedical literature searches showed that SRA-DM consistently achieved higher sensitivity than EndNote (90{\%} vs 63{\%}, 84{\%} vs 73{\%}, and 84{\%} vs 64{\%}). Furthermore, the specificity of SRA-DM was 100{\%}, whereas the specificity of EndNote was imperfect (average 99.75{\%}), with some unique records wrongly flagged as duplicates. Overall, SRA-DM detected 42.86{\%} more duplicate records than EndNote auto-deduplication.
Conclusions: The Systematic Review Assistant-Deduplication Module offers users a reliable program to remove duplicate records with greater sensitivity and specificity than EndNote. This application will save researchers and information specialists time and avoid research waste. The deduplication program is freely available online.",
author = "John Rathbone and Matt Carter and Tammy Hoffmann and Paul Glasziou",
year = "2015",
month = "1",
day = "14",
doi = "10.1186/2046-4053-4-6",
language = "English",
volume = "4",
journal = "Systematic Reviews",
issn = "2046-4053",
publisher = "BMC",
number = "1",

}

Better duplicate detection for systematic reviewers: Evaluation of Systematic Review Assistant-Deduplication Module. / Rathbone, John; Carter, Matt; Hoffmann, Tammy; Glasziou, Paul.

In: Systematic Reviews, Vol. 4, No. 1, 6, 14.01.2015.


TY  - JOUR

T1  - Better duplicate detection for systematic reviewers: Evaluation of Systematic Review Assistant-Deduplication Module

AU  - Rathbone, John

AU  - Carter, Matt

AU  - Hoffmann, Tammy

AU  - Glasziou, Paul

PY  - 2015/1/14

Y1  - 2015/1/14

AB  - Background: A major problem arising from searching across bibliographic databases is the retrieval of duplicate citations. Removing such duplicates is an essential task to ensure systematic reviewers do not waste time screening the same citation multiple times. Although reference management software uses algorithms to remove duplicate records, these are only partially successful, and the remaining duplicates must be removed manually. This time-consuming task wastes resources. We sought to evaluate the effectiveness of a newly developed deduplication program against EndNote. Methods: A literature search of 1,988 citations was manually inspected, and duplicate citations were identified and coded to create a benchmark dataset. The Systematic Review Assistant-Deduplication Module (SRA-DM) was iteratively developed and tested using the benchmark dataset, and compared with EndNote's default one-step auto-deduplication process, which matches on ('author', 'year', 'title'). Deduplication accuracy was reported as sensitivity and specificity. Further validation tests, using three additional benchmarked literature searches comprising a total of 4,563 citations, were performed to determine the reliability of the SRA-DM algorithm. Results: The sensitivity (84%) and specificity (100%) of SRA-DM were superior to those of EndNote (sensitivity 51%, specificity 99.83%). Validation testing on the three additional biomedical literature searches showed that SRA-DM consistently achieved higher sensitivity than EndNote (90% vs 63%, 84% vs 73%, and 84% vs 64%). Furthermore, the specificity of SRA-DM was 100%, whereas the specificity of EndNote was imperfect (average 99.75%), with some unique records wrongly flagged as duplicates. Overall, SRA-DM detected 42.86% more duplicate records than EndNote auto-deduplication.
Conclusions: The Systematic Review Assistant-Deduplication Module offers users a reliable program to remove duplicate records with greater sensitivity and specificity than EndNote. This application will save researchers and information specialists time and avoid research waste. The deduplication program is freely available online.

UR  - http://www.scopus.com/inward/record.url?scp=84939152797&partnerID=8YFLogxK

U2  - 10.1186/2046-4053-4-6

DO  - 10.1186/2046-4053-4-6

M3  - Article

VL  - 4

JO  - Systematic Reviews

JF  - Systematic Reviews

SN  - 2046-4053

IS  - 1

M1  - 6

ER  - 