TY - JOUR
T1 - Automating Research Synthesis with Domain-Specific Large Language Model Fine-Tuning
AU - Susnjak, Teo
AU - Hwang, Peter
AU - Reyes, Napoleon
AU - Barczak, Andre L.C.
AU - McIntosh, Timothy
AU - Ranathunga, Surangika
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2025/3/11
Y1 - 2025/3/11
AB - This research pioneers the use of fine-tuned Large Language Models (LLMs) to automate Systematic Literature Reviews (SLRs), presenting a significant and novel contribution in integrating AI to enhance academic research methodologies. Our study employed advanced fine-tuning methodologies on open-source LLMs, applying textual data mining techniques to automate the knowledge discovery and synthesis phases of an SLR process, thus demonstrating a practical and efficient approach for extracting and analyzing high-quality information from large academic datasets. The results maintained high factual accuracy in LLM responses and were validated through the replication of an existing PRISMA-conforming SLR. Our research proposed solutions for mitigating LLM hallucinations and mechanisms for tracing LLM responses back to their sources of information, thus demonstrating how this approach can meet the rigorous demands of scholarly research. The findings ultimately confirmed the potential of fine-tuned LLMs in streamlining various labor-intensive processes of conducting literature reviews. As a scalable proof of concept, this study highlights the broad applicability of our approach across multiple research domains. The potential demonstrated here advocates for updates to PRISMA reporting guidelines that incorporate AI-driven processes, ensuring methodological transparency and reliability in future SLRs. This study broadens the appeal of AI-enhanced tools across academic and research fields, demonstrating how comprehensive and accurate literature reviews can be conducted more efficiently, and to high standards, amid ever-increasing volumes of academic studies.
UR - http://www.scopus.com/inward/record.url?scp=105002570614&partnerID=8YFLogxK
U2 - 10.1145/3715964
DO - 10.1145/3715964
M3 - Article
AN - SCOPUS:105002570614
SN - 1556-4681
VL - 19
SP - 1
EP - 39
JO - ACM Transactions on Knowledge Discovery from Data
JF - ACM Transactions on Knowledge Discovery from Data
IS - 3
M1 - 68
ER -