Abstract
Hazards at construction sites can lead to severe accidents, posing significant risks to worker safety, financial stability, and public confidence in industry safety standards. As a result, understanding and preventing these accidents has become increasingly critical. Although previous studies have examined historical accidents through detailed reports, few have systematically applied automated natural language processing (NLP) techniques to uncover hidden topics and patterns in large datasets without manual intervention. This study addresses this gap by applying topic modeling to 22,623 accident reports from the Occupational Safety and Health Administration (OSHA) spanning 2004 to 2023. The results demonstrate that BERTopic substantially outperforms the traditional LDA model across multiple accident datasets, achieving higher topic coherence and topic diversity. Leveraging contextual embeddings, BERTopic identifies nuanced risk scenarios, occupation–accident patterns, and temporal trends that earlier text-mining approaches often overlooked. The findings also generate actionable managerial insights, including peak accident periods, vulnerable worker groups, and scenario-specific risk factors. Overall, this study provides a clearer and more data-driven understanding of construction accident mechanisms through advanced topic modeling. Applying BERTopic for topic extraction and content analysis introduces a novel and effective approach to analyzing construction accident reports. The insights derived provide valuable guidance for decision-makers in risk mitigation and accident prevention, while helping to rebuild public confidence in safety standards. Moreover, the approach’s reproducibility and potential for broader safety applications contribute to fostering a safer construction environment.
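
The abstract compares models on topic diversity without defining it. As a minimal sketch, assuming the common definition (the share of unique words among the top-k words of all topics), the metric could be computed as below. The function name `topic_diversity`, the `top_k` parameter, and the example topics are hypothetical and are not taken from the study.

```python
def topic_diversity(topics, top_k=10):
    """Return the fraction of unique words among the top_k words of every topic.

    Assumed definition: unique top words / total top words. A value near 1
    means topics share few words; a value near 0 means heavy overlap.
    """
    # Flatten the first top_k words of each topic into one list.
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("no topic words supplied")
    return len(set(top_words)) / len(top_words)


# Hypothetical topics, invented for illustration only (not results from the study).
example_topics = [
    ["fall", "roof", "ladder", "scaffold"],
    ["electrocution", "power", "line", "ladder"],
    ["trench", "excavation", "collapse", "soil"],
]

# "ladder" appears in two topics, so 11 of the 12 top words are unique.
score = topic_diversity(example_topics, top_k=4)
print(f"topic diversity: {score:.3f}")
```

The same function could be applied to the top words returned by either an LDA or a BERTopic model, provided each topic is first reduced to a ranked list of words.
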
| Original language | English |
|---|---|
| Article number | 10 |
| Pages (from-to) | 1-32 |
| Number of pages | 32 |
| Journal | Buildings |
| Volume | 16 |
| Issue number | 1 |
| Early online date | 19 Dec 2025 |
| DOIs | |
| Publication status | Published - Jan 2026 |