Abstract
Over 700,000 people die by suicide annually, according to the World Health Organization (WHO), and a prior suicide attempt is regarded as one of the strongest risk factors. Emergency departments (EDs) are a key point of contact for patients in mental health crises, but effective measurement of suicidal presentations to EDs is hindered by the documented low sensitivity of standardised tools and International Classification of Diseases (ICD) codes for these presentations. Recent research has applied combinations of natural language processing and modelling techniques to automate the manual review of unstructured fields, such as clinical notes. Research has not yet investigated how the performance of these automated tools varies with the complexity or interpretability of the text encoding and modelling techniques used, which is particularly important in this sensitive health setting. The objective of this work is to determine the trade-off between performance and interpretability of these techniques in this setting. We empirically assess the predictive performance of automated tools as a function of text encoding and modelling technique on a manually coded dataset of 91,778 ED presentations to two public teaching hospitals on the Gold Coast, Australia. For text encoding, we consider phrase matching derived from expert input, term frequency-inverse document frequency (TF-IDF), static embeddings from algorithms such as word2vec, and contextual embeddings generated by a pretrained transformer (clinicalBERT). For modelling techniques, we consider logistic regression, decision trees, support vector machines, ensembles, custom neural networks, and fine-tuned transformers. Predictive performance is assessed via stratified five-fold cross-validation using accuracy, sensitivity, precision-recall curves and Hosmer-Lemeshow tests.
The research will provide a framework for hospital decision-makers to determine the appropriate trade-off between the competing needs for predictive accuracy and interpretability, supporting initiatives to improve the measurement of suicidality. Findings highlight the improved predictive performance associated with both more complex text representations and more complex modelling techniques.
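The abstract names stratified five-fold cross-validation, accuracy, sensitivity and the Hosmer-Lemeshow test but does not show how they fit together. The following is a minimal, hypothetical Python sketch of such an evaluation loop using only the standard library. It is not the authors' code.

Several details are assumptions rather than part of the source:
- The scores are synthetic stand-ins for any model's predicted probabilities.
- The 0.5 decision threshold is assumed.
- The ten equal-size risk groups are assumed.
- All function names are illustrative.

The Hosmer-Lemeshow statistic sums (O_g - E_g)^2 / (E_g (1 - E_g / n_g)) over risk groups, where O_g and E_g are the observed and expected positives in group g. The p-value uses the closed-form chi-square tail, which is exact only for an even number of degrees of freedom (ten groups give 8).

```python
import math
import random
from collections import defaultdict

# Hypothetical helper names for illustration; not taken from the original study.


def stratified_folds(labels, k=5, seed=0):
    """Split example indices into k folds that preserve the class balance."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal each class's indices round-robin so every fold gets a similar share.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def accuracy_and_sensitivity(y_true, y_pred):
    """Accuracy over all cases; sensitivity (recall) over truly positive cases."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return correct / len(pairs), sensitivity


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even number of degrees of freedom")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y_true, probs, groups=10):
    """Hosmer-Lemeshow statistic over equal-size risk groups and its p-value (groups - 2 df)."""
    pairs = sorted(zip(probs, y_true))  # order cases by predicted risk
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        # Variance term n_g * pbar * (1 - pbar), with pbar = expected / n_g.
        denom = expected * (1.0 - expected / n_g) if n_g else 0.0
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat, chi2_sf_even_df(stat, groups - 2)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic scores standing in for a model's predicted probabilities;
    # each label is drawn from its own score, so the scores are calibrated.
    probs = [rng.random() for _ in range(1000)]
    labels = [1 if rng.random() < p else 0 for p in probs]
    for f, idx in enumerate(stratified_folds(labels, k=5)):
        y = [labels[i] for i in idx]
        p = [probs[i] for i in idx]
        acc, sens = accuracy_and_sensitivity(y, [int(q >= 0.5) for q in p])
        hl, pval = hosmer_lemeshow(y, p)
        print(f"fold {f}: accuracy={acc:.3f} sensitivity={sens:.3f} HL={hl:.2f} p={pval:.3f}")
```

In a real pipeline, a model would be fitted on four folds and scored on the held-out fold before these metrics are computed. Precision-recall curves, also listed in the abstract, would come from the same held-out probabilities by sweeping the threshold. Both steps are omitted to keep the sketch short.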
Original language | English |
---|---|
Pages | 1-1 |
Number of pages | 1 |
Publication status | Published - Sept 2023 |
Event | Royal Statistical Society International Conference 2023, Harrogate Convention Centre, Harrogate, Yorkshire, United Kingdom. Duration: 4 Sept 2023 → 7 Sept 2023. https://rss.org.uk/training-events/conference-2023/ |
Conference
Conference | Royal Statistical Society International Conference 2023 |
---|---|
Country/Territory | United Kingdom |
City | Harrogate, Yorkshire |
Period | 4/09/23 → 7/09/23 |