Text Message Analysis Using Machine Learning to Assess Predictors of Engagement With Mobile Health Chronic Disease Prevention Programs: Content Analysis

Harry Klimis, Joel Nothman, Di Lu, Chao Sun, N Wah Cheung, Julie Redfern, Aravinda Thiagalingam, Clara K Chow

Research output: Other contribution; Discipline Preprint Repository; Research



Text messages, as a form of mobile health (mHealth), are increasingly being used to support individuals with chronic disease in novel ways that leverage the mobility and capabilities of mobile phones. However, there are knowledge gaps with mHealth including how to maximise engagement.


The aims were (1) to develop machine learning (ML) models to categorise program text messages and participant replies, and (2) to examine whether message characteristics were associated with a) premature program stopping, and b) engagement.


We assessed communication logs from text message-based chronic disease prevention studies that encouraged unidirectional (SupportMe/ITM) and bidirectional (TEXTMEDS) communication. Outgoing messages were manually categorised into five message intents (informative, instructional, motivational, supportive, and notification) and replies into seven groups (stop, thanks, questions, reporting healthy, reporting struggle, general comment, and other). Grid search with 10-fold cross-validation was used to identify the best-performing ML models, which were then evaluated using nested cross-validation. Regression models with interaction terms were used to compare the association of message intent with a) premature program stopping, and b) engagement (replied at least three times and did not prematurely stop), in SupportMe/ITM and TEXTMEDS.
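The model selection step described above (an inner 10-fold grid search wrapped in an outer evaluation loop) can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' pipeline: the TF-IDF plus logistic regression model, the parameter grid, the 5-fold outer loop, and the toy messages are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Toy corpus invented for illustration (not study data): 3 intents x 20 messages.
templates = {
    "informative": "Fact: {} can lower your risk of heart disease.",
    "instructional": "Instruction: try {} every day this week.",
    "motivational": "Great work with {}, keep it up!",
}
topics = ["walking", "eating vegetables", "cutting down on salt",
          "quitting smoking", "taking your medication"]
texts, labels = [], []
for intent, template in templates.items():
    for i in range(20):
        texts.append(template.format(topics[i % len(topics)]) + f" (message {i})")
        labels.append(intent)

# Assumed model: bag-of-words features feeding a linear classifier.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])
# Assumed hyperparameter grid, kept small so the sketch runs quickly.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

# Inner loop: 10-fold grid search chooses hyperparameters.
inner_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
# Outer loop: scores the whole search procedure on folds it never tuned on.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(pipeline, param_grid, cv=inner_cv, scoring="balanced_accuracy")
outer_scores = cross_val_score(search, texts, labels, cv=outer_cv,
                               scoring="balanced_accuracy")
print("Nested CV balanced accuracy per outer fold:", outer_scores)
```

Because the hyperparameters are re-tuned inside every outer training fold, the outer scores estimate how the full tuning procedure generalises, rather than rewarding a configuration that happened to fit the evaluation data.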


We analysed a total of 1,550 messages and 4,071 participant replies. Of 2,642 participants, 145 (5.5%) responded ‘stop’ and 309 (11.7%) were engaged. Our optimal ML model correctly classified program outgoing message intent with 76.6% (95% CI 63.5–89.8) and replies with 77.8% (95% CI 74.1–81.4) balanced accuracy. Overall, “supportive” (OR 0.53; 95% CI 0.35–0.81) messages were associated with a reduced chance of stopping, as were “informative” messages in SupportMe/ITM (OR 0.35; 95% CI 0.20–0.60) but not in TEXTMEDS (P for interaction <0.001). “Notification” messages were associated with a higher chance of stopping in SupportMe/ITM (OR 5.76; 95% CI 3.66–9.06) but not in TEXTMEDS (P for interaction=0.01). Overall, “informative” (OR 1.76; 95% CI 1.46–2.12) and “instructional” (OR 1.47; 95% CI 1.21–1.80) messages were associated with higher engagement, but “motivational” messages were not (P=0.37). For “supportive” messages, the association with engagement ran in opposite directions in SupportMe/ITM (OR 1.77; 95% CI 1.21–2.58) and TEXTMEDS (OR 0.77; 95% CI 0.60–0.98) (P for interaction <0.001). “Notification” messages were associated with reduced engagement in both SupportMe/ITM (OR 0.07; 95% CI 0.05–0.10) and TEXTMEDS (OR 0.28; 95% CI 0.20–0.39), but the association was stronger in SupportMe/ITM (P for interaction <0.001).
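For readers unfamiliar with the two summary statistics reported above, the sketch below shows how balanced accuracy and an odds ratio (OR) with a 95% CI are commonly computed. It is illustrative only: the labels and the coefficient are hypothetical, and the Wald-type interval is an assumption, since the abstract does not state how the intervals were derived.

```python
import math
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare classes count as much as common ones."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def odds_ratio_with_ci(log_odds, std_err, z=1.96):
    """Turn a logistic regression coefficient and its standard error into
    (OR, lower, upper) using a Wald interval (assumed method)."""
    return (
        math.exp(log_odds),
        math.exp(log_odds - z * std_err),
        math.exp(log_odds + z * std_err),
    )


# Hypothetical reply labels (not study data): an imbalanced three-class example.
y_true = ["stop", "stop"] + ["thanks"] * 6 + ["question", "question"]
y_pred = ["stop", "thanks"] + ["thanks"] * 5 + ["question"] * 3

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("plain accuracy:", plain_accuracy)
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))

# Hypothetical coefficient and standard error, for illustration only.
odds_ratio, lower, upper = odds_ratio_with_ci(log_odds=-0.5, std_err=0.2)
print(f"OR {odds_ratio:.2f}; 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1 corresponds to an association that is statistically significant at the 5% level under this approximation; an OR below 1 indicates lower odds of the outcome and an OR above 1 indicates higher odds.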


The ML models enable monitoring and detailed characterisation of program messages and participant replies. Outgoing message intent may influence premature program stopping and engagement, although the strength and direction of the association appear to vary by program type. Future studies will need to examine whether modifying message characteristics can optimise engagement, and whether this leads to behaviour change.
Original language: English
Publisher: JMIR Preprints
Number of pages: 43
Publication status: Submitted - 6 Feb 2021

