Identifying Risk Factors for Premature Birth in the UK Millennium Cohort Using a Random Forest Decision-Tree Approach

Research output: Contribution to journal › Article › Research › peer-review


Prior research on causes of preterm birth has tended to focus on pathophysiological processes while acknowledging the role of socioeconomic indicators. The present research explored a wide range of factors plausibly associated with preterm birth, informed by pathophysiological and evolutionary life history perspectives on gestation length. To achieve this, random forest (RF), an ensemble machine learning classification method, was applied to data from the UK Millennium Cohort (18,201 births). The results highlighted the importance of socioeconomic variables and parental age in predicting preterm (before 37 completed weeks) and very preterm (before 32 weeks) birth. Infants born in households with low income and with young fathers had an increased risk of both very preterm and preterm birth. Maternal health and health problems during pregnancy were not found to be useful predictors. The best-performing algorithm predicted very preterm birth with 93% sensitivity and 100% specificity using six variables. Algorithms predicting preterm birth before 37 weeks showed increased error, with out-of-bag error rates of about 7% versus only 1% for those predicting very preterm birth. The poorer performance of algorithms predicting preterm birth before 37 weeks of gestation suggests that some preterm births may not result from pathology related to poor maternal health or social or economic disadvantage, but may instead represent normal life-history variation.
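
The abstract reports classifier performance as sensitivity and specificity. As a minimal sketch of how those two figures are defined, the Python below computes them from true and predicted binary labels. The function name `sensitivity_specificity` and the toy labels are hypothetical and chosen only for illustration; they are not taken from the study or its data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")

    sensitivity = tp / (tp + fn)  # share of true positives that are detected
    specificity = tn / (tn + fp)  # share of true negatives that are detected
    return sensitivity, specificity


# Hypothetical toy labels (1 = preterm, 0 = term); not data from the cohort.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

When the outcome is rare, an overall error rate can be low even if sensitivity is poor, which is one reason to read the two measures alongside any error rate.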
Original language: English
Pages (from-to): 320-333
Journal: Reproductive Medicine
Issue number: 4
Publication status: Published - 9 Dec 2022

