Suicide is a leading cause of death in the United States, but the models that have been used to predict suicide rates weight risk factors equally and rely on data for large geographic areas, limiting the precision of the predictions, according to Penn State researchers. Now, the researchers have developed a machine learning-based model that uses their newly developed suicide-vulnerability index, which weights risk factors, to identify at-risk communities at the U.S. county level.
The approach was recently published in npj Mental Health Research, a Nature Portfolio journal.
“Our goal was to develop a novel suicide-vulnerability index for U.S. counties with the help of a machine learning-based suicide prediction model,” said paper co-author Soundar Kumara, Allen E. Pearce and Allen M. Pearce Professor of Industrial Engineering at Penn State, who is also affiliated with the College of Information Sciences and Technology. “By identifying the counties at higher risk for increased suicide rates, the model could help prompt targeted intervention programs.”
The researchers analyzed county-level data from 2010-19 for the 3,140 U.S. counties, the smallest geographic classification available in the Centers for Disease Control and Prevention’s database. They identified 17 characteristics that are used to predict suicide rates and that fall under three categories: demographics, socio-economic factors and health. The researchers suspected that some of these 17 characteristics would affect suicide rates more than others, and they set out to determine which factors mattered and by how much.
To identify the impact of each factor, the researchers used SHapley Additive exPlanations (SHAP), a game theory-based approach that explains how each variable contributes to the model’s prediction.
“SHAP values explore the impact of each feature by comparing the prediction results with and without that feature,” said co-author of the paper Kristin Sznajder, assistant professor of public health sciences at Penn State College of Medicine, who is also affiliated with the Huck Institutes of the Life Sciences and the Population Research Institute. “Using the SHAP values, the importance of all 17 features used in the prediction model training set was identified. By identifying and isolating the top five important features from our analysis, we developed the suicide vulnerability index. In earlier work, such indexes were created by including all the variables without considering their effects on the output.”
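To make the "with and without that feature" idea concrete, here is a minimal sketch of an exact Shapley value calculation for a single prediction. It is not the researchers' model or data: the three-feature toy function `toy_model`, the input `x` and the all-zero `baseline` are hypothetical, and treating an "absent" feature as one set to its baseline value is a simplifying assumption. For each feature, the code averages the change in the prediction when that feature is added to every possible subset of the other features, then ranks features by the size of their contribution.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input `x` relative to `baseline`.

    A feature that is "absent" from a coalition is set to its baseline
    value (an assumption made to keep this sketch simple).
    """
    n = len(x)

    def value(coalition):
        # Prediction when only the features in `coalition` take their real values.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Prediction with feature i minus prediction without it.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model with an interaction between the first two features.
    return 2 * z[0] + z[1] - 3 * z[2] + z[0] * z[1]


x = [1.0, 2.0, 3.0]          # hypothetical input
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point

phi = shapley_values(toy_model, x, baseline)

# Rank features by the magnitude of their contribution, largest first.
ranking = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)
print(phi, ranking)
```

The contributions add up to the difference between the prediction for `x` and the prediction for `baseline`, which is the property that lets SHAP attribute a model's output to individual features. Enumerating every subset is only practical for a handful of features; for larger models, libraries such as `shap` rely on approximations or model-specific algorithms.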
The top five county-level features driving the suicide-prediction results were population, percent African American population, percent white population, median age and percent female population. Higher population, percent white population and median age correlated with higher suicide rates, while higher percent African American population and percent female population correlated with lower suicide rates.