Every March, millions of people turn their attention to filling out brackets for the NCAA basketball tournament, trying to predict who will make the Final Four. A University of Kansas professor has turned his statistical expertise toward the tournament, developing a model that can help predict who will win games, based on a wealth of data accumulated throughout the season.
Jonathan Templin, associate professor of educational psychology, has developed a statistical model in time for the tourney, and as so many coaches have preached before, consistency is key. The model has data for all 351 Division I basketball teams and assigns ratings based on how teams score, how well they defend, their home court advantage and, most importantly, how consistently they perform.
“In statistics, we’re not just interested in the prediction but how accurate the prediction is,” Templin said. “This model takes uncertainty and predicts how far off it is from the average with the same factors that predict the score.”
Given that he works at traditional basketball powerhouse KU, Templin uses the example of the Jayhawks’ last regular season game vs. Oklahoma State. The model predicted a final score of 87-85 in favor of Kansas, and the actual final score was 90-85. The model's input data were the scores from every game the teams had played.
“For each game there are two scores: One team’s offense playing against another’s defense and vice versa,” Templin said.
The model uses the strength of the scoring and defending teams along with home court advantage in predicting the scores of a game.
“The way it works is, essentially, when a team plays at home they get the advantage of scoring a few more points and giving up a few less than their opponents,” Templin said. “Each team has its own version of the home effect.”
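The structure Templin describes, one team's offense against the other's defense plus a team-specific home effect, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model itself: the additive form, the `LEAGUE_AVERAGE` baseline, the `Team` fields and every number in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

LEAGUE_AVERAGE = 70.0  # assumed points per team per game; not from the article


@dataclass
class Team:
    name: str
    offense: float              # points scored above an average team (hypothetical scale)
    defense: float              # points held below what an average team allows
    home_offense: float = 0.0   # extra points this team scores at home
    home_defense: float = 0.0   # fewer points this team allows at home


def expected_points(scoring: Team, defending: Team, home: Optional[Team]) -> float:
    """Expected points for `scoring` against `defending`.

    `home` is the host team, or None for a neutral site.
    """
    points = LEAGUE_AVERAGE + scoring.offense - defending.defense
    if home is scoring:
        points += scoring.home_offense      # host scores a few more
    elif home is defending:
        points -= defending.home_defense    # host gives up a few fewer
    return points
```

With a made-up host `Team("Host", 10, 5, 3, 2)` and visitor `Team("Visitor", 4, 6)`, the sketch gives the host `expected_points(host, visitor, host)` and the visitor `expected_points(visitor, host, host)`; passing `None` as the last argument removes both home adjustments.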
Beyond the predicted score, the model estimates how consistently each team scores that many more points, and allows that many fewer, from game to game. If a team is good at consistently outscoring opponents and consistently defending well, it gets a higher overall rating, or estimate of team strength. At the end of the regular season, KU had the second-highest overall rating of 3.04, meaning the team was just over three standard deviations better than the average team.
Poor teams can be as many as 2.23 standard deviations worse than the average team. Florida was top rated at the end of the year, followed by Kansas, Gonzaga, Villanova, Maryland, UCLA, Saint Mary’s, Arizona, Louisville and Notre Dame to round out the top 10. That doesn’t necessarily mean they are the best teams, or that they are guaranteed to beat a team rated lower, but that they are the best at consistently matching their positive offensive and defensive outputs. In essence, it blends team strengths with team consistency.
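For readers unfamiliar with the "standard deviations better than average" scale, plain standardization shows the idea. The sketch below is not how the model blends strength and consistency, which the article does not spell out; the function name and the sample strengths are hypothetical.

```python
from statistics import mean, pstdev


def standardize(strengths):
    """Convert raw team strengths to standard-deviation units (z-scores).

    A result of 3.0 means "three standard deviations better than the
    average team"; a negative result means worse than average.
    """
    values = list(strengths.values())
    center = mean(values)
    spread = pstdev(values)  # population SD, since every team is included
    return {team: (value - center) / spread for team, value in strengths.items()}
```

For example, `standardize({"a": 10, "b": 0, "c": -10})` puts team `b` at exactly the average and teams `a` and `c` the same distance above and below it.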
As part of the analysis, 10,000 random tournament brackets are simulated. For each simulated bracket, an algorithm randomly draws the scores of every tournament game; tallying the results across all simulations gives the probability of a given team winning each game and advancing to a given round. Because tournament games are played at neutral sites, the simulation leaves out the home court effect.
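A toy version of such a bracket simulation might look like the following. It is a sketch under assumptions, not Templin's algorithm: scores are drawn from independent normal distributions, the four-team field and all of its numbers are invented, and the field size must be a power of two. No home effect appears because every game is treated as a neutral-site game.

```python
import random
from collections import Counter

LEAGUE_AVERAGE = 70.0  # assumed baseline points per team; not from the article

# name -> (offense, defense, score_sd); every value is invented for illustration
TEAMS = {
    "A": (12.0, 9.0, 8.0),
    "B": (6.0, 5.0, 10.0),
    "C": (3.0, 4.0, 12.0),
    "D": (-2.0, -1.0, 9.0),
}


def play_game(a, b, rng):
    """Draw both scores at random and return the winner's name."""
    off_a, def_a, sd_a = TEAMS[a]
    off_b, def_b, sd_b = TEAMS[b]
    while True:  # redraw in the (practically impossible) event of a tie
        score_a = rng.gauss(LEAGUE_AVERAGE + off_a - def_b, sd_a)
        score_b = rng.gauss(LEAGUE_AVERAGE + off_b - def_a, sd_b)
        if score_a != score_b:
            return a if score_a > score_b else b


def simulate_bracket(field, rng):
    """Play one single-elimination bracket; return the winners of each round."""
    rounds = []
    while len(field) > 1:
        field = [play_game(field[i], field[i + 1], rng)
                 for i in range(0, len(field), 2)]
        rounds.append(field)
    return rounds


def advancement_probabilities(field, n_sims=10_000, seed=0):
    """Estimate P(team wins its game in round r) by repeated simulation."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_sims):
        for round_no, winners in enumerate(simulate_bracket(field, rng), start=1):
            for team in winners:
                counts[(team, round_no)] += 1
    return {key: n / n_sims for key, n in counts.items()}
```

Calling `advancement_probabilities(["A", "D", "B", "C"])` returns a dictionary keyed by `(team, round)`; the entries for the final round are each team's estimated chance of winning the whole toy bracket.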
The model is fed by data from sportsreference.com. Users can search the model by team to learn how their favorite ranks, who is best in each category, who is most consistent, how teams compare head-to-head and what specific teams’ strengths and weaknesses are.
The model can be handy for picking which team might win a given tournament game, but Templin cautions it is not a guarantee and urges users to view it as a source of information, not as a magic gambling bullet.
For example, were KU to meet Gonzaga in the tournament, the model shows Gonzaga outscores opponents by an average of 14.85 points and allows 11.34 fewer points than opponents, compared with KU’s 12.38 and 9.6 in those categories, respectively. However, KU is significantly more consistent in scoring and nearly as consistent in defending, giving the Jayhawks a 3.04 overall rating to Gonzaga’s 2.75. That means, were the game to be played multiple times, KU would win the majority. However, in a single game, anything can happen, Templin said.
The model can also be a handy way to try to predict upsets. For example, 12 seeds frequently beat 5 seeds. If one were to find a 12 seed that scores a lot but is inconsistent, one might pick them in hopes they are “on” that game and have one of their high-scoring performances. It also could be used to determine which less familiar teams play consistently good defense.
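The intuition that an inconsistent underdog makes a better upset pick can be checked with a back-of-the-envelope calculation. The sketch below assumes each team's score is an independent normal draw, which is a simplification chosen for illustration rather than anything the article attributes to the model; the function name and the inputs in the usage note are hypothetical.

```python
from statistics import NormalDist


def upset_probability(expected_margin, favorite_sd, underdog_sd):
    """Probability the underdog wins.

    expected_margin: points by which the favorite is expected to win.
    favorite_sd, underdog_sd: game-to-game standard deviation of each
    team's score (larger means less consistent).
    """
    # The difference of two independent normal scores is normal, and
    # its variance is the sum of the two score variances.
    margin_sd = (favorite_sd ** 2 + underdog_sd ** 2) ** 0.5
    # An upset happens when the favorite's actual margin falls below zero.
    return NormalDist(mu=expected_margin, sigma=margin_sd).cdf(0.0)
```

Under these assumptions, `upset_probability(6, 8, 14)` is larger than `upset_probability(6, 8, 6)`: with the same six-point expected deficit, the streakier underdog wins more often, though still less than half the time.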
Templin plans to use the model in future research. He envisions publications for peer-reviewed sports science journals, specifically around the question of the importance of consistency as a factor in statistical models.
“Does standardization matter?” Templin said. “Research is ongoing into how you make a predictions system that accounts for all of these factors, and consistency is a key factor.”
If the model proves successful during the tournament, Templin said it could be tested further against past tournaments, using the same factors and comparing its predictions with the actual outcomes. He also hopes to add new factors to future versions, including how much rest teams had before a game, differences among neutral sites, and more.
A lifelong sports fan, Templin said his work in statistics led him to apply models used in educational, clinical and other settings to sports. He is a co-mentor of a recently formed KU student club for those interested in sports analytics, and he co-authored a study looking at variation of movement patterns in people with and without mild Alzheimer’s disease to see if the patterns could predict likelihood of the disease. He also co-authored a book examining how psychometrics and statistics can help teachers better understand math scores and teach more effectively.