In the movie “Moneyball,” a young economics graduate and a cash-strapped Major League Baseball general manager introduce a new way to evaluate baseball players’ value. Their innovative approach of weighing players’ statistics against their salaries enabled the Oakland A’s to recruit quality talent overlooked by other teams, revitalizing the team without exceeding its budget.
New research at the Penn State College of Information Sciences and Technology could make a similar impact on the sport. The team has developed a machine learning model that could better measure baseball players’ and teams’ short- and long-term performance, compared to existing statistical analysis methods for the sport. Drawing on recent advances in natural language processing and computer vision, their approach would completely change, and could enhance, the way the state of a game and a player’s impact on the game are measured.
According to Connor Heaton, doctoral candidate in the College of IST, the existing family of methods, known as sabermetrics, relies on the number of times a player or team achieves a discrete event, such as hitting a double or home run. However, it doesn’t consider the surrounding context of each action.
“Think about a scenario in which a player recorded a single in his last plate appearance,” said Heaton. “He could have hit a dribbler down the third base line, advancing a runner from first to second and beat the throw to first, or hit a ball to deep left field and reached first base comfortably but didn’t have the speed to push for a double. Describing both situations as resulting in ‘a single’ is accurate but does not tell the whole story.”
Heaton’s model instead learns the meaning of in-game events based on the impact they have on the game and the context in which they occur, then outputs numerical representations of how players impact the game by viewing the game as a sequence of events.
“We often talk about baseball in terms of ‘this player had two singles and a double yesterday,’ or ‘he went one for four,’” said Heaton. “A lot of the ways in which we talk about the game just summarize the events with one summary statistic. Our work is trying to take a more holistic picture of the game and to get a more nuanced, computational description of how players impact the game.”
In Heaton’s novel method, he leverages sequential modeling techniques used in natural language processing to help computers learn the role or meaning of different words. He applied that approach to teach his model the role or meaning of different events in a baseball game — for example, when a batter hits a single. Then, he modeled the game as a sequence of events to offer new insight on existing statistics.
“The impact of this work is the framework that is proposed for what I like to call ‘interrogating the game,’” said Heaton. “We’re viewing it as a sequence in this whole computational scaffolding to model a game.”
The model’s output describes a player’s short-term influence on the game, or their form. Expressed as 64-element vectors (obtained by adapting work from computer vision), these form embeddings can describe a player’s impact over a short window, such as a span of 15 plate appearances, or be averaged together to analyze longer periods, such as the course of a player’s career. Additionally, when combined with traditional sabermetrics, the form embeddings can predict the winner of a game with over 59% accuracy.
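To make the averaging step concrete, here is a minimal sketch of how several short-term form vectors might be combined into one longer-term description. The vector length of 64 comes from the article; the function name and the random placeholder vectors are assumptions for illustration and are not taken from the researchers’ code.

```python
import random

EMBED_DIM = 64  # vector length reported in the article


def average_embeddings(embeddings):
    """Return the element-wise mean of a list of equal-length vectors."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    n = len(embeddings)
    # zip(*embeddings) walks the vectors position by position.
    return [sum(column) / n for column in zip(*embeddings)]


# Placeholder vectors standing in for real model output (an assumption):
# ten short-term "form" snapshots for one hypothetical player.
rng = random.Random(0)
short_term_forms = [
    [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(10)
]

# One 64-element vector summarizing the longer period.
career_form = average_embeddings(short_term_forms)
```

A plain mean weights every short-term window equally; whether the researchers weight windows differently is not stated in the article.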
Heaton described how embeddings created by both his method and the traditional sabermetrics method plot the same data. When viewed over time, sabermetric-based representations of player impact can be somewhat sporadic, changing significantly from one game to the next. Heaton’s method helps “smooth out” the way players are described over time, while still allowing for fluctuation in player performance.
“Both embeddings can help differentiate good players from bad players,” said Heaton. “But ours provides much more nuance into the exact way in which the good players impact the game.”
To train their model, the researchers used data previously collected from systems installed at major league stadiums that track detailed information on every pitch thrown, such as player positioning in the field, base occupancy, and pitch velocity and rotation. They focused on two types of data: pitch-by-pitch data, to analyze information such as pitch type and launch angle; and season-by-season data, to investigate position-specific information such as walks and hits per inning pitched for pitchers and on-base-plus-slugging percentage for batters.
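For readers unfamiliar with the two season-level statistics named above, the following sketch computes them from their standard definitions. The function and parameter names are chosen here for illustration; the sample numbers are made up and do not describe any real player.

```python
def whip(walks, hits, innings_pitched):
    """Walks plus hits per inning pitched (lower is better for a pitcher)."""
    return (walks + hits) / innings_pitched


def ops(hits, walks, hit_by_pitch, at_bats, sac_flies, total_bases):
    """On-base percentage plus slugging percentage for a batter."""
    on_base = (hits + walks + hit_by_pitch) / (
        at_bats + walks + hit_by_pitch + sac_flies
    )
    slugging = total_bases / at_bats
    return on_base + slugging


# Made-up season lines, for illustration only.
pitcher_whip = whip(walks=50, hits=150, innings_pitched=200.0)
batter_ops = ops(
    hits=150, walks=50, hit_by_pitch=0,
    at_bats=500, sac_flies=0, total_bases=250,
)
```

Both values are single numbers per season, which is the kind of summary the researchers contrast with their sequence-based description of the game.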
Each pitch in the collected dataset has three identifying features: the game in which it took place, the at-bat number within the game and the pitch number within the at-bat. By using these three pieces of information, the researchers were able to completely reconstruct the sequence of events that constitute an MLB game.
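The reconstruction described above amounts to grouping pitches by game and sorting them by at-bat and pitch number. A minimal sketch follows, assuming each pitch is a dictionary with the hypothetical keys `game_id`, `at_bat` and `pitch`; the actual field names in the researchers’ dataset are not given in the article.

```python
from collections import defaultdict

# Hypothetical pitch records in arbitrary order (field names are assumptions).
pitches = [
    {"game_id": "G1", "at_bat": 2, "pitch": 1, "event": "ball"},
    {"game_id": "G1", "at_bat": 1, "pitch": 2, "event": "single"},
    {"game_id": "G2", "at_bat": 1, "pitch": 1, "event": "called_strike"},
    {"game_id": "G1", "at_bat": 1, "pitch": 1, "event": "foul"},
]


def reconstruct_games(records):
    """Group pitches by game, then order each game by (at-bat, pitch)."""
    games = defaultdict(list)
    for record in records:
        games[record["game_id"]].append(record)
    for sequence in games.values():
        sequence.sort(key=lambda r: (r["at_bat"], r["pitch"]))
    return dict(games)


games = reconstruct_games(pitches)
g1_events = [r["event"] for r in games["G1"]]
# g1_events is ["foul", "single", "ball"]
```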
The researchers then identified 325 possible game changes that could occur when a pitch is thrown, such as changes in the ball-strike count and base occupancy. They combined this information with existing pitch-by-pitch data that describes the thrown pitch and at-bat action, then input player records from sabermetrics to be able to describe what happened, how it happened, and who was involved with each play.
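The article does not say how the 325 game changes are encoded. As one hypothetical illustration, a change can be described by comparing the game state before and after a pitch, and each distinct change can be given an integer ID much as words are given token IDs in language models. The `GameState` fields and the helper names below are assumptions, not the researchers’ schema.

```python
from typing import NamedTuple, Tuple


class GameState(NamedTuple):
    balls: int
    strikes: int
    outs: int
    bases: Tuple[bool, bool, bool]  # first, second, third occupied


def state_change(before: GameState, after: GameState) -> tuple:
    """Describe a pitch's effect as the difference between two states."""
    return (
        after.balls - before.balls,
        after.strikes - before.strikes,
        after.outs - before.outs,
        before.bases,
        after.bases,
    )


# Vocabulary mapping each distinct change to an integer ID.
vocab = {}


def change_id(change: tuple) -> int:
    """Return the ID for a change, assigning the next free ID if it is new."""
    return vocab.setdefault(change, len(vocab))


runner_on_first = (True, False, False)
# A ball is thrown: the count goes from 1-1 to 2-1, nothing else changes.
ball = state_change(
    GameState(1, 1, 0, runner_on_first), GameState(2, 1, 0, runner_on_first)
)
# A called strike from the same starting state.
strike = state_change(
    GameState(1, 1, 0, runner_on_first), GameState(1, 2, 0, runner_on_first)
)
ids = [change_id(ball), change_id(strike), change_id(ball)]
# ids is [0, 1, 0]
```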
The work blends Heaton’s research focus of natural language processing with his interest in the historical statistical analysis of baseball.
“There’s this whole ecosystem built up around modeling language and the sequence of words,” said Heaton. “It seems like there was potential for it to be adopted to model sequences of other things; to just generalize it a little bit. I started thinking about sports analytics and it just seemed like there was a lot that could be done to improve both our understanding of the game and how the game is modeled computationally.”
The researchers hope that their work will serve as a strong starting point toward a new way of describing how athletes in baseball and other sports impact the course of play.
“This work has the potential to significantly advance the state of the art in sabermetrics,” said Prasenjit Mitra, professor of information sciences and technology and co-author on the paper. “To the best of our knowledge, ours is the first to capture and represent a nuanced state of the game and utilize this information as the context to evaluate the individual events that are counted by traditional statistics — for example, by automatically building a model that understands key moments and clutch events.”
Heaton and Mitra’s paper, “Using Machine Learning to Describe How Players Impact the Game in the MLB,” was one of seven finalists in the 2022 Research Paper competition at the MIT Sloan Sports Analytics Conference earlier this month.
More information on the competition, as well as links to the paper and its open-source code and data, can be found at www.sloansportsconference.com/research-paper-competition.
In related news:
New metric more accurately evaluates physician performance
Inspired by the advanced analytics used in sports, a Northwestern University-led research team has developed a new approach for measuring physician adherence to best practices. The study also shows that high-performing individuals are more likely to be aware of system-level challenges.
The data-driven method takes into consideration the characteristics of the patients seen by a physician (how sick a patient is, for example) in order to estimate an adjusted performance that more accurately captures the know-how of a set of physicians. The researchers’ case study focused on patients with acute respiratory distress syndrome (ARDS), a form of lung failure.
“Our new metric could help change behavior more broadly, beyond our case study, and increase adoption of medical innovations,” said Luís A. Nunes Amaral, a data science expert and a corresponding author of the study. “Our method also is very flexible — it can be used on individual physicians, on groups of physicians, on entire hospitals or on entire hospital groups.”
Amaral is the Erastus Otis Haven Professor of chemical and biological engineering at the McCormick School of Engineering. He also is co-director of NICO (Northwestern Institute on Complex Systems).
The study was published recently by the journal BMC Medical Research Methodology.
Dr. Curtis H. Weiss, a pulmonary and critical care physician at NorthShore University HealthSystem and an adjunct assistant professor at McCormick, also is a corresponding author of the study. Amaral has long collaborated with him, using data science to improve critical care medicine.
Evaluating ventilator management
Amaral, Weiss and their research team evaluated how 48 critical care physicians managed ventilation for 362 patients with ARDS at a large academic hospital. Because the lungs of patients with ARDS are very fragile, accepted best practice is to mechanically deliver lower breath volumes to these patients than is standard for patients on ventilators. Proper ARDS diagnosis is critical but can be challenging.
The team examined the relationship between recognition of ARDS and physician characteristics, such as demographics, social network position and attitudes toward innovation. They found the only factor associated with ARDS recognition was training background; pulmonary and critical care medicine training was associated with higher recognition. Once a difference is identified, interventions can be designed to address the problem.
“ARDS is a pulmonary disease, and so it makes sense that physicians with advanced pulmonary training would be more primed to spot ARDS,” said Meagan A. Bechel, the paper’s first author and a recent graduate of Northwestern’s Medical Scientist Training Program.
“Even after we account for the number of ARDS patients seen, the severity of disease and the number of years in clinical practice, specialty training still has the largest impact on ARDS recognition,” Bechel said. “This is important because ARDS is a diverse syndrome and is treated by physicians from several specialties — in fact, the majority of our ARDS patients were cared for by non-pulmonary physicians.”
Take home runs, as an analogy
As an analogy, consider the number of home runs hit by a baseball player as a measure of performance. Given two hitters with the same skill level, the one playing in a smaller field would be expected to hit more home runs. Paying more money to the one that hit more home runs without accounting for the difference in conditions would be a mistake.
“Similarly, physicians see patients with different characteristics,” Amaral said. “Some patients are easier to diagnose than others. If a physician sees a group of easy-to-diagnose patients while another sees patients who are very hard to diagnose, then it would be easier for the former to appear to be performing better, even though that is not the case.”
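The reasoning in the analogy can be illustrated with a simple observed-minus-expected calculation. This is a simplified stand-in and not the methodology published by the study’s authors: each physician’s recognition count is compared with what would be expected given the difficulty of the patients they saw. The two difficulty groups, the records and the function name are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (physician, patient difficulty, ARDS recognized?)
cases = [
    ("A", "easy", True), ("A", "easy", True),
    ("A", "easy", False), ("A", "hard", False),
    ("B", "hard", True), ("B", "hard", False),
    ("B", "hard", False), ("B", "easy", True),
]


def adjusted_scores(records):
    """Per-patient gap between observed and expected recognition."""
    # Pooled recognition rate for each difficulty group.
    totals = defaultdict(lambda: [0, 0])  # difficulty -> [recognized, seen]
    for _, difficulty, recognized in records:
        totals[difficulty][0] += int(recognized)
        totals[difficulty][1] += 1
    baseline = {d: r / n for d, (r, n) in totals.items()}

    observed = defaultdict(float)
    expected = defaultdict(float)
    seen = defaultdict(int)
    for physician, difficulty, recognized in records:
        observed[physician] += int(recognized)
        expected[physician] += baseline[difficulty]
        seen[physician] += 1
    return {p: (observed[p] - expected[p]) / seen[p] for p in seen}


scores = adjusted_scores(cases)
# Both physicians recognize 2 of 4 cases, yet scores["A"] is -0.125
# and scores["B"] is 0.125, because B saw harder patients.
```

In this toy data the raw recognition rates are identical, and only the adjustment separates the two physicians.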
‘Squeaky wheel’ effect
Another interesting result of the study is related to the so-called “squeaky wheel.” When the researchers asked physicians how easy it was to find all the information for correctly diagnosing ARDS, they found that most physicians — except the higher-performing ones who recognized ARDS — had no complaints about the difficulty of the task.
“This suggests that users of a system might not identify problems with it because they are not engaging deeply with the system,” Amaral said. “That is, the squeaky wheel is actually telling us where the problems are. It’s a challenge for physicians to get the disparate medical information they need for an ARDS diagnosis.”
Weiss noted that severe COVID-19 can lead to ARDS, but that the data used in this study was from before the pandemic.
The title of the paper is “The first step is recognizing there is a problem: a methodology for adjusting for variability in disease severity when estimating clinician performance.” Meagan Bechel, a postdoctoral fellow in Amaral’s lab and a medical student at Northwestern University Feinberg School of Medicine, is the first author of the paper. Other authors are Adam R. Pah and Stephen D. Persell of Northwestern University.
The research was supported by the National Institute of General Medical Sciences (grant T32GM008152), the National Heart, Lung, and Blood Institute (grants K23HL118139 and R01HL140362-01A1), the Francis Family Foundation, the Department of Defense Army Research Office (grant W911NF-14-1-0259) and the National Center for Research Resources (grant 5UL1RR025741), which is now at the National Center for Advancing Translational Sciences (grant 8UL1TR000150).