October 4, 2022 - Danfeng “Daphne” Yao, a computer science professor at Virginia Tech, wants to improve the prediction accuracy of machine learning models in medical applications. The results of her research were recently published in Communications Medicine, a selective open access journal of Nature Portfolio.
Danfeng “Daphne” Yao (left), who holds both the Elizabeth and James E. Turner Jr. ’56 and CACI faculty positions in the College of Engineering, led the research effort with Department of Computer Science graduate students Sharmin Afrose and Wenjia Song. Credit: Peter Means.
“An inaccurate prediction can have deadly consequences,” said Yao, who is both an Elizabeth and James E. Turner Jr. ’56 faculty member and a CACI faculty member in the College of Engineering. Such prediction errors could mean miscalculating the probability that a patient will die during an emergency room visit or survive cancer.
Many clinical data sets are inherently unbalanced, Yao said, because they are dominated by majority groups. “In the typical paradigm of a single machine learning model, racial and age disparities are likely to exist, but are not reported,” she said.
Yao and her team of researchers collaborated with Charles B. Nemeroff, a member of the National Academy of Medicine and a professor in the Department of Psychiatry and Behavioral Sciences at Dell Medical School at the University of Texas at Austin, to study how biases in training data affect prediction outcomes, particularly for underrepresented patients, such as younger patients or patients of color.
“I was absolutely thrilled to collaborate with Daphne Yao, who is a world leader in advanced machine learning,” Nemeroff said. “She discussed with me the idea that new advances in machine learning could be applied to a very important problem that clinical researchers frequently encounter, namely the relatively small number of ethnic minorities who generally enroll in clinical trials.”
He said that low enrollment translates to medical conclusions drawn largely for white patients of European descent, which may not apply to minority ethnic groups.
“This new report provides a methodology to improve the accuracy of predictions for minority groups,” Nemeroff said. “Clearly, such findings have extremely important implications for improving the clinical care of patients who are members of minority ethnic groups.”
Yao’s Virginia Tech team consists of Department of Computer Science doctoral students Sharmin Afrose and Wenjia Song, as well as Chang Lu, Fred W. Bull Professor in the Department of Chemical Engineering. They ran experiments on four different prognostic tasks across two datasets using a novel double prioritized (DP) bias correction method, which trains customized models for specific ethnic or age groups.
“Our work presents a novel artificial intelligence fairness technique for correcting prediction errors,” said Song, a fourth-year Ph.D. student whose research areas include machine learning in digital health and cybersecurity. “Our DP method improves minority class performance by up to 38% and significantly reduces prediction disparities between different demographic groups, up to 88% better than other sampling methods.”
Song, along with fellow graduate student Afrose, worked with specific data sets to conduct their experiments.
Song used the Surveillance, Epidemiology, and End Results (SEER) dataset for tasks on breast cancer and lung cancer survival, while Afrose, a fifth-year Ph.D. student, worked with a dataset from Beth Israel Deaconess Medical Center in Boston for tasks predicting in-hospital mortality and decompensation.
“We’re thrilled to have found a solution to reduce bias,” said Afrose, whose research focuses on machine learning in healthcare and software security. “Our DP bias correction technique will reduce potentially fatal prediction errors for minority populations.”
With these results published and freely available, the team looks forward to collaborating with other researchers to use these methods in the analysis of their own clinical data.
“Our method is easy to deploy on various machine learning models and could help improve the performance of all prognostic tasks with representational biases,” Song said.
Communications Medicine is dedicated to publishing high-quality research, reviews, and commentaries in all areas of clinical, translational, and public health research.
Source: Jenise L. Jacques, Virginia Tech