AI enables large-scale brain tumor studies without sharing patient data – ScienceDaily

Researchers from Penn Medicine and Intel Corporation led the largest-ever global machine learning effort to securely aggregate knowledge from brain scans of 6,314 glioblastoma (GBM) patients at 71 sites around the world and develop a model that can improve the identification and prediction of boundaries in three tumor subcompartments without compromising patient privacy. Their results were published today in Nature Communications.

“This is the largest and most diverse data set of glioblastoma patients ever considered in the literature and was made possible through federated learning,” said senior author Spyridon Bakas, PhD, assistant professor of pathology and laboratory medicine and radiology, at the Perelman School of Medicine at the University of Pennsylvania. “The more data we can feed into machine learning models, the more accurate they become, which in turn may improve our ability to more accurately understand, treat, and remove glioblastoma in patients.”

Researchers studying rare diseases such as GBM, an aggressive type of brain tumor, often have patient populations that are restricted to their own facility or geographic location. Due to privacy laws, such as the Health Insurance Portability and Accountability Act of 1996 (HIPAA) in the United States and the General Data Protection Regulation (GDPR) in Europe, sharing data between institutions without compromising patient privacy is a major impediment for many healthcare providers.

A more recent machine learning approach, dubbed federated learning, offers a solution to these hurdles by bringing the machine learning algorithm to the data, rather than following the current paradigm of centralizing the data where the algorithm runs. Federated learning, an approach first implemented by Google for keyboard autocorrect, trains a machine learning algorithm across multiple distributed devices or servers (in this case, institutions) that hold local samples of data without actually sharing them. It has already been shown to allow clinicians at institutions in different countries to collaborate on research without sharing private patient data.
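To make the idea concrete, here is a minimal sketch of one common federated scheme, weighted averaging of locally trained model weights. It is a toy illustration only and not the software used in the study: the one-parameter linear model, the `SITES` data, and every function name are hypothetical. What it shows is that each site trains on its own data and only the model weight is sent back for aggregation.

```python
from typing import List, Tuple

# Each site holds (x, y) pairs that never leave the site (hypothetical toy data).
Site = List[Tuple[float, float]]


def local_update(w: float, data: Site, lr: float = 0.01, epochs: int = 5) -> float:
    """Run a few gradient-descent steps for y = w * x on one site's private data."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_global: float, sites: List[Site]) -> float:
    """One round: every site trains locally; only the weights are aggregated."""
    total = sum(len(s) for s in sites)
    updates = [(local_update(w_global, s), len(s)) for s in sites]
    # Average the returned weights, weighted by each site's number of samples.
    return sum(w * n for w, n in updates) / total


def train(sites: List[Site], rounds: int = 50) -> float:
    """Repeat federated rounds starting from an untrained weight."""
    w = 0.0
    for _ in range(rounds):
        w = federated_round(w, sites)
    return w


# Three hypothetical sites whose data all follow y = 3x.
SITES = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]

if __name__ == "__main__":
    print(train(SITES))  # approaches 3.0 without pooling any raw data
```

The aggregation step sees only one number per site, which is the property that lets institutions collaborate without exchanging patient records.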

Bakas led this extensive collaborative study along with first authors Sarthak Pati, MS, a senior software engineer at Penn’s Center for Biomedical Image Computing & Analytics (CBICA), Ujjwal Baid, PhD, a postdoctoral fellow at CBICA, and Brandon Edwards, PhD, a research scientist at Intel Labs, as well as Micah Sheller, also a research scientist at Intel Labs.

“Data helps drive discovery, particularly in rare cancers where available data can be scarce. The federated approach we are outlining allows access to maximum data while reducing institutional burdens on data sharing,” said Jill Barnholtz-Sloan, PhD, an associate professor at Case Western Reserve University School of Medicine.

The model followed a tiered approach. The first stage, called the public initial model, was pre-trained with publicly available data from the International Brain Tumor Segmentation (BraTS) Challenge. The model was tasked with identifying the boundaries of three GBM tumor subcompartments: the “enhancing tumor” (ET), which represents the breakdown of the blood-brain barrier within the tumor; the “tumor core” (TC), which comprises the ET and the necrotic (tissue-killing) part and is the part of the tumor relevant to surgeons who remove it; and the “whole tumor” (WT), defined by the union of the TC and the infiltrated tissue, which is the entire area that would be treated with radiation.
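The three subcompartments are nested: ET lies inside TC, and TC lies inside WT. The short sketch below illustrates that relationship on a flat list of voxel labels. The label values (1 for necrotic tissue, 2 for infiltrated or edematous tissue, 4 for enhancing tumor) are an assumption borrowed from the convention used in past BraTS challenges, and the function name is hypothetical.

```python
from typing import List, Set, Tuple

# Assumed label values, following the convention of past BraTS challenges.
NECROTIC, INFILTRATED, ENHANCING = 1, 2, 4


def subcompartments(labels: List[int]) -> Tuple[Set[int], Set[int], Set[int]]:
    """Return the voxel indices belonging to ET, TC and WT."""
    et = {i for i, v in enumerate(labels) if v == ENHANCING}
    # Tumor core = enhancing tumor plus the necrotic part.
    tc = et | {i for i, v in enumerate(labels) if v == NECROTIC}
    # Whole tumor = tumor core plus the infiltrated tissue.
    wt = tc | {i for i, v in enumerate(labels) if v == INFILTRATED}
    return et, tc, wt


if __name__ == "__main__":
    # A hypothetical row of seven voxels; 0 means healthy tissue.
    et, tc, wt = subcompartments([0, 2, 1, 4, 4, 2, 0])
    print(sorted(et), sorted(tc), sorted(wt))
```

For the example row, ET covers voxels 3 and 4, TC adds voxel 2, and WT adds voxels 1 and 5.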

The data from 231 patient cases at 16 sites were used first, and the resulting model was validated against the local data at each site. The second stage, called the preliminary consensus model, used the public initial model and integrated more data from 2,471 patient cases at 35 sites, improving its accuracy. The final stage, the final consensus model, used the updated model and integrated the largest data set of 6,314 patient cases (3,914,680 images) at 71 sites on 6 continents to further optimize the model and test its generalizability to unseen data.

As a control for each stage, the researchers excluded 20 percent of the total cases contributed by each participating site from the model training process and used them as “local validation data.” This allowed them to assess the accuracy of the collaborative method. To further assess the generalizability of the models, six sites were not involved in any of the training stages, so that they represented a completely unseen out-of-sample data population of 590 cases. Notably, the American College of Radiology site validated its model using data from a national clinical trial.
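The per-site hold-out described above can be sketched as follows. This is a hedged illustration, not the study's actual procedure: the random shuffle, the fixed seed, and the `split_site` name are all assumptions. The point is that the 20 percent is set aside separately at every site, so each site keeps its own local validation cases.

```python
import random
from typing import Iterable, List, Tuple


def split_site(case_ids: Iterable[str], holdout_fraction: float = 0.2,
               seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split one site's cases into (training, local validation) lists."""
    shuffled = list(case_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_val = round(len(shuffled) * holdout_fraction)
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    # Hypothetical site with ten cases: eight for training, two held out.
    cases = [f"case_{i:02d}" for i in range(10)]
    train_cases, val_cases = split_site(cases)
    print(len(train_cases), len(val_cases))
```

Applying the function to each site in turn yields the local validation sets, while the sites held out entirely would simply never be passed to training at all.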

After model training, the final consensus model yielded significant performance improvements on the collaborators’ local validation data. The final consensus model had an improvement of 27% in ET boundary detection, 33% in TC boundary detection, and 16% in WT boundary detection. The improved result is a clear indication of the benefit of access to more cases, not only to improve the model but also to validate it.

Looking to the future, the authors hope that due to the generic methodology of federated learning, its applications in medical research can be far-reaching and can be applied not only to other types of cancer, but also to other diseases such as neurodegeneration and beyond. They also expect more research to show that federated learning can comply with security and privacy protocols around the world.

Funding for this research was provided by the National Institutes of Health (U01CA242871, R01NS042645, U24CA189523, U24CA215109, U01CA248226, P30CA51008, R50CA211270, UL1TR001433, R21EB030209, R37CA214955, R01CA233888, U10CA21661, U10CA37422, U10CA180820, U10CA180794, U01CA176110, R01CA082500, CA079778, CA080098, CA180794, CA180820, CA180822, CA180868) and the National Science Foundation (2040532, 2040462).

Intel Corporation provided software engineering staff and privacy expertise to the project during the development of the software used.
