Data ecosystem to be built for National Institute of Allergy and Infectious Diseases will give researchers a secure way to collaborate on research and medical advances

LA JOLLA, CA—Scientists studying COVID-19 and other diseases generate data on an unprecedented scale. Enabling the global scientific community to discover and use these data poses real technical challenges.

A new contract awarded to Scripps Research and bioinformatics firm Seven Bridges envisions a future federated data ecosystem designed to allow researchers to securely find, access and analyze research data to speed development of diagnostics, therapeutics and vaccines.

The $7.5 million NIAID Data Ecosystem (NDE) contract will enable the establishment of a secure system for sharing and analyzing datasets among scientists who work with the National Institute of Allergy and Infectious Diseases. It’s an ambitious but vitally important project, say the Scripps Research principal investigators, all pioneers in science’s open data movement.

Laura Hughes, PhD, senior staff scientist, is the Scripps Research project lead and liaison to the scientific community. Associate professor Chunlei Wu, PhD, will address metadata pipelines, while Andrew Su, PhD, professor and director of bioinformatics for the Scripps Research Translational Institute, will focus on connecting dataset discovery with analysis.

The NDE will handle data from research on a variety of infectious and immune-mediated diseases, including COVID-19, Ebola, influenza, Zika, HIV, asthma, autoimmune disease, food allergies and organ transplantation. Researchers working with NIAID produce many petabytes of information, such as microscope imagery, clinical trial records, drug screening results and viral variant genotypes. The ability to discover data from related studies, even just to know what data already exists at the beginning of a research project, will become invaluable to the scientific community, Hughes predicts.

Pandemic drives collaboration demand

“One of the things that has been abundantly clear during the COVID-19 pandemic is how willing the world’s researchers are to work together,” Hughes says. “The tricky thing is, their data are scattered all over the world, in many formats, from many providers.”

The Scripps Research team has already dived into the pandemic’s scattered-data problem and produced widely used solutions. With the creation of outbreak.info, they integrated epidemiology data with coronavirus variant sequences provided by thousands of international sources through the open data platform GISAID. The team produced visualization tools that show at a glance, in near real time, which variants are circulating in a given region.

More than 3.1 million SARS-CoV-2 sequences—and counting—can be analyzed through the site. Outbreak.info also tracks other datasets, enabling researchers to keep tabs on the ever-changing understanding of the virus and COVID-19 disease.

“Every day there are 10,000 to 15,000 new sequences added,” Hughes says. “COVID-19 highlighted the importance of being able to rapidly find information to foster analyses and collaborations, but the need existed long before the pandemic. The demand is really general to all of biomedical science.”

A federation for discovery

Making the scientific community’s data more readily usable has the potential to accelerate discovery in important ways, says Seven Bridges’ Jack DiGiovanna, PhD, senior vice president and program director.

“Today, over 80% of relevant data is not accessible to the research community,” DiGiovanna says. “Together we will build a federated data ecosystem that enables researchers to discover available datasets, search within those data and then seamlessly analyze cohorts of information in a secure, collaborative workspace environment.”

The planned NDE will involve two core elements. The Scripps Research team will build the discovery engine capable of finding datasets across NIAID-relevant data repositories. The Seven Bridges team will provide a secure analysis workspace, where authorized researchers will be able to access, query, analyze, and collaborate on data.

Participating research groups will continue to own and maintain their respective datasets. When they wish to share the information, the interface will allow them to make it available to authorized researchers.

“As much as possible, we wanted to leverage the work people have already done. So, this is to be a gateway to data resources, as opposed to us maintaining and updating the data,” Hughes says.

“The success of this project will not be in the number of datasets we collect but in how we impact researchers,” Hughes adds. “We are developing tools for the entire scientific community.”

The NIAID Data Ecosystem project is funded in whole or in part with Federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, ID/IQ Agreement Number 17X146F3 under contract number 75N91019D00024.