The mission of the NIH Big Data to Knowledge (BD2K) initiative is to enable biomedical scientists to capitalize more fully on the Big Data being generated by their research communities. With advances in technology, these investigators are increasingly generating and using large, complex, and diverse datasets. Consequently, the biomedical research enterprise is becoming increasingly data-intensive and data-driven. However, the ability of researchers to locate, analyze, and use Big Data (and, more generally, all biomedical and behavioral data) is often limited for reasons related to access to relevant software and tools, expertise, and other factors. BD2K aims to develop the new approaches, standards, methods, tools, software, and competencies that will enhance the use of biomedical Big Data by supporting research, implementation, and training in data science and other relevant fields that will lead to:
- Appropriate access to shareable biomedical data through technologies, approaches, and policies that enable and facilitate widespread data sharing, discoverability, management, curation, and meaningful re-use;
- Development of and access to appropriate algorithms, methods, software, and tools for all aspects of the use of Big Data, including data processing, storage, analysis, integration, and visualization;
- Appropriate protections for privacy and intellectual property;
- Development of a sufficient cadre of researchers skilled in the science of Big Data, in addition to elevating general competencies in data usage and analysis across the biomedical research workforce.
Overall, the focus of the BD2K initiative is the development of innovative and transformative approaches and tools for making Big Data and data science a more prominent component of biomedical research.
The term 'Big Data' is meant to capture the opportunities and challenges facing all biomedical researchers in accessing, managing, analyzing, and integrating datasets of diverse data types [e.g., imaging, phenotypic, molecular (including various '–omics'), exposure, health, behavioral, and the many other types of biological, biomedical, and behavioral data] that are increasingly large, diverse, and complex, and that exceed the abilities of currently used approaches to manage and analyze effectively. Big Data emanate from three sources: (1) a small number of groups that produce very large amounts of data, usually as part of projects specifically funded to produce important resources for use by the entire research community; (2) individual investigators who produce large datasets, often empowered by the use of readily available new technologies; and (3) an even greater number of sources that each produce small datasets (e.g., research data or clinical data in electronic health records) whose value can be amplified by aggregating or integrating them with other data.
- Locating data and software tools.
- Getting access to the data and software tools.
- Standardizing data and metadata.
- Extending policies and practices for data and software sharing.
- Organizing, managing, and processing biomedical Big Data.
- Developing new methods for analyzing and integrating biomedical data.
- Training researchers who can use biomedical Big Data effectively.
- New policies that better encourage data and software sharing
- A catalog of research datasets that will enable researchers to find and cite datasets
- Community-based data and metadata standards
Analysis Methods and Software:
- Development and hardening of software to meet needs of the biomedical research community
- Access to large-scale computing to enable analysis of Big Data
- Dynamic community engagement of users and developers
- Increase the number of computationally and quantitatively skilled biomedical trainees
- Strengthen the computational and quantitative skills of all biomedical researchers
- Make training available to NIH staff to enhance NIH review and program oversight
Centers of Excellence:
- Investigator-initiated centers
- NIH-specified centers
Vivien Bonazzi (NHGRI)
Lisa Brooks (NHGRI)
Jennifer Couch (NCI)
Allen Dearry (NIEHS)
Leslie Derr (OD)
Michelle Dunn (NCI)
Maria Giovanni (NIAID)
Susan Gregurick (NIGMS)
Mark Guyer (NHGRI)
Lynda Hardy (NINR)
Mike Huerta (NLM)
Jennie Larkin (NHLBI)
Peter Lyster (NIGMS)
Ronald Margolis (NIDDK)
Ajay Pillai (NHGRI)
Belinda Seto (NIBIB)
Christopher Wellington (NHGRI)
Eric Green (Acting ADDS & NHGRI)
James Anderson (DPCPSI)
Michael Gottesman (OIR)
Kathy Hudson (OD)
Betsy Humphreys (NLM)
Alan Koretsky (NINDS)
Michael Lauer (NHLBI)
Jon Lorsch (NIGMS)
Douglas Lowy (NCI)
John J. McGowan (NIAID)
Andrea Norris (CIT)
Sally Rockey (OER)
Belinda Seto (NIBIB)
Acting Executive Secretary:
Allison Mandich (NHGRI)
- Input on Information Resources for Data-Related Standards Widely Used in Biomedical Science (RFI)
Response Date: September 30, 2014
- Predoctoral Training in Biomedical Big Data Science (T32)
- Revisions to Add Biomedical Big Data Training to Active Institutional Training Grants (T32)