The ability to harvest the wealth of information contained in biomedical Big Data will advance our understanding of human health and disease; however, lack of appropriate tools, poor data accessibility, and insufficient training are major impediments to rapid translational impact. To meet this challenge, the National Institutes of Health (NIH) launched the Big Data to Knowledge (BD2K) initiative in 2012.
BD2K is a trans-NIH initiative established to enable biomedical research as a digital research enterprise, to facilitate discovery and support new knowledge, and to maximize community engagement.
The BD2K initiative addresses four major aims that, in combination, are meant to enhance the utility of biomedical Big Data:
- To facilitate broad use of biomedical digital assets by making them discoverable, accessible, and citable.
- To conduct research and develop the methods, software, and tools needed to analyze biomedical Big Data.
- To enhance training in the development and use of methods and tools necessary for biomedical Big Data science.
- To support a data ecosystem that accelerates discovery as part of a digital enterprise.
Overall, the focus of the BD2K initiative is the development of innovative and transformative approaches and tools that make Big Data and data science a more prominent component of biomedical research.
The term 'Big Data' is meant to capture the opportunities and challenges facing all biomedical researchers in accessing, managing, analyzing, and integrating datasets of diverse data types [e.g., imaging, phenotypic, molecular (including various '–omics'), exposure, health, behavioral, and the many other types of biological, biomedical, and behavioral data] that are increasingly large, diverse, and complex, and that exceed the capacity of currently used approaches to manage and analyze them effectively. Big Data emanate from three sources: (1) a small number of groups that produce very large amounts of data, usually as part of projects specifically funded to produce important resources for use by the entire research community; (2) individual investigators who produce large datasets, often empowered by the use of readily available new technologies; and (3) an even greater number of sources that each produce small datasets (e.g., research data or clinical data in electronic health records) whose value can be amplified by aggregating or integrating them with other data.
Challenges in using biomedical Big Data include:
- Locating data and software tools.
- Getting access to the data and software tools.
- Standardizing data and metadata.
- Extending policies and practices for data and software sharing.
- Organizing, managing, and processing biomedical Big Data.
- Developing new methods for analyzing and integrating biomedical data.
- Training researchers who can use biomedical Big Data effectively.
- New policies that better encourage data and software sharing
- A catalog of research datasets that will enable researchers to find and cite datasets
- Community-based data and metadata standards
Analysis Methods and Software:
- Development and hardening of software to meet needs of the biomedical research community
- Access to large-scale computing to enable data analysis on Big Data
- Dynamic community engagement of users and developers
- Increase the number of computationally and quantitatively skilled biomedical trainees
- Strengthen the computational and quantitative skills of all biomedical researchers
- Make training available to NIH staff to enhance NIH review and program oversight
Centers of Excellence:
- Investigator-initiated centers
- NIH-specified centers
Giroux, Craig (CSR)
Collier, Elaine (NCATS)
Edwards, Emmeline (NCCAM)
Couch, Jennifer (NCI)
Seto, Belinda (NEI)
Brooks, Lisa (NHGRI)
Gan, Weiniu (NHLBI)
Petanceska, Suzana (NIA)
Grant, Bridget (NIAAA)
Giovanni, Maria (NIAID)
Pai, Vinay (NIBIB)
Bures, Regina (NICHD)
Radman, Thomas (NIDA)
Miller, Roger (NIDCD)
Scholnick, Steven (NIDCR)
Margolis, Ronald (NIDDK)
Dearry, Allen (NIEHS)
Lyster, Peter (NIGMS)
Farber, Greg (NIMH)
Hunter, Joyce (NIMHD)
Gnadt, Jim (NINDS)
Hardy, Lynda (NINR)
Huerta, Mike (NLM)
Carr, Sarah (OD - OSP)
Derr, Leslie (OD - OSC)
Mabry, Patty (OD - ODP)
Phil Bourne (ADDS)
James Anderson (DPCPSI)
Michael Gottesman (OIR)
Kathy Hudson (OD)
Betsy Humphreys (NLM)
Alan Koretsky (NINDS)
Michael Lauer (NHLBI)
Jon Lorsch (NIGMS)
Douglas Lowy (NCI)
John J. McGowan (NIAID)
Andrea Norris (CIT)
Sally Rockey (OER)
Belinda Seto (NEI)
Acting Executive Secretary:
Allison Mandich (NHGRI)
- Mentored Career Development Award in Biomedical Big Data Science for Clinicians and Doctorally Prepared Scientists (K01)
Due Date: April 1, 2015
- Courses for Skills Development in Biomedical Big Data Science (R25)
Due Date: April 1, 2015
- Open Educational Resources for Biomedical Big Data (R25)
Due Date: April 1, 2015
- Revisions to Add Biomedical Big Data Training to Active NLM Institutional Training Grants in Biomedical Informatics (T15)
Due Date: July 27, 2015