NIH SCIENTIFIC DATA SHARING
By: Louis Grue
The National Institutes of Health (NIH) issued the final NIH Policy for Data Management and Sharing (DMS Policy) to promote the management and sharing of scientific data generated from NIH-funded or conducted research. The Policy establishes the requirements for submission of Data Management and Sharing Plans and compliance with NIH Institute, Center, or Office (ICO)-approved Plans. It also emphasizes the importance of good data management practices and establishes the expectation for maximizing the appropriate sharing of scientific data generated from NIH-funded or conducted research, with justified limitations or exceptions. The Policy applies to research funded or conducted by NIH that results in the generation of scientific data.
As the DMS Policy is released, the world is amid the COVID-19 pandemic. The recognition that more open data sharing can lead to faster advances and treatments has led to an unprecedented worldwide effort to openly share publications and data related to both SARS-CoV-2 (the novel coronavirus that causes COVID-19) and coronaviruses more generally. While this is a specific example of an urgent public health need, patients, families, and patient advocacy groups consider the diseases and conditions that affect them to be of equal urgency, as do those who research these diseases and conditions and treat affected patients. With public input, NIH has worked to develop and refine this DMS Policy, the goal of which is to increase the sharing of scientific data generated from NIH-funded research to ultimately enhance health, lengthen life, and reduce illness and disability.
The NIH looks forward to working with applicants and the funded community as they prepare to meet the DMS Policy’s requirements and expectations, as we all move toward a future in which data sharing is a community norm.
Data Management & Sharing Policy Overview
The NIH has issued the DMS Policy to promote the sharing of scientific data. Sharing scientific data accelerates biomedical research discovery, in part, by enabling validation of research results, providing accessibility to high-value datasets, and promoting data reuse for future research studies.
Under the DMS Policy, the NIH expects investigators and institutions to:
• Plan and budget for the managing and sharing of data
• Submit a DMS plan for review when applying for funding
• Comply with the approved DMS plan
Research Covered Under the 2023 Data Management & Sharing Policy
The NIH DMS Policy applies to all research funded or conducted, in whole or in part, by NIH that results in the generation of scientific data.
This includes all NIH-supported research regardless of funding level, including:
• Extramural (grants)
• Extramural (contracts)
• Intramural research projects
• Other funding agreements
The DMS Policy does not apply to research and other activities that do not generate scientific data, for example: training, infrastructure development, and non-research activities.
Scientific Data is defined as data commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings, regardless of whether the data are used to support scholarly publication.
Scientific data includes any data needed to validate and replicate research findings.
Scientific data does not include laboratory notebooks, preliminary analyses, completed case report forms, drafts of scientific papers, plans for future research, peer reviews, communications with colleagues, or physical objects such as laboratory specimens.
Foreign Collaboration
Policies related to data sharing vary across countries. Investigators from foreign institutions and U.S. investigators collecting data in other countries should familiarize themselves with the policies governing data sharing in the countries in which they plan to work and address any specific limitations in the plan in their application.
Considerations for Proprietary Data
The NIH understands that some scientific data generated with NIH funds may be proprietary. Under the Small Business Innovation Research (SBIR) and Small Business Technology Transfer (STTR) Program Policy Directive, effective May 2, 2019, SBIR and STTR awardees may withhold applicable data for 20 years after the award date, as stipulated in the specific SBIR/STTR funding agreement and consistent with achieving program goals. SBIR and STTR awardees are expected to submit a Data Management & Sharing Plan per DMS Policy requirements.
Issues related to proprietary data also can arise when co-funding is provided by the private sector (for example, the pharmaceutical or biotechnology industries). NIH recognizes that the extent of data sharing may be limited by restrictions imposed by licensing limitations attached to materials needed to conduct the research. Applicants should discuss projects with proposed collaborators early to avoid agreements that prohibit or unnecessarily restrict data sharing. NIH staff will evaluate the justifications of investigators who believe that they are unable to share data.
Data Management
“Proper data management is crucial for maintaining scientific rigor and research integrity.”
Data management is the process of validating, organizing, protecting, maintaining, and processing scientific data to ensure the accessibility, reliability, and quality of the data for its users. Proper data management helps maintain scientific rigor and research integrity. Keeping good track of data and associated documentation lets researchers and collaborators use data consistently and accurately. Carefully storing and documenting data also allows more people to use the data in the future, potentially leading to more discoveries beyond the initial research.
The NIH emphasizes the importance of good data management practices and encourages data management to be reflective of practices within specific research communities.
FAIR Principles
The NIH encourages data management and sharing practices to be consistent with the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. These principles make it easier for computers to process and analyze datasets, which is important when reusing or repurposing datasets for secondary research.
Length of Time to Maintain Data
Per Section 8.4.2 of the NIH Grants Policy Statement, grantee institutions are required to keep the data for 3 years following closeout of a grant or contract agreement. Contracts may specify different time periods.
Please note that the grantee institution may have additional policies and procedures regarding the custody, distribution, and required retention period for data produced under research awards.
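As a rough illustration of the 3-year retention period described above, the following Python sketch computes the earliest date on which that minimum period would end. It assumes that "3 years following closeout" means the same calendar date three years later; the function name and the handling of a February 29 closeout are hypothetical choices, and contract terms or institutional policies may require a longer period.

```python
from datetime import date

# Minimum retention period stated in the text above.
RETENTION_YEARS = 3


def earliest_disposal_date(closeout: date, years: int = RETENTION_YEARS) -> date:
    """Return the calendar date `years` after closeout (hypothetical helper)."""
    try:
        return closeout.replace(year=closeout.year + years)
    except ValueError:
        # A February 29 closeout has no matching day in a non-leap year.
        # Rolling forward to March 1 is an assumption made for this sketch.
        return date(closeout.year + years, 3, 1)


if __name__ == "__main__":
    print(earliest_disposal_date(date(2024, 6, 30)))
```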
Metadata and Other Associated Documentation
Metadata Definition: Data that provide additional information intended to make scientific data interpretable and reusable (e.g., date, independent sample and variable construction and description, methodology, data provenance, data transformations, any intermediate or descriptive observational variables).
• Metadata and other documentation associated with a dataset allow users to understand how the data were collected and how to interpret the data. Importantly, this ensures that others can use the dataset and prevents misuse, misinterpretation, and confusion.
• The exact metadata or other associated documentation will vary by scientific area, study design, the type of data collected, and characteristics of the dataset. Such documentation could include:
• Methodology and procedures used to collect the data
• Data labels
• Definitions of variables
• Any other information necessary to reproduce and understand the data
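For illustration only, the kinds of documentation listed above could be captured in a machine-readable record stored alongside a dataset. The Python sketch below is not an NIH-prescribed format: the field names, the example values, and the `missing_fields` helper are all hypothetical.

```python
import json

# Hypothetical field names mirroring the documentation types listed above;
# NIH does not prescribe this structure.
REQUIRED_FIELDS = ("methodology", "data_labels", "variable_definitions", "other_notes")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in a metadata record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Illustrative values only; they do not describe a real study.
example_record = {
    "methodology": "Survey administered at two time points",
    "data_labels": {"grp": "Study group", "t1": "Baseline visit"},
    "variable_definitions": {"age_years": "Participant age in whole years at baseline"},
    "other_notes": "Values of -9 denote missing responses",
}

if __name__ == "__main__":
    print(json.dumps(example_record, indent=2))
    print("Missing fields:", missing_fields(example_record))
```

A check like this could be run before a dataset is deposited, so that gaps in the documentation are noticed early.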
Naming Conventions
Within a project team, agreement on naming conventions for multiple objects or files—or multiple versions of files—could be useful before embarking on a project that generates large amounts of data that need names or unique identifiers.
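As one possible sketch of such an agreement, the Python snippet below builds and checks file names against a single pattern. The convention itself (`<project>_<description>_<YYYYMMDD>_v<NN>.<ext>`) and both function names are hypothetical; a project team would substitute whatever scheme it agrees on.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<description>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9-]+_\d{8}_v\d{2}\.[a-z0-9]+$")


def build_name(project: str, description: str, day: date, version: int, ext: str) -> str:
    """Build a file name following the assumed team convention."""
    # Lowercase the description and collapse anything non-alphanumeric to hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{project.lower()}_{slug}_{day:%Y%m%d}_v{version:02d}.{ext.lower()}"


def follows_convention(filename: str) -> bool:
    """Check whether a file name matches the assumed convention."""
    return NAME_PATTERN.fullmatch(filename) is not None


if __name__ == "__main__":
    name = build_name("STUDY1", "Baseline Survey", date(2023, 1, 25), 3, "csv")
    print(name, follows_convention(name))
```

Putting the date and a zero-padded version number in a fixed position means that files sort in a predictable order in an ordinary directory listing.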
Common Data Elements
Common data elements (CDEs) are pieces of data common to multiple datasets across different studies. NIH encourages researchers to use CDEs, which helps improve accuracy, consistency, and interoperability among datasets within various areas of health and disease research. The NIH maintains a repository of NIH CDEs.
Data Storage Format
There are many storage formats for different types and sizes of datasets. For instance, small and simple datasets can be managed in a spreadsheet program. More complicated or larger datasets may need to be managed in a database. Remember that some types of data storage incur costs, which may be part of the project budget.
Data Security
Maintaining multiple copies of data can help protect against unforeseen events. Similarly, version control can help maintain the integrity of data.
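When multiple copies are kept, a team might want to confirm that a backup still matches the original. The Python sketch below compares SHA-256 checksums using only the standard library. Checksum comparison is an illustrative technique chosen here, not something the policy prescribes, and the file names and helper functions are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original: Path, backup: Path) -> bool:
    """Return True when both files have identical contents."""
    return sha256_of(original) == sha256_of(backup)


if __name__ == "__main__":
    # Demonstration with throwaway files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "dataset.csv"
        backup = Path(tmp) / "dataset_backup.csv"
        original.write_text("id,value\n1,0.5\n")
        shutil.copyfile(original, backup)
        print("Backup matches:", copies_match(original, backup))
```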
Data Preservation and Sharing Timelines
Shared scientific data should be made accessible as soon as possible, and no later than the time of an associated publication, or the end of the performance period, whichever comes first. Researchers are encouraged to consider relevant requirements and expectations (e.g., data repository policies, award record retention requirements, journal policies) as guidance for the minimum time frame that scientific data should be made available, which researchers may extend.
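The "whichever comes first" rule above can be sketched as a small date comparison. In the Python snippet below, the function name and the example dates are hypothetical, and the result is only the latest acceptable date; the policy still asks for sharing as soon as possible.

```python
from datetime import date
from typing import Optional


def latest_sharing_date(publication: Optional[date], performance_end: date) -> date:
    """Return the latest date by which data should be accessible.

    This is the earlier of the associated publication date and the end of
    the performance period. If no publication is planned yet, only the end
    of the performance period applies.
    """
    if publication is None:
        return performance_end
    return min(publication, performance_end)


if __name__ == "__main__":
    # Hypothetical dates for illustration.
    print(latest_sharing_date(date(2024, 6, 1), date(2025, 3, 31)))
```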
Methods for Sharing Scientific Data
Under the 2023 Data Management and Sharing (DMS) policy, NIH encourages investigators to use an established repository. When selecting a repository, investigators should choose based on factors such as the sensitivity of the data, the size and complexity of the dataset, and the volume of requests anticipated.
Sharing Data from Human Participants
For research involving human participants, NIH has specific requirements for research staff, and policies regarding research conduct, safety monitoring, and reporting of information about research progress. Applicants need to follow all applicable federal, Tribal, state, and local laws, regulations, statutes, guidance, and institutional policies that govern research involving human participants and the sharing and use of scientific data derived from human participants. The NIH also respects Tribal sovereignty, even in the absence of written Tribal laws or policies.
The DMS Policy is consistent with federal regulations for the protection of human research participants and other NIH expectations for the use and sharing of scientific data derived from human participants.
Data Management and Sharing Plans
Researchers planning to generate scientific data are required to submit a Plan to the funding NIH ICO as part of the Budget Justification section of the application for extramural awards, as part of the technical evaluation for contracts, as determined by the Intramural Research Program for Intramural Research Projects consistent with the objectives of this Policy, or prior to release of funds for other funding agreements. Plans should explain how scientific data generated by research projects will be managed and which of these scientific data and accompanying metadata will be shared. If Plan revisions are necessary (e.g., new scientific direction, a different data repository, or a timeline revision), Plans should be updated by researchers and reviewed by the NIH ICO during regular reporting intervals or sooner. Plans from NIH-funded or conducted research may be made publicly available and should not include proprietary or private information.[7]
Award recipients must comply with any applicable laws, regulations, statutes, guidance, or institutional policies related to research with human participants and that protect participants’ privacy. The DMS Policy encourages respect for participants by encouraging researchers and award recipients to:
• Address data management and sharing plans during the informed consent process to ensure prospective participants understand how their data will be managed and shared;
October 2022 Edition
• Outline steps they will take for protecting the privacy, rights, and confidentiality of prospective participants (i.e., through de-identification, Certificates of Confidentiality, and other protective measures);
• Assess limitations on subsequent use of data and communicate these limitations to the individuals or entities (e.g., repositories) preserving and sharing the data; and
• Consider whether access to shared scientific data derived from humans should be controlled, even if de-identified and lacking explicit limitations on subsequent use. Sharing via controlled access may be specified by certain funding opportunity announcements (FOAs) or the funding NIH Institutes or Centers.
NIH strongly encourages investigators to plan for how data management and sharing will be addressed in the informed consent process. Investigators should communicate with prospective participants about how their scientific data are expected to be used and shared. Investigators should also consider whether scientific data derived from humans, even if de-identified and lacking explicit limitations on subsequent use, should be controlled.
In addition, NIH expects that in drafting their DMS plans, researchers will attempt to maximize scientific data sharing, but may acknowledge that certain factors (i.e., ethical, legal, or technical) may necessitate limiting sharing to some extent. Foreseeable limitations should be described when drafting DMS plans. As outlined in NIH Guide Notice Supplemental Policy Information: Elements of an NIH Data Management and Sharing Plan, a compelling rationale for limiting scientific data sharing should be provided and will be assessed by NIH.
Examples of reasons that would generally not be justifiable factors limiting scientific data sharing include:
• Data are considered to be too small
• Data that researchers anticipate will not be widely used
• Data are not thought to have a suitable repository
The NIH respects and recognizes Tribal sovereignty and American Indian and Alaska Native (AI/AN) communities’ data sharing concerns, and NIH has proposed additional considerations when working with Tribes and AI/AN communities.
Plan Assessment: The NIH ICO will assess the Plan through the following processes:
• Extramural Awards: Plans will undergo programmatic assessment by NIH as determined by the proposed NIH ICO. NIH encourages potential awardees to work with NIH staff to address any potential questions regarding Plan development prior to submission.
• Contracts: Plans will be included as part of the technical evaluation performed by NIH staff.
• Intramural Research Projects: Plans will be assessed in a manner determined to be appropriate by the Intramural Research Program.
• Other funding agreements: Plans will be assessed in the context of other funding agreement mechanisms (e.g., Other Transactions).
Managing and Sharing Scientific Data
The NIH expects that in drafting Plans, researchers will maximize the appropriate sharing of scientific data, acknowledging certain factors (i.e., legal, ethical, or technical) that may affect the extent to which scientific data are preserved and shared. Any potential limitations on subsequent data use should be communicated to individuals or entities (e.g., data repository managers) that will preserve and share the scientific data. The NIH ICO will assess whether Plans appropriately consider and describe these factors.
Compliance and Enforcement
During the Funding or Support Period
During the funding period, compliance with the Plan will be determined by the NIH ICO. Compliance with the Plan, including any Plan updates, may be reviewed during regular reporting intervals (e.g., at the time of annual Research Performance Progress Reports (RPPRs)).
• Extramural Awards: The Plan will become a Term and Condition of the Notice of Award. Failure to comply with the Terms and Conditions may result in an enforcement action, including additional special terms and conditions or termination of the award, and may affect future funding decisions.
• Contracts: The Plan will become a Term and Condition of the Award, and compliance with and enforcement of the Plan will be consistent with the award and the Federal Acquisition Regulations, as applicable.
• Intramural Research Projects: Compliance with and enforcement of the Plan will be consistent with applicable NIH policies established by the NIH Office of Intramural Research and the NIH ICO.
• Other funding agreements: Compliance with and enforcement of the Plan will be consistent with applicable NIH policies.
Post Funding or Support Period
After the end of the funding period, non-compliance with the NIH ICO-approved Plan may be taken into account by NIH for future funding decisions for the recipient institution (e.g., as authorized in the NIH Grants Policy Statement, Section 8.5, Special Award Conditions and Remedies for Noncompliance (Special Award Conditions and Enforcement Actions)).
Repositories for Sharing Scientific Data
In general, NIH does not endorse or require sharing data in any particular repository, although some initiatives and funding opportunities will have individual requirements. Overall, NIH encourages researchers to select the repository that is most appropriate for their data type and discipline. See Selecting a Data Repository.
Browse through this listing of NIH-supported repositories to learn more about some places to share scientific data. Note that this list is not exhaustive. Select the link provided in the “Data Submission Policy” column to find data submission instructions for each repository.
NIH-supported Scientific Data Repositories*
If you have any questions, Frequently Asked Questions may help.
Policy Effective Date
The effective date for the DMS Policy is January 25, 2023. Specifically, the policy applies to:
• Competing grant applications that are submitted to NIH for January 25, 2023 and subsequent receipt dates.
• Proposals for contracts that are submitted to NIH on or after January 25, 2023.
• NIH Intramural Research Projects conducted on or after January 25, 2023.
• Other funding agreements (e.g., Other Transactions) that are executed on or after January 25, 2023, unless otherwise stipulated by NIH.
The NIH 2023 Data Management and Sharing Policy (Replaces the 2003 NIH Data Sharing Policy)