Edited by: Maaike M. H. Van Swieten, Netherlands Comprehensive Cancer Organisation (IKNL), Netherlands
Reviewed by: Leonardo Candela, National Research Council (CNR), Italy
Alexandre Rosa Franco, Nathan Kline Institute for Psychiatric Research, United States
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Neuroscience has made significant strides over the past decade in moving from a largely closed science characterized by anemic data sharing to a largely open science where the amount of publicly available neuroscience data has increased dramatically. While this increase is driven in significant part by large prospective data sharing studies, we are starting to see increased sharing in the long tail of neuroscience data, driven no doubt by journal requirements and funder mandates. Concomitant with this shift to open is the increasing support of the FAIR data principles by neuroscience practices and infrastructure. FAIR is particularly critical for neuroscience with its multiplicity of data types, scales and model systems and the infrastructure that serves them. As envisioned from the early days of neuroinformatics, neuroscience is currently served by a globally distributed ecosystem of neuroscience-centric data repositories, largely specialized around data types. Making neuroscience data findable, accessible, interoperable, and reusable requires coordination across different stakeholders, including the researchers who produce the data, the data repositories that make it available, the aggregators and indexers who field search engines across the data, and community organizations who help to coordinate efforts and develop the community standards critical to FAIR. The International Neuroinformatics Coordinating Facility has led efforts to move neuroscience toward FAIR, fielding several resources to help researchers and repositories achieve FAIR. In this perspective, I provide an overview of the components and practices required to achieve FAIR in neuroscience and provide thoughts on the past, present and future of FAIR infrastructure for neuroscience, from the laboratory to the search engine.
The transformation of neuroscience from a closed to an open science, where the entirety of research products like data and code produced during a study are routinely made available, has accelerated in recent years. Data sharing requires that the necessary human and technical infrastructure be in place to make these data broadly available. The first Human Brain Project, funded by the US National Institute of Mental Health in the 1990s, launched some of the first efforts to “database the brain,” envisioning a “paradigm shift in which primary data are openly shared with the worldwide neuroscience community” (
State of population of selected data repositories 2014 vs. 2023.
Resource name | Country / region | Type of data | Date started | Data elements 2014 | Update to resource (Feb 2023) | Data elements 2023 | Datasets added since 2014 | Provenance |
---|---|---|---|---|---|---|---|---|
NDAR | USA | Demographics, imaging, genetic, phenotypic | 2009 (oldest news archives) | >108,000 subjects (from 157 labs) | Now NDA; no longer restricted to autism | – | – | Not comparable as new data types were added |
NeuroMorpho.Org | USA | digitally reconstructed neurons | 2006 | 11,335 (reconstructions from 1,339 publications) | Still in existence under same stewardship | 298,387 reconstructions | 287,052 reconstructions | |
Cell Centered Database/CIL-Cell Image Library | USA | images, videos, and animations of cell | 2002 | 10,360 image datasets | Still in existence under same stewardship | 13,990 | 3,630 | |
FigShare | International | Various | – | > 8,000 | Still in existence under same stewardship | 182,542 | 174,542 | query: neuroscience with dataset filter |
ModelDB | USA | computational neuroscience models | 1996 | 875 available datasets | Same stewardship; transition of leadership | 1787 | 912 | |
Open Source Brain | United Kingdom | Models | 2014 | 47 available datasets | Still in existence under same stewardship | 99 | 52 | |
CRCNS | USA | computational neuroscience | 2008 | 38 available datasets | Under same stewardship; not clear if still active | 140 | 102 | documented through NIF; Feb 2023 |
XNAT Central | USA | Neuroimaging | 2010 | 34 available datasets | Will be decommissioned in Oct 2023 | 510 | 300 | |
1,000 Functional Connectomes Project/INDI | International (USA, China, Germany, Spain) | fMRI, DTI, MPRAGE, psychological assessments, behavioral phenotype, demographic | 2009 | 28 datasets | Under same stewardship; also 1,000 Functional Connectomes INDI | 33 | 5 | |
OpenfMRI | USA | fMRI | 2012 | 24 datasets | Under same stewardship; changed name to OpenNeuro | 805 | 781 | |
BIRN | USA | Imaging, histology | – | 21 datasets | No longer in service | – | | |
LONI Image Data Archive | USA | Imaging | – | 18 (atlas), 9 databases | Under same stewardship; changed location; hard to compare as atlases and databases are not provided | 144 | 135 | |
BrainLiner | Japan | ECoG, EEG, fMRI, MEG, Microelectrode, NIRS, Optical Imaging, PET, Other | 2011 | 10 available datasets | Platform there but does not look like it has been updated recently | 23 | 13 | |
Open Connectome Project | USA | Serial electron Microscopy | 2011 | 9 available datasets | Now NeuroData | 24 | 15 | |
CARMEN | United Kingdom | neurophysiology | 2006 | – | No longer in service according to NIF | – | – | |
FITBIR | USA | Common data elements | 2011 | – | Same stewardship | – | – | |
INCF Dataspace | International | Various | 2012 | – | No longer in service | – | – | |
UCSF DataShare | USA | biomedical including neuroimaging, MRI, cognitive impairment, dementia, aging | 2011 | 18 datasets | No longer in service | – | – | |
Update of Supplementary Table 1 from
Neuroscience started to put its first big stake in the ground for open data sharing with the commissioning of large prospective data sharing efforts where large, comprehensive data sets were collected by large teams of scientists with the goal of making them openly available. Some of the early efforts include the Alzheimer’s Disease Neuroimaging Initiative (ADNI;
An updated analysis of the repositories listed in
Effective data sharing starts with the FAIR data principles (
FAIR states the minimum set of requirements for digital data for it to be useful: data should be findable, accessible, interoperable, and reusable. FAIR then lays out a set of practices that would make it more likely that data will meet these requirements. The FAIR data principles were formulated in a workshop in Leiden in 2014 (
The FAIR acronym itself is now likely better known among practicing neuroscientists, as funders and journals have started to support FAIR in their data sharing policies; but the details of FAIR as elaborated in the detailed recommendations are fairly arcane. Anyone outside the field of informatics is likely to look at these and scratch their head. Persistent identifiers? Knowledge representation languages? A plurality of relevant attributes? Thus, while the practicing neuroscientist may understand what FAIR stands for, they are often at a loss to explain exactly how to achieve it. In reality, no one can create fully FAIR data alone; it requires the interplay of data acquisition and documentation practices, infrastructure, informatics, and community consensus. FAIR is therefore best thought of as a partnership between investigators, data repositories, data aggregators and community organizations (
Major stakeholders involved in defining and implementing FAIR. Some of the major requirements for achieving FAIR are listed under each stakeholder group. The INCF is given as an example of a community organization supporting FAIR for neuroscience.
In the US National Academies of Science, Engineering and Medicine workshop on “Changing the Culture on Data Management and Sharing (
Examples of lab management practices built on the FAIR principles are given in
Some FAIR laboratory data management practices.
FAIR goal | Principle | FAIR practices | Reference |
---|---|---|---|
Findable | Unique identifiers | 1. Create identifiers that are globally unique within the lab for all key entities in the lab, e.g., subjects, experiments, reagents, via the creation of a central registry or use of an existing system, e.g., RRIDs for reagents and tools. Globally unique = no two objects have the same ID, no ID may be reused. | |
Findable | Rich metadata | Each identifier in the registry is accompanied by rich metadata that provides key details, e.g., for experiments: dates, experimenter, description, collaborators, techniques etc.; for subjects: species/strain, age, weight, etc. | |
Findable | Rich metadata | Use unique identifiers for file names and folder names, and to label physical objects like slides or slide boxes, so that all entities associated with the lab can be tied unambiguously to metadata. | |
Accessible | Authentication and authorization | Create a centralized, accessible store for data and code under a lab-wide account for lab data to ensure that files are not scattered around multiple systems or accessible only via personal accounts that may not be available after someone has left the lab. | |
Interoperable | FAIR vocabularies | Move away from idiosyncratic naming of variables and annotations towards standards like Common Data Elements and the use of community-based ontologies, atlases, and controlled vocabularies. Consistent lab-wide terminology ensures that lab members can understand what the data are about, and aids in search across and combining files. | |
Interoperable | FAIR vocabularies | Consider creating a lab-wide data dictionary where all variables used across experiments are clearly defined. | |
Reusable | Documentation | Create a “Read me” file for each dataset where notes can be captured and helpful information provided for reuse of the data. | |
Reusable | Community Standards | Ideally, all files should be collected and stored in well-supported open formats to ensure long-term availability. | |
Reusable | Community Standards | Adopt community standards within the lab where possible; a good place to identify relevant standards is to look at repositories where the data may end up. Specialized repositories usually have a list of required or recommended standards. Some repositories are providing help with developing a data management and sharing plan for grant proposals, e.g., | |
Reusable | Provenance | Datasets should be clearly versioned and differences between them documented. Depending on the system used for storing data, formal support for versioning may be available, e.g., Google Docs, but if not, implement a file naming convention so that versions can be tracked. | |
Reusable | Provenance | Always keep a version of record that can be reverted to if necessary. Often when one is working with data, different versions are created rapidly and it is easy to lose track of which version is which. It is good practice to have stable versions that are easily retrievable so that there are stable points to which to return if provenance is lost. | |
Reusable | Provenance | Datasets should also be accompanied by detailed experimental protocols that describe how the data were acquired and computational workflows that detail the processing steps. Use of tools designed for this purpose, e.g., | |
Reusable | Licenses | Prepare to share: Make sure that how and when the data are to be shared is agreed upon with all collaborators early on. For clinical datasets, make sure that the consents are in place for open sharing of de-identified data. | |
Examples of laboratory data management practices based on the FAIR principles.
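Several of the practices in the table (identifiers that are unique and never reused, metadata attached to every identifier, and a file naming convention that tracks versions) can be illustrated with a small sketch. The code below is only one possible arrangement, written under stated assumptions: the `LabRegistry` class, the `LAB` prefix, the identifier layout, and the `versioned_filename` helper are all hypothetical names invented for illustration, and a real lab would persist the registry in a shared, access-controlled store rather than in memory.

```python
import itertools
from datetime import date


class LabRegistry:
    """Hypothetical in-memory registry of lab entities (illustration only)."""

    def __init__(self, prefix="LAB"):
        self.prefix = prefix
        # One monotonically increasing counter shared by all entity kinds,
        # so an identifier can never be issued twice.
        self._counter = itertools.count(1)
        self.records = {}

    def register(self, kind, **metadata):
        """Mint a new identifier and store the metadata that describes it."""
        identifier = f"{self.prefix}-{kind.upper()}-{next(self._counter):06d}"
        self.records[identifier] = {
            "kind": kind,
            "registered": date.today().isoformat(),
            **metadata,
        }
        return identifier

    def retire(self, identifier):
        """Flag an entity as retired; the record stays so the ID is not reused."""
        self.records[identifier]["retired"] = True


def versioned_filename(identifier, description, version, ext):
    """Build a file name that carries the entity ID and an explicit version."""
    slug = "-".join(description.lower().split())
    return f"{identifier}_{slug}_v{version:03d}.{ext}"


registry = LabRegistry()
subject_id = registry.register(
    "subject", species="Mus musculus", strain="C57BL/6J", age_weeks=12
)
experiment_id = registry.register(
    "experiment",
    experimenter="A. Researcher",  # placeholder name
    technique="two-photon imaging",
    subjects=[subject_id],
)
print(versioned_filename(experiment_id, "raw traces", 1, "csv"))
```

Because retired records are flagged rather than deleted and the counter only moves forward, an identifier written on a slide box or embedded in a file name keeps pointing at the same metadata record for the life of the lab.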
We are starting to see neuroscience researchers sharing their experiences with developing and utilizing lab-centric data management systems. They range from tightly integrated digital infrastructures (
One of the most important steps for a researcher in ensuring that their data is FAIR for the long term is to submit their data to a trustworthy repository that supports FAIR. The new NIH data sharing policy requires researchers to indicate where they will be sharing their data as part of the data management and sharing plan. As recommended in
Understanding how the neuroscience repository landscape is organized may help in finding the right repository. Repositories are generally specialized by data type (
The number of neuroscience specialist repositories supporting different data types. The repository list and associated data types was assembled using information available through the INCF Infrastructure Portfolio and the SciCrunch Registry. The data underlying the figure is available at Zenodo, DOI: 10.5281/zenodo.8239845.
Supplementing the specialist repository landscape are the generalist repositories, data repositories that span scientific disciplines and data types (
While the investigator takes the central role in acquiring data in a manner that supports FAIR, the community repository is arguably the central player in implementing the basic requirements for achieving FAIR for the long term (
From the earliest days of neuroinformatics, it was envisioned that neuroscience would likely best be served by a decentralized system of federated databases (
When the first generation of neuroscience databases was started, there were few standard practices for designing web-accessible databases. As documented by NIF, each database had a different mode of access, different data structure, and the use of standards was very limited. It was a time of tremendous technological fluidity, with standard features we take for granted today (e.g., RESTful web APIs) still being invented. The cloud did not exist, and attempts to build resources on the early version of a cloud-like system (“the grid”) met with considerable challenges (
INCF has served as an important conduit by which the FAIR principles have permeated the construction of neuroscience data repositories and gateways. Investigators who have been active in INCF through governance, committees and working groups are involved with several of the next generation neuroscience infrastructures including EBRAINS, CONP, SPARC, DANDI, OpenNeuro, and Brain/MINDS.
FAIR practices across data repositories.
Principle | Function | EBRAINS | SPARC | DANDI | CONP Portal | OpenNeuro | |
---|---|---|---|---|---|---|---|
F1. Globally unique identifier | Basic core | DOI | DOI | DOI | ARK, DOI | DOI | |
F2. Rich metadata | | Y | DataCite | Y | DATS | Y | |
A1. Retrievable by identifier | | Y | Y | Y | Y | Y | |
A1.1 Free, open, universal retrieval protocol | Enhanced access | Y | Y | Y | Y | Y | |
F4. Registered in a searchable resource | | KS, GDS | KS, GDS | KS, GDS | KS | KS, GDS | |
A1.2: Authentication and authorization | | Y | Y | Y | Y | Y | |
R1.1: Clear data usage license | | Y | CC-BY | CC-BY, CC0 | Y | CC0 | |
R1.3: Community standards | Use of standards | Multiple | SDS, MIS | NWB, BIDS | Y* | BIDS | |
F3: Metadata contains identifier | | Y | Y | Y | Y | Y | |
I1: Formal knowledge representation language | Y | Y | N | Y | |||
R1: Plurality of relevant attributes | Rich(er) metadata | OpenMinds | OpenMinds, MIS | NWB | DATS | Y | |
I2: FAIR vocabularies | | Y | Y | Y | Y | N | |
I3: Qualified references to other metadata | | Y | Y | Y | Y | Y | |
R1.2: Provenance | Provenance and context | Exp Protocol | Y | N | |||
A2: Metadata persistence | Y | Y | |||||
Landing page | Additional features | Y | Y | Y | Y | Y | |
CCFs | | Y | Y* | N | N | N | |
Data citation | | Y | Y | Y | Y | Y | |
Curation | | Y | Y | N | Y | N | |
Comparison of FAIR features across five large brain repositories where the principal investigators have been active through the INCF. The principles are organized according to the functions they support based on an organization proposed by
A significant and positive change that is accelerating progress toward FAIR is the emergence of a set of robust standards for neuroscience data types that are starting to gain adoption. The INCF was created to help with this process of standardization and produced some early successes, e.g., the Waxholm space for registration of mouse and rat brain data (
As neuroscience standards become more mature, better supported, and more widely used, they provide the seeds for knitting the landscape of neuroscience data repositories into a true data ecosystem, where (meta)data can flow from the laboratory to repositories and from repositories to computational tools and back again.
Ecosystem of neuroscience resources emerging around standards. Network graph of neuroscience data repositories and gateways (purple) and some of the standards they support (yellow). The graph shows repositories/gateways connected via the use of a common standard. A description of how standards were determined is given in the text.
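One reason a shared standard lets (meta)data move between laboratories, repositories, and tools is that key metadata travel with the files themselves. BIDS, for instance, encodes entities such as subject, session, and task in the file name as key-value pairs, followed by a suffix naming the data type. The sketch below is a deliberately simplified illustration of that idea and not an implementation of the BIDS specification: the function names are hypothetical, no checking is done against the entities and suffixes the specification actually allows, and real datasets should be checked with the community's own validation tools.

```python
def parse_bids_like_name(filename):
    """Split a BIDS-style file name into entities, suffix, and extension.

    Simplified sketch: assumes 'key-value' pairs joined by underscores,
    a final suffix, and everything after the first dot as the extension.
    """
    stem, _, extension = filename.partition(".")
    *pairs, suffix = stem.split("_")
    entities = {}
    for pair in pairs:
        key, sep, value = pair.partition("-")
        if not sep or not key or not value:
            raise ValueError(f"not a key-value entity: {pair!r}")
        entities[key] = value
    return {"entities": entities, "suffix": suffix, "extension": extension}


def build_bids_like_name(entities, suffix, extension):
    """Assemble a file name from entities, preserving the order given."""
    parts = [f"{key}-{value}" for key, value in entities.items()]
    return "_".join(parts + [suffix]) + f".{extension}"


parsed = parse_bids_like_name("sub-01_ses-02_task-rest_bold.nii.gz")
print(parsed["entities"])  # subject, session, and task recovered from the name
print(build_bids_like_name(parsed["entities"], parsed["suffix"], parsed["extension"]))
```

Because the same convention is understood on both sides, a repository or analysis tool can recover the subject, session, and task from the name alone, without a lab-specific lookup table.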
As tool support grows, standards are also making their way into the laboratory. BIDS, for example, has been estimated to have been used to organize over 100,000 datasets containing millions of images, indicating significant uptake by the research community (
Interoperability across neuroscience data has always been hampered by the multiplicity of nomenclatures and parcellation schemes from brain regions and nerve cells (
Standardized nomenclature for cellular taxonomies and transcriptionally defined cell types are also emerging from projects like the BICCN/BICAN to help deal with the plethora of new cell types that are emerging from new transcriptomics-based approaches (
Services for accessing ontologies and building them into annotation and metadata pipelines have improved significantly over the past decade, with tools such as BioPortal
As most neuroscience infrastructure is researcher-led and grant-supported, questions often arise about long-term sustainability when choosing a repository, or indeed, any infrastructure. Sustainability of individual resources remains a challenge, not just for neuroscience but for all research-led infrastructures that rely on grant funding for their operation. Of the data repositories listed in
As neuroscience data and repositories start to align around the FAIR principles, the ecosystem should become more robust as it will make it easier for other repositories to absorb data if a repository loses its funding. Merging of similar resources also makes the ecosystem more efficient. The “professionalization” of scientific data repositories also means that researchers are taking their role as an archive more seriously. The INCF recommendations for neuroscience infrastructure include that repositories should have an exit plan and they should clearly state their persistence policy (
In tandem with the vision of a distributed system of databases laid out by the NIH HBP was the creation of a neuroscience portal where data could be accessed via a
The more that repositories enforce consistent standards for metadata and data formats, the closer neuroscience gets toward achieving true federated search and retrieval across the entirety of the neuroscience repository ecosystem (
New tools are also becoming available that lower the barrier to making content available to search engines. For example, multiple neuroscience databases have marked up their content with
The FAIR data principles delegate a good amount of responsibility to individual communities to define what is FAIR for their domain. Community organizations play an important role as coordinators by serving as conveners to allow researchers to come to consensus about best practices and recommendations for their community. International neuroscience is currently supported by two community organizations, the INCF and the IBI. IBI is principally focused on coordination of the large international brain projects, focusing on data sharing among these projects, as well as issues such as data governance and ethics. INCF works across all neuroscience efforts, whether individual or team based, and focuses on standards, infrastructure coordination and training. Both organizations provide support for working groups that come together to tackle issues such as the development of international data governance (IBI), standards and best practices (INCF, IBI), training (INCF), and coordination of infrastructures (INCF, IBI). Any member of INCF can propose a working group and membership is open to the community, while IBI working groups are set by the Strategy Committee. The two organizations work together and with other organizations such as the IEEE Neuro Standards working group and the Global Brain Consortium.
Neuroscience has made tremendous progress over the first two decades of the 21st century in establishing the infrastructure, standards, expertise and tools for moving neuroscience significantly toward FAIR. It is now served by a set of robust international data repositories and scientific gateways specialized for neuroscience data, implementing the vision laid out in the dawn of neuroinformatics for a distributed ecosystem of repositories. The first inroads have been made in establishing FAIR practices and supporting infrastructure in the lab to manage data in a way that smooths the transition between private, semi-private, and public sharing. As best practices for FAIR are articulated, tested, and shared, we can expect that the quality of both the databases and the data will continue to improve.
A federated system allows neuroscience infrastructure to respond more rapidly to new data types and technologies as they are developed. While there are more resources to be sustained, there are also more resources from which to draw should a repository need to be decommissioned. We see from the last 20 years that there is movement in the repository landscape, with some resources ceasing operations, but others merging or changing ownership. As repositories start to align around sets of core features, both interoperability and flexibility will be increased, providing some measure of stability in an otherwise dynamic ecosystem.
While the distributed nature of neuroscience infrastructure brings many benefits, there are concomitant challenges it imposes on both those who submit their data and those that wish to use it. As the motivations and incentives for these two user groups can differ (
At the same time, competition among different data providers also can lead to a decrease in data interoperability, as repositories must compete for users. Thus, many repositories lower their requirements for standards compliance (
Finally, usability is not simply a matter of technology or documentation. As
The good news is that routine data sharing, if not exactly easy, is now at least possible across the sizes and complexities of neuroscience data. Islands of interoperation are starting to emerge among these different resources promoting federated search and shared computational platforms and services. Those of us who were involved from the beginning in attempts to “database the brain” cannot help but be impressed with how far neuroscience sharing and infrastructure has come, even as there is still quite a way to go. As the paradigm continues to shift toward open and effective data sharing in neuroscience, we will fulfill the early vision of neuroinformatics as a driver for
MM: Writing – original draft, Writing – review & editing.
The author(s) declare financial support was received for the research, authorship, and/or publication of this article. MM is supported by grants from NIH Office of the Director OT2OD030541 for the SPARC Knowledge Management and Curation Core and the US BRAIN Initiative grant U24MH130919.
I would like to thank my colleagues Anita Bandrowski and Mathew Abrams for their helpful comments.
MM is a founder and board member of SciCrunch Inc., which develops tools and services around rigor and reproducibility.
The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
3D-MMS | Metadata for 3D microscopy standard |
ADNI | Alzheimer’s Disease Neuroimaging Initiative |
BICAN | BRAIN Initiative Cell Atlas Network |
BICCN | BRAIN Initiative Cell Census Network |
BIDS | Brain Imaging Data Structure |
BIL | Brain Image Library |
BRAIN Initiative | Brain Research through Advancing Innovative Neurotechnologies |
CDE | Common data element |
CONP | Canadian Open Neuroscience Platform |
CT | Computed tomography |
DANDI | Distributed Archives for Neurophysiology Data Integration |
DATS | Data tag suite |
DBS | Deep brain stimulation |
DOI | Digital Object Identifier |
ECoG | Electrocorticography |
EEG | Electroencephalography |
EMG | Electromyography |
ERP | Event-related potential |
fMRI | Functional magnetic resonance imaging |
FORCE11 | Future of Research Communications and e-Scholarship |
HBP | Human Brain Project |
HED | Hierarchical event descriptor |
IBI | International Brain Initiative |
iEEG | Intracranial electroencephalography |
INCF | International Neuroinformatics Coordinating Facility |
MEG | Magnetoencephalography |
MIS | SPARC minimal information standard |
MRI | Magnetic resonance imaging |
NEMAR | NeuroElectroMagnetic data Archive |
NIF | Neuroscience Information Framework |
NIH | National Institutes of Health |
NWB | Neurodata Without Borders |
ODC-SCI | Open Data Commons for Spinal Cord Injury |
ODC-TBI | Open Data Commons for Traumatic Brain Injury |
PET | Positron emission tomography |
SDS | SPARC dataset structure |
SPARC | Stimulating Peripheral Activity to Relieve Conditions |
SPECT | Single-photon emission computed tomography |
URI | Uniform Resource Identifier |