Back to U19 Main Page
- U19 – brief description of overall project goals, competing circuit theories being developed/studied/integrated:
Animals constantly make decisions, such as how to evaluate a potential threat or where to look for food. Yet the same environment can elicit different decisions from the same animal on different occasions, because the animal’s internal state interacts powerfully with external inputs to determine behavior. Here we aim to understand how internal states influence decisions and to identify the underlying neural mechanisms. We will study three types of internal state changes: those arising spontaneously with engagement and disengagement in a task, those resulting from changing expectations during the task, and those resulting from learning within and across experimental sessions. Our central hypothesis is that changes in internal state correspond to changes in information flow between brain regions. To test this hypothesis, we will use novel cutting-edge methods on a brainwide scale: statistical tools to infer internal states from behavior; simultaneous recordings from large populations of neurons across many regions during behavior and during optogenetic perturbations; genetic barcoding to map functionally and molecularly defined cell-type-specific, cross-region connectivity; and new deep learning approaches and circuit modeling of how cross-region neural communication depends on internal states.
- Description of how the Data Science Core complies with the FAIR principles:
All work on this project will leverage the existing open-source data infrastructure created for the IBL. The code library is publicly accessible on GitHub: https://docs.internationalbrainlab.org
Our data architecture is based on two components. A central relational database (built on PostgreSQL with Django and hosted by Amazon Web Services) stores extensive metadata on all mice (e.g., genotype, lineage, age, surgical and training history, weight, water administration), experiments (e.g., experimenter, time of day, temperature), and data files (linking the file location to the experiment metadata). Binary data files recorded by the experimental apparatus are stored on a bulk data server (currently 250 TB, hosted at the Flatiron Institute in New York, and soon to move to a 7 PB server at the San Diego Supercomputer Center; Fig. 1). The bulk data are generally stored as flat binary (.npy) files. However, the backend format is irrelevant to users, because access is provided via an API that automatically downloads the requested data and delivers it as binary arrays directly to analysis software (Python or MATLAB), caching it on the user’s local computer to avoid repeated downloads. The files follow a standard naming convention that allows users to easily understand the relationships among data.
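To illustrate how a naming convention can expose relationships among files, here is a minimal sketch. It assumes, purely for illustration, that file names take the form `object.attribute.extension`; the text above does not spell out the actual convention, and the function name and example file names are hypothetical.

```python
from collections import defaultdict


def group_by_object(filenames):
    """Group dataset files by the object they describe.

    Assumes (hypothetically) names of the form 'object.attribute.ext'.
    """
    groups = defaultdict(dict)
    for name in filenames:
        parts = name.split(".")
        if len(parts) != 3:
            raise ValueError(f"unexpected file name: {name!r}")
        obj, attribute, _ext = parts
        # Files sharing an object name describe the same set of entities.
        groups[obj][attribute] = name
    return dict(groups)


# Hypothetical file names, used only to exercise the function.
files = ["spikes.times.npy", "spikes.clusters.npy", "trials.choice.npy"]
grouped = group_by_object(files)
```

Under this assumption, `grouped["spikes"]` collects the two files that describe the same spikes, so a reader can tell from the names alone which arrays belong together.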
A key challenge is to ensure that metadata are comprehensive and accurate. While metadata about the experiments themselves can be collected by the recording hardware, metadata about the experimental subjects must be entered by laboratory members. Making this happen reliably is primarily a problem of social engineering rather than software, and we have solved it by creating a user-friendly, web-based client that connects from all IBL laboratories to our central database. This system functions as an electronic colony management and laboratory notebook system, storing details on all laboratories’ mice, such as age and surgical history. These metadata are critical because mouse behavior is affected by a large number of factors. The system allows metadata to be entered at the time of data collection, for example, recording each subject’s weight before every experiment, which ensures that data are entered more reliably than if they were transcribed from paper laboratory notebooks. The system also performs other functions, such as generating email notifications telling the experimenter how much supplementary water is needed after training. Other types of metadata (e.g., genotype results) are entered as soon as they are generated. The system, known as Alyx, is fully operational, is available as an open-source package and has already been adopted by several laboratories outside the IBL.
- List of Data Types in U19:
- Behavioral data: task performance metrics, video recordings, tracks of paw positions, pupil location and diameter, etc.
- Neural data: Neuropixels 2.0 recordings, functional ultrasound imaging data, two-photon (2P) imaging, widefield calcium imaging, BARseq2 datasets
- Common Data Elements in U19:
- All data will be stored and shared using the common IBL infrastructure (which will be extended as necessary for new data types)
- All analysis will be standardized and coordinated among groups. Existing standard IBL data pipelines will be used where they exist.
- Data Sharing goals in U19:
Accessing the data. To allow individual scientists to work in different ways, IBL provides (or plans to provide) users with three protocols for searching, downloading, and sharing data: Open Neurophysiology Environment (ONE), DataJoint, and Neurodata Without Borders.
ONE is a simple protocol developed by the IBL [2], with a workflow that closely matches the way many neuroscientists currently operate. Nearly all our scientists access local data files via one of two languages, MATLAB or Python, which they run on desktop workstations. ONE is a lightweight interface that provides users with four functions allowing them to search for experiments of interest and load required data from these experiments directly into MATLAB or Python (Fig. 1, gray square). The user need not worry about underlying file formats or network connections, and data are cached on their local machine to avoid repeated downloads. The ONE system is well established and is used daily to access our data.
ONE is an open standard that can be adopted by anyone in the community, as all the information needed to understand and implement ONE is publicly available. Furthermore, it has been designed with a view to extensibility: users may define new dataset types, using a simple grammar to define names and relationships between dataset types in a standardized way. As a result, users can add non-standard datasets using their own namespace; for example, ONE has been used to share large-scale electrophysiology recordings collected in a different task via figshare file upload. The ONE protocol can run with multiple backends, and the choice of backend is transparent to the user. For the main IBL data (including this proposal), our backend system uses our central SQL database. However, we have also provided ONE light, a backend that allows scientists to share data using ONE by simply uploading files to a web server or to figshare.
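The search-then-load workflow with local caching described above for ONE can be sketched as follows. This is a toy stand-in, not the ONE API: the class `ToyDataClient`, its methods, the metadata fields, and the in-memory "remote" store are all hypothetical and exist only to show the caching behavior.

```python
import tempfile
from pathlib import Path


class ToyDataClient:
    """Hypothetical stand-in for a search-and-load client; not the ONE API."""

    def __init__(self, sessions, remote_files, cache_dir):
        self.sessions = sessions          # list of session metadata dicts
        self.remote_files = remote_files  # {(session_id, dataset): bytes}
        self.cache_dir = Path(cache_dir)
        self.downloads = 0                # counts simulated network fetches

    def search(self, **criteria):
        """Return ids of sessions whose metadata match every criterion."""
        return [s["id"] for s in self.sessions
                if all(s.get(k) == v for k, v in criteria.items())]

    def load(self, session_id, dataset):
        """Return dataset bytes, fetching only if absent from the local cache."""
        local = self.cache_dir / session_id / dataset
        if not local.exists():
            local.parent.mkdir(parents=True, exist_ok=True)
            local.write_bytes(self.remote_files[(session_id, dataset)])
            self.downloads += 1
        return local.read_bytes()


# Hypothetical metadata and data, used only to exercise the client.
sessions = [
    {"id": "s1", "lab": "lab_a", "subject": "mouse_01"},
    {"id": "s2", "lab": "lab_b", "subject": "mouse_02"},
]
remote = {("s1", "trials.choice.npy"): b"\x01\x00\x01"}

with tempfile.TemporaryDirectory() as tmp:
    client = ToyDataClient(sessions, remote, tmp)
    found = client.search(lab="lab_a")
    first = client.load(found[0], "trials.choice.npy")
    second = client.load(found[0], "trials.choice.npy")  # served from cache
    n_downloads = client.downloads
```

The second `load` call reads the cached copy, so only one simulated download occurs. Swapping the `remote_files` dictionary for a different storage layer would leave the calling code unchanged, which is the sense in which a backend can be hidden from the user.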
The second access protocol used by the IBL is DataJoint, a protocol that is increasingly used in neurophysiology. This workflow management system integrates a relational database with automatic processing of analyses. It provides a Python or MATLAB interface that directly operates on the database and allows intuitive and flexible queries. In IBL, DataJoint stores both experimental data and the results of analyses in a single relational database. A key capability of DataJoint is that it allows standard analyses to be run automatically on new data as they come in, thereby saving time and effort for researchers. For example, DataJoint creates daily summaries of animal behavior for all users in the collaboration to access via a browser (Fig. 2). In collaboration with the team of DataJoint Neuro (the developers of this system), we have established a DataJoint system hosting IBL data on an Amazon cloud server. Automatic analysis pipelines process our behavioral data immediately after collection, with the results viewable on a website. This web system is used by team members to monitor progress during mouse training or to compare the performance of many animals, and also allows newcomers to the group to quickly get a sense of the behavioral data without needing to write their own analysis routines. A subset of these behavioral data is already available on the public IBL data portal, which is currently being extended to show the results of electrophysiological experiments.
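The idea of running a standard analysis automatically on new data can be sketched in plain Python. This does not use the DataJoint library or its API; `populate`, `fraction_correct`, and the session keys are hypothetical names chosen for illustration, and dictionaries stand in for database tables.

```python
def populate(raw_sessions, summaries, compute):
    """Compute a summary for every session that does not have one yet.

    Returns the keys that were newly processed on this call.
    """
    new_keys = [key for key in raw_sessions if key not in summaries]
    for key in new_keys:
        summaries[key] = compute(raw_sessions[key])
    return new_keys


def fraction_correct(trials):
    """Toy behavioral summary: share of trials marked correct (1)."""
    return sum(trials) / len(trials)


# Hypothetical sessions; each list holds 1 (correct) or 0 (incorrect) per trial.
raw = {"mouse_01/day1": [1, 0, 1, 1], "mouse_01/day2": [1, 1, 1, 0]}
summaries = {}

first_run = populate(raw, summaries, fraction_correct)

# A new session arrives; only that session is processed on the next call.
raw["mouse_01/day3"] = [1, 1, 1, 1]
second_run = populate(raw, summaries, fraction_correct)
```

Because each call processes only keys that lack a result, the same function can be run on a schedule without recomputing earlier sessions.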
The final access protocol is Neurodata Without Borders (NWB). NWB aims to be a unified data standard that is suitable for diverse neurophysiological and behavioral data. NWB is beginning to be widely adopted. For example, it has become the standard for the Allen Institute for Brain Science. Version 2.0 of the NWB standard is based around a suite of data-access functions that aims to allow users seamless access to neurophysiology data from multiple providers. The DataJoint Neuro team is currently working to provide NWB-compatible access to all DataJoint databases, which will allow our data to be accessed via the NWB protocol; we will extend this access to the data types generated by this U19 project.
Public data access. The IBL Data Sharing Policy states that all data will be shared publicly within a year of collection, or upon acceptance for publication of an associated manuscript, whichever comes first. The data collected during this project will be shared under the same policy. In addition to the host servers described above, once the data are generated, we plan to apply to existing hosting services on public clouds (e.g., AWS Open Data Sponsorship Program) to further encourage open use and to facilitate large-scale, cloud-based analyses of our data.
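The "whichever comes first" rule in the policy above amounts to taking the earlier of two dates. A minimal sketch, assuming that "within a year" means the same calendar date one year after collection (with 29 February mapped to 28 February); the function names are hypothetical and the policy text does not define these edge cases:

```python
from datetime import date


def add_one_year(day):
    """Return the same calendar date one year later (29 Feb maps to 28 Feb)."""
    try:
        return day.replace(year=day.year + 1)
    except ValueError:
        # Only 29 February can fail; fall back to 28 February.
        return day.replace(year=day.year + 1, day=28)


def public_release_deadline(collected, accepted=None):
    """Latest date by which data must be public under the stated policy."""
    one_year_later = add_one_year(collected)
    if accepted is None:
        return one_year_later
    return min(one_year_later, accepted)


no_paper = public_release_deadline(date(2021, 3, 1))
early_paper = public_release_deadline(date(2021, 3, 1), accepted=date(2021, 9, 15))
late_paper = public_release_deadline(date(2021, 3, 1), accepted=date(2023, 1, 1))
```

Here `early_paper` is the acceptance date, because acceptance precedes the one-year mark, while `no_paper` and `late_paper` both fall back to one year after collection.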
Living will. Online storage of bulk data comes with annual costs, which must be met regardless of the state of the collaboration. We anticipate continued funding for IBL through our current sponsors, the Wellcome Trust and the Simons Foundation, as well as other sources like the current U19 award. However, we must also be prepared for the unlikely event that funding for continued online storage will become unavailable at some point in the future. To this end, we have drawn up a “living will”: a plan of action that we will execute in case continued online storage of bulk data is no longer feasible. This plan involves physical storage of bulk data (raw electrophysiology and video) on tape in member laboratories, widespread distribution of smaller preprocessed datasets (spike trains, behavioral motion tracking) on physical media (e.g., hard drives), and uploads to free storage services such as figshare, Google Drive, and member universities’ websites.
- Data science tools being used in project:
- Data science approaches to be shared with other U19s:
- All newly developed tools will be publicly available
- Data science challenges that could benefit from discussion with other U19s:
We look forward to discussing common challenges in spike sorting, behavioral video analysis, calcium imaging analysis, and analyses of neural data spanning multiple sessions in multiple animals.
Brain PI Meeting materials
Link to Poster: