Neuroimaging laboratories face a number of data management challenges. Within the laboratory, data must be passed through a series of capture, quality control, processing and utilization steps. Preserving the long-term usability and integrity of data requires investigators to maintain vigilant oversight of the data through each of these steps. In the broader scope, subsets of these data often need to be shared with specific collaborating colleagues, and limited or anonymized versions of the data must be shared with the general community according to the data sharing and privacy policies instituted by funding agencies and publishers (e.g., NIH data sharing statement, HIPAA privacy standards). Various end users (laboratory personnel, collaborators, and the general community) require appropriate tools to explore the data. These tools should make the data highly available to intended users while enforcing security protocols that restrict unintended access. These issues are particularly challenging in the context of neuroimaging because imaging data sets are large, require complex processing pipelines, and follow complicated experimental protocols. Additionally, neuroimaging studies routinely incorporate measures from a range of other experimental approaches (genetic, clinical, neuropsychological), which must be integrated with the imaging measures into a unified data set.
The Extensible Neuroimaging Archive Toolkit (XNAT) was designed to address the data management challenges that neuroimaging laboratories face. In particular, XNAT was designed to capture data from multiple sources, to maintain the data in a secure repository, and to distribute the data to approved users. User interaction with XNAT mirrors laboratory best practices for maintaining the quality and integrity of data. The XNAT user interface provides users with tools to manage the data from entry and storage through processing, access, and distribution. Data entry and upload forms, as well as scriptable command-line programs, allow data to be easily captured into the archive. Quality control tools enable users to inspect, validate, and process the data. Search, display, and download tools facilitate exploration of the archive. By maintaining direct control of the data through all of these actions, XNAT protects the overall integrity of the data and minimizes accidental or intentional misuse of the data. Backup processes can be scheduled and security protocols can be enforced. User interaction with the data is logged. Modifications to data are recorded and validated. Common processing routines can be automated and resulting derived measures can be programmatically captured back into the archive. The resulting archive is a clean and rich resource for use within the laboratory and for sharing with the broader community.
XNAT is the result of an ongoing project to support the data management requirements of the Neuroimaging Core of Washington University’s Alzheimer’s Disease Research Center (ADRC). XNAT has now been deployed in the ADRC for several years. Recognizing that many laboratories face similar challenges, XNAT has been made a free and open source project. Laboratories across the United States, including at Washington University, Harvard University, Massachusetts General Hospital, and the National Institutes of Health, have adopted XNAT as their data management platform, and the Biomedical Informatics Research Network (BIRN) has adopted XNAT as a standard component in its informatics repertoire.
The XNAT framework relies heavily on XML and XML Schema for its data representation, security system, and generation of user interface content. XML provides a powerful tool for building extensible data models. This extensibility is particularly important in rapidly advancing fields like neuroimaging, where the managed data types are likely to change and evolve quickly. XML Schema has become the standard language for defining open XML data formats. As a result, many biomedical organizations have developed or are currently developing standards in XML. By focusing on XML Schema, XNAT allows laboratories to leverage these industry-standard schemas, to extend XNAT’s included neuroimaging schema, or to build their own schemas from scratch. XNAT archives using common schemas can easily interoperate with one another and with other systems that support these schemas. Using XML transformation languages like XSLT, data can also be easily imported and exported in additional formats like HTML, PDF, spreadsheets, and alternative XML models.
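As a rough illustration of exporting XML records to a spreadsheet-friendly format, the sketch below converts a small XML document to CSV. It is not XSLT and not XNAT code: Python's standard library has no XSLT processor, so ElementTree stands in for the transformation step. The `subjects` document, its element names, and the `xml_to_csv` function are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical records; element and attribute names are assumptions,
# not taken from XNAT's neuroimaging schema.
RECORDS = """
<subjects>
  <subject id="s01"><age>72</age><diagnosis>control</diagnosis></subject>
  <subject id="s02"><age>68</age><diagnosis>AD</diagnosis></subject>
</subjects>
"""


def xml_to_csv(xml_text):
    """Flatten each <subject> element into one CSV row."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "age", "diagnosis"])
    for subject in root.findall("subject"):
        writer.writerow([
            subject.get("id"),
            subject.findtext("age"),
            subject.findtext("diagnosis"),
        ])
    return buffer.getvalue()


if __name__ == "__main__":
    print(xml_to_csv(RECORDS))
```

In a real deployment the same mapping would more naturally be written as an XSLT stylesheet, so that new output formats can be added without changing application code.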
XNAT’s architecture follows a widely used three-tier design pattern that includes a relational database backend, Java-based middleware, and a web-based user interface. XML Schema documents (the schema provided with XNAT, schemas supplied by the site standing up an XNAT instance, or both) define the data types that are to be handled by the deployed system. XNAT uses these schemas to generate custom components and content for each of the tiers: a relational database is generated that contains tables, columns, and relations equivalent to the structures defined in the XML schemas; middleware classes are generated that can be used by developers to implement custom functionality that utilizes the XNAT database; and user interface content is generated, including navigation menus, search options, and data tables. The resulting system represents a fully operational data management system tailored specifically to the site’s data model. During operation, a core engine within the middleware, the extensible formatting tool (XFT), mediates incoming and outgoing data requests, translating as necessary between the relational database, XML, and web-based formats.
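The idea of generating database tables from schema definitions can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not XNAT's or XFT's actual generator (which is Java-based): the two-type schema, the type-to-column mapping in `SQL_TYPES`, and the `schema_to_ddl` function are hypothetical. Each complex type becomes a table, each simple element a column, and each element that refers to another complex type a foreign key.

```python
import sqlite3
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema for illustration; not XNAT's neuroimaging schema.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Subject">
    <xs:sequence>
      <xs:element name="label" type="xs:string"/>
      <xs:element name="age" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="MrSession">
    <xs:sequence>
      <xs:element name="subject" type="Subject"/>
      <xs:element name="scanner" type="xs:string"/>
      <xs:element name="field_strength" type="xs:decimal"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

# Assumed mapping from a few XSD built-in types to SQLite column types.
SQL_TYPES = {"xs:string": "TEXT", "xs:integer": "INTEGER", "xs:decimal": "REAL"}


def schema_to_ddl(schema_text):
    """Return one CREATE TABLE statement per named complex type."""
    root = ET.fromstring(schema_text)
    type_names = {ct.get("name") for ct in root.iter(XS + "complexType")}
    statements = []
    for ct in root.iter(XS + "complexType"):
        columns = ["id INTEGER PRIMARY KEY"]
        for el in ct.iter(XS + "element"):
            name, xsd_type = el.get("name"), el.get("type")
            if xsd_type in type_names:
                # A reference to another complex type becomes a relation.
                columns.append(
                    f"{name}_id INTEGER REFERENCES {xsd_type.lower()}(id)"
                )
            else:
                columns.append(f"{name} {SQL_TYPES[xsd_type]}")
        table = ct.get("name").lower()
        statements.append(f"CREATE TABLE {table} ({', '.join(columns)})")
    return statements


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for statement in schema_to_ddl(SCHEMA):
        print(statement)
        conn.execute(statement)
```

A production generator would also need to handle repeated elements, attributes, type inheritance, and schema imports, all of which this sketch ignores.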
XNAT uses a hybrid storage architecture that leverages the strengths of XML, relational databases, and standard file systems. The file system is used to store the actual image files in their native format (e.g., DICOM). A relational database stores text and numeric data (e.g., metadata, derived values, related experimental measures). XML documents serve as a bridge between the database and file system. XNAT is able to automatically store these documents in its database because the database structure is built directly from the schemas that describe the documents. XML documents representing measures of brain region volumes, for instance, could be output from a processing application (or converted from a non-XML format) and then automatically imported into the XNAT database.
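The brain region volume example can be sketched as follows. This is an illustration only, assuming a hypothetical `volumetrics` document layout and hypothetical table names; it does not reproduce XNAT's schema or import mechanism. Numeric measures go into database rows, while the image itself stays on the file system and is recorded only by its path.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical output of a processing application; the element names,
# attribute names, and file path are assumptions made for illustration.
DOC = """
<volumetrics session="sess001">
  <file uri="/archive/sess001/scan1.dcm" format="DICOM"/>
  <region name="hippocampus" volume_mm3="3521.4"/>
  <region name="amygdala" volume_mm3="1650.0"/>
</volumetrics>
"""


def import_document(conn, xml_text):
    """Store file references and derived measures from one XML document."""
    root = ET.fromstring(xml_text)
    session = root.get("session")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image_file "
        "(session TEXT, uri TEXT, format TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS region_volume "
        "(session TEXT, region TEXT, volume_mm3 REAL)"
    )
    # Image data is not copied: only its location on disk is recorded.
    for f in root.findall("file"):
        conn.execute(
            "INSERT INTO image_file VALUES (?, ?, ?)",
            (session, f.get("uri"), f.get("format")),
        )
    # Derived numeric measures become queryable database rows.
    for r in root.findall("region"):
        conn.execute(
            "INSERT INTO region_volume VALUES (?, ?, ?)",
            (session, r.get("name"), float(r.get("volume_mm3"))),
        )
    conn.commit()
    return session


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    import_document(conn, DOC)
    for row in conn.execute("SELECT * FROM region_volume ORDER BY region"):
        print(row)
```

Here the tables are written by hand; in the design described above they would instead be derived from the schema that describes the document, which is what makes the import automatic.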