When developing a pipeline, it is easy to build in assumptions about the environment and infrastructure in which it will run. These assumptions make it difficult for others to use or modify the pipeline. One goal of the Human Cell Atlas (HCA) is to create data processing pipelines that many investigators can run, not just those working within the Data Coordination Platform (DCP). We refer to this goal as data processing pipeline portability.
The HCA aims to break down barriers to data use by offering immediate and open access to data. Access alone is not enough, however: we also want everyone to be able to use the data. To truly break down the walls that silo many data repositories, we have to go beyond data access and enable data interoperability between repositories. If pipelines are portable, data from within and outside the HCA project can be processed with the same methodology and then combined to make scientific observations. This interoperability increases the data's impact on scientific discovery. Given the importance of portability, we have developed a service that does the following:
We use this Portability Service as part of our testing during pipeline development to ensure that pipelines developed for the HCA can be executed successfully in many environments.
The HCA is a driver project of the Global Alliance for Genomics and Health (GA4GH), and the Portability Service helps demonstrate GA4GH APIs for executing workflows on different infrastructures. Together with the GA4GH and its other driver projects, we are working to develop policies, standards, and tools for sharing genomic and health-related data. We welcome community involvement and feedback; contact GA4GH or the HCA DCP for more information.
The Portability Service enables you to:
HCA pipelines are automatically submitted to this service as part of our development process. If you are interested in submitting a pipeline to the Portability Service, please contact us at firstname.lastname@example.org.