Downloading pipelines for offline use
Sometimes you may need to run an nf-core pipeline on a server or HPC system that has no internet connection. In this case you will need to fetch the pipeline files first, then manually transfer them to your system.
To make this process easier and ensure accurate retrieval of correctly versioned code and software containers, we have written a download helper tool.
The `nf-core pipelines download` command will download both the pipeline code and the institutional nf-core/configs files. It can also optionally download any Singularity image files that are required.

If run without any arguments, the download tool will interactively prompt you for the required information. Each option also has a corresponding command-line flag; if all are supplied, the tool runs without any user input.
Once downloaded, you will see something like the following file structure for the downloaded pipeline:
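An illustrative layout (exact contents depend on the pipeline, revision, and download options; `nf-core-rnaseq-dev` is just an example name):

```
nf-core-rnaseq-dev
├── configs              # institutional nf-core/configs files
├── singularity-images   # only present if Singularity images were downloaded
└── workflow             # the pipeline code itself
```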
You can run the pipeline by simply providing the directory path for the `workflow` folder to your `nextflow run` command:

```bash
nextflow run /path/to/download/nf-core-rnaseq-dev/workflow/ --input mydata.csv --outdir results # usual parameters here
```
If you downloaded Singularity container images, you will need to use `-profile singularity` or have it enabled in your config file.
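For example, a minimal config snippet that enables Singularity (these are standard Nextflow options, shown here for illustration):

```groovy
// nextflow.config
singularity.enabled    = true
singularity.autoMounts = true
```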
Downloaded nf-core configs
The pipeline files are automatically updated (`params.custom_config_base` is set to `../configs`), so that the local copy of the institutional configs is available when running the pipeline. Using `-profile <NAME>` should therefore work, if that profile is available within nf-core/configs.
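For example, assuming a hypothetical institutional profile called `mycluster` exists in nf-core/configs:

```bash
nextflow run /path/to/download/nf-core-rnaseq-dev/workflow/ -profile mycluster --input mydata.csv --outdir results
```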
Note that the institutional configs are not included when downloading a pipeline for use with Seqera Platform, because the application manages all configurations separately.
Downloading Container Images
If you are using a container system to run your pipeline, the `nf-core pipelines download` command can also fetch the required container images for you.
Currently supported container systems are Singularity (Apptainer), Docker and Podman.
To do this, select `<singularity/docker>` in the prompt or specify `--container-system <singularity/docker>` in the command.
Your archive / target output directory will then also include a separate folder `<singularity/docker>-images`.
For Podman images, use the `docker` option and see details in the Docker section below.

If the download speeds are much slower than your internet connection is capable of, you can set the number of simultaneous downloads with the `--parallel-downloads` parameter.
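For example, a non-interactive download might look like this (the pipeline name and revision are illustrative):

```bash
nf-core pipelines download rnaseq \
    --revision 3.14.0 \
    --container-system singularity \
    --parallel-downloads 8
```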
How Pipeline Containers are Found
The download command internally uses `nextflow inspect` to find container images.
This Nextflow subcommand parses the pipeline code, both configs and scripts, and determines which container is used by each module.
To specify which container system to fetch containers for, and to include the containers required for pipeline tests, the `nextflow inspect` command is run with the flag `-profile <singularity/docker>,test,test_full`.
The command then produces a JSON file containing the URIs of all containers of interest in the pipeline.
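As an illustration, running `nextflow inspect` manually against a downloaded pipeline produces output shaped roughly like the following (the process and container names are examples):

```bash
nextflow inspect /path/to/download/nf-core-rnaseq-dev/workflow/ -profile docker,test
```

```json
{
    "processes": [
        {
            "name": "NFCORE_RNASEQ:RNASEQ:FASTQC",
            "container": "quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0"
        }
    ]
}
```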
Singularity Images
If you select to download Singularity images, the downloaded workflow files are again edited to add the following line to the end of the pipeline's `nextflow.config` file:

```groovy
singularity.cacheDir = "${projectDir}/../singularity-images/"
```

This tells Nextflow to use the `singularity-images` directory relative to the workflow as the Singularity image cache directory.
All images should be downloaded there, so Nextflow will use them instead of trying to pull from the internet.
Singularity Cache Directory
We highly recommend setting the `$NXF_SINGULARITY_CACHEDIR` environment variable on your system, even if that is a different system to where you will be running Nextflow.
If found, the tool will fetch the Singularity images to this directory first before copying to the target output archive / directory. Any images previously fetched will be found there and copied directly - this includes images that may be shared with other pipelines or previous pipeline version downloads or download attempts.
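For example, in your shell configuration (the path is just an example):

```bash
export NXF_SINGULARITY_CACHEDIR=/shared/containers/nxf-singularity-cache
```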
If you are running the download on the same system where you will be running the pipeline (e.g. a shared filesystem where Nextflow won't have an internet connection at a later date), you can choose to only use the cache via a prompt or the CLI option `--container-cache-utilisation amend`. This instructs `nf-core pipelines download` to fetch all Singularity images to the `$NXF_SINGULARITY_CACHEDIR` directory, but not to copy them into the workflow archive / directory, and the workflow config file is not edited. This means that when you later run the workflow, Nextflow will just use the cache folder directly.
If you are downloading a workflow for a different system, you can provide information about the contents of its image cache to `nf-core pipelines download`. To avoid unnecessary container image downloads, choose `--container-cache-utilisation remote` and provide a list of already available images as a plain text file to `--container-cache-index my_list_of_remotely_available_images.txt`. To generate this list on the remote system, run `find $NXF_SINGULARITY_CACHEDIR -name "*.img" > my_list_of_remotely_available_images.txt`. The tool will then only download and copy into your output directory the images that are missing on the remote system.
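Putting it together, the remote-cache workflow might look like this (the pipeline name and revision are illustrative):

```bash
# On the remote (offline) system: list the images already in the cache
find $NXF_SINGULARITY_CACHEDIR -name "*.img" > my_list_of_remotely_available_images.txt

# Transfer the list to the machine with internet access, then download there:
nf-core pipelines download rnaseq \
    --revision 3.14.0 \
    --container-system singularity \
    --container-cache-utilisation remote \
    --container-cache-index my_list_of_remotely_available_images.txt
```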
How Singularity Image Downloads work
The list of Singularity containers found in the pipeline by `nextflow inspect` is processed in the following order:

- If the target image already exists, nothing is done (e.g. with `$NXF_SINGULARITY_CACHEDIR` and `--container-cache-utilisation amend` specified)
- If found in `$NXF_SINGULARITY_CACHEDIR` and `--container-cache-utilisation copy` is specified, they are copied to the output directory
- If they start with `http`, they are downloaded directly within Python (default 4 at a time; you can customise this with `--parallel-downloads`)
- If they look like a Docker image name, they are fetched using a `singularity pull` command. Choose the container libraries (registries) queried by providing one or multiple `--container-library` parameter(s). For example, if you call `nf-core pipelines download` with `-l quay.io -l ghcr.io -l docker.io`, every image will be pulled from `quay.io` unless an error is encountered. Subsequently, `ghcr.io` and then `docker.io` will be queried for any image that has failed before.
  - This requires Singularity/Apptainer to be installed on the system and is substantially slower
Note that compressing many GBs of binary files can be slow, so specifying `--compress none` is recommended when downloading Singularity images that are copied to the output directory.
Docker and Podman images
In addition to Singularity images, the download command can also find and download a pipeline's Docker images.
The support for Docker/Podman images is, however, less configurable than for Singularity images:

- In the current version of the command, the Docker image download only supports the `copy` option for the `--container-cache-utilisation` flag.
- An implicit cache, in the form of the Docker images already available in the Docker daemon on the download machine, is however used (see details below).
- The `--container-cache-index` flag is not supported either; the command will always fetch all the Docker images in the list produced by `nextflow inspect`.
How the Docker/Podman image downloads work
Downloading Docker images works slightly differently from downloading Singularity images:
Docker images are always pulled into the Docker daemon on the machine you are downloading from, using the `docker image pull` command.
This means that the Docker images are not directly available as files, as Singularity images are.
To allow transfer to an offline environment, Docker instead provides the `docker image save` command: it takes a Docker image available on your machine and saves it as a TAR archive, which can easily be transferred to an offline machine.
Once transferred to the offline machine, the images can be loaded into the Docker daemon using the `docker image load` command.
Alternatively, the Docker image TAR archives can be loaded into Podman using the `podman load` command.
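For reference, this is what that transfer looks like when done by hand (the image name is just an example):

```bash
# On the machine with internet access:
docker image pull quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0
docker image save quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0 -o fastqc.tar

# On the offline machine, after transferring the TAR archive:
docker image load -i fastqc.tar
# or, for Podman:
podman load -i fastqc.tar
```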
The download command automates the creation of these TAR archives: it pulls and saves all container images found by `nextflow inspect` and puts them in the `docker-images` folder in the archive / target directory.
To allow for easy loading into Docker/Podman on the offline machine, the command automatically creates two Bash scripts, `docker-load.sh` and `podman-load.sh`, which, when run inside the `docker-images` folder on the offline machine, will load all images found in the folder.
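A minimal sketch of what such a load script does (the actual generated scripts may differ in detail):

```bash
#!/usr/bin/env bash
# Load every image TAR archive in the current directory into the Docker daemon
for tarball in *.tar; do
    docker image load -i "$tarball"
done
```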
Adapting downloads to Seqera Platform
Seqera Platform (formerly "Nextflow Tower") provides a graphical user interface to oversee pipeline runs, gather statistics and configure compute resources. While pipelines added to Seqera Platform are preferably hosted at a Git service, providing them as disconnected, self-reliant repositories is also possible for premises with restricted network access. Choosing the `--platform` flag will download the pipeline in an appropriate form.

Subsequently, the `*.git` folder can be moved to its final destination and linked with a pipeline in Seqera Platform using the `file:/` prefix.
Even without access to Seqera Platform, pipelines downloaded with the `--platform` flag can be run if the absolute path is specified: `nextflow run -r 2.5 file:/path/to/pipelinedownload.git`.
Downloads in this format allow you to include multiple revisions of a pipeline in a single file, but require that the revision (e.g. `-r 2.5`) is always explicitly specified.