Installation
ksrates is available as an Apptainer (formerly Singularity) or Docker container, both of which bundle ksrates and all required external software dependencies, and as a Python package, which requires manual installation of the package and its dependencies but offers more flexibility for integration into existing bioinformatics environments and toolchains. In addition to a command-line interface (CLI), we provide a Nextflow pipeline that runs a complete ksrates analysis fully automatically.
ksrates runs on any Linux or macOS system or on Windows with Windows Subsystem for Linux 2 (WSL2) or a virtual machine installed. However, ksrates analyses are computationally demanding, and therefore we recommend the use of a computer cluster (or cloud platform) for any but the simplest data sets.
Nextflow (recommended)
The ksrates Nextflow pipeline makes it easy to execute the individual steps of a ksrates analysis as a single command. Nextflow also makes it easy to configure the execution of the pipeline on a variety of computer clusters (see the Nextflow configuration file section).
To install Nextflow and its dependencies, follow the commands below (or the official Nextflow installation instructions).
Make sure you have Bash 3.2 (or later) installed.
If you do not have Java installed, install Java 11 or later; on Linux you can for example use:
sudo apt-get install default-jdk
Note
Nextflow versions before 22.01.x-edge require Java version 8 up to 15 for their execution.
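The prerequisites above can be checked before installing Nextflow. The following is a minimal sketch, not part of ksrates: the helper names `version_ge` and `java_major` are illustrative, and the parsing assumes the usual `java -version` banner format (for example `openjdk version "11.0.2"`).

```shell
# Sketch: check the Bash and Java prerequisites listed above.
# version_ge and java_major are illustrative helpers, not ksrates or Nextflow commands.

# version_ge A B: succeed if "major.minor" version A is >= B.
version_ge() {
  maj_a=${1%%.*}; rest_a=${1#*.}; min_a=${rest_a%%.*}
  maj_b=${2%%.*}; rest_b=${2#*.}; min_b=${rest_b%%.*}
  [ "$maj_a" -gt "$maj_b" ] ||
    { [ "$maj_a" -eq "$maj_b" ] && [ "$min_a" -ge "$min_b" ]; }
}

# java_major VERSION: print the major version ("1.8.0_292" -> 8, "11.0.2" -> 11).
java_major() {
  v=${1#1.}
  echo "${v%%[._-]*}"
}

if command -v bash >/dev/null 2>&1; then
  bash_version=$(bash -c 'echo "${BASH_VERSINFO[0]}.${BASH_VERSINFO[1]}"')
  if version_ge "$bash_version" 3.2; then
    echo "Bash $bash_version: OK"
  else
    echo "Bash $bash_version: older than 3.2"
  fi
else
  echo "Bash not found"
fi

if command -v java >/dev/null 2>&1; then
  # java -version writes its banner to stderr, hence the 2>&1 redirection.
  java_version=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  echo "Java major version: $(java_major "$java_version")"
else
  echo "Java not found; install Java 11 or later"
fi
```

Compare the reported Java major version against the requirement of the Nextflow version you intend to use.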
Then install Nextflow using either:
wget -qO- https://get.nextflow.io | bash
or:
curl -fsSL https://get.nextflow.io | bash
This creates the nextflow executable file in the current directory.
Note
As of ksrates v2.0.0, the Nextflow pipeline has been ported to DSL2 syntax and requires at least Nextflow version 22.03.0-edge. Older ksrates versions were written in DSL1: v1.1.3 to v1.1.5 support Nextflow from 22.03.0-edge up to, but not including, 22.12.0-edge, while v1.1.2 and earlier require Nextflow versions older than 22.03.0-edge.
Optionally make the nextflow executable accessible through your $PATH variable, for example by moving it:
sudo mv nextflow /usr/local/bin
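If the installed Nextflow release does not fall in the range required by your ksrates version, the Nextflow launcher can be told to use a specific release through the NXF_VER environment variable. The sketch below assumes nextflow is reachable through $PATH; the version shown is only an example taken from the note above and should be replaced by the one matching your ksrates version.

```shell
# Sketch: pin the Nextflow release used by the launcher for this shell session.
# The value is an example; choose it according to your ksrates version.
export NXF_VER=22.03.0-edge

if command -v nextflow >/dev/null 2>&1; then
  # Prints the version the launcher actually runs (it is downloaded on first use).
  nextflow -version
else
  echo "nextflow not found in PATH; run ./nextflow from the download directory instead"
fi
```

Adding the export line to a shell initialization file makes the pin permanent; leaving it out restores the default behaviour of the installed launcher.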
The ksrates Nextflow pipeline itself does not need to be installed: it is automatically downloaded and set up the first time you launch it. The same applies to the ksrates package and its dependencies when using the ksrates Apptainer or Docker container (see below). In other words, if you plan to use only the ksrates Nextflow pipeline with a container, you do not need to download or install ksrates manually; the only other software you may need to install is Apptainer or Docker itself (next section). Note, however, that the ksrates Nextflow pipeline code is also included in the ksrates GitHub repository and can thus also be executed from a manually installed ksrates package (see the Manual installation section below).
Apptainer and Docker containers
Containers are standalone, portable runtime environments that package everything needed to run a piece of software, including application code, external software dependencies, and operating system libraries and runtime. They are thus executable in any computing environment for which the Apptainer or Docker container engine is available. This comes in handy, for example, when local installation of ksrates and its software dependencies is not possible, for instance due to permission issues, or when deploying ksrates to a computer cluster or cloud.
Availability and dependencies
Apptainer runs natively only on Linux. On Windows it requires either WSL2 (recommended; see Note below) or a virtual machine (VM). On macOS it is available as a beta version; otherwise it requires a VM. Apptainer has the advantage over Docker of always producing output files with non-root permissions.
Note
WSL2 (Windows Subsystem for Linux 2) is a native Windows 10 feature that lets you run a GNU/Linux terminal without the use of a VM. It can be installed following the official documentation.
Docker runs natively on both Linux and Windows, while on macOS it can be installed as an application that makes use of a VM under the hood. On Linux machines, Docker produces output files that require root permissions to be handled (e.g. to delete them), which is an issue for users who don’t have root permissions. Running Docker on Windows and macOS does not have this problem because the user has more control over output file permissions.
The table below summarizes relevant differences between Apptainer and Docker containers/engines:
| Feature | Apptainer | Docker |
|---|---|---|
| Runs on Linux | ✓ | ✓ |
| Runs on Windows | ✓ (WSL2 or VM) | ✓ |
| Runs on macOS | ✓ (beta or VM) | ✓ (VM) |
| Root privilege needed | ✗ | ✓ |
Apptainer (recommended)
When using the ksrates Apptainer container, either to run the ksrates CLI or the Nextflow pipeline, the machine (i.e. a local computer or a remote computer cluster or cloud node) needs to have Apptainer installed (ksrates has been tested with version 1.4.0). More information can be found on the Apptainer Quick Start page. For a Linux installation we suggest following the Install from Source section. For up-to-date and version-specific instructions, please refer to the official documentation.
Note
To allow users to run the pipeline from any directory in a cluster (i.e. not necessarily from their home directory), the bind path feature needs to be left active during Apptainer installation [Default: “YES”].
When using the ksrates Nextflow pipeline with the ksrates Apptainer container, the container will be automatically downloaded from the vibpsb/ksrates repository on Docker Hub on first launch (this may take a while depending on your Internet connection speed since the container has a size of about 1 GB) and will then be stored and reused for successive runs.
Docker
When using the ksrates Docker container, either to run the ksrates CLI or Nextflow pipeline, the machine (i.e. a local computer or a remote computer cluster or cloud node) needs to have Docker installed. More information can be found on the Docker installation page.
When using the ksrates Nextflow pipeline with the ksrates Docker container, the container will be automatically downloaded from the vibpsb/ksrates repository on Docker Hub on first launch (this may take a while depending on your Internet connection speed since the container has a size of about 1 GB) and will then be stored and reused for successive runs.
Manual installation
If you do not or cannot use one of the ksrates containers, for example because you want to integrate the tool into existing bioinformatics environments and toolchains, install the ksrates Python package and its dependencies manually. The following commands guide you through the installation on a Linux machine. Windows users can carry out the installation with the same commands by using either WSL2 (recommended; see Note below) or a virtual machine (VM) with Linux installed. macOS users can for example use Homebrew instead of apt-get.
Note
WSL2 (Windows Subsystem for Linux 2) is a native Windows 10 feature that lets you run a GNU/Linux terminal without the use of a VM. It can be installed following the official documentation.
Most of the non-Python dependencies can be installed with the following commands:
sudo apt-get update && sudo apt-get -yq install python3-pip default-jdk build-essential ncbi-blast+ muscle fasttree mcl phyml
Install PAML 4.9j from source (for more information see PAML installation page) to avoid compatibility issues:
wget http://abacus.gene.ucl.ac.uk/software/paml4.9j.tgz
tar -xzf paml4.9j.tgz
cd paml4.9j/src && make -f Makefile
Then make the executable codeml available through the $PATH variable (the downloaded PAML directory can be deleted afterwards):
Either move codeml to a directory already present in $PATH, e.g. /usr/local/bin:
sudo mv codeml /usr/local/bin
Or move codeml to another directory (here assumed to be ~/bin) and add this directory to $PATH; for the Bash shell, copy the following line to the shell initialization file (e.g. .bashrc):
export PATH=$PATH:~/bin
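Whether codeml and the other external tools are actually reachable through $PATH can be verified with a short check. This is a minimal sketch: `check_tools` is an illustrative helper, and any tool names beyond codeml that you pass to it depend on the executables your packages install.

```shell
# Sketch: report which of the given executables are reachable through $PATH.
# check_tools is an illustrative helper, not a ksrates command.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# codeml is the executable built from the PAML sources above.
check_tools codeml || echo "codeml is not reachable through \$PATH yet"
```

If codeml is reported as missing after editing .bashrc, open a new terminal or source the file so that the updated $PATH takes effect.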
Install i-ADHoRe 3.0 from its GitHub page (required only for collinearity analysis of genome data for the focal species).
Clone the ksrates repository from GitHub and install the package and its Python dependencies:
git clone https://github.com/VIB-PSB/ksrates
# Starting from v2.0.0, when running outside a container, download also this compressed file:
wget https://zenodo.org/records/15225340/files/original_angiosperm_sequences.tar.gz -P ksrates/ksrates/reciprocal_retention
# Move to the ksrates subdirectory and install the package with pip
cd ksrates
pip3 install .
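After the pip installation, a quick check can confirm that the package is registered in the active Python environment. The sketch below relies only on `pip3 show`; the helper name `pip_installed` is illustrative.

```shell
# Sketch: confirm that a package is known to pip in the current environment.
# pip_installed is an illustrative helper, not a ksrates command.
pip_installed() {
  pip3 show "$1" >/dev/null 2>&1
}

if pip_installed ksrates; then
  # Print the name, version and install location recorded by pip.
  pip3 show ksrates | grep -E '^(Name|Version|Location):'
else
  echo "ksrates is not installed in this Python environment"
fi
```

If the package is reported as not installed, check that pip3 belongs to the same Python environment in which you intend to run ksrates.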
Testing your installation
Note
WSL2 users can enter the Windows file system from the terminal through e.g. cd /mnt/c/Users/your_username.
Clone the ksrates repository from GitHub in order to access the test dataset:
git clone https://github.com/VIB-PSB/ksrates
cd ksrates/test
Launch the Nextflow ksrates pipeline with Apptainer (the execution will take a few minutes):
nextflow run VIB-PSB/ksrates --test -profile apptainer --config config_files/config_elaeis.txt --expert config_files/config_expert.txt
The first time the command is executed, Nextflow downloads a local copy of the ksrates Nextflow pipeline from the VIB-PSB/ksrates GitHub repository and stores it in the $HOME/.nextflow directory. Parameter --test is mandatory when running the test dataset. Parameter -profile pulls the Apptainer (or Docker) container from Docker Hub.
Note
Since the Apptainer image is by default stored in the launching folder under work/singularity, it is recommended to specify a “centralized” destination path through apptainer.cacheDir in the Nextflow configuration file located in the test directory (nextflow.config, automatically detected). See Nextflow configuration file section.
As an alternative to the Nextflow pipeline, test by executing the individual steps of the manual pipeline (with or without container):
apptainer exec docker://vibpsb/ksrates ksrates init config_files/config_elaeis.txt --expert config_files/config_expert.txt
apptainer exec docker://vibpsb/ksrates ksrates paralogs-ks --test config_files/config_elaeis.txt --expert config_files/config_expert.txt --n-threads 4
...
Argument --test is mandatory for paralogs-ks and paralogs_ks_multi ksrates commands. More details in the Run example case as a manual pipeline section.
Remove the cloned repository once the test is successful, or keep it if you want to run the example pipeline as well (see Run example case as a Nextflow pipeline (recommended)).
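The containerized manual steps shown above all repeat the same apptainer exec prefix, which a small wrapper function can factor out. This is a sketch under stated assumptions: the function name `ksrates_c` and the DRY_RUN switch are hypothetical conveniences, not part of ksrates. With DRY_RUN=1 the wrapper only prints the command it would run.

```shell
# Sketch: wrap the container prefix used by the manual pipeline steps.
# ksrates_c and DRY_RUN are hypothetical names introduced for illustration.
KSRATES_IMAGE="docker://vibpsb/ksrates"

ksrates_c() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the full command instead of executing it.
    echo "apptainer exec $KSRATES_IMAGE ksrates $*"
  else
    apptainer exec "$KSRATES_IMAGE" ksrates "$@"
  fi
}

# Preview the first step of the test run without executing it.
DRY_RUN=1
ksrates_c init config_files/config_elaeis.txt --expert config_files/config_expert.txt
```

Unsetting DRY_RUN makes the same call execute the step inside the container.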
Updating your installation
To update the ksrates Nextflow pipeline to the latest release, run the following command:
nextflow pull VIB-PSB/ksrates
To update the Docker container image, run the following command to pull the new image from Docker Hub:
docker pull vibpsb/ksrates:latest
To update the Apptainer container image, first remove the old image (when using the ksrates Nextflow pipeline the image is stored in the cacheDir directory set in the nextflow.config or, if not set, by default in work/singularity in the project folder):
rm vibpsb-ksrates-latest.img
The next time the pipeline is launched, Nextflow will automatically pull the new image from Docker Hub.
Alternatively, run the following command in the same directory as the old image to manually pull the new image from Docker Hub:
apptainer pull vibpsb-ksrates-latest.img docker://vibpsb/ksrates:latest
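Before removing an old image it can help to confirm where it is stored. The sketch below lists matching image files in a given directory, falling back to the default work/singularity location mentioned above. The helper name `list_ksrates_images`, the CACHE_DIR variable and the `vibpsb-ksrates-*.img` file name pattern are assumptions extrapolated from the image name used in this section.

```shell
# Sketch: list ksrates Apptainer images in a cache directory.
# list_ksrates_images and CACHE_DIR are hypothetical names; the file name
# pattern is assumed from vibpsb-ksrates-latest.img.
list_ksrates_images() {
  dir=${1:-work/singularity}
  for img in "$dir"/vibpsb-ksrates-*.img; do
    # An unmatched glob stays literal, so only print paths that exist.
    if [ -e "$img" ]; then
      echo "$img"
    fi
  done
  return 0
}

# Set CACHE_DIR to the cacheDir value from nextflow.config, if one is configured.
list_ksrates_images "${CACHE_DIR:-work/singularity}"
```

Review the printed paths before deleting anything; an empty result means no image was found in that directory.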
To update your manual installation, uninstall the old version of the ksrates package, clone the ksrates repository from GitHub and re-install the package:
pip3 uninstall ksrates
git clone https://github.com/VIB-PSB/ksrates
# Starting from v2.0.0, when running outside a container, download also this compressed file:
wget https://zenodo.org/records/15225340/files/original_angiosperm_sequences.tar.gz -P ksrates/ksrates/reciprocal_retention
# Move to the ksrates subdirectory and install the package with pip
cd ksrates
pip3 install .