Educational Cluster Applications

The applications offered on the HPC Educational Cluster, Centaurus and GPU, are a subset of what is available on the Research Cluster. This is mainly due to limitations related to licensing.

Faculty who will be teaching a course are encouraged to check this list to see whether the applications they would like to use for their classes are available on the Educational Cluster. They may request that software or codes be installed, but the request must be made before the start of the semester in which the class will be taught so that we have time to configure, build, install, and test the application in our environment. To keep the environment consistent throughout the semester, we typically do not install new applications or update existing ones while the semester is in progress.

URC software

Applications

abaqus
ABAQUS is used for both modeling and analyzing mechanical components and assemblies (pre-processing) and visualizing the finite element analysis results.
versions available: 2021, 2022, 2024

abyss
ABySS, Assembly by Short Sequences, is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.
versions available: 2.2.5, 2.3.7

admixture
ADMIXTURE is a software tool for maximum likelihood estimation of individual ancestries from multilocus SNP genotype datasets. It uses the same statistical model as STRUCTURE but calculates estimates much more rapidly using a fast numerical optimization algorithm.
versions available: 1.3.0

alphafold
An implementation of the inference pipeline of AlphaFold v2.0. This is a completely new model that was entered in CASP14 and published in Nature.
versions available: 2.3.1

aria
ARIA (Ambiguous Restraints for Iterative Assignment) is software for automated NOE assignment and NMR structure calculation. It speeds up and automates the assignment process through the use of an iterative structure calculation scheme. Additionally, a refinement in explicit water improves the quality of the calculated structures, validation tests help spectroscopists to judge the quality of the final structures, and the support of the CCPN data model simplifies the exchange of information with other NMR software packages.
versions available: 2.3.2

artic
ARTIC is a pipeline and set of accompanying tools for working with viral nanopore sequencing data, generated from tiling amplicon schemes. It is designed to help run the artic bioinformatics protocols; for example the SARS-CoV-2 coronavirus protocol. There are 2 workflows baked into this pipeline, one which uses signal data (via nanopolish) and one that does not (via medaka).
versions available: 1.2.4

augustus
AUGUSTUS is a gene prediction program for eukaryotes. It can be used as an ab initio program, which means it bases its prediction purely on the sequence.
versions available: 3.4.0, 3.5.0

aws-cli
The AWS Command Line Interface (AWS CLI) is an open source tool that enables you to interact with AWS services using commands in your command-line shell. With minimal configuration, the AWS CLI enables you to start running commands that implement functionality equivalent to that provided by the browser-based AWS Management Console from the command prompt in your terminal program.
versions available: 2.15.34

bamtools
BamTools is a project that provides both a C++ API and a command-line toolkit for reading, writing, and manipulating BAM (genome alignment) files.
versions available: 2.4.1, 2.5.2

bayesase
BayesASE is a complete bioinformatics pipeline that incorporates state-of-the-art error reduction techniques and a flexible Bayesian approach to estimating Allelic Imbalance (AI) and formally comparing levels of AI between conditions. AI indicates the presence of functional variation in cis regulatory regions. Detecting cis regulatory differences using AI is widespread, yet there is no formal statistical methodology that tests whether AI differs between conditions.
versions available: 21.1.13

beagle
Beagle is a software package for phasing genotypes and imputing ungenotyped markers. Beagle has improved memory and computational efficiency when analyzing large sequence data sets.
versions available: 5.4

bedtools2
Collectively, the bedtools utilities are a Swiss-army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: that is, set theory on the genome.
versions available: 2.29.0, 2.31.1
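
As a rough illustration of what "genome arithmetic" means, the Python sketch below intersects two sets of intervals on a single chromosome. It is a conceptual toy and not how bedtools itself is implemented; the function name `intersect_intervals` and the sample coordinates are hypothetical, and each input list is assumed to be sorted, non-overlapping, and made of half-open (start, end) pairs.

```python
def intersect_intervals(a, b):
    """Return the overlaps between two sorted lists of half-open (start, end) intervals.

    Assumption: within each list the intervals are sorted and do not overlap.
    """
    overlaps = []
    i = j = 0
    while i < len(a) and j < len(b):
        # The overlap, if any, runs from the later start to the earlier end.
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            overlaps.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return overlaps


# Hypothetical coordinates, for illustration only.
genes = [(100, 200), (300, 400)]
peaks = [(150, 350), (390, 500)]
print(intersect_intervals(genes, peaks))
```

For real data, the bedtools utilities themselves handle multiple chromosomes, strands, and file formats, which this sketch does not attempt.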

blast
NCBI BLAST (Basic Local Alignment Search Tool) is a suite of programs for aligning query sequences against those present in a selected target database.
versions available: 2.11.0+, 2.15.0+

blatsuite
Blat produces two major classes of alignments: 1) at the DNA level between two sequences that are of 95% or greater identity, but which may include large inserts, 2) at the protein or translated DNA level between sequences that are of 80% or greater identity and may also include large inserts. (v37 / 64-bit)
versions available: 36, 37, 38

bowtie2
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes.
versions available: 2.5.1, 2.5.3

bracken
Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample. Bracken uses the taxonomy labels assigned by Kraken, a highly accurate metagenomics classification algorithm, to estimate the number of reads originating from each species present in a sample.
versions available: 2.9

braker
BRAKER is a pipeline for fully automated prediction of protein-coding genes in novel eukaryotic genomes. It combines GeneMark and AUGUSTUS and can use RNA-Seq and/or protein homology evidence to train them.
versions available: 3.0.7

busco
BUSCO provides quantitative measures for the assessment of genome assembly, gene set, and transcriptome completeness, based on evolutionarily-informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB v9.
versions available: 3.0.2, 4.1.4, 5.7.1

bwa
BWA (Burrows-Wheeler Aligner) is a software package for mapping DNA sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM.
versions available: 0.7.17

canu
Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing, such as the PacBio RS II/Sequel or Oxford Nanopore MinION.
versions available: 2.1.1, 2.2

cd-hit
CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences. CD-HIT is very fast and can handle extremely large databases. CD-HIT helps to significantly reduce the computational and manual efforts in many sequence analysis tasks and aids in understanding the data structure and correcting the bias within a dataset.
versions available: 4.8.1

checkm
CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes. It provides robust estimates of genome completeness and contamination by using collocated sets of genes that are ubiquitous and single-copy within a phylogenetic lineage. CheckM also provides tools for identifying genome bins that are likely candidates for merging based on marker set compatibility, similarity in genomic characteristics, and proximity within a reference genome tree.
versions available: 1.2.2

clustal-omega
Clustal Omega is the latest addition to the Clustal family. It offers a significant increase in scalability over previous versions, allowing hundreds of thousands of sequences to be aligned in only a few hours. In addition, the quality of alignments is superior to previous versions.
versions available: 1.2.4

comsol
COMSOL Multiphysics is a finite element analysis, solver, and simulation software package for various physics and engineering applications, especially coupled phenomena and multiphysics. The software facilitates conventional physics-based user interfaces and coupled systems of partial differential equations (PDEs).
versions available: 6.2

cufflinks
Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.
versions available: 2.2.1

ddocent
dDocent is a simple bash wrapper to QC, assemble, map, and call SNPs from almost any kind of RAD sequencing. If you have a reference already, dDocent can be used to call SNPs from almost any type of NGS data set.
versions available: 2.9.8

decona
From demultiplexing to consensus for Nanopore amplicon data, Decona can process multiple samples in one line of code. It handles mixed samples containing multiple species from bulk and eDNA, mixed amplicons in one barcode, multiplexed barcodes, and multiple samples in one run, and it outputs Medaka-polished consensus sequences.
versions available: 1.3.1

degenprime
DeGenPrime selects the top PCR primer pairs for one or more phylogenetically similar DNA sequences, aligned or unaligned, by minimizing the melting temperature difference between forward and reverse primers that pass the following filter checks: low degeneracy, few deletions, GC content within the 40-60% range, low repetition, non-complementary ends, minimal risk of hairpins or self- and cross-dimerization, and melting temperature within a specified range. The user can specify the melting temperature range for PCR primers, but it must fall within the absolute range of 50.0 to 65.0 degrees Celsius. DeGenPrime applies these as hard filters with no exceptions. If no primers pass all of the filters, the program warns the user that no suitable primers were found.
versions available: 0.1.2
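
To make one of the filters above concrete, here is a minimal Python sketch of a GC-content check in the 40-60% range. It is not taken from DeGenPrime; the function names `gc_fraction` and `passes_gc_filter` are hypothetical, and treating both boundaries as inclusive is an assumption.

```python
def gc_fraction(primer):
    """Return the fraction of G and C bases in a primer sequence."""
    if not primer:
        raise ValueError("primer must not be empty")
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def passes_gc_filter(primer, low=0.40, high=0.60):
    """Hard filter: keep the primer only if its GC fraction lies in [low, high].

    Assumption: both bounds are inclusive; the source only states "40-60%".
    """
    return low <= gc_fraction(primer) <= high


# Hypothetical primers, for illustration only.
for candidate in ["ATGCATGCAT", "ATATATATAT", "GCGCGCGCGC"]:
    print(candidate, passes_gc_filter(candidate))
```

A hard filter of this kind simply rejects a candidate outright rather than scoring it, which matches the "no exceptions" behavior described above.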

diamond
DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data. The key features are: Pairwise alignment of proteins and translated DNA at 100x-10,000x speed of BLAST; Frameshift alignments for long read analysis; Low resource requirements and suitable for running on standard desktops or laptops; Various output formats, including BLAST pairwise, tabular and XML, as well as taxonomic classification.
versions available: 2.0.9, 2.1.9

dm_control
DeepMind’s software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo physics.
versions available: 1.0.16

dram
DRAM (Distilled and Refined Annotation of Metabolism) is a tool for annotating metagenomic assembled genomes and VirSorter identified viral contigs. DRAM annotates MAGs and viral contigs using KEGG (if provided by the user), UniRef90, PFAM, dbCAN, RefSeq viral, VOGDB and the MEROPS peptidase database as well as custom user databases.
versions available: 1.5.0

emboss
EMBOSS (European Molecular Biology Open Software Suite) is a software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community. The software automatically copes with data in a variety of formats and even allows transparent retrieval of sequence data from the web.
versions available: 6.6.0

examl
This code implements the popular RAxML search algorithm for maximum likelihood based inference of phylogenetic trees. It uses a radically new MPI parallelization approach that yields improved parallel efficiency, in particular on partitioned multi-gene or whole-genome datasets.
versions available: 3.0.21, 3.0.22

exonerate
Exonerate is a generic tool for sequence alignment.
versions available: 2.4.0

famsa
FAMSA is Fast and Accurate Multiple Sequence Alignment of large protein families. It first determines the longest common subsequences and has a unique way to compute gap costs. It proceeds progressively to add sequences into the alignments using a novel iterative approach.
versions available: 2.2.2

fastqc
FastQC is a quality control tool for high throughput sequence data. It takes a FastQ file and runs a series of tests on it to generate a comprehensive QC report. FastQC can be run either as an interactive GUI app, or in a non-interactive way (say as part of a pipeline) which will generate an HTML report for each file you process.
versions available: 0.11.9, 0.12.1

ffmpeg
FFmpeg is the leading multimedia framework, able to decode, encode, transcode, mux, demux, stream, filter and play pretty much anything that humans and machines have created. It supports the most obscure ancient formats up to the cutting edge. It contains libavcodec, libavutil, libavformat, libavfilter, libavdevice, libswscale and libswresample, which can be used by applications, as well as ffmpeg, ffserver, ffplay and ffprobe, which can be used by end users for transcoding, streaming and playing.
versions available: 4.2.1, 6.1.1, 7.0.1

fluent
Ansys Fluent is a general-purpose computational fluid dynamics (CFD) software used to model fluid flow, heat and mass transfer, chemical reactions, and more. Also known for its efficient HPC scaling, large models can easily be solved in Fluent on multiple processors on either CPU or GPU.
versions available: 2022

flye
Flye is a de novo assembler for single-molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies. It is designed for a wide range of datasets, from small bacterial projects to large mammalian-scale assemblies. The package represents a complete pipeline: it takes raw PacBio / ONT reads as input and outputs polished contigs. Flye also has a special mode for metagenome assembly.
versions available: 2.9.3

freyja
Freyja is a tool to recover relative lineage abundances from mixed SARS-CoV-2 samples from a sequencing dataset (BAM aligned to the Hu-1 reference). The method uses lineage-determining mutational ‘barcodes’ derived from the UShER global phylogenetic tree as a basis set to solve the constrained (unit sum, non-negative) de-mixing problem.
versions available: 1.4.9

genemark
GeneMark, developed in 1993, was the first gene finding method recognized as an efficient and accurate tool for genome projects. GeneMark was used for annotation of the first completely sequenced bacterium, Haemophilus influenzae, and the first completely sequenced archaeon, Methanococcus jannaschii.
versions available: 4.72

gromacs
GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions, many groups are also using it for research on non-biological systems, e.g. polymers.
versions available: 2023.4, 2023.4-cuda, 2023.4-mpi, 2023.4-mpi-cuda, 2024.2, 2024.2-cuda, 2024.2-mpi, 2024.2-mpi-cuda

gtdbtk
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB). It is designed to work with recent advances that allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples. It can also be applied to isolate and single-cell genomes.
versions available: 2.3.2

guppy
Guppy is a data processing toolkit that contains the Oxford Nanopore Technologies’ basecalling algorithms, and several bioinformatic post-processing features. Early downstream analysis components such as barcoding/demultiplexing, adapter trimming and alignment are contained within Guppy.
versions available: 6.0.6, 6.0.6-cuda11.8, 6.3.4, 6.3.4-cuda11.8, 6.5.7, 6.5.7-cuda11.8

hhsuite
The HH-suite is an open-source software package for sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs). It contains HHsearch and HHblits among other programs and utilities. HHsearch takes as input a multiple sequence alignment (MSA) or profile HMM and searches a database of HMMs (e.g. PDB, Pfam, or InterPro) for homologous proteins.
versions available: 3.3.0

hicexplorer
HiCExplorer facilitates the creation of contact matrices, correction of contacts, TAD detection, A/B compartments, merging, reordering of chromosomes, conversion from different formats including cooler and detection of long-range contacts. Moreover, it allows the visualization of multiple contact matrices along with other types of data like genes, compartments, ChIP-seq coverage tracks (and in general any type of genomic scores), long range contacts and the visualization of viewpoints.
versions available: 3.7.3

hmmer
HMMER is used for searching sequence databases for sequence homologs, and for making sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).
versions available: 3.3.2, 3.4

homopolish
Homopolish is a genome polisher originally developed for Nanopore and subsequently extended for PacBio CLR. It generates a high-quality genome (>Q50) for virus, bacteria, and fungus. Nanopore/PacBio systematic errors are corrected by retrieving homologs from closely-related genomes and polished by an SVM.
versions available: 0.4.1

humann
HUMAnN is the HMP Unified Metabolic Analysis Network. HUMAnN is a method for efficiently and accurately profiling the abundance of microbial metabolic pathways and other molecular functions from metagenomic or metatranscriptomic sequencing data.
versions available: 3.8

interproscan
InterPro is a database which integrates together predictive information about proteins’ function from a number of partner resources, giving an overview of the families that a protein belongs to and the domains and sites it contains.
versions available: 5.60-92.0, 5.67-99.0

iphop
iPHoP stands for integrated Phage Host Prediction. It is an automated command-line pipeline for predicting host genus of novel bacteriophages and archaeoviruses based on their genome sequences.
versions available: 1.3.3

iqtree
A fast and effective stochastic algorithm to infer phylogenetic trees by maximum likelihood. IQ-TREE compares favorably to RAxML and PhyML in terms of likelihoods with similar computing time.
versions available: 1.6.12, 2.1.2, 2.3.4

i-tasser
I-TASSER is an integrated package for protein structure and function predictions. For a given sequence, I-TASSER first identifies template proteins from the Protein Data Bank (PDB) by multiple threading techniques (LOMETS).
versions available: 5.1, 5.2

jellyfish
JELLYFISH is a tool for fast, memory-efficient counting of k-mers in DNA. JELLYFISH can count k-mers using an order of magnitude less memory and an order of magnitude faster than other k-mer counting packages by using an efficient encoding of a hash table and by exploiting the ‘compare-and-swap’ CPU instruction to increase parallelism.
versions available: 2.3.0, 2.3.1
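
For readers new to the term, the Python sketch below shows what counting k-mers means: tallying every overlapping substring of length k in a sequence. It is a conceptual illustration only and does not reflect JELLYFISH's hash-table encoding or its parallelism; the function name `count_kmers` and the sample sequence are hypothetical.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in a DNA sequence."""
    # range() is empty when k exceeds the sequence length, giving an empty Counter.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Hypothetical sequence, for illustration only.
counts = count_kmers("ACGTACGT", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

A plain dictionary like this becomes memory-hungry on genome-scale data, which is the problem dedicated k-mer counters are built to address.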

kraken2
Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
versions available: 2.1.2, 2.1.3

lammps
LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. Packages built: ASPHERE ATC AWPMD BOCS BODY CLASS2 COLLOID COLVARS COMPRESS CORESHELL DIFFRACTION DIPOLE DRUDE EFF EXTRA-MOLECULE FEP GRANULAR H5MD KIM KSPACE MANIFOLD MANYBODY MC MEAM MGPT MISC MOFFF MOLECULE MOLFILE MPIIO OPT PERI PHONON POEMS PTM PYTHON QEQ QTB REAXFF REPLICA RIGID SHOCK SMTBQ SPH SPIN SRD TALLY UEF VORONOI
versions available: 02Aug23-cuda, 02Aug23-mpi, 23Jun22-cuda, 23Jun22-mpi

liggghts
LIGGGHTS(R)-PUBLIC is an Open Source Discrete Element Method Particle Simulation Software based on LAMMPS. LIGGGHTS (R) stands for LAMMPS improved for general granular and granular heat transfer simulations. LIGGGHTS (R) aims to improve the capabilities of LAMMPS with the goal to apply it to industrial applications.
versions available: 3.8.0

longstitch
A genome assembly correction and scaffolding pipeline using long reads, consisting of up to three steps: 1) Tigmint cuts the draft assembly at potentially misassembled regions, 2) ntLink scaffolds the corrected assembly, and 3) ARKS optionally performs a further round of scaffolding.
versions available: 1.0.5

mafft
MAFFT is a multiple alignment program for amino acid or nucleotide sequences. It offers a range of multiple alignment methods, L-INS-i (accurate; for alignment of <∼200 sequences), FFT-NS-2 (fast; for alignment of <∼30,000 sequences), etc.
versions available: 7.487woe, 7.525woe

maker
MAKER is a portable and easily configurable genome annotation pipeline. Its purpose is to allow smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases. MAKER identifies repeats, aligns ESTs and proteins to a genome, produces ab-initio gene predictions and automatically synthesizes these data into gene annotations having evidence-based quality values.
versions available: 3.01.03

masurca
The MaSuRCA (Maryland Super Read Cabog Assembler) genome assembly and analysis toolkit consists of the MaSuRCA genome assembler, QuORUM error corrector for Illumina data, POLCA genome polishing software, Chromosome scaffolder, jellyfish mer counter, and MUMmer aligner. The MaSuRCA assembler combines the benefits of deBruijn graph and Overlap-Layout-Consensus assembly approaches. MaSuRCA supports hybrid assembly with short Illumina reads and long high error PacBio/MinION data.
versions available: 4.1.0, 4.1.1

mathematica
Mathematica is a software package which is ideal for communicating scientific ideas, whether this is visualization of a concept in an intro-level course, or creating a simulation of a new idea related to research.
versions available: 12.3.1, 13.3.0

mcr
The MATLAB Compiler Runtime is a standalone set of shared libraries that enables the execution of compiled MATLAB applications or components on computers that do not have MATLAB installed. When used together, MATLAB, MATLAB Compiler, and the MATLAB Runtime enable you to create and distribute numerical applications or software components quickly and securely.
versions available: R2018a, R2018b, R2019b

mega
The objective of the MEGA (Molecular Evolutionary Genetics Analysis) software has been to provide tools for exploring, discovering, and analyzing DNA and protein sequences from an evolutionary perspective. MEGA is designed to facilitate extensive sequence data analysis from an evolutionary perspective using a single program package. At the same time, the overlap between the methods implemented in MEGA and those in other existing evolutionary analysis programs has been consciously avoided. This is reflected in the exclusion of the maximum likelihood method (PHYLIP) and in the absence of extensive options for the maximum parsimony method (PAUP and MacClade).
versions available: 10.2.6, 11.0.13

megadock
MEGADOCK is ultra-high-performance FFT-grid-based protein-protein docking software for heterogeneous supercomputers that takes advantage of the massively parallel CUDA architecture of NVIDIA GPUs and multiple computation nodes.
versions available: 4.1.1, 4.1.1-mpi

megalodon
Megalodon is a research command line tool to extract high accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information rich basecalling neural network output to a reference genome/transcriptome.
versions available: 2.5.0

mercat2
MerCat2 is Python code for a versatile k-mer counter for database independent property analysis (DIPA) in omics analysis.
versions available: 1.4.1

metabolic
METABOLIC enables the prediction of metabolic and biogeochemical functional trait profiles for any given genome dataset. METABOLIC has two main implementations, which are METABOLIC-G and METABOLIC-C. METABOLIC-G.pl allows for generation of metabolic profiles and biogeochemical cycling diagrams of input genomes and does not require input of sequencing reads. METABOLIC-C.pl generates the same output as METABOLIC-G.pl, but as it allows for the input of metagenomic read data, it will generate information pertaining to community metabolism.
versions available: 4.0

metacerberus
MetaCerberus transforms raw sequencing (i.e. genomic, transcriptomics, metagenomics, metatranscriptomic) data into knowledge. It is a start to finish python code for versatile analysis of the Functional Ontology Assignments for Metagenomes (FOAM), KEGG, CAZy/dbCAN, VOG, pVOG, PHROG, COG, and a variety of other databases including user customized databases via Hidden Markov Models (HMM) for functional annotation for complete metabolic analysis across the tree of life (i.e., bacteria, archaea, phage, viruses, eukaryotes, and whole ecosystems).
versions available: 1.3.1

metaphlan
MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level resolution. With the newly added StrainPhlAn module, it is now possible to perform accurate strain-level microbial profiling.
versions available: 4.1.0

microbeannotator
MicrobeAnnotator uses an iterative approach to annotate microbial genomes (Bacteria, Archaea and Virus) starting from proteins predicted using your favorite ORF prediction tool, e.g. Prodigal. The iterative approach is composed of three or five main steps, depending on the flavor of MicrobeAnnotator you run.
versions available: 2.0.5

minialign

versions available: 0.6.0

mira
MIRA is a whole genome shotgun and EST sequence assembler for Sanger, 454, Solexa (Illumina), IonTorrent data and PacBio (the latter at the moment only CCS and error-corrected CLR reads). It can be seen as a Swiss army knife of sequence assembly developed and used in the past 16 years to get assembly jobs done efficiently, and especially accurately.
versions available: 5rc2

mitobim
The MITObim procedure (mitochondrial baiting and iterative mapping) represents a highly efficient approach to assembling novel mitochondrial genomes of non-model organisms directly from total genomic DNA derived NGS reads. Labor intensive long-range PCR steps prior to sequencing are no longer required.
versions available: 1.9.1

mrbayes
MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters.
versions available: 3.2.7

msprime
msprime is a population genetics simulator of ancestry and DNA sequence evolution based on tskit. msprime can simulate ancestral histories for a sample of individuals, consistent with a given demography under a range of different models and evolutionary processes. It can also simulate mutations on a given ancestral history (which can be produced by msprime ancestry simulations or other programs supporting tskit) under a variety of different models of genome sequence evolution.
versions available: 1.3.1

multiqc
MultiQC is a tool to create a single report with interactive plots for multiple bioinformatics analyses across many samples. MultiQC searches a given directory for analysis logs and compiles an HTML report. It’s a general use tool, perfect for summarising the output from numerous bioinformatics tools.
versions available: 1.21

namd
NAMD (2.14 x86_64 mpi) is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 500,000 cores for the largest simulations.
versions available: 2.13-mcore, 2.13-mcore-cuda, 2.13-mpi, 2.14-mcore, 2.14-mcore-cuda, 2.14-mpi, 3.0b6-mcore, 3.0b6-mcore-cuda, 3.0b6-mpi

nanopolish
Software package for signal-level analysis of Oxford Nanopore sequencing data. Nanopolish can calculate an improved consensus sequence for a draft genome assembly, detect base modifications, call SNPs and indels with respect to a reference genome and more.
versions available: 0.14.0

netlogo
NetLogo is a programmable modeling environment for simulating natural and social phenomena. NetLogo is particularly well suited for modeling complex systems developing over time.
versions available: 6.2.2, 6.4.0

nextstrain
Nextstrain is an open-source project to harness the scientific and public health potential of pathogen genome data. We provide a continually-updated view of publicly available data alongside powerful analytic and visualization tools for use by the community. Our goal is to aid epidemiological understanding and improve outbreak response.
versions available: 8.2.0

nf-core
Nextflow is an incredibly powerful and flexible workflow language. Nextflow lets you run nf-core pipelines on virtually any computing environment. nf-core pipelines adhere to strict guidelines – if one works, they all will.
versions available: 2.13.1

node.js
Node.js is a JavaScript runtime built on Chrome’s V8 JavaScript engine. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient. As an asynchronous event-driven JavaScript runtime, Node.js is designed to build scalable network applications.
versions available: 18.20.2, 20.12.2

openbabel
Open Babel is a chemical toolbox designed to speak the many languages of chemical data. It’s an open, collaborative project allowing anyone to search, convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas.
versions available: 3.1.1

openfoam
OpenFOAM is the free, open source CFD software released and developed primarily by the OpenFOAM Foundation. OpenFOAM has an extensive range of features to solve anything from complex fluid flows involving chemical reactions, turbulence and heat transfer, to acoustics, solid mechanics and electromagnetics. We offer versions from both OpenCFD Ltd and the OpenFOAM Foundation.
versions available: 11, 12

orp
The Oyster River Protocol for (eukaryotic) transcriptome assembly is an actively developed, evidence-based method for optimizing transcriptome assembly. The protocol assembles the transcriptome using a multi-kmer multi-assembler approach, then merges those assemblies into 1 final assembly. Version 2.3.3u1 is an update to ORP 2.3.3 based on Anaconda3-2023.03-1, along with the following updated components: trinity 2.15.1, salmon 1.10.1, spades 3.15.5, busco 5.1.3, rcorrector 1.0.5, samtools 1.17, cd-hit 4.8.1, diamond 2.1.6.
versions available: 2.3.3u1

orthofinder
OrthoFinder is a fast, accurate and comprehensive platform for comparative genomics. It finds orthogroups and orthologs, infers rooted gene trees for all orthogroups and identifies all of the gene duplication events in those gene trees.
versions available: 2.4.0, 2.5.5

pacbio
The PacBio tools distributed via Bioconda are pre-release versions, not necessarily ISO compliant, intended for Research Use Only and not for use in diagnostic procedures. This module includes several of the PacBio open source tools, including blasr, python-consensuscore, genomicconsensus, bam2fastx, bax2bam, isoseq, jasmine, recalladapters, trgt, isoseq3, lima, perl-yaml, pbmm2, pbpigeon, pbskera, pbsv, pbtk, pb-falcon, pb-dazzler, pb-assembly, pbccs, pbcore, pbcommand, pbcoretools, pbalign, pbbam, pbaa, pbcopper, pbfusion, pbmarkdup
versions available: 2024.05

pairtools
pairtools is a simple and fast command-line framework to process sequencing data from a Hi-C experiment. pairtools processes paired-end sequence alignments and performs the following operations: detects ligation junctions (a.k.a. Hi-C pairs) in aligned paired-end sequences of Hi-C DNA molecules; sorts .pairs files for downstream analyses; detects, tags, and removes PCR/optical duplicates; generates extensive statistics of Hi-C datasets; selects Hi-C pairs given flexibly defined criteria; and restores .sam alignments from Hi-C pairs.
versions available: 1.0.3

paml
PAML (Phylogenetic Analysis by Maximum Likelihood) is a program package for model fitting and phylogenetic tree reconstruction using DNA and protein sequence data. The programs are written in ANSI C.
versions available: 4.10.7

pangolin
Pangolin (Phylogenetic Assignment of Named Global Outbreak Lineages) was developed to implement the dynamic nomenclature of SARS-CoV-2 lineages, known as the Pango nomenclature. It allows a user to assign the most likely lineage (Pango lineage) to SARS-CoV-2 query sequences.
versions available: 4

parallel
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input.
versions available: 20240422
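The "one job per input line" model described for GNU parallel can be illustrated with a small Python analogue. This is only a sketch of the idea using the standard library, not GNU parallel itself; `run_job` and `run_per_line` are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(line: str) -> str:
    """Hypothetical job: report the length of one input line."""
    return f"{line}\t{len(line)}"


def run_per_line(lines, job=run_job, max_workers=4):
    """Run job(line) for every non-empty input line, several at a time.

    Results are returned in input order, similar in spirit to
    GNU parallel's --keep-order option.
    """
    stripped = [ln.strip() for ln in lines if ln.strip()]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(job, stripped))


if __name__ == "__main__":
    for result in run_per_line(["sample_A", "sample_B", "", "sample_C"]):
        print(result)
```

In a real workflow the job would typically be an external command run once per input line; the thread pool here stands in for the job slots that GNU parallel manages.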

paraview
ParaView is an open-source, multi-platform data analysis and visualization application. ParaView users can quickly build visualizations to analyze their data using qualitative and quantitative techniques.
versions available: 5.12.1-mpi, 5.12.1-osmesa-mpi

parcels
The OceanParcels project develops Parcels (Probably A Really Computationally Efficient Lagrangian Simulator), a set of Python classes and methods to create customisable particle tracking simulations using output from Ocean Circulation models. Parcels can be used to track passive and active particulates such as water, plankton, plastic and fish.
versions available: 2.1.5, 3.0.3

pcangsd
PCAngsd is a framework for analyzing low-depth next-generation sequencing (NGS) data in heterogeneous/structured populations using principal component analysis (PCA). Population structure is inferred by estimating individual allele frequencies in an iterative approach using a truncated SVD model. The covariance matrix is estimated using the estimated individual allele frequencies as prior information for the unobserved genotypes in low-depth NGS data.
versions available: 1.2

plumed2
PLUMED is a command-line tool for analyzing trajectories saved in most existing formats.
versions available: 2.9.1

poy
POY is a phylogenetic analysis program that supports multiple kinds of data (e.g. morphology, nucleotides, genes and gene regions, chromosomes, whole genomes, etc.). POY is unusual in that it can perform true alignment and phylogeny inference (i.e. input sequences need not be prealigned).
versions available: 5.1.2

prokka
Prokka performs rapid prokaryotic genome annotation. Whole genome annotation is the process of identifying features of interest in a set of genomic DNA sequences, and labelling them with useful information. Prokka is a software tool to annotate bacterial, archaeal and viral genomes quickly and produce standards-compliant output files.
versions available: 1.14.6

purge_haplotigs
A simple pipeline for reassigning primary contigs that should be labeled as haplotigs. Purge Haplotigs helps with curating heterozygous diploid genome assemblies from third-gen long-read sequencing.
versions available: 1.1.3

pytorch
PyTorch is a Python package that provides two high-level features: Tensor computation (like numpy) with strong GPU acceleration, and Deep Neural Networks built on a tape-based autodiff system. Each build is tied to the CUDA Toolkit version named in its version suffix (11.7 or 12.1).
versions available: 1.13.1-cuda11.7, 2.3.0-cuda12.1
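The phrase "tape-based autodiff" in the PyTorch description can be made concrete with a toy example. The following is a minimal pure-Python sketch of the idea for scalars, under the assumption that only addition and multiplication are needed; it does not use or reproduce PyTorch's API, and `Var` and `backward` are hypothetical names.

```python
class Var:
    """A scalar value that records each operation on a shared tape."""

    def __init__(self, value, tape):
        self.value = value
        self.grad = 0.0
        self.tape = tape

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)
        # Local derivatives: d(out)/d(self) = 1, d(out)/d(other) = 1
        self.tape.append((out, [(self, 1.0), (other, 1.0)]))
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)
        # Local derivatives: d(out)/d(self) = other, d(out)/d(other) = self
        self.tape.append((out, [(self, other.value), (other, self.value)]))
        return out


def backward(output):
    """Replay the tape in reverse, accumulating gradients by the chain rule."""
    output.grad = 1.0
    for out, parents in reversed(output.tape):
        for parent, local_grad in parents:
            parent.grad += local_grad * out.grad


tape = []
x = Var(3.0, tape)
y = Var(4.0, tape)
z = x * y + x          # z = x*y + x
backward(z)
print(x.grad, y.grad)  # dz/dx = y + 1 = 5.0, dz/dy = x = 3.0
```

The forward pass appends one entry per operation; the backward pass walks those entries in reverse order, which is what makes a single sweep sufficient.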

qe
Quantum Espresso (QE) is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials.
versions available: 7.1-intel-mpi, 7.3-intel-mpi

qiime2
QIIME 2 is a powerful, extensible, and decentralized microbiome analysis package with a focus on data and analysis transparency. QIIME 2 enables researchers to start an analysis with raw DNA sequence data and finish with publication-quality figures and statistical results.
versions available: 2024.2

quast
QUAST (Quality Assessment Tool for Genome Assemblies) evaluates genome/metagenome assemblies by computing various metrics. The current QUAST toolkit includes the general QUAST tool for genome assemblies, MetaQUAST, the extension for metagenomic datasets, QUAST-LG, the extension for large genomes (e.g., mammalians), and Icarus, the interactive visualizer for these tools.
versions available: 5.0.2

R
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Labs, by John Chambers and colleagues. R can be considered as a different implementation of S.
versions available: 4.2.2, 4.2.2-mpi, 4.3.3, 4.3.3-mpi

ragtag
RagTag, the successor to RaGOO, is a command line tool for reference-guided genome assembly improvement. Currently, the two main features are misassembly correction and scaffolding. After correction and/or scaffolding, RagTag also provides utilities to update annotations or work with AGP files.
versions available: 1.1.1, 2.1.0

raxml
RAxML (Randomized Axelerated Maximum Likelihood) is a program for sequential and parallel Maximum Likelihood based inference of large phylogenetic trees.
versions available: 8.2.13, 8.2.13-mpi, 8.2.4, 8.2.4-mpi

repdenovo
REPdenovo is designed for constructing repeats directly from sequence reads. It is based on the idea of frequent k-mer assembly. REPdenovo provides many functionalities, and can generate much longer repeats than existing tools.
versions available: 0.1.0
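To illustrate what "frequent k-mer" means in the REPdenovo description, here is a small sketch that counts k-mers across reads and keeps those seen at least a given number of times. It shows only the counting idea, not REPdenovo's actual algorithm or its assembly step; the function name and the `min_count` threshold are assumptions made for the example.

```python
from collections import Counter


def frequent_kmers(reads, k, min_count):
    """Return {kmer: count} for k-mers seen at least min_count times."""
    counts = Counter(
        read[i:i + k]
        for read in reads
        for i in range(len(read) - k + 1)
    )
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    reads = ["ACGTACGT", "CGTACG"]
    print(frequent_kmers(reads, k=3, min_count=3))
```

K-mers that come from repeated regions tend to appear far more often than those from unique sequence, which is why a frequency threshold is a natural first filter.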

repeatmasker
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns).
versions available: 4.1.2, 4.1.6
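The default masking behaviour described for RepeatMasker (annotated repeats replaced by Ns) can be pictured with a short sketch. This is not RepeatMasker code: `mask_repeats` is a hypothetical helper, and the 0-based, half-open interval convention is an assumption made for this example rather than the tool's own coordinate system.

```python
def mask_repeats(sequence, repeats):
    """Return sequence with each (start, end) interval replaced by Ns.

    Intervals are 0-based and half-open (an assumption of this sketch).
    Intervals that run past the sequence ends are clipped.
    """
    masked = list(sequence)
    for start, end in repeats:
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = "N"
    return "".join(masked)


if __name__ == "__main__":
    print(mask_repeats("ACGTACGTACGT", [(2, 5), (8, 10)]))  # ACNNNCGTNNGT
```

Masking preserves the sequence length, so coordinates in downstream annotations remain valid.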

repeatmodeler
RepeatModeler is a de-novo repeat family identification and modeling package. At the heart of RepeatModeler are two de-novo repeat finding programs (RECON and RepeatScout) which employ complementary computational methods for identifying repeat element boundaries and family relationships from sequence data. RepeatModeler assists in automating the runs of RECON and RepeatScout given a genomic database and uses the output to build, refine and classify consensus models of putative interspersed repeats.
versions available: 2.0.2, 2.0.5

repeatscout
The purpose of the RepeatScout software is to identify repeat family sequences from genomes where hand-curated repeat databases (a la RepBase update) are not available.
versions available: 1.0.6

rmblast
RMBlast is a RepeatMasker compatible version of the standard NCBI blastn program. The primary difference between this distribution and the NCBI distribution is the addition of a new program ‘rmblastn’ for use with RepeatMasker and RepeatModeler.
versions available: 2.11.0, 2.14.1

rnaquast
rnaQUAST is a tool for evaluating RNA-Seq assemblies using a reference genome and gene database. In addition, rnaQUAST is capable of estimating gene database coverage by raw reads and of de novo quality assessment using third-party software.
versions available: 2.3.0

rseqc
The RSeQC package provides a number of useful modules that can comprehensively evaluate high-throughput sequence data, especially RNA-seq data. Some basic modules quickly inspect sequence quality, nucleotide composition bias, PCR bias and GC bias, while RNA-seq specific modules evaluate sequencing saturation, mapped reads distribution, coverage uniformity, strand specificity, transcript level RNA integrity, etc.
versions available: 5.0.3

rstudio
RStudio is an integrated development environment (IDE) for R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.
versions available: 2024.04

samtools
Samtools is a suite of programs for interacting with high-throughput sequencing data, allowing you to read/write/edit/index/view SAM/BAM/CRAM format. This module includes BCFtools, which is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF.
versions available: 1.11, 1.19

seqtk
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ files which can also be optionally compressed by gzip.
versions available: 1.3, 1.4
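As an illustration of what parsing "both FASTA and FASTQ files, optionally compressed by gzip" involves, here is a small standard-library sketch. It is not Seqtk's implementation; the function names are hypothetical, the format is guessed from the first character of the file, FASTQ records are assumed to be exactly four lines each, and the whole file is read into memory for simplicity.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, transparently handling gzip compression."""
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == b"\x1f\x8b"  # gzip magic bytes
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def read_sequences(path):
    """Return a list of (name, sequence, quality); quality is None for FASTA."""
    with open_maybe_gzip(path) as handle:
        lines = [ln.rstrip("\r\n") for ln in handle if ln.strip()]
    if not lines:
        return []
    records = []
    if lines[0].startswith(">"):            # FASTA, sequences may wrap
        name, parts = None, []
        for ln in lines:
            if ln.startswith(">"):
                if name is not None:
                    records.append((name, "".join(parts), None))
                name, parts = ln[1:], []
            else:
                parts.append(ln)
        records.append((name, "".join(parts), None))
    elif lines[0].startswith("@"):          # FASTQ, four lines per record
        for i in range(0, len(lines) - 3, 4):
            records.append((lines[i][1:], lines[i + 1], lines[i + 3]))
    else:
        raise ValueError("not a FASTA or FASTQ file")
    return records
```

Calling `read_sequences` on a path (a hypothetical `reads.fq.gz`, for instance) returns one tuple per record regardless of format or compression, so downstream code does not need to care which kind of file it was given.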

singularity
Singularity is a free, cross-platform and open-source computer program that performs operating-system-level virtualization also known as containerization. One of the main uses of Singularity is to bring containers and reproducibility to scientific computing and the high-performance computing (HPC) world. Singularity containers can be used to package entire scientific workflows, software and libraries, and even data.
versions available: 4.1.4

snakemake
The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Workflows are described via a human readable, Python based language. They can be seamlessly scaled to server, cluster, grid and cloud environments, without the need to modify the workflow definition.
versions available: 8.5.3

spades
SPAdes (St. Petersburg genome assembler) is intended for both standard isolates and single-cell MDA bacteria assemblies. The current version of SPAdes works with Illumina or IonTorrent reads and is capable of providing hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. One can also provide additional contigs that will be used as long reads. SPAdes supports paired-end reads, mate-pairs and unpaired reads.
versions available: 3.15.5, 4.0.0

sra-tools
The Sequence Read Archive (SRA) stores raw sequence data from ‘next-generation’ sequencing technologies including Illumina, 454, IonTorrent, Complete Genomics, PacBio and Oxford Nanopore. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence. This module includes the NCBI VDB and NGS SDK.
versions available: 2.11.0, 3.1.0

starccm
STAR-CCM+ is much more than just a CFD solver; it is an entire engineering process for solving problems involving flow (of fluids or solids), heat transfer and stress.
versions available: 2021.3, 2022.1, 2023.10

syri
Synteny and Rearrangement Identifier (SyRI). SyRI is a comprehensive tool for predicting genomic differences between related genomes using whole-genome assemblies (WGA). The assemblies are aligned using whole-genome alignment tools, and these alignments are then used as input to SyRI.
versions available: 1.6.3

tensorflow
TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them.
versions available: 2.11.1-cuda11.2, 2.16.1-cuda12.5

tophat
TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
versions available: 2.1.1

transdecoder
TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.
versions available: 5.7.1

treetime
TreeTime provides routines for ancestral sequence reconstruction and inference of molecular-clock phylogenies, i.e., a tree where all branches are scaled such that the positions of terminal nodes correspond to their sampling times and internal nodes are placed at the most likely time of divergence.
versions available: 0.11.2

trimmomatic
Trimmomatic is a fast, multithreaded command line tool that can be used to trim and crop Illumina (FASTQ) data as well as to remove adapters. These adapters can pose a real problem depending on the library preparation and downstream application.
versions available: 0.38, 0.39

trinity
Trinity assembles transcript sequences from Illumina RNA-Seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads.
versions available: 2.14.0, 2.15.1

trycycler
Trycycler is a tool that takes as input multiple separate long-read assemblies of the same genome (e.g. from different assemblers or different read subsets) and produces a consensus long-read assembly.
versions available: 0.5.4

usearch
USEARCH is a unique sequence analysis tool with thousands of users world-wide. USEARCH offers search and clustering algorithms that are often orders of magnitude faster than BLAST. USEARCH combines many different algorithms into a single package.
versions available: 10.0.240, 11.0.667

velvet
Velvet is a sequence assembler for very short reads.
versions available: 1.2.10

viennarna
The ViennaRNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures.
versions available: 2.6.4

vtk
The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, modeling, image processing, volume rendering, scientific visualization, and information visualization.
versions available: 9.3.0, 9.3.0-mpi

wrf
The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed to serve both atmospheric research and operational forecasting needs.
versions available: 4.3.3, 4.3.3-mpi, 4.6.0, 4.6.0-mpi

Development

anaconda3
Anaconda is a distribution of Python for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.) that aims to simplify package management and deployment. Package versions in Anaconda are managed by the package management system conda. This package manager was spun out as a separate open-source package as it ended up being useful on its own and for things other than Python.
versions available: 2023.09

autoconf
Autoconf is an extensible package of M4 macros that produce shell scripts to automatically configure software source code packages. These scripts can adapt the packages to many kinds of UNIX-like systems without manual user intervention. Autoconf creates a configuration script for a package from a template file that lists the operating system features that the package can use, in the form of M4 macro calls.
versions available: 2.72

bazel
Bazel is Google’s own build tool. Bazel has built-in support for building both client and server software, and also provides an extensible framework that you can use to develop your own build rules.
versions available: 6.5.0, 7.1.1

cmake
CMake is a cross-platform, open-source build system. CMake is a family of tools designed to build, test and package software.
versions available: 3.29.0

cuda
The NVIDIA CUDA Toolkit provides a development environment for creating high performance GPU-accelerated applications. With the CUDA Toolkit, you can develop, optimize and deploy your applications on GPU-accelerated embedded systems, desktop workstations, enterprise data centers, cloud-based platforms and HPC supercomputers.
versions available: 11.8, 12.1, 12.4

dmd
DMD is the reference compiler for the D programming language. The D programming language has been said to be ‘what C++ wanted to be,’ which is a better C. D is developed with system level programming in mind, but brings to the table modern language design with a simple C-like syntax. For these reasons D makes for a good language choice for both performance code and application development.
versions available: 2.103.1, 2.108.0

gcc
The GNU Compiler Collection includes front ends for C, C++, Objective-C, and Fortran, as well as libraries for these languages (libstdc++, libgcj,…).
versions available: 12.3.0, 13.2.0

ghc
The Glasgow Haskell Compiler (GHC) is a compiler for Haskell, a general-purpose, statically typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming language features such as type classes, which enable type-safe operator overloading, and monadic IO. The language is named after logician Haskell Curry.
versions available: 9.8.2

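go
Go is an open-source, statically typed, compiled programming language developed at Google, with built-in support for concurrency.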
versions available: 1.21.8, 1.22.1

hpc-sdk
The NVIDIA HPC Software Development Kit (SDK) includes the proven compilers, libraries and software tools essential to maximizing developer productivity and the performance and portability of HPC applications. The NVIDIA HPC SDK C, C++, and Fortran compilers support GPU acceleration of HPC modeling and simulation applications with standard C++ and Fortran, OpenACC directives, and CUDA. GPU-accelerated math libraries maximize performance on common HPC algorithms, and optimized communications libraries enable standards-based multi-GPU and scalable systems programming.
versions available: 21.3, 24.1

intel
Intel® oneAPI DPC++/C++ Compiler: the Intel® oneAPI C/C++ and SYCL code compiler for CPUs, GPUs and FPGAs. URL: https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler.html. Dependencies: tbb, compiler-rt, oclfpga.
versions available: compiler-rt, mkl, oclfpga, tbb, 2024

julia
Julia is a high-level, high-performance dynamic programming language for numerical computing. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library.
versions available: 1.10.2

lua
Lua is a powerful, efficient, lightweight, embeddable scripting language. It supports procedural programming, object-oriented programming, functional programming, data-driven programming, and data description.
versions available: 5.4.6

mambaforge
Mamba is a reimplementation of the conda package manager in C++, which uses libsolv for much faster dependency solving and allows parallel downloading of repository data and package files using multi-threading. Mamba utilizes the same command line parser, package installation and deinstallation code and transaction verification routines as conda to stay as compatible as possible.
versions available: 23.11

nasm
The Netwide Assembler, NASM, is an 80x86 and x86-64 assembler designed for portability and modularity. It supports a range of object file formats, including Linux and *BSD a.out, ELF, COFF, Mach-O, 16-bit and 32-bit OBJ (OMF) format, Win32 and Win64.
versions available: 2.16

netbeans
NetBeans is a free, open source IDE that allows you to quickly and easily develop desktop, mobile and web applications with Java, HTML5, PHP, C/C++ and more.
versions available: 12.2

openjdk
OpenJDK (Open Java Development Kit) is a free and open-source implementation of the Java Platform, Standard Edition (Java SE). It is the result of an effort Sun Microsystems began in 2006. The implementation is licensed under the GNU General Public License (GNU GPL) version 2.
versions available: 21, 22

powershell
PowerShell is a cross-platform task automation solution made up of a command-line shell, a scripting language, and a configuration management framework. PowerShell is a modern command shell that includes the best features of other popular shells. Unlike most shells that only accept and return text, PowerShell accepts and returns .NET objects.
versions available: 7.3.1

scala
The name Scala is a contraction of ‘Scalable Language’. Scala is a pure-bred object-oriented language. Conceptually, every value is an object and every operation is a method-call. The language supports advanced component architectures through classes and traits.
versions available: 2.13.13, 3.4.1

swift
Swift is a general-purpose programming language built using a modern approach to safety, performance, and software design patterns. The goal of the Swift project is to create the best available language for uses ranging from systems programming, to mobile and desktop apps, scaling up to cloud services.
versions available: 5.10

yasm
YASM, an assembler and disassembler for the Intel x86 architecture, is a complete rewrite of the NASM assembler. YASM currently supports the x86 and AMD64 instruction sets, accepts NASM and GAS assembler syntaxes, outputs binary, ELF32, ELF64, 32 and 64-bit Mach-O, RDOFF2, COFF, Win32, and Win64 object formats, and generates source debugging information in STABS, DWARF 2, and CodeView 8 formats.
versions available: 1.3.0

Libraries

boost
Boost is a set of libraries for the C++ programming language that provide support for tasks and structures such as linear algebra, pseudorandom number generation, multithreading, image processing, regular expressions, and unit testing.
versions available: 1.84.0, 1.84.0-mpi

cudnn
The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.
versions available: 8.9.7-cuda11, 8.9.7-cuda12, 9.0.0-cuda11, 9.0.0-cuda12

hdf5
HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data.
versions available: 1.10.7, 1.10.7-mpi, 1.12.3, 1.12.3-intel-mpi, 1.12.3-mpi, 1.14.3, 1.14.3-intel-mpi, 1.14.3-mpi

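netcdf
NetCDF (Network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.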
versions available: 4.8.1, 4.8.1-mpi, 4.9.2, 4.9.2-mpi

openblas
OpenBLAS is an optimized BLAS (Basic Linear Algebra Subprograms) library based on GotoBLAS2 1.13 BSD version.
versions available: 0.3.27

MPI (Message Passing Interface)

intel-mpi
Intel(R) MPI Library. URL: https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html. Dependencies: none.
versions available: 2021.11

mpich
MPICH is a high-performance and widely portable implementation of the Message Passing Interface (MPI) standard (MPI-1, MPI-2 and MPI-3).
versions available: 4.2.2-mpirun, 4.2.2-mpirun-intel, 4.2.2-srun, 4.2.2-srun-intel

openmpi
The Open MPI Project is an open source Message Passing Interface (MPI) implementation that is developed and maintained by a consortium of academic, research, and industry partners.
versions available: 4.1.6, 4.1.6-intel, 5.0.2, 5.0.2-intel