

ABySS – Assembly by Short Sequences


Project Title: ABySS – Assembly by Short Sequences
Track Code: INV-10-004
Short Description: ABySS software is a novel parallel algorithm to assemble short sequencing data
Tags: bioinformatics, biotechnology, genetics, genomics, r&d discovery, software
Posted Date: Mar 10, 2010 8:30 AM


To assemble the very large data sets produced by sequencing individual human genomes, we have developed ABySS (Assembly By Short Sequences). The primary innovation in ABySS is a distributed representation of a de Bruijn graph, which allows parallel computation of the assembly algorithm across a network of commodity computers.
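The idea of a distributed de Bruijn graph can be illustrated with a minimal sketch. This is not the ABySS implementation: the function names, the CRC32 hash, and the in-process list of sets standing in for networked computers are all assumptions made for illustration. The sketch shows the general principle that each k-mer is assigned to one node by hashing, so any node can work out where a neighbouring k-mer lives without a central index.

```python
import zlib

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def owner(kmer, num_nodes):
    """Map a k-mer to the (hypothetical) compute node that stores it.

    CRC32 is an illustrative choice of hash, not the one ABySS uses.
    """
    return zlib.crc32(canonical(kmer).encode("ascii")) % num_nodes


def build_partitions(reads, k, num_nodes):
    """Place every k-mer of every read into the set held by its owning node."""
    partitions = [set() for _ in range(num_nodes)]
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = canonical(read[i:i + k])
            partitions[owner(kmer, num_nodes)].add(kmer)
    return partitions


def successors(kmer, partitions):
    """List the k-mers that extend `kmer` by one base.

    Each of the four candidates is looked up only on the node that would own it.
    """
    found = []
    for base in "ACGT":
        candidate = kmer[1:] + base
        node = owner(candidate, len(partitions))
        if canonical(candidate) in partitions[node]:
            found.append(candidate)
    return found


if __name__ == "__main__":
    parts = build_partitions(["ACGTAC"], k=3, num_nodes=4)
    print(successors("ACG", parts))
```

Because a k-mer and its reverse complement hash to the same node, both strands of a sequence are stored once, and graph edges are never stored explicitly: they are recovered by querying the four possible extensions.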

Potential Applications

Short read assembler for very large datasets.

State of Development

Version 1.1.2 is available for download.



Widespread adoption of massively parallel deoxyribonucleic acid (DNA) sequencing instruments has prompted the recent development of de novo short read assembly algorithms. A common shortcoming of the available tools is their inability to efficiently assemble vast amounts of data generated from large-scale sequencing projects, such as the sequencing of individual human genomes to catalog natural genetic variation. To address this limitation, we developed ABySS (Assembly By Short Sequences), a parallelized sequence assembler.


To put the performance of ABySS in context, we compared it with previously published short read assemblers. We used a data set consisting of 20.8 million paired-end 36 bp Illumina reads from a 200 bp insert E. coli library (NCBI Short Read Archive, accession no. SRX000429). We performed assemblies with ABySS and four other short read assemblers. All the assemblers were able to accurately reconstruct the majority of the E. coli genome with contigs ≥100 bp. However, contig size and accuracy varied widely among the assemblers. ABySS's performance is competitive with the other short read assemblers.


Name Price
ABySS Free Download