./biology/minimap2, Sequence alignment program for noisy, long reads

[ CVSweb ] [ Homepage ] [ RSS ] [ Required by ] [ Add to tracker ]


Branch: CURRENT, Version: 2.18nb1, Package name: minimap2-2.18nb1, Maintainer: pkgsrc-users

Minimap2 is a versatile sequence alignment program that aligns DNA or
mRNA sequences against a large reference database. Typical use cases
include: (1) mapping PacBio or Oxford Nanopore genomic reads to the
human genome; (2) finding overlaps between long reads with error rate
up to ~15%; (3) splice-aware alignment of PacBio Iso-Seq or Nanopore
cDNA or Direct RNA reads against a reference genome; (4) aligning
Illumina single- or paired-end reads; (5) assembly-to-assembly
alignment; (6) full-genome alignment between two closely related
species with divergence below ~15%.

For ~10kb noisy reads sequences, minimap2 is tens of times faster than
mainstream long-read mappers such as BLASR, BWA-MEM, NGMLR and
GMAP. It is more accurate on simulated long reads and produces
biologically meaningful alignment ready for downstream analyses. For
>100bp Illumina short reads, minimap2 is three times as fast as
BWA-MEM and Bowtie2, and as accurate on simulated data. Detailed
evaluations are available from the minimap2 paper
(https://doi.org/10.1093/bioinformatics/bty191).


Master sites:


Version history: (Expand)


CVS history: (Expand)


   2021-10-26 12:03:45 by Nia Alarie | Files touched by this commit (73)
Log message:
biology: Replace RMD160 checksums with BLAKE2s checksums

All checksums have been double-checked against existing RMD160 and
SHA512 hashes
   2021-10-07 15:19:44 by Nia Alarie | Files touched by this commit (73)
Log message:
biology: Remove SHA1 hashes for distfiles
   2021-05-29 19:35:18 by Brook Milligan | Files touched by this commit (2)
Log message:
biology/minimap2: install minimap2 program instead of python binding

The distfile for minimap2 includes two different components: (i) the
minimap2 sequence mapping program itself, and (ii) a python binding
generally referred to as mappy.  The initial version of this package
included only the python binding.  However, it is more appropriate
that the minimap2 package should contain the program of the same name,
and a new package be created with the name mappy for the python
binding.  Splitting these into two packages makes sense, because this
allows users to install the minimap2 package without python
dependencies.
   2021-05-26 20:49:20 by Brook Milligan | Files touched by this commit (4)
Log message:
biology/minimap2: add minimap 2.18

## Users' Guide

Minimap2 is a versatile sequence alignment program that aligns DNA or
mRNA sequences against a large reference database. Typical use cases
include: (1) mapping PacBio or Oxford Nanopore genomic reads to the
human genome; (2) finding overlaps between long reads with error rate
up to ~15%; (3) splice-aware alignment of PacBio Iso-Seq or Nanopore
cDNA or Direct RNA reads against a reference genome; (4) aligning
Illumina single- or paired-end reads; (5) assembly-to-assembly
alignment; (6) full-genome alignment between two closely related
species with divergence below ~15%.

For ~10kb noisy reads sequences, minimap2 is tens of times faster than
mainstream long-read mappers such as BLASR, BWA-MEM, NGMLR and
GMAP. It is more accurate on simulated long reads and produces
biologically meaningful alignment ready for downstream analyses. For
>100bp Illumina short reads, minimap2 is three times as fast as
BWA-MEM and Bowtie2, and as accurate on simulated data.  Detailed
evaluations are available from the minimap2 paper or the preprint.

Release 2.18-r1015 (9 April 2021)
---------------------------------

This release fixes multiple rare bugs in minimap2 and adds additional
functionality to paftools.js.

Changes to minimap2:

 * Bugfix: a rare segfault caused by an off-by-one error (#489)

 * Bugfix: minimap2 segfaulted due to an uninitilized variable (#622 and #625).

 * Bugfix: minimap2 parsed spaces as field separators in BED (#721). This led
   to issues when the BED name column contains spaces.

 * Bugfix: minimap2 `--split-prefix` did not work with long reference names
   (#394).

 * Bugfix: option `--junc-bonus` didn't work (#513)

 * Bugfix: minimap2 didn't return 1 on I/O errors (#532)

 * Bugfix: the `de:f` tag (sequence divergence) could be negative if there were
   ambiguous bases

 * Bugfix: fixed two undefined behaviors caused by calling memcpy() on
   zero-length blocks (#443)

 * Bugfix: there were duplicated SAM @SQ lines if option `--split-prefix` is in
   use (#400 and #527)

 * Bugfix: option -K had to be smaller than 2 billion (#491). This was caused
   by a 32-bit integer overflow.

 * Improvement: optionally compile against SIMDe (#597). Minimap2 should work
   with IBM POWER CPUs, though this has not been tested. To compile with SIMDe,
   please use `make -f Makefile.simde`.

 * Improvement: more informative error message for I/O errors (#454) and for
   FASTQ parsing errors (#510)

 * Improvement: abort given malformatted RG line (#541)

 * Improvement: better formula to estimate the `dv:f` tag (approximate sequence
   divergence). See DOI:10.1101/2021.01.15.426881.

 * New feature: added the `--mask-len` option to fine control the removal of
   redundant hits (#659). The default behavior is unchanged.

Changes to mappy:

 * Bugfix: mappy caused segmentation fault if the reference index is not
   present (#413).

 * Bugfix: fixed a memory leak via 238b6bb3

 * Change: always require Cython to compile the mappy module (#723). Older
   mappy packages at PyPI bundled the C source code generated by Cython such
   that end users did not need to install Cython to compile mappy. However, as
   Python 3.9 is breaking backward compatibility, older mappy does not work
   with Python 3.9 anymore. We have to add this Cython dependency as a
   workaround.

Changes to paftools.js:

 * Bugfix: the "part10-" line from asmgene was wrong (#581)

 * Improvement: compatibility with GTF files from GenBank (#422)

 * New feature: asmgene also checks missing multi-copy genes

 * New feature: added the misjoin command to evaluate large-scale misjoins and
   megabase-long inversions.

Although given the many bug fixes and minor improvements, the core algorithm
stays the same. This version of minimap2 produces nearly identical alignments
to v2.17 except very rare corner cases.

Now unimap is recommended over minimap2 for aligning long contigs against a
reference genome. It often takes less wall-clock time and is much more
sensitive to long insertions and deletions.

(2.18: 9 April 2021, r1015)