./biology/py-biopython, Python libraries for computational molecular biology



Branch: CURRENT, Version: 1.81, Package name: py312-biopython-1.81, Maintainer: pkgsrc-users

The Biopython package contains high-quality, reusable modules and
scripts written in Python to make it as easy as possible to use Python
for bioinformatics. Biopython includes the following: the ability
to parse bioinformatics files into Python-usable data structures,
including support for formats such as BLAST output, ClustalW,
FASTA, GenBank, PubMed and Medline, various ExPASy files, SCOP,
REBASE, UniGene, and SwissProt.


Required to run:
[devel/py-setuptools] [math/py-numpy] [lang/python37]

Required to build:
[pkgtools/cwrappers]

Master sites:

Filesize: 18871.948 KB



CVS history:


   2024-11-11 08:29:31 by Thomas Klausner | Files touched by this commit (862)
Log message:
py-*: remove unused tool dependency

py-setuptools includes the py-wheel functionality nowadays
   2024-10-14 08:46:10 by Thomas Klausner | Files touched by this commit (325)
Log message:
*: clean-up after python38 removal
   2023-11-15 09:58:06 by Thomas Klausner | Files touched by this commit (1)
Log message:
py-biopython: add missing setuptools tool
   2023-11-06 00:52:20 by Thomas Klausner | Files touched by this commit (3) | Package updated
Log message:
py-biopython: update to 1.81.

12 February 2023: Biopython 1.81
===============================================

This release of Biopython supports Python 3.7, 3.8, 3.9, 3.10, 3.11. It has
also been tested on PyPy3.7 v7.3.5. We intend to drop Python 3.7 support.

The API documentation and the `Biopython Tutorial and Cookbook` have
been updated to better annotate use and application of the
``Bio.PDB.internal_coords`` module.

``Bio.Phylo`` now supports ``Alignment`` and ``MultipleSeqAlignment``
objects as input.

Damien Goutte-Gattat contributed several improvements and bug fixes to the
SnapGene parser.

Additionally, a number of small bugs and typos have been fixed with additions
to the test suite.

18 November 2022: Biopython 1.80
================================

This release of Biopython supports Python 3.7, 3.8, 3.9, 3.10, 3.11. It has
also been tested on PyPy3.7 v7.3.5.

Functions ``read``, ``parse``, and ``write`` were added to ``Bio.Align`` to
read and write ``Alignment`` objects.  String formatting and printing output
of ``Alignment`` objects from ``Bio.Align`` were changed to support these new
functions. To obtain a string showing the aligned sequence with the appropriate
gap characters (as previously shown when calling ``format`` on an alignment),
use ``alignment[i]``, where ``alignment`` is an ``Alignment`` object and ``i``
is the index of the aligned sequence.

Because ``dict`` has retained insertion order by default since Python 3.6,
all instances of ``collections.OrderedDict`` have been replaced by either a
standard ``dict`` or, where appropriate, by ``collections.defaultdict``.
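A minimal standard-library illustration of the two replacements. (Strictly, insertion order became a language guarantee in Python 3.7; in CPython 3.6 it was an implementation detail.) The feature names and the sequence are made up for the example:

```python
from collections import defaultdict

# A plain dict keeps insertion order, so OrderedDict is not needed for that.
features = {}
for name in ["gene", "CDS", "mRNA"]:
    features[name] = len(name)
print(list(features))  # ['gene', 'CDS', 'mRNA']

# defaultdict supplies a default for missing keys (here 0, for counting).
codon_counts = defaultdict(int)
seq = "ATGAAAATGTAA"
for i in range(0, len(seq) - 2, 3):
    codon_counts[seq[i:i + 3]] += 1
print(dict(codon_counts))  # {'ATG': 2, 'AAA': 1, 'TAA': 1}
```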

Robert Miller has updated the ``Bio.PDB.internal_coords`` module to
make better use of NumPy for lossless structure assembly from dihedral
angles and related internal coordinates.  In addition to speeding the
assembly step by ~30%, this adds distance plot support (including
re-generating structures from distance plot data), coordinate space
transforms for superimposing residues and their environments, a
per-chain all-atom array for Atom coordinates, and optional default
values for all internal coordinates.  The internal coordinates module
continues to support extracting dihedral angle, bond angle and bond
length (internal coordinates) data, reading/writing structure files of
internal coordinates, and OpenSCAD output of structures for 3D CAD/3D
printing work.

``Bio.motifs.jaspar.db`` now returns ``tf_family`` and ``tf_class`` as string
arrays, as used since the JASPAR 2018 release.

The Local Composition Complexity functions in ``Bio.SeqUtils`` now use a
base-4 logarithm instead of base 2, as stated in the original reference:
Konopka (2005), Sequence Complexity and Composition.
https://doi.org/10.1038/npg.els.0005260
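A sketch of the single-window formula behind this change, assuming a DNA sequence made of A, C, G and T only. It is not the code of ``Bio.SeqUtils.lcc``; the function name ``lcc_simple`` is hypothetical. With four possible bases, the base-4 logarithm scales the value to the range 0 to 1:

```python
import math


def lcc_simple(seq):
    """Local composition complexity of a DNA sequence, using log base 4.

    Returns 0 for a single repeated base and 1 for equal counts of
    A, C, G and T.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    total = 0.0
    for base in "ACGT":
        count = seq.count(base)
        if count:  # the 0 * log(0) term is taken as 0
            p = count / n
            total -= p * math.log(p, 4)
    return total


print(lcc_simple("ACGTACGT"))  # maximal complexity for DNA
print(lcc_simple("AAAAAAAA"))  # 0.0
```

With a base-2 logarithm the uniform case would instead give 2, which is why the choice of base matters when comparing values.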

Append mode is now supported in ``Bio.bgzf``, and a bug when parsing blocked
GZIP files with an internal empty block has been fixed.
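BGZF files are a series of concatenated gzip members, which is what makes appending possible: new data goes into new members at the end of the file. The sketch below shows that idea with the standard ``gzip`` module, including an empty member in the middle. It writes plain gzip members, not real BGZF blocks (which carry an extra block-size field), and the file name is arbitrary:

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "reads.gz")

with gzip.open(path, "wb") as handle:  # first member
    handle.write(b"ACGT")
with gzip.open(path, "ab"):             # an empty member: opened and closed
    pass
with gzip.open(path, "ab") as handle:  # append a third member
    handle.write(b"TTTT")

# A multi-member reader concatenates the members and skips empty ones.
with gzip.open(path, "rb") as handle:
    data = handle.read()
print(data)  # b'ACGTTTTT'
```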

The experimental warning was dropped from ``Bio.phenotype`` (which was new in
Biopython 1.67).

Sequences now have a ``defined`` attribute that returns a boolean indicating
if the underlying data is defined or not.

The ``Bio.PDB`` module now includes a structural alignment module, using the
combinatorial extension algorithm of Shindyalov and Bourne, commonly known as
CEAlign. The module allows two structures to be aligned based solely on
their 3D conformation, i.e. in a sequence-independent manner. The method is
particularly powerful when the structures share a very low degree of sequence
similarity. The new module is available in ``Bio.PDB.CEAligner`` with an
interface similar to other 3D superimposition modules.

A new module ``Bio.PDB.qcprot`` implements the QCP superposition algorithm in
pure Python, deprecating the existing C implementation. This leads to a slight
performance improvement and to much better maintainability. The refactored
``qcprot.QCPSuperimposer`` class has small changes to its API, to better mirror
that of ``Bio.PDB.Superimposer``.

The ``Bio.PDB.PDBList`` module now allows downloading biological assemblies,
for one or more entries of the wwPDB.

In the ``Bio.Restriction`` module, each restriction enzyme now has an ``id``
property giving the numerical identifier of the REBASE database entry from
which the enzyme object was created, and a ``uri`` property with a canonical
``identifiers.org`` link to that entry, for use in linked-data representations.

A new ``gc_fraction`` function was added to ``Bio.SeqUtils``, and ``GC`` is
marked for future deprecation.
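A simplified sketch of what a GC fraction measures, not Biopython's implementation: G, C and S (the IUPAC code for "G or C") are counted and divided by the total length. Every other letter, including N, counts toward the length only. Biopython's ``gc_fraction`` has its own options for ambiguous bases, so check its documentation before relying on this behaviour. The function name is hypothetical:

```python
def gc_fraction_sketch(seq):
    """Fraction of G, C and S letters among all letters of seq (0.0 if empty)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GCS")
    return gc / len(seq)


print(gc_fraction_sketch("ATGCGC"))  # 0.6666666666666666
print(gc_fraction_sketch("ACGTN"))   # 0.4
```

Unlike a percentage, the result is a fraction between 0 and 1.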

Support for the old format (dating back to 2004) of the GN line in SwissProt
files was dropped in ``Bio.SwissProt``.

Additionally, a number of small bugs and typos have been fixed with additions
to the test suite.
   2023-08-02 01:20:57 by Thomas Klausner | Files touched by this commit (158)
Log message:
*: remove more references to Python 3.7
   2023-07-01 10:37:47 by Thomas Klausner | Files touched by this commit (105) | Package updated
Log message:
*: restrict py-numpy users to 3.9+ in preparation for update
   2022-02-17 11:44:49 by Thomas Klausner | Files touched by this commit (3) | Package updated
Log message:
py-biopython: update to 1.79.

1 June 2021: Biopython 1.79
================================

This is intended to be our final release supporting Python 3.6. It also
supports Python 3.7, 3.8 and 3.9, and has also been tested on PyPy3.6.1 v7.1.1.

The ``Seq`` and ``MutableSeq`` classes in ``Bio.Seq`` now store their sequence
contents as ``bytes`` and ``bytearray`` objects, respectively. Previously,
``Seq`` objects used a string object and ``MutableSeq`` objects used a Unicode
array object, a design kept through the transition from Python 2 to Python 3.
However, a Python 2 string object corresponds to a ``bytes`` object in
Python 3, storing the string as a series of 8-bit characters (256 possible
values each). While non-ASCII characters could be stored in Python 2 strings,
they were stored as their encoded bytes rather than treated as single
characters. For example:

In Python2::

    >>> s = "Генетика"
    >>> type(s)
    <type 'str'>
    >>> len(s)
    16

In Python3::

    >>> s = "Генетика"
    >>> type(s)
    <class 'str'>
    >>> len(s)
    8

In Python3, storing the sequence contents as ``bytes`` and ``bytearray``
objects has the further advantage that both support the buffer protocol.
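A quick Python 3 check of the points above: the same word has 8 characters as a ``str`` but 16 bytes in UTF-8 (the length Python 2 reported), and ``bytes``/``bytearray`` expose their data through the buffer protocol, here via ``memoryview``:

```python
word = "Генетика"
print(len(word))                  # 8 code points in a Python 3 str
print(len(word.encode("utf-8")))  # 16 bytes, the Python 2 str length

dna = "ACGT".encode("ascii")      # sequence text stored as ASCII bytes
view = memoryview(dna)            # bytes support the buffer protocol
print(view[0], bytes(view[1:3]))  # 65 b'CG'

mutable = bytearray(dna)          # bytearray is the mutable counterpart
mutable[0] = ord("T")
print(mutable)                    # bytearray(b'TCGT')
```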

Taking advantage of the similarity between ``bytes`` and ``bytearray``, the
``Seq`` and ``MutableSeq`` classes now inherit from an abstract base class
``_SeqAbstractBaseClass`` in ``Bio.Seq`` that implements most of the ``Seq``
and ``MutableSeq`` methods, ensuring their consistency with each other.
Methods that modify the sequence contents take an optional ``inplace``
argument that specifies whether a new sequence object with the new contents
should be returned (if ``inplace`` is ``False``, the default) or the sequence
object itself should be modified (if ``inplace`` is ``True``). For ``Seq``
objects, which are immutable, using ``inplace=True`` raises an exception. With
the default ``inplace=False``, ``Seq`` and ``MutableSeq`` objects behave
consistently.
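A toy model of the ``inplace`` convention, assuming only what the paragraph states. ``ToySeq`` is a hypothetical class, not Biopython's: it holds ``bytes`` (immutable, like ``Seq``) or ``bytearray`` (mutable, like ``MutableSeq``). The exception type and message are this sketch's choice:

```python
class ToySeq:
    """Hypothetical stand-in for Seq (bytes data) and MutableSeq (bytearray data)."""

    def __init__(self, data):
        if isinstance(data, str):
            data = data.encode("ascii")  # strings are converted assuming ASCII
        self.data = data

    def upper(self, inplace=False):
        new = self.data.upper()  # bytes -> bytes, bytearray -> bytearray
        if not inplace:
            return ToySeq(new)   # a new object; self is left untouched
        if isinstance(self.data, bytes):
            raise TypeError("Sequence is immutable")
        self.data[:] = new       # modify the existing bytearray in place
        return self


frozen = ToySeq("acgt")                # immutable, like Seq
mutable = ToySeq(bytearray(b"acgt"))   # mutable, like MutableSeq

print(frozen.upper().data, frozen.data)  # b'ACGT' b'acgt'
mutable.upper(inplace=True)
print(mutable.data)                      # bytearray(b'ACGT')

try:
    frozen.upper(inplace=True)
    error = None
except TypeError as exc:
    error = str(exc)
print(error)                             # Sequence is immutable
```

Keeping the default ``inplace=False`` means the same call works on both kinds of object, which is the consistency the text describes.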

As before, ``Seq`` and ``MutableSeq`` objects can be initialized using a string
object, which will be converted to a ``bytes`` or ``bytearray`` object assuming
an ASCII encoding. Alternatively, a ``bytes`` or ``bytearray`` object can be
used, or an instance of any class inheriting from the new
``SequenceDataAbstractBaseClass`` abstract base class in ``Bio.Seq``. This
requires that the class implements the ``__len__`` and ``__getitem__`` methods
that return the sequence length and sequence contents on demand. Initializing a
``Seq`` instance using an instance of a class inheriting from
``SequenceDataAbstractBaseClass`` allows the ``Seq`` object to be lazy, meaning
that its sequence is provided on demand only, without having to initialize
the full sequence. This feature is now used in ``BioSQL``, providing on-demand
sequence loading from an SQL database, as well as in a new parser for twoBit
(.2bit) sequence data added to ``Bio.SeqIO``. This is a lazy parser that allows
fast access to genome-size DNA sequence files by not having to read the full
genome sequence. The new ``_UndefinedSequenceData`` class in ``Bio.Seq`` also
inherits from ``SequenceDataAbstractBaseClass`` to represent sequences of known
length but unknown sequence contents. This provides an alternative to
``UnknownSeq``, which is now deprecated as its definition was ambiguous. For
example, in the following the ``UnknownSeq`` is interpreted as a sequence with
well-defined sequence contents::

    >>> s = UnknownSeq(3, character="A")
    >>> s.translate()
    UnknownSeq(1, character='K')
    >>> s + "A"
    Seq("AAAA")

A sequence object with undefined sequence contents can now be created by
passing ``None`` when creating the ``Seq`` object, together with the sequence
length. Trying to access its sequence contents raises an
``UndefinedSequenceError``::

    >>> s = Seq(None, length=6)
    >>> s
    Seq(None, length=6)
    >>> len(s)
    6
    >>> "A" in s
    Traceback (most recent call last):
    ...
    Bio.Seq.UndefinedSequenceError: Sequence content is undefined
    >>> print(s)
    Traceback (most recent call last):
    ...
    Bio.Seq.UndefinedSequenceError: Sequence content is undefined
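The lazy and undefined data objects described above only need ``__len__`` and ``__getitem__``. The sketch below imitates that idea in plain Python. ``LazyData``, ``UndefinedData``, ``fetch_from_disk`` and ``GENOME`` are hypothetical names, not Biopython's ``SequenceDataAbstractBaseClass`` or ``_UndefinedSequenceData``. The local ``UndefinedSequenceError`` stands in for ``Bio.Seq.UndefinedSequenceError``, and the ``defined`` flag mimics the attribute that sequences gained in Biopython 1.80. Only the data objects are shown, not their use inside ``Seq``:

```python
class UndefinedSequenceError(ValueError):
    """Raised when the contents of an undefined sequence are requested."""


class LazyData:
    """Hypothetical lazy sequence data: bytes are fetched only when requested."""

    defined = True

    def __init__(self, length, fetch):
        self._length = length
        self._fetch = fetch  # fetch(start, end) stands in for a file/database read
        self.calls = []      # log of the (start, end) ranges actually fetched

    def __len__(self):
        return self._length

    def __getitem__(self, key):
        if isinstance(key, int):
            if key < 0:
                key += self._length
            if not 0 <= key < self._length:
                raise IndexError("index out of range")
            return self[key:key + 1][0]  # an int, as with bytes
        start, stop, step = key.indices(self._length)
        if step != 1:
            raise ValueError("this sketch only supports contiguous slices")
        self.calls.append((start, stop))
        return self._fetch(start, stop)


class UndefinedData:
    """Hypothetical data for a sequence of known length but unknown contents."""

    defined = False

    def __init__(self, length):
        self._length = length

    def __len__(self):
        return self._length

    def __getitem__(self, key):
        raise UndefinedSequenceError("Sequence content is undefined")


GENOME = b"ACGTACGTTTGACCA" * 4  # stand-in for a large sequence file


def fetch_from_disk(start, end):
    return GENOME[start:end]


lazy = LazyData(len(GENOME), fetch_from_disk)
print(len(lazy), lazy[2:6])  # 60 b'GTAC'
print(lazy.calls)            # [(2, 6)]: only four bytes were requested

undefined = UndefinedData(6)
print(len(undefined), undefined.defined)  # 6 False
message = None
try:
    undefined[0:3]
except UndefinedSequenceError as err:
    message = str(err)
print(message)  # Sequence content is undefined
```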

Element assignment in Bio.PDB.Atom now returns "X" when the element cannot be
unambiguously guessed from the atom name, in accordance with PDB structures.

Bio.PDB entities now have a ``center_of_mass()`` method that calculates either
centers of gravity or geometry.
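A sketch of the difference between the two centers. The input format of (element, (x, y, z)) tuples is hypothetical, the small table holds standard atomic weights, and the code does not call ``Bio.PDB``:

```python
# Standard atomic weights (in daltons) for a few common elements.
MASSES = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def center(atoms, geometric=False):
    """Center of gravity (mass-weighted), or of geometry if geometric=True.

    atoms: iterable of (element, (x, y, z)) tuples.
    """
    total_weight = 0.0
    sums = [0.0, 0.0, 0.0]
    for element, coord in atoms:
        weight = 1.0 if geometric else MASSES[element]
        total_weight += weight
        for i in range(3):
            sums[i] += weight * coord[i]
    return tuple(s / total_weight for s in sums)


atoms = [("C", (0.0, 0.0, 0.0)), ("O", (2.0, 0.0, 0.0))]
print(center(atoms, geometric=True))  # (1.0, 0.0, 0.0)
print(center(atoms))                  # x is pulled towards the heavier O atom
```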

New method ``disordered_remove()`` implemented in Bio.PDB DisorderedAtom and
DisorderedResidue to remove children.

New module Bio.PDB.SASA implements the Shrake-Rupley algorithm to calculate
atomic solvent accessible areas without third-party tools.
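The Shrake-Rupley idea in a short pure-Python sketch, not the ``Bio.PDB.SASA`` code. It places test points on a sphere of radius (van der Waals radius + probe radius) around each atom and counts the points that are not inside any other atom's inflated sphere. The full sphere area is then scaled by that exposed fraction. The golden-spiral point placement, the 1.4 Å probe, the 100 points per atom and the input format are illustrative choices:

```python
import math


def sphere_points(n):
    """Return n roughly evenly spaced points on a unit sphere (golden spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n  # heights evenly spaced in (-1, 1)
        r = math.sqrt(1.0 - z * z)     # radius of the circle at that height
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def shrake_rupley(atoms, probe=1.4, n_points=100):
    """Return the solvent accessible surface area of each atom.

    atoms: list of ((x, y, z), radius) tuples, coordinates and radii in Å.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (center, radius) in enumerate(atoms):
        big_r = radius + probe  # radius of the inflated sphere
        exposed = 0
        for ux, uy, uz in unit:
            point = (center[0] + big_r * ux,
                     center[1] + big_r * uy,
                     center[2] + big_r * uz)
            # a point is buried if it lies inside another atom's inflated sphere
            buried = any(
                j != i
                and sum((point[k] - other[k]) ** 2 for k in range(3))
                < (other_radius + probe) ** 2
                for j, (other, other_radius) in enumerate(atoms)
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas


lone = shrake_rupley([((0.0, 0.0, 0.0), 1.7)])
pair = shrake_rupley([((0.0, 0.0, 0.0), 1.7), ((2.0, 0.0, 0.0), 1.7)])
print(lone)  # one fully exposed atom: 4 * pi * (1.7 + 1.4) ** 2
print(pair)  # each atom is partly buried by its neighbour
```

More test points give a finer estimate at a proportionally higher cost.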

Expected ``TypeError`` behaviour has been restored to the ``Seq`` object's
string-like methods (fixing a regression in Biopython 1.78).

The KEGG ``KGML_Pathway`` KGML output was fixed to produce output that complies
with KGML v0.7.2.

Parsing motifs in ``pfm-four-rows`` format can now handle motifs with values
in scientific notation.

Parsing motifs in ``minimal`` MEME format now uses ``nsites`` when making
the count matrix from the frequency matrix, instead of multiplying the
frequency matrix by 1000000.
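A sketch of the change with a made-up frequency matrix for a two-position motif: each frequency is scaled by ``nsites`` (the number of sites the motif was built from) rather than by a fixed 1000000. The dictionary layout and the rounding to whole counts are assumptions of this sketch, not Biopython's ``Bio.motifs`` code:

```python
def counts_from_frequencies(freqs, nsites):
    """Scale each frequency by nsites; rounding to whole counts is this sketch's choice."""
    return {base: [round(f * nsites) for f in column]
            for base, column in freqs.items()}


# Made-up frequencies for a two-position motif built from 4 sites.
freqs = {"A": [0.5, 0.0], "C": [0.25, 1.0], "G": [0.25, 0.0], "T": [0.0, 0.0]}
counts = counts_from_frequencies(freqs, nsites=4)
print(counts)  # {'A': [2, 0], 'C': [1, 4], 'G': [1, 0], 'T': [0, 0]}

# Each column now sums to nsites rather than to 1000000.
print([sum(counts[b][i] for b in counts) for i in range(2)])  # [4, 4]
```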

Bio.UniProt.GOA now parses Gene Product Information (GPI) files version 1.2,
which can be downloaded from the EBI FTP site:
ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/

4 September 2020: Biopython 1.78
================================

This release of Biopython supports Python 3.6, 3.7 and 3.8. It has also been
tested on PyPy3.6.1 v7.1.1.

The main change is that ``Bio.Alphabet`` is no longer used. In some cases you
will now have to specify expected letters, molecule type (DNA, RNA, protein),
or gap character explicitly. Please consult the updated Tutorial and API
documentation for guidance. This simplification has sped up many ``Seq``
object methods. See https://biopython.org/wiki/Alphabet for more information.

``Bio.SeqIO.parse()`` is faster with "fastq" format due to small improvements
in the ``Bio.SeqIO.QualityIO`` module.

The ``SeqFeature`` object's ``.extract()`` method can now be used for
trans-spliced locations via an optional dictionary of references.
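A pure-Python sketch of why a dictionary of references is needed: the parts of a trans-spliced feature can lie on different records, so each part must be looked up by record ID before being cut out and joined. The location format of (ref_id, start, end, strand) tuples with 0-based, end-exclusive coordinates, the record IDs and the sequences are all hypothetical. This does not use ``SeqFeature`` itself:

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def extract_trans_spliced(parts, references):
    """Join the pieces of a feature whose parts lie on different records.

    parts: list of (ref_id, start, end, strand) tuples.
    references: dict mapping ref_id to that record's sequence string.
    """
    pieces = []
    for ref_id, start, end, strand in parts:
        piece = references[ref_id][start:end]
        if strand == -1:  # minus-strand parts are reverse complemented
            piece = reverse_complement(piece)
        pieces.append(piece)
    return "".join(pieces)


references = {"chr1": "AAATGCCC", "chr2": "GGGTTACCC"}
parts = [("chr1", 2, 6, 1), ("chr2", 3, 6, -1)]
print(extract_trans_spliced(parts, references))  # ATGCTAA
```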

As in recent releases, more of our code is now explicitly available under
either our original "Biopython License Agreement", or the very similar but
more commonly used "3-Clause BSD License". See the ``LICENSE.rst`` file for
more details.

Additionally, a number of small bugs and typos have been fixed with additions
to the test suite. There has been further work to follow the Python PEP8,
PEP257 and best practice standard coding style, and all of the tests have
been reformatted with the ``black`` tool to match the main code base.

25 May 2020: Biopython 1.77
===========================

This release of Biopython supports Python 3.6, 3.7 and 3.8. It has also been
tested on PyPy3.6.1 v7.1.1-beta0.

**We have dropped support for Python 2 now.**

``pairwise2`` now allows the input of parameters with keywords and returns the
alignments as a list of ``namedtuples``.

The codon tables have been updated to NCBI genetic code table version 4.5,
which adds Cephalodiscidae mitochondrial as table 33.

Updated ``Bio.Restriction`` to the January 2020 release of REBASE.

A major contribution by Rob Miller to ``Bio.PDB`` provides new methods to
handle protein structure transformations using dihedral angles (internal
coordinates). The new framework supports lossless interconversion between
internal and Cartesian coordinates, which, among other uses, simplifies the
analysis and manipulation of protein structure coordinates.
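Internal coordinates describe a chain by bond lengths, bond angles and dihedral angles. As a small illustration of the geometry involved, the sketch below computes the dihedral angle defined by four points about the p1-p2 bond. It uses a common projection-plus-``atan2`` formulation. This is a stand-alone example and not the ``Bio.PDB.internal_coords`` API, which uses NumPy and has its own conventions:

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, in (-180, 180], about the p1-p2 bond."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # project b0 and b2 onto the plane perpendicular to the central bond
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # 90.0
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # 180.0 (trans)
```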

As in recent releases, more of our code is now explicitly available under
either our original "Biopython License Agreement", or the very similar but
more commonly used "3-Clause BSD License". See the ``LICENSE.rst`` file for
more details.

Additionally, a number of small bugs and typos have been fixed with further
additions to the test suite. There has been further work to follow the Python
PEP8, PEP257 and best practice standard coding style, and all the main code
base has been reformatted with the ``black`` tool.

20 December 2019: Biopython 1.76
================================

This release of Biopython supports Python 2.7, 3.5, 3.6, 3.7 and 3.8. It has
also been tested on PyPy2.7.13 v7.1.1 and PyPy3.6.1 v7.1.1-beta0.

We intend this to be our final release supporting Python 2.7 and 3.5.

As in recent releases, more of our code is now explicitly available under
either our original "Biopython License Agreement", or the very similar but
more commonly used "3-Clause BSD License". See the ``LICENSE.rst`` file for
more details.

``PDBParser`` and ``PDBIO`` now support reading and writing files in PQR
format.
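PQR files are PDB-like, but (as commonly produced by tools such as PDB2PQR) they carry a per-atom charge and radius in place of occupancy and B-factor, and are usually read as whitespace-separated fields. The sketch below parses one atom record. It assumes the field order given in the docstring, and the function name and example line are hypothetical. It is not how ``PDBParser`` is implemented:

```python
def parse_pqr_line(line):
    """Parse a whitespace-delimited PQR ATOM/HETATM record.

    Assumed field order: record name, serial, atom name, residue name,
    [chain ID,] residue number, x, y, z, charge, radius. Because the chain
    ID is optional, the numeric fields are taken from the end of the line.
    """
    fields = line.split()
    if fields[0] not in ("ATOM", "HETATM"):
        raise ValueError("not an atom record")
    x, y, z, charge, radius = map(float, fields[-5:])
    return {
        "serial": int(fields[1]),
        "name": fields[2],
        "resname": fields[3],
        "resseq": int(fields[-6]),
        "coord": (x, y, z),
        "charge": charge,
        "radius": radius,
    }


record = parse_pqr_line(
    "ATOM      1  N   MET A   1     -1.234   2.345   0.100 -0.3000 1.8240"
)
print(record["charge"], record["radius"])  # -0.3 1.824
```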

In addition to the mainstream ``x86_64`` aka ``AMD64`` CPU architecture, we
now also test every contribution on the ``ARM64``, ``ppc64le``, and ``s390x``
CPUs under Linux thanks to Travis CI. Further post-release testing done by
Debian and other packagers and distributors of Biopython also covers these
CPUs.

``Bio.motifs.PositionSpecificScoringMatrix.search()`` method has been
re-written: it now applies ``.calculate()`` to chunks of the sequence
to maintain a low memory footprint for long sequences.
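A pure-Python sketch of the chunking idea, not Biopython's NumPy-based implementation. The sequence is scored one chunk of window start positions at a time. Consecutive chunks overlap by motif length minus one letters, so windows that cross a chunk boundary are still scored, and only one chunk's scores are held in memory at a time. The list-of-dicts PSSM layout and the function names are hypothetical:

```python
def score_window(pssm, window):
    """Sum of per-position scores; pssm is a list of {letter: score} dicts."""
    return sum(column[letter] for column, letter in zip(pssm, window))


def search(pssm, seq, threshold, chunk_size=1000):
    """Yield (position, score) for every window scoring at least threshold."""
    m = len(pssm)
    n_windows = len(seq) - m + 1
    for chunk_start in range(0, max(n_windows, 0), chunk_size):
        chunk_end = min(chunk_start + chunk_size, n_windows)
        # the extra m - 1 letters let the last windows of the chunk be complete
        chunk = seq[chunk_start:chunk_end + m - 1]
        scores = [score_window(pssm, chunk[i:i + m])
                  for i in range(chunk_end - chunk_start)]
        for offset, score in enumerate(scores):
            if score >= threshold:
                yield chunk_start + offset, score


# Toy PSSM rewarding the dinucleotide "AC".
pssm = [{"A": 1, "C": -1, "G": -1, "T": -1},
        {"A": -1, "C": 1, "G": -1, "T": -1}]
seq = "ACGTACACGT"
print(list(search(pssm, seq, 2, chunk_size=3)))  # [(0, 2), (4, 2), (6, 2)]
```

The chunk size changes memory use, not the hits: running the same search with ``chunk_size=1000`` returns the same list.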

Additionally, a number of small bugs and typos have been fixed with further
additions to the test suite. There has been further work to follow the Python
PEP8, PEP257 and best practice standard coding style, and more of the code
style has been reformatted with the ``black`` tool.

6 November 2019: Biopython 1.75
===============================

This release of Biopython supports Python 2.7, 3.5, 3.6, 3.7 and is expected
to work on the soon to be released Python 3.8. It has also been tested on
PyPy2.7.13 v7.1.1 and PyPy3.6.1 v7.1.1-beta0.

Note we intend to drop Python 2.7 support in early 2020.

The restriction enzyme list in ``Bio.Restriction`` has been updated to the
August 2019 release of REBASE.

``Bio.SeqIO`` now supports reading and writing files in the native format of
Christian Marck's DNA Strider program ("xdna" format, also used by Serial
Cloner), as well as reading files in the native formats of GSL Biotech's
SnapGene ("snapgene") and Textco Biosoftware's Gene Construction Kit ("gck").

``Bio.AlignIO`` now supports GCG MSF multiple sequence alignments as the "msf"
format (work funded by the National Marrow Donor Program).

The main ``Seq`` object now has string-like ``.index()`` and ``.rindex()``
methods, matching the existing ``.find()`` and ``.rfind()`` implementations.
The ``MutableSeq`` object retains its more list-like ``.index()`` behaviour.

The ``MMTFIO`` class has been added that allows writing of MMTF file format
files from a Biopython structure object. ``MMTFIO`` has a similar interface to
``PDBIO`` and ``MMCIFIO``, including the use of a ``Select`` class to write
out a specified selection. This final addition to read/write support for
PDB/mmCIF/MMTF in Biopython allows conversion between all three file formats.

Values from mmCIF files are now read in as a list even when they consist of a
single value. This change improves consistency and reduces the likelihood of
making an error, but will require user code to be updated accordingly.

``Bio.motifs.meme`` has been updated to parse MEME's XML output files instead of
the plain-text output files. The goal of this change is to parse a more
structured data source, with minimal loss of functionality upon future MEME
releases.

``Bio.PDB`` has been updated to support parsing REMARK 99 header entries from
PDB-style Astral files.

A new keyword parameter ``full_sequences`` was added to ``Bio.pairwise2``'s
pretty print method ``format_alignment`` to restore the output of local
alignments to the 'old' format (showing the whole sequences including the
un-aligned parts instead of only showing the aligned parts).

A new function ``charge_at_pH(pH)`` has been added to ``ProtParam`` and
``IsoelectricPoint`` in ``Bio.SeqUtils``.

The ``PairwiseAligner`` in ``Bio.Align`` was extended to allow generalized
pairwise alignments, i.e. alignments of any Python object, for example
three-letter amino acid sequences, three-nucleotide codons, and arrays of
integers.
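The key point of a generalized alignment is that elements only need to be comparable with ``==``. The sketch below computes a Needleman-Wunsch global alignment score with a linear gap penalty over lists of codons or integers. It is an independent illustration, not the ``PairwiseAligner`` API. The scoring values are arbitrary assumptions:

```python
def global_score(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Needleman-Wunsch global alignment score for any two sequences.

    Elements only need to support ==, so lists of codons, three-letter
    amino acid codes or integers all work. Gaps use a linear penalty.
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: leading gaps in a
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


codons_a = ["ATG", "GCT", "TGG", "TAA"]
codons_b = ["ATG", "TGG", "TAA"]
print(global_score(codons_a, codons_b))  # 1.0: three matches, one gap
print(global_score([1, 2, 3], [1, 2, 3]))  # 3.0
```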

A new module ``substitution_matrices`` was added to ``Bio.Align``, which
includes an ``Array`` class that can be used as a substitution matrix. As
the ``Array`` class is a subclass of a numpy array, mathematical operations
can be applied to it directly, and C code that makes use of substitution
matrices can directly access the numerical values stored in the substitution
matrices. This module is intended as a replacement of ``Bio.SubsMat``,
which is currently unmaintained.

As in recent releases, more of our code is now explicitly available under
either our original "Biopython License Agreement", or the very similar but
more commonly used "3-Clause BSD License". See the ``LICENSE.rst`` file for
more details.

Additionally, a number of small bugs and typos have been fixed with further
additions to the test suite, and there has been further work to follow the
Python PEP8, PEP257 and best practice standard coding style. We have also
started to use the ``black`` Python code formatting tool.
   2022-01-04 21:55:40 by Thomas Klausner | Files touched by this commit (1595)
Log message:
*: bump PKGREVISION for egg.mk users

They now have a tool dependency on py-setuptools instead of a DEPENDS