./www/py-beautifulsoup4, HTML/XML Parser for Python, version 4

Branch: CURRENT, Version: 4.7.1, Package name: py27-beautifulsoup4-4.7.1, Maintainer: pkgsrc-users

Beautiful Soup is a Python library designed for quick turnaround projects like
screen-scraping. Three features make it powerful:

* Beautiful Soup provides a few simple methods and Pythonic idioms for
navigating, searching, and modifying a parse tree: a toolkit for dissecting a
document and extracting what you need. It doesn't take much code to write an
application.
* Beautiful Soup automatically converts incoming documents to Unicode and
outgoing documents to UTF-8. You don't have to think about encodings, unless
the document doesn't specify an encoding and Beautiful Soup can't autodetect
one. Then you just have to specify the original encoding.
* Beautiful Soup sits on top of popular Python parsers like lxml and html5lib,
allowing you to try out different parsing strategies or trade speed for
flexibility.
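
As a rough illustration of the encoding and parser points above, the sketch
below parses a byte string with the built-in html.parser and states the
original encoding explicitly. The sample bytes and the choice of ISO-8859-1
are assumptions made for this example only.

```python
from bs4 import BeautifulSoup

# Hypothetical input: Latin-1 bytes that carry no encoding declaration.
raw = "<p>caf\u00e9</p>".encode("iso-8859-1")

# from_encoding names the original encoding when autodetection
# cannot be relied on.
soup = BeautifulSoup(raw, "html.parser", from_encoding="iso-8859-1")

text = soup.p.string               # Unicode inside the parse tree
utf8_bytes = soup.encode("utf-8")  # UTF-8 on the way out
```

Swapping "html.parser" for "lxml" or "html5lib" selects a different parser,
provided that parser is installed.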

Beautiful Soup parses anything you give it, and does the tree traversal stuff
for you. You can tell it "Find all the links", or "Find all the links of class
externalLink", or "Find all the links whose URLs match 'foo.com'", or "Find the
table heading that's got bold text, then give me that text."
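
The requests quoted above map fairly directly onto find_all() and find().
A minimal sketch, assuming a small made-up HTML fragment (the markup, URLs,
and variable names are invented for illustration):

```python
import re
from bs4 import BeautifulSoup

html = """
<a class="externalLink" href="http://foo.com/a">A</a>
<a href="/local">B</a>
<a class="externalLink" href="http://bar.org/c">C</a>
<table><tr><th><b>Price</b></th><th>Plain</th></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# "Find all the links"
all_links = soup.find_all("a")

# "Find all the links of class externalLink"
external = soup.find_all("a", class_="externalLink")

# "Find all the links whose URLs match 'foo.com'"
foo_links = soup.find_all("a", href=re.compile(r"foo\.com"))

# "Find the table heading that's got bold text, then give me that text."
bold_heading = next(
    th for th in soup.find_all("th") if th.find("b") is not None
)
heading_text = bold_heading.b.get_text()
```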

Valuable data that was once locked up in poorly-designed websites is now within
your reach. Projects that would have taken hours take only minutes with
Beautiful Soup.

Required to run:
[devel/py-setuptools] [textproc/py-lxml] [lang/python27] [www/py-soupsieve]

Required to build:

Master sites:

SHA1: 1000ef6113d020d5140d556862107c3d502dd5ed
RMD160: 177353bdcd7b87aa7a922c807271df81f0bcddb4
Filesize: 163.149 KB

CVS history:

   2019-01-08 10:30:44 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
py-beautifulsoup4: updated to 4.7.1


* Fixed a significant performance problem introduced in 4.7.0.

* Fixed an incorrectly raised exception when inserting a tag before or
  after an identical tag.

* Beautiful Soup will no longer try to keep track of namespaces that
  are not defined with a prefix; this can confuse soupselect.

* Tried even harder to avoid the deprecation warning originally fixed in
   2019-01-02 11:36:08 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
py-beautifulsoup4: updated to 4.7.0


* Beautiful Soup's CSS Selector implementation has been replaced by a
  dependency on Isaac Muse's SoupSieve project (the soupsieve package
  on PyPI). The good news is that SoupSieve has a much more robust and
  complete implementation of CSS selectors, resolving a large number
  of longstanding issues. The bad news is that from this point onward,
  SoupSieve must be installed if you want to use the select() method.

  You don't have to change anything if you installed Beautiful Soup
  through pip (SoupSieve will be automatically installed when you
  upgrade Beautiful Soup) or if you don't use CSS selectors from
  within Beautiful Soup.

  SoupSieve documentation: https://facelessuser.github.io/soupsieve/

* Fixed a number of problems with the tree builder that produced
  trees that were superficially okay, but which fell apart when bits
  were extracted.

* Fixed a problem with the tree builder in which elements that
  contained no content (such as empty comments and all-whitespace
  elements) were not being treated as part of the tree.

* Fixed a problem with multi-valued attributes where the value
  contained whitespace.

* Clarified ambiguous license statements in the source code. Beautiful
  Soup is released under the MIT license, and has been since 4.4.0.

* This file has been renamed from NEWS.txt to CHANGELOG.
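
For the 4.7.0 change to select() described above, calling code should look
the same as before; the only new requirement is that the soupsieve package
is importable. A minimal sketch, assuming soupsieve is installed and using
made-up markup:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<div id="main"><p class="lead">Hi</p><p>Bye</p></div>',
    "html.parser",
)

# select() hands the CSS selector to SoupSieve behind the scenes.
lead = soup.select("div#main > p.lead")
texts = [p.get_text() for p in lead]
```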
   2018-08-14 09:26:20 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
py-beautifulsoup4: updated to 4.6.3

* Exactly the same as 4.6.2. Re-released to make the README file
  render properly on PyPI.

* Fix an exception when a custom formatter was asked to format a void
  element.
   2018-08-02 17:31:03 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
py-beautifulsoup4: updated to 4.6.1


* Stop data loss when encountering an empty numeric entity, and
  possibly in other cases.

* Preserve XML namespaces introduced inside an XML document, not just
   the ones introduced at the top level.

* Added a new formatter, "html5", which represents void elements
   as "<element>" rather than "<element/>".

* Fixed a problem where the html.parser tree builder interpreted
  a string like "&foo " as the character entity "&foo;"

* Correctly handle invalid HTML numeric character entities
  which reference code points that are not Unicode code points. Note
  that this is only fixed when Beautiful Soup is used with the
  html.parser parser -- html5lib already worked and I couldn't fix it
  with lxml.

* Improved the warning given when no parser is specified.

* When markup contains duplicate elements, a select() call that
  includes multiple match clauses will match all relevant
  elements.

* Fixed code that was causing deprecation warnings in recent Python 3
  versions.

* Fixed a Windows crash in diagnose() when checking whether a long
  markup string is a filename.

* Stopped HTMLParser from raising an exception in very rare cases of
  bad markup.

* Fixed a bug where find_all() was not working when asked to find a
  tag with a namespaced name in an XML document that was parsed as

* You can get finer control over formatting by subclassing
  bs4.element.Formatter and passing a Formatter instance into (e.g.)

* You can pass a dictionary of `attrs` into
  BeautifulSoup.new_tag. This makes it possible to create a tag with
  an attribute like 'name' that would otherwise be masked by another
  argument of new_tag.

* Clarified the deprecation warning when accessing tag.fooTag, to cover
  the possibility that you might really have been looking for a tag
  called 'fooTag'.
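
Two of the 4.6.1 additions above, the "html5" formatter and the `attrs`
argument to new_tag, can be sketched as follows. The markup and the
attribute value are invented for the example.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>line<br/>break</p>", "html.parser")

# The default formatter writes void elements as "<br/>";
# the "html5" formatter writes them as "<br>".
default_markup = soup.p.decode()
html5_markup = soup.p.decode(formatter="html5")

# attrs allows an attribute called "name", which would otherwise
# collide with new_tag's own argument for the tag name.
field = soup.new_tag("input", attrs={"name": "email"})
```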
   2017-09-03 10:53:18 by Thomas Klausner | Files touched by this commit (165)
Log message:
Follow some redirects.
   2017-05-09 22:05:18 by Adam Ciarcinski | Files touched by this commit (3)
Log message:
= 4.6.0 (20170507) =

* Added the `Tag.get_attribute_list` method, which acts like `Tag.get` for
  getting the value of an attribute, but which always returns a list,
  whether or not the attribute is a multi-value attribute.

* It's now possible to use a tag's namespace prefix when searching,
  e.g. soup.find('namespace:tag')

* Improved the handling of empty-element tags like <br> when using the
  html.parser parser.

* HTML parsers treat all HTML4 and HTML5 empty element tags (aka void
  element tags) correctly.

* Namespace prefix is preserved when an XML tag is copied. Thanks
  to Vikas for a patch and test.
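
The difference between `Tag.get` and `Tag.get_attribute_list` noted in the
4.6.0 entry can be seen in a short sketch (the markup is made up):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p id="intro" class="lead wide">x</p>', "html.parser")
p = soup.p

# get() returns a string for single-valued attributes and a list
# for multi-valued ones such as class.
id_value = p.get("id")
class_value = p.get("class")

# get_attribute_list() returns a list in both cases.
id_list = p.get_attribute_list("id")
class_list = p.get_attribute_list("class")
```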
   2017-02-12 05:01:39 by Wen Heping | Files touched by this commit (3) | Package updated
Log message:
Update to 4.5.3

Upstream changes:
= 4.5.3 (20170102) =

* Fixed foster parenting when html5lib is the tree builder. Thanks to
  Geoffrey Sneddon for a patch and test.

* Fixed yet another problem that caused the html5lib tree builder to
  create a disconnected parse tree. [bug=1629825]

= 4.5.2 (20170102) =

* Apart from the version number, this release is identical to
  4.5.3. Due to user error, it could not be completely uploaded to
  PyPI. Use 4.5.3 instead.
   2016-08-09 16:40:54 by Leonardo Taccari | Files touched by this commit (3) | Package updated
Log message:
Update www/py-beautifulsoup4 to 4.5.1

= 4.5.1 (20160802) =
* Fixed a crash when passing Unicode markup that contained a
  processing instruction into the lxml HTML parser on Python
  3. [bug=1608048]

= 4.5.0 (20160719) =
* Beautiful Soup is no longer compatible with Python 2.6. This
  actually happened a few releases ago, but it's now official.
* Beautiful Soup will now work with versions of html5lib greater than
  0.99999999. [bug=1603299]
* If a search against each individual value of a multi-valued
  attribute fails, the search will be run one final time against the
  complete attribute value considered as a single string. That is, if
  a tag has class="foo bar" and neither "foo" nor "bar" matches, but
  "foo bar" does, the tag is now considered a match.
  This happened in previous versions, but only when the value being
  searched for was a string. Now it also works when that value is
  a regular expression, a list of strings, etc. [bug=1476868]
* Fixed a bug that deranged the tree when a whitespace element was
  reparented into a tag that contained an identical whitespace
  element. [bug=1505351]
* Added support for CSS selector values that contain quoted spaces,
  such as tag[style="display: foo"]. [bug=1540588]
* Corrected handling of XML processing instructions. [bug=1504393]
* Corrected an encoding error that happened when a BeautifulSoup
  object was copied. [bug=1554439]
* The contents of <textarea> tags will no longer be modified when the
  tree is prettified. [bug=1555829]
* When a BeautifulSoup object is pickled but its tree builder cannot
  be pickled, its .builder attribute is set to None instead of being
  destroyed. This avoids a performance problem once the object is
  unpickled. [bug=1523629]
* Specify the file and line number when warning about a
  BeautifulSoup object being instantiated without a parser being
  specified. [bug=1574647]
* The `limit` argument to `select()` now works correctly, though it's
  not implemented very efficiently. [bug=1520530]
* Fixed a Python 3 BytesWarning when a URL was passed in as though it
  were markup. Thanks to James Salter for a patch and
  test. [bug=1533762]
* We don't run the check for a filename passed in as markup if the
  'filename' contains a less-than character; the less-than character
  indicates it's most likely a very small document. [bug=1577864]
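
The 4.5.0 notes on multi-valued attribute matching and on the `limit`
argument to `select()` can be sketched like this. The markup and the
regular expression are assumptions chosen so that neither "foo" nor "bar"
matches on its own.

```python
import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<p class="foo bar">one</p><p class="foo">two</p>'
    '<p class="bar foo">three</p>',
    "html.parser",
)

# The pattern fails against "foo" and "bar" individually, so the
# search falls back to the complete attribute value "foo bar".
whole_value = re.compile(r"^foo bar$")
matches = [p.get_text() for p in soup.find_all("p", class_=whole_value)]

# limit caps the number of results returned by select().
first_two = [p.get_text() for p in soup.select("p", limit=2)]
```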

= 4.4.1 (20150928) =
* Fixed a bug that deranged the tree when part of it was
  removed. Thanks to Eric Weiser for the patch and John Wiseman for a
  test. [bug=1481520]
* Fixed a parse bug with the html5lib tree-builder. Thanks to Roel
  Kramer for the patch. [bug=1483781]
* Improved the implementation of CSS selector grouping. Thanks to
  Orangain for the patch. [bug=1484543]
* Fixed the test_detect_utf8 test so that it works when chardet is
  installed. [bug=1471359]
* Corrected the output of Declaration objects. [bug=1477847]

= 4.4.0 (20150703) =
Especially important changes:
* Added a warning when you instantiate a BeautifulSoup object without
  explicitly naming a parser. [bug=1398866]
* __repr__ now returns an ASCII bytestring in Python 2, and a Unicode
  string in Python 3, instead of a UTF8-encoded bytestring in both
  versions. In Python 3, __str__ now returns a Unicode string instead
  of a bytestring. [bug=1420131]
* The `text` argument to the find_* methods is now called `string`,
  which is more accurate. `text` still works, but `string` is the
  argument described in the documentation. `text` may eventually
  change its meaning, but not for a very long time. [bug=1366856]
* Changed the way soup objects work under copy.copy(). Copying a
  NavigableString or a Tag will give you a new object that's
  equal to the old one but not connected to the parse tree. Patch by
  Martijn Pieters. [bug=1307490]
* Started using a standard MIT license. [bug=1294662]
* Added a Chinese translation of the documentation by Delong .w.
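
Two of the changes listed above, naming the parser explicitly and the
`string` argument, combine with copy.copy() in the following sketch. The
markup is invented for the example.

```python
import copy
from bs4 import BeautifulSoup

# Naming the parser avoids the "no parser specified" warning.
soup = BeautifulSoup("<div><p>Hello</p><p>World</p></div>", "html.parser")

# `string` is the documented name for what used to be `text`.
hello = soup.find("p", string="Hello")

# The copy compares equal to the original but has no parent,
# because it is not connected to the parse tree.
clone = copy.copy(hello)
```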
New features:
* Introduced the select_one() method, which uses a CSS selector but
  only returns the first match, instead of a list of
  matches. [bug=1349367]
* You can now create a Tag object without specifying a
  TreeBuilder. Patch by Martijn Pieters. [bug=1307471]
* You can now create a NavigableString or a subclass just by invoking
  the constructor. [bug=1294315]
* Added an `exclude_encodings` argument to UnicodeDammit and to the
  Beautiful Soup constructor, which lets you prohibit the detection of
  an encoding that you know is wrong. [bug=1469408]
* The select() method now supports selector grouping. Patch by
  Francisco Canas [bug=1191917]
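
A short sketch of select_one() and selector grouping from the list above,
using made-up markup:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<h1>Title</h1><h2>Sub</h2><p>a</p><p>b</p>", "html.parser"
)

# select_one() returns the first match only, or None if nothing matches.
first_p = soup.select_one("p")
missing = soup.select_one("table")

# A comma groups selectors, so one call can match several tag names.
headings = [h.name for h in soup.select("h1, h2")]
```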
Bug fixes:
* Fixed yet another problem that caused the html5lib tree builder to
  create a disconnected parse tree. [bug=1237763]
* Force object_was_parsed() to keep the tree intact even when an element
  from later in the document is moved into place. [bug=1430633]
* Fixed yet another bug that caused a disconnected tree when html5lib
  copied an element from one part of the tree to another. [bug=1270611]
* Fixed a bug where Element.extract() could create an infinite loop in
  the remaining tree.
* The select() method can now find tags whose names contain
  dashes. Patch by Francisco Canas. [bug=1276211]
* The select() method can now find tags with attributes whose names
  contain dashes. Patch by Marek Kapolka. [bug=1304007]
* Improved the lxml tree builder's handling of processing
  instructions. [bug=1294645]
* Restored the helpful syntax error that happens when you try to
  import the Python 2 edition of Beautiful Soup under Python
  3. [bug=1213387]
* In Python 3.4 and above, set the new convert_charrefs argument to
  the html.parser constructor to avoid a warning and future
  failures. Patch by Stefano Revera. [bug=1375721]
* The warning when you pass in a filename or URL as markup will now be
  displayed correctly even if the filename or URL is a Unicode
  string. [bug=1268888]
* If the initial <html> tag contains a CDATA list attribute such as
  'class', the html5lib tree builder will now turn its value into a
  list, as it would with any other tag. [bug=1296481]
* Fixed an import error in Python 3.5 caused by the removal of the
  HTMLParseError class. [bug=1420063]
* Improved docstring for encode_contents() and
  decode_contents(). [bug=1441543]
* Fixed a crash in Unicode, Dammit's encoding detector when the name
  of the encoding itself contained invalid bytes. [bug=1360913]
* Improved the exception raised when you call .unwrap() or
  .replace_with() on an element that's not attached to a tree.
* Raise a NotImplementedError whenever an unsupported CSS pseudoclass
  is used in select(). Previously some cases did not result in a
  NotImplementedError.
* It's now possible to pickle a BeautifulSoup object no matter which
  tree builder was used to create it. However, the only tree builder
  that survives the pickling process is the HTMLParserTreeBuilder
  ('html.parser'). If you unpickle a BeautifulSoup object created with
  some other tree builder, soup.builder will be None. [bug=1231545]
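
The pickling behaviour in the last entry can be tried with a round trip
like the one below. This is only a sketch with made-up markup; it checks
that the document survives, and makes no claim about what soup.builder
holds afterwards in any particular release.

```python
import pickle
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>kept</p>", "html.parser")

# Serialize the whole soup object and load it back.
restored = pickle.loads(pickle.dumps(soup))

original_markup = soup.decode()
restored_markup = restored.decode()
```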