./www/py-beautifulsoup4, HTML/XML Parser for Python, version 4


Branch: CURRENT, Version: 4.8.2, Package name: py37-beautifulsoup4-4.8.2, Maintainer: pkgsrc-users

Beautiful Soup is a Python library designed for quick turnaround projects like
screen-scraping. Three features make it powerful:

* Beautiful Soup provides a few simple methods and Pythonic idioms for
navigating, searching, and modifying a parse tree: a toolkit for dissecting a
document and extracting what you need. It doesn't take much code to write an
application.
* Beautiful Soup automatically converts incoming documents to Unicode and
outgoing documents to UTF-8. You don't have to think about encodings, unless
the document doesn't specify an encoding and Beautiful Soup can't autodetect
one. Then you just have to specify the original encoding.
* Beautiful Soup sits on top of popular Python parsers like lxml and html5lib,
allowing you to try out different parsing strategies or trade speed for
flexibility.
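
As a rough illustration of the encoding behaviour described above, the sketch
below feeds Latin-1 bytes to Beautiful Soup and names the original encoding
through the from_encoding argument. The sample markup is made up for
illustration, and html.parser is used only because it ships with Python.

```python
from bs4 import BeautifulSoup

# Hypothetical input: Latin-1 bytes with no encoding declaration.
raw = "<p>caf\u00e9</p>".encode("latin-1")

# Name the original encoding explicitly instead of relying on autodetection.
soup = BeautifulSoup(raw, "html.parser", from_encoding="latin-1")

# Inside the parse tree, text is ordinary Unicode.
text = soup.p.get_text()

# On the way out, ask for UTF-8 bytes.
utf8_bytes = soup.encode("utf-8")

print(text)
print(utf8_bytes)
```

Swapping "html.parser" for "lxml" or "html5lib" (when those packages are
installed) is how a different parsing strategy would be selected.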

Beautiful Soup parses anything you give it, and does the tree traversal stuff
for you. You can tell it "Find all the links", or "Find all the links of class
externalLink", or "Find all the links whose URLs match 'foo.com'", or "Find the
table heading that's got bold text, then give me that text."
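
The requests quoted above might translate into code roughly as follows. This is
a minimal sketch: the HTML snippet, the URLs, and the variable names are
invented for illustration, and html.parser is just one possible parser choice.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical page content, made up for this example.
html = """
<table><tr><th><b>Price</b></th><th>Name</th></tr></table>
<a class="externalLink" href="http://foo.com/a">A</a>
<a href="http://bar.org/b">B</a>
<a class="externalLink" href="http://example.net/c">C</a>
"""

soup = BeautifulSoup(html, "html.parser")

# "Find all the links"
all_links = soup.find_all("a")

# "Find all the links of class externalLink"
external_links = soup.find_all("a", class_="externalLink")

# "Find all the links whose URLs match 'foo.com'"
foo_links = soup.find_all("a", href=re.compile(r"foo\.com"))

# "Find the table heading that's got bold text, then give me that text."
bold_heading = soup.find(lambda tag: tag.name == "th" and tag.find("b") is not None)
heading_text = bold_heading.get_text()

print(len(all_links), len(external_links), len(foo_links), heading_text)
```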

Valuable data that was once locked up in poorly-designed websites is now within
your reach. Projects that would have taken hours take only minutes with
Beautiful Soup.

Required to run:
[devel/py-setuptools] [textproc/py-lxml] [lang/python37] [www/py-soupsieve]

Required to build:

Master sites:

SHA1: cf63aa6ecfdbc243a696ff3ae60936109ca48058
RMD160: 0f37b2da01e72e4a777174ecb266a17a907e204f
Filesize: 291.65 KB

CVS history:

   2020-01-08 22:08:26 by Adam Ciarcinski | Files touched by this commit (3) | Package updated
Log message:
py-beautifulsoup4: updated to 4.8.2


* Added Python docstrings to all public methods of the most commonly
  used classes.

* Added a Chinese translation by Deron Wang and a Brazilian Portuguese
  translation by Cezar Peixeiro to the repository.

* Fixed two deprecation warnings.

* The html.parser tree builder now correctly handles DOCTYPEs that are
  not uppercase.

* PageElement.select() now returns a ResultSet rather than a regular
  list, making it consistent with methods like find_all().
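
A small sketch of the select() change, assuming Beautiful Soup 4.8.2 or later
with the soupsieve package installed. The markup is invented for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet

soup = BeautifulSoup("<p class='x'>one</p><p>two</p>", "html.parser")

# Both calls should hand back the same container type from 4.8.2 onward.
by_css = soup.select("p.x")
by_name = soup.find_all("p")

print(type(by_css).__name__, type(by_name).__name__)
```

ResultSet is a list subclass, so code that treated the result of select() as a
plain list should keep working.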
   2019-10-15 19:21:35 by Adam Ciarcinski | Files touched by this commit (3) | Package updated
Log message:
py-beautifulsoup4: updated to 4.8.1


* When the html.parser or html5lib parsers are in use, Beautiful Soup
  will, by default, record the position in the original document where
  each tag was encountered. This includes line number (Tag.sourceline)
  and position within a line (Tag.sourcepos).  Based on code by Chris

* When instantiating a BeautifulSoup object, it's now possible to
   provide a dictionary ('element_classes') of the classes you'd like to be
   instantiated instead of Tag, NavigableString, etc.

* Fixed the definition of the default XML namespace when using
   lxml 4.4. Patch by Isaac Muse.

* Fixed a crash when pretty-printing tags that were not created
   during initial parsing.

* Copying a Tag preserves information that was originally obtained from
   the TreeBuilder used to build the original Tag.

* Raise an explanatory exception when the underlying parser
   completely rejects the incoming markup.

* Avoid a crash when trying to detect the declared encoding of a
   Unicode document.

* Avoid a crash when unpickling certain parse trees generated
   using html5lib on Python 3.
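
The position tracking mentioned in the first item can be tried out as below.
This is a sketch assuming Beautiful Soup 4.8.1 or later and the html.parser
tree builder; the two-line markup is made up for the example.

```python
from bs4 import BeautifulSoup

# Two paragraphs on separate lines, invented for illustration.
markup = "<p>one</p>\n<p>two</p>"
soup = BeautifulSoup(markup, "html.parser")

# Collect (line number, position within the line, text) for each tag.
positions = [(p.sourceline, p.sourcepos, p.get_text()) for p in soup.find_all("p")]

for line, pos, text in positions:
    print(line, pos, text)
```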
   2019-07-21 10:05:32 by Adam Ciarcinski | Files touched by this commit (3) | Package updated
Log message:
py-beautifulsoup4: updated to 4.8.0


This release focuses on making it easier to customize Beautiful Soup's
input mechanism (the TreeBuilder) and output mechanism (the Formatter).

* You can customize the TreeBuilder object by passing keyword
  arguments into the BeautifulSoup constructor. Those keyword
  arguments will be passed along into the TreeBuilder constructor.

  The main reason to do this right now is to change which
  attributes are treated as multi-valued attributes (the way 'class'
  is treated by default). You can do this with the
  'multi_valued_attributes' argument.

* The role of Formatter objects has been greatly expanded. The Formatter
  class now controls the following:

  - The function to call to perform entity substitution. (This was
    previously Formatter's only job.)
  - Which tags should be treated as containing CDATA and have their
    contents exempt from entity substitution.
  - The order in which a tag's attributes are output.
  - Whether or not to put a '/' inside a void element, e.g. '<br/>' vs. '<br>'.

  All preexisting code should work as before.

* Added a new method to the API, Tag.smooth(), which consolidates
  multiple adjacent NavigableString elements.

* &apos; (which is valid in XML, XHTML, and HTML 5, but not HTML 4) is always
  recognized as a named entity and converted to a single quote.
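
Two of the 4.8.0 additions, sketched under the assumption that Beautiful Soup
4.8.0 or later is installed. The markup is invented; multi_valued_attributes
and smooth() are the names given in the log above.

```python
from bs4 import BeautifulSoup

markup = '<p class="a b">hi</p>'

# Default behaviour: 'class' is split into a list of values.
default_soup = BeautifulSoup(markup, "html.parser")
default_class = default_soup.p["class"]

# Passing None is forwarded to the TreeBuilder and turns the splitting off.
plain_soup = BeautifulSoup(markup, "html.parser", multi_valued_attributes=None)
plain_class = plain_soup.p["class"]

# Appending a string leaves two adjacent NavigableStrings; smooth() merges them.
soup = BeautifulSoup("<p>A one</p>", "html.parser")
soup.p.append(", a two")
before = len(soup.p.contents)
soup.smooth()
after = len(soup.p.contents)

print(default_class, plain_class, before, after)
```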
   2019-01-08 10:30:44 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
py-beautifulsoup4: updated to 4.7.1


* Fixed a significant performance problem introduced in 4.7.0.

* Fixed an incorrectly raised exception when inserting a tag before or
  after an identical tag.

* Beautiful Soup will no longer try to keep track of namespaces that
  are not defined with a prefix; this can confuse soupselect.

* Tried even harder to avoid the deprecation warning originally fixed in
   2019-01-02 11:36:08 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
py-beautifulsoup4: updated to 4.7.0


* Beautiful Soup's CSS Selector implementation has been replaced by a
  dependency on Isaac Muse's SoupSieve project (the soupsieve package
  on PyPI). The good news is that SoupSieve has a much more robust and
  complete implementation of CSS selectors, resolving a large number
  of longstanding issues. The bad news is that from this point onward,
  SoupSieve must be installed if you want to use the select() method.

  You don't have to change anything if you installed Beautiful Soup
  through pip (SoupSieve will be automatically installed when you
  upgrade Beautiful Soup) or if you don't use CSS selectors from
  within Beautiful Soup.

  SoupSieve documentation: https://facelessuser.github.io/soupsieve/

* Fixed a number of problems with the tree builder that produced
  trees which looked fine superficially but fell apart when bits
  were extracted.

* Fixed a problem with the tree builder in which elements that
  contained no content (such as empty comments and all-whitespace
  elements) were not being treated as part of the tree.

* Fixed a problem with multi-valued attributes where the value
  contained whitespace.

* Clarified ambiguous license statements in the source code. Beautiful
  Soup is released under the MIT license, and has been since 4.4.0.

* This file has been renamed from NEWS.txt to CHANGELOG.
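
For readers who do use CSS selectors, a minimal sketch of select() backed by
SoupSieve follows. It assumes Beautiful Soup 4.7.0 or later with the soupsieve
package installed; the markup and selectors are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical navigation list.
html = """
<ul id="nav">
  <li class="active"><a href="/home">Home</a></li>
  <li><a href="https://example.org/docs">Docs</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Child combinator plus class selector.
active_links = soup.select("ul#nav li.active > a")

# Attribute prefix selector.
secure_links = soup.select('a[href^="https"]')

# select_one() returns the first match, or None if nothing matches.
first_link = soup.select_one("#nav a")

print([a["href"] for a in active_links], [a["href"] for a in secure_links])
```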
   2018-08-14 09:26:20 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
py-beautifulsoup4: updated to 4.6.3

* Exactly the same as 4.6.2. Re-released to make the README file
  render properly on PyPI.

* Fix an exception when a custom formatter was asked to format a void
   2018-08-02 17:31:03 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
py-beautifulsoup4: updated to 4.6.1


* Stop data loss when encountering an empty numeric entity, and
  possibly in other cases.

* Preserve XML namespaces introduced inside an XML document, not just
   the ones introduced at the top level.

* Added a new formatter, "html5", which represents void elements
   as "<element>" rather than "<element/>".

* Fixed a problem where the html.parser tree builder interpreted
  a string like "&foo " as the character entity "&foo;"

* Correctly handle invalid HTML numeric character entities
  which reference code points that are not Unicode code points. Note
  that this is only fixed when Beautiful Soup is used with the
  html.parser parser -- html5lib already worked and I couldn't fix it
  with lxml.

* Improved the warning given when no parser is specified.

* When markup contains duplicate elements, a select() call that
  includes multiple match clauses will match all relevant

* Fixed code that was causing deprecation warnings in recent Python 3

* Fixed a Windows crash in diagnose() when checking whether a long
  markup string is a filename.

* Stopped HTMLParser from raising an exception in very rare cases of
  bad markup.

* Fixed a bug where find_all() was not working when asked to find a
  tag with a namespaced name in an XML document that was parsed as

* You can get finer control over formatting by subclassing
  bs4.element.Formatter and passing a Formatter instance into (e.g.)

* You can pass a dictionary of `attrs` into
  BeautifulSoup.new_tag. This makes it possible to create a tag with
  an attribute like 'name' that would otherwise be masked by another
  argument of new_tag.

* Clarified the deprecation warning when accessing tag.fooTag, to cover
  the possibility that you might really have been looking for a tag
  called 'fooTag'.
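
The "html5" formatter and the attrs argument of new_tag, sketched under the
assumption of Beautiful Soup 4.6.1 or later. The markup and attribute values
are invented for the example.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>line<br>break</p>", "html.parser")

# Default output closes void elements with a slash; "html5" leaves it off.
default_out = soup.br.decode()
html5_out = soup.br.decode(formatter="html5")

# 'name' as a keyword would collide with new_tag's own first argument,
# so the attribute is passed through the attrs dictionary instead.
input_tag = soup.new_tag("input", attrs={"name": "q", "type": "text"})

print(default_out, html5_out, input_tag["name"])
```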
   2017-09-03 10:53:18 by Thomas Klausner | Files touched by this commit (165)
Log message:
Follow some redirects.