Subject: CVS commit: pkgsrc/www/py-beautifulsoup4
From: Adam Ciarcinski
Date: 2022-11-30 18:00:39
Message id: 20221130170039.31221FA90@cvs.NetBSD.org

Log Message:
py-beautifulsoup4: updated to 4.11.1

4.11.1 (20220408)

This release was done to ensure that the unit tests are packaged along
with the released source. There are no functionality changes in this
release, but there are a few other packaging changes:

* The Japanese and Korean translations of the documentation are included.
* The changelog is now packaged as CHANGELOG, and the license file is
  packaged as LICENSE. NEWS.txt and COPYING.txt are still present,
  but may be removed in the future.
* TODO.txt is no longer packaged, since a TODO is not relevant for released
  code.

4.11.0 (20220407)

* Ported unit tests to use pytest.

* Added special string classes, RubyParenthesisString and RubyTextString,
  to make it possible to treat ruby text specially in get_text() calls.

* It's now possible to customize the way output is indented by
  providing a value for the 'indent' argument to the Formatter
  constructor. The 'indent' argument works very similarly to the
  argument of the same name in the Python standard library's
  json.dump() function.

* If the charset-normalizer Python module
  (https://pypi.org/project/charset-normalizer/) is installed, Beautiful
  Soup will use it to detect the character sets of incoming documents.
  This is also the module used by newer versions of the Requests library.
  For the sake of backwards compatibility, chardet and cchardet both take
  precedence if installed.
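
The precedence described above can be pictured with the sketch below. This is not Beautiful Soup's own code: pick_detector() and CANDIDATES are hypothetical names, and the relative order of cchardet and chardet is an assumption. It only checks which modules are importable.

```python
import importlib.util

# chardet and cchardet come before charset-normalizer, per the entry above.
# Which of the first two wins is an assumption made for this sketch.
CANDIDATES = ("cchardet", "chardet", "charset_normalizer")


def pick_detector(candidates=CANDIDATES):
    """Return the name of the first importable module, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


chosen = pick_detector()
print(chosen or "no character set detector installed")
```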

* Added a workaround for an lxml bug
  (https://bugs.launchpad.net/lxml/+bug/1948551) that causes
  problems when parsing a Unicode string beginning with BYTE ORDER MARK.

* Issue a warning when an HTML parser is used to parse a document that
  looks like XML but not XHTML.

* Do a better job of keeping track of namespaces as an XML document is
  parsed, so that CSS selectors that use namespaces will do the right
  thing more often.

* Some time ago, the misleadingly named "text" argument to find-type
  methods was renamed to the more accurate "string." But this supposed
  "renaming" didn't make it into important places like the method
  signatures or the docstrings. That's corrected in this
  version. "text" still works, but will give a DeprecationWarning.

* Fixed a crash when pickling a BeautifulSoup object that has no
  tree builder.

* Fixed a crash when overriding multi_valued_attributes and using the
  html5lib parser.

* Standardized the wording of the MarkupResemblesLocatorWarning
  warnings to omit untrusted input and make the warnings less
  judgmental about what you ought to be doing.

* Removed support for the iconv_codec library, which doesn't seem
  to exist anymore and was never put up on PyPI. (The closest
  replacement on PyPI, iconv_codecs, is GPL-licensed, so we can't use
  it--it's also quite old.)

4.10.0 (20210907)

* This is the first release of Beautiful Soup to only support Python
  3. I dropped Python 2 support to maintain support for newer versions
  (58 and up) of setuptools. See:
  https://github.com/pypa/setuptools/issues/2769

* The behavior of methods like .get_text() and .strings now differs
  depending on the type of tag. The change is visible with HTML tags
  like <script>, <style>, and <template>. Starting in 4.9.0, methods
  like get_text() returned no results on such tags, because the
  contents of those tags are not considered 'text' within the document
  as a whole.

  But a user who calls script.get_text() is working from a different
  definition of 'text' than a user who calls div.get_text()--otherwise
  there would be no need to call script.get_text() at all. In 4.10.0,
  the contents of (e.g.) a <script> tag are considered 'text' during a
  get_text() call on the tag itself, but not considered 'text' during
  a get_text() call on the tag's parent.

  Because of this change, calling get_text() on each child of a tag
  may now return a different result than calling get_text() on the tag
  itself. That's because different tags now have different
  understandings of what counts as 'text'.
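
A short sketch of the difference, using invented markup and the html.parser tree builder:

```python
from bs4 import BeautifulSoup

markup = "<div><p>Hello</p><script>var x = 1;</script></div>"
soup = BeautifulSoup(markup, "html.parser")

# Called on the parent, the script body is not counted as text.
from_parent = soup.div.get_text()

# Called on the <script> tag itself, the script body is the text.
from_script = soup.script.get_text()

# Calling get_text() child by child therefore yields more than the parent does.
per_child = [child.get_text() for child in soup.div.children]

print(repr(from_parent), repr(from_script), per_child)
```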

* NavigableString and its subclasses now implement the get_text()
  method, as well as the properties .strings and
  .stripped_strings. These methods will either return the string
  itself, or nothing, so the only reason to use this is when iterating
  over a list of mixed Tag and NavigableString objects.
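
The mixed-iteration case mentioned above might look like this (markup invented for the example):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>one <b>two</b> three</p>", "html.parser")

# .children mixes NavigableString and Tag objects; both now answer get_text(),
# so no isinstance() check is needed.
texts = [child.get_text() for child in soup.p.children]

print(texts)
```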

* The 'html5' formatter now treats attributes whose values are the
  empty string as HTML boolean attributes. Previously (and in other
  formatters), an attribute value must be set as None to be treated as
  a boolean attribute. In a future release, I plan to also give this
  behavior to the 'html' formatter. Patch by Isaac Muse.
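
A sketch contrasting the 'html5' formatter with the default 'minimal' one, on an invented <option> tag whose attribute value is the empty string:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<option selected="">Pick me</option>', "html.parser")

# 'html5' renders the empty-valued attribute as a bare boolean attribute.
as_html5 = soup.option.decode(formatter="html5")

# 'minimal' keeps the explicit empty value.
as_minimal = soup.option.decode(formatter="minimal")

print(as_html5)
print(as_minimal)
```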

* The 'replace_with()' method now takes a variable number of arguments,
  and can be used to replace a single element with a sequence of elements.
  Patch by Bill Chandos. [rev=605]
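
A minimal example of replacing one element with several, using made-up markup. Plain strings are accepted alongside tags.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello <b>world</b>!</p>", "html.parser")

emphasis = soup.new_tag("i")
emphasis.string = "brave new"

# Swap the single <b> tag for a tag followed by two strings.
soup.b.replace_with(emphasis, " ", "world")

result = str(soup.p)
print(result)
```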

* Corrected output when the namespace prefix associated with a
  namespaced attribute is the empty string, as opposed to
  None.

* Performance improvement when processing tags that speeds up overall
  tree construction by 2%. Patch by Morotti.

* Corrected the use of special string container classes in cases when a
  single tag may contain strings with different containers; such as
  the <template> tag, which may contain both TemplateString objects
  and Comment objects.

* The html.parser tree builder can now handle named entities
  found in the HTML5 spec in much the same way that the html5lib
  tree builder does. Note that the lxml HTML tree builder doesn't handle
  named entities this way.

* Added a second way to specify encodings to UnicodeDammit and
  EncodingDetector, based on the order of precedence defined in the
  HTML5 spec, starting at:
  https://html.spec.whatwg.org/multipage/parsing.html#parsing-with-a-known-character-encoding

  Encodings in 'known_definite_encodings' are tried first, then
  byte-order-mark sniffing is run, then encodings in 'user_encodings'
  are tried. The old argument, 'override_encodings', is now a
  deprecated alias for 'known_definite_encodings'.

  This changes the default behavior of the html.parser and lxml tree
  builders, in a way that may slightly improve encoding
  detection but will probably have no effect.
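
A small sketch of the new argument on UnicodeDammit, assuming a byte string known to be Latin-1. The sample data is invented.

```python
from bs4.dammit import UnicodeDammit

# b'caf\xe9' is valid Latin-1 but not valid UTF-8.
raw = "café".encode("latin-1")

# Encodings listed here are tried before any sniffing or guessing.
dammit = UnicodeDammit(raw, known_definite_encodings=["latin-1"])

decoded = dammit.unicode_markup
used = dammit.original_encoding

print(decoded, used)
```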

* Improve the warning issued when a directory name (as opposed to
  the name of a regular file) is passed as markup into the BeautifulSoup
  constructor.

Files:
Revision  Action  File
1.25      modify  pkgsrc/www/py-beautifulsoup4/Makefile
1.9       modify  pkgsrc/www/py-beautifulsoup4/PLIST
1.21      modify  pkgsrc/www/py-beautifulsoup4/distinfo
1.1       remove  pkgsrc/www/py-beautifulsoup4/patches/patch-setup.py