www/py-w3lib,
Python library of web-related functions
Branch: CURRENT,
Version: 2.2.1,
Package name: py312-w3lib-2.2.1,
Maintainer: pkgsrc-users
This is a Python library of web-related functions, such as:
* remove comments, or tags from HTML snippets
* extract base url from HTML snippets
* translate entities in HTML strings
* convert raw HTTP headers to dicts and vice-versa
* construct HTTP auth header
* convert HTML pages to Unicode
* sanitize URLs (like browsers do)
* extract arguments from URLs
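To make the "convert raw HTTP headers to dicts and vice-versa" item concrete, here is a minimal, stdlib-only sketch of that kind of conversion. It is not w3lib's implementation: the names `parse_raw_headers` and `format_raw_headers` are hypothetical, and the sketch assumes header names stay as bytes, with repeated headers collected into lists.

```python
def parse_raw_headers(raw: bytes) -> dict[bytes, list[bytes]]:
    """Turn a raw header block into {name: [value, ...]} (hypothetical helper)."""
    headers: dict[bytes, list[bytes]] = {}
    for line in raw.splitlines():
        # Split on the first colon; skip blank or malformed lines
        name, sep, value = line.partition(b":")
        if not sep:
            continue
        headers.setdefault(name.strip(), []).append(value.strip())
    return headers


def format_raw_headers(headers: dict[bytes, list[bytes]]) -> bytes:
    """Inverse of parse_raw_headers: one "Name: value" line per value."""
    return b"\r\n".join(
        name + b": " + value for name, values in headers.items() for value in values
    )


raw = b"Content-Type: text/html\r\nAccept: gzip\r\nAccept: deflate\r\n"
parsed = parse_raw_headers(raw)
print(parsed)
print(format_raw_headers(parsed))
```

Keeping every value in a list is a design choice of this sketch: it lets repeated headers such as ``Accept`` survive a round trip.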
Required to run: [devel/py-setuptools] [lang/py-six] [lang/python37]
Required to build: [pkgtools/cwrappers]
Master sites:
Filesize: 48.44 KB
Version history:
- (2024-11-11) Updated to version: py312-w3lib-2.2.1
- (2024-06-12) Updated to version: py311-w3lib-2.2.1
- (2024-06-11) Updated to version: py311-w3lib-2.2.0
- (2024-01-14) Updated to version: py311-w3lib-2.1.2
- (2023-02-09) Updated to version: py310-w3lib-1.22.0nb1
- (2022-01-05) Updated to version: py39-w3lib-1.22.0nb1
CVS history:
2024-01-14 21:49:07 by Adam Ciarcinski | Files touched by this commit (3)
Log message:
py-w3lib: updated to 2.1.2
2.1.2 (2023-08-03)
------------------
- Fix test failures on Python 3.11.4+
- Fix an incorrect type hint
- Add project URLs to setup.py
2.1.1 (2022-12-09)
------------------
- :func:`~w3lib.url.safe_url_string`, :func:`~w3lib.url.safe_download_url`
and :func:`~w3lib.url.canonicalize_url` now strip whitespace and control
characters from URLs according to the URL living standard.
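The URL living standard's basic URL parser begins by removing leading and trailing "C0 control or space" code points (U+0000 to U+0020) and then removing every ASCII tab or newline (U+0009, U+000A, U+000D) from anywhere in the input. The following is a minimal sketch of just those two steps, written with the standard library; `strip_url_input` is a hypothetical name, and this is not the code w3lib uses.

```python
# C0 control or space: U+0000 through U+0020
C0_CONTROL_OR_SPACE = "".join(chr(cp) for cp in range(0x21))
# ASCII tab or newline: U+0009, U+000A, U+000D
ASCII_TAB_OR_NEWLINE = frozenset("\t\n\r")


def strip_url_input(url: str) -> str:
    # Step 1: trim leading and trailing C0 controls and spaces
    url = url.strip(C0_CONTROL_OR_SPACE)
    # Step 2: drop every tab, LF and CR, wherever it appears
    return "".join(ch for ch in url if ch not in ASCII_TAB_OR_NEWLINE)


print(strip_url_input(" \x00http://exa\tmple.com/pa\nth \x1f"))
```

Note that an inner space is not removed by these steps; it is left for later percent-encoding.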
2.1.0 (2022-11-28)
------------------
- Dropped Python 3.6 support, and made Python 3.11 support official.
- :func:`~w3lib.url.safe_url_string` now generates safer URLs.
To make URLs safer for the `URL living standard`_:
.. _URL living standard: https://url.spec.whatwg.org/
- ``;=`` are percent-encoded in the URL username.
- ``;:=`` are percent-encoded in the URL password.
- ``'`` is percent-encoded in the URL query if the URL scheme is `special
<https://url.spec.whatwg.org/#special-scheme>`__.
To make URLs safer for `RFC 2396`_ and `RFC 3986`_, ``|[]`` are
percent-encoded in URL paths, queries, and fragments.
.. _RFC 2396: https://www.ietf.org/rfc/rfc2396.txt
.. _RFC 3986: https://www.ietf.org/rfc/rfc3986.txt
- :func:`~w3lib.encoding.html_to_unicode` now checks for the `byte order
mark`_ before inspecting the ``Content-Type`` header when determining the
content encoding, in line with the `URL living standard`_.
.. _byte order mark: https://en.wikipedia.org/wiki/Byte_order_mark
- :func:`~w3lib.url.canonicalize_url` now strips spaces from the input URL,
to be more in line with the `URL living standard`_.
- :func:`~w3lib.html.get_base_url` now ignores HTML comments.
- Fixed :func:`~w3lib.url.safe_url_string` re-encoding percent signs on
the URL username and password even when they were being used as part of an
escape sequence.
- Fixed :func:`~w3lib.http.basic_auth_header` using the wrong flavor of
base64 encoding, which could prevent authentication in rare cases.
- Fixed :func:`~w3lib.html.replace_entities` raising :exc:`OverflowError` in
some cases due to `a bug in CPython
<https://github.com/python/cpython/issues/76763>`__.
- Improved typing and fixed typing issues.
- Made CI and test improvements.
- Adopted a Code of Conduct.
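The 2.1.0 percent-encoding changes boil down to escaping a few extra characters per URL component: ``;=`` in the username, ``;:=`` in the password, and ``|[]`` in paths, queries, and fragments. The sketch below shows only that targeted escaping; `percent_encode_chars` is a hypothetical helper that assumes the listed characters are ASCII, and it does not reproduce the rest of what :func:`~w3lib.url.safe_url_string` does.

```python
def percent_encode_chars(component: str, chars: str) -> str:
    """Percent-encode only the ASCII characters listed in ``chars``."""
    # Each listed character becomes "%" followed by two uppercase hex digits
    return "".join(f"%{ord(ch):02X}" if ch in chars else ch for ch in component)


print(percent_encode_chars("us;er=1", ";="))   # username
print(percent_encode_chars("p:a;ss=", ";:="))  # password
print(percent_encode_chars("/a|b[c]", "|[]"))  # path
```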
2.0.1 (2022-08-11)
------------------
Minor documentation fix (release date is set in the changelog).
2.0.0 (2022-08-11)
------------------
Backwards incompatible changes:
- Python 2 is no longer supported; Python 3.6+ is required now
- :func:`w3lib.url.safe_url_string` and :func:`w3lib.url.canonicalize_url`
no longer convert "%23" to "#" when it appears in the URL path. This is a bug
fix. It's listed as a backward-incompatible change because in some cases the
output of :func:`w3lib.url.canonicalize_url` is going to change, and so, if
this output is used to generate URL fingerprints, new fingerprints might be
incompatible with those created with the previous w3lib versions.
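Why decoding "%23" in a path is a bug can be shown with the standard library alone: an escaped ``#`` is part of the path, while a literal ``#`` starts the fragment, so decoding it changes which resource the URL names. This sketch uses only :mod:`urllib.parse` and is an illustration, not w3lib code.

```python
from urllib.parse import unquote, urlsplit

url = "http://example.com/a%23b"
parts = urlsplit(url)
# "%23" is an escaped "#": it belongs to the path, and there is no fragment
print(parts.path, repr(parts.fragment))

# Decoding "%23" back to "#" (the old behaviour described above)
# yields a different URL once it is parsed again
decoded = f"{parts.scheme}://{parts.netloc}{unquote(parts.path)}"
reparsed = urlsplit(decoded)
print(reparsed.path, repr(reparsed.fragment))
```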
Deprecation removals:
- The ``w3lib.form`` module is removed.
- The ``w3lib.html.remove_entities`` function is removed.
- The ``w3lib.url.urljoin_rfc`` function is removed.
The following functions are deprecated and will be removed in future releases:
- ``w3lib.util.str_to_unicode``
- ``w3lib.util.unicode_to_str``
- ``w3lib.util.to_native_str``
Other improvements and bug fixes:
- Type annotations are added
- Added support for Python 3.9 and 3.10
- Fixed :func:`w3lib.html.get_meta_refresh` for ``<meta>`` tags where
``http-equiv`` is written after ``content``
- Fixed :func:`w3lib.url.safe_url_string` for IDNA domains with ports
- :func:`w3lib.url.url_query_cleaner` no longer adds an unneeded ``#`` when
``keep_fragments=True`` is passed, and the URL doesn't have a fragment
- Removed a workaround for an ancient pathname2url bug
- CI is migrated to GitHub Actions
- The code is formatted using black
1.22.0 (2020-05-13)
-------------------
- Python 3.4 is no longer supported
- :func:`w3lib.url.safe_url_string` now supports an optional ``quote_path``
parameter to disable the percent-encoding of the URL path
- :func:`w3lib.url.add_or_replace_parameter` and
:func:`w3lib.url.add_or_replace_parameters` no longer remove duplicate
parameters from the original query string that are not being added or
replaced
- :func:`w3lib.html.remove_tags` now raises a :exc:`ValueError` exception
instead of :exc:`AssertionError` when using both the ``which_ones`` and the
``keep`` parameters
- Test improvements
- Documentation improvements
- Code cleanup
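The 1.22.0 change to :func:`w3lib.url.add_or_replace_parameter` means parameters that are not being set keep their duplicates. The stdlib sketch below shows that behavior; `set_query_param` is a hypothetical name, and dropping later repeats of the *target* name is a choice made for this sketch, not a documented w3lib rule.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_query_param(url: str, name: str, value: str) -> str:
    """Add or replace ``name`` while leaving every other parameter untouched."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    result = []
    replaced = False
    for key, old_value in params:
        if key != name:
            # Unrelated parameters, duplicates included, are kept as-is
            result.append((key, old_value))
        elif not replaced:
            # The first occurrence of the target name takes the new value
            result.append((key, value))
            replaced = True
        # Later occurrences of the target name are dropped
    if not replaced:
        result.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(result)))


print(set_query_param("http://example.com/?a=1&b=2&b=3", "a", "9"))
```

Because the query is rebuilt with ``urlencode``, values are re-encoded (for example, spaces become ``+``).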
|
2022-01-04 21:55:40 by Thomas Klausner | Files touched by this commit (1595)
Log message:
*: bump PKGREVISION for egg.mk users
They now have a tool dependency on py-setuptools instead of a DEPENDS
|
2021-10-26 13:31:15 by Nia Alarie | Files touched by this commit (1030)
Log message:
www: Replace RMD160 checksums with BLAKE2s checksums
All checksums have been double-checked against existing RMD160 and
SHA512 hashes
Not committed (merge conflicts):
www/nghttp2/distinfo
Unfetchable distfiles (almost certainly fetched conditionally...):
./www/nginx-devel/distinfo array-var-nginx-module-0.05.tar.gz
./www/nginx-devel/distinfo echo-nginx-module-0.62.tar.gz
./www/nginx-devel/distinfo encrypted-session-nginx-module-0.08.tar.gz
./www/nginx-devel/distinfo form-input-nginx-module-0.12.tar.gz
./www/nginx-devel/distinfo headers-more-nginx-module-0.33.tar.gz
./www/nginx-devel/distinfo lua-nginx-module-0.10.19.tar.gz
./www/nginx-devel/distinfo naxsi-1.3.tar.gz
./www/nginx-devel/distinfo nginx-dav-ext-module-3.0.0.tar.gz
./www/nginx-devel/distinfo nginx-rtmp-module-1.2.2.tar.gz
./www/nginx-devel/distinfo nginx_http_push_module-1.2.10.tar.gz
./www/nginx-devel/distinfo ngx_cache_purge-2.5.1.tar.gz
./www/nginx-devel/distinfo ngx_devel_kit-0.3.1.tar.gz
./www/nginx-devel/distinfo ngx_http_geoip2_module-3.3.tar.gz
./www/nginx-devel/distinfo njs-0.5.0.tar.gz
./www/nginx-devel/distinfo set-misc-nginx-module-0.32.tar.gz
./www/nginx/distinfo array-var-nginx-module-0.05.tar.gz
./www/nginx/distinfo echo-nginx-module-0.62.tar.gz
./www/nginx/distinfo encrypted-session-nginx-module-0.08.tar.gz
./www/nginx/distinfo form-input-nginx-module-0.12.tar.gz
./www/nginx/distinfo headers-more-nginx-module-0.33.tar.gz
./www/nginx/distinfo lua-nginx-module-0.10.19.tar.gz
./www/nginx/distinfo naxsi-1.3.tar.gz
./www/nginx/distinfo nginx-dav-ext-module-3.0.0.tar.gz
./www/nginx/distinfo nginx-rtmp-module-1.2.2.tar.gz
./www/nginx/distinfo nginx_http_push_module-1.2.10.tar.gz
./www/nginx/distinfo ngx_cache_purge-2.5.1.tar.gz
./www/nginx/distinfo ngx_devel_kit-0.3.1.tar.gz
./www/nginx/distinfo ngx_http_geoip2_module-3.3.tar.gz
./www/nginx/distinfo njs-0.5.0.tar.gz
./www/nginx/distinfo set-misc-nginx-module-0.32.tar.gz
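For reference, BLAKE2s and SHA512 digests of a distfile can be computed with Python's :mod:`hashlib`. The sketch below prints them in the ``ALGORITHM (file) = digest`` shape seen in pkgsrc ``distinfo`` files; that exact layout, the ``Size`` line, and the helper name `distinfo_lines` are assumptions for illustration, since pkgsrc generates these files with its own tooling.

```python
import hashlib
import tempfile
from pathlib import Path


def distinfo_lines(path: Path) -> list[str]:
    """Return assumed distinfo-style checksum lines for one file."""
    data = path.read_bytes()
    name = path.name
    return [
        f"BLAKE2s ({name}) = {hashlib.blake2s(data).hexdigest()}",
        f"SHA512 ({name}) = {hashlib.sha512(data).hexdigest()}",
        f"Size ({name}) = {len(data)} bytes",
    ]


# Usage on a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example-1.0.tar.gz"
    sample.write_bytes(b"hello")
    lines = distinfo_lines(sample)

for line in lines:
    print(line)
```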
|
2021-10-07 17:09:00 by Nia Alarie | Files touched by this commit (1033)
Log message:
www: Remove SHA1 hashes for distfiles
|
2020-05-14 08:06:42 by Adam Ciarcinski | Files touched by this commit (2)
Log message:
py-w3lib: updated to 1.22.0
1.22.0:
- Python 3.4 is no longer supported
- :func:`w3lib.url.safe_url_string` now supports an optional ``quote_path``
parameter to disable the percent-encoding of the URL path
- :func:`w3lib.url.add_or_replace_parameter` and
:func:`w3lib.url.add_or_replace_parameters` no longer remove duplicate
parameters from the original query string that are not being added or
replaced
- :func:`w3lib.html.remove_tags` now raises a :exc:`ValueError` exception
instead of :exc:`AssertionError` when using both the ``which_ones`` and the
``keep`` parameters
- Test improvements
- Documentation improvements
- Code cleanup
|
2019-08-12 22:03:01 by Adam Ciarcinski | Files touched by this commit (2)
Log message:
py-w3lib: updated to 1.21.0
1.21.0:
- Add the ``encoding`` and ``path_encoding`` parameters to
:func:`w3lib.url.safe_download_url`
- :func:`w3lib.url.safe_url_string` now also removes tabs and new lines
- :func:`w3lib.html.remove_comments` now also removes truncated comments
- :func:`w3lib.html.remove_tags_with_content` no longer removes tags which
start with the same text as one of the specified tags
- Recommend pytest instead of nose to run tests
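Removing truncated comments means treating an unclosed ``<!--`` as running to the end of the text. One way to express that is a lazy regular expression whose terminator is either ``-->`` or the end of the string, as in this sketch; `strip_comments` is a hypothetical name, and the pattern is an illustration rather than a quote of w3lib's source.

```python
import re

# A comment runs from "<!--" to the next "-->", or to the end of the text
# if it is never closed (the "truncated comment" case)
COMMENT_RE = re.compile(r"<!--.*?(?:-->|\Z)", re.DOTALL)


def strip_comments(text: str) -> str:
    return COMMENT_RE.sub("", text)


print(strip_comments("a<!-- x -->b<!-- truncated"))
```

``re.DOTALL`` lets a comment span several lines.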
|
2019-01-16 00:05:37 by Adam Ciarcinski | Files touched by this commit (2)
Log message:
py-w3lib: updated to 1.20.0
1.20.0:
- Fix url_query_cleaner to not append "?" to URLs without a query string
- Add support for Python 3.7 and drop Python 3.3
- Add w3lib.url.add_or_replace_parameters helper
- Documentation fixes
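The url_query_cleaner fix is about never emitting a bare ``?``. With the standard library, rebuilding a URL via ``urlunsplit`` gives that property for free, because it only adds ``?`` when the query string is non-empty. `keep_query_params` below is a hypothetical allow-list filter sketched for illustration, not w3lib's function.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def keep_query_params(url: str, allowed: set[str]) -> str:
    """Drop every query parameter whose name is not in ``allowed``."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in allowed
    ]
    # urlunsplit only emits "?" when the rebuilt query string is non-empty
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(keep_query_params("http://example.com/p", {"a"}))
print(keep_query_params("http://example.com/p?a=1&b=2", {"a"}))
```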
|
2018-01-26 09:06:07 by Adam Ciarcinski | Files touched by this commit (2)
Log message:
py-w3lib: updated to 1.19.0
1.19.0:
- Add a workaround for a CPython segfault (https://bugs.python.org/issue32583)
which affects w3lib.encoding functions. This is technically **backwards
incompatible** because it changes the way non-decodable bytes are replaced
(in some cases instead of two ``\ufffd`` chars you can get one).
As a side effect, the fix speeds up decoding in Python 3.4+.
- Add 'encoding' parameter for w3lib.http.basic_auth_header.
- Fix pypy testing setup, add pypy3 to CI.
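A Basic auth header value is ``"Basic "`` followed by the standard base64 encoding of ``username:password``, where the ``encoding`` parameter decides how the text becomes bytes. The sketch below shows this; `make_basic_auth` is a hypothetical name and its ``ISO-8859-1`` default is an assumption of the sketch. It uses the standard base64 alphabet (``+`` and ``/``), not the URL-safe one (``-`` and ``_``), which is the distinction behind the base64 "flavor" fix listed under 2.1.0.

```python
import base64


def make_basic_auth(username: str, password: str, encoding: str = "ISO-8859-1") -> bytes:
    """Build a Basic auth header value from ``username:password``."""
    credentials = f"{username}:{password}".encode(encoding)
    # Standard base64 alphabet ("+" and "/"), not the URL-safe variant
    return b"Basic " + base64.b64encode(credentials)


print(make_basic_auth("Aladdin", "open sesame"))
```

The two alphabets only differ for some byte values: ``"\u00ff:"`` in ISO-8859-1 encodes to ``/zo=`` with standard base64 but ``_zo=`` with the URL-safe variant, so a server expecting the standard form would reject the latter.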
|