textproc/py-lxml-html-clean,
HTML cleaner from lxml project
Branch: CURRENT,
Version: 0.4.1,
Package name: py312-lxml-html-clean-0.4.1,
Maintainer: pkgsrc-users
This project was initially a part of lxml. Because the HTML cleaner is designed as
blocklist-based, many reports about possible security vulnerabilities were
filed for lxml, which makes the project problematic for security-sensitive
environments. Therefore we decided to extract the problematic part to a
separate project.
Master sites:
Filesize: 20.877 KB
Version history:
- (2024-11-17) Updated to version: py312-lxml-html-clean-0.4.1
- (2024-11-16) Updated to version: py312-lxml-html-clean-0.4.0
- (2024-10-14) Updated to version: py312-lxml-html-clean-0.3.1
- (2024-09-07) Updated to version: py312-lxml-html-clean-0.2.2
- (2024-08-29) Updated to version: py312-lxml-html-clean-0.2.1
- (2024-07-30) Updated to version: py311-lxml-html-clean-0.2.0
CVS history:
2024-11-17 13:28:01 by Adam Ciarcinski | Files touched by this commit (2) | |
Log message:
py-lxml-html-clean: updated to 0.4.1
0.4.1 (2024-11-15)
Bugs fixed
* Removed superfluous debug prints.
|
2024-11-16 11:19:49 by Adam Ciarcinski | Files touched by this commit (2) | |
Log message:
py-lxml-html-clean: updated to 0.4.0
0.4.0 (2024-11-12)
Bugs fixed
* The ``Cleaner()`` now scans for hidden JavaScript code embedded
within CSS comments. In certain contexts, such as within ``<svg>`` or
``<math>`` tags, ``<style>`` tags may lose their intended function, allowing
comments like ``/* foo */`` to potentially be executed by the browser.
If suspicious content is detected, only the comment is removed.
|
2024-11-11 08:29:31 by Thomas Klausner | Files touched by this commit (862) |
Log message:
py-*: remove unused tool dependency
py-setuptools includes the py-wheel functionality nowadays
|
2024-10-14 07:13:13 by Adam Ciarcinski | Files touched by this commit (2) | |
Log message:
py-lxml-html-clean: updated to 0.3.1
0.3.1 (2024-10-09)
Features added
* Do not parse URL addresses when it is not necessary.
0.3.0 (2024-10-09)
Features added
* Parsing of URL addresses has been enhanced and Cleaner removes ambiguous URLs.
|
2024-09-07 07:39:08 by Adam Ciarcinski | Files touched by this commit (2) | |
Log message:
py-lxml-html-clean: updated to 0.2.2
0.2.2 (2024-08-30)
Bugs fixed
* sdist now includes all test files and changelog.
|
2024-08-29 14:25:11 by Adam Ciarcinski | Files touched by this commit (2) | |
Log message:
py-lxml-html-clean: updated to 0.2.1
0.2.1
Bugs fixed
* Memory efficiency is now much better for HTML pages where the cleaner removes
a lot of elements.
|
2024-07-30 05:52:59 by Adam Ciarcinski | Files touched by this commit (3) | |
Log message:
py-lxml-html-clean: updated to 0.2.0
0.2.0 (2024-07-29)
Features added
* ASCII control characters (except HT, VT, CR and LF) are now removed from string
inputs before they're parsed by lxml/libxml2.
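The filtering described in this entry can be illustrated with a minimal standard-library sketch. This is not the package's actual code: the name `strip_control_chars` is hypothetical, and treating DEL (0x7F) as a control character alongside 0x00-0x1F is an assumption. Only the four exceptions (HT, VT, CR, LF) come from the changelog.

```python
import re

# Assumption: "ASCII control characters" means 0x00-0x1F plus DEL (0x7F).
# HT (0x09), LF (0x0A), VT (0x0B) and CR (0x0D) are kept, per the changelog.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0c\x0e-\x1f\x7f]")


def strip_control_chars(text: str) -> str:
    """Hypothetical helper: drop ASCII control characters except HT, LF, VT, CR."""
    return _CONTROL_CHARS.sub("", text)


# A NUL byte hidden inside a URL scheme is removed; tab, CR and LF survive.
sample = "<a href='java\x00script:alert(1)'>x</a>\tkept\r\n"
cleaned = strip_control_chars(sample)
print(repr(cleaned))
```

Stripping such characters before parsing means a blocklist check sees the same string the browser would effectively interpret, rather than one broken up by invisible bytes.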
|
2024-05-27 16:39:29 by Adam Ciarcinski | Files touched by this commit (4) |
Log message:
py-lxml-html-clean: added version 0.1.1
This project was initially a part of lxml. Because the HTML cleaner is designed as
blocklist-based, many reports about possible security vulnerabilities were
filed for lxml, which makes the project problematic for security-sensitive
environments. Therefore we decided to extract the problematic part to a
separate project.
|