pkgsrc.se | The NetBSD package collection

./converters/py-charset-normalizer, Universal Charset Detector

Branch: CURRENT, Version: 3.4.1, Package name: py312-charset-normalizer-3.4.1, Maintainer: pkgsrc-users

A library that helps you read text from an unknown charset encoding.

Master sites:

https://files.pythonhosted.org/packages/source/c/charset-normalizer/

Filesize: 120.301 KB

Version history:

(2024-12-26) Updated to version: py312-charset-normalizer-3.4.1
(2024-10-10) Updated to version: py312-charset-normalizer-3.4.0
(2023-11-01) Updated to version: py311-charset-normalizer-3.3.2
(2023-10-23) Updated to version: py311-charset-normalizer-3.3.1
(2023-09-30) Updated to version: py311-charset-normalizer-3.3.0
(2023-07-08) Updated to version: py310-charset-normalizer-3.2.0

CVS history:

2024-12-26 18:20:37 by Adam Ciarcinski | Files touched by this commit (2) | Package updated

Log message:
py-charset-normalizer: updated to 3.4.1

3.4.1

Changed
- Project metadata are now stored using `pyproject.toml` instead of `setup.cfg`, using setuptools as the build backend.
- Enforce annotation delayed loading for simpler and more consistent types in the project.
- Optional mypyc compilation upgraded to version 1.14 for Python >= 3.8

Added
- pre-commit configuration.
- noxfile.

Removed
- `build-requirements.txt` as per using `pyproject.toml` native build configuration.
- `bin/integration.py` and `bin/serve.py` in favor of downstream integration test (see noxfile).
- `setup.cfg` in favor of `pyproject.toml` metadata configuration.
- Unused `utils.range_scan` function.

Fixed
- Converting content to Unicode bytes may insert `utf_8` instead of preferred `utf-8`.
- Deprecation warning "'count' is passed as positional argument" when converting to Unicode bytes on Python 3.13+
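
The two fixes above both concern rewriting a declared charset when content is re-encoded. The following is a minimal standard-library sketch of that idea, not the library's actual code: the pattern `DECLARED_CHARSET` and the helpers `preferred_name` and `patch_declared_charset` are hypothetical names introduced for illustration. It passes `count` by keyword, which avoids the Python 3.13 deprecation warning, and it writes the hyphenated codec name.

```python
import codecs
import re

# Hypothetical pattern: a `charset=` or `encoding=` declaration and its value.
DECLARED_CHARSET = re.compile(
    r"""(?:charset|encoding)\s*=\s*["']?([A-Za-z0-9_\-]+)""",
    re.IGNORECASE,
)


def preferred_name(encoding: str) -> str:
    """Return the codec's registered name with hyphens, e.g. 'utf-8'."""
    return codecs.lookup(encoding).name.replace("_", "-")


def patch_declared_charset(text: str, target: str = "utf_8") -> str:
    """Rewrite only the first declared charset in `text` to `target`."""
    name = preferred_name(target)

    def swap(match: re.Match) -> str:
        # Keep everything before the captured value, then write the new name.
        prefix_len = match.start(1) - match.start(0)
        return match.group(0)[:prefix_len] + name

    # `count` is passed by keyword; passing it positionally to re.sub
    # triggers a DeprecationWarning on Python 3.13+.
    return re.sub(DECLARED_CHARSET, swap, text, count=1)
```

Only the first declaration is touched, so a second `charset=` occurrence later in the text is left as it was.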

2024-11-11 08:29:31 by Thomas Klausner | Files touched by this commit (862)

Log message:
py-*: remove unused tool dependency

py-setuptools includes the py-wheel functionality nowadays

2024-10-10 11:58:01 by Adam Ciarcinski | Files touched by this commit (3) | Package updated

Log message:
py-charset-normalizer: updated to 3.4.0

3.4.0

Added
- Argument `--no-preemptive` in the CLI to prevent the detector from searching for hints.
- Support for Python 3.13

Fixed
- Relax the TypeError exception thrown when trying to compare a CharsetMatch with anything other than a CharsetMatch.
- Improved the general reliability of the detector based on user feedback.
- Declared charset in content (preemptive detection) not changed when converting to utf-8 bytes.
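
As an illustration of the first fix in this list, one common Python idiom for tolerating comparisons against unrelated types is to return `NotImplemented` from `__eq__`. The sketch below uses a hypothetical `Match` class and is only an assumption about the general technique; the log does not say how CharsetMatch itself implements the relaxed comparison.

```python
class Match:
    """Hypothetical stand-in for a detection result; not the library's class."""

    def __init__(self, encoding: str) -> None:
        self.encoding = encoding

    def __eq__(self, other: object) -> bool:
        if isinstance(other, Match):
            return self.encoding == other.encoding
        # Defer to Python: `==` then falls back to identity and yields False
        # instead of raising an exception.
        return NotImplemented

    def __hash__(self) -> int:
        # Defining __eq__ disables the default hash, so restore one.
        return hash(self.encoding)
```

With this shape, `Match("utf_8") == 42` evaluates to a plain boolean rather than raising.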

2023-11-01 10:14:56 by Adam Ciarcinski | Files touched by this commit (2) | Package updated

Log message:
py-charset-normalizer: updated to 3.3.2

3.3.2
Fixed
- Unintentional memory usage regression when using large payloads that match several encodings
- Regression on some detection cases showcased in the documentation

2023-10-23 09:56:04 by Adam Ciarcinski | Files touched by this commit (2) | Package updated

Log message:
py-charset-normalizer: updated to 3.3.1

3.3.1
Changed
- Optional mypyc compilation upgraded to version 1.6.1 for Python >= 3.8
- Improved the general detection reliability based on reports from the community

2023-09-30 19:16:30 by Adam Ciarcinski | Files touched by this commit (3) | Package updated

Log message:
py-charset-normalizer: updated to 3.3.0

3.3.0

Added
- Allow executing the CLI (e.g. normalizer) through `python -m charset_normalizer.cli` or `python -m charset_normalizer`
- Support for 9 forgotten encodings that are supported by Python but unlisted in `encodings.aliases` as they have no alias
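
To see what "supported by Python but unlisted as they have no alias" can mean in practice, the sketch below walks the standard `encodings` package and reports codecs that never appear as a target in the stdlib alias table. The function name `codecs_without_alias` is hypothetical, and the result is not claimed to match the 9 encodings the log refers to: it depends on the Python version and platform and may include non-text codecs.

```python
import codecs
import encodings
import pkgutil
from encodings.aliases import aliases


def codecs_without_alias() -> set:
    """Return codec module names that no entry in the alias table points to."""
    aliased = set(aliases.values())
    found = set()
    for module in pkgutil.iter_modules(encodings.__path__):
        name = module.name
        if name.startswith("_") or name in aliased:
            continue
        try:
            codecs.lookup(name)
        except (LookupError, ImportError):
            # Helper module (e.g. `aliases`) or codec unavailable here.
            continue
        found.add(name)
    return found


if __name__ == "__main__":
    print(sorted(codecs_without_alias()))
```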

Removed
- (internal) Redundant utils.is_ascii function and unused function is_private_use_only
- (internal) charset_normalizer.assets is moved inside charset_normalizer.constant

Changed
- (internal) Unicode code blocks in constants are updated using the latest v15.0.0 definition to improve detection
- Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.7

Fixed
- Unable to properly sort CharsetMatch when both chaos/noise and coherence were close due to an unreachable condition in `__lt__`
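
The ordering problem described above can be pictured with a small sketch: rank candidates by chaos (lower is better) and, when two chaos scores are nearly equal, let coherence (higher is better) decide. The `Candidate` class and the 0.01 threshold are assumptions made for illustration only; this is not the library's `__lt__`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical detection candidate used only for this sketch."""

    encoding: str
    chaos: float      # mess ratio, lower is better
    coherence: float  # language coherence, higher is better

    def __lt__(self, other: object) -> bool:
        if not isinstance(other, Candidate):
            return NotImplemented
        # Assumed threshold: chaos scores this close count as a tie,
        # so the more coherent candidate sorts first.
        if abs(self.chaos - other.chaos) < 0.01:
            return self.coherence > other.coherence
        return self.chaos < other.chaos


candidates = [
    Candidate("cp1252", chaos=0.105, coherence=0.30),
    Candidate("latin_1", chaos=0.100, coherence=0.80),
    Candidate("utf_8", chaos=0.0, coherence=0.10),
]
ranked = sorted(candidates)
```

Here `utf_8` ranks first on chaos alone, while `latin_1` is placed ahead of `cp1252` because their chaos scores are within the tie threshold and `latin_1` is more coherent.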

2023-07-08 06:35:31 by Adam Ciarcinski | Files touched by this commit (2) | Package updated

Log message:
py-charset-normalizer: updated to 3.2.0

3.2.0

Changed
- Typehint for function `from_path` no longer enforces `PathLike` as its first argument
- Minor improvement over the global detection reliability

Added
- Introduce function `is_binary` that relies on main capabilities and is optimized to detect binaries
- Propagate `enable_fallback` argument throughout `from_bytes`, `from_path`, and `from_fp`, allowing deeper control over the detection (default True)
- Explicit support for Python 3.12

Fixed
- Edge case detection failure where a file would contain a 'very-long' camel-cased word

2023-04-24 12:30:04 by Adam Ciarcinski | Files touched by this commit (2) | Package updated

Log message:
py-charset-normalizer: updated to 3.1.0

3.1.0

Added
- Argument `should_rename_legacy` for legacy function `detect`, which now disregards any new arguments without errors

Removed
- Support for Python 3.6

Changed
- Optional speedup provided by mypyc 1.0.1