./converters/py-charset-normalizer, Universal Charset Detector



Branch: CURRENT, Version: 2.0.8, Package name: py39-charset-normalizer-2.0.8, Maintainer: pkgsrc-users

A library that helps you read text from an unknown charset encoding.


Master sites:

Filesize: 73.826 KB

CVS history:


   2021-11-25 09:10:29 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
py-charset-normalizer: updated to 2.0.8

2.0.8
Changed
- Improved Vietnamese detection
- Mess detection (MD) improvement on trailing data and long foreign (non-pure Latin) data
- Efficiency improvements in cd/alphabet_languages from [@adbar](https://github.com/adbar)
- Call sum() without an intermediary list, following PEP 289 recommendations, from [@adbar](https://github.com/adbar)
- Code style as refactored by Sourcery-AI
- Minor adjustment to the MD around European words
- Remove and replace SRTs from assets / tests
- Initialize the library logger with a `NullHandler` by default from [@nmaynes](https://github.com/nmaynes)
- Setting kwarg `explain` to True will provisionally add a specific stream handler (bounded to the function lifespan)
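
The two logging items in the 2.0.8 list can be illustrated with the standard `logging` module alone. This is a minimal sketch of the general pattern, not the library's actual code: the logger name `demo_detector`, the `detect` function, and its `stream` parameter are hypothetical and exist only for this illustration.

```python
import io
import logging

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("demo_detector")
# A NullHandler keeps the library silent unless the application configures logging.
logger.addHandler(logging.NullHandler())


def detect(payload: bytes, explain: bool = False, stream=None) -> str:
    """Hypothetical stand-in for a detection call: tries UTF-8, else Latin-1."""
    handler = None
    previous_level = logger.level
    if explain:
        # Attach a stream handler only for the duration of this call.
        handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
        handler.setFormatter(logging.Formatter("%(levelname)s | %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
    try:
        try:
            payload.decode("utf_8")
            guess = "utf_8"
        except UnicodeDecodeError:
            guess = "latin_1"
        logger.debug("guessed %s for %d bytes", guess, len(payload))
        return guess
    finally:
        if handler is not None:
            # Restore the logger so later calls are quiet again.
            logger.removeHandler(handler)
            logger.setLevel(previous_level)
```

Calling `detect(data, explain=True)` prints the debug trace for that call only; a later call without `explain` produces no output because the temporary handler has already been removed.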
   2021-10-26 12:06:54 by Nia Alarie | Files touched by this commit (150)
Log message:
converters: Replace RMD160 checksums with BLAKE2s checksums

All checksums have been double-checked against existing RMD160 and
SHA512 hashes
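
For readers who want to double-check a distfile by hand, the following is a small sketch using Python's standard `hashlib`. The function name `distfile_digests` and the chunk size are assumptions made for this example; it does not reproduce the pkgsrc tooling itself.

```python
import hashlib
from pathlib import Path


def distfile_digests(path: Path, chunk_size: int = 65536) -> dict:
    """Return BLAKE2s and SHA512 hex digests of a file, read in chunks."""
    blake = hashlib.blake2s()
    sha = hashlib.sha512()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large distfiles are never loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            blake.update(chunk)
            sha.update(chunk)
    return {"BLAKE2s": blake.hexdigest(), "SHA512": sha.hexdigest()}
```

The resulting hex strings can then be compared against the values recorded for the distfile.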
   2021-10-12 11:12:20 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
py-charset-normalizer: updated to 2.0.7

Version 2.0.7

Changes:

Addition: 🍱 Add support for Kazakh (Cyrillic) language detection
Improvement: ❇️ Further improve inferring the language from a given code page (single-byte)
Removed: 🔥 Remove redundant logging entry about detected language(s)
Miscellaneous: 🔧 Trying to leverage PEP 263 when PEP 3120 is not supported
While I do not think that this (116) will actually fix something, it will rather raise a SyntaxError (not an ASCII decoding error) for those trying to install this package using a non-supported Python version
Improvement: ⚡ Refactoring for potential performance improvements in loops
Improvement: ✨ Various detection improvements (MD+CD)
Bugfix: 🐛 Fix a minor inconsistency between Python 3.5 and other versions regarding language detection
   2021-10-07 15:29:13 by Nia Alarie | Files touched by this commit (150)
Log message:
converters: Remove SHA1 hashes for distfiles
   2021-09-19 12:39:10 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
py-charset-normalizer: updated to 2.0.6

Version 2.0.6

Changes:

Bugfix: 🐛 Unforeseen regression with the loss of backward compatibility with some older minor versions of Python 3.5.x
Bugfix: 🐛 Fix CLI crash when using --minimal output in certain cases
Improvement: ✨ Minor improvement to the detection efficiency (less than 1%)

Version 2.0.5

Changes:

Internal: 🎨 The project now complies with flake8, mypy, isort and black to ensure a better overall quality
Internal: 🎨 The MANIFEST.in was not exhaustive
Improvement: ✨ The BC-support with v1.x was improved, the old staticmethods are restored
Remove: 🔥 The project no longer raises a warning on tiny content given for detection; it is simply logged as a warning instead
Improvement: ✨ The Unicode detection is slightly improved
Bugfix: 🐛 In some rare cases, the chunks extractor could cut in the middle of a multi-byte character and could mislead the mess detection
Bugfix: 🐛 Some rare 'space' characters could trip up the UnprintablePlugin/Mess detection
Improvement: 🎨 Add syntax sugar __bool__ for results CharsetMatches list-container

This release pushes the detection coverage further, to 97%!
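
The `__bool__` item in the 2.0.5 list refers to a common Python idiom: a container that defines `__bool__` can be tested directly in an `if` statement. The sketch below shows the idiom on a hypothetical `Matches` class written for this illustration; it is not the library's `CharsetMatches` implementation.

```python
from typing import List, Optional


class Matches:
    """Hypothetical result container used only to illustrate __bool__."""

    def __init__(self, results: Optional[List[str]] = None):
        self._results = list(results or [])

    def __bool__(self) -> bool:
        # Truthy only when at least one result is present.
        return len(self._results) > 0

    def best(self) -> Optional[str]:
        # Return the first result, or None when the container is empty.
        return self._results[0] if self._results else None
```

With this in place, `if matches:` reads more naturally than an explicit length check before calling `matches.best()`.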

Version 2.0.4

Changes:

Improvement: ❇️ Adjust the MD to lower the sensitivity, thus improving the global detection reliability
Improvement: ❇️ Allow fallback on specified encoding if any
Bugfix: 🐛 The CLI no longer raises an unexpected exception when no encoding has been found
Bugfix: 🐛 Fix accessing the 'alphabets' property when the payload contains surrogate characters
Bugfix: 🐛 ✏️ The logger could mislead (explain=True) on detected languages and the impact of one MBCS match
Bugfix: 🐛 Submatch factoring could be wrong in rare edge cases
Bugfix: 🐛 Multiple files given to the CLI were ignored when publishing results to STDOUT (after the first path)
Internal: 🎨 Fix line endings from CRLF to LF for certain files
   2021-07-30 06:14:49 by Adam Ciarcinski | Files touched by this commit (5)
Log message:
py-charset-normalizer: added version 2.0.3

A library that helps you read text from an unknown charset encoding.