pkgsrc.se | The NetBSD package collection

Next | Query returned 14 messages, browsing 1 to 10 | Previous

CVS Commit History:

2024-01-06 20:51:02 by Adam Ciarcinski | Files touched by this commit (4) | Package updated

Log message:
py-ftfy: updated to 6.1.3

Version 6.1.3 (November 21, 2023)

- Updated wcwidth.
- Switched to the Apache 2.0 license.
- Dropped support for Python 3.7.

Version 6.1.2 (February 17, 2022)

- Added type information for `guess_bytes`.

Version 6.1.1 (February 9, 2022)

- Updated the heuristic to fix the letter ß in UTF-8/MacRoman mojibake,
 which had regressed since version 5.6.

- Packaging fixes to pyproject.toml.

Version 6.1 (February 9, 2022)

- Updated the heuristic to fix the letter Ñ with more confidence.

- Fixed type annotations and added py.typed.

- ftfy is packaged using Poetry now, and wheels are created and uploaded to
 PyPI.

Version 6.0.3 (May 14, 2021)

- Allow the keyword argument `fix_entities` as a deprecated alias for
 `unescape_html`, raising a warning.

- `ftfy.formatting` functions now disregard ANSI terminal escapes when
 calculating text width.

Version 6.0.2 (May 4, 2021)

This version is purely a cosmetic change, updating the maintainer's e-mail
address and the project's canonical location on GitHub.

Version 6.0.1 (April 12, 2021)

- The `remove_terminal_escapes` step was accidentally not being used. This
 version restores it.

- Specified in setup.py that ftfy 6 requires Python 3.6 or later.

- Use a lighter link color when the docs are viewed in dark mode.

Version 6.0 (April 2, 2021)

- New function: `ftfy.fix_and_explain()` can describe all the transformations
 that happen when fixing a string. This is similar to what
 `ftfy.fixes.fix_encoding_and_explain()` did in previous versions, but it
 can fix more than the encoding.

- `fix_and_explain()` and `fix_encoding_and_explain()` are now in the top-level
 ftfy module.

- Changed the heuristic entirely. ftfy no longer needs to categorize every
 Unicode character, but only characters that are expected to appear in
 mojibake.

- Because of the new heuristic, ftfy will no longer have to release a new
 version for every new version of Unicode. It should also run faster and
 use less RAM when imported.

- The heuristic `ftfy.badness.is_bad(text)` can be used to determine whether
 there appears to be mojibake in a string. Some users were already using
 the old function `sequence_weirdness()` for that, but this one is actually
 designed for that purpose.

- Instead of a pile of named keyword arguments, ftfy functions now take in
 a TextFixerConfig object. The keyword arguments still work, and become
 settings that override the defaults in TextFixerConfig.

- Added support for UTF-8 mixups with Windows-1253 and Windows-1254.

- Overhauled the documentation: https://ftfy.readthedocs.org

2022-01-05 16:41:32 by Thomas Klausner | Files touched by this commit (289)

Log message:
python: egg.mk: add USE_PKG_RESOURCES flag

This flag should be set for packages that import pkg_resources
and thus need setuptools after the build step.

Set this flag for packages that need it and bump PKGREVISION.

2022-01-04 21:55:40 by Thomas Klausner | Files touched by this commit (1595)

Log message:
*: bump PKGREVISION for egg.mk users

They now have a tool dependency on py-setuptools instead of a DEPENDS

2021-10-26 13:23:42 by Nia Alarie | Files touched by this commit (1161)

Log message:
textproc: Replace RMD160 checksums with BLAKE2s checksums

All checksums have been double-checked against existing RMD160 and
SHA512 hashes

Unfetchable distfiles (fetched conditionally?):
./textproc/convertlit/distinfo clit18src.zip

2021-10-07 17:02:49 by Nia Alarie | Files touched by this commit (1162)

Log message:
textproc: Remove SHA1 hashes for distfiles

2017-09-16 21:27:31 by Thomas Klausner | Files touched by this commit (372)

Log message:
Reset maintainer

2017-07-31 00:32:28 by Thomas Klausner | Files touched by this commit (229)

Log message:
Switch github HOMEPAGEs to https.

2017-01-12 01:48:25 by Blue Rats | Files touched by this commit (1)

Log message:
DEPENDS on devel/py-wcwidth and textproc/py-html5lib.

2017-01-12 01:45:43 by Blue Rats | Files touched by this commit (3)

Log message:
Version 4.3.0 (December 29, 2016)

ftfy has gotten by for four years without dependencies on other Python \ 
libraries, but now we can spare ourselves some code and some maintenance burden \ 
by delegating certain tasks to other libraries that already solve them well. \ 
This version now depends on the html5lib and wcwidth libraries.

Feature changes:

    The remove_control_chars fixer will now remove some non-ASCII control \ 
characters as well, such as deprecated Arabic control characters and byte-order \ 
marks. Bidirectional controls are still left as is.

    This should have no impact on well-formed text, while cleaning up many \ 
characters that the Unicode Consortium deems "not suitable for markup" \ 
(see Unicode Technical Report #20).

    The unescape_html fixer uses a more thorough list of HTML entities, which it \ 
imports from html5lib.

    ftfy.formatting now uses wcwidth to compute the width that a string will \ 
occupy in a text console.

Heuristic changes:

    Updated the data file of Unicode character categories to Unicode 9, as used \ 
in Python 3.6.0. (No matter what version of Python you're on, ftfy uses the same \ 
data.)

Pending deprecations:

    The remove_bom option will become deprecated in 5.0, because it has been \ 
superseded by remove_control_chars.

    ftfy 5.0 will remove the previously deprecated name fix_text_encoding. It \ 
was renamed to fix_encoding in 4.0.

    ftfy 5.0 will require Python 3.2 or later, as planned. Python 2 users, \ 
please specify ftfy < 5 in your dependencies if you haven't already.

Version 4.2.0 (September 28, 2016)

Heuristic changes:

    Math symbols next to currency symbols are no longer considered 'weird' by \ 
the heuristic. This fixes a false positive where text that involved the \ 
multiplication sign and British pounds or euros (as in '5ÃÂ£35') could turn \ 
into Hebrew letters.

    A heuristic that used to be a bonus for certain punctuation now also gives a \ 
bonus to successfully decoding other common codepoints, such as the non-breaking \ 
space, the degree sign, and the byte order mark.

    In version 4.0, we tried to "future-proof" the categorization of \ 
emoji (as a kind of symbol) to include codepoints that would likely be assigned \ 
to emoji later. The future happened, and there are even more emoji than we \ 
expected. We have expanded the range to include those emoji, too.

    ftfy is still mostly based on information from Unicode 8 (as Python 3.5 is), \ 
but this expanded range should include the emoji from Unicode 9 and 10.

    Emoji are increasingly being modified by variation selectors and skin-tone \ 
modifiers. Those codepoints are now grouped with 'symbols' in ftfy, so they fit \ 
right in with emoji, instead of being considered 'marks' as their Unicode \ 
category would suggest.

    This enables fixing mojibake that involves iOS's new diverse emoji.

    An old heuristic that wasn't necessary anymore considered Latin text with \ 
high-numbered codepoints to be 'weird', but this is normal in languages such as \ 
Vietnamese and Azerbaijani. This does not seem to have caused any false \ 
positives, but it caused ftfy to be too reluctant to fix some cases of broken \ 
text in those languages.

    The heuristic has been changed, and all languages that use Latin letters \ 
should be on even footing now.

Version 4.1.1 (April 13, 2016)

    Bug fix: in the command-line interface, the -e option had no effect on \ 
Python 3 when using standard input. Now, it correctly lets you specify a \ 
different encoding for standard input.

Version 4.1.0 (February 25, 2016)

Heuristic changes:

    ftfy can now deal with "lossy" mojibake. If your text has been run \ 
through a strict Windows-1252 decoder, such as the one in Python, it may contain \ 
the replacement character ï¿½ (U+FFFD) where there were bytes that are \ 
unassigned in Windows-1252.

    Although ftfy won't recover the lost information, it can now detect this \ 
situation, replace the entire lossy character with ï¿½, and decode the rest \ 
of the characters. Previous versions would be unable to fix any string that \ 
contained U+FFFD.

    As an example, text in curly quotes that gets corrupted Ã¢â¬Å like \ 
this Ã¢â¬ï¿½ now gets fixed to be â like this ï¿½.

    Updated the data file of Unicode character categories to Unicode 8.0, as \ 
used in Python 3.5.0. (No matter what version of Python you're on, ftfy uses the \ 
same data.)

    Heuristics now count characters such as ~ and ^ as punctuation instead of \ 
wacky math symbols, improving the detection of mojibake in some edge cases.

New features:

    A new module, ftfy.formatting, can be used to justify Unicode text in a \ 
monospaced terminal. It takes into account that each character can take up \ 
anywhere from 0 to 2 character cells.

    Internally, the utf-8-variants codec was simplified and optimized.

Version 4.0.0 (April 10, 2015)

Breaking changes:

    The default normalization form is now NFC, not NFKC. NFKC replaces a large \ 
number of characters with 'equivalent' characters, and some of these \ 
replacements are useful, but some are not desirable to do by default.

    The fix_text function has some new options that perform more targeted \ 
operations that are part of NFKC normalization, such as fix_character_width, \ 
without requiring hitting all your text with the huge mallet that is NFKC.
        If you were already using NFC normalization, or in general if you want \ 
to preserve the spacing of CJK text, you should be sure to set \ 
fix_character_width=False.

    The remove_unsafe_private_use parameter has been removed entirely, after two \ 
versions of deprecation. The function name fix_bad_encoding is also gone.

New features:

    Fixers for strange new forms of mojibake, including particularly clear cases \ 
of mixed UTF-8 and Windows-1252.

    New heuristics, so that ftfy can fix more stuff, while maintaining \ 
approximately zero false positives.

    The command-line tool trusts you to know what encoding your input is in, and \ 
assumes UTF-8 by default. You can still tell it to guess with the -g option.

    The command-line tool can be configured with options, and can be used as a pipe.

    Recognizes characters that are new in Unicode 7.0, as well as emoji from \ 
Unicode 8.0+ that may already be in use on iOS.

Deprecations:

    fix_text_encoding is being renamed again, for conciseness and consistency. \ 
It's now simply called fix_encoding. The name fix_text_encoding is available but \ 
emits a warning.

Pending deprecations:

    Python 2.6 support is largely coincidental.

    Python 2.7 support is on notice. If you use Python 2, be sure to pin a \ 
version of ftfy less than 5.0 in your requirements.

2017-01-03 14:23:05 by Jonathan Perkin | Files touched by this commit (52)

Log message:
Use "${MV} || ${TRUE}" and "${RM} -f" consistently in \ 
post-install targets.

Next | Query returned 14 messages, browsing 1 to 10 | Previous