./textproc/R-stringi, Character string processing facilities



Branch: CURRENT, Version: 1.7.6, Package name: R-stringi-1.7.6, Maintainer: pkgsrc-users

stringi (pronounced "stringy") is the R package for fast, correct,
consistent and convenient string/text processing in each locale and
any native character encoding. The use of the ICU library gives R
users a platform-independent set of functions known to Java, Perl,
Python, PHP, and Ruby programmers.


Required to run:
[textproc/icu] [math/R]

Required to build:
[pkgtools/cwrappers]

CVS history:


   2022-04-23 16:43:58 by Makoto Fujiwara | Files touched by this commit (2)
Log message:
(textproc/R-stringi) Updated 1.7.4 to 1.7.6

# What Is New in *stringi*

## 1.7.6 (2021-11-29)

* [BUILD TIME] #463: Added loongarch support in ICU's double conversion
    (@liuxiang88).

* [BUGFIX] #467: The UCRT build on Windows was not marking strings as `latin1`.

## 1.7.5 (2021-10-04)

* [DOCUMENTATION] Paper on *stringi* has been accepted for
  publication in the *Journal of Statistical Software*,
  see <https://stringi.gagolewski.com/_static/vignette/stringi.pdf>
  for a draft version.

* [DOCUMENTATION] The *stringi* website at <https://stringi.gagolewski.com>
  now features a comprehensive tutorial based on the aforementioned paper.

* [DOCUMENTATION] The *ICU* Project site has been moved to
  <https://icu.unicode.org/>.

* [BUILD TIME] #457: The `autoconf` macros `AC_LANG_CPLUSPLUS`
  and `AC_TRY_COMPILE` were obsolete.

* [BUGFIX] #458: Passing ALTREP objects no longer yields
  'embedded nul in string' errors.
   2022-04-18 21:12:27 by Adam Ciarcinski | Files touched by this commit (1798) | Package updated
Log message:
revbump for textproc/icu update
   2021-12-08 17:07:18 by Adam Ciarcinski | Files touched by this commit (3063)
Log message:
revbump for icu and libffi
   2021-10-26 13:23:42 by Nia Alarie | Files touched by this commit (1161)
Log message:
textproc: Replace RMD160 checksums with BLAKE2s checksums

All checksums have been double-checked against existing RMD160 and
SHA512 hashes

Unfetchable distfiles (fetched conditionally?):
./textproc/convertlit/distinfo clit18src.zip
   2021-10-07 17:02:49 by Nia Alarie | Files touched by this commit (1162)
Log message:
textproc: Remove SHA1 hashes for distfiles
   2021-09-08 15:04:22 by Makoto Fujiwara | Files touched by this commit (2) | Package updated
Log message:
(textproc/R-stringi) Updated 1.4.6 to 1.7.4, make test passes

## 1.7.4 (2021-08-12)

* [BUGFIX] #449: Fixed segfaults generated by `stri_sprintf`.

* [BUILD TIME] No longer defining `USE_RINTERNALS` and `R_NO_REMAP`.

## 1.7.3 (2021-07-15)

* [BUGFIX] Fixed the previous ICU55 patch, which caused a build failure on,
  amongst others, CRAN's Solaris-based target.

## 1.7.2 (2021-07-14)

* [BUGFIX] Workaround for a bug in `tools::checkFF` failing
  when `NA_character_` is passed to `.Call`.

## 1.7.1 (2021-07-14)

* [BACKWARD INCOMPATIBILITY] `%s$%` and `%stri$%` now use the new `stri_sprintf`
  (see below) function instead of `base::sprintf`.

* [BACKWARD INCOMPATIBILITY, NEW FEATURE] In `stri_sub<-` and `stri_sub_all<-`,
  providing a negative `length` from now on does not result in the corresponding
  input string being altered.

* [BACKWARD INCOMPATIBILITY, NEW FEATURE] In `stri_sub` and `stri_sub_all`,
  negative `length` results in the corresponding output being `NA`
  or not extracted at all, depending on the setting of the new argument
  `ignore_negative_length`.

* [BACKWARD INCOMPATIBILITY, BUGFIX, NEW FEATURE] In `stri_subset*`
  and their replacement versions, `pattern` and `value` cannot be longer
  than `str` (but now they are recycled if necessary).

* [BACKWARD INCOMPATIBILITY, NEW FEATURE] `stri_sub*` now accept the
  `from` argument being a matrix like `cbind(from, length=length)`.
  Unnamed columns or any other names are still interpreted as `cbind(from, to)`.
  Also, the new argument `use_matrix` can be used to disable
  the special treatment of such matrices.

* [DOCUMENTATION] It has been clarified that the syntax of `*_charclass`
  (e.g., used in `stri_trim*`) differs slightly from regex character
  classes.

* [NEW FEATURE] #420: `stri_sprintf` (alias: `stri_string_format`)
  is a Unicode-aware replacement for and enhancement of the base `sprintf`:
  it adds a customised handling of `NA`s (on demand), computing field size
  based on code point width, outputting substrings of at most given width,
  variable width and precision (both at the same time), etc. Moreover,
  `stri_printf` can be used to display formatted strings conveniently.

* [NEW FEATURE] #153: `stri_match_*_regex` now extract capture group names.

* [NEW FEATURE] #25: `stri_locate_*_regex` now have a new argument,
  `capture_groups`, which allows for extracting positions of matches
  to parenthesised subexpressions.

* [NEW FEATURE] `stri_locate_*` now have a new argument, `get_length`,
  whose setting may result in generating *from-length* matrices
  (instead of *from-to* ones).

* [NEW FEATURE] #438: `stri_trans_general` now supports rule-based
  as well as reverse-direction transliteration.

* [NEW FEATURE] #434: `stri_datetime_format` and `stri_datetime_parse`
  are now vectorised also with respect to the `format` argument.

* [NEW FEATURE] `stri_datetime_fstr` has a new argument, `ignore_special`,
  which defaults to `TRUE` for backward compatibility.

* [NEW FEATURE] `stri_datetime_format`, `stri_datetime_add`, and
  `stri_datetime_fields` now call `as.POSIXct` more eagerly.

* [NEW FEATURE] `stri_trim*` now have a new argument, `negate`.

* [NEW FEATURE] `stri_replace_rstr` converts `gsub`-style replacement strings
  to `stri_replace`-style.

* [INTERNAL] `stri_prepare_arg*` have been refactored; buffer overruns
  in the exception handling subsystem are now avoided.

* [BUGFIX] A few functions (`stri_length`, `stri_enc_toutf32`, etc.)
  did not throw an exception on an invalid UTF-8
  byte sequence (and merely issued a warning instead).

* [BUGFIX] `stri_datetime_fstr` did not honour `NA_character_`
  and did not parse format strings such as `"%Y%m%d"` correctly.
  It has now been completely rewritten (in C).

* [BUGFIX] `stri_wrap` did not recognise the width of certain Unicode sequences
  correctly.
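
The difference between *from-to* and *from-length* matrices described in the 1.7.1 entries above can be illustrated with a minimal sketch. The input strings and variable names are invented for illustration, and the snippet assumes stringi 1.7.1 or later is installed.

```r
library(stringi)

x <- c("abc123", "de45")

# From-to matrix (start and end positions), the long-standing default.
ft <- stri_locate_first_regex(x, "[0-9]+")

# From-length matrix, requested with the new `get_length` argument.
fl <- stri_locate_first_regex(x, "[0-9]+", get_length = TRUE)

# Either kind of matrix can be passed as `from`; a second column named
# "length" is what marks a matrix as from-length.
digits_ft <- stri_sub(x, ft)
digits_fl <- stri_sub(x, fl)

# The same distinction with hand-built matrices.
by_to     <- stri_sub("abcdef", cbind(2, 3))                  # from 2 to 3
by_length <- stri_sub("abcdef", cbind(from = 2, length = 3))  # 3 chars from 2

# With the default ignore_negative_length = FALSE,
# a negative length yields a missing value.
neg <- stri_sub("abcdef", from = 2, length = -1)
```

Both `digits_ft` and `digits_fl` should hold the same substrings; only the representation of the match positions differs.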

## 1.6.2 (2021-05-14)

* [BACKWARD INCOMPATIBILITY] In `stri_enc_list()`,
  `simplify` now defaults to `TRUE`.

* [NEW FEATURE] #425: The outputs of `stri_enc_list()`, `stri_locale_list()`,
  `stri_timezone_list()`, and `stri_trans_list()` are now sorted.

* [NEW FEATURE] #428: In `stri_flatten`, `na_empty=NA` now omits missing values.

* [BUILD TIME] #431: Pre-4.9.0 GCC has `::max_align_t`,
  but not `std::max_align_t`; a (possible) workaround has been added,
  see the `INSTALL` file.

* [BUGFIX] #429: `stri_width()` misclassified the width of certain
  code points (including grave accent, Eszett, etc.);
  General category *Sk* (Symbol, modifier) is no longer of width 0,
  `UCHAR_EAST_ASIAN_WIDTH` of `U_EA_AMBIGUOUS` is no longer of width 2.

* [BUGFIX] #354: `ALTREP` `CHARSXP`s were not copied, and thus could have been
  garbage collected in the meantime (with thanks to @jimhester).
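
As a rough illustration of the `na_empty` settings in `stri_flatten` mentioned in #428 above, the following sketch compares the three possible values. The sample vector is made up, and stringi 1.6.2 or later is assumed.

```r
library(stringi)

parts <- c("alpha", NA, "beta")

# Default (na_empty = FALSE): any missing value makes the whole result NA.
joined_default <- stri_flatten(parts, collapse = ", ")

# na_empty = TRUE: missing values are treated as empty strings.
joined_empty <- stri_flatten(parts, collapse = ", ", na_empty = TRUE)

# na_empty = NA: missing values are omitted altogether.
joined_omit <- stri_flatten(parts, collapse = ", ", na_empty = NA)
```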

## 1.6.1 (2021-05-05)

* [GENERAL] #401: stringi is now bundled with ICU4C 69.1 (upgraded from 61.1),
  which is used on most Windows and OS X builds as well as on *nix systems
  not equipped with system ICU. However, if the C++11 support is disabled,
  stringi will be built against the battle-tested ICU4C 55.1.
  The update to ICU brings Unicode 13.0 and CLDR 39 support.

* [DOCUMENTATION] A draft version of a paper on `stringi` is now available at
  https://stringi.gagolewski.com/_static/vignette/stringi.pdf

* [GENERAL] stringi now requires R >= 3.1 (`CXX_STD` of `CXX11` or `CXX1X`).

* [NEW FEATURE] #408: `stri_trans_casefold()` performs case folding;
  this is different from case mapping, which is locale-dependent.
  Folding makes two pieces of text that differ only in case identical.
  This can come in handy when comparing strings.

* [NEW FEATURE] #421: `stri_rank()` ranks strings in a character vector
  (e.g., for ordering data frames with regard to multiple criteria,
  the ranks can be passed to `order()`, see #219).

* [NEW FEATURE] #266: `stri_width()` now supports emojis.

* [NEW FEATURE] `%s$%` and `%stri$%` are now vectorised with respect to
  both arguments.

* [BUGFIX] `stri_sort_key()` now outputs `bytes`-encoded strings.

* [BUGFIX] #415: `locale=''` was not equivalent to `locale=NULL`
  in `stri_opts_collator()`.

* [INTERNAL] #414: Use `LEVELS(x)` macro instead of accessing `(x)->sxpinfo.gp`
  directly (@lukaszdaniel).
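
The case folding and ranking features from the 1.6.1 entries above might be used as in the sketch below. The example strings and the data frame are hypothetical, and stringi 1.6.1 or later is assumed.

```r
library(stringi)

a <- "STRASSE"
b <- "stra\u00dfe"  # "strasse" spelled with a sharp s (Eszett)

# Case folding makes the two spellings identical ...
same_after_folding <- stri_trans_casefold(a) == stri_trans_casefold(b)

# ... whereas plain lower-casing keeps the sharp s, so they still differ.
same_after_lower <- stri_trans_tolower(a) == stri_trans_tolower(b)

# stri_rank() turns strings into ranks that order() can combine
# with other sort criteria.
df <- data.frame(
  group = c(2, 1, 1, 2),
  name  = c("pear", "fig", "apple", "kiwi")
)
sorted <- df[order(df$group, stri_rank(df$name)), ]
```

Here `sorted` is ordered by `group` first and, within each group, by the collation rank of `name`.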

## 1.5.3 (2020-09-04)

* [DOCUMENTATION] stringi home page has moved to https://stringi.gagolewski.com
  and now includes a comprehensive reference manual.

* [NEW FEATURE] #400: `%s$%` and `%stri$%` are now binary operators
  that call base R's `sprintf()`.

* [NEW FEATURE] #399: The `%s*%` and `%stri*%` operators can be used
  in addition to `stri_dup()`, for the very same purpose.

* [NEW FEATURE] #355: `stri_opts_regex()` now accepts the `time_limit` and
  `stack_limit` options so as to prevent malformed or malicious regexes
  from running for too long.

* [NEW FEATURE] #345: `stri_startswith()` and `stri_endswith()` are now equipped
  with the `negate` parameter.

* [NEW FEATURE] #382: Incorrect regexes are now reported to ease debugging.

* [DEPRECATION WARNING] #347: Any unknown option passed to `stri_opts_fixed()`,
  `stri_opts_regex()`, `stri_opts_coll()`, and `stri_opts_brkiter()` now
  generates a warning. In the future, the `...` parameter will be removed,
  so that will be an error.

* [DEPRECATION WARNING] `stri_duplicated()`'s `fromLast` argument
  has been renamed `from_last`. `fromLast` is now its alias scheduled
  for removal in a future version of the package.

* [DEPRECATION WARNING] `stri_enc_detect2()`
  is scheduled for removal in a future version of the package.
  Use `stri_enc_detect()` or the more targeted `stri_enc_isutf8()`,
  `stri_enc_isascii()`, etc., instead.

* [DEPRECATION WARNING] `stri_read_lines()`, `stri_write_lines()`,
  `stri_read_raw()`: use `con` argument instead of `fname` now.
  The argument `fallback_encoding` is scheduled for removal and is no longer
  used. `stri_read_lines()` does not support `encoding="auto"` anymore.

* [DEPRECATION WARNING] `nparagraphs` in `stri_rand_lipsum()` has been renamed
  `n_paragraphs`.

* [NEW FEATURE] #398: Alternative, British spelling of function parameters
  has been introduced, e.g., `stri_opts_coll()` now supports both
  `normalization` and `normalisation`.

* [NEW FEATURE] #393: `stri_read_bin()`, `stri_read_lines()`, and
  `stri_write_lines()` are no longer marked as draft API.

* [NEW FEATURE] #187: `stri_read_bin()`, `stri_read_lines()`, and
  `stri_write_lines()` now support connection objects as well.

* [NEW FEATURE] #386: New function `stri_sort_key()` for generating
  locale-dependent sort keys which can be ordered at the byte level and
  return an equivalent ordering to the original string (@DavisVaughan).

* [BUGFIX] #138: `stri_encode()` and `stri_rand_strings()`
  now can generate strings of much larger lengths.

* [BUGFIX] `stri_wrap()` did not honour `indent` correctly when
  `use_width` was `TRUE`.
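
A few of the 1.5.3 additions listed above can be tried out with the short sketch below. The limit values are arbitrary placeholders chosen for illustration, not recommended settings, and stringi 1.5.3 or later is assumed.

```r
library(stringi)

# %s*% duplicates a string, just like stri_dup().
rule <- "-" %s*% 10

# negate = TRUE flips the result of the prefix test.
not_a <- stri_startswith_fixed(c("apple", "banana"), "a", negate = TRUE)

# Cap the work a regex may do; the numbers here are arbitrary.
opts <- stri_opts_regex(time_limit = 1000L, stack_limit = 1000000L)
found <- stri_detect_regex("aaaaaaaaab", "(a+)+b", opts_regex = opts)
```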
   2021-04-21 13:43:04 by Adam Ciarcinski | Files touched by this commit (1822)
Log message:
revbump for textproc/icu
   2020-11-05 10:09:30 by Ryo ONODERA | Files touched by this commit (1814)
Log message:
*: Recursive revbump from textproc/icu-68.1