./textproc/R-stringi, Character string processing facilities


Branch: CURRENT, Version: 1.7.4, Package name: R-stringi-1.7.4, Maintainer: pkgsrc-users

stringi (pronounced "stringy") is the R package for fast, correct,
consistent and convenient string/text processing in each locale and
any native character encoding. The use of the ICU library gives R
users a platform-independent set of functions known to Java, Perl,
Python, PHP, and Ruby programmers.


Required to run:
[textproc/icu] [math/R]

Required to build:
[pkgtools/cwrappers]

CVS history:


   2021-10-07 17:02:49 by Nia Alarie | Files touched by this commit (1162)
Log message:
textproc: Remove SHA1 hashes for distfiles
   2021-09-08 15:04:22 by Makoto Fujiwara | Files touched by this commit (2) | Package updated
Log message:
(textproc/R-stringi) Updated 1.4.6 to 1.7.4, make test passes

## 1.7.4 (2021-08-12)

* [BUGFIX] #449: Fixed segfaults generated by `stri_sprintf`.

* [BUILD TIME] No longer defining `USE_RINTERNALS` and `R_NO_REMAP`.

## 1.7.3 (2021-07-15)

* [BUGFIX] Fixed the previous patch of ICU55 causing a build failure on,
  amongst others, CRAN's Solaris-based target.

## 1.7.2 (2021-07-14)

* [BUGFIX] Workaround for a bug in `tools::checkFF` failing
  when `NA_character_` is passed to `.Call`.

## 1.7.1 (2021-07-14)

* [BACKWARD INCOMPATIBILITY] `%s$%` and `%stri$%` now use the new `stri_sprintf`
  (see below) function instead of `base::sprintf`.

* [BACKWARD INCOMPATIBILITY, NEW FEATURE] In `stri_sub<-` and `stri_sub_all<-`,
  providing a negative `length` from now on does not result in the corresponding
  input string being altered.

* [BACKWARD INCOMPATIBILITY, NEW FEATURE] In `stri_sub` and `stri_sub_all`,
  negative `length` results in the corresponding output being `NA`
  or not extracted at all, depending on the setting of the new argument
  `ignore_negative_length`.

* [BACKWARD INCOMPATIBILITY, BUGFIX, NEW FEATURE] In `stri_subset*`
  and their replacement versions, `pattern` and `value` cannot be longer
  than `str` (but now they are recycled if necessary).

* [BACKWARD INCOMPATIBILITY, NEW FEATURE] `stri_sub*` now accept the
  `from` argument being a matrix like `cbind(from, length=length)`.
  Unnamed columns or any other names are still interpreted as `cbind(from, to)`.
  Also, the new argument `use_matrix` can be used to disable
  the special treatment of such matrices.

* [DOCUMENTATION] It has been clarified that the syntax of `*_charclass`
  (e.g., used in `stri_trim*`) differs slightly from regex character
  classes.

* [NEW FEATURE] #420: `stri_sprintf` (alias: `stri_string_format`)
  is a Unicode-aware replacement for and enhancement of the base `sprintf`:
  it adds a customised handling of `NA`s (on demand), computing field size
  based on code point width, outputting substrings of at most given width,
  variable width and precision (both at the same time), etc. Moreover,
  `stri_printf` can be used to display formatted strings conveniently.

* [NEW FEATURE] #153: `stri_match_*_regex` now extract capture group names.

* [NEW FEATURE] #25: `stri_locate_*_regex` now have a new argument,
  `capture_groups`, which allows for extracting positions of matches
  to parenthesised subexpressions.

* [NEW FEATURE] `stri_locate_*` now have a new argument, `get_length`,
  whose setting may result in generating *from-length* matrices
  (instead of *from-to* ones).

* [NEW FEATURE] #438: `stri_trans_general` now supports rule-based
  as well as reverse-direction transliteration.

* [NEW FEATURE] #434: `stri_datetime_format` and `stri_datetime_parse`
  are now vectorised also with respect to the `format` argument.

* [NEW FEATURE] `stri_datetime_fstr` has a new argument, `ignore_special`,
  which defaults to `TRUE` for backward compatibility.

* [NEW FEATURE] `stri_datetime_format`, `stri_datetime_add`, and
  `stri_datetime_fields` now call `as.POSIXct` more eagerly.

* [NEW FEATURE] `stri_trim*` now have a new argument, `negate`.

* [NEW FEATURE] `stri_replace_rstr` converts `gsub`-style replacement strings
  to `stri_replace`-style.

* [INTERNAL] `stri_prepare_arg*` have been refactored; buffer overruns
  in the exception handling subsystem are now avoided.

* [BUGFIX] A few functions (`stri_length`, `stri_enc_toutf32`, etc.)
  did not throw an exception on an invalid UTF-8
  byte sequence (and merely issued a warning instead).

* [BUGFIX] `stri_datetime_fstr` did not honour `NA_character_`
  and did not parse format strings such as `"%Y%m%d"` correctly.
  It has now been completely rewritten (in C).

* [BUGFIX] `stri_wrap` did not recognise the width of certain Unicode sequences
  correctly.
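
As a rough illustration of some of the 1.7.1 items above, the following sketch exercises `stri_sprintf` and the from-length matrix form of `stri_sub`. It assumes stringi 1.7.1 or later is installed; the sample strings and variable names are made up for illustration.

```r
library(stringi)

# Field widths and left/right justification, as with base::sprintf
padded <- stri_sprintf("%5s|%-5s|", "abc", "de")
print(padded)

# NAs propagate by default; na_string substitutes a placeholder on demand
with_na <- stri_sprintf("value: %s", c("x", NA), na_string = "<NA>")
print(with_na)

s <- "abcdef"

# A matrix whose second column is named "length" is read as from-length
by_length <- stri_sub(s, cbind(from = 2, length = 3))
print(by_length)

# Unnamed columns are still read as cbind(from, to)
by_to <- stri_sub(s, cbind(2, 3))
print(by_to)
```

Passing `use_matrix = FALSE` to `stri_sub` should switch off the special treatment of such matrices, as the changelog entry describes.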

## 1.6.2 (2021-05-14)

* [BACKWARD INCOMPATIBILITY] In `stri_enc_list()`,
  `simplify` now defaults to `TRUE`.

* [NEW FEATURE] #425: The outputs of `stri_enc_list()`, `stri_locale_list()`,
  `stri_timezone_list()`, and `stri_trans_list()` are now sorted.

* [NEW FEATURE] #428: In `stri_flatten`, `na_empty=NA` now omits missing values.

* [BUILD TIME] #431: Pre-4.9.0 GCC has `::max_align_t`
  but not `std::max_align_t`; a (possible) workaround has been added,
  see the `INSTALL` file.

* [BUGFIX] #429: `stri_width()` misclassified the width of certain
  code points (including grave accent, Eszett, etc.);
  General category *Sk* (Symbol, modifier) is no longer of width 0,
  `UCHAR_EAST_ASIAN_WIDTH` of `U_EA_AMBIGUOUS` is no longer of width 2.

* [BUGFIX] #354: `ALTREP` `CHARSXP`s were not copied, and thus could have been
  garbage collected in the so-called meanwhile (with thanks to @jimhester).
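
The `na_empty` change in `stri_flatten` noted above can be sketched as follows. This assumes stringi 1.6.2 or later; the input vector is a made-up example.

```r
library(stringi)

x <- c("a", NA, "b")

# Default (na_empty = FALSE): a single NA makes the whole result NA
default_res <- stri_flatten(x, collapse = ",")
print(default_res)

# na_empty = TRUE: missing values are treated as empty strings
as_empty <- stri_flatten(x, collapse = ",", na_empty = TRUE)
print(as_empty)

# na_empty = NA: missing values are omitted altogether
omitted <- stri_flatten(x, collapse = ",", na_empty = NA)
print(omitted)
```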

## 1.6.1 (2021-05-05)

* [GENERAL] #401: stringi is now bundled with ICU4C 69.1 (upgraded from 61.1),
  which is used on most Windows and OS X builds as well as on *nix systems
  not equipped with system ICU. However, if the C++11 support is disabled,
  stringi will be built against the battle-tested ICU4C 55.1.
  The update to ICU brings Unicode 13.0 and CLDR 39 support.

* [DOCUMENTATION] A draft version of a paper on `stringi` is now available at
  https://stringi.gagolewski.com/_static/ … tringi.pdf

* [GENERAL] stringi now requires R >= 3.1 (`CXX_STD` of `CXX11` or `CXX1X`).

* [NEW FEATURE] #408: `stri_trans_casefold()` performs case folding;
  this is different from case mapping, which is locale-dependent.
  Folding makes two pieces of text that differ only in case identical.
  This can come in handy when comparing strings.

* [NEW FEATURE] #421: `stri_rank()` ranks strings in a character vector
  (e.g., for ordering data frames with regard to multiple criteria,
  the ranks can be passed to `order()`, see #219).

* [NEW FEATURE] #266: `stri_width()` now supports emojis.

* [NEW FEATURE] `%s$%` and `%stri$%` are now vectorised with respect to
  both arguments.

* [BUGFIX] `stri_sort_key()` now outputs `bytes`-encoded strings.

* [BUGFIX] #415: `locale=''` was not equivalent to `locale=NULL`
  in `stri_opts_collator()`.

* [INTERNAL] #414: Use `LEVELS(x)` macro instead of accessing `(x)->sxpinfo.gp`
  directly (@lukaszdaniel).
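
A minimal sketch of two of the 1.6.1 additions listed above, `stri_trans_casefold()` and `stri_rank()`, assuming stringi 1.6.1 or later. The data frame `df` and its columns are hypothetical and exist only to show how ranks can be passed to `order()`.

```r
library(stringi)

# Case folding makes strings that differ only in case compare equal
a <- stri_trans_casefold("Hello World")
b <- stri_trans_casefold("HELLO WORLD")
print(a == b)

# Rank strings under the default collator; equal strings share a rank
words <- c("b", "a", "c", "a")
r <- stri_rank(words)
print(r)

# Hypothetical data frame: order by word first, then by n
df <- data.frame(word = words, n = c(1, 2, 3, 0))
sorted_df <- df[order(r, df$n), ]
print(sorted_df)
```

Because the ranks are plain integers, they can be combined with any other sort criteria in a single `order()` call.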

## 1.5.3 (2020-09-04)

* [DOCUMENTATION] stringi home page has moved to https://stringi.gagolewski.com
  and now includes a comprehensive reference manual.

* [NEW FEATURE] #400: `%s$%` and `%stri$%` are now binary operators
  that call base R's `sprintf()`.

* [NEW FEATURE] #399: The `%s*%` and `%stri*%` operators can be used
  in addition to `stri_dup()`, for the very same purpose.

* [NEW FEATURE] #355: `stri_opts_regex()` now accepts the `time_limit` and
  `stack_limit` options so as to prevent malformed or malicious regexes
  from running for too long.

* [NEW FEATURE] #345: `stri_startswith()` and `stri_endswith()` are now equipped
  with the `negate` parameter.

* [NEW FEATURE] #382: Incorrect regexes are now reported to ease debugging.

* [DEPRECATION WARNING] #347: Any unknown option passed to `stri_opts_fixed()`,
  `stri_opts_regex()`, `stri_opts_coll()`, and `stri_opts_brkiter()` now
  generates a warning. In the future, the `...` parameter will be removed,
  so that will be an error.

* [DEPRECATION WARNING] `stri_duplicated()`'s `fromLast` argument
  has been renamed `from_last`. `fromLast` is now its alias scheduled
  for removal in a future version of the package.

* [DEPRECATION WARNING] `stri_enc_detect2()`
  is scheduled for removal in a future version of the package.
  Use `stri_enc_detect()` or the more targeted `stri_enc_isutf8()`,
  `stri_enc_isascii()`, etc., instead.

* [DEPRECATION WARNING] `stri_read_lines()`, `stri_write_lines()`,
  `stri_read_raw()`: use `con` argument instead of `fname` now.
  The argument `fallback_encoding` is scheduled for removal and is no longer
  used. `stri_read_lines()` does not support `encoding="auto"` anymore.

* [DEPRECATION WARNING] `nparagraphs` in `stri_rand_lipsum()` has been renamed
  `n_paragraphs`.

* [NEW FEATURE] #398: Alternative, British spelling of function parameters
  has been introduced, e.g., `stri_opts_coll()` now supports both
  `normalization` and `normalisation`.

* [NEW FEATURE] #393: `stri_read_bin()`, `stri_read_lines()`, and
  `stri_write_lines()` are no longer marked as draft API.

* [NEW FEATURE] #187: `stri_read_bin()`, `stri_read_lines()`, and
  `stri_write_lines()` now support connection objects as well.

* [NEW FEATURE] #386: New function `stri_sort_key()` for generating
  locale-dependent sort keys which can be ordered at the byte level and
  return an equivalent ordering to the original string (@DavisVaughan).

* [BUGFIX] #138: `stri_encode()` and `stri_rand_strings()`
  now can generate strings of much larger lengths.

* [BUGFIX] `stri_wrap()` did not honour `indent` correctly when
  `use_width` was `TRUE`.
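
A few of the 1.5.3 items above can be tried out as in the sketch below, which assumes stringi 1.5.3 or later. The inputs are made-up examples.

```r
library(stringi)

# %s*% duplicates strings, for the same purpose as stri_dup()
rep_op <- "ab" %s*% 3
rep_fn <- stri_dup("ab", 3)
print(rep_op)

# negate = TRUE inverts the result of the prefix test
not_a <- stri_startswith_fixed(c("apple", "banana"), "a", negate = TRUE)
print(not_a)

# from_last is the new name of the fromLast argument
dup_last <- stri_duplicated(c("a", "b", "a"), from_last = TRUE)
print(dup_last)
```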
   2021-04-21 13:43:04 by Adam Ciarcinski | Files touched by this commit (1822)
Log message:
revbump for textproc/icu
   2020-11-05 10:09:30 by Ryo ONODERA | Files touched by this commit (1814)
Log message:
*: Recursive revbump from textproc/icu-68.1
   2020-07-31 20:44:15 by Brook Milligan | Files touched by this commit (2) | Package updated
Log message:
R-stringi: update to 1.4.6.
   2020-06-02 10:25:05 by Adam Ciarcinski | Files touched by this commit (1689)
Log message:
Revbump for icu
   2020-04-12 10:29:21 by Adam Ciarcinski | Files touched by this commit (956) | Package updated
Log message:
Recursive revision bump after textproc/icu update
   2019-08-08 21:53:58 by Brook Milligan | Files touched by this commit (189) | Package updated
Log message:
Update all R packages to canonical form.

The canonical form [1] of an R package Makefile includes the
following:

- The first stanza includes R_PKGNAME, R_PKGVER, PKGREVISION (as
  needed), and CATEGORIES.

- HOMEPAGE is not present but defined in math/R/Makefile.extension to
  refer to the CRAN web page describing the package.  Other relevant
  web pages are often linked from there via the URL field.

This updates all current R packages to this form, which will make
regular updates _much_ easier, especially using pkgtools/R2pkg.

[1] http://mail-index.netbsd.org/tech-pkg/2 … 21711.html