./math/R-stringdist, Approximate String Matching and String Distance Functions



Branch: CURRENT, Version: 0.9.10, Package name: R-stringdist-0.9.10, Maintainer: pkgsrc-users

Implements an approximate string matching version of R's native
'match' function. Can calculate various string distances based on
edits (Damerau-Levenshtein, Hamming, Levenshtein, optimal string
alignment), q-grams (q-gram, cosine, Jaccard distance) or heuristic
metrics (Jaro, Jaro-Winkler). An implementation of soundex is provided
as well. Distances can be computed between character vectors while
taking proper care of encoding or between integer vectors representing
generic sequences. This package is built for speed and runs in
parallel by using 'OpenMP'. An API for C or C++ is exposed as well.
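
To make the edit-based distances above concrete, here is a minimal sketch in
Python. It is not the package's implementation (which is written in C and
called from R). The function names levenshtein, osa and approx_match are
hypothetical, chosen for illustration only. The sketch shows how optimal
string alignment extends Levenshtein with adjacent transpositions, and how a
'match'-like lookup can tolerate a maximum distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def osa(a: str, b: str) -> int:
    """Optimal string alignment: Levenshtein plus adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Swap of two adjacent characters counts as a single edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def approx_match(x: str, table: list, max_dist: int):
    """Return the 0-based index of the closest entry in `table` whose OSA
    distance to `x` is at most `max_dist`, or None if there is none.
    (Hypothetical helper; R's own indices would be 1-based.)"""
    best_index, best_dist = None, max_dist + 1
    for idx, candidate in enumerate(table):
        dist = osa(x, candidate)
        if dist < best_dist:
            best_index, best_dist = idx, dist
    return best_index
```

For example, levenshtein("ab", "ba") is 2 while osa("ab", "ba") is 1, because
the swap counts as one edit. approx_match("leia", ["uhura", "leela"], 2)
selects "leela", and lowering the limit to 1 yields no match.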


Required to run:
[math/R]

Required to build:
[pkgtools/cwrappers]

CVS history:


   2023-06-02 15:49:35 by Makoto Fujiwara | Files touched by this commit (2)
Log message:
(math/R-stringdist) Updated 0.9.8 to 0.9.10

(from NEWS, no info on 0.9.10)
version 0.9.9
- Fixed warnings generated by new C compiler (function prototypes must
  now be defined completely). (Thanks to Kurt Hornik for the heads-up.)
   2021-10-26 12:56:13 by Nia Alarie | Files touched by this commit (458)
Log message:
math: Replace RMD160 checksums with BLAKE2s checksums

All checksums have been double-checked against existing RMD160 and
SHA512 hashes
   2021-10-07 16:28:36 by Nia Alarie | Files touched by this commit (458)
Log message:
math: Remove SHA1 hashes for distfiles
   2021-09-18 14:38:42 by Makoto Fujiwara | Files touched by this commit (2) | Package updated
Log message:
(math/R-stringdist) Updated 0.9.5.5 to 0.9.8

version 0.9.8
- Fixed some issues on C-level causing problems with the
  CLANG compiler. (Thanks to Brian Ripley for not only
  reporting this, but also sending updated code with
  fixes).

version 0.9.7
- Fixes in use of INTEGER() and VECTOR_ELT() after updates in R's C API.
  This affected 'afind' and 'max_length' (internally). (Thanks to Luke
  Tierney and Kurt Hornik for the notification).
- Fix in 'amatch' causing utf-8 characters to be ignored in some
  cases (thanks to Joan Mime for reporting #78).
- Fix: segfault when 'afind' was called with many search patterns or many
  texts to be searched.
- Fix: stringsimmatrix was not normalized correctly (Thanks to Tamas Ferenci
  for reporting GH).

version 0.9.6.3
- Resubmit. Fixed a URL redirect that was detected by CRAN.

version 0.9.6.2
- Resubmit. Fixed URL issues detected by CRAN, added DOI to description
  as per CRAN request.

version 0.9.6.1
- Bugfix: afind/grab/grabl returned wrong results on macOS only.
  (thanks to Prof. Brian Ripley for the notification and for running tests
   on his personal machine and to Tomas Kalibera for making the
   ubuntu-rchk docker image available).

version 0.9.6
- New function 'afind': find approximate matches in text based on string distance.
- New functions 'grab', 'grabl': fuzzy matching equivalent to 'grep' and 'grepl'.
- New function 'extract': fuzzy matching equivalent of stringr::str_extract.
- New algorithm 'running_cosine': fast fuzzy text search using cosine distance.
- New function 'stringsimmatrix' (Thanks to Johannes Gruber).
- Number of threads used is now reported when loading 'stringdist'.
- Internal fixes (in some cases class() == 'class' was used).
   2020-02-10 15:21:00 by Makoto Fujiwara | Files touched by this commit (3)
Log message:
(math/R-stringdist) import R-stringdist-0.9.5.5

Implements an approximate string matching version of R's native
'match' function. Can calculate various string distances based on
edits (Damerau-Levenshtein, Hamming, Levenshtein, optimal string
alignment), q-grams (q-gram, cosine, Jaccard distance) or heuristic
metrics (Jaro, Jaro-Winkler). An implementation of soundex is provided
as well. Distances can be computed between character vectors while
taking proper care of encoding or between integer vectors representing
generic sequences. This package is built for speed and runs in
parallel by using 'OpenMP'. An API for C or C++ is exposed as well.