./textproc/R-data.table, Extension of data.frame


Branch: CURRENT, Version: 1.14.10, Package name: R-data.table-1.14.10, Maintainer: pkgsrc-users

Fast aggregation of large data (e.g. 100GB in RAM), fast ordered
joins, fast add/modify/delete of columns by group using no copies at
all, list columns, friendly and fast character-separated-value
read/write. Offers a natural and flexible syntax, for faster
development.


Required to run:
[math/R]

Required to build:
[pkgtools/cwrappers]

CVS history:


   2024-01-21 11:55:57 by Makoto Fujiwara | Files touched by this commit (1)
Log message:
(textproc/R-data.table) +TEST_DEPENDS+= R-nanotime, make test does not pass yet

   2024-01-21 05:49:57 by Makoto Fujiwara | Files touched by this commit (2) | Package updated
Log message:
(textproc/R-data.table) Updated 1.14.0 to 1.14.10

# data.table [v1.14.10](https://github.com/Rdatatable/data.table/milestone/20)

## NOTES

1. Maintainer of the package for CRAN releases is from now on Tyson
  Barrett (@tysonstanley),
  [#5710](https://github.com/Rdatatable/data.table/issues/5710).

2. Updated internal code for breaking change of `is.atomic(NULL)` in
  R-devel,
  [#5691](https://github.com/Rdatatable/data.table/pull/5691). Thanks to
  Martin Maechler for the patch.

3. Fix multiple tests concerning coercion to missing complex numbers,
  [#5695](https://github.com/Rdatatable/data.table/issues/5695) and
  [#5748](https://github.com/Rdatatable/data.table/issues/5748). Thanks
  to @MichaelChirico and @ben-schwen for the patches.

4. Fix multiple format warnings (e.g., -Wformat)
  [#5712](https://github.com/Rdatatable/data.table/pull/5712),
  [#5781](https://github.com/Rdatatable/data.table/pull/5781),
  [#5880](https://github.com/Rdatatable/data.table/pull/5800),
  [#5786](https://github.com/Rdatatable/data.table/pull/5786). Thanks to
  @MichaelChirico and @jangorecki for the patches.

# data.table [v1.14.8](https://github.com/Rdatatable/data.table/milestone/28?closed=1) (17 Feb 2023)

## NOTES

1. Test 1613.605 now passes changes to `as.data.frame()` in R-devel,
  [#5597](https://github.com/Rdatatable/data.table/pull/5597). Thanks to
  Avraham Adler for reporting.

2. An out of bounds read when combining non-equi join with `by=.EACHI`
  has been found and fixed thanks to clang ASAN,
  [#5598](https://github.com/Rdatatable/data.table/issues/5598). There
  was no bug or consequence because the read was followed (now preceded)
  by a bounds test.

3. `.rbind.data.table` (note the leading `.`) is no longer exported
  when `data.table` is installed in R>=4.0.0 (Apr 2020),
  [#5600](https://github.com/Rdatatable/data.table/pull/5600). It was
  never documented which R-devel now detects and warns about. It is only
  needed by `data.table` internals to support R<4.0.0; see note 1 in
  v1.12.6 (Oct 2019) below in this file for more details.

# data.table [v1.14.6](https://github.com/Rdatatable/data.table/milestone/27?closed=1) (16 Nov 2022)

## BUG FIXES

1. `fread()` could leak memory,
  [#3292](https://github.com/Rdatatable/data.table/issues/3292). Thanks
  to @patrickhowerter for reporting, and Jim Hester for the fix. The fix
  requires R 3.4.0 or later. Loading `data.table` in earlier versions
  now highlights this issue on startup, asks users to upgrade R, and
  warns that we intend to upgrade `data.table`'s dependency from 8 year
  old R 3.1.0 (April 2014) to 5 year old R 3.4.0 (April 2017).

## NOTES

1. Test 1962.098 has been modified to pass latest changes to `POSIXt`
  in R-devel.

2. `test.data.table()` no longer creates `DT` in `.GlobalEnv`, a CRAN
  policy violation,
  [#5514](https://github.com/Rdatatable/data.table/issues/5514). No
  other writes occurred to `.GlobalEnv` and release procedures have been
  improved to prevent this happening again.

3. The memory usage of the test suite has been halved,
  [#5507](https://github.com/Rdatatable/data.table/issues/5507).

# data.table [v1.14.4](https://github.com/Rdatatable/data.table/milestone/26?closed=1) (17 Oct 2022)

## NOTES

1. gcc 12.1 (May 2022) now detects and warns about an always-false
  condition (`-Waddress`) in `fread` which caused a small efficiency
  saving never to be invoked,
  [#5476](https://github.com/Rdatatable/data.table/pull/5476). Thanks to
  CRAN for testing latest versions of compilers.

2. `update.dev.pkg()` has been renamed `update_dev_pkg()` to get out
  of the way of the `stats::update` generic function,
  [#5421](https://github.com/Rdatatable/data.table/pull/5421). This is a
  utility function which upgrades the version of `data.table` to the
  latest commit in development which has passed all tests. As such we
  don't expect any backwards compatibility concerns. Its manual page was
  causing an intermittent hang/crash from `R CMD check` on Windows-only
  on CRAN which we hope will be worked around by changing its name.

3. Internal C code now passes `-Wstrict-prototypes` to satisfy the
  warnings now displayed on CRAN,
  [#5477](https://github.com/Rdatatable/data.table/pull/5477).

4. `write.csv` in R-devel no longer responds to
  `getOption("digits.secs")` for `POSIXct`,
  [#5478](https://github.com/Rdatatable/data.table/issues/5478). This
  caused our tests of `fwrite(, dateTimeAs="write.csv")` to fail on
  CRAN's daily checks using latest daily R-devel. While R-devel
  discussion continues, and currently it seems like the change is
  intended with further changes possible, this `data.table` release
  massages our tests to pass on latest R-devel. The idea is to try to
  get out of the way of R-devel changes in this regard until the new
  behavior of `write.csv` is released and confirmed. Package updates are
  not accepted on CRAN if they do not pass the latest daily version of
  R-devel, even if R-devel changes after the package update is
  submitted. If the change to `write.csv()` stands, then a future
  release of `data.table` will be needed to make `fwrite(,
  dateTimeAs="write.csv")` match `write.csv()` output again in that
  future version of R onwards. If you use an older version of
  `data.table` than said future one in the said future version of R,
  then `fwrite(, dateTimeAs="write.csv")` may not match `write.csv()` if
  you are using `getOption("digits.secs")` too. However, you can always
  check that your installation of `data.table` works in your version of
  R on your platform by simply running `test.data.table()`
  yourself. Doing so would detect such a situation for you: test 1741
  would fail in this case. `test.data.table()` runs the entire suite of
  tests and is always available to you locally. This way you do not need
  to rely on our statements about which combinations of versions of R
  and `data.table` on which platforms we have tested and support; just
  run `test.data.table()` yourself. Having said that, because test 1741
  has been relaxed in this release in order to be accepted on CRAN to
  pass latest R-devel, this won't be true for this particular release in
  regard to this particular test.

    ```R
    $ R --vanilla
    R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
    > DF = data.frame(A=as.POSIXct("2022-10-01 01:23:45.012"))
    > options(digits.secs=0)
    > write.csv(DF)
    "","A"
    "1",2022-10-01 01:23:45
    > options(digits.secs=3)
    > write.csv(DF)
    "","A"
    "1",2022-10-01 01:23:45.012

    $ Rdevel --vanilla
    R Under development (unstable) (2022-10-06 r83040) -- "Unsuffered Consequences"
    > DF = data.frame(A=as.POSIXct("2022-10-01 01:23:45.012"))
    > options(digits.secs=0)
    > write.csv(DF)
    "","A"
    "1",2022-10-01 01:23:45.012
    ```

5. Many thanks to Kurt Hornik for investigating potential impact of a
  possible future change to `base::intersect()` on empty input,
  providing a patch so that `data.table` won't break if the change is
  made to R, and giving us plenty of notice,
  [#5183](https://github.com/Rdatatable/data.table/pull/5183).

6. `datatable.[dll|so]` has changed name to `data_table.[dll|so]`,
  [#4442](https://github.com/Rdatatable/data.table/pull/4442). Thanks to
  Jan Gorecki for the PR. We had previously removed the `.` since `.` is
  not allowed by the following paragraph in the Writing-R-Extensions
  manual. Replacing `.` with `_` instead now seems more consistent with
  the last sentence.

    > ... the basename of the DLL needs to be both a valid file name
      and valid as part of a C entry point (e.g. it cannot contain
      ‘.’): for portable code it is best to confine DLL names to be
      ASCII alphanumeric plus underscore. If entry point R_init_lib is
      not found it is also looked for with ‘.’ replaced by ‘_’.
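
The check described in note 4 above can be sketched in a few lines of R. This is only an illustration under stated assumptions: the one-row `DF` and the variable names `fwrite_lines` and `base_lines` are hypothetical, and the timestamp is fixed to UTC so the comparison does not depend on the local timezone. Whether the two outputs agree depends on the versions of R and `data.table` in use, which is the point of the note.

```R
library(data.table)

# Hypothetical one-row example; tz="UTC" keeps the result independent of the local timezone.
DF = data.frame(A=as.POSIXct("2022-10-01 01:23:45.012", tz="UTC"))
options(digits.secs=3)

# Capture what each writer prints to the console for the same data.
fwrite_lines = capture.output(fwrite(DF, dateTimeAs="write.csv"))
base_lines = capture.output(write.csv(DF, row.names=FALSE))

# Compare the data rows only; the header row differs in quoting.
print(identical(fwrite_lines[-1], base_lines[-1]))

# The full local check suggested in the note (slow, so left commented out here):
# test.data.table()
```

A result of `FALSE` here would suggest the mismatch that the note describes for this combination of R and `data.table`.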

# data.table [v1.14.2](https://github.com/Rdatatable/data.table/milestone/24?closed=1) (27 Sep 2021)

## NOTES

1. clang 13.0.0 (Sep 2021) requires the system header `omp.h` to be
  included before R's headers,
  [#5122](https://github.com/Rdatatable/data.table/issues/5122). Many
  thanks to Prof Ripley for testing and providing a patch file.

   2021-10-26 13:23:42 by Nia Alarie | Files touched by this commit (1161)
Log message:
textproc: Replace RMD160 checksums with BLAKE2s checksums

All checksums have been double-checked against existing RMD160 and
SHA512 hashes

Unfetchable distfiles (fetched conditionally?):
./textproc/convertlit/distinfo clit18src.zip

   2021-10-07 17:02:49 by Nia Alarie | Files touched by this commit (1162)
Log message:
textproc: Remove SHA1 hashes for distfiles

   2021-06-23 21:59:11 by Jason Bacon | Files touched by this commit (1)
Log message:
textproc/R-data.table: Release maintainership

Narrowing my focus to biology packages

   2021-06-06 15:32:51 by Makoto Fujiwara | Files touched by this commit (2) | Package updated
Log message:
(textproc/R-data.table) updated 1.12.2 to 1.14.0

NEWS.md from 1.12.2 to 1.14.0 runs to over 1,000 lines. See the
following URL for the full text:
https://github.com/Rdatatable/data.table/blob/master/NEWS.md

Only the 'POTENTIALLY BREAKING CHANGES' section for 1.14.0 is reproduced here:

# data.table [v1.14.0](https://github.com/Rdatatable/data.table/milestone/23?closed=1)

## POTENTIALLY BREAKING CHANGES

1. In v1.13.0 (July 2020) native parsing of datetime was added to
`fread` by Michael Chirico which dramatically improved
performance. Before then datetime was read as type character by
default which was slow. Since v1.13.0, UTC-marked datetime
(e.g. `2020-07-24T10:11:12.134Z` where the final `Z` is present) has
been read automatically as POSIXct and quickly. We provided the
migration option `datatable.old.fread.datetime.character` to revert to
the previous slow character behavior. We also added the `tz=` argument
to control unmarked datetime; i.e. where the `Z` (or equivalent UTC
postfix) is missing in the data. The default `tz=""` reads unmarked
datetime as character as before, slowly. We gave you the ability to
set `tz="UTC"` to turn on the new behavior and read unmarked datetime
as UTC, quickly. R sessions that are running in UTC by setting the TZ
environment variable, as is good practice and common in production,
have also been reading unmarked datetime as UTC since v1.13.0, much
faster. Note 1 of v1.13.0 (below in this file) ended `In addition to
convenience, fread is now significantly faster in the presence of
dates, UTC-marked datetimes, and unmarked datetime when tz="UTC" is
provided.`.

    At `rstudio::global(2021)`, Neal Richardson, Director of
    Engineering at Ursa Labs, compared Arrow CSV performance to
    `data.table` CSV performance, [Bigger Data With Ease Using Apache
    Arrow](https://rstudio.com/resources/rstudioglobal-2021/bigger-data-with-ease-using-apache-arrow/).
    He opened by comparing to `data.table` as his main point. Arrow was
    presented as 3 times faster than `data.table`. He talked at length
    about this result. However, no reproducible code was provided and
    we were not contacted in advance in case we had any comments. He
    mentioned New York Taxi data in his talk which is a dataset known
    to us as containing unmarked
    datetime. [Rebuttal](https://twitter.com/MattDowle/status/1360073970498875394).

    `tz=`'s default is now changed from `""` to `"UTC"`. If you have
    been using `tz=` explicitly then there should be no change. The
    change to read UTC-marked datetime as POSIXct rather than
    character already happened in v1.13.0. The change now is that
    unmarked datetimes are now read as UTC too by default without
    needing to set `tz="UTC"`. None of the 1,017 CRAN packages
    directly using `data.table` are affected. As before, the migration
    option `datatable.old.fread.datetime.character` can still be set
    to TRUE to revert to the old character behavior. This migration
    option is temporary and will be removed in the near future.
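
The effect of the `tz=` default can be seen with a minimal sketch, assuming `data.table` v1.14.0 or later is installed. The inline CSV text and the names `csv`, `dt_utc` and `dt_chr` are hypothetical, made up for illustration only.

```R
library(data.table)

# Hypothetical sample: one unmarked datetime column and one UTC-marked (trailing Z) column.
csv = "unmarked,marked\n2020-07-24 10:11:12,2020-07-24T10:11:12Z\n"

# Default since v1.14.0 (tz="UTC"): unmarked datetime is read as POSIXct in UTC.
dt_utc = fread(text=csv)
print(sapply(dt_utc, function(col) class(col)[1]))

# tz="" gives the previous default: unmarked datetime is read as character,
# while UTC-marked datetime is still read as POSIXct.
dt_chr = fread(text=csv, tz="")
print(sapply(dt_chr, function(col) class(col)[1]))
```

Passing `tz=` explicitly in either form keeps a script's behavior the same on both sides of the default change.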

   2019-08-08 21:53:58 by Brook Milligan | Files touched by this commit (189) | Package updated
Log message:
Update all R packages to canonical form.

The canonical form [1] of an R package Makefile includes the
following:

- The first stanza includes R_PKGNAME, R_PKGVER, PKGREVISION (as
  needed), and CATEGORIES.

- HOMEPAGE is not present but defined in math/R/Makefile.extension to
  refer to the CRAN web page describing the package.  Other relevant
  web pages are often linked from there via the URL field.

This updates all current R packages to this form, which will make
regular updates _much_ easier, especially using pkgtools/R2pkg.

[1] http://mail-index.netbsd.org/tech-pkg/2019/08/02/msg021711.html