Subject: CVS commit: pkgsrc/textproc/R-data.table
From: Makoto Fujiwara
Date: 2021-06-06 15:32:51
Message id: 20210606133251.85ED8FA95@cvs.NetBSD.org

Log Message:
(textproc/R-data.table) updated from 1.12.2 to 1.14.0

NEWS.md from 1.12.2 to 1.14.0 runs to over 1,000 lines; see the
following URL for the full text:
https://github.com/Rdatatable/data.table/blob/master/NEWS.md

Only the 'POTENTIALLY BREAKING CHANGES' section for 1.14.0 is quoted here.

# data.table [v1.14.0](https://github.com/Rdatatable/data.table/milestone/23?closed=1)

## POTENTIALLY BREAKING CHANGES

1. In v1.13.0 (July 2020), Michael Chirico added native parsing of
datetime to `fread`, which dramatically improved performance. Before
then, datetime was read as type character by default, which was slow.
Since v1.13.0, UTC-marked datetime (e.g. `2020-07-24T10:11:12.134Z`,
where the final `Z` is present) has been read automatically, and
quickly, as POSIXct. We provided the migration option
`datatable.old.fread.datetime.character` to revert to the previous
slow character behavior. We also added the `tz=` argument to control
unmarked datetime, i.e. where the `Z` (or equivalent UTC postfix) is
missing from the data. The default `tz=""` read unmarked datetime as
character, as before, slowly. We gave you the ability to set
`tz="UTC"` to turn on the new behavior and read unmarked datetime as
UTC, quickly. R sessions running in UTC by setting the TZ environment
variable, as is good practice and common in production, have also been
reading unmarked datetime as UTC since v1.13.0, much faster. Note 1 of
v1.13.0 (further down in the full NEWS.md) ended: `In addition to
convenience, fread is now significantly faster in the presence of
dates, UTC-marked datetimes, and unmarked datetime when tz="UTC" is
provided.`

    At `rstudio::global(2021)`, Neal Richardson, Director of
    Engineering at Ursa Labs, compared Arrow CSV performance to
    `data.table` CSV performance in the talk
    [Bigger Data With Ease Using Apache Arrow](https://rstudio.com/resources/rstudioglobal-2021/bigger-data-with-ease-using-apache-arrow/).
    He opened with this comparison to `data.table` as his main point.
    Arrow was presented as 3 times faster than `data.table`, and he
    talked at length about this result. However, no reproducible code
    was provided, and we were not contacted in advance in case we had
    any comments. He mentioned New York Taxi data in his talk, a
    dataset known to us to contain unmarked
    datetime. [Rebuttal](https://twitter.com/MattDowle/status/1360073970498875394).

    The default of `tz=` has now changed from `""` to `"UTC"`. If you
    have been using `tz=` explicitly, there should be no change. The
    change to read UTC-marked datetime as POSIXct rather than
    character already happened in v1.13.0. The change now is that
    unmarked datetimes are also read as UTC by default, without
    needing to set `tz="UTC"`. None of the 1,017 CRAN packages
    directly using `data.table` are affected. As before, the migration
    option `datatable.old.fread.datetime.character` can still be set
    to TRUE to revert to the old character behavior. This migration
    option is temporary and will be removed in the near future.
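The defaulting rule described in this note can be summarized as a small decision function. What follows is a hypothetical Python model, not data.table code. The function name `read_datetime_field` and its `session_tz` and `old_character` parameters are invented stand-ins for `fread`'s `tz=` argument, the TZ environment variable, and the `datatable.old.fread.datetime.character` option. The model only handles ISO-8601 values, while real `fread` recognizes more formats.

```python
from datetime import datetime, timezone
from typing import Union

# Hypothetical model (NOT data.table source) of the fread datetime rules
# described in the 1.14.0 notes. Only ISO-8601 strings are handled here,
# e.g. "2020-07-24T10:11:12.134Z" (marked) or "2020-07-24 10:11:12" (unmarked).


def read_datetime_field(
    value: str,
    tz: str = "UTC",              # v1.14.0 default; v1.13.0 defaulted to tz=""
    session_tz: str = "",         # stand-in for the R session's TZ env variable
    old_character: bool = False,  # stand-in for datatable.old.fread.datetime.character
) -> Union[datetime, str]:
    """Return a UTC-aware datetime (POSIXct-like) or the raw string (character)."""
    if old_character:
        # Migration option: revert to the old behavior, datetime stays character.
        return value
    if value.endswith("Z"):
        # UTC-marked datetime: read as POSIXct since v1.13.0, whatever tz is.
        # The "Z" is stripped because datetime.fromisoformat only accepts it
        # from Python 3.11 onward.
        return datetime.fromisoformat(value[:-1]).replace(tzinfo=timezone.utc)
    if tz == "UTC" or (tz == "" and session_tz == "UTC"):
        # Unmarked datetime read as UTC: either tz="UTC" (explicit, or the
        # default since v1.14.0), or tz="" in a session already running in UTC.
        return datetime.fromisoformat(value).replace(tzinfo=timezone.utc)
    # Unmarked datetime with tz="" in a non-UTC session: left as character.
    return value


if __name__ == "__main__":
    unmarked = "2020-07-24 10:11:12"
    print(type(read_datetime_field(unmarked, tz="")).__name__)  # str (old default)
    print(read_datetime_field(unmarked))  # 2020-07-24 10:11:12+00:00 (new default)
```

In this model, only the default of `tz` differs between the two releases. That default is the single change the note describes. The handling of UTC-marked values and the migration option behave the same in v1.13.0 and v1.14.0.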

Files:
Revision  Action  File
1.3       modify  pkgsrc/textproc/R-data.table/Makefile
1.2       modify  pkgsrc/textproc/R-data.table/distinfo