Subject: CVS commit: pkgsrc/textproc/miller
From: Thomas Klausner
Date: 2016-02-18 11:07:48
Message id: 20160218100748.E2A00FBB7@cvs.NetBSD.org

Log Message:
Update miller to 3.4.0.

Use release tarball and drop autotools dependencies.

Changes in 3.4.0:

JSON, reshape, regex captures, and more

Primary features:

    JSON is now a supported format for input and output. Miller handles tabular \ 
data, and JSON supports arbitrarily deeply nested data structures, so if you \ 
want general JSON processing you should use jq. But if you have tabular data \ 
represented in JSON then Miller can now handle that for you. Please see the \ 
reference page and the FAQ.

    Reshape is a standard data-processing idiom, now available in Miller: \ 
http://johnkerl.org/miller/doc/reference.html#reshape

    Incidentally (not part of this release, but new since the last release) \ 
Miller is now available in FreeBSD's package manager: \ 
https://www.freshports.org/textproc/miller/. A full list of distributions \ 
containing Miller may be found here.

    Miller is not yet available from within Fedora/CentOS, but as a step toward \ 
this goal, an SRPM is included in this release (see file-list below).
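
To illustrate the reshape idiom listed among the primary features above, here is a minimal Python sketch of long-to-wide and wide-to-long conversion on records held as dicts. It is not Miller's implementation; the function names and the sample field names are hypothetical and chosen only for this example.

```python
def long_to_wide(records, key_field, value_field):
    """Pivot long records into wide ones.

    Records that agree on every field other than key_field and
    value_field are merged into a single output record.
    """
    wide = {}  # grouping key -> output record, in first-seen order
    for rec in records:
        others = {k: v for k, v in rec.items()
                  if k not in (key_field, value_field)}
        group = tuple(others.items())
        out = wide.setdefault(group, dict(others))
        # The value of key_field becomes a new column name.
        out[rec[key_field]] = rec[value_field]
    return list(wide.values())


def wide_to_long(records, fields, key_field, value_field):
    """Split each of the named fields into its own key/value record."""
    long_records = []
    for rec in records:
        others = {k: v for k, v in rec.items() if k not in fields}
        for name in fields:
            if name in rec:
                long_records.append(
                    {**others, key_field: name, value_field: rec[name]})
    return long_records
```

With long input such as id=a,k=x,v=1 and id=a,k=y,v=2, long_to_wide(records, "k", "v") yields the single record id=a,x=1,y=2; wide_to_long reverses that.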

DSL enhancements for mlr put and mlr filter:

    Regex captures \0 through \9: \ 
http://johnkerl.org/miller/doc/reference.html#Regex_captures

    Ternary operator in expression right-hand sides: e.g. mlr put '$y = $x < \ 
0.5 ? 0 : 1'

    Boolean literals true and false

    Final semicolon is now allowed: e.g. mlr put '$x=1;$y=2;'

    Environment variables are now accessible, where environment-variable names \ 
may be string literals or arbitrary expressions: mlr put '$home = \ 
ENV["HOME"]' or mlr put '$value = ENV[$name]'.

    While records are still string-to-string maps for input and output, and \ 
between then statements, types are preserved between multiple statements within \ 
a put. Example: mlr put '$y = string($x); $z = $y . $y' works as expected, \ 
without requiring mlr put '$y = string($x); $z = string($y) . string($y)' as \ 
before.
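
As a rough picture of what the regex captures \0 through \9 mentioned above provide, the following Python sketch substitutes capture references in a template string with the groups of an earlier match. It is only an analogy, not Miller's code: the function name is hypothetical, and treating a missing capture as the empty string is an assumption of this sketch rather than documented Miller behavior.

```python
import re


def interpolate_captures(template, match):
    r"""Replace \0..\9 in template with the corresponding groups of match.

    Assumption for this sketch: a capture index the regex does not
    define, or a group that did not participate, becomes "".
    """
    def substitute(ref):
        index = int(ref.group(1))
        if index > match.re.groups:
            return ""
        return match.group(index) or ""

    # A backslash followed by one digit is a capture reference.
    return re.sub(r"\\([0-9])", substitute, template)
```

Here \0 stands for the whole matched text and \1 through \9 for the parenthesized groups, in order.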

Bug fixes:

    Mixed-format join, e.g. CSV file joined with DKVP file, was incorrectly \ 
computing default separators (IRS, IFS, IPS). This resulted in records not being \ 
joined together.

    Segmentation violation on non-standard-input read of files with size an \ 
exact multiple of page size and not ending in IRS, e.g. newline. (This is less \ 
of a corner case than it sounds: for example, leave a long-running program \ 
running with output redirected to a file, then in a sleep-and-process loop, have \ 
Miller process that file. The former program's stdio library will likely be \ 
doing block-sized buffered I/O, where block sizes will often be multiples of \ 
system page size and the block will almost surely not end in a newline.)
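
The file shape that triggered the segmentation violation above (size an exact multiple of the page size, last record not terminated by a newline) can be constructed for testing with a short Python sketch like the one below. The function name and the x=1 / y=2... record contents are hypothetical; mmap.PAGESIZE is used here as the page size, which is an assumption about what matters on a given system.

```python
import mmap
import os


def write_page_multiple_file(path, pages=1, page_size=mmap.PAGESIZE):
    """Write DKVP-style records totalling exactly pages * page_size bytes,
    leaving the final record without a trailing newline.

    Returns the number of bytes written.
    """
    target = pages * page_size
    line = b"x=1\n"
    # Leave at least 3 bytes for the final, unterminated record.
    full_lines = (target - 3) // len(line)
    remaining = target - full_lines * len(line)
    last = b"y=" + b"2" * (remaining - 2)  # no newline at the end
    with open(path, "wb") as f:
        f.write(line * full_lines + last)
    return os.path.getsize(path)
```

A file produced this way can then be passed to Miller as a file argument (not on standard input) to check whether a given build is affected.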

Acknowledgements: Big thank-yous to @gregfr and @aaronwolen for feature requests \ 
including reshape and regex captures, and to @jungle-boogie for his work getting \ 
Miller into FreeBSD. Also, ongoing thanks to @0-wiz-0 for his past work on \ 
configure support, making it possible for Miller to be put to use in multiple \ 
operating systems.

Changes in 3.3.2:

Bootstrap sampling, EWMA, merge-fields, isnull/isnotnull functions

Released by @johnkerl on Jan 11.

    Bootstrap sampling in mlr bootstrap: \ 
http://johnkerl.org/miller/doc/reference.html#bootstrap. Compare to reservoir \ 
sampling in mlr sample: http://johnkerl.org/miller/doc/reference.html#sample.
    Exponentially weighted moving averages in mlr step -a ewma: principally \ 
useful for smoothing noisy time series, e.g. finely sampled system-resource \ 
utilization. Please see http://johnkerl.org/miller/doc/reference.html#step.
    "Horizontal" univariate statistics in mlr merge-fields, compared \ 
to mlr stats which is "vertical". Also allows collapsing multiple \ 
fields into one, such as in_bytes and out_bytes data fields summing to \ 
bytes_sum. This can also be done easily using mlr put. However, mlr merge-fields \ 
allows aggregation of more than just a pair of field names, and supports \ 
pattern-matching on field names. Please see \ 
http://johnkerl.org/miller/doc/reference.html#merge-fields for more information.
    isnull and isnotnull functions for mlr filter and mlr put.
    stats1, stats2, merge-fields, step, and top correctly handle not only \ 
missing fields (in the row-heterogeneous-data case) but also null-valued fields.
    Minor memory-management improvements.
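
For readers unfamiliar with the exponentially weighted moving average listed above, here is a minimal Python sketch of the usual recurrence, s[0] = x[0] and s[t] = alpha * x[t] + (1 - alpha) * s[t-1]. Seeding with the first sample and the parameter name alpha are assumptions of this sketch; the release notes do not state which convention mlr step -a ewma follows.

```python
def ewma(values, alpha):
    """Return the exponentially weighted moving average of values.

    alpha in (0, 1]: larger alpha tracks the input more closely,
    smaller alpha smooths more heavily.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    prev = None
    for x in values:
        # The first sample seeds the average; later samples are blended in.
        prev = x if prev is None else alpha * x + (1.0 - alpha) * prev
        smoothed.append(prev)
    return smoothed
```

For example, ewma([10, 20, 30], 0.5) gives [10, 15.0, 22.5].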

Files:
Revision  Action  File
1.6       modify  pkgsrc/textproc/miller/Makefile
1.7       modify  pkgsrc/textproc/miller/distinfo