Subject: CVS commit: pkgsrc/textproc/miller
From: Thomas Klausner
Date: 2016-02-18 11:07:48
Message id: 20160218100748.E2A00FBB7@cvs.NetBSD.org
Log Message:
Update miller to 3.4.0.
Use release tarball and drop autotools dependencies.
Changes in 3.4.0:
JSON, reshape, regex captures, and more
Primary features:
JSON is now a supported format for input and output. Miller handles tabular \
data, and JSON supports arbitrarily deeply nested data structures, so if you \
want general JSON processing you should use jq. But if you have tabular data \
represented in JSON then Miller can now handle that for you. Please see the \
reference page and the FAQ.
Reshape is a standard data-processing idiom, now available in Miller: \
http://johnkerl.org/miller/doc/reference.html#reshape
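For readers unfamiliar with the reshape idiom, here is a minimal Python sketch of the two directions (wide-to-long and long-to-wide). It only illustrates the concept and is not Miller's implementation. The function names, the "key"/"value" field names, and the sample records are assumptions chosen for this example.

```python
def wide_to_long(records, input_fields, key_name, value_name):
    """Emit one output record per listed input field of each record."""
    out = []
    for rec in records:
        # Fields that are not being reshaped are copied into every output record.
        others = {k: v for k, v in rec.items() if k not in input_fields}
        for field in input_fields:
            if field in rec:
                out.append({**others, key_name: field, value_name: rec[field]})
    return out


def long_to_wide(records, key_name, value_name):
    """Merge records that agree on all other fields into one wide record."""
    groups = {}
    for rec in records:
        others = tuple((k, v) for k, v in rec.items()
                       if k not in (key_name, value_name))
        groups.setdefault(others, {})[rec[key_name]] = rec[value_name]
    return [{**dict(others), **pairs} for others, pairs in groups.items()]


# Hypothetical sample data: records are string-to-string maps.
wide = [{"id": "1", "x": "3", "y": "4"}, {"id": "2", "x": "5", "y": "6"}]
long = wide_to_long(wide, ["x", "y"], "key", "value")
round_trip = long_to_wide(long, "key", "value")
```

In this sketch, reshaping wide-to-long and then long-to-wide returns the original records.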
Incidentally (not part of this release, but new since the last release) \
Miller is now available in FreeBSD's package manager: \
https://www.freshports.org/textproc/miller/. A full list of distributions \
containing Miller may be found here.
Miller is not yet available from within Fedora/CentOS, but as a step toward \
this goal, an SRPM is included in this release (see file-list below).
DSL enhancements for mlr put and mlr filter:
Regex captures \0 through \9: \
http://johnkerl.org/miller/doc/reference.html#Regex_captures
Ternary operator in expression right-hand sides: e.g. mlr put '$y = $x < \
0.5 ? 0 : 1'
Boolean literals true and false
Final semicolon is now allowed: e.g. mlr put '$x=1;$y=2;'
Environment variables are now accessible, where environment-variable names \
may be string literals or arbitrary expressions: mlr put '$home = \
ENV["HOME"]' or mlr put '$value = ENV[$name]'.
While records are still string-to-string maps for input and output, and \
between commands chained with then, types are now preserved between multiple \
statements within a put. Example: mlr put '$y = string($x); $z = $y . $y' works \
as expected, without requiring mlr put '$y = string($x); $z = string($y) . \
string($y)' as before.
Bug fixes:
Mixed-format join, e.g. CSV file joined with DKVP file, was incorrectly \
computing default separators (IRS, IFS, IPS). This resulted in records not being \
joined together.
Segmentation violation on non-standard-input read of files with size an \
exact multiple of page size and not ending in IRS, e.g. newline. (This is less \
of a corner case than it sounds: for example, leave a long-running program \
running with output redirected to a file, then in a sleep-and-process loop, have \
Miller process that file. The former program's stdio library will likely be \
doing block-sized buffered I/O, where block sizes will often be multiples of \
system page size and the block will almost surely not end in a newline.)
Acknowledgements: Big thank-yous to @gregfr and @aaronwolen for feature requests \
including reshape and regex captures, and to @jungle-boogie for his work getting \
Miller into FreeBSD. Also, ongoing thanks to @0-wiz-0 for his past work on \
configure support, making it possible for Miller to be put to use in multiple \
operating systems.
3.3.2
Bootstrap sampling, EWMA, merge-fields, isnull/isnotnull functions
Released by @johnkerl on Jan 11.
Bootstrap sampling in mlr bootstrap: \
http://johnkerl.org/miller/doc/reference.html#bootstrap. Compare to reservoir \
sampling in mlr sample: http://johnkerl.org/miller/doc/reference.html#sample.
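The difference between the two sampling schemes can be sketched in Python as follows. This is an illustration of the general techniques (bootstrap: sampling with replacement; reservoir: uniform subsampling without replacement), not Miller's code. The function names and the explicit random-number-generator argument are assumptions made for this example.

```python
import random


def bootstrap(records, rng):
    """Draw len(records) records with replacement; repeats are expected."""
    n = len(records)
    return [records[rng.randrange(n)] for _ in range(n)]


def reservoir_sample(stream, k, rng):
    """Keep a uniform sample of k records from a stream, without replacement.

    This is the classic single-pass "Algorithm R": the stream length does
    not need to be known in advance.
    """
    sample = []
    for i, rec in enumerate(stream):
        if i < k:
            sample.append(rec)          # fill the reservoir first
        else:
            j = rng.randrange(i + 1)    # uniform in 0..i inclusive
            if j < k:
                sample[j] = rec         # replace with probability k/(i+1)
    return sample


rng = random.Random(0)                  # fixed seed only for reproducibility
data = list(range(100))
boot = bootstrap(data, rng)
res = reservoir_sample(iter(data), 10, rng)
```

A bootstrap sample has the same size as its input and may contain duplicates, while a reservoir sample has a fixed size k and never repeats a record.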
Exponentially weighted moving averages in mlr step -a ewma: principally \
useful for smoothing noisy time series, e.g. finely sampled system-resource \
utilization. Please see \
http://johnkerl.org/miller/doc/reference.html#step.
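As a rough illustration of what an exponentially weighted moving average computes, here is a minimal Python sketch. It assumes the common recurrence in which a weight alpha is applied to the current sample and 1 - alpha to the previous average, with the first sample used as the seed. Both the seeding choice and the function name are assumptions; the release notes do not spell them out.

```python
def ewma(values, alpha):
    """Return the exponentially weighted moving average of a sequence.

    alpha = 1 gives no smoothing (the output equals the input);
    alpha close to 0 gives heavy smoothing.
    """
    smoothed = []
    prev = None
    for x in values:
        # Assumption: seed the average with the first sample.
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed


noisy = [1.0, 3.0, 1.0, 3.0]
light = ewma(noisy, 1.0)   # no smoothing
heavy = ewma(noisy, 0.5)   # each step moves halfway toward the new sample
```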
"Horizontal" univariate statistics in mlr merge-fields, compared \
to mlr stats which is "vertical". Also allows collapsing multiple \
fields into one, such as in_bytes and out_bytes data fields summing to \
bytes_sum. This can also be done easily using mlr put. However, mlr merge-fields \
allows aggregation of more than just a pair of field names, and supports \
pattern-matching on field names. Please see \
http://johnkerl.org/miller/doc/reference.html#merge-fields for more information.
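The in_bytes/out_bytes example can be sketched in Python as below. This is only a conceptual illustration of a "horizontal" sum across fields of a single record. The function name, the substring-matching rule, and the "_sum" suffix handling are assumptions modelled on the description above, not Miller's actual implementation.

```python
def merge_fields_sum(record, collapse_substrings):
    """Sum fields whose names contain one of the given substrings.

    The output field is named by removing the matched substring and
    appending "_sum", so in_bytes and out_bytes collapse to bytes_sum.
    """
    out = {}
    sums = {}
    for name, value in record.items():
        for sub in collapse_substrings:
            if sub in name:
                short = name.replace(sub, "", 1)
                sums[short] = sums.get(short, 0.0) + float(value)
                break
        else:
            out[name] = value           # unmatched fields pass through
    for short, total in sums.items():
        out[short + "_sum"] = total
    return out


# Hypothetical record; field values are strings, as in Miller records.
rec = {"host": "a", "in_bytes": "10", "out_bytes": "32"}
merged = merge_fields_sum(rec, ["in_", "out_"])
```

Because matching is by substring, the same call would also collapse further pairs such as in_pkts and out_pkts, which is where this approach goes beyond a hand-written sum of two named fields.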
isnull and isnotnull functions for mlr filter and mlr put.
stats1, stats2, merge-fields, step, and top correctly handle not only \
missing fields (in the row-heterogeneous-data case) but also null-valued fields.
Minor memory-management improvements.
Files: