Subject: CVS commit: pkgsrc/textproc/miller
From: Thomas Klausner
Date: 2017-06-19 22:28:50
Message id: 20170619202850.1AD55FAE8@cvs.NetBSD.org
Log Message:
Updated miller to 5.2.0.
This release contains mostly feature requests.
Features:
The stats1 verb now lets you use regular expressions to specify
which field names to compute statistics on, and/or which to
group by. Full details are in the Miller documentation.
The min and max DSL functions, and the min/max/percentile
aggregators for the stats1 and merge-fields verbs, now support
numeric as well as string field values. (For mixed string/numeric
fields, numbers compare before strings.) This means in particular
that order statistics -- min, max, and non-interpolated percentiles
-- as well as mode, antimode, and count are now possible on
string-only (or mixed) fields. (Of course, any operations
requiring arithmetic on values, such as computing sums, averages,
or interpolated percentiles, yield an error on string-valued
input.)
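The ordering rule above (numbers compare before strings) can be
sketched in Python. This is only an illustration of the stated
semantics, not Miller's implementation; the helper name mixed_key
and the sample values are made up for the example, and values are
assumed to arrive as text.

```python
def mixed_key(value):
    """Sort key that places numeric values before string values.

    Hypothetical helper: numbers get rank 0 and compare numerically,
    everything else gets rank 1 and compares lexically.
    """
    try:
        return (0, float(value), "")
    except ValueError:
        return (1, 0.0, value)


# Sample field values, mixing numeric and string data.
values = ["pan", "3", "eks", "0.5", "wye"]

lowest = min(values, key=mixed_key)    # smallest number wins
highest = max(values, key=mixed_key)   # largest string wins

print(lowest, highest)
```

With this key, min lands on a number whenever one is present, and
max lands on a string whenever one is present.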
There is a new DSL function mapexcept which returns a copy of
the argument with specified key(s), if any, unset. The motivating
use-case is to split records to multiple filenames depending
on particular field value, which is omitted from the output:
mlr --from f.dat put 'tee > "/tmp/data-".$a, mapexcept($*, "a")'
Likewise, mapselect returns a copy of the argument with only
specified key(s), if any, set. This resolves #137.
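As a rough analogy for what the two functions return, here is a
Python sketch using plain dictionaries. The Python functions below
merely borrow the names mapexcept and mapselect; they are not
Miller's code, and the sample record is invented.

```python
def mapexcept(record, *keys):
    """Return a copy of record without the given keys (missing keys are ignored)."""
    return {k: v for k, v in record.items() if k not in keys}


def mapselect(record, *keys):
    """Return a copy of record holding only the given keys that are present."""
    return {k: v for k, v in record.items() if k in keys}


# Invented sample record.
record = {"a": "pan", "b": "wye", "x": 0.35}

without_a = mapexcept(record, "a")
only_a_x = mapselect(record, "a", "x", "nosuch")

print(without_a)
print(only_a_x)
```

Both helpers leave the input untouched, matching the "returns a
copy" wording above.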
A new -u option for count-distinct allows unlashed counts for
multiple field names. For example, with -f a,b and without -u,
count-distinct computes counts for distinct pairs of a and b
field values. With -f a,b and with -u, it computes counts for
distinct a field values and counts for distinct b field values
separately.
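The difference between the two modes can be sketched in Python
with collections.Counter. The records below are invented, and the
variable names lashed and unlashed are just labels for this
example; this shows the counting semantics described above, not
Miller's output format.

```python
from collections import Counter

# Invented sample records with fields a and b.
records = [
    {"a": "pan", "b": "wye"},
    {"a": "pan", "b": "zee"},
    {"a": "eks", "b": "wye"},
    {"a": "pan", "b": "wye"},
]
fields = ["a", "b"]

# Without -u: one count per distinct (a, b) pair.
lashed = Counter(tuple(r[f] for f in fields) for r in records)

# With -u: an independent count per field.
unlashed = {f: Counter(r[f] for r in records) for f in fields}

print(lashed)
print(unlashed)
```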
If you build from source, you can now do ./configure without
first doing autoreconf -fiv. This resolves #131.
The UTF-8 BOM sequence 0xef 0xbb 0xbf is now automatically
skipped at the start of CSV files. (The same is already done
for JSON files.) This resolves #138.
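A minimal Python sketch of the idea: drop the three BOM bytes when,
and only when, they open the input. The function name strip_bom and
the sample data are hypothetical; Miller itself is written in C and
its actual code is not shown here.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; other input is unchanged."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


# Invented CSV payload, once with and once without a BOM.
with_bom = UTF8_BOM + b"a,b\n1,2\n"
without_bom = b"a,b\n1,2\n"

print(strip_bom(with_bom) == strip_bom(without_bom))
```

Without this step the BOM would end up glued to the first header
name, so the first field would not be named a.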
For put and filter with -S, program literals such as the 6 in
$x = 6 were being parsed as strings. This was not sensible, since
the -S option for put and filter is intended to suppress numeric
conversion of record data, not program literals; such literals
are no longer treated as strings. To get the string 6, use
$x = "6".
Documentation:
A new cookbook example shows how to compute differences between
successive queries, e.g. to find out what changed in time-varying
data when you run and rerun a SQL query.
Another new cookbook example shows how to compute interquartile
ranges.
A third new cookbook example shows how to compute weighted
means.
Bugfixes:
CRLF line-endings were not being correctly autodetected when
I/O formats were specified using --c2j et al.
Integer division by zero was causing a fatal runtime exception,
rather than computing inf or nan as in the floating-point case.
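The intended behaviour of the division fix can be sketched in
Python, where integer division by zero also raises by default. The
function int_divide is hypothetical, and the sign handling (a
negative numerator giving -inf) is an assumption borrowed from
IEEE 754 floating-point rules rather than something stated above.

```python
import math


def int_divide(numerator: int, denominator: int) -> float:
    """Divide two integers, returning inf or nan instead of raising on zero."""
    if denominator != 0:
        return numerator / denominator
    if numerator == 0:
        return math.nan                      # 0/0 has no defined value
    # Assumption: sign follows the numerator, as with floating point.
    return math.inf if numerator > 0 else -math.inf


print(int_divide(6, 3), int_divide(6, 0), int_divide(0, 0))
```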
Files: