Next | Query returned 44 messages, browsing 21 to 30 | Previous

CVS Commit History:


   2021-03-24 11:31:05 by Frederic Cambus | Files touched by this commit (2) | Package updated
Log message:
miller: update to 5.10.2.

This release restores the mlr manpage to the distribution file.
   2021-03-22 11:48:32 by Frederic Cambus | Files touched by this commit (2) | Package updated
Log message:
miller: update to 5.10.1.

Various bugfixes. No upstream ChangeLog.
   2020-12-01 22:59:56 by Frederic Cambus | Files touched by this commit (2) | Package updated
Log message:
miller: update to 5.10.0.

ChangeLog:

Features:

- The unsparsify -f feature
- The new sort-within-records verb is an old ask, underway from the Go
  port, backported to C
- Likewise the truncate DSL function
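
As a rough illustration of what sort-within-records does, here is a Python
sketch (not Miller code). The helper name is made up for this example, and
lexically ascending key order is an assumption, since the entry above does
not state the sort order.

```python
def sort_within_records(record):
    # Reorder the fields of one record by key; values travel with their keys.
    return dict(sorted(record.items()))

print(sort_within_records({"c": 1, "a": 2, "b": 3}))  # {'a': 2, 'b': 3, 'c': 1}
```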

Bugfixes:

- The count -n feature was not implemented as intended
- Pretty-print format now works correctly with --headerless-csv-output
- The seqgen verb now correctly tracks NR and FNR in the records it emits
- An intermittent JSON-parsing bug has been fixed
   2020-09-03 10:14:13 by Frederic Cambus | Files touched by this commit (2) | Package updated
Log message:
miller: update to 5.9.1.

ChangeLog:

Security update: disallow --prepipe in .mlrrc

As of Miller 5.9.0, you can have a .mlrrc file containing preferred flags.

As reported in #363, it would be possible for someone to prepare a repository
or some other zipfile/tarfile, for example, containing datasets, and send it
to you. They could have a line of the form prepipe do_something_bad; cat in
that repository, so when you ran any mlr commands in there, it would run the
do_something_bad command (whatever that might be).

The fix is (a) disallow prepipe within .mlrrc files; (b) as a consolation,
allow new prepipe-zcat and prepipe-gunzip options which are safe to use.

Fixes CVE-2020-15167.
   2020-08-20 16:01:27 by Frederic Cambus | Files touched by this commit (2) | Package updated
Log message:
miller: update to 5.9.0.

ChangeLog:

- You can now save common defaults in a ~/.mlrrc. For example, if you
  normally process CSV files, you can say that in your ~/.mlrrc and you
  can leave off the --csv flag from your mlr commands. You can read more
  about this feature here, or in man mlr, or in mlr --help.
   2020-08-04 17:35:53 by Frederic Cambus | Files touched by this commit (2) | Package updated
Log message:
miller: update to 5.8.0.

ChangeLog:

Features

    The new count verb is a keystroke-saver for stats1 -a count -f
    {some field name}.

    --jsonx and --ojsonx are keystroke-savers for --json --jvstack
    and --ojson --jvstack, which is to say, multi-line pretty-printed
    JSON format.

    The new -s name=value feature for mlr put and mlr filter gives
    you simpler access to environment variables in your Miller
    script, as requested in #315.

Bugfixes

    mlr format-values is no longer SEGVing on CSV/TSV input. This
    was reported on #330.

    #313 fixes a corner case when field names within command-line
    arguments have embedded newlines.

    Line/column indicators for JSON-formatting error messages are
    now correct (previously they were showing up as 0).

    end {print NF} no longer SEGVs. This was reported in #330.

    Several broken doc links were fixed up as reported on #329.
   2020-03-17 15:38:25 by Frederic Cambus | Files touched by this commit (2) | Package updated
Log message:
miller: update to 5.7.0.

ChangeLog:

Features

    The new remove-empty-columns and skip-trivial-records verbs are
    keystroke-savers for things which would otherwise require DSL
    syntax, as tracked in #274.

Bugfixes

    A bug regarding optional regex-pattern groups was fixed in
    #277.

    As of #294 you can now specify --implicit-csv-header for the
    join-file in mlr join.

    A bug with spaces in XTAB-file values was fixed on #296.

    A bug with missing final newline for XTAB-formatted files
    using MMAP files was fixed on #301.

Documentation

    Look-and-feel at http://johnkerl.org/miller/doc/ is (hopefully)
    improved, including clearer visual indication of which section/page
    you're currently looking at. Note that this change has been
    live for a few weeks, as look-and-feel-related doc-mods from
    post-5.6.2 were backported to http://johnkerl.org/miller/doc/.

    #282 improves DSL-function documentation at
    http://johnkerl.org/miller/doc/reference-dsl.html#Built-in_functions_for_filter_and_put,_summary

Note

Support for mmap mode has been entirely discontinued. This is an
invisible change and should not affect you at all. For anyone
interested in lower-level details, though, the summary is as follows:

    For an incremental performance gain (perhaps 10-20% run time
    at most, but see below), within the C source code one can use
    the mmap system call to access input files via pointer arithmetic
    rather than malloc-and-memcopy using stdio.

    However mmap is not available when reading from standard
    input -- it cannot be memory-mapped.

    This means all file-format readers are implemented twice
    within the Miller source code.

    While I try to regression-test Miller thoroughly, running
    all canned tests through mmap and stdio mode, I've nonetheless
    found my mmap implementations liable to corner-cases which I
    miss but users find: for example #29, #102, and #296.

    As tracked on #160, various operating systems do not release
    mmapped pages after use as one might intuit, meaning that for
    large files and/or large numbers of files, I've for a long time
    now needed to have Miller opt out of mmap usage for precisely
    those cases which most need the performance gain: see #160,
    #181, and #256.

    Additionally, mmap is not used at all for Windows/MSYS2 so
    there is nothing to lose there.

For these reasons, keeping mmap mode isn't worth the development
overhead.

As of release 5.6.3, the mlr executable will still accept the --mmap
and --no-mmap command-line flags as no-ops, for backward compatibility.

The caveat for you is that for everyday small files, the default
was previously mmap mode and is now stdio (except mlr ... < filename
or ... | mlr ... which have always used stdio). There is the off
chance that this will newly reveal an old, latent bug or two
somewhere.

I've re-run regressions in valgrind mode to aggressively catch any
errors, but please let me know ASAP via a GitHub issue about any
unexpected behavior in 5.7.0.
   2020-03-06 09:18:31 by Frederic Cambus | Files touched by this commit (2) | Package updated
Log message:
miller: update to 5.6.2.

ChangeLog:

v5.6.2

Bug fixes:

    #271 fixes a corner-case bug with more than 100 CSV/TSV files with
    headers of varying lengths.

Documentation:

    The new http://johnkerl.org/miller/doc/whyc-details.html is an
    elaboration on http://johnkerl.org/miller/doc/whyc.html which answers
    a question posed by @BurntSushi on Reddit a couple years ago which
    I did not address in detail at the time.

v5.6.1

    The only change is that http://johnkerl.org/miller/doc is now
    more mobile-friendly.  All build artifacts are the same as at
    https://github.com/johnkerl/miller/releases/tag/v5.6.0

v5.6.0

    The new system DSL function allows you to run arbitrary shell commands
    and store their output in field values. Some example usages are documented
    here. This is in response to issues #246 and #209.

    There is now support for ASV and USV file formats. This is in response
    to issue #245.

    The new format-values verb allows you to apply numerical formatting
    across all record values. This is in response to issue #252.

Documentation:

    The new DKVP I/O in Python sample code now works for Python 2 as
    well as Python 3.

    There is a new cookbook entry on doing multiple joins. This is in
    response to issue #235.

Bugfixes:

    The toupper, tolower, and capitalize DSL functions
    are now UTF-8 aware, thanks to @sheredom's marvelous
    https://github.com/sheredom/utf8.h. The internationalization page
    has also been expanded. This is in response to issue #254.

    #250 fixes a bug using in-place mode in conjunction with verbs
    (such as rename or sort) which take field-name lists as arguments.

    #253 fixes a bug in the label verb when one or more names are common
    between old and new.

    #251 fixes a corner-case bug when (a) input is CSV; (b) the last
    field ends with a comma and no newline; (c) input is from standard
    input and/or --no-mmap is supplied.

v5.5.0

    The new positional-indexing feature resolves #236 from @aborruso. You
    can now get the name of the 3rd field of each record via $[[3]], and
    its value by $[[[3]]]. These are both usable on either the left-hand
    or right-hand side of assignment statements, so you can more easily
    do things like renaming fields programmatically within the DSL.
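
    The idea behind positional indexing can be sketched in Python (not
    Miller DSL). The helper names below are hypothetical, and a plain
    insertion-ordered dict stands in for a Miller record.

```python
record = {"x": 3, "y": 4, "z": 5}

def name_at(rec, i):
    # Counterpart of $[[i]]: the name of the i-th field (1-based).
    return list(rec)[i - 1]

def value_at(rec, i):
    # Counterpart of $[[[i]]]: the value of the i-th field (1-based).
    return list(rec.values())[i - 1]

def rename_at(rec, i, new_name):
    # Rename the i-th field while keeping the field order unchanged.
    old = name_at(rec, i)
    return {(new_name if k == old else k): v for k, v in rec.items()}

print(name_at(record, 3), value_at(record, 3))  # z 5
print(rename_at(record, 3, "NEW"))              # {'x': 3, 'y': 4, 'NEW': 5}
```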

    There is a new capitalize DSL function, complementing the
    already-existing toupper. This stems from #236.

    There is a new skip-trivial-records verb, resolving #197. Similarly,
    there is a new remove-empty-columns verb, resolving #206. Both are
    useful for data-cleaning use-cases.
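
    A minimal Python sketch of the two cleaning steps, assuming that
    "trivial" means a record with no fields or only empty values, and
    that an "empty column" is one that is empty in every record. The
    function names simply mirror the verb names.

```python
def skip_trivial_records(records):
    # Keep a record only if at least one of its fields has a non-empty value.
    return [r for r in records if any(v != "" for v in r.values())]

def remove_empty_columns(records):
    # Needs every record up front: a column is dropped only if it is
    # empty in all of them.
    keep = {k for r in records for k, v in r.items() if v != ""}
    return [{k: v for k, v in r.items() if k in keep} for r in records]

records = [
    {"a": "1", "b": "", "c": "3"},
    {"a": "", "b": "", "c": ""},
    {"a": "4", "b": "", "c": ""},
]
print(remove_empty_columns(skip_trivial_records(records)))
# [{'a': '1', 'c': '3'}, {'a': '4', 'c': ''}]
```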

    Another pair is #181 and #256. While Miller uses mmap internally
    (and invisibly) to get approximately a 20% performance boost over
    not using it, this can cause out-of-memory issues with reading either
    large files, or too many small ones. Now, Miller automatically avoids
    mmap in these cases. You can still use --mmap or --no-mmap if you
    want manual control of this.

    There is a new --ivar option for the nest verb which complements
    the already-existing --evar. This is from #260 thanks to @jgreely.

    There is a new keystroke-saving urandrange DSL function:
    urandrange(low, high) is the same as low + (high - low) *
    urand(). This arose from #243.
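
    The identity above, written out as a Python sketch. Here
    random.random() stands in for Miller's urand(), on the assumption
    that urand() is uniform on [0, 1).

```python
import random

def urandrange(low, high):
    # Scale a uniform draw from [0, 1) onto the interval from low to high.
    return low + (high - low) * random.random()

print(urandrange(10, 20))
```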

    There is a new -v option for the cat verb which writes a low-level
    record-structure dump to standard error.

    There is a new -N option for mlr which is a keystroke-saver for
    --implicit-csv-header --headerless-csv-output.

Documentation:

    The new FAQ entry
    http://johnkerl.org/miller/doc/faq.html#How_to_escape_'%3F'_in_regexes%3F
    resolves #203.

    The new FAQ entry
    http://johnkerl.org/miller/doc/faq.html#How_can_I_filter_by_date%3F
    resolves #208.

    #244 fixes a documentation issue while highlighting the need for #241.

Bugfixes:

    There was a SEGV using nest within then-chains, fixed in response
    to #220.

    Quotes and backslashes weren't being escaped in JSON output with
    --jvquoteall; reported on #222.

v5.4.0

    The new clean-whitespace verb resolves #190 from @aborruso. Along with
    the new functions strip, lstrip, rstrip, collapse_whitespace, and
    clean_whitespace, there is now both coarse-grained and fine-grained
    control over whitespace within field names and/or values. See the
    linked-to documentation for examples.
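
    A Python sketch of how the coarse-grained function could be built
    from the fine-grained ones, assuming clean_whitespace is simply
    collapse_whitespace followed by strip. These are stand-ins written
    for this note, not Miller's implementation.

```python
import re

def collapse_whitespace(s):
    # Replace each run of whitespace with a single space.
    return re.sub(r"\s+", " ", s)

def clean_whitespace(s):
    # Collapse interior runs, then trim both ends.
    return collapse_whitespace(s).strip()

print(repr(clean_whitespace("  hello    world  ")))  # 'hello world'
```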

    The new altkv verb resolves #184 which was originally opened via an
    email request. This supports mapping value-lists such as a,b,c,d to
    alternating key-value pairs such as a=b,c=d.
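
    The mapping can be sketched in Python as follows. Odd-length input
    is rejected in this sketch because the entry above does not say how
    the verb handles it.

```python
def altkv(values):
    # Pair consecutive items: values[0] is a key, values[1] its value, and so on.
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of values")
    return dict(zip(values[0::2], values[1::2]))

record = altkv("a,b,c,d".split(","))
print(",".join(f"{k}={v}" for k, v in record.items()))  # a=b,c=d
```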

    The new fill-down verb resolves #189 by @aborruso. See the linked-to
    documentation for examples.

    The uniq verb now has a uniq -a which resolves #168 from @sjackman.

    The new regextract and regextract_or_else functions resolve #183
    by @aborruso.

    The new ssub function arises from #171 by @dohse, as a simplified way
    to avoid escaping characters which are special to regular-expression
    parsers.
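
    The difference from a regex-based substitution, sketched in Python.
    Replacing only the first occurrence is an assumption made for this
    illustration.

```python
import re

def ssub(s, old, new):
    # Literal substitution: no character in `old` is treated as special.
    return s.replace(old, new, 1)

print(ssub("a.b.c", ".", "-"))             # a-b.c
print(re.sub(".", "-", "a.b.c", count=1))  # -.b.c (unescaped "." matches any character)
```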

    There are new localtime functions in response to #170 by
    @sitaramc. However note that as discussed on #170 these do
    not undo one another in all circumstances. This is a non-issue
    for timezones which do not do DST. Otherwise, please use with
    disclaimers: localdate, localtime2sec, sec2localdate, sec2localtime,
    strftime_local, and strptime_local.

Builds:

    Windows build-artifacts are now available in Appveyor at
    https://ci.appveyor.com/project/johnkerl/miller/build/artifacts,
    and will be attached to this and future releases. This resolves #167,
    #148, and #109.

    Travis builds at https://travis-ci.org/johnkerl/miller/builds now
    run on OSX as well as Linux.

    An Ubuntu 17 build issue was fixed by @singalen on #164.

Documentation:

    put/filter documentation was confusing as reported by @NikosAlexandris
    on #169.

    The new FAQ entry
    http://johnkerl.org/miller-releases/miller-head/doc/faq.html#How_to_rectangularize_after_joins_with_unpaired?
    resolves #193 by @aborruso.

    The new cookbook entry
    http://johnkerl.org/miller/doc/cookbook.html#Options_for_dealing_with_duplicate_rows
    arises from #168 from @sjackman.

    The unsparsify documentation had some words missing as reported by
    @tst2005 on #194.

    There was a typo in the cookbook page
    http://johnkerl.org/miller/doc/cookbook.html#Full_field_renames_and_reassigns
    as fixed by @tst2005 in #192.

Bugfixes:

    There was a memory leak for TSV-format files only as reported by
    @treynr on #181.

    Dollar signs in regular expressions were not being escaped properly
    as reported by @dohse on #171.

v5.3.0

    Comment strings in data files: mlr --skip-comments allows
    you to filter out input lines starting with #, for all file
    formats. Likewise, mlr --skip-comments-with X lets you specify
    the comment-string X. Comments are only supported at start of data
    line. mlr --pass-comments and mlr --pass-comments-with X allow you
    to forward comments to program output as they are read.
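
    The start-of-line rule can be sketched in Python. The function name
    and its prefix parameter are made up for this example; the prefix
    plays the role of X in --skip-comments-with X.

```python
def skip_comments(lines, prefix="#"):
    # Drop lines that begin with the comment prefix; a prefix that
    # appears later in a line does not make that line a comment.
    return [line for line in lines if not line.startswith(prefix)]

lines = ["# generated file", "a=1,b=2", "a=3,b=4 # not a comment"]
print(skip_comments(lines))  # ['a=1,b=2', 'a=3,b=4 # not a comment']
```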

    The count-similar verb lets you compute cluster sizes by cluster
    labels.

    While Miller DSL arithmetic gracefully overflows from 64-bit integer
    to double-precision float (see also here), there are now the
    integer-preserving arithmetic operators .+ .- .* ./ .// for those
    times when you want integer overflow.
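
    The two behaviours can be contrasted in a Python sketch. It assumes
    that "integer overflow" here means 64-bit two's-complement
    wraparound; plus and dot_plus are hypothetical stand-ins for the
    + and .+ operators.

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def plus(a, b):
    # Default behaviour: switch to float when the sum does not fit in 64 bits.
    s = a + b
    return s if INT64_MIN <= s <= INT64_MAX else float(s)

def dot_plus(a, b):
    # Integer-preserving variant: wrap around as 64-bit two's complement.
    return (a + b - INT64_MIN) % 2**64 + INT64_MIN

print(plus(INT64_MAX, 1))      # 9.223372036854776e+18
print(dot_plus(INT64_MAX, 1))  # -9223372036854775808
```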

    There is a new bitcount function: for example, echo x=0xf0000206 |
    mlr put '$y=bitcount($x)' produces x=0xf0000206,y=7.
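
    The same count can be checked with a short Python sketch, which
    treats the input as a 64-bit value (an assumption about how
    negative numbers would be handled).

```python
def bitcount(x):
    # Count the 1-bits in the 64-bit two's-complement representation of x.
    return bin(x & 0xFFFFFFFFFFFFFFFF).count("1")

print(bitcount(0xf0000206))  # 7
```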

    Issue 158: mlr -T is an alias for --nidx --fs tab, and mlr -t is an
    alias for mlr --tsvlite.

    The mathematical constants π and e have been renamed from PI and
    E to M_PI and M_E, respectively. (It's annoying to get a syntax
    error when you try to define a variable named E in the DSL, when
    A through D work just fine.) This is a backward incompatibility,
    but not enough of one to justify calling this release Miller 6.0.0.

Documentation:

    As noted here, while Miller has its own DSL there will always be
    things better expressible in a general-purpose language. The new page
    Sharing data with other languages shows how to seamlessly share data
    back and forth between Miller, Ruby, and Python. SQL-input examples
    and SQL-output examples contain detailed information on the interplay
    between Miller and SQL.

    Issue 150 raised a question about suppressing numeric conversion. This
    resulted in a new FAQ entry How do I suppress numeric conversion?,
    as well as the longer-term follow-on issue 151 which will make
    numeric conversion happen on a just-in-time basis.

    To my surprise, csvlite format options weren’t listed in mlr --help
    or the manpage. This has been fixed.

    Documentation for auxiliary commands has been expanded, including
    within the manpage.

Bugfixes:

    Issue 159 fixes regex-match of literal dot.

    Issue 160 fixes out-of-memory cases for huge files. This is an old
    bug, as old as Miller, and is due to inadequate testing of huge-file
    cases. The problem is simple: Miller prefers memory-mapped I/O
    (using mmap) over stdio since mmap is fractionally faster. Yet as
    any processing (even mlr cat) steps through an input file, more and
    more pages are faulted in -- and, unfortunately, previous pages are
    not paged out once memory pressure increases. (This despite gallant
    attempts with madvise.) Once all processing is done, the memory is
    released; there is no leak per se. But the Miller process can crash
    before the entire file is read. The solution is equally simple: to
    prefer stdio over mmap for files over 4GB in size. (This 4GB threshold
    is tunable via the --mmap-below flag as described in the manpage.)

    Issue 161 fixes a CSV-parse error (with error message "unwrapped
    double quote at line 0") when a CSV file starts with the UTF-8
    byte-order-mark ("BOM") sequence 0xef 0xbb 0xbf and the header line
    has double-quoted fields. (Release 5.2.0 introduced handling for
    UTF-8 BOMs, but missed the case of double-quoted header line.)
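
    The general shape of such a fix, sketched in Python rather than
    Miller's C: drop the three BOM bytes before the header line reaches
    the CSV parser. The function name is hypothetical.

```python
import csv
import io

UTF8_BOM = b"\xef\xbb\xbf"

def read_csv_bytes(data):
    # Drop a leading UTF-8 BOM, if present, so it cannot end up glued
    # to the first (possibly double-quoted) header field.
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    return list(csv.reader(io.StringIO(data.decode("utf-8"))))

rows = read_csv_bytes(UTF8_BOM + b'"a","b"\n"1","2"\n')
print(rows)  # [['a', 'b'], ['1', '2']]
```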

    Issue 162 fixes a corner case doing multi-emit of aggregate variables
    when the first variable name is a typo.

    The Miller JSON parser used to error with Unable to parse JSON data:
    Line 1 column 0: Unexpected 0x00 when seeking value on empty input,
    or input with trailing whitespace; this has been fixed.
   2019-03-29 00:52:09 by Leonardo Taccari | Files touched by this commit (1)
Log message:
miller: Add flex as tool dependency

(flex is explicitly needed in c/parsing/Makefile for mlr_dsl_lexer.c.)
   2017-08-14 23:22:55 by Thomas Klausner | Files touched by this commit (2)
Log message:
Updated miller to 5.2.2.

5.2.2

This bugfix release delivers a fix for #147 where a memory allocation failed
beyond 4GB.

5.2.1

Fix non-x86/gcc7 build error
