Subject: CVS commit: pkgsrc/sysutils/bup
From: Greg Troxel
Date: 2024-12-11 02:19:15
Message id: 20241211011916.0940EFC1C@cvs.NetBSD.org

Log Message:
sysutils/bup: Update to 0.33.5

This is a micro but it fixes serious bugs so if you use get/gc you
really need to update!

Summary of upstream NEWS:

Notable changes in 0.33.5 since 0.33.4 (incomplete)
===================================================

May require attention
---------------------

* Problems have been discovered that could have allowed the creation
  of incomplete trees or commits, for example `bup save` or `bup get`
  could have created saves with missing data.  This should no longer
  be possible, but any existing incomplete trees might also be re-used
  by `bup get` (for example), and so represent a continuing hazard.
  Note that if you've never used `bup gc` or `bup get`, then we don't
  currently believe your repositories could have been affected.

  You can detect whether you've been affected by running
  `bup-validate-object-links(1)`.  If it doesn't report any broken
  links (as `no HASH for PARENT_HASH`), then you can stop here, the
  repository should be fine.  You can also run `bup midx -af` first,
  which may speed up the validation.

  If it does report broken links, then you should run `bup gc --ignore
  missing` to completion before making any further additions to the
  repository.  But first, if you have other repositories that might
  still contain the missing objects, then you may want to try to
  retrieve them.  See "repopulate missing objects" below for details.
  `bup gc --ignore-missing` will eliminate some of the hazards and
  report at least one of the paths to each missing object.

  If `gc` doesn't report any broken paths (missing objects), then you
  can stop here, the repository should be fine.

  If `gc` does report broken paths, then you should clear the related
  indexes, e.g. `bup index --clear` or `bup on HOST index --clear`,
  etc.

  If you don't rely on `bup get` (e.g. if you only `save`) then
  clearing the index(es) should ensure that new saves will be complete
  (though existing broken saves will remain structurally broken for
  now).

  If you do rely on `bup get`, and if `gc` reports broken paths, then
  there's not yet an easy way to ensure `get` won't continue to re-use
  incomplete trees when building new saves.  For now, you could start
  a new repository (and save the old one for future repairs).

  You can try to repopulate missing objects from the source (or some
  other) repository.  To do so, you can collect a list of missing
  objects via `bup validate-object-links`:

    bup validate-object-links | tee validate-out
    grep -E '^no ' validate-out | cut -d' ' -f 2 | sort -u > missing-objects
    sed -e 's/^/--unnamed git:/' missing-objects > unnamed-objects

  and then try to retrieve the missing objects from another repository
  via `bup get`.  For example, perhaps:

    xargs bup get --source repo --unnamed --ignore-missing < unnamed-objects

  or

    xargs bup on HOST get --unnamed --ignore-missing < unnamed-objects

  After that, you can run `bup validate-object-links` to see whether
  you were able to fix all of the broken references (i.e. whether it
  still reports missing objects).

  If you have enough missing objects, it's possible xargs might split
  the argument list between `--unnamed` and its argument, causing
  `get` to fail.  If so, you can just specify an even numbered value
  for `xargs -n`, for example `xargs -n 64 bup get ...`.
  On most systems, you can choose a much larger `n`.

  If you would just like to validate some saves, you can now run `bup
  validate-ref-links SAVE...` which should be much more efficient than
  attempting a restore or joining the saves to /dev/null.

  We're also working on a command that will repair the structure of
  any existing broken trees so that commands like restore will still
  be able to work with them.

  See issue/missing-objects.md for a detailed explanation of the
  problem.  If you have pandoc and graphviz dot installed, this will
  be rendered to issue/missing-objects.html which you can open in a
  browser, or you can find it
  [here](https://bup.github.io/issue/missing-objects.html).

General
-------

* `bup validate-object-links` has been added.  This command scans the
  objects in the repository and reports any "broken links" it finds,
  i.e. any links from a tree or commit in the repository to an object
  that doesn't exist.

* `bup validate-ref-links` has been added.  This command traverses
  repository references (e.g. saves) and logs paths to missing
  objects, i.e. references from a tree or commit to an object that
  doesn't exist in the repository.  At the moment, it will report at
  least one path to each missing object; it does not attempt to find
  all of the paths.

* `bup gc` now provides `--ignore-missing` which allows a `gc`
  operation to continue after encountering objects that are missing
  from the repository.

* `bup join` now reports the path to any missing object it encounters.

Bugs
----

* `bup gc` should no longer risk leaving the repository with
  incomplete tree or commit objects -- trees or commits with
  references to objects that are no longer in the repository.

  Previously this could happen because the collection was
  probabilistic with respect to all object types, and so it could
  leave (completely orphaned) vestigial commits or trees that referred
  to objects that had been removed.  It could also do this if the
  `--threshold` caused it to keep a parent in one "live enough" pack,
  while discarding a descendant in a pack that doesn't cross the
  threshold.

  These objects can cause serious trouble because they can be re-used
  as-is (without noticing that they are incomplete) by other commands
  like `bup get`.

* `bup get` should no longer be able to leave the repository with
  incomplete trees or commits if it's interrupted at the wrong time.
  Previously it fetched objects "top down", and so if it was
  interrupted after the parent tree/commit was written to the
  repository, but before all the children were written, then the
  repository would be left with an incomplete tree.

* `bup` should always ignore midx files that refer to missing indexes.
  Previously it might not notice when objects had disappeared (via
  `gc`) which could, in particular, cause remote/client operations
  like a remote save to decide that the repository already contained
  data that it did not.

* `bup midx` `--auto` and `--force` now delete midx files that refer
  to missing indexes.

* `bup gc` should no longer throw bloom close-related exceptions when
  interrupted.

Build system
------------

* [Graphviz](https://graphviz.org) `dot` is optional, but must be
  available in order to render the figures referred to by
  issue/missing-objects.md.

Thanks to (at least)
====================

Greg Troxel, Johannes Berg, and Rob Browning

Files:
RevisionActionfile
1.67modifypkgsrc/sysutils/bup/Makefile
1.13modifypkgsrc/sysutils/bup/PLIST
1.27modifypkgsrc/sysutils/bup/distinfo