Subject: CVS commit: pkgsrc/textproc/xapian
From: Amitai Schleier
Date: 2024-07-24 12:54:36
Message id: 20240724105436.B8B19FC74@cvs.NetBSD.org

Log Message:
xapian-core: update to 1.4.26. Changes:

API:

* Weight: Document that Weight statistics DOC_LENGTH_MIN, DOC_LENGTH_MAX and
  WDF_MAX are for the shard rather than the whole database.  Usually this is
  what we want as with a sharded database it gives tighter bounds and so better
  match optimisation, but it does make them unsuitable for uses such as
  calculating a suitable offset to add to every get_sumextra() to allow
  implementing a weighting formula which can give a negative term independent
  weight contribution.  This case will be addressed in the next release series
  which also provides bounds such as DB_DOC_LENGTH_MIN which are for the whole
  database.

* LMWeight: This class was meant to implement the "Language Model" Weighting
  scheme, but we've discovered the implementation was incorrect and fixing it
  requires ABI-incompatible changes.  For 1.4.x we need to leave it in place so
  as not to break existing code, but it's now deprecated and we recommend
  avoiding using it.  It will be removed in the next release series and
  replaced with new separate classes implementing Language Model weighting with
  each smoothing.  Thanks to Sourav Saha for reporting this problem.

* PL2PlusWeight: Fix bug in implementation of formula.  Our variable mean is
  1/lambda_t from the PL2+ paper, so we need to check mean>1 for lambda_t<1 but
  we were actually checking mean<1 instead.  The result of this is that PL2+
  actually returned a zero weight unless the term occurred frequently enough in
  the collection.

* TradWeight::get_maxpart() no longer forces the wdf_max value to be at least
  one.  We used to do this so that a non-existent term in the query would cause
  it not to achieve 100%, but now we calculate percentages based on the number
  of matching subqueries, and it is more natural for a non-existent term to get
  zero weight (ditto for a term which always has wdf 0).  This was already
  addressed for BM25Weight in 1.2.1 back in 2010.

* Enquire::set_expansion_scheme(): Add "prob" as new preferred name for
  probabilistic query expansion, with the previous "trad" still being accepted
  for now.

* QueryParser::set_prefix() and set_boolean_prefix(): Allow an optional
  trailing `:` on the field name.  This makes the API here more consistent with
  ranges, where you need to include the `:` if you want one.  See #720.

testsuite:

* Catch and report if a testcase causes signal SIGPIPE.

* Suppress valgrind errors about calling memmove() with overlapping source and
  destination (this is valid; valgrind is just confused when memcpy() and
  memmove() share an implementation).

* Add more testing of weighting schemes.

* Mark checkstatsweight3 with a sharded database as XFAIL (expected to fail).
  This testcase was previously not run for sharded databases, with a FIXME
  comment noting this.  Investigating shows it's due to a bug where we use the
  shard's termfreqs rather than those for the whole database for an expanded
  wildcard, but this seems complex to fix.

matcher:

* Fix minor wildcard weighting bug spotted while reading the code.  We were
  returning too high a value from the first call to get_maxpart() in some
  cases.  Mostly this just means the matcher continues working when it could
  have stopped, but it will also cause MSet::get_termweight() to return a
  higher value than the actual known upper bound.

glass backend:

* Simplify file descriptor handling for lock files on Unix-like platforms
  which don't support OFD locks.  This eliminates corner cases where we
  could end up with file descriptors without close-on-exec set in the main
  process.
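
To illustrate what "close-on-exec" means for a lock file descriptor, here is
a minimal C++ sketch for POSIX systems.  It is not Xapian's code: the names
open_lock_file_cloexec() and has_cloexec() are hypothetical, and the sketch
only assumes that O_CLOEXEC may or may not be available, falling back to
fcntl() with FD_CLOEXEC when it is not.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: open (creating if needed) a lock file so that the
// descriptor is closed automatically across exec*() calls.
// Returns the file descriptor, or -1 on error.
int open_lock_file_cloexec(const char* path) {
#ifdef O_CLOEXEC
    // Preferred: the flag is set atomically as part of open().
    return open(path, O_WRONLY | O_CREAT | O_CLOEXEC, 0666);
#else
    // Fallback: set the flag afterwards.  Another thread could fork()+exec()
    // between the two calls, which is the kind of corner case to avoid.
    int fd = open(path, O_WRONLY | O_CREAT, 0666);
    if (fd >= 0) {
        int flags = fcntl(fd, F_GETFD);
        if (flags == -1 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1) {
            close(fd);
            return -1;
        }
    }
    return fd;
#endif
}

// Hypothetical helper: report whether close-on-exec is set on fd.
bool has_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    return flags != -1 && (flags & FD_CLOEXEC) != 0;
}
```

A descriptor for which has_cloexec() returns false would be inherited by any
program the process later executes, which is why leaving one behind in the
main process is undesirable.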

remote backend:

* xapian-tcpsrv: Use _exit() instead of exit() to end child processes, which
  avoids the risk of duplicated output from stdio buffers getting copied by
  fork() then flushed in both processes.

* Simplify file descriptor handling when launching prog remote.
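
The duplicated-output risk behind the _exit() change can be reproduced with a
small C++ sketch for POSIX systems.  This is an illustration only, not code
from xapian-tcpsrv; the function name bytes_after_fork() and the file path
passed to it are hypothetical.

```cpp
#include <cstdio>
#include <cstdlib>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo: buffer some text in a stdio stream, fork(), end the
// child with either _exit() or exit(), then return the resulting file size
// in bytes (or -1 on error).
long bytes_after_fork(const char* path, bool use_underscore_exit) {
    // Flush unrelated streams first so the demo is confined to `f`.
    std::fflush(stdout);
    std::fflush(stderr);

    FILE* f = std::fopen(path, "w");
    if (!f) return -1;
    std::fputs("hello\n", f);  // 6 bytes, still sitting in the stdio buffer

    pid_t pid = fork();        // the child gets a copy of that buffer
    if (pid == -1) {
        std::fclose(f);
        return -1;
    }
    if (pid == 0) {
        if (use_underscore_exit) _exit(0);  // no stdio flush in the child
        std::exit(0);                       // flushes the child's copy too
    }

    int status = 0;
    if (waitpid(pid, &status, 0) == -1) {
        std::fclose(f);
        return -1;
    }
    if (std::fclose(f) != 0) return -1;     // parent flushes its own copy

    struct stat st;
    if (stat(path, &st) == -1) return -1;
    return static_cast<long>(st.st_size);
}
```

With use_underscore_exit set to true the file should hold the text once
(6 bytes); with false, the child flushes its copy of the buffer as well, so
the file ends up larger.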

inmemory backend:

* Fix bug adding posting entries.

build system:

* Improve probe for -Bsymbolic-functions.  MSVC doesn't support this flag, but
  it only emits a warning when it is used and that warning didn't match any of
  the patterns we already check for so we were detecting it as supported.

* Report result of probe to determine compiler support for -Werror or
  equivalent.

documentation:

* Improve MSVC build instructions.  Thanks to Baran Demir for feedback.

* Improve formatting of stat_flags API documentation.

* sorting.rst: Replace custom weighting scheme documentation with a link to the
  more complete equivalent in "Getting Started with Xapian".

* remote.rst: Update to reflect that user metadata is fully supported (since
  1.2.4).

portability:

* Fix to compile as C++20 and C++23.

* Resolve SIGPIPE issues on NetBSD, which were causing testcase keepalive1 to
  fail.  These seem to be due to SO_NOSIGPIPE not working correctly there so
  we now use MSG_NOSIGNAL instead for NetBSD.

* Include <errno.h> for sys_errlist.  We already do this for the configure
  check but were failing to when actually using sys_errlist, which probably
  affects at least NetBSD.

* configure: Fix clang detection which wasn't working when configure determined
  a -std=X option was needed to get C++11 support.  The obvious symptom was
  that --enable-werror wouldn't add -Werror.

* configure: NetBSD automatically pulls in library dependencies, so set
  link_all_deplibs_CXX=no there.

* Avoid using sprintf() if snprintf() is available, even in cases where the
  output size is bounded, to avoid deprecation warnings on macOS.  For 1.4.x
  we still fall back to sprintf() to avoid a point release breaking support
  for any platform still lacking snprintf().

* Stop linking with --enable-runtime-pseudo-reloc.  We were requiring this for
  cygwin and mingw, but from the documentation it should only be needed for a
  library which exports data symbols, which we don't do, and the build works
  without it.

* Use `override` for subclassing functors.  This is good practice as it gives a
  clear compile error if we have to change the signature of a virtual method
  on such a functor.  See #830.

* Avoid redefining MSVC-specific macros if they are already defined.  This
  avoids an MSVC warning and potential for the code to be ill-formed if the
  user defines these macros with a value other than 1.  Patch from A. Jiang
  (https://github.com/xapian/xapian/pull/334).
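
As an illustration of the `override` item above, here is a minimal C++
sketch.  The classes TermFilter and ShortTermFilter are hypothetical and
stand in for any functor base class with a virtual operator(); they are not
Xapian classes.

```cpp
#include <string>

// Hypothetical functor base class with a virtual call operator.
struct TermFilter {
    virtual ~TermFilter() = default;
    virtual bool operator()(const std::string& term) const = 0;
};

// Subclass functor.  Because of `override`, changing the base class
// signature (for example dropping `const`, or taking a different parameter
// type) makes this declaration a compile error instead of silently
// declaring an unrelated overload that is never called virtually.
struct ShortTermFilter : TermFilter {
    bool operator()(const std::string& term) const override {
        return term.size() < 4;
    }
};
```

Calling through a `const TermFilter&` bound to a ShortTermFilter dispatches
to the subclass, accepting "cat" and rejecting "xapian".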

Files:
Revision  Action  File
1.24      modify  pkgsrc/textproc/xapian/Makefile.common
1.52      modify  pkgsrc/textproc/xapian/distinfo
1.30      modify  pkgsrc/textproc/xapian/distinfo-bindings
1.2       remove  pkgsrc/textproc/xapian/patches/patch-common_errno__to__string.cc