Subject: CVS commit: pkgsrc/textproc/xapian
From: Amitai Schleier
Date: 2019-12-17 04:52:58
Message id:

Log Message:
Update to 1.4.14. From the changelog:

API:

* Xapian::QueryParser: Handle "" inside a quoted phrase better.  In a boolean
  term, "" is treated as an escaped ", so handle it in a compatible way for
  quoted phrases.  Previously we'd drop out of the phrase and start a new
  phrase.  Fixes #630, reported by Austin Clements.

* Xapian::Stem: The constructor which takes a stemmer name now takes an
  optional second bool parameter - if this is true, then an unknown stemmer
  name falls back to using the "none" stemmer instead of throwing an exception.
  This allows simply constructing a stemmer from an ISO language code without
  having to worry about whether there's a stemmer for that language, and
  without having to handle an exception if there isn't.

* Xapian::Stem: Fix a bug with handling 4-byte UTF-8 sequences which
  potentially affects most of the stemmers.  None of the stemmers work in
  languages where 4-byte UTF-8 sequences are part of the alphabet, but this
  bug could result in invalid UTF-8 sequences in terms generated from text
  containing high Unicode codepoints such as emoji, which can cause issues (for
  example, in some language bindings).  Fix synced from Snowball git post
  2.0.0.  Reported by Ilari Nieminen in

* Xapian::Stem: Add a new is_none() method which tests if this is a "none"
  stemmer.

* Xapian::Weight: The total length of all documents is now made available to
  Xapian::Weight subclasses, and this is now used by DLHWeight, DPHWeight and
  LMWeight.  To maintain ABI compatibility, internally this still fetches the
  average length and the number of documents, multiplies them, then rounds the
  result, but in the next release series this will be handled directly.

* Xapian::Database::locked() on an inmemory database used to always return
  false, but an inmemory Database is always actually a WritableDatabase
  underneath, so now we always report true in this case because it's really
  always locked for writing.

* Fix write one past end of std::vector on certain QueryParser parser errors.
  This is undefined behaviour, but the write was always into reserved space, so
  in practice we'd actually get away with it (it was noticed because it
  triggers an error when running under ubsan and using libc++).  Reported by
  Germán M. Bravo.

* MSet::get_matches_estimated(): Improve rounding of result - a bug meant we
  would almost always round down.

* Optimise test for UTF-8 continuation character.  Performing a signed char
  comparison shaves an instruction or two on most architectures.

* Database::get_revision(): Return revision 0 for a Database with no shards
  rather than throwing InvalidOperationError.

* DPHWeight: Avoid dividing by 0 when searching a sharded database when one
  shard is empty.  The result wasn't used in this case, but it's still
  undefined behaviour.  Detected by UBSan.
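
For the UTF-8 continuation test entry above, here is a minimal standalone
sketch of the idea.  It is not Xapian's actual source: the names
is_utf8_continuation, count_code_points and check_equivalence are
hypothetical, and it assumes 8-bit two's complement char.  Continuation bytes
0x80..0xBF become -128..-65 when viewed as signed char, so one signed
comparison against -64 can stand in for the usual mask and compare.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if `ch` is a UTF-8 continuation byte (bit pattern 10xxxxxx).
// As signed 8-bit values those bytes are -128..-65, and 0xC0 is -64,
// so a single signed "less than" identifies them.
inline bool is_utf8_continuation(unsigned char ch) {
    return static_cast<signed char>(ch) < static_cast<signed char>(0xC0);
}

// Counts code points in well-formed UTF-8 by counting the bytes that are
// not continuation bytes (each code point has exactly one such lead byte).
inline std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char ch : utf8) {
        if (!is_utf8_continuation(ch)) ++n;
    }
    return n;
}

// Checks the signed comparison against the conventional mask test for
// every possible byte value.
inline void check_equivalence() {
    for (int b = 0; b < 256; ++b) {
        const unsigned char ch = static_cast<unsigned char>(b);
        assert(is_utf8_continuation(ch) == ((ch & 0xC0) == 0x80));
    }
}
```

Whether this actually saves instructions depends on the compiler and target;
the sketch only shows that the two tests agree.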

testsuite:

* Fix failing multi_glass_remoteprog_glass tests on x86.  When the tests are
  run under valgrind, remote servers should be run using the runsrv wrapper
  script, but this wasn't happening for remote servers in multi-databases - now
  it is.  Also, previously runsrv only used valgrind for the remote for an x86
  build that didn't use SSE, but it seems there are x87 instructions in libc
  that are affected by valgrind not providing excess precision, so do this for
  x86 builds which use SSE too.  Together these changes fix failures of
  topercent2, xor2, tradweight1 under backend multi_glass_remoteprog_glass on
  x86.

* Fix C++ One-Definition Rule (ODR) violation in testsuite code.  Two different
  source files linked into apitest were each defining a different `struct
  test`.  Wrap each in an anonymous namespace to localise it to the file it is
  defined and used in.  This was probably harmless in practice, unless trying
  to build with Link-Time Optimisation or similar (which is how it was
  detected).

* Test all language codes in stemlangs1.  The testsuite hardcodes a list of
  supported language codes which hadn't been updated since 2008.

* Improve DateRangeProcessor test coverage.

* The "singlefile" test harness backend manager now creates databases by
  compacting the corresponding underlying backend database (creating it first
  if need be) rather than always creating a temporary database to compact.

* Enable compaction testcases for multi and singlefile test harness backends.

* Add generated database support for remoteprog and remotetcp test harness
  backends.  Implemented by Tanmay Sachan.

* Add test harness support for running testcases using a multi database
  comprised of one local and one remote shard, or two remote shards.
  Implemented by Tanmay Sachan.

* Check if removing existing multi stub failed.  Previously if removing an
  existing stub failed, the test harness would create a temporary new stub and
  then try to rename it over the old one, which will always fail on Microsoft
  Windows.

* Wait for xapian-tcpsrv processes to finish before moving on to the next
  testcase under __WIN32__ like we already do on POSIX platforms.
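
The One-Definition Rule entry above describes wrapping a file-local
`struct test` in an anonymous namespace.  A minimal sketch of that pattern
follows.  It is not the apitest source: the struct members, the table and
run_local_tests are hypothetical, and a single file cannot show the clash
itself, only the fix.

```cpp
#include <cassert>

// Picture this as one of the two source files linked into the test binary.
// The anonymous namespace gives `test` internal linkage, so a different
// `struct test` in another file is a distinct type and no longer clashes.
namespace {

struct test {
    const char* name;
    bool (*run)();
};

bool always_passes() { return true; }

const test tests[] = {
    {"always_passes", always_passes},
};

}  // namespace

// Runs this file's private test table and returns the number of failures.
int run_local_tests() {
    int failures = 0;
    for (const test& t : tests) {
        assert(t.run != nullptr);
        if (!t.run()) ++failures;
    }
    return failures;
}
```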

matcher:

* Handle pruning under a positional check.  This used to be impossible, but
  since 1.4.13 it can happen as we now hoist AND_NOT to just below where we
  hoist the positional checks.  The code on master already handles pruning here
  so this bug is specific to the RELEASE/1.4 branch.  Fixes #796, reported by
  Oliver Runge.

* When searching with collapsing over multiple shards, at least some of which
  are remote, uncollapsed_upper_bound could be too low and
  uncollapsed_lower_bound too high.  This was causing assertion failures in
  testcases msize1 and msize2 under test harness backends
  multi_glass_remoteprog_glass and multi_remoteprog_glass.

* Internally we no longer calculate a bogus total_term_count as the sum of
  total_length * doc_count for all shards.  Instead we just use the sum of
  total_length, which gives the total number of term occurrences.  This change
  should improve the estimated collection_freq values for synonyms.

* Several places where we might divide zero by zero in a database where wdf was
  always zero have been fixed.

* Optimise OP_AND_NOT better.  We now combine its left argument with other
  connected and-like subqueries, and gather up and hoist the negated subqueries
  and apply them together above the combined and-like subqueries, just below
  any positional filters.

* Optimise OP_AND_MAYBE better.  We now combine its left argument with other
  connected and-like subqueries, and gather up and hoist the optional
  subqueries and apply them together above the combined and-like subqueries and
  any hoisted positional filters.

* Treat all BoolWeight queries as scaled by 0 - we can optimise better if we
  know the query is unweighted.
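
The total_term_count entry above can be illustrated with made-up numbers.
This is a sketch, not Xapian's internal code: ShardStats and both function
names are hypothetical.  It contrasts the old sum of total_length * doc_count
with the plain sum of total_length.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-shard statistics.
struct ShardStats {
    std::uint64_t total_length;  // total number of term occurrences
    std::uint64_t doc_count;     // number of documents
};

// Old calculation: sums total_length * doc_count, which overstates the
// number of term occurrences.
inline std::uint64_t old_total_term_count(const std::vector<ShardStats>& shards) {
    std::uint64_t sum = 0;
    for (const ShardStats& s : shards) sum += s.total_length * s.doc_count;
    return sum;
}

// New calculation: the sum of total_length over all shards.
inline std::uint64_t total_term_count(const std::vector<ShardStats>& shards) {
    std::uint64_t sum = 0;
    for (const ShardStats& s : shards) sum += s.total_length;
    return sum;
}

// Usage example with two made-up shards.
inline void example() {
    const std::vector<ShardStats> shards = {{100, 10}, {50, 5}};
    assert(old_total_term_count(shards) == 1250);  // 100*10 + 50*5
    assert(total_term_count(shards) == 150);       // 100 + 50
}
```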

build system:

* configure: Stop using AC_FUNC_MEMCMP.  The autoconf manual marks it as
  "obsolescent", and it seems clear that nobody's relying on it as we're
  missing the "'AC_LIBOBJ' replacement for 'memcmp'" which it would try to
  use if needed.

glass backend:

* Allow zlib compression to reduce size by one byte.  We were specifying an
  output buffer size one byte smaller than the input, but it appears zlib won't
  use the final byte in the buffer, so we actually need to pass the input size
  as the output buffer size.

* Only try to compress Btree item values > 18 bytes, which saves CPU time
  without sacrificing any significant size savings.

remote backend:

* Fix match stats when searching with collapsing over multiple shards and at
  least some shards are remote.  Bug discovered by Tanmay Sachan's test harness
  improvements.

* Ignore orphaned remote protocol replies which can happen when searching with
  a remote shard if an exception is thrown by another shard.  Bug discovered
  by Tanmay Sachan's test harness improvements.

* Wait for xapian-progsrv child to exit when a remote Database or
  WritableDatabase object is closed under __WIN32__ like we already do for
  POSIX platforms.

documentation:

* HACKING: Replace release docs with pointer to the developer guide where they
  are now maintained.

* Correct documentation of initial messages in replication protocol.

tools:

* quest: Report bounds and estimate of number of matches.

* xapian-delve: Improve output when database revision information is not
  available.  We now specially handle the cases of a DB with multiple shards
  and a backend which doesn't support get_revision().

portability:

* Eliminate 2 uses of atoi().  These are potentially problematic in a
  multithreaded application if setlocale() is called by another thread at the
  same time.  See #665.

* Don't check __GNUC__ in visibility.h as the configure probe before defining
  XAPIAN_ENABLE_VISIBILITY checks that the visibility attributes work.  This
  probably makes no difference in practice, as all compilers we're aware of
  which support symbol visibility also define __GNUC__.

* Document Sun C++ requires --disable-shared.  Closes #631.

* Fix warning from GCC 9 with -Wdeprecated-copy (which is enabled by -Wextra)
  if a reference to an Error object is thrown.

* Suppress GCC warning in our API headers when compiling code using Xapian with
  GCC and -Wduplicated-branches.

* Mark some internal classes as final (following GCC -Wsuggest-final-types
  suggestions to allow some method calls to be devirtualised).

* Fix to build with --enable-maintainer-mode and Perl < 5.10, which doesn't
  have the `//=` operator.  It's unlikely developers will have such an old
  Perl, but the mingw environment on appveyor CI does.  The use of `//=` was
  introduced by changes in 1.4.10.
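
The atoi() entry above does not say what replaced those calls.  As one
possible approach, and purely as an assumption, a small digit loop can parse
an unsigned decimal number without consulting the C locale at all.  The name
parse_decimal is hypothetical.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Parses `s` as an unsigned decimal number using only ASCII digit checks.
// Returns false (leaving `out` untouched) if `s` is empty, contains a
// non-digit, or the value would overflow unsigned.
inline bool parse_decimal(const std::string& s, unsigned& out) {
    if (s.empty()) return false;
    unsigned value = 0;
    for (char ch : s) {
        if (ch < '0' || ch > '9') return false;
        const unsigned digit = static_cast<unsigned>(ch - '0');
        // value * 10 + digit must not exceed the maximum.
        if (value > (std::numeric_limits<unsigned>::max() - digit) / 10) {
            return false;
        }
        value = value * 10 + digit;
    }
    out = value;
    return true;
}

// Usage example.
inline void example() {
    unsigned n = 0;
    assert(parse_decimal("42", n) && n == 42);
    assert(!parse_decimal("12a", n) && n == 42);  // n is left untouched
}
```

Unlike atoi(), this reports malformed input and overflow instead of silently
returning a value.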