pkgsrc.se | The NetBSD package collection

Subject: CVS commit: pkgsrc/textproc/xapian
From: Amitai Schleier
Date: 2024-03-08 20:00:54
Message id: 20240308190054.36B93FA2A@cvs.NetBSD.org

Log Message:
xapian: update to 1.4.25. Changes:

API:

* MSet::get_eset(): Don't fetch the collection frequency for each term unless
  we're using the Bo1EWeight expansion scheme which actually needs it.  In a
  simple test this reduced the time taken to do a search and generate expand
  terms by a third.  Partly addresses #264.

* QueryParser::parse_query(): Fix parse error when using FLAG_CJK_NGRAM (aka
  FLAG_NGRAMS) with a query string which has non-CJK followed by whitespace,
  CJK, and more non-CJK.  Patch from Robert Stepanek
  (https://github.com/xapian/xapian/pull/331).

testsuite:

* unittest: Improve sparse file detection by using SEEK_HOLE, which is
  specified by POSIX and seems to be widely supported.  On platforms without it
  or on an FS with a > 128K block size we will skip the tests involving a 4GB
  file, but that's acceptable.  On ZFS st_blocks reports the number of blocks
  after compression and also lags behind when data has only been committed to
  the journal, which means our previous check based on st_blocks couldn't be
  made to work without potentially falsely detecting sparse file support.
  Fixes #823, reported by someplaceguy.

* apitest: Enable adddoc2 and adddoc5 testcases for sharded databases.  We
  now just skip the TermIterator::get_termfreq() checks in this case.
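
The SEEK_HOLE probe described in the unittest entry above can be sketched
roughly as follows.  This is a minimal Python illustration, not Xapian's actual
test code: the function name supports_sparse_files and the 1 MiB probe size are
assumptions made for the example.

```python
import os
import tempfile


def supports_sparse_files(directory, size=1 << 20):
    """Return True if a hole can be detected in a file created in `directory`.

    Returns False when SEEK_HOLE is unavailable or no hole is reported before
    end-of-file; a caller would then skip tests that need a huge sparse file.
    """
    if not hasattr(os, "SEEK_HOLE"):
        return False  # The platform does not expose SEEK_HOLE.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Write a single byte at offset size - 1, leaving a gap before it.
        os.lseek(fd, size - 1, os.SEEK_SET)
        os.write(fd, b"\0")
        try:
            hole = os.lseek(fd, 0, os.SEEK_HOLE)
        except OSError:
            return False
        # Without real hole support, the only "hole" reported is the implicit
        # one at end-of-file, i.e. at offset == size.
        return hole < size
    finally:
        os.close(fd)
        os.unlink(path)
```

Unlike an st_blocks comparison, this asks the filesystem directly where the
first hole is, so it does not depend on how allocated blocks are counted.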

glass backend:

* Check Btree level value from disk is in range, which avoids potential out of
  range access on corrupt database.  Fixes #824, reported by group13.

* Reject invalid blocksize read from corrupted version file.  Throw
  DatabaseCorruptError if value is out of range or not a power of two.

* Optimise allterms iteration.  Most terms don't contain any zero bytes, and
  for such terms the key for the first chunk in the termlist table is just the
  termname so no decoding is needed when advancing the iterator.  With this
  change, a simple test of iterating allterms via xapian-delve runs 8.4% faster.

* Compaction of an empty non-optional table now gives an empty output, whereas
  previously it was one block in size (8K by default).  This isn't important in
  general as the non-optional tables are not likely to be empty in a real
  database, but it's helpful for making small test databases and it seems weird
  that compaction would make a database much larger in percentage terms in this
  edge case.
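
The two corruption checks listed above (Btree level in range, block size in
range and a power of two) amount to validating values read from disk before
using them.  A minimal Python sketch follows, assuming illustrative limits: the
constants MIN_BLOCK_SIZE, MAX_BLOCK_SIZE and MAX_BTREE_LEVELS and the function
names are hypothetical, and DatabaseCorruptError here is a local stand-in for
Xapian's exception of the same name.

```python
class DatabaseCorruptError(Exception):
    """Local stand-in for Xapian's exception of the same name."""


# Assumed limits for illustration only; the real ones live in the sources.
MIN_BLOCK_SIZE = 2048
MAX_BLOCK_SIZE = 65536
MAX_BTREE_LEVELS = 10


def check_block_size(value):
    """Reject a block size that is out of range or not a power of two."""
    in_range = MIN_BLOCK_SIZE <= value <= MAX_BLOCK_SIZE
    # A power of two has exactly one bit set, so value & (value - 1) == 0.
    if not in_range or value & (value - 1) != 0:
        raise DatabaseCorruptError(f"Invalid block size: {value}")
    return value


def check_level(value):
    """Reject a Btree level that would index past the per-level state."""
    if not 0 <= value < MAX_BTREE_LEVELS:
        raise DatabaseCorruptError(f"Btree level out of range: {value}")
    return value
```

Failing early with an exception keeps a corrupt header from turning into an
out-of-range access later on.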

chert backend:

* Check Btree level value from disk is in range, which avoids potential out of
  range access on corrupt database.  Fixes #824, reported by group13.

build system:

* configure: DragonFly BSD automatically pulls in library dependencies, so set
  link_all_deplibs_CXX=no there.

documentation:

* Document allterms_begin() and termlist_begin() iteration order.  Thanks to
  Eric Wong for querying this.

* Document TermIterator::get_termfreq() quirk.  In the case of a TermIterator
  from termlist_begin() on a Document from a sharded database, you get term
  frequencies from just the shard.  Fixes #423.

portability:

* Support building on platforms without AI_NUMERICSERV (e.g. macOS 10.5).
  Patch from Sergey Fedorov.
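
One common way to cope with a missing AI_NUMERICSERV is to fall back to passing
no flag at all.  The log does not show how the actual C++ patch does it, so the
following is only a Python analogue of that idea; the resolve helper is a
hypothetical name.

```python
import socket

# Use the flag where the platform provides it, otherwise pass no flag (0).
AI_NUMERICSERV = getattr(socket, "AI_NUMERICSERV", 0)


def resolve(host, port):
    """Look up TCP addresses for host, passing the port as a numeric string."""
    return socket.getaddrinfo(
        host, str(port), type=socket.SOCK_STREAM, flags=AI_NUMERICSERV
    )
```

The flag is only a hint that the service is already a port number, so
omitting it does not change the result for numeric ports.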

Files:

Revision	Action	File
1.23	modify	pkgsrc/textproc/xapian/Makefile.common
1.51	modify	pkgsrc/textproc/xapian/distinfo
1.29	modify	pkgsrc/textproc/xapian/distinfo-bindings