Subject: CVS commit: pkgsrc/textproc/xapian-omega
From: Amitai Schleier
Date: 2018-07-06 18:23:55
Message id: 20180706162355.9A76DFBEC@cvs.NetBSD.org

Log Message:
Update to 1.4.6. From the changelog:

general:

* Fix generate_sample() (used by OmegaScript $truncate and omindex) to return
  an empty sample instead of throwing an exception when the requested sample
  size is less than the size of the truncation indicator string.  Patch from
  Addy.  Fixes https://trac.xapian.org/ticket/754 reported by Gaurav Arora.
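
The fixed behaviour can be sketched roughly as below.  This is an illustration
in Python, not Omega's code: the name make_sample, its parameters, the "..."
indicator and the order of the checks are all assumptions made for the example.

```python
def make_sample(text: str, max_len: int, indicator: str = "...") -> str:
    """Hypothetical stand-in for generate_sample(); names are illustrative."""
    if max_len < len(indicator):
        # No room even for the truncation indicator: return an empty
        # sample rather than raising an error.
        return ""
    if len(text) <= max_len:
        # The text already fits, so nothing needs truncating.
        return text
    # Truncate so that text plus indicator is exactly max_len long.
    return text[: max_len - len(indicator)] + indicator
```

With these assumptions, a request for a 2-character sample of a long string
yields an empty string, while an 8-character request yields five characters of
text followed by the indicator.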

indexers:

* Check for the HTML5 doctype or legacy doctype declaration and use default
  charset UTF-8 if either is present.  Previously we always used ISO-8859-1,
  which is correct for older HTML versions, but not for HTML5.
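
A minimal sketch of this kind of doctype check follows, in Python.  It is not
the indexer's implementation: the helper name default_charset is hypothetical,
and reading "legacy doctype" as the about:legacy-compat form is an assumption.
The result is only a default; how it interacts with an explicit charset
declaration is not covered here.

```python
import re

# Matches "<!DOCTYPE html>" and the legacy-compat form
# '<!DOCTYPE html SYSTEM "about:legacy-compat">', case-insensitively.
_HTML5_DOCTYPE = re.compile(
    r"""\s*<!doctype\s+html\s*(?:system\s+(["'])about:legacy-compat\1\s*)?>""",
    re.IGNORECASE,
)

def default_charset(html_start: str) -> str:
    """Hypothetical helper: choose the fallback charset for an HTML document."""
    if _HTML5_DOCTYPE.match(html_start):
        return "utf-8"       # HTML5 (or its legacy-compat doctype)
    return "iso-8859-1"      # older HTML versions
```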

* omindex:

  + When running commands without going through the shell, emulate shell exit
    codes 127 (for command not found) and 126 (for other cases where we fail to
    run the command).  This means the "missing filter" handling should now work
    properly for such commands.  Noted by Gaurav Arora.

  + Index POD files despite minor formatting errors.  We now pass
    --errors=stderr to pod2text so that minor formatting errors don't prevent
    us from indexing a file.  (It may seem that --errors=none is a better
    option, but for podlators < 4.11 that results in an ERRATA section in the
    generated text version which we then end up indexing; 4.11 fixed that but
    we can't assume that's in use).  Reported by Gaurav Arora.
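
The exit-code emulation described for omindex above can be illustrated with a
short Python sketch.  It is not omindex's code: the function name run_filter is
hypothetical, and mapping Python's exception types onto 127 and 126 is an
assumption made for the example.

```python
import subprocess
from typing import Sequence

def run_filter(argv: Sequence[str]) -> int:
    """Run argv without a shell and return a shell-style exit status."""
    try:
        done = subprocess.run(
            list(argv),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return done.returncode
    except FileNotFoundError:
        # A shell would report 127 for "command not found".
        return 127
    except OSError:
        # Any other failure to start the command (e.g. not executable).
        return 126
```

A caller can then treat 127 as "filter not installed" in the same way whether
or not the command went through a shell.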

* omindex:

  + Check file size before calling libmagic to get the mime type, since
    reading the file size is a much cheaper check and we can skip the
    libmagic test if the file is empty or larger than the specified
    maximum size.  Patch from caiyulun.
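
The ordering of the two checks might look like the following Python sketch.
The names here (mime_type_if_indexable, detect, max_size) are hypothetical, and
detect stands in for whatever performs the libmagic lookup; treating
max_size=None as "no limit" is also an assumption.

```python
import os
from typing import Callable, Optional

def mime_type_if_indexable(
    path: str,
    detect: Callable[[str], str],
    max_size: Optional[int] = None,
) -> Optional[str]:
    """Return the MIME type, or None if the file is skipped on size alone."""
    size = os.path.getsize(path)  # cheap: reads file metadata only
    if size == 0:
        return None               # empty file, skip detection
    if max_size is not None and size > max_size:
        return None               # too large, skip detection
    return detect(path)           # costlier content-based detection
```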

* scriptindex:

  + Avoid some unnecessary copying of Action objects by making use of C++11
    features.

  + Consistently send errors to stderr - some were sent to stdout.
    Patch from Gaurav Arora.

  + Add new "hextobin" action.  Based on a patch from Gaurav Arora.

  + Warn about non-integer arg to hash.

  + Fix hash action without an argument, which was failing with an assertion.
    Based on a patch by Gaurav Arora: https://github.com/xapian/xapian/pull/189

  + Reject 'hash' with argument < 6.  The hashing truncates and then adds a
    6 character hash of the removed part, so can't produce a result shorter
    than 6 characters.  Patch from Gaurav Arora.

  + Look for alphanumerics when parsing index actions.  None of the current
    index actions contain digits, but we give more helpful error messages this
    way.

  + Deprecate allowing spaces around = in scripts.  This was never documented
    as supported, and leads to a missing argument quietly swallowing the next
    action rather than using an empty value or giving an error.  Reported by
    Gaurav Arora in https://github.com/xapian/xapian/pull/182

  + In boolean and unique actions, add a colon between prefix and term when
    the term starts with a colon.  This means the mapping is reversible, and
    matches what omega actually does in this case when it tries to reverse the
    mapping.  Thanks to Andy Chilton for pointing out this corner case.

  + Add parsedate and valuepacked actions.  Together these assist adding date
    values for sorting and date range filtering.  Based on a patch from Gaurav
    Arora.

  + Use DB_RETRY_LOCK to wait if the database is already in use rather than
    sleeping for a second and retrying.  On most platforms this means we make a
    blocking request for the lock, and even on platforms where that's not
    supported, we now sleep and retry inside libxapian, and without having to
    throw and catch an exception each time.
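
To illustrate why 'hash' cannot take an argument below 6, here is a rough
Python sketch of the truncate-then-hash idea.  It is not scriptindex's code:
hash_value is a hypothetical name, lengths are counted in characters for
simplicity, and SHA-1 is used purely as a stand-in for the real hash function.

```python
import hashlib

HASH_LEN = 6  # length of the hash suffix, as stated in the changelog

def hash_value(value: str, max_len: int) -> str:
    """Shorten value to max_len by replacing its tail with a 6-char hash."""
    if max_len < HASH_LEN:
        # The suffix alone needs HASH_LEN characters.
        raise ValueError("hash argument must be at least %d" % HASH_LEN)
    if len(value) <= max_len:
        return value  # already short enough, left unchanged
    keep = max_len - HASH_LEN
    removed = value[keep:]
    # Stand-in hash: first 6 hex digits of SHA-1 of the removed part.
    suffix = hashlib.sha1(removed.encode("utf-8")).hexdigest()[:HASH_LEN]
    return value[:keep] + suffix
```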

* scriptindex:

  + Reject index scripts with multiple "unique" actions.  We don't handle this
    case sensibly, and it doesn't seem like it really has a use, so better to
    give an error for people who do this inadvertently.

omega:

* $freq: Speed up some cases by avoiding throwing and catching an exception
  when we know the MSet has no term frequency information.

* $sort: New OmegaScript command which does a string sort on an OmegaScript
  list, with u (unique) and r (reverse) options.

* $cond: New OmegaScript multi-way conditional.  Inspired by LISP's
  COND, this provides a neater way to write a cascade of $if checks.

* $switch: New OmegaScript multi-way conditional which provides an even neater
  way to write a cascade of $if{$eq{X,VALUE1},$if{$eq{X,VALUE2},...}}.

* $subdb and $subid: New commands which report the subdatabase name and the
  docid in that subdatabase.

* $termprefix and $unprefix: New OmegaScript commands which expose the existing
  code inside omega for splitting up a term.

* Use str() to convert time_t to string, which is simpler code and faster than
  using snprintf().

* New $seterror command to set the error message.  Implemented by Gaurav Arora.

* Make $highlight more efficient.  Patch from Vivek Pal.
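
As a rough illustration of what the $sort options listed above do, the Python
sketch below sorts a list of strings with optional "u" and "r" flags.  It is
not Omega's implementation: sort_list is a hypothetical name, and representing
the list as a tab-separated string is an assumption made for the example.

```python
def sort_list(items: str, options: str = "") -> str:
    """Sort a tab-separated list; 'u' drops duplicates, 'r' reverses."""
    values = items.split("\t") if items else []
    if "u" in options:
        values = list(set(values))  # keep one copy of each entry
    # Plain string comparison, not numeric or locale-aware ordering.
    values.sort(reverse="r" in options)
    return "\t".join(values)
```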

templates:

* query: Use $prettyurl for the URL shown at the end of each match (previously
  we only used it on the URL shown as a fallback when the document has no
  title).  Split off from changes by Vivek Pal in
  https://github.com/xapian/xapian/pull/161

Files:
Revision  Action  file
1.23      modify  pkgsrc/textproc/xapian-omega/distinfo