./textproc/xapian-omega, Search engine application for websites using Xapian


Branch: CURRENT, Version: 1.4.27, Package name: xapian-omega-1.4.27, Maintainer: schmonz

Omega operates on a set of databases. Each database is created and
updated separately using either omindex or scriptindex. You can
search these databases (or any other Xapian database with suitable
contents) via a web front-end provided by omega, a CGI application.
A search can also be done over more than one database at once.


Required to run:
[lang/perl5] [textproc/xapian] [devel/pcre2]

Filesize: 570.273 KB

CVS history:


   2024-12-06 16:32:24 by Amitai Schleier | Files touched by this commit (1) | Package updated
Log message:
xapian-omega: update to 1.4.27. Changes:

omega:

* Calculate date spans in days rather than converting to time_t, which
  side-steps issues due to 32-bit time_t and some implementations not
  handling negative time_t values.
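
As an illustration of the idea above, here is a minimal sketch of computing
a date span purely in days with integer arithmetic, so no time_t value is
ever formed.  It is written in Python for brevity and is not Omega's actual
code (Omega is C++); the function names days_from_civil and date_span_days
are hypothetical, and the day-count formula is a well-known public-domain
civil-calendar algorithm rather than anything taken from the changelog.

```python
def days_from_civil(year: int, month: int, day: int) -> int:
    """Days since 1970-01-01 in the proleptic Gregorian calendar.

    The result is negative for dates before 1970 and is not limited by
    the range of a 32-bit time_t.
    """
    # Count years from 1 March so the leap day falls at the end of the year.
    y = year - 1 if month <= 2 else year
    era = y // 400                       # 400-year cycle (floor division)
    yoe = y - era * 400                  # year within the cycle, 0..399
    mp = (month + 9) % 12                # March = 0, ..., February = 11
    doy = (153 * mp + 2) // 5 + day - 1  # day within the March-based year
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy
    return era * 146097 + doe - 719468   # 719468 shifts the origin to 1970-01-01


def date_span_days(start, end):
    """Whole days from start to end, each given as a (year, month, day) tuple."""
    return days_from_civil(*end) - days_from_civil(*start)
```

Working in days keeps the arithmetic valid for dates before 1970 and after
January 2038, which is where a 32-bit or unsigned time_t breaks down.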

portability:

* Fix build with UCRT64 variant of mingw-w64 by stopping defining
  __MSVCRT_VERSION__ by default. We fixed this for xapian-core in 1.4.24
  but missed that omega defined it too.

* Remove unnecessary 'using namespace std' to fix build on (at least)
  FreeBSD where there's nothing in the std namespace to import at this point
  so we get a compiler error.
   2024-07-24 12:55:07 by Amitai Schleier | Files touched by this commit (1) | Package updated
Log message:
xapian-omega: update to 1.4.26. Changes:

indexers:

* omindex:

  + Make robust to the indexer process being run with stdin or stdout closed.

omega:

* Support "bm25+" and "pl2+" in "$set{weighting,...}".

* Deprecate "lm" in "$set{weighting,...}".  This was meant to implement the
  "Language Model" Weighting scheme, but we've discovered the implementation
  was incorrect and fixing it requires ABI-incompatible changes in xapian-core.
  For 1.4.x we need to leave it in place so as not to break existing code, but
  we recommend avoiding using it.  It will be removed in the next release
  series and replaced with new separate classes implementing Language Model
  weighting with each smoothing.

* Add "prob" as new preferred name for probabilistic query expansion in
  "$set{expansion,...}", with the previous "trad" still being accepted for now.

build system:

* Report result of probe to determine compiler support for -Werror or
  equivalent.

* If pkg-config is available, use it to probe for libmagic.

* configure: Probe for closefrom().  Patch from Qiu Yingbo in
  https://github.com/xapian/xapian/pull/323

portability:

* configure: Fix clang detection which wasn't working when configure determined
  a -std=X option was needed to get C++11 support.  The obvious symptom was
  that --enable-werror wouldn't add -Werror.

* configure: NetBSD automatically pulls in library dependencies, so set
  link_all_deplibs_CXX=no there.

* Define __WIN32__/__WIN64__ like we do for xapian-core.  Spotted by Baran Demir.

* Avoid using sprintf() if snprintf() is available, even in cases where the
  output size is bounded, to avoid deprecation warnings on macOS.  For 1.4.x
  we still fall back to sprintf() to avoid a point release breaking support
  for any platform still lacking snprintf().

* Use `override` for subclassing functors.  This is good practice as it gives a
  clear compile error if we have to change the signature of a virtual method
  on such a functor.  See #830.

* Fix building with MSVC - it seems that to support AR=lib we need to use
  AM_PROG_AR, which probes for AR's command line interface.
   2024-03-08 20:01:39 by Amitai Schleier | Files touched by this commit (1) | Package updated
Log message:
xapian-omega: update to 1.4.25. Changes:

testsuite:

* omegatest.pl: Correct program name in error message.

build system:

* configure: DragonflyBSD automatically pulls in library dependencies, so set
  link_all_deplibs_CXX=no there.

* configure: Avoid compiler warning during GCC version check when compiler
  needs an option to enable C++11 support (same fix as applied to xapian-core
  in 1.4.23).
   2023-11-07 23:36:54 by Amitai Schleier | Files touched by this commit (3) | Package updated
Log message:
xapian-omega: update to 1.4.24. Changes:

documentation:

* Document $filesize error handling.

indexers:

* omindex:

  + Implement piped input to filters for __WIN32__.  Previously it looks like
    the filter was run but the input wasn't connected to its stdin so it would
    probably block indefinitely.

  + Fix corner case in shell emulation - we no longer set environment variables
    which start with a digit.

    This issue was spotted from reading the code - in practice this isn't a
    case that's likely to be encountered, and the previous behaviour doesn't
    appear to have any security consequences even if a user was somehow tricked
    into specifying an extraction command which did this.

* scriptindex:

  + Check if we can actually support %z in parsedate action.  Previously we
    assumed we could if struct tm had a tm_gmtoff member, but that's only a
    necessary condition and not sufficient, e.g. on Cygwin we have tm_gmtoff
    but strptime() doesn't currently understand %z.

  + If we were expecting an action but didn't get an identifier this triggered
    an infinitely repeating error:

    Unknown index action ''

    Now we instead give a single error:

    Expected index action, found '...'

    where '...' shows the sequence of non-whitespace characters encountered.
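
The single-error behaviour described above can be sketched as follows.
This is a hypothetical Python illustration, not scriptindex's real parser:
the function read_action, its signature and the identifier rule (letters,
digits and underscore) are assumptions made for the example.  The point it
shows is that the parser consumes the offending run of non-whitespace
characters, so it always makes progress and reports the problem once.

```python
def read_action(line: str, pos: int, errors: list):
    """Read an action name starting at pos.

    Returns (action, new_pos).  If no identifier is found, records one
    error, skips the run of non-whitespace characters, and returns
    (None, new_pos) so the caller can carry on parsing.
    """
    end = pos
    # Assumed identifier rule: letters, digits and underscore.
    while end < len(line) and (line[end].isalnum() or line[end] == "_"):
        end += 1
    if end > pos:
        return line[pos:end], end

    # No identifier here: gather the non-whitespace run for the message and
    # advance past it, which is what prevents the error from repeating.
    while end < len(line) and not line[end].isspace():
        end += 1
    errors.append(f"Expected index action, found '{line[pos:end]}'")
    return None, end
```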

testsuite:

* Run tests under eatmydata if available.

* Turn off MSYS2 argument conversion for tests as it breaks omegatest, and we
  shouldn't need this conversion there.

* omegatest: Rewrite in Perl as we were hitting non-portable quoting issues
  with the shell implementation, and really it had grown too large to make
  sense as a shell script anyway.

build system:

* Add --enable-werror configure option.

* configure: Only auto-enable -D_FORTIFY_SOURCE=2 if it works without
  additional libraries and remove the hard-coded block against using it
  on mingw.  Mingw-w64 v11.0.0 eliminated the requirement to link with -lssp
  so we now auto-enable -D_FORTIFY_SOURCE=2 there.

portability:

* Fix to build on Cygwin.

* Rename our bswap32 helper function to avoid clash with system-provided
  function on FreeBSD and NetBSD.
   2023-07-10 17:08:30 by Amitai Schleier | Files touched by this commit (1) | Package updated
Log message:
Update to 1.4.23. From the changelog:

documentation:

* Improve documentation for OmegaScript numerical and logical operators.  Patch
  from Vaibhav Kansagara.

* Improve documentation for DATEVALUE, xFILTERS and $filters.

indexers:

* omindex:

  + Handle XPS files with multiple FixedDocument parts better.  Previously we
    only extracted text from the first FixedDocument part.

  + Prefer latter subparts of multipart/alternative, which is what RFC2046 (and
    the earlier RFCs which it obsoletes) says to do.  Previously we used the
    first subpart that we could get text from.

  + Prefer latter subparts of multipart/alternative when indexing Outlook
    .msg files too.

  + Fix obscure bug in --mimetype option.  We keep track of the length of the
    longest extension we have a mapping for, but this was being updated using
    the length of the MIME type rather than the length of the extension.
    Theoretically this could have led to us effectively ignoring a --mimetype
    option, but in the real world the MIME type will probably always be longer
    so this just results in us testing long extensions unnecessarily.
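
To make the --mimetype fix above concrete, here is a small Python sketch of
an extension-to-MIME-type map that tracks the longest extension it knows
about.  The class MimeMap and its methods are hypothetical and are not
omindex's real data structures; the sketch only illustrates why the
tracked length must come from the extension and not from the MIME type.

```python
class MimeMap:
    """Maps filename extensions to MIME types (illustrative only)."""

    def __init__(self):
        self.by_ext = {}
        self.max_ext_len = 0

    def add(self, ext: str, mime_type: str) -> None:
        self.by_ext[ext.lower()] = mime_type
        # Track the longest *extension*.  Using len(mime_type) here is the
        # kind of mistake the changelog entry describes.
        self.max_ext_len = max(self.max_ext_len, len(ext))

    def lookup(self, filename: str):
        _, dot, ext = filename.rpartition(".")
        if not dot or len(ext) > self.max_ext_len:
            # No extension, or one too long to match any known mapping.
            return None
        return self.by_ext.get(ext.lower())
```

If add() used the MIME type's length and that happened to be shorter than
the extension, lookup() would reject the extension as too long and the
mapping would effectively be ignored.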

omega:

* Ignore DATEVALUE CGI parameter if START.n, etc is specified on the same
  slot.  We explicitly document not to do this, but if that advice is ignored
  it's more helpful to at least preserve the property that we only have
  one date range per value slot.

* Add flag_ngrams as a preferred new alias for flag_cjk_ngram.  In the next
  release series this feature has been expanded to cover many more languages
  so the "cjk" in the name has become inaccurate (it stands for
  "Chinese, Japanese and Korean").

* Fix handling of Outlook .msg containing Unicode.  Codepoints <= U+00FF appear
  to have been handled correctly, but anything higher resulted in individual
  bytes of the UTF-8 encoding being treated as separate characters.

  Fixes https://github.com/xapian/xapian/pull/326, reported by uhuntu.
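
The symptom of the Outlook .msg bug above can be reproduced in a few lines
of Python.  This does not model Omega's actual code path, which the
changelog does not describe; the helper names bytes_as_chars and
decode_utf8 are hypothetical and only contrast the two interpretations of
the same UTF-8 byte sequence.

```python
def bytes_as_chars(data: bytes) -> str:
    """The buggy interpretation: each byte becomes its own character."""
    return "".join(chr(b) for b in data)


def decode_utf8(data: bytes) -> str:
    """The intended interpretation: decode the bytes as UTF-8."""
    return data.decode("utf-8")


# A codepoint above U+00FF needs several bytes in UTF-8, so the buggy
# interpretation turns one character into several unrelated ones.
encoded = "\u65e5".encode("utf-8")   # U+65E5, three bytes in UTF-8
garbled = bytes_as_chars(encoded)
intact = decode_utf8(encoded)
```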

portability:

* Fix compatibility code for old libmagic versions.  The code we were using
  seems like it would never have worked.  Nobody's reported this (it was
  spotted while looking at the code) so we could just require libmagic >= 4.22,
  but it's trivial to actually handle so we've fixed the fallback code.

* Remove lingering traces of IRIX support as it's been dead for many years.
   2022-09-25 14:25:58 by Amitai Schleier | Files touched by this commit (1)
Log message:
Update to 1.4.21. From the changelog:

documentation:

* Consistently say "macOS" not "Mac OS X", "OS X", etc.

indexers:

* omindex:

  + Add support for gzip-compressed SVG files (.svgz).

  + Handle <title> in SVG.  Previously only <dc:title> inside <metadata> was
    considered.  If both are present, <title> now takes precedence.

testsuite:

* omegatest: Add skip-for-32-bit-time_t mechanism and use it to conditionally
  enable some testcases which fail on platforms with 32-bit time_t.

build system:

* Update to use AX_CXX_COMPILE_STDCXX, a replacement for
  AX_CXX_COMPILE_STDCXX_11 (which we were using) that also supports newer C++
  standard versions, which will be useful.  For C++11 the only difference seems
  to be that the macro now checks for attribute support - we use C++11
  attributes so that seems a good thing.

Updating during the freeze for the bug and portability fixes.
   2022-07-29 17:21:42 by Amitai Schleier | Files touched by this commit (1)
Log message:
Needs pkg-config to find pcre2 during configure.
   2022-07-11 20:27:07 by Amitai Schleier | Files touched by this commit (2)
Log message:
Update to 1.4.20. From the changelog:

indexers:

* omindex:

  + OpenDocument: Previously we only inserted an implicit space before each
    paragraph.  Now we insert them both before and after each paragraph and
    heading, and before each forced line-break and tab.

  + Add extension mapping for .awt (Abiword templates).

  + Index metadata from XPS files.

  + -G and -C short options were documented in --help but not previously
    actually handled. Reported by David Bremner.

  + Show --max-size required argument in --help output.

  + Remove lingering handling for database backends without slot bounds since
    all backends have been required to support these since 1.4.11.

* scriptindex:

  + Process an incomplete final line from a dump file.  Previously if the final
    line lacked a newline scriptindex would quietly ignore it (unless it was
    the only line).

  + The `unique` action now takes an optional `missing` parameter to specify
    what to do if a record doesn't trigger the unique action or triggers it
    with an empty value.  The default is now to issue a warning and create a
    new document (the same as before, except that previously there was only a
    warning for the empty value case). In Omega 1.5.0 the default will change
    to an error as that seems a better default, but is less compatible with
    potential existing use.

  + Explicitly allow multiple blank lines in input files.  Previously such
    extra blank lines were treated as empty records and in many cases these
    got quietly skipped, but e.g. with the new UNIQUE checks this could result
    in a warning or error.

  + If we hit an error while parsing the index script we used to exit right
    away, but now we finish parsing the index script since it's more helpful to
    report all the errors in an index script rather than the user having to
    fix them one by one.  This requires us to sensibly recover after each index
    script parse error - if you find a case where this recovery triggers
    further bogus errors please report it and we'll try to improve the
    recovery.

  + In four cases while handling input data (two cases of bad hex data fed
    to `hextobin`, an input data line without a `=`, and `load` failing to
    load the specified file) we'd emit a diagnostic that was labelled as an
    "error" but really it was handled as a warning as we kept reading input
    and the "error" didn't affect the exit status.  It doesn't really make
    sense to continue in any of these cases so we now exit with non-zero status
    right away.

  + A parameter in the index script which should be an integer but isn't, or
    should be positive but isn't, now gives an error rather than a warning
    since an error seems more helpful.

  + All diagnostics issued while parsing the index script now include column
    information.

  + Avoid forcibly flushing the output stream after every message.
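
Several of the scriptindex input-handling changes above (an unterminated
final line, multiple blank lines between records, and a data line without
`=` now being fatal) can be sketched together.  This is a hypothetical
Python illustration, not scriptindex's implementation: the function
parse_records and its return format are assumptions, and it models only
`field=value` lines grouped into records by blank lines, ignoring any
other features of the real input format.

```python
def parse_records(text: str):
    """Split dump-file text into records of field -> list of values."""
    records = []
    current = {}
    # split("\n") keeps a final line even when it lacks a trailing newline,
    # so an incomplete last line is still processed.
    for lineno, line in enumerate(text.split("\n"), start=1):
        if not line:
            # A blank line ends the current record; extra blank lines are
            # allowed and do not create empty records.
            if current:
                records.append(current)
                current = {}
            continue
        field, sep, value = line.partition("=")
        if not sep:
            # Treated as fatal rather than as a warning.
            raise ValueError(f"line {lineno}: expected '=' in input line")
        current.setdefault(field, []).append(value)
    if current:
        records.append(current)
    return records
```

Raising an exception stands in for exiting with non-zero status; a
command-line tool would report the message and stop reading input.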

testsuite:

* Improve test coverage for scriptindex.

portability:

* Require PCRE2 instead of PCRE. The original PCRE is now EOL and unmaintained
  (last release was June 2021).  In omega it's potentially used to process
  input from the internet, so security is a real concern, hence we're
  switching to PCRE2.