./textproc/xapian-omega, Search engine application for websites using Xapian

Branch: CURRENT, Version: 1.4.23, Package name: xapian-omega-1.4.23, Maintainer: schmonz

Omega operates on a set of databases. Each database is created and
updated separately using either omindex or scriptindex. You can
search these databases (or any other Xapian database with suitable
contents) via a web front-end provided by omega, a CGI application.
A search can also be done over more than one database at once.

Required to run:
[lang/perl5] [textproc/xapian] [devel/pcre2]

Filesize: 558.199 KB

CVS history:

   2023-07-10 17:08:30 by Amitai Schleier | Files touched by this commit (1) | Package updated
Log message:
Update to 1.4.23. From the changelog:


* Improve documentation for OmegaScript numerical and logical operators.  Patch
  from Vaibhav Kansagara.

* Improve documentation for DATEVALUE, xFILTERS and $filters.


* omindex:

  + Handle XPS files with multiple FixedDocument parts better.  Previously we
    only extracted text from the first FixedDocument part.

  + Prefer later subparts of multipart/alternative, which is what RFC 2046 (and
    the earlier RFCs it obsoletes) says to do; previously we used the first
    subpart that we could get text from.

  + Prefer later subparts of multipart/alternative when indexing Outlook
    .msg files too.

  + Fix obscure bug in --mimetype option.  We keep track of the length of the
    longest extension we have a mapping for, but this was being updated using
    the length of the MIME type rather than the length of the extension.
    Theoretically this could have led to us effectively ignoring a --mimetype
    option, but in the real world the MIME type will probably always be longer
    so this just results in us testing long extensions unnecessarily.
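
As an illustration of the --mimetype fix above, here is a minimal Python sketch of an extension-to-MIME-type map that records the length of the longest extension it knows about. The class and method names (`ExtensionMap`, `add`, `lookup`) are hypothetical and this is not omindex's actual code; it only shows why the recorded length has to come from the extension.

```python
class ExtensionMap:
    """Hypothetical extension -> MIME type map (illustration only)."""

    def __init__(self):
        self.mime_by_ext = {}
        self.max_ext_len = 0  # length of the longest extension we have a mapping for

    def add(self, ext, mime_type):
        self.mime_by_ext[ext] = mime_type
        # The fix: update using the extension's length, not the MIME type's.
        self.max_ext_len = max(self.max_ext_len, len(ext))

    def lookup(self, filename):
        _, dot, ext = filename.rpartition(".")
        # No extension, or one longer than any mapping: no need to consult the map.
        if not dot or len(ext) > self.max_ext_len:
            return None
        return self.mime_by_ext.get(ext)
```

Had `add` used `len(mime_type)` instead, mapping `verylongext` to `a/b` would record a maximum of 3, and `lookup` would reject the 11-character extension before consulting the map, which is the "effectively ignoring a --mimetype option" case.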


* Ignore DATEVALUE CGI parameter if START.n, etc is specified on the same
  slot.  We explicitly document not to do this, but if that advice is ignored
  it's more helpful to at least preserve the property that we only have
  one date range per value slot.

* Add flag_ngrams as a preferred new alias for flag_cjk_ngram.  In the next
  release series this feature has been expanded to cover many more languages,
  so the "cjk" in the name (which stands for "Chinese, Japanese and Korean")
  has become inaccurate.

* Fix handling of Outlook .msg containing Unicode.  Codepoints <= U+00FF appear
  to have been handled correctly, but anything higher resulted in individual
  bytes of the UTF-8 encoding being treated as separate characters.

  Fixes https://github.com/xapian/xapian/pull/326, reported by uhuntu.
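
The symptom described for Outlook .msg files can be reproduced in a few lines of Python. This is only an illustration of the difference between decoding a UTF-8 sequence as a whole and treating each byte as its own character; the function names are made up and it does not reflect the actual omindex code path.

```python
def decode_whole(data: bytes) -> str:
    # Decode the multi-byte UTF-8 sequence as a unit.
    return data.decode("utf-8")


def decode_bytewise(data: bytes) -> str:
    # Treat every byte as a separate character (the faulty behaviour).
    return "".join(chr(b) for b in data)


euro = "\u20ac".encode("utf-8")  # EURO SIGN, a codepoint above U+00FF
```

Here `decode_whole(euro)` gives back the single character U+20AC, while `decode_bytewise(euro)` yields three unrelated characters, one per byte of the encoding.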


* Fix compatibility code for old libmagic versions.  The code we were using
  seems like it would never have worked.  Nobody's reported this (it was
  spotted while looking at the code) so we could just require libmagic >= 4.22,
  but it's trivial to actually handle so we've fixed the fallback code.

* Remove lingering traces of IRIX support as it's been dead for many years.

   2022-09-25 14:25:58 by Amitai Schleier | Files touched by this commit (1)
Log message:
Update to 1.4.21. From the changelog:


* Consistently say "macOS" not "Mac OS X", "OS X", etc.


* omindex:

  + Add support for gzip-compressed SVG files (.svgz).

  + Handle <title> in SVG.  Previously only <dc:title> inside <metadata> was
    considered.  If both are present, <title> now takes precedence.


* omegatest: Add skip-for-32-bit-time_t mechanism and use it to conditionally
  enable some testcases which fail on platforms with 32-bit time_t.

build system:

* Update to use AX_CXX_COMPILE_STDCXX, a replacement for
  AX_CXX_COMPILE_STDCXX_11 (which we were using) that also supports newer C++
  standard versions, which will be useful.  For C++11 the only difference seems
  to be that the macro now checks for attribute support.  We use C++11
  attributes, so that seems a good thing.

Updating during the freeze for the bug and portability fixes.

   2022-07-29 17:21:42 by Amitai Schleier | Files touched by this commit (1)
Log message:
Needs pkg-config to find pcre2 during configure.

   2022-07-11 20:27:07 by Amitai Schleier | Files touched by this commit (2)
Log message:
Update to 1.4.20. From the changelog:


* omindex:

  + OpenDocument: Previously we only inserted an implicit space before each
    paragraph.  Now we insert them both before and after each paragraph and
    heading, and before each forced line-break and tab.

  + Add extension mapping for .awt (Abiword templates).

  + Index metadata from XPS files.

  + -G and -C short options were documented in --help but not previously
    actually handled. Reported by David Bremner.

  + Show --max-size required argument in --help output.

  + Remove lingering handling for database backends without slot bounds since
    all backends have been required to support these since 1.4.11.

* scriptindex:

  + Process an incomplete final line from a dump file.  Previously if the final
    line lacked a newline scriptindex would quietly ignore it (unless it was
    the only line).

  + The `unique` action now takes an optional `missing` parameter to specify
    what to do if a record doesn't trigger the unique action or triggers it
    with an empty value.  The default is now to issue a warning and create a
    new document (the same as before, except that there was only previously a
    warning for the empty value case). In Omega 1.5.0 the default will change
    to an error as that seems a better default, but is less compatible with
    potential existing use.

  + Explicitly allow multiple blank lines in input files.  Previously such
    extra blank lines were treated as empty records and in many cases these
    got quietly skipped, but e.g. with the new UNIQUE checks this could result
    in a warning or error.

  + If we hit an error while parsing the index script we used to exit right
    away, but now we finish parsing the index script since it's more helpful to
    report all the errors in an index script rather than the user having to
    fix them one by one.  This requires us to sensibly recover after each index
    script parse error - if you find a case where this recovery triggers
    further bogus errors please report it and we'll try to improve the

  + In four cases while handling input data (two cases of bad hex data fed
    to `hextobin`, an input data line without a `=`, and `load` failing to
    load the specified file) we'd emit a diagnostic that was labelled as an
    "error" but really it was handled as a warning as we kept reading input
    and the "error" didn't affect the exit status.  It doesn't really make
    sense to continue in any of these cases so we now exit with non-zero status
    right away.

  + A parameter in the index script which should be an integer but isn't, or
    should be positive but isn't, now gives an error rather than a warning since
    an error seems more helpful.

  + All diagnostics issued while parsing the index script now include column

  + Avoid forcibly flushing the output stream after every message.
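
Several of the scriptindex input-handling changes above (a final line without a newline, runs of blank lines, and a data line without `=`) can be sketched together. The following Python is a simplified model under stated assumptions, not scriptindex's real parser: `parse_records` is a hypothetical name, records are assumed to be `name=value` lines separated by blank lines, and continuation lines are not handled.

```python
def parse_records(data: str):
    """Split dump-file text into records (simplified, hypothetical model)."""
    records = []
    current = {}
    # split("\n") keeps a final line that has no trailing newline.
    for line in data.split("\n"):
        if not line:
            # One or more blank lines end the current record; extra blank
            # lines do not produce empty records.
            if current:
                records.append(current)
                current = {}
            continue
        name, sep, value = line.partition("=")
        if not sep:
            # Treated as fatal here, mirroring the "exit right away" change.
            raise ValueError(f"input line without '=': {line!r}")
        current.setdefault(name, []).append(value)
    if current:
        records.append(current)
    return records
```

With this model, `"id=1\n\n\n\nid=2"` yields two records even though the last line lacks a newline and the separator is several blank lines long.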


* Improve test coverage for scriptindex.


* Require PCRE2 instead of PCRE. The original PCRE is now EOL and unmaintained
  (last release was June 2021).  In omega it's potentially used to process
  input from the internet, so security is a real concern; hence we're switching
  to PCRE2.

   2022-06-28 13:38:00 by Thomas Klausner | Files touched by this commit (3952)
Log message:
*: recursive bump for perl 5.36

   2022-01-02 10:32:06 by Amitai Schleier | Files touched by this commit (2)
Log message:
Update to 1.4.19. From the changelog:


* configure: Add missing AC_ARG_VAR for all programs so that they are
  documented in --help output, and so that autoconf knows they are "precious"
  and preserves them if configure is rerun even when they're specified via an
  environment variable.

* Add usage examples for $jsonobject.

* Fix path to omega in quickstart document.  Fixes #813, reported by Jim Lynch.

* Update for the IRC channel move from freenode to libera.chat.


* Fix handling of UTF-16 BOMs in XML and HTML - we had the sense of the
  endianness indicated by the BOM the wrong way round.

* Avoid making an extra temporary copy of HTML/XML data which has a UTF-16 BOM.

* We now ignore an end of line immediately after a PHP close tag to match what
  PHP does.

* omindex:

  + Fix handling of formatted xlsx dates in certain cases.

* scriptindex:

  + Add new scriptindex whitespace removal actions `ltrim`, `rtrim`, `squash`,
    and `trim`.

  + Improve `truncate` action - if a word ends exactly on the requested length
    we now leave it in place rather than removing it.

  + Report the location of previous `unique` action in the error given when
    `unique` is used more than once.
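
For readers unfamiliar with the four whitespace actions listed above, here is a rough Python model of what each one does to a field value. It assumes that `squash` collapses each run of whitespace to a single space and drops leading and trailing whitespace; it also uses Python's notion of whitespace, which may differ from the one scriptindex applies. The functions are illustrative stand-ins, not scriptindex code.

```python
def ltrim(value: str) -> str:
    # Remove leading whitespace only.
    return value.lstrip()


def rtrim(value: str) -> str:
    # Remove trailing whitespace only.
    return value.rstrip()


def trim(value: str) -> str:
    # Remove both leading and trailing whitespace.
    return value.strip()


def squash(value: str) -> str:
    # Assumed behaviour: collapse whitespace runs to one space and
    # drop leading/trailing whitespace.
    return " ".join(value.split())
```

For the value `"  a \t b\n"`, `trim` keeps the inner `" \t "` run intact, while `squash` reduces the whole value to `"a b"`.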


* Clamp START and END with packed timestamps.  The 4-byte unsigned packed
  time_t format can't represent dates before 1970 or after Sun 07 Feb 2106
  06:28:15 UTC so clamp dates before or after these - previously they would
  wrap around.

* The JSON produced by $jsonobject no longer contains newlines, which makes it
  usable as a single line serialisation format without post-processing.

* Add $base64 OmegaScript command.

* omega: Add flag_no_positions to wrap new


* Fix topterms template to not trigger early matching.  We were checking $msize
  before including the `query` template, but doing so would trigger the query
  to be run, which means that settings early in the `query` template which
  should affect the result (such as $setmap{prefix,...}) were being ignored
  when the `topterms` template was used.  Partly addresses #815, reported by

* Add field support to opensearch and xml templates.  These templates now also
  search title, topic and filename by default and support `title:`, `author:`
  and `topic:` in the query string (both as the `query` template already
  does). Fixes remaining issue in #815, reported by Gennadiy.
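
The clamping of START and END for packed timestamps, described earlier in this changelog entry, can be sketched as follows. The helper names are hypothetical and the big-endian byte order is an assumption made for the illustration; the point is only that values outside the unsigned 32-bit range are pinned to its ends instead of wrapping around.

```python
import struct
from datetime import datetime, timedelta, timezone

MAX_PACKED = 0xFFFFFFFF  # largest value a 4-byte unsigned time_t can hold


def clamp_timestamp(t: int) -> int:
    # Pin out-of-range values to the nearest representable bound.
    return max(0, min(t, MAX_PACKED))


def pack_timestamp(t: int) -> bytes:
    # ">I" = big-endian unsigned 32-bit (byte order assumed for this sketch).
    return struct.pack(">I", clamp_timestamp(t))


def as_utc(t: int) -> datetime:
    # Convert seconds since the Unix epoch to an aware UTC datetime.
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=t)
```

`as_utc(MAX_PACKED)` is 2106-02-07 06:28:15 UTC, the upper limit quoted above, and `clamp_timestamp(-1)` is 0, so a date before 1970 maps to the epoch.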


* Expand omegatest.  All scriptindex actions now have test coverage.

build system:

* Replace uses of obsolete autoconf macros, fixing warnings if configure is
  regenerated with a recent release of autoconf.


* Don't automatically use _FORTIFY_SOURCE on mingw-w64.  Recent mingw-w64
  versions require -lssp to be linked when _FORTIFY_SOURCE is enabled, so just
  skip the automatic enabling.  Users who want to enable it can specify it

  Fixes #808, reported by xpbxf4.

* Automatically enable GCC warnings -Wduplicated-cond and -Wduplicated-branches
  if using a GCC version new enough to support them.  The usefulness of
  -Wduplicated-cond was highlighted by dcb in #816.

* Fix GCC -Wshadow warning.

* Use clock_gettime() and nanosleep() under modern mingw as these allow higher
  precision than what we previously used.

   2021-10-26 13:23:42 by Nia Alarie | Files touched by this commit (1161)
Log message:
textproc: Replace RMD160 checksums with BLAKE2s checksums

All checksums have been double-checked against existing RMD160 and
SHA512 hashes

Unfetchable distfiles (fetched conditionally?):
./textproc/convertlit/distinfo clit18src.zip

   2021-10-07 17:02:49 by Nia Alarie | Files touched by this commit (1162)
Log message:
textproc: Remove SHA1 hashes for distfiles