./textproc/xapian-omega, Search engine application for websites using Xapian

Branch: CURRENT, Version: 1.4.9, Package name: xapian-omega-1.4.9, Maintainer: schmonz

Omega operates on a set of databases. Each database is created and
updated separately using either omindex or scriptindex. You can
search these databases (or any other Xapian database with suitable
contents) via a web front-end provided by omega, a CGI application.
A search can also be done over more than one database at once.

Required to run:
[lang/perl5] [devel/pcre] [textproc/xapian]

Required to build:

Master sites:

SHA1: 99e64ae87a09eff57f143dc809913fa62f4bb44d
RMD160: 29436b8581bdb81aa251b074b058f46208a49463
Filesize: 509.863 KB

CVS history:

   2018-11-05 06:42:59 by Amitai Schleier | Files touched by this commit (1) | Package updated
Log message:
Update to 1.4.9. From the changelog:


* omindex:

  + Try harder to avoid opening a file being indexed more than once by
    reusing the file descriptor in more cases.

  + Hint to the OS not to cache output from external filters which require
    using a temporary file.

* scriptindex:

  + If the LOAD action successfully opens a file but hits a read error the
    error message now reports the file name correctly.  Previously it would
    report the partial file contents read so far instead of the file name.


* We no longer call posix_fadvise() with POSIX_FADV_NOREUSE under Linux,
  since it's still not implemented there.  We also now only call
  posix_fadvise() with POSIX_FADV_DONTNEED right before we close the file
  descriptor under Linux.
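
The POSIX_FADV_DONTNEED change above can be illustrated with a minimal Python
sketch. This is not omindex's code: read_then_drop_cache is a hypothetical
helper, and it only shows the general pattern of issuing the advice once,
immediately before the file descriptor is closed.

```python
import os
import tempfile


def read_then_drop_cache(path):
    """Read a whole file, then hint that its cached pages are no longer needed."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:
                break
            chunks.append(chunk)
        # Issue the advice once, right before closing the descriptor.
        # os.posix_fadvise only exists on some Unix platforms, and the call
        # is purely a hint, so a failure here is ignored.
        if hasattr(os, "posix_fadvise"):
            try:
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
            except OSError:
                pass
        return b"".join(chunks)
    finally:
        os.close(fd)
```

A length of 0 in the posix_fadvise call means "to the end of the file".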
   2018-10-28 04:44:06 by Amitai Schleier | Files touched by this commit (1) | Package updated
Log message:
Update to 1.4.8. From the changelog:


* omindex:

  + Improve date handling in .eml files.  We now handle a "Date:" header
    without the day of the week, which is allowed by RFC822 and RFC2822
    (though seems rare in practice).  If the date can't be parsed, we now
    just omit the date information rather than failing to process the file.

  + Add support for indexing Apple iWork documents (Keynote (.key), Numbers
    (.numbers) and Pages (.pages)) using libetonyek.  Currently only the file
    variants are handled since omindex doesn't currently support indexing a
    directory as a document.

  + Index Visio files using vsd2xhtml.

  + Extend --filter to support filters which produce SVG as output.

  + Handle SVG embedded in XML with svg: namespace prefix.

  + Add --read-filters option to read a list of filters from a file, each line
    of which is a rule as passed to --filter.  Based on a patch from Gaurav

  + Add new --mime-type-match option which allows specifying a MIME
    Content-Type for a given shell filename pattern (with the special
    Content-Type values "ignore" and "skip" supported, as for --mime-type).

  + Adjust --mime-type to allow ':' in the extension.  A valid MIME
    Content-Type can't contain a colon, so if the argument to --mime-type
    contains more than one colon it makes more sense to split at the *last*
    colon (we used to split at the first), as an extension could conceivably
    contain a colon.  Mostly this change is for consistency with the new
    --mime-type-match option, where the leafname pattern could reasonably
    contain a colon.

  + Remove failed entries for ignored files.  If a file is mapped to
    pseudo-mimetype "ignore" then remove any existing failure record for it,
    so that we don't potentially end up with a lot of cruft failure records
    for files we are no longer trying to index.

  + If a file fails to index due to failing to allocate enough memory we now
    try to flag it as failed to index so it will be skipped by default on
    future runs.  This should help to avoid indexing getting stuck on
    problematic files.

  + Add a "pages" field with the number of pages in the document where we
    know how to determine this (currently only for PDF files for which pdfinfo
    reports this information).

  + Handle initially empty database exactly the same way as when --overwrite
    is specified.  This probably has no user-visible consequences, but it's
    cleaner for the handling to be exactly the same.
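
The --mime-type change in the list above (splitting at the last colon rather
than the first) can be sketched in Python. The function name
split_mime_type_arg is hypothetical and this is not omindex's implementation;
it only shows why splitting at the last colon keeps a colon-containing
extension intact.

```python
def split_mime_type_arg(arg):
    """Split an 'EXT:TYPE' argument at the LAST colon.

    A valid MIME Content-Type can't contain a colon, so everything before
    the last colon is treated as the extension.
    """
    ext, sep, mime_type = arg.rpartition(":")
    if not sep:
        raise ValueError("expected EXT:TYPE, got %r" % arg)
    return ext, mime_type


def split_at_first_colon(arg):
    """The older behaviour, for comparison: split at the FIRST colon."""
    ext, _, mime_type = arg.partition(":")
    return ext, mime_type
```

With the argument "a:b:text/plain", the first function yields the extension
"a:b" and the type "text/plain", while the older behaviour would yield the
extension "a" and the invalid type "b:text/plain".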

* scriptindex:

  + Improve scriptindex diagnostic messages.  All diagnostics are now labelled
    as "error", "warning" or "note" as appropriate, and we now consistently
    report "FILE:LINE:" (and also "COLUMN:" in most cases) to make it clearer
    where the problem lies.

  + Add new "split" action which splits the text on a specified delimiter and
    executes the following actions for each piece.  Based on a patch by Gaurav

  + Missing whitespace after the closing " on an action argument is now
    flagged as an error.  Previously scriptindex would attempt to parse
    the following characters as the next action.

  + Support C-like escapes for quoted parameter values.  Notably this means it
    is now possible to include `"` in quoted parameter values.
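
The control flow of the "split" action in the list above can be sketched in
Python. This is an illustration only: run_split is a hypothetical name, the
"actions" here are plain Python callables rather than real scriptindex
actions, and how scriptindex treats empty pieces is not covered.

```python
def run_split(text, delimiter, actions):
    """Split text on delimiter and run every following action on each piece."""
    results = []
    for piece in text.split(delimiter):
        value = piece
        # Each action receives the output of the previous one.
        for action in actions:
            value = action(value)
        results.append(value)
    return results
```

For example, run_split("red, green", ",", [str.strip, str.upper]) applies
both callables to each of the two pieces in turn.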


  + Value-based date range filters can now be specified via CGI parameters
    START.N, END.N and/or SPAN.N where N is a value slot number, allowing
    multiple concurrent filters on different slots to be specified.

  + Support YYYY and YYYYMM limits in term-based date ranges.  Previously
    value-based date ranges supported these as limits, but term-based date
    ranges gave an error.

  + Add stem_strategy option and deprecate existing stem_all option in favour
    of this new more versatile option.

  + Support "natural" $sort option via new flag "#" which sorts embedded
    natural numbers in numerical order.

  + Support numeric $sort option via new flag "n", similar to GNU sort -n.

  + Rewrite field parsing to be more efficient, and store fields in an
    unordered_map for faster lookup.
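
The idea behind the "natural" $sort flag above can be sketched in Python. This
is a generic natural-sort key, not omega's code, and omega's exact ordering
rules (for example for leading zeros) may differ; natural_key is a
hypothetical name.

```python
import re


def natural_key(s):
    """Build a sort key in which runs of digits compare as numbers.

    re.split with a capturing group returns alternating text and digit
    runs, always starting with (possibly empty) text, so odd positions
    hold the digit runs.
    """
    parts = re.split(r"(\d+)", s)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


names = ["file10", "file2", "file1"]
natural_order = sorted(names, key=natural_key)
plain_order = sorted(names)
```

A plain string sort places "file10" before "file2", while the natural key
places it after.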
   2018-08-26 15:26:12 by Amitai Schleier | Files touched by this commit (2) | Package updated
Log message:
Update to 1.4.7. From the changelog:


* New OmegaScript $unique command.  The existing $uniq only removes adjacent
  entries (like the Unix uniq command) so to fully remove duplicates you need a
  sorted input.  Sometimes it is desirable to remove duplicates from an
  unsorted list without changing the order of the entries which are left, so
  add $unique to do that.  If the list is sorted already, then $uniq is more

* Fix $map to cleanly reject a single argument.


* templates/query: Merge multiple entries in the term frequency information,
  which came from searching several prefixes by default.  Reported by Alistair
  Buxton on #xapian-discuss.

* When multiple words with the same stem are in the query string we now fully
  eliminate duplicates when showing term frequency information.
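
The difference between $uniq and the new $unique described in this changelog
can be sketched in Python. These functions only model the behaviour described
above on Python lists; they are not OmegaScript and not omega's
implementation.

```python
from itertools import groupby


def uniq(items):
    """Model of $uniq: drop only ADJACENT duplicates, like Unix uniq."""
    return [key for key, _group in groupby(items)]


def unique(items):
    """Model of $unique: drop all duplicates, keeping first-seen order."""
    return list(dict.fromkeys(items))
```

On the unsorted list ["a", "a", "b", "a"], uniq leaves the trailing "a" in
place because it is not adjacent to the earlier ones, while unique removes it
without reordering the remaining entries.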
   2018-08-22 11:48:07 by Thomas Klausner | Files touched by this commit (3558)
Log message:
Recursive bump for perl5-5.28.0
   2018-07-06 18:23:55 by Amitai Schleier | Files touched by this commit (1) | Package updated
Log message:
Update to 1.4.6. From the changelog:


* Fix generate_sample() (used by OmegaScript $truncate and omindex) to return
  an empty sample instead of throwing an exception when the requested sample
  size is less than the size of the truncation indicator string.  Patch from
  Addy.  Fixes https://trac.xapian.org/ticket/754 reported by Gaurav Arora.


* Check for the HTML5 doctype or legacy doctype declaration and use default
  charset UTF-8 if either is present.  Previously we always used ISO-8859-1,
  which is correct for older HTML versions, but not for HTML5.

* omindex:

  + When running commands without going through the shell, emulate shell exit
    codes 127 (for command not found) and 126 (for other cases where we fail to
    run the command).  This means the "missing filter" handling should now work
    properly for such commands.  Noted by Gaurav Arora.

  + Index POD files despite minor formatting errors.  We now pass
    --errors=stderr to pod2text so that minor formatting errors don't prevent
    us from indexing a file.  (It may seem that --errors=none is a better
    option, but for podlators < 4.11 that results in an ERRATA section in the
    generated text version which we then end up indexing; 4.11 fixed that but
    we can't assume that's in use).  Reported by Gaurav Arora.
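
The shell exit code emulation described in the list above can be sketched in
Python. This is not omindex's code; run_without_shell is a hypothetical
helper showing how a caller that bypasses the shell can still report 127 for
"command not found" and 126 for other failures to run the command.

```python
import subprocess
import sys


def run_without_shell(argv):
    """Run argv directly (no shell) and return a shell-style exit code."""
    try:
        completed = subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        return completed.returncode
    except FileNotFoundError:
        # What a shell reports when the command can't be found.
        return 127
    except OSError:
        # The command was found but couldn't be executed.
        return 126


if __name__ == "__main__":
    print(run_without_shell([sys.executable, "-c", "raise SystemExit(3)"]))
```

A caller can then treat 127 as "filter not installed" regardless of whether
the command was run through the shell.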

* omindex:

  + Check file size before calling libmagic to get the mime type, since
    reading the file size is a much cheaper check and we can skip the
    libmagic test if the file is empty or larger than the specified
    maximum size.  Patch from caiyulun.

* scriptindex:

  + Avoid some unnecessary copying of Action objects by making use of C++11

  + Consistently send errors to stderr - some were sent to stdout.
    Patch from Gaurav Arora.

  + Add new "hextobin" action.  Based on a patch from Gaurav Arora.

  + Warn about non-integer arg to hash.

  + Fix hash action without an argument, which was failing with an assertion.
    Based on a patch by Gaurav Arora: https://github.com/xapian/xapian/pull/189

  + Reject 'hash' with argument < 6.  The hashing truncates and then adds a
    6 character hash of the removed part, so can't produce a result shorter
    than 6 characters.  Patch from Gaurav Arora.

  + Look for alphanumerics when parsing index actions.  None of the current
    index actions contain digits, but we give more helpful error messages this

  + Deprecate allowing spaces around = in scripts.  This was never documented
    as supported, and leads to a missing argument quietly swallowing the next
    action rather than using an empty value or giving an error.  Reported by
    Gaurav Arora in https://github.com/xapian/xapian/pull/182

  + In boolean and unique actions, add a colon between prefix and term when
    the term starts with a colon.  This means the mapping is reversible, and
    matches what omega actually does in this case when it tries to reverse the
    mapping.  Thanks to Andy Chilton for pointing out this corner case.

  + Add parsedate and valuepacked actions.  Together these assist adding date
    values for sorting and date range filtering.  Based on a patch from Gaurav

  + Use DB_RETRY_LOCK to wait if the database is already in use rather than
    sleeping for a second and retrying.  On most platforms this means we make a
    blocking request for the lock, and even on platforms where that's not
    supported, we now sleep and retry inside libxapian, and without having to
    throw and catch an exception each time.
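
The reversibility point about colons in the list above can be sketched in
Python. This is a sketch under assumptions, not the scriptindex or omega
source: add_prefix and strip_prefix are hypothetical names, and the rule that
a separating colon is also used before a term starting with a capital letter
(for multi-character prefixes) is an assumption not stated in this changelog.

```python
def add_prefix(prefix, term):
    """Join prefix and term, adding ':' where the join would be ambiguous."""
    first = term[:1]
    # Assumption: a separator is needed after a multi-character prefix when
    # the term starts with ':' or an ASCII capital letter.
    needs_colon = len(prefix) > 1 and (first == ":" or "A" <= first <= "Z")
    return prefix + (":" if needs_colon else "") + term


def strip_prefix(prefix, full_term):
    """Reverse add_prefix: drop the prefix and one separating ':' if present."""
    rest = full_term[len(prefix):]
    if len(prefix) > 1 and rest.startswith(":"):
        return rest[1:]
    return rest
```

Without the extra colon, prefix "XA" plus term ":foo" would be stored as
"XA:foo", which reverses to "foo" rather than ":foo".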

* scriptindex:

  + Reject index scripts with multiple "unique" actions.  We don't handle this
    case sensibly, and it doesn't seem like it really has a use, so better to
    give an error for people who do this inadvertently.


* $freq: Speed up some cases by avoiding throwing and catching an exception
  when we know the MSet has no term frequency information.

* $sort: New OmegaScript command which does a string sort on an OmegaScript
  list, with u (unique) and r (reverse) options.

* $cond: New OmegaScript multi-way conditional.  Inspired by LISP's
  COND, this provides a neater way to write a cascade of $if checks.

* $switch: New OmegaScript multi-way conditional which provides an even neater
  way to write a cascade of $if{$eq{X,VALUE1},$if{$eq{X,VALUE2},...}}.

* $subdb and $subid: New commands which report the subdatabase name and the
  docid in that subdatabase.

* $termprefix and $unprefix: New OmegaScript commands which expose the existing
  code inside omega for splitting up a term.

* Use str() to convert time_t to string, which is simpler code and faster than
  using snprintf().

* New $seterror command to set the error message.  Implemented by Gaurav Arora.

* Make $highlight more efficient.  Patch from Vivek Pal.


* query: Use $prettyurl for the URL shown at the end of each match (previously
  we only used it on the URL shown as a fallback when the document has no
  title).  Split off from changes by Vivek Pal in
   2017-09-06 11:03:07 by Thomas Klausner | Files touched by this commit (86)
Log message:
Follow some redirects.
   2017-07-10 19:43:25 by Amitai Schleier | Files touched by this commit (1)
Log message:
Use xapian/Makefile.common.
   2017-07-10 00:31:23 by Amitai Schleier | Files touched by this commit (5)
Log message:
Normalize patch filenames. No functional change.