./textproc/xapian-omega, Search engine application for websites using Xapian


Branch: CURRENT, Version: 1.4.7, Package name: xapian-omega-1.4.7, Maintainer: schmonz

Omega operates on a set of databases. Each database is created and
updated separately using either omindex or scriptindex. You can
search these databases (or any other Xapian database with suitable
contents) via a web front-end provided by omega, a CGI application.
A search can also be done over more than one database at once.

Required to run:
[lang/perl5] [devel/pcre] [textproc/xapian]

Required to build:

Master sites:

SHA1: 12da93cbd19657922756b845bf523adc8ae4e923
RMD160: 36b65f362365949f37694ba744ae9dc0a833fc8f
Filesize: 498.676 KB

CVS history:

   2018-08-26 15:26:12 by Amitai Schleier | Files touched by this commit (2) | Package updated
Log message:
Update to 1.4.7. From the changelog:


* New OmegaScript $unique command.  The existing $uniq only removes adjacent
  entries (like the Unix uniq command) so to fully remove duplicates you need a
  sorted input.  Sometimes it is desirable to remove duplicates from an
  unsorted list without changing the order of the entries which are left, so
  add $unique to do that.  If the list is sorted already, then $uniq is more

* Fix $map to cleanly reject a single argument.


* templates/query: Merge multiple entries in the term frequency information,
  which came from searching several prefixes by default.  Reported by Alistair
  Buxton on #xapian-discuss.

* When multiple words with the same stem are in the query string we now fully
  eliminate duplicates when showing term frequency information.
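
The difference between $uniq and $unique described above can be sketched in Python. This is only an illustration of the two behaviours, not Omega's implementation; the function names and the sample list are made up for the example.

```python
from itertools import groupby


def uniq_adjacent(items):
    """Drop only runs of adjacent duplicates, like the Unix uniq command."""
    return [key for key, _ in groupby(items)]


def unique_keep_order(items):
    """Drop every repeated entry, keeping the first occurrence in place."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Hypothetical unsorted list with both adjacent and non-adjacent repeats.
terms = ["cat", "dog", "cat", "cat", "bird", "dog"]
adjacent_only = uniq_adjacent(terms)
fully_unique = unique_keep_order(terms)
print(adjacent_only)
print(fully_unique)
```

On an unsorted list the first function leaves non-adjacent repeats in place, which is why a separate order-preserving command is useful.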
   2018-08-22 11:48:07 by Thomas Klausner | Files touched by this commit (3558)
Log message:
Recursive bump for perl5-5.28.0
   2018-07-06 18:23:55 by Amitai Schleier | Files touched by this commit (1) | Package updated
Log message:
Update to 1.4.6. From the changelog:


* Fix generate_sample() (used by OmegaScript $truncate and omindex) to return
  an empty sample instead of throwing an exception when the requested sample
  size is less than the size of the truncation indicator string.  Patch from
  Addy.  Fixes https://trac.xapian.org/ticket/754 reported by Gaurav Arora.


* Check for the HTML5 doctype or legacy doctype declaration and use default
  charset UTF-8 if either is present.  Previously we always used ISO-8859-1,
  which is correct for older HTML versions, but not for HTML5.

* omindex:

  + When running commands without going through the shell, emulate shell exit
    codes 127 (for command not found) and 126 (for other cases where we fail to
    run the command).  This means the "missing filter" handling should now work
    properly for such commands.  Noted by Gaurav Arora.

  + Index POD files despite minor formatting errors.  We now pass
    --errors=stderr to pod2text so that minor formatting errors don't prevent
    us from indexing a file.  (It may seem that --errors=none is a better
    option, but for podlators < 4.11 that results in an ERRATA section in the
    generated text version which we then end up indexing; 4.11 fixed that but
    we can't assume that's in use).  Reported by Gaurav Arora.
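
The shell exit code emulation mentioned in the omindex notes above can be illustrated roughly as follows. This is a Python sketch of the general idea only (omindex itself is C++); the helper name and the placeholder command name are assumptions made for the example.

```python
import subprocess
import sys


def run_without_shell(argv):
    """Run argv directly (no shell) and return a shell-style exit status."""
    try:
        completed = subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        return completed.returncode
    except FileNotFoundError:
        # A shell would report 127 for "command not found".
        return 127
    except OSError:
        # Any other failure to start the command maps to 126.
        return 126


# Hypothetical command name that is assumed not to exist on the system.
missing = run_without_shell(["no-such-filter-command-for-this-example"])
ok = run_without_shell([sys.executable, "-c", "pass"])
print(missing, ok)
```

A caller that already treats 127 as "filter not installed" then behaves the same whether or not a shell was involved.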

* omindex:

  + Check file size before calling libmagic to get the mime type, since
    reading the file size is a much cheaper check and we can skip the
    libmagic test if the file is empty or larger than the specified
    maximum size.  Patch from caiyulun.

* scriptindex:

  + Avoid some unnecessary copying of Action objects by making use of C++11

  + Consistently send errors to stderr - some were sent to stdout.
    Patch from Gaurav Arora.

  + Add new "hextobin" action.  Based on a patch from Gaurav Arora.

  + Warn about non-integer arg to hash.

  + Fix hash action without an argument, which was failing with an assertion.
    Based on a patch by Gaurav Arora: https://github.com/xapian/xapian/pull/189

  + Reject 'hash' with argument < 6.  The hashing truncates and then adds a
    6 character hash of the removed part, so can't produce a result shorter
    than 6 characters.  Patch from Gaurav Arora.

  + Look for alphanumerics when parsing index actions.  None of the current
    index actions contain digits, but we give more helpful error messages this

  + Deprecate allowing spaces around = in scripts.  This was never documented
    as supported, and leads to a missing argument quietly swallowing the next
    action rather than using an empty value or giving an error.  Reported by
    Gaurav Arora in https://github.com/xapian/xapian/pull/182

  + In boolean and unique actions, add a colon between prefix and term when
    the term starts with a colon.  This means the mapping is reversible, and
    matches what omega actually does in this case when it tries to reverse the
    mapping.  Thanks to Andy Chilton for pointing out this corner case.

  + Add parsedate and valuepacked actions.  Together these assist adding date
    values for sorting and date range filtering.  Based on a patch from Gaurav

  + Use DB_RETRY_LOCK to wait if the database is already in use rather than
    sleeping for a second and retrying.  On most platforms this means we make a
    blocking request for the lock, and even on platforms where that's not
    supported, we now sleep and retry inside libxapian, and without having to
    throw and catch an exception each time.
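
To see why the 'hash' action above cannot accept an argument below 6, here is a small Python sketch of truncate-then-append-hash. The hash function is a stand-in chosen for the example (it is not the one scriptindex uses), and the function names are hypothetical.

```python
import hashlib

HASH_LEN = 6  # length of the appended hash, per the changelog entry


def hash6(text):
    """Placeholder 6 character hash; NOT the function scriptindex uses."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:HASH_LEN]


def hash_truncate(value, max_len):
    """Limit value to max_len characters, hashing the part that is cut off."""
    if max_len < HASH_LEN:
        # The hash alone needs 6 characters, so a shorter result is impossible.
        raise ValueError("maximum length must be at least %d" % HASH_LEN)
    if len(value) <= max_len:
        return value
    keep = max_len - HASH_LEN
    return value[:keep] + hash6(value[keep:])


long_url = "http://example.org/" + "a" * 100
shortened = hash_truncate(long_url, 20)
print(shortened, len(shortened))
```

With a maximum of exactly 6 nothing of the original value is kept and the result is just the hash.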

* scriptindex:

  + Reject index scripts with multiple "unique" actions.  We don't handle this
    case sensibly, and it doesn't seem like it really has a use, so better to
    give an error for people who do this inadvertently.


* $freq: Speed up some cases by avoiding throwing and catching an exception
  when we know the MSet has no term frequency information.

* $sort: New OmegaScript command which does a string sort on an OmegaScript
  list, with u (unique) and r (reverse) options.

* $cond: New OmegaScript multi-way conditional.  Inspired by LISP's
  COND, this provides a neater way to write a cascade of $if checks.

* $switch: New OmegaScript multi-way conditional which provides an even neater
  way to write a cascade of $if{$eq{X,VALUE1},$if{$eq{X,VALUE2},...}}.

* $subdb and $subid: New commands which report the subdatabase name and the
  docid in that subdatabase.

* $termprefix and $unprefix: New OmegaScript commands which expose the existing
  code inside omega for splitting up a term.

* Use str() to convert time_t to string, which is simpler code and faster than
  using snprintf().

* New $seterror command to set the error message.  Implemented by Gaurav Arora.

* Make $highlight more efficient.  Patch from Vivek Pal.
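
As an illustration of the $sort entry above, the following Python sketch models a string sort with unique and reverse options. It represents the list as a Python list and compares strings by code point; both choices are assumptions made for the example and say nothing about how Omega does it internally.

```python
def sort_list(items, unique=False, reverse=False):
    """String-sort items, optionally dropping duplicates and reversing."""
    ordered = sorted(set(items)) if unique else sorted(items)
    if reverse:
        ordered.reverse()
    return ordered


# Hypothetical input; the values are strings, so "10" sorts before "2".
values = ["10", "9", "2", "10"]
plain = sort_list(values)
unique_reversed = sort_list(values, unique=True, reverse=True)
print(plain)
print(unique_reversed)
```

Because the comparison is on strings, numeric-looking entries do not come out in numeric order.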


* query: Use $prettyurl for the URL shown at the end of each match (previously
  we only used it on the URL shown as a fallback when the document has no
  title).  Split off from changes by Vivek Pal in
   2017-09-06 11:03:07 by Thomas Klausner | Files touched by this commit (86)
Log message:
Follow some redirects.
   2017-07-10 19:43:25 by Amitai Schleier | Files touched by this commit (1)
Log message:
Use xapian/Makefile.common.
   2017-07-10 00:31:23 by Amitai Schleier | Files touched by this commit (5)
Log message:
Normalize patch filenames. No functional change.
   2017-07-10 00:27:47 by Amitai Schleier | Files touched by this commit (4) | Package updated
Log message:
Update to 1.4.4. From the changelog:


* omindex:

  + 1.4.3 added a new --sample option, but contrary to the documentation
    the default behaviour was to take the sample from the meta description
    (which was the hard-wired behaviour in 1.4.2 and earlier).  The default
    has now been changed to take the sample from the body.

  + Index .shtm, .xhtml and .xhtm as HTML by default - .shtm is another
    extension used for server-parsed HTML (in addition to the more common
    .shtml), and .xhtm and .xhtml are XHTML.

  + Fix fallback lookup for extension containing upper case.  User mappings
    worked, but built-in extension to MIME type mappings were effectively being
    ignored (because the result of the function call was not being checked).
    Bug introduced in 1.3.4.

  + Fix term-based date ranges, broken by changes in 1.4.2.  Found and
    diagnosed by Gaurav Arora.

  + Handle date range with start after end better - with term-based ranges,
    this used to generate a bogus filter, but now just generates Dlatest.

  + Use Y-term when range starts/ends at year start/end.  Previously we used 12
    M-terms for these cases.

  + Use full leap-year check when constructing term-based date ranges -
    previous code was good until 2100, but even then it would only result
    in an extra term being included for a non-existent February 29th in
    rare cases.

  + Add support for indexing vCard files if Perl and its Text::vCard module
    are available.

  + Recognise application/x-rpm as alternative type since libmagic reports this
    rather than application/x-redhat-package-manager.

  + Use official MIME type application/vnd.debian.binary-package for Debian
    packages.  We used to map .deb and .udeb to application/x-debian-package,
    but in 2014 (after we added that support for .deb) an official type was
    registered with IANA.  We now map extensions .deb and .udeb to the official
    type, but the unofficial type is still recognised (older versions of
    libmagic probably report it, and users may be mapping to it).

  + Handle PHP as MIME type text/x-php.  The main difference this makes is that
    PHP files which don't have extension '.php' (e.g. .phtml, .phps, .php5,
    .ph4, etc) get identified by libmagic as text/x-php and will now be indexed.
    It also means that the user can now more easily configure different filters
    for HTML and PHP.

  + Don't use meta description as sample by default.  Now we have dynamic
    snippets (via $snippet), the body text is a better default.  Also generated
    HTML sometimes has unhelpful content in the meta description.  To get the
    previous behaviour, use the new omindex command line option:


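The full leap-year check mentioned in the omindex notes above is the standard Gregorian rule. The sketch below compares it with a divisible-by-4 test; that the previous code used such a test is an assumption inferred from the remark that it "was good until 2100".

```python
def is_leap_year(year):
    """Full Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_leap_year_simplified(year):
    """Assumed shape of a simplified check that only holds for 1901 to 2099."""
    return year % 4 == 0


for year in (2000, 2024, 2100):
    print(year, is_leap_year(year), is_leap_year_simplified(year))
```

The two checks first disagree at 2100, which matches the changelog's note that the difference only shows up as an extra term for a non-existent February 29th.
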
* New OmegaScript command $cgiparams which returns a list of the parameter

* Handle tab in a CGI parameter name in the same way as space.  Mostly this is
  a way to avoid having tabs in CGI parameter names - they aren't useful, and
  if names could contain tabs we couldn't put CGI parameter names in a list.


* query: Fix highlighting of matching terms.  We were using both $snippet and
  $highlight, which results in double highlighting and HTML escaping, most
  noticeable by literal <strong> and </strong> appearing around matching terms
  in the rendered HTML snippet.  Reported by Mark Thomas on xapian-discuss.
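
The double highlighting symptom described above can be reproduced with a toy highlighter. This Python sketch is a simplified stand-in for applying two escaping-and-highlighting steps in a row; it is not how $snippet or $highlight are implemented, and the function name and sample text are made up.

```python
import html


def highlight(text, term):
    """Escape text for HTML, then wrap each occurrence of term in <strong>."""
    escaped_term = html.escape(term)
    return html.escape(text).replace(
        escaped_term, "<strong>" + escaped_term + "</strong>"
    )


once = highlight("fast search engine", "search")
# Feeding already-highlighted markup through the same step escapes the
# first set of tags, so they would show up as literal text in a browser.
twice = highlight(once, "search")
print(once)
print(twice)
```

Applying only one of the two steps avoids the escaped tags.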

build system:

* If gen-mimemap failed after creating mimemap.h, the rule wouldn't get rerun.
   2017-05-08 14:02:06 by Amitai Schleier | Files touched by this commit (1)
Log message:
Needs C++11.