pkgsrc.se | The NetBSD package collection

Subject: CVS commit: pkgsrc/textproc/xapian-omega
From: Amitai Schleier
Date: 2018-10-28 04:44:06
Message id: 20181028034406.6D088FBEE@cvs.NetBSD.org
Log Message:
Update to 1.4.8. From the changelog:

indexers:

* omindex:

  + Improve date handling in .eml files.  We now handle a "Date:" header
    without the day of the week, which is allowed by RFC822 and RFC2822
    (though it seems rare in practice).  If the date can't be parsed, we now
    just omit the date information rather than failing to process the file.

  + Add support for indexing Apple iWork documents (Keynote (.key), Numbers
    (.numbers) and Pages (.pages)) using libetonyek.  Only the single-file
    variants of these formats are handled for now, since omindex doesn't
    support indexing a directory as a document.

  + Index Visio files using vsd2xhtml.

  + Extend --filter to support filters which produce SVG as output.

  + Handle SVG embedded in XML with svg: namespace prefix.

  + Add --read-filters option to read a list of filters from a file, each line
    of which is a rule as passed to --filter.  Based on a patch from Gaurav
    Arora.

  + Add new --mime-type-match option which allows specifying a MIME
    Content-Type for a given shell filename pattern (with the special
    Content-Type values "ignore" and "skip" supported, as for --mime-type).

  + Adjust --mime-type to allow ':' in the extension.  A valid MIME
    Content-Type can't contain a colon, so if the argument to --mime-type
    contains more than one colon it makes more sense to split at the *last*
    colon (we used to split at the first), as an extension could conceivably
    contain a colon.  Mostly this change is for consistency with the new
    --mime-type-match option, where the leafname pattern could reasonably
    contain a colon.

  + Remove failed entries for ignored files.  If a file is mapped to the
    pseudo-MIME type "ignore", then any existing failure record for it is
    now removed, so we don't end up with a lot of cruft failure records for
    files we are no longer trying to index.

  + If indexing a file fails because not enough memory could be allocated, we
    now try to flag the file as failed, so it will be skipped by default on
    future runs.  This should help to avoid indexing getting stuck on
    problematic files.

  + Add a "pages" field with the number of pages in the document where we
    know how to determine this (currently only for PDF files for which pdfinfo
    reports this information).

  + Handle an initially empty database exactly the same way as when --overwrite
    is specified.  This probably has no user-visible consequences, but it's
    cleaner for the handling to be exactly the same.
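
The .eml "Date:" change above can be illustrated with Python's standard email.utils module. This is only a sketch of the described behaviour, not omindex's C++ code. parsedate_tz() already accepts a header with or without the leading day of the week. Returning None here stands in for "omit the date information" when parsing fails.

```python
from email.utils import mktime_tz, parsedate_tz
from typing import Optional


def parse_eml_date(header: str) -> Optional[int]:
    """Return a Unix timestamp for an RFC 2822 "Date:" value, or None.

    The leading day of the week ("Sun,") is optional.  An unparsable value
    yields None, so the caller can simply omit the date instead of failing.
    """
    try:
        parsed = parsedate_tz(header)
    except (ValueError, IndexError, TypeError):
        # Defensive: very odd input should never abort processing of a file.
        parsed = None
    if parsed is None:
        return None
    return mktime_tz(parsed)


with_day = parse_eml_date("Sun, 28 Oct 2018 04:44:06 +0100")
without_day = parse_eml_date("28 Oct 2018 04:44:06 +0100")
print(with_day == without_day)  # True
print(parse_eml_date("not a date"))  # None
```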
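
The new --mime-type-match option can be modelled as a list of (pattern, Content-Type) rules checked against a file's leafname. Several parts of this sketch are assumptions: the rule list, the first-match-wins order and the function name mime_type_for. Python's fnmatch.fnmatchcase() only approximates shell glob matching. For example, it lets "*" match a leading dot.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

# Hypothetical rules, as they might be given by repeated --mime-type-match
# options.  "ignore" and "skip" are the special pseudo-types from the changelog.
Rule = Tuple[str, str]


def mime_type_for(leafname: str, rules: List[Rule]) -> Optional[str]:
    """Return the Content-Type of the first rule whose pattern matches."""
    for pattern, mime_type in rules:
        if fnmatchcase(leafname, pattern):
            return mime_type
    return None  # no rule matched: fall back to other detection


rules = [("*.log", "ignore"), ("README*", "text/plain"), ("*~", "skip")]
print(mime_type_for("README.old", rules))  # text/plain
print(mime_type_for("error.log", rules))  # ignore
print(mime_type_for("report.pdf", rules))  # None
```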
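
Splitting a --mime-type argument at the last colon, as described above, amounts to a single rpartition() call in Python. The helper name split_mime_type_arg and the error raised when there is no colon are illustrative choices, not omindex's own code.

```python
from typing import Tuple


def split_mime_type_arg(arg: str) -> Tuple[str, str]:
    """Split an EXT:TYPE argument at the *last* colon.

    A valid MIME Content-Type can't contain ':', so everything after the last
    colon is the type, and everything before it (colons included) is the
    extension.
    """
    ext, sep, mime_type = arg.rpartition(":")
    if not sep:
        raise ValueError(f"expected EXT:TYPE, got {arg!r}")
    return ext, mime_type


print(split_mime_type_arg("pdf:application/pdf"))  # ('pdf', 'application/pdf')
print(split_mime_type_arg("a:b:text/plain"))  # ('a:b', 'text/plain')
```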
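
For the new "pages" field, the page count comes from the "Pages:" line that pdfinfo prints. The parser below is a sketch. The sample text is a shortened, made-up pdfinfo report, not real output from a particular file.

```python
import re
from typing import Optional

_PAGES_LINE = re.compile(r"^Pages:\s*(\d+)\s*$", re.MULTILINE)


def pages_from_pdfinfo(output: str) -> Optional[int]:
    """Extract the page count from pdfinfo's text output, if it is reported."""
    match = _PAGES_LINE.search(output)
    return int(match.group(1)) if match else None


sample = "Title:          Example\nPages:          12\nEncrypted:      no\n"
print(pages_from_pdfinfo(sample))  # 12
```

In practice, the text could come from `subprocess.run(["pdfinfo", path], capture_output=True, text=True).stdout`. If no "Pages:" line is present, None means the field is simply left out.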

* scriptindex:

  + Improve scriptindex diagnostic messages.  All diagnostics are now labelled
    as "error", "warning" or "note" as appropriate, and we now consistently
    report "FILE:LINE:" (and also "COLUMN:" in most cases) to make it clearer
    where the problem lies.

  + Add new "split" action which splits the text on a specified delimiter and
    executes the following actions for each piece.  Based on a patch by Gaurav
    Arora.

  + Missing whitespace after the closing " on an action argument is now
    flagged as an error.  Previously scriptindex would attempt to parse
    the following characters as the next action.

  + Support C-like escapes for quoted parameter values.  Notably this means it
    is now possible to include `"` in quoted parameter values.
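
The scriptindex diagnostic layout described above follows the familiar compiler style. Below is a minimal formatter. It assumes exactly "FILE:LINE:COLUMN: severity: message", with the column omitted when it isn't known. The precise spacing and wording scriptindex uses may differ, and the file name in the example is hypothetical.

```python
from typing import Optional

SEVERITIES = ("error", "warning", "note")


def format_diagnostic(filename: str, line: int, column: Optional[int],
                      severity: str, message: str) -> str:
    """Format a message as FILE:LINE:[COLUMN:] SEVERITY: MESSAGE."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity {severity!r}")
    location = f"{filename}:{line}:"
    if column is not None:
        location += f"{column}:"
    return f"{location} {severity}: {message}"


print(format_diagnostic("index.script", 3, 7, "error", "unknown action"))
# index.script:3:7: error: unknown action
```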
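
The control flow of the new "split" action can be sketched in Python by representing the following actions as callables. This does not model scriptindex's own syntax. The changelog doesn't say how empty pieces (for example, from two adjacent delimiters) are treated, so this sketch simply passes them through, as str.split() does.

```python
from typing import Callable, List

Action = Callable[[str], None]


def run_split(text: str, delimiter: str, following: List[Action]) -> None:
    """Run every following action once for each delimiter-separated piece."""
    for piece in text.split(delimiter):
        for action in following:
            action(piece)


indexed: List[str] = []
run_split("red,green,blue", ",", [lambda piece: indexed.append(piece.upper())])
print(indexed)  # ['RED', 'GREEN', 'BLUE']
```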
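
The two quoting changes above can be combined in a small parser: C-like escapes, plus an error when the closing quote isn't followed by whitespace. The escape table is a hypothetical subset of C's escapes chosen for this sketch. scriptindex may accept more escapes (for example octal or hex) and may report errors differently.

```python
from typing import Tuple

# Hypothetical subset of C escapes supported by this sketch.
ESCAPES = {"\\": "\\", '"': '"', "n": "\n", "t": "\t", "r": "\r", "0": "\0"}


def parse_quoted(src: str, pos: int) -> Tuple[str, int]:
    """Parse a double-quoted value starting at src[pos].

    Returns (value, index just past the closing quote).  Raises ValueError
    for an unknown escape, a missing closing quote, or a non-whitespace
    character directly after the closing quote.
    """
    if src[pos] != '"':
        raise ValueError(f"column {pos + 1}: expected '\"'")
    out = []
    i = pos + 1
    while i < len(src):
        ch = src[i]
        if ch == '"':
            end = i + 1
            # End of input is fine; anything else must be whitespace.
            if end < len(src) and not src[end].isspace():
                raise ValueError(f"column {end + 1}: expected whitespace after closing quote")
            return "".join(out), end
        if ch == "\\":
            if i + 1 >= len(src) or src[i + 1] not in ESCAPES:
                raise ValueError(f"column {i + 1}: bad escape sequence")
            out.append(ESCAPES[src[i + 1]])
            i += 2
            continue
        out.append(ch)
        i += 1
    raise ValueError("missing closing quote")


print(parse_quoted(r'"say \"hi\"" next', 0))  # ('say "hi"', 12)
```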

omega:

  + Value-based date range filters can now be specified via CGI parameters
    START.N, END.N and/or SPAN.N where N is a value slot number, allowing
    multiple concurrent filters on different slots to be specified.

  + Support YYYY and YYYYMM limits in term-based date ranges.  Previously
    value-based date ranges supported these as limits, but term-based date
    ranges gave an error.

  + Add stem_strategy option and deprecate existing stem_all option in favour
    of this new more versatile option.

  + Support "natural" $sort option via new flag "#" which sorts embedded
    natural numbers in numerical order.

  + Support numeric $sort option via new flag "n", similar to GNU sort -n.

  + Rewrite field parsing to be more efficient, and store fields in an
    unordered_map for faster lookup.
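
The per-slot CGI parameters could be collected as below before the value-based date range filters are built. This sketch only groups START.N, END.N and SPAN.N by slot number using urllib.parse.parse_qsl(). It doesn't validate the dates or interpret SPAN.N. The query string is invented for the example.

```python
from collections import defaultdict
from typing import Dict
from urllib.parse import parse_qsl


def date_filters_by_slot(query: str) -> Dict[int, Dict[str, str]]:
    """Group START.N / END.N / SPAN.N parameters by value slot N."""
    filters: Dict[int, Dict[str, str]] = defaultdict(dict)
    for name, value in parse_qsl(query):
        prefix, dot, slot = name.partition(".")
        if dot and prefix in ("START", "END", "SPAN") and slot.isdigit():
            filters[int(slot)][prefix] = value
    return dict(filters)


print(date_filters_by_slot("P=xapian&START.1=20180101&END.1=20181028&SPAN.3=30"))
# {1: {'START': '20180101', 'END': '20181028'}, 3: {'SPAN': '30'}}
```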
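
One natural reading of YYYY and YYYYMM limits is that a start limit means the first day of that year or month, and an end limit means the last day. That interpretation is an assumption here, as is the helper expand_date_limit. The changelog doesn't spell out omega's exact rules.

```python
import calendar


def expand_date_limit(limit: str, is_end: bool) -> str:
    """Expand a YYYY, YYYYMM or YYYYMMDD limit to a full YYYYMMDD date.

    A start limit is widened to the first day of the period and an end limit
    to the last day, so "2018" used as both limits covers the whole year.
    """
    if not limit.isdigit() or len(limit) not in (4, 6, 8):
        raise ValueError(f"bad date limit {limit!r}")
    if len(limit) == 8:
        return limit
    year = int(limit[:4])
    if len(limit) == 4:
        return f"{year:04d}1231" if is_end else f"{year:04d}0101"
    month = int(limit[4:6])
    if not 1 <= month <= 12:
        raise ValueError(f"bad month in {limit!r}")
    if not is_end:
        return f"{year:04d}{month:02d}01"
    last_day = calendar.monthrange(year, month)[1]  # handles leap years
    return f"{year:04d}{month:02d}{last_day:02d}"


print(expand_date_limit("2018", is_end=False))  # 20180101
print(expand_date_limit("201602", is_end=True))  # 20160229
```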
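
A "natural" ordering like the one the new "#" flag provides can be sketched with the common split-on-digit-runs sort key. The changelog doesn't say whether omega treats leading zeros, signs or letter case the same way, so treat this as an approximation.

```python
import re
from typing import List, Union

_DIGIT_RUNS = re.compile(r"(\d+)")


def natural_key(s: str) -> List[Union[str, int]]:
    """Sort key in which runs of digits compare by numeric value."""
    parts = _DIGIT_RUNS.split(s)
    # With a capturing group, re.split() puts the digit runs at odd indices,
    # so the keys of two strings compare str with str and int with int.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


names = ["file10.txt", "file9.txt", "file1.txt"]
print(sorted(names))  # ['file1.txt', 'file10.txt', 'file9.txt']
print(sorted(names, key=natural_key))  # ['file1.txt', 'file9.txt', 'file10.txt']
```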
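
For the numeric "n" flag, the comparison to GNU sort -n suggests ordering by each string's leading number and treating strings without one as zero. This is a sketch under that assumption. It ignores thousands separators and locale-specific decimal points, which GNU sort does honour, and omega's flag may differ in such details.

```python
import re
from decimal import Decimal

# Optional blanks, optional '-', then digits with an optional fraction.
_LEADING_NUMBER = re.compile(r"\s*(-?(?:\d+(?:\.\d*)?|\.\d+))")


def numeric_key(s: str) -> Decimal:
    """Value of the leading number in s, or 0 if there is none (like sort -n)."""
    match = _LEADING_NUMBER.match(s)
    return Decimal(match.group(1)) if match else Decimal(0)


print(sorted(["10 items", "9 items", "-2.5", "none"], key=numeric_key))
# ['-2.5', 'none', '9 items', '10 items']
```

Python's sort is stable, so strings with equal keys keep their input order. GNU sort instead falls back to a whole-line comparison for ties unless -s is given.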

Files:
Revision	Action	file
1.25	modify	pkgsrc/textproc/xapian-omega/distinfo