2024-12-06 16:32:24 by Amitai Schleier | Files touched by this commit (1) | |
Log message:
xapian-omega: update to 1.4.27. Changes:
omega:
* Calculate date spans in days rather than converting to time_t, which
side-steps issues due to 32-bit time_t and some implementations not
handling negative time_t values.
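To illustrate why working in days sidesteps time_t limits, here is a minimal C++ sketch that converts a calendar date to a day number using only integer arithmetic (Howard Hinnant's well-known days_from_civil algorithm). It is an illustration, not omega's actual code, and days_from_civil() and span_in_days() are hypothetical names. Dates before 1970 simply give negative day numbers, and dates after 2038 don't need a 64-bit time_t.

```cpp
#include <cassert>
#include <cstdint>

// Days since 1970-01-01 in the proleptic Gregorian calendar. Negative for
// earlier dates. Uses only integer arithmetic and never touches time_t.
constexpr std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d) {
    y -= m <= 2;  // treat Jan and Feb as months 13 and 14 of the previous year
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);            // [0, 399]
    const unsigned doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
    return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Number of days from the first date to the second (e.g. for a date range).
constexpr std::int64_t span_in_days(std::int64_t y1, unsigned m1, unsigned d1,
                                    std::int64_t y2, unsigned m2, unsigned d2) {
    return days_from_civil(y2, m2, d2) - days_from_civil(y1, m1, d1);
}
```

For example, the span from 1969-12-31 to 1970-01-01 is 1 day, with no negative or overflowing time_t involved.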
portability:
* Fix build with UCRT64 variant of mingw-w64 by no longer defining
__MSVCRT_VERSION__ by default. We fixed this for xapian-core in 1.4.24
but missed that omega defined it too.
* Remove unnecessary 'using namespace std' to fix the build on (at least)
FreeBSD, where there's nothing in the std namespace to import at this point,
so we get a compiler error.
|
2024-07-24 12:55:07 by Amitai Schleier | Files touched by this commit (1) | |
Log message:
xapian-omega: update to 1.4.26. Changes:
indexers:
* omindex:
+ Make robust to the indexer process being run with stdin or stdout closed.
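A common POSIX technique for this kind of robustness is sketched below, assuming the concern is that a closed descriptor 0, 1 or 2 gets reused by the next file opened. ensure_std_fds_open() is a hypothetical helper, and this is not necessarily how omindex implements the fix.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// If stdin, stdout or stderr was closed by whoever started us, the next file
// we open gets that low descriptor number, so output meant for stdout could
// end up written into that file. Reopening any closed standard descriptor on
// /dev/null prevents this. Returns false if /dev/null couldn't be opened.
bool ensure_std_fds_open() {
    for (int fd = 0; fd <= 2; ++fd) {
        // Skip descriptors which are open (or failed for some other reason).
        if (fcntl(fd, F_GETFD) != -1 || errno != EBADF) continue;
        // open() returns the lowest free descriptor, which is fd itself since
        // all lower ones are open by now.
        int newfd = open("/dev/null", fd == 0 ? O_RDONLY : O_WRONLY);
        if (newfd == -1) return false;
        if (newfd != fd) {  // defensive: shouldn't happen given the above
            dup2(newfd, fd);
            close(newfd);
        }
    }
    return true;
}
```

Calling this early in main(), before opening any databases or files, is the usual placement.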
omega:
* Support "bm25+" and "pl2+" in "$set{weighting,...}".
* Deprecate "lm" in "$set{weighting,...}". This was meant to implement the
"Language Model" weighting scheme, but we've discovered the implementation
was incorrect and fixing it requires ABI-incompatible changes in xapian-core.
For 1.4.x we need to leave it in place so as not to break existing code, but
we recommend avoiding using it. It will be removed in the next release
series and replaced with new separate classes implementing Language Model
weighting, one for each smoothing method.
* Add "prob" as new preferred name for probabilistic query expansion in
"$set{expansion,...}", with the previous "trad" still being accepted for now.
build system:
* Report result of probe to determine compiler support for -Werror or
equivalent.
* If pkg-config is available, use it to probe for libmagic.
* configure: Probe for closefrom(). Patch from Qiu Yingbo in
https://github.com/xapian/xapian/pull/323
portability:
* configure: Fix clang detection which wasn't working when configure determined
a -std=X option was needed to get C++11 support. The obvious symptom was
that --enable-werror wouldn't add -Werror.
* configure: NetBSD automatically pulls in library dependencies, so set
link_all_deplibs_CXX=no there.
* Define __WIN32__/__WIN64__ like we do for xapian-core. Spotted by Baran Demir.
* Avoid using sprintf() if snprintf() is available, even in cases where the
output size is bounded, to avoid deprecation warnings on macOS. For 1.4.x
we still fall back to sprintf() to avoid a point release breaking support
for any platform still lacking snprintf().
* Use `override` for subclassing functors. This is good practice as it gives a
clear compile error if we have to change the signature of a virtual method
on such a functor. See #830.
* Fix building with MSVC - it seems that to support AR=lib we need to use
AM_PROG_AR, which probes for AR's command line interface.
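As a sketch of why `override` helps, assuming a functor base class shaped like the ones Xapian provides (TermFilter and SkipShortTerms below are hypothetical stand-ins, not Xapian API):

```cpp
#include <cassert>
#include <string>

// Hypothetical base class shaped like a library functor: a virtual
// operator() with a default implementation that accepts every term.
struct TermFilter {
    virtual ~TermFilter() = default;
    virtual bool operator()(const std::string&) const { return true; }
};

// `override` makes the compiler check that this really overrides a virtual
// method of the base class. If TermFilter::operator() later changed its
// signature (say, to take std::string_view), this would fail to compile
// instead of silently becoming a new overload that base-class callers never
// reach.
struct SkipShortTerms : TermFilter {
    std::string::size_type min_len;
    explicit SkipShortTerms(std::string::size_type n) : min_len(n) {}
    bool operator()(const std::string& term) const override {
        return term.size() >= min_len;  // keep terms at least min_len long
    }
};
```

Without `override`, a signature mismatch here would compile cleanly and calls through a TermFilter reference would quietly use the base class's "accept everything" behaviour.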
|
2024-03-08 20:01:39 by Amitai Schleier | Files touched by this commit (1) | |
Log message:
xapian-omega: update to 1.4.25. Changes:
testsuite:
* omegatest.pl: Correct program name in error message.
build system:
* configure: DragonFly BSD automatically pulls in library dependencies, so set
link_all_deplibs_CXX=no there.
* configure: Avoid compiler warning during GCC version check when compiler
needs an option to enable C++11 support (same fix as applied to xapian-core
in 1.4.23).
|
2023-11-07 23:36:54 by Amitai Schleier | Files touched by this commit (3) | |
Log message:
xapian-omega: update to 1.4.24. Changes:
documentation:
* Document $filesize error handling.
indexers:
* omindex:
+ Implement piped input to filters for __WIN32__. Previously it seems the
filter was run but its stdin wasn't connected to the input, so it would
probably block indefinitely.
+ Fix corner case in shell emulation - we no longer set environment variables
which start with a digit.
This issue was spotted from reading the code - in practice this isn't a
case that's likely to be encountered, and the previous behaviour doesn't
appear to have any security consequences even if a user were somehow tricked
into specifying an extraction command that did this.
* scriptindex:
+ Check if we can actually support %z in parsedate action. Previously we
assumed we could if struct tm had a tm_gmtoff member, but that's only a
necessary condition and not sufficient, e.g. on Cygwin we have tm_gmtoff
but strptime() doesn't currently understand %z.
+ If we were expecting an action but didn't get an identifier this triggered
an infinitely repeating error:
Unknown index action ''
Now we instead give a single error:
Expected index action, found '...'
where '...' shows the sequence of non-whitespace characters encountered.
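The shell rule behind the omindex fix above can be sketched as follows: in POSIX sh, a leading NAME=value word is only an environment assignment if NAME is a valid name, which can't start with a digit. is_env_assignment() is a hypothetical helper showing only that rule, not omindex's command parser.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// True if word has the form NAME=value where NAME is a valid POSIX shell
// name: a letter or underscore, then letters, digits and underscores.
// "1FOO=bar" is therefore not an assignment (sh would run it as a command).
bool is_env_assignment(const std::string& word) {
    auto eq = word.find('=');
    if (eq == std::string::npos) return false;
    unsigned char first = static_cast<unsigned char>(word[0]);
    if (!(std::isalpha(first) || first == '_')) return false;
    for (std::string::size_type i = 1; i < eq; ++i) {
        unsigned char c = static_cast<unsigned char>(word[i]);
        if (!(std::isalnum(c) || c == '_')) return false;
    }
    return true;
}
```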
testsuite:
* Run tests under eatmydata if available.
* Turn off MSYS2 argument conversion for tests as it breaks omegatest, and we
shouldn't need this conversion there.
* omegatest: Rewrite in Perl as we were hitting non-portable quoting issues
with the shell implementation, and really it had grown too large to make
sense as a shell script anyway.
build system:
* Add --enable-werror configure option.
* configure: Only auto-enable -D_FORTIFY_SOURCE=2 if it works without
additional libraries and remove the hard-coded block against using it
on mingw. Mingw-w64 v11.0.0 eliminated the requirement to link with -lssp
so we now auto-enable -D_FORTIFY_SOURCE=2 there.
portability:
* Fix to build on Cygwin.
* Rename our bswap32 helper function to avoid clash with system-provided
function on FreeBSD and NetBSD.
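A portable byte-swap helper looks something like the sketch below. The name do_bswap32 is hypothetical (the entry doesn't say what the helper was renamed to); the point is to avoid the name bswap32, which system headers on FreeBSD and NetBSD may already provide, possibly as a macro.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the byte order of a 32-bit value using shifts and masks, under a
// name unlikely to collide with a platform-provided bswap32.
inline std::uint32_t do_bswap32(std::uint32_t v) {
    return (v >> 24) |
           ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) |
           (v << 24);
}
```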
|
2023-07-10 17:08:30 by Amitai Schleier | Files touched by this commit (1) | |
Log message:
Update to 1.4.23. From the changelog:
documentation:
* Improve documentation for OmegaScript numerical and logical operators. Patch
from Vaibhav Kansagara.
* Improve documentation for DATEVALUE, xFILTERS and $filters.
indexers:
* omindex:
+ Handle XPS files with multiple FixedDocument parts better. Previously we
only extracted text from the first FixedDocument part.
+ Prefer later subparts of multipart/alternative, which is what RFC 2046 (and
the earlier RFCs it obsoletes) says to do; previously we used the first
subpart that we could get text from.
+ Prefer later subparts of multipart/alternative when indexing Outlook
.msg files too.
+ Fix obscure bug in --mimetype option. We keep track of the length of the
longest extension we have a mapping for, but this was being updated using
the length of the MIME type rather than the length of the extension.
Theoretically this could have led to us effectively ignoring a --mimetype
option, but in the real world the MIME type will probably always be longer
so this just results in us testing long extensions unnecessarily.
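The multipart/alternative preference can be sketched like this, assuming a hypothetical extract_text callback standing in for omindex's per-format text extraction (this is not omindex's code). RFC 2046 orders the alternatives by increasing faithfulness to the original, so the parts are walked from last to first.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct MimePart {
    std::string content_type;
    std::string body;
};

// Return text from the last subpart that yields any, falling back to
// earlier subparts; returns an empty string if none yields text.
std::string text_from_alternative(
    const std::vector<MimePart>& parts,
    const std::function<std::string(const MimePart&)>& extract_text) {
    for (auto it = parts.rbegin(); it != parts.rend(); ++it) {
        std::string text = extract_text(*it);
        if (!text.empty()) return text;
    }
    return std::string();
}
```

A trailing subpart in a format we can't handle is simply skipped, so we fall back to the best earlier one.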
omega:
* Ignore the DATEVALUE CGI parameter if START.n, etc. are specified for the same
slot. We explicitly document not to do this, but if that advice is ignored
it's more helpful to at least preserve the property that we only have
one date range per value slot.
* Add flag_ngrams as a preferred new alias for flag_cjk_ngram. In the next
release series this feature has been expanded to cover many more languages,
so the "cjk" in the name (which stands for "Chinese, Japanese and Korean")
has become inaccurate.
* Fix handling of Outlook .msg containing Unicode. Codepoints <= U+00FF appear
to have been handled correctly, but anything higher resulted in individual
bytes of the UTF-8 encoding being treated as separate characters.
Fixes https://github.com/xapian/xapian/pull/326, reported by uhuntu.
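The correct conversion, assuming the text arrives as UTF-16 code units, decodes each code point (combining surrogate pairs) and then encodes it as UTF-8, so a character above U+00FF becomes one multi-byte sequence rather than several separate characters. A minimal sketch with hypothetical function names, not omega's implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append the UTF-8 encoding of Unicode code point cp to out.
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 code units to UTF-8, combining surrogate pairs.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::vector<std::uint16_t>& units) {
    std::string out;
    for (std::size_t i = 0; i < units.size(); ++i) {
        std::uint32_t cp = units[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < units.size() &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i + 1] - 0xDC00);
            ++i;  // consumed the low surrogate too
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        append_utf8(out, cp);
    }
    return out;
}
```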
portability:
* Fix compatibility code for old libmagic versions. The code we were using
seems like it would never have worked. Nobody's reported this (it was
spotted while looking at the code) so we could just require libmagic >= 4.22,
but it's trivial to actually handle so we've fixed the fallback code.
* Remove lingering traces of IRIX support as it's been dead for many years.
|
2022-09-25 14:25:58 by Amitai Schleier | Files touched by this commit (1) |
Log message:
Update to 1.4.21. From the changelog:
documentation:
* Consistently say "macOS" not "Mac OS X", "OS X", etc.
indexers:
* omindex:
+ Add support for gzip-compressed SVG files (.svgz).
+ Handle <title> in SVG. Previously only <dc:title> inside <metadata> was
considered. If both are present, <title> now takes precedence.
testsuite:
* omegatest: Add skip-for-32-bit-time_t mechanism and use it to conditionally
enable some testcases which fail on platforms with 32-bit time_t.
build system:
* Update to use AX_CXX_COMPILE_STDCXX, a replacement for the
AX_CXX_COMPILE_STDCXX_11 macro we were using, which also supports newer C++
standard versions that will be useful. For C++11 the only difference seems
to be that the macro now checks for attribute support - we use C++11
attributes so that seems a good thing.
Updating during the freeze for the bug and portability fixes.
|
2022-07-29 17:21:42 by Amitai Schleier | Files touched by this commit (1) |
Log message:
Needs pkg-config to find pcre2 during configure.
|
2022-07-11 20:27:07 by Amitai Schleier | Files touched by this commit (2) |
Log message:
Update to 1.4.20. From the changelog:
indexers:
* omindex:
+ OpenDocument: Previously we only inserted an implicit space before each
paragraph. Now we insert one both before and after each paragraph and
heading, and before each forced line-break and tab.
+ Add extension mapping for .awt (AbiWord templates).
+ Index metadata from XPS files.
+ -G and -C short options were documented in --help but not previously
actually handled. Reported by David Bremner.
+ Show --max-size required argument in --help output.
+ Remove lingering handling for database backends without slot bounds since
all backends have been required to support these since 1.4.11.
* scriptindex:
+ Process an incomplete final line from a dump file. Previously if the final
line lacked a newline scriptindex would quietly ignore it (unless it was
the only line).
+ The `unique` action now takes an optional `missing` parameter to specify
what to do if a record doesn't trigger the unique action or triggers it
with an empty value. The default is now to issue a warning and create a
new document (the same as before, except that there was only previously a
warning for the empty value case). In Omega 1.5.0 the default will change
to an error as that seems a better default, but is less compatible with
potential existing use.
+ Explicitly allow multiple blank lines in input files. Previously such
extra blank lines were treated as empty records and in many cases these
got quietly skipped, but e.g. with the new UNIQUE checks this could result
in a warning or error.
+ If we hit an error while parsing the index script we used to exit right
away, but now we finish parsing the index script since it's more helpful to
report all the errors in an index script rather than the user having to
fix them one by one. This requires us to sensibly recover after each index
script parse error - if you find a case where this recovery triggers
further bogus errors please report it and we'll try to improve the
recovery.
+ In four cases while handling input data (two cases of bad hex data fed
to `hextobin`, an input data line without a `=`, and `load` failing to
load the specified file) we'd emit a diagnostic that was labelled as an
"error" but really it was handled as a warning as we kept reading input
and the "error" didn't affect the exit status. It doesn't really make
sense to continue in any of these cases so we now exit with non-zero status
right away.
+ A parameter in the index script which should be an integer but isn't, or
should be positive but isn't, now gives an error rather than a warning, since
an error seems more helpful.
+ All diagnostics issued while parsing the index script now include column
information.
+ Avoid forcibly flushing the output stream after every message.
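The dump-file record handling described above can be sketched as follows. This is a deliberately simplified, hypothetical reader (read_records() is not scriptindex's API, and continuation lines and repeated fields aren't modelled) showing an incomplete final line being processed, runs of blank lines not producing empty records, and a line without `=` being a hard error.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

using Record = std::map<std::string, std::string>;

// Read "field=value" lines, with records separated by blank lines.
std::vector<Record> read_records(std::istream& in) {
    std::vector<Record> records;
    Record current;
    std::string line;
    int line_no = 0;
    // std::getline also returns a final line with no trailing newline, so an
    // incomplete last line is processed rather than dropped.
    while (std::getline(in, line)) {
        ++line_no;
        if (line.empty()) {
            // Any number of consecutive blank lines just ends the current
            // record; they never produce empty records.
            if (!current.empty()) {
                records.push_back(current);
                current.clear();
            }
            continue;
        }
        auto eq = line.find('=');
        if (eq == std::string::npos) {
            // A hard error rather than a warning we carry on past.
            throw std::runtime_error("line " + std::to_string(line_no) +
                                     ": expected field=value");
        }
        current[line.substr(0, eq)] = line.substr(eq + 1);
    }
    if (!current.empty()) records.push_back(current);
    return records;
}

// Convenience wrapper for parsing in-memory text.
std::vector<Record> read_records_from_string(const std::string& text) {
    std::istringstream in(text);
    return read_records(in);
}
```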
testsuite:
* Improve test coverage for scriptindex.
portability:
* Require PCRE2 instead of PCRE. The original PCRE is now EOL and unmaintained
(last release was June 2021). In omega it's potentially used to process
input from the internet, so security is a real concern, hence the switch
to PCRE2.
|
2022-06-28 13:38:00 by Thomas Klausner | Files touched by this commit (3952) |
Log message:
*: recursive bump for perl 5.36
|
2022-01-02 10:32:06 by Amitai Schleier | Files touched by this commit (2) |
Log message:
Update to 1.4.19. From the changelog:
documentation:
* configure: Add missing AC_ARG_VAR for all programs so that they are
documented in --help output, and so that autoconf knows they are "precious"
and preserves them if configure is rerun even when they're specified via an
environment variable.
* Add usage examples for $jsonobject.
* Fix path to omega in quickstart document. Fixes #813, reported by Jim Lynch.
* Update for the IRC channel move from freenode to libera.chat.
indexers:
* Fix handling of UTF-16 BOMs in XML and HTML - we had the sense of the
endianness indicated by the BOM the wrong way round.
* Avoid making an extra temporary copy of HTML/XML data which has a UTF-16 BOM.
* We now ignore an end of line immediately after a PHP close tag to match what
PHP does.
* omindex:
+ Fix handling of formatted xlsx dates in certain cases.
* scriptindex:
+ Add new scriptindex whitespace removal actions `ltrim`, `rtrim`, `squash`,
and `trim`.
+ Improve `truncate` action - if a word ends exactly on the requested length
we now leave it in place rather than removing it.
+ Report the location of previous `unique` action in the error given when
`unique` is used more than once.
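For reference, the BOM rule that had been inverted: U+FEFF encoded big-endian is the bytes FE FF, and encoded little-endian it is FF FE. A minimal sketch with hypothetical names, not the indexers' actual code:

```cpp
#include <cassert>
#include <string>

enum class Utf16Order { none, big_endian, little_endian };

// Identify the byte order from a leading UTF-16 BOM, if present. Getting
// this backwards means every code unit is decoded with its bytes swapped.
Utf16Order detect_utf16_bom(const std::string& data) {
    if (data.size() < 2) return Utf16Order::none;
    unsigned char b0 = static_cast<unsigned char>(data[0]);
    unsigned char b1 = static_cast<unsigned char>(data[1]);
    if (b0 == 0xFE && b1 == 0xFF) return Utf16Order::big_endian;
    if (b0 == 0xFF && b1 == 0xFE) return Utf16Order::little_endian;
    return Utf16Order::none;
}
```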
omega:
* Clamp START and END with packed timestamps. The 4-byte unsigned packed
time_t format can't represent dates before 1970 or after Sun 07 Feb 2106
06:28:15 UTC so clamp dates before or after these - previously they would
wrap around.
* The JSON produced by $jsonobject no longer contains newlines, which makes it
usable as a single line serialisation format without post-processing.
* Add $base64 OmegaScript command.
* omega: Add flag_no_positions to wrap new
Xapian::QueryParser::FLAG_NO_POSITIONS.
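Clamping before packing avoids the wrap-around. A minimal sketch, assuming the packed form is 4 big-endian bytes (the byte order is an assumption here, chosen so packed values sort chronologically as strings); clamp_to_u32_time() and pack_u32_time() are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Clamp seconds since the Unix epoch to what a 4-byte unsigned value can
// hold: 0 (1970-01-01 00:00:00 UTC) to 0xFFFFFFFF (2106-02-07 06:28:15 UTC).
// Converting straight to uint32_t would wrap around instead.
std::uint32_t clamp_to_u32_time(std::int64_t t) {
    if (t < 0) return 0;
    if (t > 0xFFFFFFFFLL) return 0xFFFFFFFFu;
    return static_cast<std::uint32_t>(t);
}

// Pack the clamped value as 4 bytes, most significant byte first.
std::string pack_u32_time(std::int64_t t) {
    std::uint32_t v = clamp_to_u32_time(t);
    std::string out(4, '\0');
    for (int i = 3; i >= 0; --i) {
        out[i] = static_cast<char>(v & 0xFF);
        v >>= 8;
    }
    return out;
}
```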
templates:
* Fix topterms template to not trigger early matching. We were checking $msize
before including the `query` template, but doing so would trigger the query
to be run, which means that settings early in the `query` template which
should affect the result (such as $setmap{prefix,...}) were being ignored
when the `topterms` template was used. Partly addresses #815, reported by
Gennadiy.
* Add field support to opensearch and xml templates. These templates now also
search title, topic and filename by default and support `title:`, `author:`
and `topic:` in the query string (both like the template `query` already
does). Fixes remaining issue in #815, reported by Gennadiy.
testsuite:
* Expand omegatest. All scriptindex actions now have test coverage.
build system:
* Replace uses of obsolete autoconf macros, fixing warnings if configure is
regenerated with a recent release of autoconf.
portability:
* Don't automatically use _FORTIFY_SOURCE on mingw-w64. Recent mingw-w64
versions require -lssp to be linked when _FORTIFY_SOURCE is enabled, so just
skip the automatic enabling. Users who want to enable it can specify it
explicitly.
Fixes #808, reported by xpbxf4.
* Automatically enable GCC warnings -Wduplicated-cond and -Wduplicated-branches
if using a GCC version new enough to support them. The usefulness of
-Wduplicated-cond was highlighted by dcb in #816.
* Fix GCC -Wshadow warning.
* Use clock_gettime() and nanosleep() under modern mingw as these allow higher
precision than what we previously used.
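A minimal POSIX sketch of the two calls: clock_gettime() with CLOCK_MONOTONIC for measuring elapsed time at nanosecond resolution, and nanosleep() resumed with the remaining time if a signal interrupts it. The helper names are hypothetical and this is not Xapian's code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <time.h>

// Monotonic time in nanoseconds. CLOCK_MONOTONIC isn't affected by changes
// to the system clock, which makes it suitable for measuring intervals.
std::int64_t monotonic_ns() {
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000 + ts.tv_nsec;
}

// Sleep for at least ms milliseconds. If a signal interrupts nanosleep(),
// it stores the time still remaining in rem, so resume with that.
void sleep_ms(long ms) {
    timespec req;
    req.tv_sec = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;
    timespec rem;
    while (nanosleep(&req, &rem) == -1 && errno == EINTR) {
        req = rem;
    }
}
```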
|