Subject: CVS commit: pkgsrc/textproc/xapian-omega
From: Amitai Schlair
Date: 2013-06-04 23:28:26
Message id: 20130604212826.DBAD596@cvs.netbsd.org
Log Message:
Update to 1.2.15. From the changelog:
Omega 1.2.15 (2013-04-16):
omega:
* Don't pointlessly link utf8convert.o into the omega CGI.
Omega 1.2.14 (2013-03-14):
indexers:
* omindex:
+ Correct "max" -> "min" when reserving space for shared strings in .xlsx
files. This just means we now reserve a more appropriate amount of space
to start with.
+ Ignore .com files by default.
Omega 1.2.13 (2013-01-09):
indexers:
* omindex:
+ Extracting text using external filters now works for filenames containing a
newline character - previously the newline got lost during escaping for the
shell.
+ Fix segfault when -F option without a ':' is passed.
+ Skip a file if we get a read error while calculating the MD5 checksum (used
for duplicate detection) - previously we used a checksum of the file up to
that point.
+ Avoid rereading SVG and Atom files when we calculate their MD5 checksums.
+ Improve --help output and man page, most notably:
- Say explicitly that --sample-size accepts the same formats as --max-size.
- Note that by default there is no size limit on files to index.
+ When generating a sample for a CSV file, limit the size we pre-allocate to
the CSV file size if that's smaller than the requested sample size, in case
the user sets that limit very high.
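The read-error item above (skip the file rather than use a checksum of the data read so far) can be illustrated with a minimal Python sketch. This is not omindex's code, which is C++; the function name md5_or_none and the chunk size are hypothetical, chosen only for illustration.

```python
import hashlib


def md5_or_none(stream, chunk_size=8192):
    """Return the hex MD5 of all bytes readable from stream.

    Returns None if a read fails, so the caller can skip the file.
    (Hypothetical helper; not taken from omindex.)
    """
    digest = hashlib.md5()
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    except OSError:
        # Deliberately discard the partial digest: a checksum of a
        # truncated read does not identify the file's real contents.
        return None
    return digest.hexdigest()
```

A caller doing duplicate detection would treat None as "skip this file" instead of comparing a checksum that only covers part of it.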
omega:
* Fix decoding of a %-encoded character at the end of the query string.
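For readers unfamiliar with this class of bug, here is a small Python sketch of %-decoding in which an escape sitting at the very end of the string is still decoded. It does not reproduce Omega's C++ code or claim to show the exact cause of the original bug; the function name percent_decode and the choice to leave malformed escapes untouched are assumptions made for the example.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def percent_decode(text):
    """Decode %XX escapes in text, including one at the very end.

    Malformed or truncated escapes are copied through unchanged.
    (Hypothetical helper; not taken from Omega.)
    """
    out = bytearray()
    i, n = 0, len(text)
    while i < n:
        # An escape occupies indexes i, i + 1 and i + 2, so the last valid
        # start is n - 3. A stricter bound would miss a trailing escape.
        if (text[i] == "%" and i + 2 < n
                and text[i + 1] in HEX_DIGITS
                and text[i + 2] in HEX_DIGITS):
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.extend(text[i].encode("utf-8"))
            i += 1
    return out.decode("utf-8", errors="replace")
```

For instance, percent_decode("q%41") decodes the final escape, while a truncated tail such as "x%2" is passed through as-is.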
Omega 1.2.12 (2012-06-27):
No changes since 1.2.11 except to bump the version - this release was made to
fix an incorrect library version information update in xapian-core 1.2.11.
Omega 1.2.11 (2012-06-26):
indexers:
* Change HTML parser's handling of multiple <body> tags and of text outside of
<body> to match the behaviour of modern web browsers. (ticket#599)
* omindex:
+ Add command line option to control the size of the document sample stored.
Patch from Mihai Bivol.
+ Rework .xlsx parsing to substitute the shared strings into the positions
they are used in, so that the sample actually matches what appears in the
spreadsheet, and to index calculated cell contents.
+ Improve handling of headers and footers in OpenDocument documents.
+ pdftotext outputs a formfeed between each page, which messes up our "empty
body" check, so trim any trailing formfeeds before this check.
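The formfeed item above can be sketched in Python as follows. The sketch assumes the "empty body" check is a plain test for zero-length text, which the changelog does not spell out; both function names are hypothetical.

```python
def trim_trailing_formfeeds(text):
    """Remove formfeed characters from the end of text only."""
    return text.rstrip("\f")


def body_is_empty(text):
    """Assumed form of the check: empty once trailing formfeeds are gone."""
    return len(trim_trailing_formfeeds(text)) == 0
```

With this, extracted output consisting only of page separators counts as empty, while formfeeds between pages of real text are left in place.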
Omega 1.2.10 (2012-05-09):
indexers:
* Add support for CDATA to HTML/XML parser.
* omindex:
+ Add --max-size option, based on patch from ndaley in ticket#587.
+ Add support for Atom feed files, patch from Mihai Bivol in ticket#595.
+ If the document with the highest existing docid before the run was updated,
we were reporting it as "added", but now we correctly report it as
"updated". (Backported from 1.3.0).
+ Catch and report std::exception explicitly, so failing to allocate memory
is no longer reported as "Unknown exception". (Backported from 1.3.0).
Omega 1.2.9 (2012-03-08):
documentation:
* docs/overview.html:
+ Document that libmagic is used to determine the MIME type if the extension
isn't known. Partly addresses ticket#569.
+ We now limit time as well as CPU and memory for external filters.
indexers:
* Our HTML parser now ignores sections bracketed by <!--UdmComment--> and
<!--/UdmComment-->, like we already do for <!--htdig_noindex-->.
* omindex: Add more extensions to the default ignore list: bin dat db fon jar
lnk pyc pyd pyo sqlite sqlite3 sqlite-journal tmp ttf
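As an illustration of the marker handling in the 1.2.9 indexer notes, the Python sketch below drops text bracketed by <!--UdmComment--> ... <!--/UdmComment--> or <!--htdig_noindex--> ... <!--/htdig_noindex-->. It is a regex-based approximation rather than Omega's HTML parser; the function name is hypothetical, and leaving unterminated or mismatched markers untouched is an assumption of this sketch, not documented Omega behaviour.

```python
import re

# Match an opening marker, the shortest following text, and the closing
# marker of the same kind (\1 refers back to the captured marker name).
_NOINDEX = re.compile(
    r"<!--(UdmComment|htdig_noindex)-->.*?<!--/\1-->",
    re.DOTALL,
)


def strip_noindex_sections(html):
    """Return html with bracketed no-index sections removed.

    (Hypothetical helper; not taken from Omega.)
    """
    return _NOINDEX.sub("", html)
```

For example, strip_noindex_sections("a<!--UdmComment-->nav<!--/UdmComment-->b") yields "ab".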
Files: