Omega 1.4.27 (2024-12-06):
omega:
* Calculate date spans in days rather than converting to time_t, which
side-steps issues due to 32-bit time_t and some implementations not handling
negative time_t values.
portability:
* Fix build with UCRT64 variant of mingw-w64 by no longer defining
__MSVCRT_VERSION__ by default. We fixed this for xapian-core in 1.4.24 but
missed that omega defined it too.
* Remove unnecessary 'using namespace std' to fix the build on (at least)
FreeBSD, where there's nothing in the std namespace to import at this point so
we get a compiler error.
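A minimal sketch of calculating a date span in days, assuming a proleptic Gregorian calendar. It uses the well-known "days from civil" integer algorithm, which is not necessarily the code omega uses; the function names are hypothetical. Because only plain integers are involved, dates before 1970 or after 2038 never pass through time_t.

```python
def days_from_civil(y: int, m: int, d: int) -> int:
    """Days since 1970-01-01 for a proleptic Gregorian date (may be negative)."""
    y -= m <= 2  # treat Jan/Feb as the last months of the previous year
    era = y // 400  # Python's // floors, so negative years work too
    yoe = y - era * 400  # year of era, 0..399
    mp = (m + 9) % 12  # March = 0, ..., February = 11
    doy = (153 * mp + 2) // 5 + d - 1  # day of year, 0..365
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy  # day of era
    return era * 146097 + doe - 719468


def span_in_days(start: tuple, end: tuple) -> int:
    """Length of a date span, computed without ever building a time_t."""
    return days_from_civil(*end) - days_from_civil(*start)
```

For example, span_in_days((1960, 1, 1), (2110, 1, 1)) covers a range which a 32-bit time_t can't represent at either end, but here it is just a subtraction of two integers.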
Omega 1.4.26 (2024-07-18):
indexers:
* omindex:
+ Make robust to the indexer process being run with stdin or stdout closed.
omega:
* Support "bm25+" and "pl2+" in "$set{weighting,...}".
* Deprecate "lm" in "$set{weighting,...}". This was meant to implement the
"Language Model" weighting scheme, but we've discovered the implementation
was incorrect and fixing it requires ABI-incompatible changes in xapian-core.
For 1.4.x we need to leave it in place so as not to break existing code, but
we recommend avoiding it. It will be removed in the next release series and
replaced with new separate classes implementing Language Model weighting,
one for each smoothing method.
* Add "prob" as new preferred name for probabilistic query expansion in
"$set{expansion,...}", with the previous "trad" still being accepted for now.
build system:
* Report result of probe to determine compiler support for -Werror or
equivalent.
* If pkg-config is available, use it to probe for libmagic.
* configure: Probe for closefrom(). Patch from Qiu Yingbo in
https://github.com/xapian/xapian/pull/323
portability:
* configure: Fix clang detection which wasn't working when configure determined
a -std=X option was needed to get C++11 support. The obvious symptom was
that --enable-werror wouldn't add -Werror.
* configure: NetBSD automatically pulls in library dependencies, so set
link_all_deplibs_CXX=no there.
* Define __WIN32__/__WIN64__ like we do for xapian-core. Spotted by Baran Demir.
* Avoid using sprintf() if snprintf() is available, even in cases where the
output size is bounded, to avoid deprecation warnings on macOS. For 1.4.x
we still fall back to sprintf() to avoid a point release breaking support
for any platform still lacking snprintf().
* Use `override` for subclassing functors. This is good practice as it gives a
clear compile error if we have to change the signature of a virtual method
on such a functor. See #830.
* Fix building with MSVC - it seems that to support AR=lib we need to use
AM_PROG_AR, which probes for AR's command line interface.
Omega 1.4.25 (2024-03-08):
testsuite:
* omegatest.pl: Correct program name in error message.
build system:
* configure: DragonflyBSD automatically pulls in library dependencies, so set
link_all_deplibs_CXX=no there.
* configure: Avoid compiler warning during GCC version check when compiler
needs an option to enable C++11 support (same fix as applied to xapian-core
in 1.4.23).
Omega 1.4.24 (2023-11-06):
documentation:
* Document $filesize error handling.
indexers:
* omindex:
+ Implement piped input to filters for __WIN32__. Previously it seems the
filter was run but its stdin wasn't connected to the input, so it would
probably block indefinitely.
+ Fix corner case in shell emulation - we no longer set environment variables
which start with a digit.
This issue was spotted from reading the code - in practice this isn't a
case that's likely to be encountered, and the previous behaviour doesn't
appear to have any security consequences even if a user was somehow tricked
into specifying an extraction command that did this.
* scriptindex:
+ Check if we can actually support %z in parsedate action. Previously we
assumed we could if struct tm had a tm_gmtoff member, but that's only a
necessary condition and not sufficient, e.g. on Cygwin we have tm_gmtoff
but strptime() doesn't currently understand %z.
+ If we were expecting an action but didn't get an identifier this triggered
an infinitely repeating error:
Unknown index action ''
Now we instead give a single error:
Expected index action, found '...'
where '...' shows the sequence of non-whitespace characters encountered.
testsuite:
* Run tests under eatmydata if available.
* Turn off MSYS2 argument conversion for tests as it breaks omegatest, and we
shouldn't need this conversion there.
* omegatest: Rewrite in Perl as we were hitting non-portable quoting issues
with the shell implementation, and really it had grown too large to make
sense as a shell script anyway.
build system:
* Add --enable-werror configure option.
* configure: Only auto-enable -D_FORTIFY_SOURCE=2 if it works without
additional libraries and remove the hard-coded block against using it
on mingw. Mingw-w64 v11.0.0 eliminated the requirement to link with -lssp
so we now auto-enable -D_FORTIFY_SOURCE=2 there.
portability:
* Fix build on Cygwin.
* Rename our bswap32 helper function to avoid clash with system-provided
function on FreeBSD and NetBSD.
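A sketch of the rule behind the 1.4.24 shell emulation fix, assuming POSIX shell rules for variable names (a letter or underscore, then letters, digits or underscores). The helper name is hypothetical and this is not omindex's actual code: in a real shell, a word like `1FOO=bar` is not an assignment at all, so it shouldn't set an environment variable.

```python
import re

# A POSIX shell "name" must not start with a digit.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def split_assignment(word: str):
    """Return (name, value) if word is a VAR=value assignment, else None.

    Hypothetical helper: "1FOO=bar" is rejected, so a shell emulation would
    treat it as the command to run rather than as a variable assignment.
    """
    name, sep, value = word.partition("=")
    if not sep or not _NAME_RE.fullmatch(name):
        return None
    return name, value
```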
Omega 1.4.23 (2023-07-07):
documentation:
* Improve documentation for OmegaScript numerical and logical operators. Patch
from Vaibhav Kansagara.
* Improve documentation for DATEVALUE, xFILTERS and $filters.
indexers:
* omindex:
+ Handle XPS files with multiple FixedDocument parts better. Previously we
only extracted text from the first FixedDocument part.
+ Prefer later subparts of multipart/alternative, which is what RFC2046 (and
the earlier RFCs it obsoletes) recommends; previously we used the first
subpart that we could get text from.
+ Prefer later subparts of multipart/alternative when indexing Outlook
.msg files too.
+ Fix obscure bug in --mimetype option. We keep track of the length of the
longest extension we have a mapping for, but this was being updated using
the length of the MIME type rather than the length of the extension.
Theoretically this could have led to us effectively ignoring a --mimetype
option, but in the real world the MIME type will probably always be longer
so this just results in us testing long extensions unnecessarily.
omega:
* Ignore DATEVALUE CGI parameter if START.n, etc. is specified on the same
slot. We explicitly document not to do this, but if that advice is ignored
it's more helpful to at least preserve the property that we only have
one date range per value slot.
* Add flag_ngrams as a preferred new alias for flag_cjk_ngram. In the next
release series this feature has been expanded to cover many more languages
so the "cjk" in the name has become inaccurate as it stands for
"Chinese, Japanese and Korean".
* Fix handling of Outlook .msg containing Unicode. Codepoints <= U+00FF appear
to have been handled correctly, but anything higher resulted in individual
bytes of the UTF-8 encoding being treated as separate characters.
Fixes https://github.com/xapian/xapian/pull/326, reported by uhuntu.
portability:
* Fix compatibility code for old libmagic versions. The code we were using
seems like it would never have worked. Nobody's reported this (it was
spotted while looking at the code) so we could just require libmagic >= 4.22,
but it's trivial to actually handle so we've fixed the fallback code.
* Remove lingering traces of IRIX support as it's been dead for many years.
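A minimal sketch of the new multipart/alternative preference order, assuming each subpart is a (content_type, body) pair and using a hypothetical `extract_text` callback which returns None for parts it can't handle. RFC2046 orders alternatives from least to most faithful, so the subparts are tried last to first.

```python
from typing import Callable, Iterable, Optional, Tuple

Part = Tuple[str, bytes]  # (content_type, body): hypothetical representation


def text_from_alternative(parts: Iterable[Part],
                          extract_text: Callable[[Part], Optional[str]]) -> Optional[str]:
    """Return text from the last subpart we can handle, as RFC2046 recommends."""
    for part in reversed(list(parts)):
        text = extract_text(part)
        if text is not None:
            return text
    return None
```

Before this change the loop effectively ran first to last, so a text/plain part would win over a richer text/html part that followed it.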
Omega 1.4.22 (2023-02-02):
documentation:
* Improve term prefix documentation.
indexers:
* omindex:
+ Add --date-terms and --no-date-terms options.
+ Extract page/sheet count for OpenDocument text documents and spreadsheets.
+ Extract created date and keywords for MS XML formats.
* scriptindex:
+ Fix handling of an unterminated final line in input file.
omega:
* Add OmegaScript commands to report value slot bounds.
* Add OmegaScript $sortableunserialise{} command.
Omega 1.4.21 (2022-09-22):
documentation:
* Consistently say "macOS" not "Mac OS X", "OS X", etc.
indexers:
* omindex:
+ Add support for gzip-compressed SVG files (.svgz).
+ Handle
in SVG. Previously only inside was
considered. If both are present, now takes precedence.
testsuite:
* omegatest: Add skip-for-32-bit-time_t mechanism and use it to conditionally
enable some testcases which fail on platforms with 32-bit time_t.
build system:
* Update to use AX_CXX_COMPILE_STDCXX, a replacement for the
AX_CXX_COMPILE_STDCXX_11 macro we were using, which also supports newer C++
standards versions (this will be useful). For C++11 the only difference seems
to be that the macro now checks for attribute support - we use C++11
attributes so that seems a good thing.
Omega 1.4.20 (2022-07-04):
indexers:
* omindex:
+ OpenDocument: Previously we only inserted an implicit space before each
paragraph. Now we insert them both before and after each paragraph and
heading, and before each forced line-break and tab.
+ Add extension mapping for .awt (Abiword templates).
+ Index metadata from XPS files.
+ -G and -C short options were documented in --help but not previously
actually handled. Reported by David Bremner.
+ Show --max-size required argument in --help output.
+ Remove lingering handling for database backends without slot bounds since
all backends have been required to support these since 1.4.11.
* scriptindex:
+ Process an incomplete final line from a dump file. Previously if the final
line lacked a newline scriptindex would quietly ignore it (unless it was
the only line).
+ The `unique` action now takes an optional `missing` parameter to specify
what to do if a record doesn't trigger the unique action or triggers it
with an empty value. The default is now to issue a warning and create a
new document (the same as before, except that there was only previously a
warning for the empty value case). In Omega 1.5.0 the default will change
to an error as that seems a better default, but is less compatible with
potential existing use.
+ Explicitly allow multiple blank lines in input files. Previously such
extra blank lines were treated as empty records and in many cases these
got quietly skipped, but e.g. with the new UNIQUE checks this could result
in a warning or error.
+ If we hit an error while parsing the index script we used to exit right
away, but now we finish parsing the index script since it's more helpful to
report all the errors in an index script rather than the user having to
fix them one by one. This requires us to sensibly recover after each index
script parse error - if you find a case where this recovery triggers
further bogus errors please report it and we'll try to improve the
recovery.
+ In four cases while handling input data (two cases of bad hex data fed
to `hextobin`, an input data line without a `=`, and `load` failing to
load the specified file) we'd emit a diagnostic that was labelled as an
"error" but really it was handled as a warning as we kept reading input
and the "error" didn't affect the exit status. It doesn't really make
sense to continue in any of these cases so we now exit with non-zero status
right away.
+ A parameter in the index script which should be an integer but isn't, or
should be positive but isn't, now gives an error rather than a warning since
an error seems more helpful.
+ All diagnostics issued while parsing the index script now include column
information.
+ Avoid forcibly flushing the output stream after every message.
testsuite:
* Improve test coverage for scriptindex.
portability:
* Require PCRE2 instead of PCRE. The original PCRE is now EOL and unmaintained
(last release was June 2021). In omega it's potentially used to process
input from the internet, so security is a real concern, hence we're switching
to PCRE2.
Omega 1.4.19 (2021-12-31):
documentation:
* configure: Add missing AC_ARG_VAR for all programs so that they are
documented in --help output, and so that autoconf knows they are "precious"
and preserves them if configure is rerun even when they're specified via an
environment variable.
* Add usage examples for $jsonobject.
* Fix path to omega in quickstart document. Fixes #813, reported by Jim Lynch.
* Update for the IRC channel move from freenode to libera.chat.
indexers:
* Fix handling of UTF-16 BOMs in XML and HTML - we had the sense of the
endianness indicated by the BOM the wrong way round.
* Avoid making an extra temporary copy of HTML/XML data which has a UTF-16 BOM.
* We now ignore an end of line immediately after a PHP close tag to match what
PHP does.
* omindex:
+ Fix handling of formatted xlsx dates in certain cases.
* scriptindex:
+ Add new scriptindex whitespace removal actions `ltrim`, `rtrim`, `squash`,
and `trim`.
+ Improve `truncate` action - if a word ends exactly on the requested length
we now leave it in place rather than removing it.
+ Report the location of previous `unique` action in the error given when
`unique` is used more than once.
omega:
* Clamp START and END with packed timestamps. The 4-byte unsigned packed
time_t format can't represent dates before 1970 or after Sun 07 Feb 2106
06:28:15 UTC so clamp dates before or after these - previously they would
wrap around.
* The JSON produced by $jsonobject no longer contains newlines, which makes it
usable as a single line serialisation format without post-processing.
* Add $base64 OmegaScript command.
* Add flag_no_positions to wrap the new
Xapian::QueryParser::FLAG_NO_POSITIONS.
templates:
* Fix topterms template to not trigger early matching. We were checking $msize
before including the `query` template, but doing so would trigger the query
to be run, which means that settings early in the `query` template which
should affect the result (such as $setmap{prefix,...}) were being ignored
when the `topterms` template was used. Partly addresses #815, reported by
Gennadiy.
* Add field support to opensearch and xml templates. These templates now also
search title, topic and filename by default and support `title:`, `author:`
and `topic:` in the query string (in both cases as the `query` template
already does). Fixes remaining issue in #815, reported by Gennadiy.
testsuite:
* Expand omegatest. All scriptindex actions now have test coverage.
build system:
* Replace uses of obsolete autoconf macros, fixing warnings if configure is
regenerated with a recent release of autoconf.
portability:
* Don't automatically use _FORTIFY_SOURCE on mingw-w64. Recent mingw-w64
versions require -lssp to be linked when _FORTIFY_SOURCE is enabled, so just
skip the automatic enabling. Users who want to enable it can specify it
explicitly.
Fixes #808, reported by xpbxf4.
* Automatically enable GCC warnings -Wduplicated-cond and -Wduplicated-branches
if using a GCC version new enough to support them. The usefulness of
-Wduplicated-cond was highlighted by dcb in #816.
* Fix GCC -Wshadow warning.
* Use clock_gettime() and nanosleep() under modern mingw as these allow higher
precision than what we previously used.
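A sketch of the START/END clamping for packed timestamps, assuming the packed form is a 4-byte unsigned big-endian integer (the byte order is an assumption; the range limits come from the entry above). Out-of-range values are pinned to the nearest representable time instead of wrapping around.

```python
import struct

MAX_PACKED = 2**32 - 1  # Sun 07 Feb 2106 06:28:15 UTC


def pack_timestamp(t: int) -> bytes:
    """Pack a Unix timestamp into 4 bytes, clamping rather than wrapping."""
    clamped = min(max(t, 0), MAX_PACKED)
    return struct.pack(">I", clamped)
```

Without the min/max, a date just before 1970 (t = -1) would wrap to the largest value and so sort after every real date.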
Omega 1.4.18 (2021-01-14):
indexers:
* omindex:
+ Add default MIME mapping for application/rtf. IANA have registrations for
text/rtf and (more recently) application/rtf (it seems because newer
versions of the RTF format can contain 8-bit data) so we now recognise
application/rtf by default and handle it the same way as text/rtf.
Current libmagic seems to always return text/rtf (no matches for
application/rtf in magic.mgc) and we continue to map extension rtf to
text/rtf, so this change is mainly future-proofing against future libmagic
changes.
+ Add support for indexing OpenXPS, which is effectively the same as XPS
internally in ways we care about, but it uses a different mimetype and a
different filename extension.
omega:
* Explicitly use OR for MORELIKE queries.
Since 1.3.0 the default value of DEFAULTOP has been AND, which typically
makes MORELIKE queries much less useful since they'll only match documents
containing all the terms from the query expansion. We now explicitly insert
" OR " between the terms if DEFAULTOP hasn't been set to OR, which makes them
work much more like they did in 1.2.x.
* Make $stoplist and $unstem consider all query strings by always passing the
new Xapian::QueryParser::FLAG_ACCUMULATE flag.
* Add $foreach command which works like $map, but just concatenates the
evaluated results rather than adding tabs to turn them into an OmegaScript
list.
* Extend $include{} to allow handling failure to open the specified file via an
optional second argument which if specified will be evaluated and returned
instead. Patch from Gaurav Arora.
* Support multiple MORELIKE parameters - we now form an RSet from all the
specified documents and use that to generate the query to run (previously
only one of multiple MORELIKE parameters was used).
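A sketch of the MORELIKE query string construction described above. The function and its arguments are hypothetical; the terms would come from query expansion on the RSet built from the MORELIKE documents.

```python
def morelike_query(terms: list, default_op: str) -> str:
    """Build a MORELIKE query string which matches documents with ANY of the terms.

    If the default operator is already OR a plain space-separated list is
    enough; otherwise insert an explicit " OR " so that an AND default
    doesn't require every expanded term to be present.
    """
    sep = " " if default_op.upper() == "OR" else " OR "
    return sep.join(terms)
```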
Omega 1.4.17 (2020-08-21):
documentation:
* Document comment format supported by scriptindex index scripts. We've
supported comments on a line by themselves and introduced with a # since
scriptindex was first added back in 2002, but it seems they have never
actually been documented before now.
omega:
* Check for SERVER_PROTOCOL=INCLUDED before anything which might throw an
exception so that if it is set we suppress the Content-Type: when reporting
such exceptions. Spotted by Gaurav Arora.
* Report get_description() for Xapian::Error exceptions instead of get_msg().
This means we now report the exception's type, context (useful for network
errors), and errno information.
* Avoid leaking MyStopper object. The object essentially has the lifespan of
omega itself, but becomes unreachable when the QueryParser object is
destroyed. To make it easier to use leak-checking tools, hand ownership of
this object to the QueryParser object.
testsuite:
* omegatest: Tell leak sanitizer not to report leaks for allocations which
aren't explicitly released on exit - the OS will reclaim all memory from the
process at this point and explicitly releasing everything just takes time for
no real benefit. We will still see leaks of objects which become unreachable
during a run.
Omega 1.4.16 (2020-06-08):
indexers:
* Fix handling of XML empty tag syntax when there's a quoted parameter right
before the closing `/>`. This caused `` to treat
the body text as the document title. Spotted by Gaurav Arora.
* omindex: Fix killing of filter child process if the parent process receives a
signal. Spotted by Gaurav Arora.
omega:
* Reject $setrelevant without an argument list. This has never been documented
as allowed, and previously crashed with a segfault. Fixes #802, reported by
Gaurav Arora.
* If there's an error opening the databases we now close any we managed to open
successfully before the error so that things like $dbsize can't end up
reporting values for a subset of the specified databases.
portability:
* Use our own autoconf cache variable namespace (xo_cv_ prefix instead of
ac_cv_) to avoid colliding with standard autoconf macro use if config.site or
a shared config.cache is used. The former case caused a build failure for
the OpenBSD port with 1.4.15, reported by Lucas R.
Omega 1.4.15 (2020-02-24):
documentation:
* Update documentation about how to add a new format to omindex. Patch from
Bruno Baruffaldi.
indexers:
* Check for a BOM on HTML files, which for HTML5 should determine the encoding.
omega:
* Allow $if{COND} without any actions, which is useful as a way to evaluate
something but ignore the result if you just want the side effects. Indeed
we were already recommending using it if you want to ignore the return value
of $log. Fixes bug introduced in 1.4.14, reported by tuftedocelot.
* Add OmegaScript support for $jsonbool{COND} for encoding a boolean value for
use in JSON. This is equivalent to $if{COND,true,false} but more readable.
* Add OmegaScript support for $jsonobject{} which allows producing a JSON
object from an OmegaScript map.
* Allow specifying a format to $jsonarray{} so it is no longer restricted to
producing an array of strings.
* Add $keys{MAP} OmegaScript command which gives a sorted list of the keys from
an OmegaScript map.
portability:
* Simplify probes for snprintf. The broken snprintf in libbsd in Linux libc4
is from ~25 years ago so way too ancient to matter now, and all callers
already handle the pre-ISO semantics of returning -1 for an undersize buffer
so we don't need to run a test program to probe for this at configure time,
which is more cross-compile friendly.
* Avoid deprecation warning on recent Linux. We were including sys/sysctl.h if
it existed, which it does on Linux but we don't actually use it there.
Including it now warns that it is deprecated, so skip including it under
Linux. Reported on IRC by kumaran.
Omega 1.4.14 (2019-11-23):
documentation:
* Improve omindex --help docs for --duplicates.
indexers:
* Add built-in support for iso-8859-15 so we can handle it without iconv.
This charset is a variant of iso-8859-1 with 8 characters changed, most
notably including the euro currency symbol. It's the most commonly seen
charset we didn't have built-in support for.
omega:
* Fix error handling in $lookup. We now check for errors from cdb_init()
and cdb_get(). We've never checked for errors from cdb_init(), while
for cdb_get() this bug was introduced by a warning fix in 1.2.20.
Omega 1.4.13 (2019-10-14):
documentation:
* Document that $log will start to return an error message in 1.5.0, and that
one can wrap it using a $if with no action now to be future-proof.
indexers:
* Optimise converting us-ascii to UTF-8 to do nothing, like we already do when
converting UTF-8 to UTF-8.
* scriptindex:
+ Add new 'gap' action which provides a way to leave a gap in the term
positions between fields to prevent phrases and positional operators from
matching across fields.
templates:
* Future-proof use of $log against changes in 1.5.0.
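A sketch of the idea behind the scriptindex 'gap' action: after each field, the position counter is advanced by some amount so the last word of one field and the first word of the next are never adjacent. The function name and the gap size used below are hypothetical; this only illustrates how positions are spaced.

```python
def index_fields(fields: list, gap: int) -> list:
    """Assign term positions across fields, leaving `gap` unused positions between fields."""
    postings = []
    pos = 0
    for field in fields:
        for word in field.split():
            pos += 1
            postings.append((word, pos))
        pos += gap  # positions skipped here stop phrases spanning fields
    return postings
```

With index_fields(["a b", "c d"], 100), "b" is at position 2 and "c" at 103, so a phrase search for "b c" can no longer match across the field boundary.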
Omega 1.4.12 (2019-07-23):
documentation:
* Improve docs for OmegaScript $hitlist{}.
* Fix RST formatting errors in omega docs.
* Clarify use of Q prefix for unique ID terms - it was described as "reserved",
but the use of "Q" is really just a convention (and in fact omindex uses "U"
not "Q").
* Clarify scriptindex's weight action takes parameter >= 0.
* Correct typo in OmegaScript $add parameter documentation.
indexers:
* omindex:
+ Fix typo in mimetypes used for Apple iWork documents ("apply" instead of
"apple") which meant that these documents weren't actually being indexed.
Patch from Bruno Baruffaldi.
+ Pipe input to ps2pdf as this accepts input on stdin. Possibility pointed
out by Gaurav Arora.
* scriptindex:
+ If parsedate action's format includes %z adjust for the timezone if
possible (this requires the non-POSIX tm_gmtoff member of struct tm)
and flag an error for other platforms.
+ If parsedate action's format includes %Z flag an error, as that doesn't
seem to be usefully supported by strptime() anywhere.
+ Fix parsedate action to treat formats without a timezone as being UTC
instead of localtime.
+ Add date=unixutc. The existing date=unix works in localtime which is
unhelpful if you want to use it on the output of parsedate since that's in
UTC; date=unixutc is just like date=unix except it always works in UTC.
+ The date action now emits a warning for invalid values. The documentation
used to say "invalid values are ignored at present", but it's more helpful
to flag bad data than quietly ignore it.
+ We now check the date action's parameter at script parse time and unknown
values result in an error and nothing being indexed. Previously an unknown
format uselessly resulted in the terms D, M and Y literally being added to
every document.
+ The split action now supports a new "prefixes" split style. This gives all
the prefixes from the split, so split=/,prefixes on a file path gives all
parent directories.
omega:
* Remove documented limitation of $subdb and $subid - the implementation
assumed that each omega database name corresponded to a single Xapian
database, and if a database name referred to a stub database file expanding
to multiple Xapian databases then they would misbehave. Such cases are now
handled properly as well.
* Extend $addfilter to support adding negated filters via a new optional second
argument which specifies the type of filter to add.
* Stop $sort from needlessly ensuring the match has run.
* Handle corner case of nested $hitlist gracefully instead of potentially
entering an infinite loop.
testsuite:
* omegatest: Avoid setting TZ globally during tests as that hides bugs where
behaviour depends on the local timezone when it shouldn't.
* omegatest: Support testing when built using LeakSanitizer by suppressing
leak reports for cached compiled pcre regular expressions. These aren't
released when the program exits but aren't memory leaks.
build system:
* Remove outdated deprecation warning suppression which was there to support
building from git in the run up to 1.3.2 - a development version released
nearly 5 years ago now.
portability:
* Fix problems with fallback strptime() implementation which was being included
in the wrong binary, and was lacking a required const_cast on the return
value.
* Rework setenv() compatibility handling. Now that Solaris 9 is dead we can
assume setenv() is provided by Unix-like platforms (POSIX requires it). For
other platforms, provide a compatibility implementation of setenv() so the
compatibility code is encapsulated in one place rather than replicated at
every use.
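A sketch of the difference between date=unix and date=unixutc, assuming the boolean date terms use the D (YYYYMMDD), M (YYYYMM) and Y (YYYY) prefixes described in omega's term prefix documentation. The function is hypothetical; the point is that only the breakdown of the timestamp changes, gmtime() instead of localtime().

```python
import time


def date_terms(timestamp: int, utc: bool = True) -> list:
    """Boolean date terms for a Unix timestamp.

    utc=True mirrors date=unixutc; utc=False mirrors date=unix, whose result
    depends on the local timezone of the machine doing the indexing.
    """
    tm = time.gmtime(timestamp) if utc else time.localtime(timestamp)
    return [time.strftime(fmt, tm) for fmt in ("D%Y%m%d", "M%Y%m", "Y%Y")]
```

Since parsedate now produces UTC, pairing it with the UTC variant gives the same terms regardless of the indexer's TZ setting.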
Omega 1.4.11 (2019-03-02):
indexers:
* omindex:
+ outlookmsg2html: Handle Subject, Date, and From headers.
omega:
* In $div and $mod we were converting a non-zero denominator from string to int
twice for no good reason.
testsuite:
* omegatest: Fix testcase which was failing if the local timezone was behind
UTC. This testcase was added in 1.4.10.
* omegatest: Tweak to not fail when $time not supported - it seems that the
OS time functions we use report an error on GNU Hurd for unknown reasons.
build system:
* Sync up probes for OS time functions in omega's configure with those in
xapian-core which may solve $time not being supported on GNU Hurd.
portability:
* Add missing includes of . Fixes #776, reported by Matthieu Gautier.
* Stop using htonl()/ntohl() in a non-network context which should improve
portability to platforms without a POSIX-like socket API.
Omega 1.4.10 (2019-02-12):
documentation:
* Use https for URLs where supported.
indexers:
* omindex:
+ Index .apxl and .kth files as Apple Keynote. The .apxl extension is used
for the XML files inside .key bundles/directories which hold the text
content of the presentation, and by handling them we can index .key
directories more usefully. It seems they are also sometimes found by
themselves. Keynote themes have a .kth extension, and key2text can also
handle these.
+ Pipe input to pdftotext, pdfinfo and dpkg. These tools all support piping
an input file on stdin, which can be a little more efficient when we
already have the file open (e.g. to determine its type using libmagic, or
to calculate its checksum).
+ An empty string for the start directory is now flagged as an error.
Previously `/` was used instead, which is unlikely to be what is wanted
(and `/` can be explicitly specified if that really is what is wanted).
+ Fix emulation of stderr redirection when the indexer's stderr has been
closed. We try to avoid using the shell when running external filters, and
emulate 2>/dev/null in commands, but if the indexer's stderr was closed
this emulation was buggy and would give the filter a closed stderr
instead of one redirected to /dev/null.
+ When emulating redirection to /dev/null, we now open /dev/null once and
dup that fd each time which is a little more efficient and simplifies the
code.
* scriptindex:
+ date=unix is now a no-op for empty input - previously it would unhelpfully
add boolean date terms for 1970-01-01.
+ Warn for empty filename in LOAD action. Previously this gave a slightly
confusing error: "Couldn't load file '': No such file or directory"
+ Unknown command-line options now cause scriptindex to give a non-zero exit
status.
testsuite:
* omegatest: Add testcase for SPAN.n on different slots.
* omegatest: Update expected QueryParser output for the xapian-core change to
produce flatter Query trees.
build system:
* Use AM_ICONV to detect iconv() which should handle non-system install of GNU
libiconv properly. Fixes #775, reported by Ryan Schmidt.
portability:
* Provide fall-back strptime() implementation for platforms which don't provide
it, using the C++11 std::get_time() function. We use strptime() directly
where it's available as some older C++11 compilers seem to lack
std::get_time() (GCC 4.8 for example). This is used by the parsedate action,
which was added in 1.4.6.
Omega 1.4.9 (2018-11-02):
indexers:
* omindex:
+ Try harder to avoid opening a file being indexed more than once by
reusing the file descriptor in more cases.
+ Hint to the OS not to cache output from external filters which require
using a temporary file.
* scriptindex:
+ If the LOAD action successfully opens a file but hits a read error the
error message now reports the file name correctly. Previously it would
report the partial file contents read so far instead of the file name.
portability:
* We no longer call posix_fadvise() with POSIX_FADV_NOREUSE under Linux,
since it's still not implemented there. We also now only call
posix_fadvise() with POSIX_FADV_DONTNEED right before we close the file
descriptor under Linux.
Omega 1.4.8 (2018-10-25):
documentation:
* Assorted minor documentation improvements.
indexers:
* omindex:
+ Improve date handling in .eml files. We now handle a "Date:" header
without the day of the week, which is allowed by RFC822 and RFC2822
(though seems rare in practice). If the date can't be parsed, we now
just omit the date information rather than failing to process the file.
+ Add support for indexing Apple iWork documents (Keynote (.key), Numbers
(.numbers) and Pages (.pages)) using libetonyek. Currently only the file
variants are handled since omindex doesn't currently support indexing a
directory as a document.
+ Index Visio files using vsd2xhtml.
+ Extend --filter to support filters which produce SVG as output.
+ Handle SVG embedded in XML with svg: namespace prefix.
+ Add --read-filters option to read a list of filters from a file, each line
of which is a rule as passed to --filter. Based on a patch from Gaurav
Arora.
+ Add new --mime-type-match option which allows specifying a MIME
Content-Type for a given shell filename pattern (with the special
Content-Type values "ignore" and "skip" supported, as for --mime-type).
+ Adjust --mime-type to allow ':' in the extension. A valid MIME
Content-Type can't contain a colon, so if the argument to --mime-type
contains more than one colon it makes more sense to split at the *last*
colon (we used to split at the first), as an extension could conceivably
contain a colon. Mostly this change is for consistency with the new
--mime-type-match option, where the leafname pattern could reasonably
contain a colon.
+ Remove failed entries for ignored files. If a file is mapped to
pseudo-mimetype "ignore" then remove any existing failure record for it so
that we don't potentially end up with a lot of cruft failure records for
files we are no longer trying to index.
+ If a file fails to index due to failing to allocate enough memory we now
try to flag it as failed to index so it will be skipped by default on
future runs. This should help to avoid indexing getting stuck on
problematic files.
+ Add a "pages" field with the number of pages in the document where we
know how to determine this (currently only for PDF files for which pdfinfo
reports this information).
+ Handle initially empty database exactly the same way as when --overwrite
is specified. This probably has no user-visible consequences, but it's
cleaner for the handling to be exactly the same.
* scriptindex:
+ Improve scriptindex diagnostic messages. All diagnostics are now labelled
as "error", "warning" or "note" as appropriate, and we now consistently
report "FILE:LINE:" (and also "COLUMN:" in most cases) to make it clearer
where the problem lies.
+ Add new "split" action which splits the text on a specified delimiter and
executes the following actions for each piece. Based on a patch by Gaurav
Arora.
+ Missing whitespace after the closing " on an action argument is now
flagged as an error. Previously scriptindex would attempt to parse
the following characters as the next action.
+ Support C-like escapes for quoted parameter values. Notably this means it
is now possible to include `"` in quoted parameter values.
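The C-like escapes now accepted in quoted scriptindex parameter values can be
illustrated with a small unescaping routine. This is a hypothetical sketch,
not scriptindex's parser: unquote() is an invented name, and it assumes just
four escapes (\\, \", \n and \t), whereas the set scriptindex actually
supports may be larger.

```cpp
#include <optional>
#include <string>

// Hypothetical sketch: unescape a double-quoted parameter value such as
// "say \"hi\"".  Only \\, \", \n and \t are handled here (an assumption;
// the real scriptindex parser may accept more escapes).
// `s` must start with the opening quote.  Returns std::nullopt for a
// malformed value: an unknown escape or a missing closing quote.
std::optional<std::string> unquote(const std::string& s) {
    if (s.empty() || s[0] != '"') return std::nullopt;
    std::string out;
    for (std::string::size_type i = 1; i < s.size(); ++i) {
        char ch = s[i];
        if (ch == '"') return out;  // unescaped quote ends the value
        if (ch != '\\') {
            out += ch;
            continue;
        }
        if (++i == s.size()) break;  // backslash at end of input
        switch (s[i]) {
            case '\\': out += '\\'; break;
            case '"': out += '"'; break;
            case 'n': out += '\n'; break;
            case 't': out += '\t'; break;
            default: return std::nullopt;  // unknown escape
        }
    }
    return std::nullopt;  // no closing quote
}
```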
omega:
* Value-based date range filters can now be specified via CGI parameters
START.N, END.N and/or SPAN.N where N is a value slot number, allowing
multiple concurrent filters on different slots to be specified.
* Support YYYY and YYYYMM limits in term-based date ranges. Previously
value-based date ranges supported these as limits, but term-based date
ranges gave an error.
* Add stem_strategy option and deprecate existing stem_all option in favour
of this new more versatile option.
* Support "natural" $sort option via new flag "#" which sorts embedded
natural numbers in numerical order.
* Support numeric $sort option via new flag "n", similar to GNU sort -n.
* Rewrite field parsing to be more efficient, and store fields in an
unordered_map for faster lookup.
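To illustrate what sorting "embedded natural numbers in numerical order"
means for the new $sort "#" flag, here is a hypothetical comparison function
in C++. It is not Omega's implementation; natural_less() is an invented name
and its handling of leading zeros is an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical "natural" comparison in the spirit of $sort's "#" flag:
// runs of ASCII digits are compared by numeric value, everything else
// byte by byte.  Leading zeros are ignored when comparing digit runs, so
// "a01" and "a1" compare equal here; Omega's exact rules may differ.
bool natural_less(const std::string& a, const std::string& b) {
    auto is_digit = [](char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    };
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (is_digit(a[i]) && is_digit(b[j])) {
            // Skip leading zeros in both digit runs.
            while (i < a.size() && a[i] == '0') ++i;
            while (j < b.size() && b[j] == '0') ++j;
            std::size_t si = i, sj = j;
            while (i < a.size() && is_digit(a[i])) ++i;
            while (j < b.size() && is_digit(b[j])) ++j;
            std::size_t li = i - si, lj = j - sj;
            // More significant digits means a larger number.
            if (li != lj) return li < lj;
            // Same length: digit-by-digit comparison matches numeric order.
            int c = a.compare(si, li, b, sj, lj);
            if (c != 0) return c < 0;
        } else {
            if (a[i] != b[j])
                return static_cast<unsigned char>(a[i]) <
                       static_cast<unsigned char>(b[j]);
            ++i;
            ++j;
        }
    }
    // Everything compared so far is equal: the shorter remainder sorts first.
    return (a.size() - i) < (b.size() - j);
}
```

With this comparator, {"file10", "file2", "file1"} sorts as file1, file2,
file10, whereas a plain string sort gives file1, file10, file2.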
testsuite:
* htmlparsetest: Test whitespace collapsing.
portability:
* omegatest: Avoid "set -". The autoconf manual notes that POSIX no longer
requires this, and that with traditional shells it resets -v and -x which
makes debugging harder.
* omegatest: Fix shell printf quoting issues which were a latent bug on macOS.
* Drop special handling for Compaq C++. We never actually achieved a working
build using it, and I can find no evidence that this compiler still exists,
let alone that it was updated for C++11 which we now require.
Omega 1.4.7 (2018-07-19):
omega:
* New OmegaScript $unique command. The existing $uniq only removes adjacent
entries (like the Unix uniq command) so to fully remove duplicates you need a
sorted input. Sometimes it is desirable to remove duplicates from an
unsorted list without changing the order of the entries which are left, so
add $unique to do that. If the list is sorted already, then $uniq is more
efficient.
* Fix $map to cleanly reject a single argument.
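The difference between $uniq and the new $unique can be sketched in C++,
using a std::vector<std::string> as a stand-in for an OmegaScript list. Both
function names are hypothetical; this shows the two behaviours described
above, not Omega's code.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Like $uniq (and Unix uniq): drop an entry only if it equals the entry
// immediately before it, so duplicates survive unless the input is sorted.
std::vector<std::string> uniq_adjacent(const std::vector<std::string>& in) {
    std::vector<std::string> out;
    for (const auto& s : in)
        if (out.empty() || out.back() != s) out.push_back(s);
    return out;
}

// Like $unique: keep the first occurrence of each entry and preserve the
// original order, which works on unsorted input at the cost of a hash set.
std::vector<std::string> unique_stable(const std::vector<std::string>& in) {
    std::unordered_set<std::string> seen;
    std::vector<std::string> out;
    for (const auto& s : in)
        if (seen.insert(s).second) out.push_back(s);
    return out;
}
```

For the unsorted list b, a, b, b the first function gives b, a, b while the
second gives b, a.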
templates:
* templates/query: Merge multiple entries in the term frequency information,
which came from searching several prefixes by default. Reported by Alistair
Buxton on #xapian-discuss.
* When multiple words with the same stem are in the query string we now fully
eliminate duplicates when showing term frequency information.
Omega 1.4.6 (2018-07-02):
general:
* Fix generate_sample() (used by OmegaScript $truncate and omindex) to return
an empty sample instead of throwing an exception when the requested sample
size is less than the size of the truncation indicator string. Patch from
Addy. Fixes https://trac.xapian.org/ticket/754 reported by Gaurav Arora.
documentation:
* Use terminology "value slot number" instead of "value number".
* Stop talking about "probabilistic terms" and "probabilistic queries" - we've
supported other families of weighting schemes since 1.3.2.
indexers:
* Check for the HTML5 doctype or legacy doctype declaration and use default
charset UTF-8 if either is present. Previously we always used ISO-8859-1,
which is correct for older HTML versions, but not for HTML5.
* omindex:
+ When running commands without going through the shell, emulate shell exit
codes 127 (for command not found) and 126 (for other cases where we fail to
run the command). This means the "missing filter" handling should now work
properly for such commands. Noted by Gaurav Arora.
+ Index POD files despite minor formatting errors. We now pass
--errors=stderr to pod2text so that minor formatting errors don't prevent
us from indexing a file. (It may seem that --errors=none is a better
option, but for podlators < 4.11 that results in an ERRATA section in the
generated text version which we then end up indexing; 4.11 fixed that but
we can't assume that's in use). Reported by Gaurav Arora.
* scriptindex:
+ Avoid some unnecessary copying of Action objects by making use of C++11
features.
+ Consistently send errors to stderr - some were sent to stdout.
Patch from Gaurav Arora.
+ Add new "hextobin" action. Based on a patch from Gaurav Arora.
+ Warn about non-integer arg to hash.
+ Fix hash action without an argument, which was failing with an assertion.
Based on a patch by Gaurav Arora: https://github.com/xapian/xapian/pull/189
+ Reject 'hash' with argument < 6. The hashing truncates and then adds a
6 character hash of the removed part, so can't produce a result shorter
than 6 characters. Patch from Gaurav Arora.
+ Look for alphanumerics when parsing index actions. None of the current
index actions contain digits, but we give more helpful error messages this
way.
+ Deprecate allowing spaces around = in scripts. This was never documented
as supported, and leads to a missing argument quietly swallowing the next
action rather than using an empty value or giving an error. Reported by
Gaurav Arora in https://github.com/xapian/xapian/pull/182
+ In boolean and unique actions, add a colon between prefix and term when
the term starts with a colon. This means the mapping is reversible, and
matches what omega actually does in this case when it tries to reverse the
mapping. Thanks to Andy Chilton for pointing out this corner case.
+ Add parsedate and valuepacked actions. Together these assist adding date
values for sorting and date range filtering. Based on a patch from Gaurav
Arora.
+ Use DB_RETRY_LOCK to wait if the database is already in use rather than
sleeping for a second and retrying. On most platforms this means we make a
blocking request for the lock, and even on platforms where that's not
supported, we now sleep and retry inside libxapian, and without having to
throw and catch an exception each time.
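The reason 'hash' rejects an argument below 6 follows from the scheme
described above: the kept prefix plus a 6 character hash of the removed part
can never be shorter than 6 characters. A hypothetical C++ sketch of that
length arithmetic follows; hash_truncate() is an invented name, and the
FNV-1a based hash6() is an arbitrary stand-in since the hash function
scriptindex actually uses isn't described here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <optional>
#include <string>

// Stand-in 6-character digest: the low 24 bits of 32-bit FNV-1a as hex.
// This is NOT scriptindex's hash, just something of the right length.
std::string hash6(const std::string& s) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;
    }
    char buf[7];
    std::snprintf(buf, sizeof(buf), "%06x", static_cast<unsigned>(h & 0xffffffu));
    return buf;
}

// Shorten `value` to at most `len` characters by keeping the first
// len - 6 characters and appending a 6-character hash of the removed
// tail.  The hash alone needs 6 characters, so len < 6 is rejected.
std::optional<std::string> hash_truncate(const std::string& value,
                                         std::size_t len) {
    if (len < 6) return std::nullopt;
    if (value.size() <= len) return value;  // already short enough
    std::size_t keep = len - 6;
    return value.substr(0, keep) + hash6(value.substr(keep));
}
```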
omega:
* $freq: Speed up some cases by avoiding throwing and catching an exception
when we know the MSet has no term frequency information.
* $sort: New OmegaScript command which does a string sort on an OmegaScript
list, with u (unique) and r (reverse) options.
* $cond: New OmegaScript multi-way conditional. Inspired by LISP's COND, this
provides a neater way to write a cascade of $if checks.
* $switch: New OmegaScript multi-way conditional which provides an even neater
way to write a cascade of $if{$eq{X,VALUE1},$if{$eq{X,VALUE2},...}}.
* $subdb and $subid: New commands which report the subdatabase name and the
docid in that subdatabase.
* $termprefix and $unprefix: New OmegaScript commands which expose the existing
code inside omega for splitting up a term.
* Use str() to convert time_t to string, which is simpler code and faster than
using snprintf().
testsuite:
* omegatest: Fix message when faketime is not installed - we were misreporting
this case as "faketime not working".
* omegatest: Add feature tests of $map.
* Add testcases for XML charset. We already handle both default and specified
charsets for XML, but we didn't have any testcases for it.
build system:
* configure: Fix potentially confusing messages suggesting snprintf was added
in C90 - it was actually standardised in C99.
* Improve handling of multitarget rule stamp files. Clean them on "make
maintainer-clean" and ship them so that --enable-maintainer-mode when
building from a tarball doesn't needlessly rerun the multitarget rules.
portability:
* Check for EAGAIN as well as EINTR from select(). The Linux select(2) man
page says: "Portable programs may wish to check for EAGAIN and loop, just as
with EINTR" and that seems to be necessary for Cygwin at least.
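A sketch in C++ (POSIX) of the retry loop described in the select() entry
above, for the simple case of waiting for a single file descriptor to become
readable. The function wait_readable() is hypothetical and not Omega's code;
it rebuilds the fd_set on each attempt because select() overwrites it.

```cpp
#include <cerrno>
#include <sys/select.h>
#include <unistd.h>  // pipe(), write(), close() for the example below

// Wait until fd is readable, retrying if select() fails with EINTR or
// EAGAIN.  Returns select()'s result: 1 if fd is readable, 0 on timeout,
// or -1 on any other error (with errno set).
int wait_readable(int fd, timeval* timeout) {
    while (true) {
        fd_set fds;
        FD_ZERO(&fds);
        FD_SET(fd, &fds);
        int r = select(fd + 1, &fds, nullptr, nullptr, timeout);
        if (r >= 0) return r;
        if (errno != EINTR && errno != EAGAIN) return -1;
        // Retry.  Linux updates *timeout to the time not slept, so the
        // retry only waits for the remainder; other platforms may not.
    }
}
```

For example, after writing a byte into a pipe() created by the caller,
wait_readable() on the read end returns 1.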
packaging:
* Use https for tarball URLs in .spec files. This provides protection against
MITM attacks on people building packages using these spec files, and is also
slightly more efficient as the http: URLs redirect to the https: versions
anyway.
Omega 1.4.5 (2017-10-16):
documentation:
* Direct users towards $set{flag_spelling_correction,true} rather than the
deprecated $set{spelling,true} (which is slated for removal in 1.5.0).
* Fix typo in docs.
indexers:
* omindex:
+ Check file size before calling libmagic to get the mime type, since
reading the file size is a much cheaper check and we can skip the
libmagic test if the file is empty or larger than the specified
maximum size. Patch from caiyulun.
* scriptindex:
+ Reject index scripts with multiple "unique" actions. We don't handle this
case sensibly, and it doesn't seem like it really has a use, so better to
give an error for people who do this inadvertently.
omega:
* New $seterror command to set the error message. Implemented by Gaurav Arora.
* Make $highlight more efficient. Patch from Vivek Pal.
templates:
* query: Use $prettyurl for the URL shown at the end of each match (previously
we only used it on the URL shown as a fallback when the document has no
title). Split off from changes by Vivek Pal in
https://github.com/xapian/xapian/pull/161
testsuite:
* omegatest: Tell faketime to freeze the clock - previously the clock ran on
from the specified fake time, and on a slow and/or heavily loaded machine a
test taking more than a second might fail due to this.
* Start adding feature tests for scriptindex (so far, checking that specifying
multiple 'unique' actions results in an error).
Omega 1.4.4 (2017-04-19):
indexers:
* omindex:
+ 1.4.3 added a new --sample option, but contrary to the documentation
the default behaviour was to take the sample from the meta description
(which was the hard-wired behaviour in 1.4.2 and earlier). The default
has now been changed to take the sample from the body.
+ Index .shtm, .xhtml and .xhtm as HTML by default - .shtm is another
extension used for server-parsed HTML (in addition to the more common
.shtml), and .xhtm and .xhtml are XHTML.
+ Fix fallback lookup for extension containing upper case. User mappings
worked, but built-in extension to MIME type mappings were effectively being
ignored (because the result of the function call was not being checked).
Bug introduced in 1.3.4.
+ Fix term-based date ranges, broken by changes in 1.4.2. Found and
diagnosed by Gaurav Arora.
+ Handle date range with start after end better - with term-based ranges,
this used to generate a bogus filter, but now just generates Dlatest.
+ Use Y-term when range starts/ends at year start/end. Previously we used 12
M-terms for these cases.
+ Use full leap-year check when constructing term-based date ranges -
previous code was good until 2100, but even then it would only result
in an extra term being included for a non-existent February 29th in
rare cases.
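The full leap-year rule referred to in the last entry, as a small C++ sketch.
A plain divisible-by-4 test is correct for every year from 1901 to 2099,
which fits the note that the previous code was good until 2100, though the
exact form of the old check isn't stated. Both function names are
hypothetical.

```cpp
// Full Gregorian leap-year rule.  A bare "year % 4 == 0" test wrongly
// treats 1900 and 2100 as leap years; centuries are leap years only when
// divisible by 400 (e.g. 2000).
constexpr bool is_leap_year(int year) {
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

// Days in a month (1-12), using the full leap-year check for February.
constexpr int days_in_month(int year, int month) {
    constexpr int days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    return (month == 2 && is_leap_year(year)) ? 29 : days[month - 1];
}
```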
omega:
* New OmegaScript command $cgiparams which returns a list of the parameter
names.
* Handle tab in a CGI parameter name in the same way as space. Mostly this is
a way to avoid having tabs in CGI parameter names - they aren't useful, and
if names could contain tabs we couldn't put CGI parameter names in a list.
templates:
* query: Fix highlighting of matching terms. We were using both $snippet and
$highlight, which results in double highlighting and HTML escaping, most
noticeable as literal HTML tags appearing around matching terms in the
rendered HTML snippet. Reported by Mark Thomas on xapian-discuss.
build system:
* Fix the mimemap.h rule so it gets rerun if gen-mimemap fails after creating
mimemap.h. Previously the rule wouldn't get rerun in this case.
Omega 1.4.3 (2017-01-25):
indexers:
* omindex:
+ Add support for indexing vCard files if Perl and its Text::vCard module
are available.
+ Recognise application/x-rpm as alternative type since libmagic reports this
rather than application/x-redhat-package-manager.
+ Use official MIME type application/vnd.debian.binary-package for Debian
packages. We used to map .deb and .udeb to application/x-debian-package,
but in 2014 (after we added that support for .deb) an official type was
registered with IANA. We now map extensions .deb and .udeb to the official
type, but the unofficial type is still recognised (older versions of
libmagic probably report it, and users may be mapping to it).
+ Handle PHP as MIME type text/x-php. The main difference this makes is that
PHP files which don't have extension '.php' (e.g. .phtml, .phps, .php5,
.ph4, etc) get identified by libmagic as text/x-php and will now be indexed.
It also means that the user can now more easily configure different filters
for HTML and PHP.
+ Don't use meta description as sample by default. Now we have dynamic
snippets (via $snippet), the body text is a better default. Also generated
HTML sometimes has unhelpful content in the meta description. To get the
previous behaviour, use the new omindex command line option:
--sample=description
Omega 1.4.2 (2016-12-26):
documentation:
* Replace auto-generated list of the supported MIME types with an
auto-generated table showing the extensions that are mapped to each MIME type
by default. Partly addresses #569, reported by catkin.
indexers:
* omindex: Add support for indexing markdown files (extension .md or .markdown,
mime-type text/markdown, using "markdown" to convert to HTML).
testsuite:
* Add support for "make installcheck" to run tests against installed version.
build system:
* configure: Fail with clear error with xapian-core < 1.4.0.
portability:
* Fix GCC -Wimplicit-fallthrough warning.
* Add missing header #include for time_t.
* Avoid snprintf for formatting fixed-width integers - it results in warnings
about possible output truncation with GCC7 (which aren't actually possible
due to limited input range) and it's a bit heavyweight for this job anyway.
Omega 1.4.1 (2016-10-21):
documentation:
* Document bug in how $filters encodes DOCIDORDER=A.
* Suggest DOCIDORDER=X for DONT_CARE.
* Correct mentions of C++ API method MSet::get_snippet() to MSet::snippet().
* Fix typo in Omega 1.4.0 NEWS entry. Patch from James Aylett.
indexers:
* omindex: Also index leafname with _ and & replaced by spaces. Literal spaces
are often avoided in filenames, and "hello_world.txt" ought to be searchable
for via "hello" and "world". Partly addresses #618, reported by Julien
Pfefferkorn.
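A minimal sketch of the leafname transformation described in the omindex
entry above. leafname_for_indexing() is a hypothetical name, and how omindex
then generates terms from the result isn't shown.

```cpp
#include <string>

// Hypothetical helper: replace '_' and '&' in a leafname with spaces, so
// that "hello_world.txt" yields separate words "hello" and "world" when
// the text is later split into terms.
std::string leafname_for_indexing(std::string leaf) {
    for (char& ch : leaf)
        if (ch == '_' || ch == '&') ch = ' ';
    return leaf;
}
```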
omega:
* Add support for sorting by more than one value - e.g. SORT=+1,-2
* Add $msizelower and $msizeupper which provide access to the lower and upper
bounds on the number of matches.
* Add support for $set{weighting,coord}.
* Add weightingpurefilter option. Normally a query consisting only of filter
terms won't have relevance weights calculated. This new option allows you to
specify a weighting scheme to use for such queries, with the same values
supported as for the existing weighting option. For example,
$set{weightingpurefilter,coord} will weight such queries by how many filter
terms match each document.
* $filters now includes DATEVALUE. This means that after upgrading to 1.4.1,
reloading or changing page starting from an existing URL will force the
first page. There's an inherent ambiguity here, as the exact same existing
URL could be for a search without the date filter where we do want to force
the first page. Forcing the first page in this case seems the least
problematic side-effect. Omission noted by Gaurav Arora.
testsuite:
* Add feature test for boolprefix and prefix maps.
* Add more feature tests for $filters.
build system:
* GCC 4.7 is now enforced as the minimum version.
* Drop unused configure check for symbol visibility
* Drop compiler options that are no longer useful:
+ -fshow-column is the default in all GCC versions we now support
(checked as far back as GCC 4.6).
+ -Wno-long-long is no longer necessary now that we require C++11 where
"long long" is a standard type.
portability:
* Fix build on platforms which don't provide timegm(), such as Cygwin.
Reported on xapian-discuss by John Bankert.
Omega 1.4.0 (2016-06-24):
documentation:
* Clarify $allterms and $terms documentation. Make it clearer how they differ,
and document that $allterms without a parameter list gives all terms indexing
the current hit. Noted by Andy Chilton.
Omega 1.3.7 (2016-06-01):
indexers:
* Make named entity look-up (e.g. &eacute; -> 233) use the same keyword-lookup
table approach we already use for HTML tags and built-in MIME content-types,
rather than a std::map, which makes it faster while using less memory.
Omega 1.3.6 (2016-05-09):
documentation:
* Fix overview.rst processing in VPATH build. Our workaround for lack of an
include path in docutils was only handling the first include in the file.
omega:
* Implement $match command for omegascript. Patch from Richhiey Thomas.
templates:
* Lower case all HTML tags, attributes and values; explicitly close