Omega 1.4.5 (2017-10-16):
documentation:
* Direct users towards $set{flag_spelling_correction,true} rather than the
deprecated $set{spelling,true} (which is slated for removal in 1.5.0).
* Fix typo in docs.
indexers:
* omindex:
+ Check file size before calling libmagic to get the mime type, since
reading the file size is a much cheaper check and we can skip the
libmagic test if the file is empty or larger than the specified
maximum size. Patch from caiyulun.
* scriptindex:
+ Reject index scripts with multiple "unique" actions. We don't handle this
case sensibly, and it doesn't seem like it really has a use, so better to
give an error for people who do this inadvertently.
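The omindex change above runs the cheap check before the expensive one. A minimal Python sketch of that ordering follows; it is not omindex's actual code. `detect_mime` is a hypothetical stand-in for the libmagic lookup, and treating a `max_size` of 0 as "no limit" is an assumption made for this illustration.

```python
import os

def mime_type_or_skip(path, max_size, detect_mime):
    """Return a MIME type, or None if the file should be skipped.

    detect_mime is a hypothetical callable standing in for the
    (comparatively expensive) libmagic lookup.
    """
    size = os.path.getsize(path)  # cheap: a single stat() call
    if size == 0:
        return None  # empty file: nothing to index
    if max_size and size > max_size:
        return None  # too large: skip without sniffing the contents
    return detect_mime(path)  # only now pay for content sniffing
```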
omega:
* New $seterror command to set the error message. Implemented by Gaurav Arora.
* Make $highlight more efficient. Patch from Vivek Pal.
templates:
* query: Use $prettyurl for the URL shown at the end of each match (previously
we only used it on the URL shown as a fallback when the document has no
title). Split off from changes by Vivek Pal in
https://github.com/xapian/xapian/pull/161
testsuite:
* omegatest: Tell faketime to freeze the clock - previously the clock ran on
from the specified fake time, and on a slow and/or heavily loaded machine a
test taking more than a second might fail due to this.
* Start adding feature tests for scriptindex (so far, checking that specifying
multiple 'unique' actions results in an error).
Omega 1.4.4 (2017-04-19):
indexers:
* omindex:
+ 1.4.3 added a new --sample option, but contrary to the documentation
the default behaviour was to take the sample from the meta description
(which was the hard-wired behaviour in 1.4.2 and earlier). The default
has now been changed to take the sample from the body.
+ Index .shtm, .xhtml and .xhtm as HTML by default - .shtm is another
extension used for server-parsed HTML (in addition to the more common
.shtml), and .xhtm and .xhtml are XHTML.
+ Fix fallback lookup for extension containing upper case. User mappings
worked, but built-in extension to MIME type mappings were effectively being
ignored (because the result of the function call was not being checked).
Bug introduced in 1.3.4.
+ Fix term-based date ranges, broken by changes in 1.4.2. Found and
diagnosed by Gaurav Arora.
+ Handle date range with start after end better - with term-based ranges,
this used to generate a bogus filter, but now just generates Dlatest.
+ Use Y-term when range starts/ends at year start/end. Previously we used 12
M-terms for these cases.
+ Use full leap-year check when constructing term-based date ranges -
previous code was good until 2100, but even then it would only result
in an extra term being included for a non-existent February 29th in
rare cases.
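To illustrate the leap-year entry above, here is a small Python sketch contrasting the full Gregorian rule with a divisible-by-4 shortcut. The shortcut is only an assumed stand-in for the "previous code" (the entry does not say what form it took); it agrees with the full rule for 1901-2099, which matches "good until 2100".

```python
def is_leap_year(year):
    """Full Gregorian rule: every 4th year, except centuries
    which are not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def is_leap_year_simple(year):
    """Divisible-by-4 shortcut (assumed form of the older check)."""
    return year % 4 == 0

def days_in_february(year):
    """Number of day terms a range covering all of February needs."""
    return 29 if is_leap_year(year) else 28
```

With the shortcut, a range covering February 2100 would include a term for the non-existent 29th, which is harmless because no document can be indexed with that date.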
omega:
* New OmegaScript command $cgiparams which returns a list of the parameter
names.
* Handle tab in a CGI parameter name in the same way as space. Mostly this is
a way to avoid having tabs in CGI parameter names - they aren't useful there,
and if names could contain tabs we couldn't put CGI parameter names in a list.
templates:
* query: Fix highlighting of matching terms. We were using both $snippet and
$highlight, which results in double highlighting and HTML escaping, most
noticeable as literal highlighting tags appearing around matching terms
in the rendered HTML snippet. Reported by Mark Thomas on xapian-discuss.
build system:
* If gen-mimemap failed after creating mimemap.h, the rule wouldn't get rerun.
Omega 1.4.3 (2017-01-25):
indexers:
* omindex:
+ Add support for indexing vCard files if Perl and its Text::vCard module
are available.
+ Recognise application/x-rpm as alternative type since libmagic reports this
rather than application/x-redhat-package-manager.
+ Use official MIME type application/vnd.debian.binary-package for debian
packages. We used to map .deb and .udeb to application/x-debian-package,
but in 2014 (after we added that support for .deb) an official type was
registered with IANA. We now map extensions .deb and .udeb to the official
type, but the unofficial type is still recognised (older versions of
libmagic probably report it, and users may be mapping to it).
+ Handle PHP as MIME type text/x-php. The main difference this makes is that
PHP files which don't have extension '.php' (e.g. .phtml, .phps, .php5,
.ph4, etc) get identified by libmagic as text/x-php and will now be indexed.
It also means that the user can now more easily configure different filters
for HTML and PHP.
+ Don't use meta description as sample by default. Now we have dynamic
snippets (via $snippet), the body text is a better default. Also generated
HTML sometimes has unhelpful content in the meta description. To get the
previous behaviour, use the new omindex command line option:
--sample=description
Omega 1.4.2 (2016-12-26):
documentation:
* Replace auto-generated list of the supported MIME types with an
auto-generated table showing the extensions that are mapped to each MIME type
by default. Partly addresses #569, reported by catkin.
indexers:
* omindex: Add support for indexing markdown files (extension .md or .markdown,
mime-type text/markdown, using "markdown" to convert to HTML).
testsuite:
* Add support for "make installcheck" to run tests against installed version.
build system:
* configure: Fail with clear error with xapian-core < 1.4.0.
portability:
* Fix GCC -Wimplicit-fallthrough warning.
* Add missing #include for time_t.
* Avoid snprintf() for formatting fixed-width integers - it results in warnings
about possible output truncation with GCC7 (which aren't actually possible
due to limited input range) and it's a bit heavyweight for this job anyway.
Omega 1.4.1 (2016-10-21):
general:
documentation:
* Document bug in how $filters encodes DOCIDORDER=A.
* Suggest DOCIDORDER=X for DONT_CARE.
* Correct mentions of C++ API method MSet::get_snippet() to MSet::snippet().
* Fix typo in Omega 1.4.0 NEWS entry. Patch from James Aylett.
indexers:
* omindex:
+ Also index leafname with _ and & replaced by spaces. Literal spaces are
often avoided in filenames, and "hello_world.txt" ought to be searchable
via "hello" and "world". Partly addresses #618, reported by Julien
Pfefferkorn.
omega:
* Add support for sorting by more than one value - e.g. SORT=+1,-2
* Add $msizelower and $msizeupper which provide access to the lower and upper
bounds on the number of matches.
* Add support for $set{weighting,coord}.
* Add weightingpurefilter option. Normally a query consisting only of filter
terms won't have relevance weights calculated. This new option allows you to
specify a weighting scheme to use for such queries, with the same values
supported as for the existing weighting option. For example,
$set{weightingpurefilter,coord} will weight such queries by how many filter
terms match each document.
* $filters now includes DATEVALUE. After upgrading to 1.4.1, this means we'll
force the first page when reloading or changing page starting from an existing
URL. The exact same existing URL could instead be for a search without the
date filter, where we do want to force the first page, so there's an inherent
ambiguity; forcing the first page in this case seems the least problematic
side-effect. Omission noted by Gaurav Arora.
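As an illustration of the multi-value SORT syntax above (e.g. SORT=+1,-2, with '+' for ascending and '-' for descending as described in the 1.3.5 entry), here is a hedged Python sketch of parsing such a specification and applying it. It is not Omega's implementation: the function names are hypothetical, and the direction used when no prefix is given is left as a parameter because these notes do not state it.

```python
def parse_sort_spec(spec, default_ascending=False):
    """Parse e.g. "+1,-2" into [(1, True), (2, False)].

    Each pair is (value slot, ascending). default_ascending is an
    assumption covering entries which have no '+' or '-' prefix.
    """
    keys = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        ascending = default_ascending
        if part[0] in "+-":
            ascending = (part[0] == "+")
            part = part[1:]
        keys.append((int(part), ascending))
    return keys

def sort_rows(rows, keys):
    """Order rows (dicts mapping slot -> value) by several keys.

    Applies stable sorts from the least significant key to the most
    significant one, so earlier keys take priority.
    """
    for slot, ascending in reversed(keys):
        rows = sorted(rows, key=lambda r: r[slot], reverse=not ascending)
    return rows
```

For example, `sort_rows(rows, parse_sort_spec("+1,-2"))` orders by slot 1 ascending and breaks ties by slot 2 descending.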
testsuite:
* Add feature test for boolprefix and prefix maps.
* Add more feature tests for $filters.
build system:
* GCC 4.7 is now enforced as the minimum version.
* Drop unused configure check for symbol visibility.
* Drop compiler options that are no longer useful:
+ -fshow-column is the default in all GCC versions we now support
(checked with GCC 4.6).
+ -Wno-long-long is no longer necessary now that we require C++11 where
"long long" is a standard type.
portability:
* Fix build on platforms which don't provide timegm(), such as Cygwin.
Reported on xapian-discuss by John Bankert.
Omega 1.4.0 (2016-06-24):
general:
documentation:
* Clarify $allterms and $terms documentation. Make it clearer how they differ,
and document that $allterms without a parameter list gives all terms indexing
the current hit. Noted by Andy Chilton.
Omega 1.3.7 (2016-06-01):
indexers:
* Make named entity look-up (e.g. &eacute; -> 233) use the same keyword-lookup
table approach we already use for HTML tags and built-in MIME content-types,
rather than a std::map, which makes it faster while using less memory.
Omega 1.3.6 (2016-05-09):
documentation:
* Fix overview.rst processing in VPATH build. Our workaround for lack of an
include path in docutils was only handling the first include in the file.
omega:
* Implement $match command for omegascript. Patch from Richhiey Thomas.
templates:
* Lower case all HTML tags, attributes and values; explicitly close
tags. Patches from Vivek Pal and Nirmal Singhania.
* Migrate Omega Templates to HTML5. Patch from Nirmal Singhania.
* templates/query: Remove stray double quote from generated URL for spelling
suggestion when THRESHOLD is set. Patch from Nirmal Singhania.
* templates/opensearch: Change response feeds to support OpenSearch 1.1.
Patch from Nirmal Singhania.
testsuite:
* Update omegatest - the order of subqueries has changed in some cases, due
to the "grouping" changes in the C++ API.
build system:
* Drop workaround for old git master before 1.3.2.
Omega 1.3.5 (2016-04-01):
This release includes all changes from 1.2.23 which are relevant.
omega:
* Add optional prefix argument to $terms.
* $snippet now uses MSet::snippet() instead of the Snipper class.
* Add $contains{STRING1,STRING2}. Contributed by Ayush Gupta.
* Add support for negated boolean filter terms, specified by CGI parameter "N".
* Support a direction prefix on SORT: '+' for ascending, '-' for descending.
SORTREVERSE set to non-0 now flips the direction. Fixes #697, reported by
Andy Chilton.
build system:
* Need to AC_SUBST probed value of ZLIB_LIBS. Noted by Paul Wise.
portability:
* omegatest: Test faketime actually works, and if it doesn't work skip
testcases which use it. On OS X 10.11, faketime from homebrew doesn't seem
to work, probably due to the new "System Integrity Protection". Fixes part
of #707, reported by James Aylett.
Omega 1.3.4 (2016-01-01):
This release includes all changes from 1.2.22 which are relevant.
documentation:
* The lists of recognised MIME types and of ignored extensions are now
generated along with the corresponding source code from a single master list.
Partly addresses #569, reported by Charles Atkinson.
* Note when $json and $jsonarray were added.
indexers:
* omindex:
+ Avoid using the shell to run most external commands as it's unnecessary
overhead. For the built-in filters, the only cases which now use a shell
are where we run two unzip commands. For user-specified commands, a simple
and slightly conservative test is used, which should avoid a shell in most
common cases where it isn't needed. Notably, environment variables set
before the command are handled.
+ Track files which couldn't be indexed in the user metadata and skip them by
default on subsequent runs to avoid the costs of repeatedly running a
filter on a file it can't handle. Run omindex with --retry-failed to retry
such files.
+ Overhaul the "per-site" terms:
- 'H' prefix is hostname as before, except that if the term would be > 240
bytes (unlikely but possible) the end is hashed in the same way 'U'
prefix terms are.
- 'P' terms are now added for every directory level, not just the start
URL's path.
- A new 'J' prefix term is added with the start URL (less any trailing
'/'), which means all files indexed from a particular "site" are now
indexed by one term. See #376.
+ Add 'skip' pseudo-mimetype which extensions can be mapped to, and they will
then be reported and skipped (to complement the existing 'ignore'
pseudo-mimetype which causes files with the specified extension to be
quietly ignored).
+ Treat a command of 'true' specially as meaning make the text extraction a
no-op (as actually running /bin/true effectively would). This provides a
way to index some file types by only meta-data. Fixes #519, reported by
Brian Burton.
+ Add support for wildcard mimetypes */* and *. Combined with filter command
``true`` for indexing by meta-data only, you can specify a fall back case
of indexing by meta-data only using ``--filter '*:true'``. From a
suggestion by Brian Burton on xapian-discuss.
+ Index message/rfc822 and message/news. These are individually saved email
messages and news articles.
+ Index archived web page formats MAFF and MHTML.
+ Handle .xla, yet another XL extension.
+ Handle metadata in LibreOffice HTML export (dcterms.subject,
dcterms.description, dcterms.creator and dcterms.contributor).
+ Use zlib's gzopen() instead of invoking "gzip -dc" for compressed Abiword
documents.
omega:
* Add options argument to $transform.
* Cache compiled regexps used in $transform.
* Add $ord OmegaScript command which returns the Unicode codepoint for the
first character of a UTF-8 string.
* Add $chr OmegaScript command which returns the UTF-8 string for given Unicode
codepoint.
* Add $csv OmegaScript command which escapes a string for use as a field in a
CSV file ("always quote" mode inspired by patch from Gaurav Arora).
* New $filters encoding which avoids collisions. We also compare CGI parameter
xFILTERS to what $filters would have returned in previous releases, so that
on upgrades old format serialised filters are handled correctly.
* Fix $jsonarray not to prepend ']' to the first array element.
* Skip weighting scheme setup for a pure date range query - it won't be
weighted anyway, so we can avoid having to parse weighting scheme parameters,
etc.
* Use value ranges when date range filtering by value. Should be more
efficient than a MatchDecider, and will automatically take advantage of any
future value range optimisations in xapian-core.
* Add default_db and default_template config options. These allow the default
database name and default template to be set via the config file, rather than
being stuck with the respective defaults of "default" and "query". Fixes
#310, reported by Marco Hennigs.
* Add support for non-exclusive filters. Fixes #234, reported by Thomas
Viehmann.
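The $ord, $chr and $csv entries above describe small string helpers. The Python sketch below shows the behaviour they describe; it is not Omega's code. The function names are hypothetical, and the quoting rules follow common CSV practice (RFC 4180), which is an assumption about what $csv does in detail.

```python
def ord_first(utf8_bytes):
    """Unicode codepoint of the first character of a UTF-8 string."""
    return ord(utf8_bytes.decode("utf-8")[0])

def chr_utf8(codepoint):
    """UTF-8 encoding of a single Unicode codepoint."""
    return chr(codepoint).encode("utf-8")

def csv_field(value, always_quote=False):
    """Escape a string for use as one field of a CSV file.

    Quotes are added only when needed, unless always_quote is set;
    embedded double quotes are doubled.
    """
    needs_quotes = always_quote or any(c in value for c in ',"\r\n')
    if not needs_quotes:
        return value
    return '"' + value.replace('"', '""') + '"'
```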
testsuite:
* Add start of testsuite for omega CGI.
build system:
* configure script now defaults to looking for xapian-config-1.3. This is now
automatically done for development series (odd middle component of the
version number), but not for stable series (even middle component). Fixes
#695, reported by Jorge C. Leitão.
* Don't pointlessly link omega binary with libmagic (as we have since 1.3.1).
portability:
* Fix "make check" compilation failure on platforms without timegm().
Omega 1.3.3 (2015-06-01):
This release includes all changes from 1.2.20-1.2.21 which are relevant.
documentation:
* INSTALL: IRIX is past EOL so drop information about IRIX make.
indexers:
* omindex:
+ Add support for %f in command passed to --filter to allow specifying
commands where the input file is not the final argument. Fixes #570,
reported by Charles Atkinson.
+ Allow --filter to handle commands which produce output in a temporary file
rather than on stdout.
+ Allow --filter to specify the character set of the output the filter
produces.
+ Handle application/vnd.ms-excel, text/x-perl and application/x-dvi via
default --filter settings instead of hardcoded cases (now possible thanks
to the new abilities that --filter has).
+ Add support for specifying a MIME subtype of '*' in --filter arguments.
+ Add --track-ctime option to allow omindex to pick up changes to file
ownership and permissions.
+ Index terms from the leafname with an 'F' prefix, rather than treating them
as more body text. (Fixes #633, reported by Emmanuel Garette)
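The %f support in --filter described above can be pictured with the following Python sketch. It is an illustration only: `build_filter_command` is a hypothetical helper, the shell quoting via `shlex.quote` is an assumption, and the command names in the usage note are made up.

```python
import shlex

def build_filter_command(template, filename):
    """Build the command line for an external filter.

    If the template contains %f, the (quoted) filename is substituted
    there; otherwise it is appended as the final argument.
    """
    quoted = shlex.quote(filename)
    if "%f" in template:
        return template.replace("%f", quoted)
    return template + " " + quoted
```

So a template of `conv --in %f --out -` places the filename in the middle of the command, while a plain `catdoc` template still gets the filename appended.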
omega:
* Fix handling of multiple P. fields - previously only the first seen
was used. These fields are also now taken into account when deciding if the
query has changed. $query now returns an OmegaScript list with one entry for
each CGI parameter passed.
templates:
* templates/query: Fix setting of prefix map for P - in 1.3.2, this
would fail to also search in the subject. Now it also searches in the
subject and topic.
build system:
* configure: Fix typo in message: 'libmagic-devl' -> 'libmagic-devel'
portability:
* Require a compiler with good C++11 support, like xapian-core now does.
* Now we require C++11, just include the standard header for uint32_t.
* Link omindex-list with our (GNU) getopt for platforms which don't use GNU libc.
Thanks to James Aylett.
* Add timegm.cc to scriptindex_SOURCES to fix build on platforms which don't
provide timegm().
* Suppress bogus uninitialised variable warning with -Os under GCC 4.7.2.
packaging:
Omega 1.3.2 (2014-11-24):
This release includes all changes from 1.2.16-1.2.19 which are relevant.
general:
documentation:
* docs/overview.rst: Document built-in list of stopwords.
* docs/termprefixes.rst: Update for renaming of 'brass' backend to 'glass'.
indexers:
* omindex:
+ The starting URL wasn't previously URL encoded. In 1.2.18, a minimally
intrusive fix was implemented. In 1.3.2, we now encode the starting URL
as we do for the rest of the filename.
+ Don't assume .doc is application/msword but let libmagic decide, since .doc
files may actually be RTF, and sometimes people use .doc for plain-text
documentation.
+ Add support for indexing 'topic' and 'created date' meta-data for
OpenDocument format and HTML.
+ Index "topic" for PDF documents.
+ Commit changes and exit, rather than skipping the current file on most
unexpected errors reading directories or initialising libmagic - otherwise
we can end up deleting a lot of database entries on errors like EHOSTDOWN
when indexing network mounts.
+ Add --opendir-sleep=SECS option to allow working around problems with
indexing files on Microsoft DFS shares.
+ If we get ENOTDIR trying to index a file, skip it quietly (unless in
verbose mode) as we already do if we get ENOENT, since ENOTDIR is what we
get if the file and the directory it was in got removed between us getting
the filename and trying to open it.
+ Handle ENOENT, ENOTDIR and EACCES from readdir().
+ If we've already opened the file (as we often will have if using a modern
libmagic with magic_descriptor() available), then use fstat() on that fd
rather than stat()/lstat() on the pathname.
+ Pass error message string and errno value in ReadError exceptions.
+ Report strerror(errno) if we can't read a file.
+ Filtering via text/html now handles HTML documents which specify a charset.
+ Add support for indexing Microsoft Publisher files using pub2xhtml.
+ Restrict the length of what we consider to be an extension, currently to 7
characters or whatever the longest extension in the mime_map is if it is
longer.
+ Avoid '//' in temporary filenames (cosmetic only).
* omindex-list: New tool to list URLs of all the documents in a database (or
list of databases) indexed by omindex.
omega:
* Allow setting query expansion scheme to "bo1".
* Make the $json and $jsonarray force the text to be valid UTF-8, since
otherwise the output isn't valid JSON.
* Check parameters to $set{weighting,bm25 ...} and $set{weighting,trad ...}
converted OK. Based on patch from Aarsh Shah.
* Add support to $set{weighting,...} for bb2, dlh, dph, ifb2, ineb2, inl2, lm,
pl2 when we're built against a xapian-core which is new enough to have these
schemes.
* Add $snippet to generate a snippet of text tailored to the search.
build system:
* configure: Enable GCC's -Woverloaded-virtual warning.
portability:
* Ship common/safewinsock2.h, needed under mingw.
Omega 1.3.1 (2013-05-03):
This release includes all changes from 1.2.10-1.2.15 which are relevant.
documentation:
* INSTALL,configure: Provide hints as to what package to install for magic.h.
indexers:
* The HTML parser now explicitly handles , and .
* Use a generated compact and efficient table to convert HTML tag names
to enum codes - this is both faster and smaller than the approach we were
using, with the benefit that the table is auto-generated.
* Always use our built-in conversion code for the character sets it can handle
(previously we'd use iconv if available; now we only use iconv for other
character sets). This gives us more consistent results, and in particular
means we now handle BOMs better (at least when using GNU iconv).
* A lot of data labelled as "iso-8859-1" is actually "windows-1252". The two
only differ in characters which are control characters in iso-8859-1, so
assume the latter when we see the former.
* omindex:
+ Extend --filter to handle commands which produce HTML on stdout.
+ Don't report an error if a file is deleted (or renamed) between us reading
the directory entry for it and trying to read the file itself by default.
In --verbose mode, the situation is still reported, but now with a
specific message.
+ If omindex receives any of the signals SIGHUP, SIGINT, SIGQUIT or SIGTERM,
then kill any active external filter child process, then handle the signal
as we did before. If setpgid() is available, put each external filter in
its own process group and kill the whole process group when we get a
signal.
+ Use magic_descriptor() if the version of libmagic we're building against
is new enough to have it. This eliminates an extra opening of a file
being indexed in certain cases.
+ Use rst2html to handle .rst and .rest files.
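The "iso-8859-1" entry in the indexers list above amounts to a simple label substitution before decoding. A minimal Python sketch, assuming Python's `cp1252` codec as the windows-1252 implementation; `decode_html_bytes` is a hypothetical helper, not part of Omega.

```python
def decode_html_bytes(data, label):
    """Decode bytes using the declared charset label.

    Data labelled iso-8859-1 is decoded as windows-1252, since the two
    only differ where iso-8859-1 has control characters.
    """
    charset = label.strip().lower()
    if charset == "iso-8859-1":
        charset = "cp1252"
    # errors="replace" covers the few bytes cp1252 leaves undefined.
    return data.decode(charset, errors="replace")
```

For instance, bytes 0x93 and 0x94 in a page labelled iso-8859-1 come out as curly quotes rather than control characters, while ordinary accented letters decode identically either way.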
omega:
* Add new $json and $jsonarray OmegaScript commands to support producing JSON
output.
* Add $truncate command which truncates a string after a word.
* Add support for $set{weighting,tfidf} to allow the new TfIdfWeight weighting
scheme to be used.
build system:
* configure: Now looks for libmagic in MAGIC_PREFIX, to allow building with
libmagic installed in a non-standard location.
* Remove support for 'configure --enable-quiet', 'make QUIET=' and 'make
QUIET=y' - automake now supports 'configure --enable-silent-rules', 'make
V=1' and 'make V=0' which are broadly equivalent and more standard.
portability:
* tmpdir.cc: Add safeunistd.h for rmdir, required by GCC 4.7 (reported by
Gaurav Arora).
Omega 1.3.0 (2012-03-14):
general:
* Make libmagic a required dependency.
documentation:
* docs/termprefixes.html: Document how to map a user prefix to multiple term
prefixes.
* docs/overview.html: Improve documentation of htdig_noindex.
indexers:
* omindex:
+ Index title with an 'S' prefix rather than no prefix.
+ If the document with the highest existing docid before the run was updated,
we were reporting it as "added", but now we correctly report it as
"updated".
+ Catch and report std::exception explicitly, so failing to allocate memory
is no longer reported as "Unknown exception".
* scriptindex:
+ Remove special error handling case noting that index=nopos was replaced
with indexnopos - this was removed in 1.1.0 so there's been enough time to
upgrade.
omega:
* DEFAULTOP now defaults to AND rather than OR, since that matches what pretty
much every search engine does these days. Closes ticket#512.
* Allow mapping a query string prefix to more than one term prefix (which
xapian-core has supported since 1.0.4).
* Add support for search inputs for multiple probabilistic prefixes, with
support for per-prefix stemmers.
* Drop legacy support for handling '.' separated terms in xP - that changed in
Omega 0.9.7, more than 5 years ago now.
* Remove support for OLDP CGI parameter which was superseded by xP
approximately a decade ago, and isn't even documented!
* Drop special handling for R-prefixed terms in $prettyterm - we stopped
generating these in Xapian 1.0.
templates:
* templates/query:
+ We now map unprefixed queries to include S-prefixed terms to match the
change in omindex to prefixing terms from the title with S. You may want
to make the same update to your own templates.
+ Set up prefixes for 'author:' and 'title:'.
packaging:
* xapian-omega.spec: We're ABI compatible within a release series so make
dependency on xapian-core-libs >= rather than =.
Omega 1.2.23 (2016-03-28):
documentation:
* Update links to Xapian website and trac to use https, which is now supported,
thanks to James Aylett.
indexers:
* Fix HTML/XML entity decoding to be O(n) not O(n²) - processing HTML/XML with
a lot of entities is now much faster.
templates:
* Remove unused country code to name maps. These were intended as examples,
but they aren't very useful as such, and really just bloat the templates
needlessly.
Omega 1.2.22 (2015-12-29):
documentation:
* Stop maintaining ChangeLog files. They make merging patches harder, and stop
'git cherry-pick' from working as it should. The git repo history should be
sufficient for complying with GPLv2 2(a).
* Clarify help text for omindex --mime-type option.
* docs/omegascript.rst:
+ Fix documentation of $last to say it's the MSet index *one beyond* the end
of the current page. Reported by Andrew Chilton.
+ Clarify that $split and $substr work in bytes. Previously we said
"characters" which could be taken as meaning they work with UTF-8
characters.
+ Update documentation for $filters - it was missing these CGI parameters
from the list of those serialised: COLLAPSE, DOCIDORDER, SORT, SORTREVERSE,
SORTAFTER
+ Explicitly note user can use $setmap to create their own maps.
* docs/overview.rst:
+ SVG extraction is built-in too.
+ Expand paragraph about command `false`. Note the versions where explicit
support was added, and that this will also work with any version on Unix,
where `false` is a command.
+ Document `cdb_dir`.
* docs/cgiparams.rst: Document behaviour if xDB is not set.
* Change "characters" to "bytes" in a few places to clarify that we don't mean
Unicode code points.
indexers:
* omindex:
+ Add '--title-size' option.
+ Handle .oft the same way as .msg - it's some sort of template email, and
has essentially the same format.
omega:
* Make $querydescription ensure the match has been run, so that it includes
filters.
* Avoid $allterms, $cgilist, $filterterms and $terms being O(n²) in the number
of items in the returned list.
* If xFILTERS is not set, don't force the first page as that's unhelpful if
someone fails to set it in their template.
* When environment variable SERVER_PROTOCOL is set to INCLUDED (as it is when
we're being included in a page), we already suppress the HTTP headers, but
now we suppress the blank line after the header too.
* Support option flag_cjk_ngram if built against xapian-core >= 1.2.22.
testsuite:
* Add test coverage for parsing of HTML entities.
build system:
* Fix error reporting if PCRE isn't installed. Fixes #693, reported by lhz7370.
portability:
* Avoid warning when building with glibc >= 2.21.
* Don't provide our own implementation of sleep() under __WIN32__ if there
already is one - mingw provides one, and in some situations it seems to clash
with ours. Reported to xapian-discuss by John Alveris.
* Stop trying to use O_STREAMING - the patch to implement it was never merged
into the Linux kernel, and I can't find any evidence that other platforms
implement it. The constant value O_STREAMING used now seems to be used for
the part of O_SYNC which isn't covered by O_DSYNC, which seems likely to hurt
performance if anything.
Omega 1.2.21 (2015-05-20):
documentation:
* docs/overview.rst: Document 'E' prefixed boolean terms for filtering by
extension (see #668, reported by bramvdh).
* docs/encodings.rst: Add a document about character encoding, as suggested by
James Aylett in #550.
indexers:
* omindex:
+ outlookmsg2html: Fix handling of message/rfc822 subparts.
omega:
* $prettyurl now decodes valid UTF-8 sequences, and some additional ASCII
characters in the path part: []@!$&'()*+.;= (Fixes #550 and #644, reported by
catkin and terencz.)
* $prettyurl now leaves the query and fragment parts of the URL alone and won't
decode an escaped "/" (omindex doesn't create URLs with any of these, so we
only risk breaking other URLs which have them).
* Drop compilation date and time from output when run from the command line -
they prevent reproducible builds and the version number is sufficient
information.
templates:
* templates/query: When listing matching terms, don't make the commas italic.
* templates/query: Eliminate blank line before .
* templates/xml: Add XML declaration.
* templates/godmode: Specify charset utf-8 in the content-type.
build system:
* Link test programs with libtool's '-no-install' or '-no-fast-install', like
we already do in xapian-core, which means that libtool doesn't need to
generate shell script wrappers for them on most platforms.
portability:
* Add spaces between literal strings and macros which expand to literal strings
for C++11 compatibility.
* Remove 'register' as it's deprecated and clang spits out warnings because of
that. Any modern compiler likely just ignores it as an optimisation hint
anyway.
Omega 1.2.20 (2015-03-04):
documentation:
* docs/cgiparams.rst: Improve wording of docs for SORT parameter.
* docs/omegascript.rst: Update documentation references to DATE1, DATE2, and
DAYSMINUS which were renamed in 0.6.x and the compatibility aliases removed
in 1.0.0.
indexers:
* omindex:
+ Ignore extensions .msi and .msp, which are Microsoft installer files, but
which libmagic sometimes incorrectly identifies as application/msword.
+ Interpret a command of "false" in "--filter" as meaning to ignore files
with that MIME type.
omega:
* Handle CGI parameter [=0 as [=1.
templates:
* templates/xml: Update handling of DATE1, DATE2 and DAYSMINUS which were
renamed in 0.6.x and the compatibility aliases removed in 1.0.0.
build system:
* configure: Prefer pkg-config for determining the flags needed to
compile and link with PCRE, as this will just work when cross-compiling
(at least under MXE).
* configure: Define MINGW_HAS_SECURE_API under mingw to get _putenv_s()
declared in stdlib.h.
* Enable automake option 'subdir-objects' to avoid warning from newer automake.
portability:
* Avoid doing link tests with libmagic in configure as they fail on mingw due
to not automatically picking up libraries which libmagic itself depends on.
Omega 1.2.19 (2014-10-21):
documentation:
* docs/overview.rst: Note that pdftotext is part of poppler as well as xpdf.
(Noted by Paul Wise)
Omega 1.2.18 (2014-06-22):
indexers:
* omindex:
+ Work around libmagic returning a MIME content-type of "Composite Document
File V2 Document[...]" or "application/CDFV2-corrupt" by returning a more
suitable filetype based on looking at the file's extension.
+ The starting URL wasn't previously URL encoded. In 1.3.2, this will be
fixed by URL encoding it as we do for the rest of the path, for the 1.2
branch we only URL encode it if it contains a character <= 31 or at least
one of '#', '%', ':' or '?'. This avoids a one-off reindex of every
document in the database in cases which work OK in practice.
+ When we skip a file because it exceeds the configured size limit, include
that size limit in the message.
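The 1.2-branch rule above for when to URL encode the starting URL can be written as a one-line predicate. A Python sketch of the rule as stated in the entry; the function name is hypothetical and this is not the omindex source.

```python
def start_url_needs_encoding(url):
    """True if the start URL contains a character <= 31 or at least
    one of '#', '%', ':' or '?'."""
    return any(ord(ch) <= 31 or ch in "#%:?" for ch in url)
```

A start URL such as `/docs/manual/` is left alone, so existing databases indexed from it don't need a one-off reindex.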
omega:
* Add support for setting the query expansion scheme to use.
portability:
* Don't compile in unixperm.cc - it isn't currently used, and it fails to build
with mingw. (fixes #635, reported by Alexis Denis)
* Fix warning when built with GCC 4.7.2 using -Os.
* Removed unused inline function, fixing compiler warning.
Omega 1.2.17 (2014-01-29):
documentation:
* docs/overview.html: Add Abiword as an example use of --filter, based on patch
from Frank J Bruzzaniti (fixes #383).
portability:
* Fix "no previous declaration" warning on platforms which don't have
mkdtemp().
Omega 1.2.16 (2013-12-04):
indexers:
* omindex:
+ Fix off-by-one when finding documents to delete which would sometimes cause
omindex to fail to delete documents from the database when they weren't
refound during an index update.
+ Decode dates in xlsx files.
+ Ignore extensions 'adm', 'cur', and 'ico' by default.
+ Group-readable files which are owner-readable but not world-readable should
still get a "readable by owner" term added. Reported by Emmanuel Garette.
build system:
* Compress source tarballs with xz instead of gzip.
* configure: Sync compiler warning flag machinery against xapian-core. The
changes are special handling for clang, passing -fshow-column where
supported, and handling for new warning flags in GCC 4.6 and 4.7.
Omega 1.2.15 (2013-04-16):
omega:
* Don't pointlessly link utf8convert.o into the omega CGI.
Omega 1.2.14 (2013-03-14):
indexers:
* omindex:
+ Correct "max" -> "min" when reserving space for shared strings in .xlsx
files. This just means we now reserve a more appropriate amount of space
to start with.
+ Ignore .com files by default.
Omega 1.2.13 (2013-01-09):
indexers:
* omindex:
+ Extracting text using external filters now works for filenames containing a
newline character - previously the newline got lost during escaping for the
shell.
+ Fix segfault when -F option without a ':' is passed.
+ Skip a file if we get a read error while calculating the MD5 checksum (used
for duplicate detection) - previously we used a checksum of the file up to
that point.
+ Avoid rereading SVG and Atom files when we calculate their MD5 checksums.
+ Improve --help output and man page, most notably:
- Say explicitly that --sample-size accepts the same formats as --max-size.
- Note default size limit on files to index is unlimited.
+ When generating a sample for a CSV file, limit the size we pre-allocate to
the CSV file size if that's smaller than the requested sample size, in case
the user sets that limit very high.
omega:
* Fix to decode %-encoded character at the end of the query string.
build system:
* INCLUDES is now deprecated in automake, so use AM_CPPFLAGS instead.
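The %-decoding fix in the omega entry above concerns an escape which is the last thing in the query string. A minimal sketch of a decoder which handles that boundary case follows; it is not Omega's implementation, the name percent_decode is hypothetical, and it deliberately leaves '+' untouched.

```cpp
#include <string>

// Hypothetical helper: value of a hex digit, or -1 if ch isn't one.
static int hex_value(unsigned char ch) {
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

// Decode %XX escapes. The bounds check is "i + 2 < size", so an escape
// occupying the final three characters is still decoded; a stricter check
// such as "i + 3 < size" would skip it.
std::string percent_decode(const std::string& in) {
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hex_value(static_cast<unsigned char>(in[i + 1]));
            int lo = hex_value(static_cast<unsigned char>(in[i + 2]));
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;
                continue;
            }
        }
        // Not a valid escape: copy the character through unchanged.
        out += in[i];
    }
    return out;
}
```

Incomplete escapes such as a lone trailing '%' are copied through as-is in this sketch, which is one reasonable choice rather than the only one.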
Omega 1.2.12 (2012-06-27):
No changes since 1.2.11 except to bump the version - this release was made to
fix an incorrect library version information update in xapian-core 1.2.11.
Omega 1.2.11 (2012-06-26):
indexers:
* Change HTML parser's handling of multiple <body> tags and of text outside of
<body> to match the behaviour of modern web browsers. (ticket#599)
* omindex:
+ Add command line option to control the size of the document sample stored.
Patch from Mihai Bivol.
+ Rework .xlsx parsing to substitute the shared strings into the positions
they are used in, so that the sample actually matches what appears in the
spreadsheet, and to index calculated cell contents.
+ Improve handling of headers and footers in OpenDocument documents.
+ pdftotext outputs a formfeed between each page, which messes up our "empty
body" check, so trim any trailing formfeeds before this check.
build system:
* Don't explicitly link indirect shared library dependencies on FreeBSD,
OpenBSD, and Solaris.
Omega 1.2.10 (2012-05-09):
indexers:
* Add support for CDATA to HTML/XML parser.
* omindex:
+ Add --max-size option, based on patch from ndaley in ticket#587.
+ Add support for atom feed files, patch from Mihai Bivol in ticket#595.
+ If the document with the highest existing docid before the run was updated,
we were reporting it as "added", but now we correctly report it as
"updated". (Backported from 1.3.0).
+ Catch and report std::exception explicitly, so failing to allocate memory
is no longer reported as "Unknown exception". (Backported from 1.3.0).
* scriptindex:
portability:
* Fix to build with GCC 4.7 by adding cast to rlim_t to fix error about C++11
compatibility (reported by Gaurav Arora).
Omega 1.2.9 (2012-03-08):
documentation:
* docs/overview.html:
+ Document that libmagic is used to determine the MIME type if the extension
isn't known. Partly addresses ticket#569.
+ We now limit time as well as CPU and memory for external filters.
indexers:
* Our HTML parser now ignores sections bracketed by <!--UdmComment--> and
<!--/UdmComment-->, like we already do for <!--htdig_noindex-->.
* omindex: Add more extensions to the default ignore list: bin dat db fon jar
lnk pyc pyd pyo sqlite sqlite3 sqlite-journal tmp ttf
Omega 1.2.8 (2011-12-13):
documentation:
* scriptindex.cc: Add link to http://xapian.org/docs/omega/scriptindex.html to
--help output (and so also to the man page which is generated from this).
* omegascript.html: Add note to discourage use of percentage scores.
indexers:
* omindex:
+ If we don't get any data from an external filter for 5 minutes, give up -
it has probably ended up blocked indefinitely.
+ Improve --help output (and man page which is generated from it). Closes
bug#572.
* scriptindex:
+ If no rules are found in the index script, report an error and give up -
this is inevitably the result of a mistake, and adding empty documents to
the database isn't helpful.
omega:
* Add new $prettyurl{} command which undoes RFC3986 URL escaping which
doesn't affect semantics in practice. Partly addresses ticket#550.
* Replace URL decoder with new implementation which handles various corner
cases better. Fixes bug#578.
* If CGI parameter P has trailing spaces, we now remove them all rather than
leaving one.
templates:
* templates/query: HTML escape topterms.
* templates/godmode: HTML escape the contents of document values.
* templates/query: Don't show the percentage score in the default template.
testsuite:
* Add new urlenctest unit test of URL encoding and decoding.
portability:
* configure: Sync changes from xapian-core: Don't pass -Wshadow for GCC < 4.1;
don't pass -Wstrict-null-sentinel for GCC 4.0.x; only enable symbol
visibility on platforms where it is supported.
packaging:
* xapian-omega.spec: Package outlookmsg2html helper.
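The change to trailing spaces in CGI parameter P noted in the omega entry above amounts to stripping every trailing space instead of stopping at one. A rough sketch, with the hypothetical name trim_trailing_spaces (this is not the code omega uses):

```cpp
#include <string>

// Hypothetical helper: remove all trailing spaces from s.
// find_last_not_of() returns npos when s is empty or all spaces, and
// npos + 1 wraps to 0, so erase(0) then clears the whole string.
std::string trim_trailing_spaces(std::string s) {
    s.erase(s.find_last_not_of(' ') + 1);
    return s;
}
```

Interior spaces are preserved, so a multi-word query keeps its word separators.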
Omega 1.2.7 (2011-08-10):
documentation:
* docs/termprefixes.html: Document how to map a user prefix to multiple term
prefixes.
* docs/overview.html: Improve documentation of htdig_noindex.
omega:
* Improve $version output from "Xapian - xapian-omega 1.2.7" to "xapian-omega
1.2.7".
packaging:
* xapian-omega.spec: We're ABI compatible within a release series so make
dependency on xapian-core-libs >= rather than =.
Omega 1.2.6 (2011-06-12):
documentation:
* docs/omegascript.html: Correct the documentation of the colours used by
$highlight{}.
* docs/overview.html: Add using unoconv as more complex example of using
--filter (ticket#324).
templates:
* templates/query:
+ Make search query input type=search.
+ Autofocus the search query input (using HTML autofocus attribute with
JavaScript fallback for older browsers). (ticket#544)
portability:
* Fix a compiler warning.
Omega 1.2.5 (2011-04-04):
documentation:
* Add index page which links to all the other documentation pages.
* INSTALL: Copy new Multi-Arch section from xapian-core/INSTALL. Replace VPATH
section with better equivalent from xapian-core/INSTALL.
* docs/omegascript.html: Minor improvements.
indexers:
* The HTML parser no longer uses an exception to signify it has finished in
the normal case as exceptions are typically costly to handle. In tests,
this made omindex ~0.23% faster when indexing a lot of HTML files.
* omindex:
+ Add --ignore-exclusions option, which will index HTML files despite meta
robots tags, etc - omindex is often used in environments where such
exclusions aren't relevant.
+ Fix to compile with older versions of libmagic which don't have
MAGIC_MIME_TYPE (e.g. on Ubuntu hardy).
+ Tell xls2csv to separate fields with spaces rather than commas, and not to
quote them. Fixes indexing of numeric fields, and means we don't need to
use our CSV parser to get a sample.
+ Add whitespace between chunks of text extracted from Microsoft Office 2007
formats to prevent words in adjacent chunks from being run together.
+ Encode reserved characters in URLs - links to files with names containing
'#' and '?' now work.
+ Handle .xlr extension the same way as .xls (later Microsoft Works versions
apparently produce such files which are really the same format).
+ Index filename extension with new standard prefix E.
+ Just report the mimetype as unknown instead of saying "unknown Office 2007
MIME subtype".
+ Ignore *.css and *.js by default too.
+ Messages reporting skipping files are now more consistent and always report
the filename.
+ New --empty-docs option to allow documents we extract no body text from to
be indexed (existing behaviour), skipped, or reported and then indexed.
omega:
* Fix double Content-Type header in some error reporting situations (regression
introduced in 1.2.4).
* Update $url's URL encoding to follow RFC3986.
* Allow QueryParser flags to be set from OmegaScript (ticket#418). The
FLAG_SPELLING_CORRECTION flag can now be set using
$set{flag_spelling_correction,1} - the old $set{spelling,true} way to
enable this flag still works, but is now deprecated.
templates:
* templates/emptydocs,templates/godmode,templates/opensearch,templates/query,
templates/xml: Add missing escaping. Some of these instances may allow
cross-site scripting, so upgrading your templates is recommended, especially
if you have any sensitive cookies set on the domain Omega is running on.
* templates/xml:
+ Try $field{caption} (which is what omindex sets) before $field{title} when
getting a value for the hit tag's title attribute - this is consistent with
how the query template gets the title.
+ Add new 'type' attribute which gives $field{type}.
+ Add 'DBSize' attribute to element.
+ Fix double escaping of matching terms. This is only likely to affect cases
where a matching term contains '&'.
+ Remove support for undocumented HILITECLASS CGI variable. There's no
evidence I can find using Google code search or web search that this has
been used anywhere, and it's difficult to handle escaping it properly in
the face of all the ways it could reasonably be used.
portability:
* Fix to compile on Microsoft Windows (ticket#350).
Omega 1.2.4 (2010-12-19):
documentation:
* Minor documentation improvements.
indexers:
* Some iconv implementations (such as that on Mac OS X) don't handle many of
the commonly seen mis-punctuated charset names (e.g. UTF16, UTF_16). We now
check for this if iconv fails, fix up the charset name, and retry.
* The built-in character encoding converter now handles spaces in charset
names.
* Use O_NOATIME if available and either the file is owned by the current euid,
or the current euid is 0 (i.e. we're running as root). This avoids updating
the access time of files we index which saves time. Fixes ticket#222.
* Report get_description() for Xapian exceptions, which provides additional
information above get_msg().
* Add boolean terms with add_boolean_term() so they get wdf of 0 and don't
contribute to document length.
* omindex:
+ Escape wildcard patterns being passed to unzip - in the unlikely event that
one of these matched files in or under the current directory, we might fail
to extract all the files we wanted to.
+ Add explicit support for indexing CSV files (better samples than from
using '-Mcsv:text/plain').
+ Add support for indexing .msg files from Microsoft Outlook (using the Perl
module Email::Outlook::Message). (ticket#334)
+ Improve --help for --mime-type option.
+ Optionally use libmagic to detect MIME types for files for which we have no
extension mapping, which allows us to handle files with a misleading
extension, or no extension at all. (ticket#114)
+ Add new --filter option which allows the user to specify new filters
provided they return UTF-8 text on stdout.
+ If a filter command isn't installed, previously we wouldn't try it again
for the same file extension - now we won't try it again for the same
mime-type.
+ Index the leafname of the file (without any extension) as extra keywords.
+ Extract author from HTML, OpenDocument, and PDF files. Index it with an A
prefix, and add it as a field.
+ Add support for indexing text and metadata from SVG files.
+ Extract metadata from Microsoft Office 2007 file formats.
+ Index text in headers and footers for .odt and .docx files.
+ Use the CSV parser to generate a nicer sample for files of type
application/vnd.ms-excel.
+ Add support for indexing Debian and RPM package files (ticket#493).
+ Make the memory limit for filter processes the size of physical memory,
which is a little less arbitrary than 7/8 of this value (ticket#424).
+ Under --duplicate=ignore, fix so that old documents which aren't seen get
deleted, which wasn't implemented before (to suppress this deletion, pass
-p as well).
+ Rename the short option for --version from -v to -V for consistency with
scriptindex and many other packages, and to free up -v as the short option
for --verbose. For backward compatibility, "omindex -v" is handled
specially and still reports the version.
+ Add --verbose option, and disable the less interesting output unless it is
specified.
+ Deprecate "--preserve-nonduplicates" in favour of new long option
"--no-delete" which does the same thing, but has a clearer name.
+ The deletion of documents pass at the end of indexing is now more
efficient. We track how many documents in the database we haven't seen so
we can stop once we've found them all (a particularly big improvement if
there are no documents to delete), and we now use a PostingIterator over
all documents which avoids needing to catch an exception for every gap in
the used document ids.
+ Quietly ignore files with mimetype set to "ignore". The initial list of
extensions set to ignore is: .a .dll .dylib .exe .lib .o .obj .so
+ Index file owner and read permissions, to allow finding documents with a
particular owner, and so searches can be restricted to documents a user is
able to read.
+ Add file size as a document value, so you can sort on it and filter by it.
* scriptindex:
+ Fix file descriptor leak if the LOADFILE action is used on something which
isn't a file.
omega:
* Make sure we write out HTTP headers when reporting an error early on.
* Extend $field to take an optional DOCID argument, rather than always using
the context from $hitlist.
* Add new $emptydocs command which returns a list of documents with doclength
zero.
* Add support for size: range filtering. Currently the end points of the range
have to be specified in bytes (e.g. size:102400..204800 for 100-200KB).
templates:
* templates/emptydocs: New template which lists documents with doclength zero.
build system:
* configure: Probe for any options needed to enable large file support.
Handling files >= 2GB isn't especially useful, but more importantly this is
needed to allow omindex to index files on filing systems with 64 bit inodes
on some platforms (e.g. 32-bit Linux).
* Use -no-undefined on platforms which need it to dynamically link such as
cygwin (the need to do this was taken from ticket#282).
portability:
* Fix to compile with Sun C++.
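Since the size: range filter described in the omega entry above takes its end points in bytes, a front end which accepts sizes in KB has to convert them first. A small sketch of that arithmetic, assuming 1KB = 1024 bytes as in the example given there; the helper name size_range_filter is hypothetical and not part of Omega.

```cpp
#include <string>

// Hypothetical helper: build a size: range filter from end points in KB.
// The end points have to be in bytes, so multiply each by 1024.
std::string size_range_filter(unsigned long long low_kb,
                              unsigned long long high_kb) {
    return "size:" + std::to_string(low_kb * 1024) + ".." +
           std::to_string(high_kb * 1024);
}
```

Calling size_range_filter(100, 200) gives "size:102400..204800", matching the 100-200KB example.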
Omega 1.2.3 (2010-08-24):
documentation:
* docs/termprefixes.html: Update "flint and quartz" to "flint and chert" as
quartz is no longer supported. Give exact term length limit for flint and
chert.
packaging:
* xapian-omega.spec: Don't run autoreconf - it's no longer required.
Omega 1.2.2 (2010-06-27):
portability:
* Apply getopt portability fixes from xapian-core 1.2.0, fixing build failures
on Mac OS X (and probably some other platforms with non-GNU getopt
implementations). (ticket#469)
Omega 1.2.1 (2010-06-22):
This release includes all changes from 1.0.21 which are relevant.
Omega 1.2.0 (2010-04-28):
This release includes all changes from 1.0.20 which are relevant.
build system:
* configure: Tell libtool not to link in deplibs on platforms where we know
they aren't needed.
* configure: On Linux, extract the library search path from ldconfig which
gives us the default entries reliably.
Omega 1.1.5 (2010-04-15):
This release includes all changes from 1.0.19 which are relevant.
Omega 1.1.4 (2010-02-15):
This release includes all changes from 1.0.18 which are relevant.
omega:
* Use the optimised integer to string conversion routines from xapian-core.
Omega 1.1.3 (2009-11-18):
This release includes all changes from 1.0.15-1.0.17 which are relevant.
templates:
* templates/query: If JavaScript is available, convert $field{modtime} to a
string on the client-side so that the timezone is correct. If JavaScript
isn't available, fall back to the existing behaviour of using UTC.
(ticket#314)
build system:
* configure: Default to looking for xapian-config-1.1 unless XAPIAN_CONFIG is
specified.
Omega 1.1.2 (2009-07-23):
This release includes all changes from 1.0.14 which are relevant.
indexers:
* omindex:
+ Handle the "macroenabled" versions of MS Office 2007 files too
(ticket#290).
+ Extract pptx notesSlides and comments, if present. (ticket#290).
Omega 1.1.1 (2009-06-09):
This release includes all changes from 1.0.13 which are relevant.
indexers:
* omindex:
+ Check the last modification time of files before reindexing (ticket#342).
+ Add "--spelling" option to index spelling correction data.
* scriptindex:
+ Add new "spell" action for indexing spelling correction data (ticket#296).
omega:
* Add $suggestion and $opt{spelling} to provide access to spelling correction
(ticket#296).
* Add $opt{weighting} to allow the weighting scheme and parameters to be
specified (ticket#298).
* If SERVER_PROTOCOL in the environment is set to INCLUDED, then our output is
being included in another page (e.g. using SSI) so suppress the output of any
HTTP headers.
templates:
* templates/query: Offer any spelling correction QueryParser gives.
build system:
* configure: Sync warning flags used with GCC with xapian-core apart from
-Woverloaded-virtual which fires for MyHtmlParser::parse_html(). That
probably should be tidied up at some point, but not right now.
Omega 1.1.0 (2009-04-23):
indexers:
* scriptindex:
+ Make deprecated "index=nopos" an error.
omega:
* New OmegaScript command $transform{} which performs regular expression
substitutions using the PCRE library (which is now required to build Omega).
(ticket#231)
build system:
* The build system is now bootstrapped with newer versions of autoconf and
libtool which should produce smaller files and speed up configure and
make.
Omega 1.0.23 (2011-01-14):
indexers:
* omindex:
+ Escape wildcard patterns being passed to unzip - in the unlikely event that
one of these matched files in or under the current directory, we might fail
to extract all the files we wanted to when indexing document formats like
OpenDocument which use a zip file container.
+ The parser for OpenDocument metadata wasn't initialising its "state" field.
Often you'd be lucky and it would be initialised to zero, but this could
have caused misparsing of metadata in some cases.
* scriptindex: Fix file descriptor leak if the LOADFILE action is used on
something that isn't a file.
* If fstat() fails when trying to load a file, preserve the errno value from
the fstat call to report to the user.
portability:
* configure: Probe for any options needed to enable large file support.
Handling files >= 2GB isn't especially useful, but more importantly this is
needed to allow omindex to index files on filing systems with 64 bit inodes
on some platforms (e.g. 32-bit Linux).
* Add -no-undefined to AM_LDFLAGS on platforms which need it to dynamically
link such as cygwin (the need to do this was taken from ticket#282).
Omega 1.0.22 (2010-10-03):
portability:
* Fix to compile with Sun C++.
Omega 1.0.21 (2010-05-18):
portability:
* Fix build failure in freemem.cc on Microsoft Windows.
Omega 1.0.20 (2010-04-27):
portability:
* Fix build failure on Mac OS X and possibly some other platforms (regression
caused by fix for getopt-related warnings on Cygwin in 1.0.19).
Omega 1.0.19 (2010-04-15):
portability:
* Fix getopt-related warning on Cygwin.
Omega 1.0.18 (2010-02-14):
indexers:
* Make the default charset "utf-8" not "UTF-8" as we lower case explicitly
specified character sets to compare to see if we need to reparse. Previously
XML documents which explicitly specified their character set as UTF-8 would
cause a needless restart of the parser.
* omindex:
+ Increase the wdf boost for the document title from 2 to 5, since 2 isn't
really enough.
* scriptindex:
+ Don't abort with "Unknown Exception" if indexing is disallowed or we hit
for a document which had an overridden character set. Fixes
ticket#410.
Omega 1.0.17 (2009-11-18):
indexers:
* omindex:
+ On Linux, change the memory limit on external filters to use _SC_PHYS_PAGES
since _SC_AVPHYS_PAGES excludes pages used by the OS cache and so will
often report a really low value. Fixes Debian bug#548987 and ticket#358.
+ Fix likely crash when reading output from external filter program if read()
is interrupted by a signal.
+ Fix potential crash when indexing PostScript files (fixed by using delete[]
(not delete) for array allocated by new[]).
testsuite:
* utf8converttest: Charset "8859_1" isn't understood by Solaris libiconv, and
isn't a standard charset name, so just test it when using our built-in
converter and GNU libc.
portability:
* Fix build failure on Mac OS X 10.6.
* Also check for socketpair() in -lxnet if it isn't found without, which
enables resource limits on external filter programs called by omindex on
Solaris, and possibly some other platforms. Fixes ticket#412.
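The omindex fix above for read() being interrupted by a signal relates to a standard POSIX pattern: a return value of -1 with errno set to EINTR means "try again", not failure or end of data. A minimal sketch of that pattern follows; it is not the omindex code and the name read_all is hypothetical.

```cpp
#include <cerrno>
#include <string>
#include <unistd.h>

// Hypothetical helper: append everything readable from fd to out until EOF.
// Returns true on EOF, false on a genuine read error.
bool read_all(int fd, std::string& out) {
    char buf[4096];
    while (true) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            // Only append the bytes actually read.
            out.append(buf, static_cast<std::string::size_type>(n));
        } else if (n == 0) {
            return true;  // End of file.
        } else if (errno != EINTR) {
            return false;  // Real error; errno is left set for the caller.
        }
        // n < 0 with errno == EINTR: interrupted by a signal, so retry.
    }
}
```

The important detail is that a negative return value is never used as a length, which is the kind of mistake that can lead to a crash when the interrupted case isn't handled.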
Omega 1.0.16 (2009-09-10):
* omega: Fix cross-site scripting vulnerability in reporting of exceptions
(CVE-2009-2947).
Omega 1.0.15 (2009-08-26):
general:
* omegascript.vim: The list of OmegaScript commands in the vim mode was rather
out of date, and a few commands were misclassified. Fix both problems and
avoid future recurrences by automatically generating those lists from the
command list in query.cc.
documentation:
* omegascript.html: Document that $date uses UTC. (ticket#314)
templates:
* query: Link to "xapian.org" rather than "www.xapian.org".
* inc/toptermsjs: Use double-quotes rather than single quotes for parameter
values on the