Omega 1.2.24 (2016-09-16):

build system:

* Drop unused configure check for symbol visibility.

Omega 1.2.23 (2016-03-28):

documentation:

* Update links to Xapian website and trac to use https, which is now supported, thanks to James Aylett.

indexers:

* Fix HTML/XML entity decoding to be O(n) not O(n²) - processing HTML/XML with a lot of entities is now much faster.

templates:

* Remove unused country code to name maps. These were intended as examples, but they aren't very useful as such, and really just bloat the templates needlessly.

Omega 1.2.22 (2015-12-29):

documentation:

* Stop maintaining ChangeLog files. They make merging patches harder, and stop 'git cherry-pick' from working as it should. The git repo history should be sufficient for complying with GPLv2 2(a).
* Clarify help text for omindex --mime-type option.
* docs/omegascript.rst:
  + Fix documentation of $last to say it's the MSet index *one beyond* the end of the current page. Reported by Andrew Chilton.
  + Clarify that $split and $substr work in bytes. Previously we said "characters" which could be taken as meaning they work with UTF-8 characters.
  + Update documentation for $filters - it was missing these CGI parameters from the list of those serialised: COLLAPSE, DOCIDORDER, SORT, SORTREVERSE, SORTAFTER
  + Explicitly note user can use $setmap to create their own maps.
* docs/overview.rst:
  + SVG extraction is built-in too.
  + Expand paragraph about command `false`. Note the versions where explicit support was added, and that this will also work with any version on Unix, where `false` is a command.
  + Document `cdb_dir`.
* docs/cgiparams.rst: Document behaviour if xDB is not set.
* Change "characters" to "bytes" in a few places to clarify that we don't mean Unicode code points.

indexers:

* omindex:
  + Add '--title-size' option.
  + Handle .oft the same way as .msg - it's some sort of template email, and has essentially the same format.

omega:

* Make $querydescription ensure the match has been run, so that it includes filters.
* Avoid $allterms, $cgilist, $filterterms and $terms being O(n²) in the number of items in the returned list.
* If xFILTERS is not set, don't force the first page as that's unhelpful if someone fails to set it in their template.
* When environment variable SERVER_PROTOCOL is set to INCLUDED (as it is when we're being included in a page), we already suppress the HTTP headers, but now we suppress the blank line after the header too.
* Support option flag_cjk_ngram if built against xapian-core >= 1.2.22.

testsuite:

* Add test coverage for parsing of HTML entities.

build system:

* Fix error reporting if PCRE isn't installed. Fixes #693, reported by lhz7370.

portability:

* Avoid warning when building with glibc >= 2.21.
* Don't provide our own implementation of sleep() under __WIN32__ if there already is one - mingw provides one, and in some situations it seems to clash with ours. Reported to xapian-discuss by John Alveris.
* Stop trying to use O_STREAMING - the patch to implement it was never merged into the Linux kernel, and I can't find any evidence that other platforms implement it. The constant value O_STREAMING used now seems to be used for the part of O_SYNC which isn't covered by O_DSYNC, which seems likely to hurt performance if anything.

Omega 1.2.21 (2015-05-20):

documentation:

* docs/overview.rst: Document 'E' prefixed boolean terms for filtering by extension (see #668, reported by bramvdh).
* docs/encodings.rst: Add a document about character encoding, as suggested by James Aylett in #550.

indexers:

* omindex:
  + outlookmsg2html: Fix handling of message/rfc822 subparts.

omega:

* $prettyurl now decodes valid UTF-8 sequences, and some additional ASCII characters in the path part: []@!$&'()*+.;= (Fixes #550 and #644, reported by catkin and terencz.)
* $prettyurl now leaves the query and fragment parts of the URL alone and won't decode an escaped "/" (omindex doesn't create URLs with any of these, so we only risk breaking other URLs which have them).
* Drop compilation date and time from output when run from the command line - they prevent reproducible builds and the version number is sufficient information.

templates:

* templates/query: When listing matching terms, don't make the commas italic.
* templates/query: Eliminate blank line before .
* templates/xml: Add XML declaration.
* templates/godmode: Specify charset utf-8 in the content-type.

build system:

* Link test programs with libtool's '-no-install' or '-no-fast-install', like we already do in xapian-core, which means that libtool doesn't need to generate shell script wrappers for them on most platforms.

portability:

* Add spaces between literal strings and macros which expand to literal strings for C++11 compatibility.
* Remove 'register' as it's deprecated and clang spits out warnings because of that. Any modern compiler likely just ignores it as an optimisation hint anyway.

Omega 1.2.20 (2015-03-04):

documentation:

* docs/cgiparams.rst: Improve wording of docs for SORT parameter.
* docs/omegascript.rst: Update documentation references to DATE1, DATE2, and DAYSMINUS which were renamed in 0.6.x and the compatibility aliases removed in 1.0.0.

indexers:

* omindex:
  + Ignore extensions .msi and .msp, which are Microsoft installer files, but which libmagic sometimes incorrectly identifies as application/msword.
  + Interpret a command of "false" in "--filter" as meaning to ignore files with that MIME type.

omega:

* Handle CGI parameter [=0 as [=1.

templates:

* templates/xml: Update handling of DATE1, DATE2 and DAYSMINUS which were renamed in 0.6.x and the compatibility aliases removed in 1.0.0.

build system:

* configure: Use pkg-config in preference to determine flags needed to compile and link with PCRE, as this will just work when cross-compiling (at least under MXE).
* configure: Define MINGW_HAS_SECURE_API under mingw to get _putenv_s() declared in stdlib.h.
* Enable automake option 'subdir-objects' to avoid warning from newer automake.

portability:

* Avoid doing link tests with libmagic in configure as they fail on mingw due to not automatically picking up libraries which libmagic itself depends on.

Omega 1.2.19 (2014-10-21):

documentation:

* docs/overview.rst: Note that pdftotext is part of poppler as well as xpdf. (Noted by Paul Wise)

Omega 1.2.18 (2014-06-22):

indexers:

* omindex:
  + Work around libmagic returning a MIME content-type of "Composite Document File V2 Document[...]" or "application/CDFV2-corrupt" by returning a more suitable filetype based on looking at the file's extension.
  + The starting URL wasn't previously URL encoded. In 1.3.2, this will be fixed by URL encoding it as we do for the rest of the path, for the 1.2 branch we only URL encode it if it contains a character <= 31 or at least one of '#', '%', ':' or '?'. This avoids a one-off reindex of every document in the database in cases which work OK in practice.
  + When we skip a file because it exceeds the configured size limit, include that size limit in the message.

omega:

* Add support for setting the query expansion scheme to use.

portability:

* Don't compile in unixperm.cc - it isn't currently used, and it fails to build with mingw. (fixes #635, reported by Alexis Denis)
* Fix warning when built with GCC 4.7.2 using -Os.
* Removed unused inline function, fixing compiler warning.

Omega 1.2.17 (2014-01-29):

documentation:

* docs/overview.html: Add Abiword as an example use of --filter, based on patch from Frank J Bruzzaniti (fixes#383).

portability:

* Fix "no previous declaration" warning on platforms which don't have mkdtemp().

Omega 1.2.16 (2013-12-04):

indexers:

* omindex:
  + Fix off-by-one when finding documents to delete which would sometimes cause omindex to fail to delete documents from the database when they weren't refound during an index update.
  + Decode dates in xlsx files.
  + Ignore extensions 'adm', 'cur', and 'ico' by default.
  + Group-readable files which are owner-readable but not world-readable should still get a "readable by owner" term added. Reported by Emmanuel Garette.

build system:

* Compress source tarballs with xz instead of gzip.
* configure: Sync compiler warning flag machinery against xapian-core. The changes are special handling for clang, passing -fshow-column where supported, and handling for new warning flags in GCC 4.6 and 4.7.

Omega 1.2.15 (2013-04-16):

omega:

* Don't pointlessly link utf8convert.o into the omega CGI.

Omega 1.2.14 (2013-03-14):

indexers:

* omindex:
  + Correct "max" -> "min" when reserving space for shared strings in .xlsx files. This just means we now reserve a more appropriate amount of space to start with.
  + Ignore .com files by default.

Omega 1.2.13 (2013-01-09):

indexers:

* omindex:
  + Extracting text using external filters now works for filenames containing a newline character - previously the newline got lost during escaping for the shell.
  + Fix segfault when -F option without a ':' is passed.
  + Skip a file if we get a read error while calculating the MD5 checksum (used for duplicate detection) - previously we used a checksum of the file up to that point.
  + Avoid rereading SVG and Atom files when we calculate their MD5 checksums.
  + Improve --help output and man page, most notably:
    - Say explicitly that --sample-size accepts the same formats as --max-size.
    - Note default size limit on files to index is unlimited.
  + When generating a sample for a CSV file, limit the size we pre-allocate to the CSV file size if that's smaller than the requested sample size, in case the user sets that limit very high.

omega:

* Fix to decode %-encoded character at the end of the query string.

build system:

* INCLUDES is now deprecated in automake, so use AM_CPPFLAGS instead.

Omega 1.2.12 (2012-06-27):

No changes since 1.2.11 except to bump the version - this release was made to fix an incorrect library version information update in xapian-core 1.2.11.

Omega 1.2.11 (2012-06-26):

indexers:

* Change HTML parser's handling of multiple tags and of text outside of to match the behaviour of modern web browsers. (ticket#599)
* omindex:
  + Add command line option to control the size of the document sample stored. Patch from Mihai Bivol.
  + Rework .xlsx parsing to substitute the shared strings into the positions they are used in, so that the sample actually matches what appears in the spreadsheet, and to index calculated cell contents.
  + Improve handling of headers and footers in OpenDocument documents.
  + pdftotext outputs a formfeed between each page, which messes up our "empty body" check, so trim any trailing formfeeds before this check.

build system:

* Don't explicitly link indirect shared library dependencies on FreeBSD, OpenBSD, and Solaris.

Omega 1.2.10 (2012-05-09):

indexers:

* Add support for CDATA to HTML/XML parser.
* omindex:
  + Add --max-size option, based on patch from ndaley in ticket#587.
  + Add support for atom feed files, patch from Mihai Bivol in ticket#595.
  + If the document with the highest existing docid before the run was updated, we were reporting it as "added", but now we correctly report it as "updated". (Backported from 1.3.0).
  + Catch and report std::exception explicitly, so failing to allocate memory is no longer reported as "Unknown exception". (Backported from 1.3.0).
* scriptindex:

portability:

* Fix to build with GCC 4.7 by adding cast to rlim_t to fix error about C++11 compatibility (reported by Gaurav Arora).

Omega 1.2.9 (2012-03-08):

documentation:

* docs/overview.html:
  + Document that libmagic is used to determine the MIME type if the extension isn't known. Partly addresses ticket#569.
  + We now limit time as well as CPU and memory for external filters.

indexers:

* Our HTML parser now ignores sections bracketed by and , like we already do for .
* omindex: Add more extensions to the default ignore list: bin dat db fon jar lnk pyc pyd pyo sqlite sqlite3 sqlite-journal tmp ttf

Omega 1.2.8 (2011-12-13):

documentation:

* scriptindex.cc: Add link to http://xapian.org/docs/omega/scriptindex.html to --help output (and so also to the man page which is generated from this).
* omegascript.html: Add note to discourage use of percentage scores.

indexers:

* omindex:
  + If we don't get any data from an external filter for 5 minutes, give up - it has probably ended up blocked indefinitely.
  + Improve --help output (and man page which is generated from it). Closes bug#572.
* scriptindex:
  + If no rules are found in the index script, report an error and give up - this is inevitably the result of a mistake, and adding empty documents to the database isn't helpful.

omega:

+ Add new $prettyurl{} command which undoes RFC3986 URL escaping which doesn't affect semantics in practice. Partly addresses ticket#550.
+ Replace URL decoder with new implementation which handles various corner cases better. Fixes bug#578.
+ If CGI parameter P has trailing spaces, we now remove them all rather than leaving one.

templates:

* templates/query: HTML escape topterms.
* templates/godmode: HTML escape the contents of document values.
* templates/query: Don't show the percentage score in the default template.

testsuite:

* Add new urlenctest unit test of URL encoding and decoding.

portability:

* configure: Sync changes from xapian-core: Don't pass -Wshadow for GCC < 4.1; don't pass -Wstrict-null-sentinel for GCC 4.0.x; only enable symbol visibility on platforms where it is supported.

packaging:

* xapian-omega.spec: Package outlookmsg2html helper.

Omega 1.2.7 (2011-08-10):

documentation:

* docs/termprefixes.html: Document how to map a user prefix to multiple term prefixes.
* docs/overview.html: Improve documentation of htdig_noindex.

omega:

* Improve $version output from "Xapian - xapian-omega 1.2.7" to "xapian-omega 1.2.7".

packaging:

* xapian-omega.spec: We're ABI compatible within a release series so make dependency on xapian-core-libs >= rather than =.

Omega 1.2.6 (2011-06-12):

documentation:

* docs/omegascript.html: Correct the documentation of the colours used by $highlight{}.
* docs/overview.html: Add using unoconv as more complex example of using --filter (ticket#324).

templates:

* templates/query:
  + Make search query input type=search.
  + Autofocus the search query input (using HTML autofocus attribute with Javascript fallback for older browsers). (ticket#544)

portability:

* Fix a compiler warning.

Omega 1.2.5 (2011-04-04):

documentation:

* Add index page which links to all the other documentation pages.
* INSTALL: Copy new Multi-Arch section from xapian-core/INSTALL. Replace VPATH section with better equivalent from xapian-core/INSTALL.
* docs/omegascript.html: Minor improvements.

indexers:

* The HTML parser no longer uses an exception to signify it has finished in the normal case as exceptions are typically costly to handle. In tests, this made omindex ~0.23% faster when indexing a lot of HTML files.
* omindex:
  + Add --ignore-exclusions option, which will index HTML files despite meta robots tags, etc - omindex is often used in environments where such exclusions aren't relevant.
  + Fix to compile with older versions of libmagic which don't have MAGIC_MIME_TYPE (e.g. on Ubuntu hardy).
  + Tell xls2csv to separate fields with spaces rather than commas, and not to quote them. Fixes indexing of numeric fields, and means we don't need to use our CSV parser to get a sample.
  + Add whitespace between chunks of text extracted from Microsoft Office 2007 formats to prevent words in adjacent chunks from being run together.
  + Encode reserved characters in URLs - links to files with names containing '#' and '?' now work.
  + Handle .xlr extension the same way as .xls (later Microsoft Works versions apparently produce such files which are really the same format).
  + Index filename extension with new standard prefix E.
  + Just report the mimetype as unknown instead of saying "unknown Office 2007 MIME subtype".
  + Ignore *.css and *.js by default too.
  + Messages reporting skipping files are now more consistent and always report the filename.
  + New --empty-docs option to allow documents we extract no body text from to be indexed (existing behaviour), skipped, or reported and then indexed.

omega:

* Fix double Content-Type header in some error reporting situations (regression introduced in 1.2.4).
* Update $url's URL encoding to follow RFC3986.
* Allow QueryParser flags to be set from OmegaScript (ticket#418). The FLAG_SPELLING_CORRECTION flag can now be set using $opt{flag_spelling_correction,1} - the old $opt{spelling,true} way to enable this flag still works, but it is now deprecated.

templates:

* templates/emptydocs,templates/godmode,templates/opensearch,templates/query, templates/xml: Add missing escaping. Some of these instances may allow cross-site scripting, so upgrading your templates is recommended, especially if you have any sensitive cookies set on the domain Omega is running on.
* templates/xml:
  + Try $field{caption} (which is what omindex sets) before $field{title} when getting a value for the hit tag's title attribute - this is consistent with how the query template gets the title.
  + Add new 'type' attribute which gives $field{type}.
  + Add 'DBSize' attribute to