Xapian - Perl frontend to the Xapian C++ search library.

SYNOPSIS

  use Xapian;

  my $parser = Xapian::QueryParser->new();
  my $query = $parser->parse_query( '[QUERY STRING]' );

  my $db = Xapian::Database->new( '[DATABASE DIR]' );
  my $enq = $db->enquire();

  printf "Running query '%s'\n", $query->get_description();

  $enq->set_query( $query );
  my @matches = $enq->matches(0, 10);

  print scalar(@matches) . " results found\n";

  foreach my $match ( @matches ) {
    my $doc = $match->get_document();
    printf "ID %d %d%% [ %s ]\n", $match->get_docid(), $match->get_percent(), $doc->get_data();
  }

DESCRIPTION

This module is a nearly complete wrapping of the Xapian C++ API. The main omissions are features which aren't useful to wrap for Perl, such as Xapian::UTF8Iterator.

This module is generated using SWIG. It is intended as a replacement for the older Search::Xapian module: it is easier to keep up to date and wraps the C++ API more completely. It is largely compatible with Search::Xapian, but see the COMPATIBILITY section below if you have code using Search::Xapian which you want to get working with this new module.

There are some gaps in the POD documentation for wrapped classes, but you can read the Xapian C++ API documentation at https://xapian.org/docs/apidoc/html/annotated.html for details of these. Alternatively, take a look at the code in the examples and tests.

If you want to use Xapian and the threads module together, make sure you're using Perl 5.8.7 or later. Xapian then uses CLONE_SKIP to make sure that the Perl wrapper objects aren't copied to new threads. Without this, the underlying C++ objects can get destroyed more than once, which leads to undefined behaviour.

If you encounter problems, or have any comments, suggestions, patches, etc., please email the Xapian-discuss mailing list (details of which can be found at https://xapian.org/lists).

COMPATIBILITY

This module is mostly compatible with Search::Xapian. The following are known differences, with details of how to write code which works with both.

Search::Xapian overloads stringification - e.g. "$query" is equivalent to $query->get_description(), while "$termiterator" is equivalent to $termiterator->get_term(). This module doesn't support overloaded stringification, so you should instead explicitly call the method you want. The technical reason for this change is that stringification is hard to support in SWIG-generated bindings, but this context-sensitive stringification where the operation performed depends on the object type seems unhelpful in hindsight anyway.

Search::Xapian overloads conversion to an integer for some classes - e.g. 0+$positioniterator is equivalent to $positioniterator->get_termpos while 0+$postingiterator is equivalent to $postingiterator->get_docid. This module doesn't provide these overloads so you should instead explicitly call the method you want. As above, we think this context-sensitive behaviour wasn't helpful in hindsight.

This module is fussier than Search::Xapian about whether a passed scalar value is a string or an integer, so e.g. Xapian::Query->new(2001) will fail even though the equivalent worked with Search::Xapian. If $term might not be a string, use Xapian::Query->new("$term") to ensure it is converted to a string. Whether explicit stringification is needed depends on whether Perl has marked the scalar as having a string representation; prior to Perl 5.36.0 retrieving the string value of an integer could set this flag, but that's no longer the case in Perl 5.36.0 and later. The simple rule is to always explicitly stringify if the value might be numeric.

This behaviour isn't very Perlish, but is likely to be hard to address universally as it comes from SWIG. Let us know if you find particular places where it's annoying and we can look at addressing those.
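
The string flag mentioned above can be inspected from Perl itself. The following is only an illustrative sketch using the core B module; has_string_flag is a hypothetical helper written for this example and is not part of this module. It shows that a scalar assigned from an integer literal doesn't carry the string flag, while an explicitly stringified copy does:

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: report whether Perl has marked the referenced scalar
# as holding a string (the public POK flag).
sub has_string_flag {
    my ($ref) = @_;
    return ( B::svref_2object($ref)->FLAGS() & B::SVf_POK() ) ? 1 : 0;
}

my $term = 2001;                               # integer literal
my $before = has_string_flag( \$term );        # checked before any stringification

my $as_string = "$term";                       # explicit stringification
my $after = has_string_flag( \$as_string );    # the copy is a real string

print "integer literal: $before, stringified copy: $after\n";
```

Passing the stringified copy (or simply "$term") is what the simple rule above amounts to.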

Both this module and Search::Xapian support passing a Perl sub (which can be anonymous) for the functor classes MatchDecider and ExpandDecider. In some cases Search::Xapian accepts a string naming a Perl sub, but this module never accepts this. Instead of passing "::mymatchdecider", pass \&mymatchdecider which will work with either module. If you really want to dynamically specify the function name, you can pass sub {eval "&$dynamicmatchdecider"}.
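
If the sub name is only known at run time, another option (an alternative to the string eval, offered here as a sketch rather than as something either module requires) is to resolve the name to a code ref first and pass that. The names mymatchdecider and decider_from_name below are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical decider which accepts every document it is asked about.
sub mymatchdecider { return 1 }

# Resolve a sub name chosen at run time to a code ref in the current package.
sub decider_from_name {
    my ($name) = @_;
    my $code = __PACKAGE__->can($name)
        or die "No such sub: $name\n";
    return $code;
}

my $decider = decider_from_name('mymatchdecider');
print ref($decider), "\n";    # a CODE ref, usable wherever a Perl sub is accepted
```

The resulting code ref can then be passed in the same way as \&mymatchdecider.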

Search::Xapian provides a PerlStopper class which is supposed to be subclassable in Perl to implement your own stopper, but this mechanism doesn't actually seem to work. This module instead supports user-implemented stoppers by accepting a Perl sub in place of a Stopper object.

Importing Either Module

If you want your code to use either this module or Search::Xapian depending on what's installed, then instead of use Search::Xapian (':all'); you can use:

  BEGIN {
    eval {
      require Xapian;
      Xapian->import(':all');
      Xapian::search_xapian_compat();
    };
    if ($@) {
      require Search::Xapian;
      Search::Xapian->import(':all');
    }
  }

If you just use Search::Xapian; then the import() calls aren't needed.

The Xapian::search_xapian_compat() call sets up aliases in the Search::Xapian namespace so you can write code which refers to Search::Xapian but can actually use this module instead.

EXPORT

None by default.

:db

DB_OPEN

Open a database, fail if database doesn't exist.

DB_CREATE

Create a new database, fail if database exists.

DB_CREATE_OR_OPEN

Open an existing database, without destroying data, or create a new database if one doesn't already exist.

DB_CREATE_OR_OVERWRITE

Overwrite database if it exists.
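
As a rough sketch of how these constants are used, the snippet below creates a database in a temporary directory and then reopens it without losing its contents. It assumes the constant is passed as the second argument to Xapian::WritableDatabase->new(), as in Search::Xapian; the directory name and document contents are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Xapian ':db';

# Hypothetical location: a throwaway directory which is removed on exit.
my $dir = tempdir( CLEANUP => 1 ) . '/exampledb';

# The first call creates the database, since it doesn't exist yet.
my $db = Xapian::WritableDatabase->new( $dir, DB_CREATE_OR_OPEN );
my $doc = Xapian::Document->new();
$doc->set_data('hello world');
$doc->add_term('hello');
$db->add_document($doc);
$db->commit();
$db->close();    # release the write lock before reopening

# The second call opens the existing database and keeps its contents.
$db = Xapian::WritableDatabase->new( $dir, DB_CREATE_OR_OPEN );
my $count = $db->get_doccount();
print "Documents after reopening: $count\n";
```

Using DB_CREATE_OR_OVERWRITE for the second call instead would start again from an empty database.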

:ops

OP_AND

Match if both subqueries are satisfied.

OP_OR

Match if either subquery is satisfied.

OP_AND_NOT

Match if left but not right subquery is satisfied.

OP_XOR

Match if the left or the right subquery is satisfied, but not both.

OP_AND_MAYBE

Match if left is satisfied, but use weights from both.

OP_FILTER

Like OP_AND, but only weight using the left query.

OP_NEAR

Match if the words are near each other. The window should be specified as a parameter to Xapian::Query->new(), but it defaults to the number of terms in the list.

OP_PHRASE

Match as a phrase (all words in order).

OP_ELITE_SET

Select an elite set from the subqueries, and perform a query with these combined as an OR query.

OP_VALUE_RANGE

Filter by a range test on a document value.
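
A short sketch of combining subqueries with these operators follows. It assumes Xapian::Query->new() accepts an operator followed by terms or Query objects, as Search::Xapian does; the terms themselves are made up.

```perl
use strict;
use warnings;
use Xapian ':ops';

# Match documents containing either fruit term...
my $fruit = Xapian::Query->new( OP_OR, 'apple', 'pear' );

# ...but exclude those which also match 'rotten'.
my $query = Xapian::Query->new( OP_AND_NOT, $fruit, Xapian::Query->new('rotten') );

print $query->get_description(), "\n";
```

Printing get_description() is a convenient way to check how the operators have been nested.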

:qpflags

FLAG_DEFAULT

This gives the QueryParser default flag settings, allowing you to easily add flags to the default ones.

FLAG_BOOLEAN

Support AND, OR, etc and bracketed subexpressions.

FLAG_LOVEHATE

Support + and -.

FLAG_PHRASE

Support quoted phrases.

FLAG_BOOLEAN_ANY_CASE

Support AND, OR, etc even if they aren't in ALLCAPS.

FLAG_WILDCARD

Support right truncation (e.g. Xap*).

FLAG_PURE_NOT

Allow queries such as 'NOT apples'.

These require the use of a list of all documents in the database which is potentially expensive, so this feature isn't enabled by default.

FLAG_PARTIAL

Enable partial matching.

Partial matching causes the parser to treat the query as a "partially entered" search. This will automatically treat the final word as a wildcarded match, unless it is followed by whitespace, to produce more stable results from interactive searches.

FLAG_SPELLING_CORRECTION
FLAG_SYNONYM
FLAG_ACCUMULATE
FLAG_AUTO_SYNONYMS
FLAG_AUTO_MULTIWORD_SYNONYMS
FLAG_CJK_NGRAM
FLAG_NO_POSITIONS
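
To illustrate adding a flag to the defaults, here is a minimal sketch which enables FLAG_PURE_NOT on top of FLAG_DEFAULT. It assumes the flags are combined with bitwise OR and passed as the second argument to parse_query(); no database is needed just to parse the query.

```perl
use strict;
use warnings;
use Xapian ':qpflags';

my $parser = Xapian::QueryParser->new();

# Keep the default flags and additionally allow a query which is a pure NOT.
my $flags = FLAG_DEFAULT | FLAG_PURE_NOT;
my $query = $parser->parse_query( 'NOT apples', $flags );

print $query->get_description(), "\n";
```

Other flags can be added in the same way, e.g. FLAG_DEFAULT | FLAG_WILDCARD.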

:qpstem

STEM_ALL

Stem all terms.

STEM_ALL_Z

Stem all terms and add a "Z" prefix.

STEM_NONE

Don't stem any terms.

STEM_SOME

Stem some terms, in a manner compatible with Omega (capitalised words and those in phrases aren't stemmed).

STEM_SOME_FULL_POS

Like STEM_SOME but also store term positions for stemmed terms.
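
The sketch below shows one way a stemming strategy might be selected. It assumes an English stemmer created with Xapian::Stem->new('english') and the QueryParser methods set_stemmer() and set_stemming_strategy(); the query text is made up.

```perl
use strict;
use warnings;
use Xapian ':qpstem';

my $parser = Xapian::QueryParser->new();
$parser->set_stemmer( Xapian::Stem->new('english') );

# With STEM_SOME the capitalised word is left unstemmed, the other is stemmed.
$parser->set_stemming_strategy(STEM_SOME);

my $query = $parser->parse_query('Searching libraries');
print $query->get_description(), "\n";
```

Switching the strategy to STEM_NONE or STEM_ALL and comparing the printed descriptions shows the effect of each setting.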

:enq_order

ENQ_ASCENDING

docids sort in ascending order (default)

ENQ_DESCENDING

docids sort in descending order

ENQ_DONT_CARE

docids sort in whatever order is most efficient for the backend

:standard

Standard is db + ops + qpflags + qpstem

Version functions

major_version

Returns the major version of the Xapian C++ library being used. E.g. for Xapian 1.4.15 this would return 1.

minor_version

Returns the minor version of the Xapian C++ library being used. E.g. for Xapian 1.4.15 this would return 4.

revision

Returns the revision of the Xapian C++ library being used. E.g. for Xapian 1.4.15 this would return 15. In a stable release series, Xapian libraries with the same minor and major versions are usually ABI compatible, so this often won't match the third component of $Xapian::VERSION (which is the version of the Xapian wrappers).
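
For example, the library and wrapper versions can be reported side by side. This sketch assumes the functions are called fully qualified in the Xapian namespace:

```perl
use strict;
use warnings;
use Xapian;

# Version of the Xapian C++ library in use, e.g. 1.4.15.
my $library = join '.', Xapian::major_version(), Xapian::minor_version(),
    Xapian::revision();

print "Xapian C++ library: $library\n";
print "Perl wrappers:      $Xapian::VERSION\n";
```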

Numeric encoding functions

sortable_serialise NUMBER

Convert a floating point number to a string, preserving sort order.

This function converts a floating point number to a string suitable for use as a value for numeric range restriction, or as a sort key.

The conversion is platform independent.

The conversion attempts to ensure that, for any pair of values supplied to the conversion algorithm, the result of comparing the original values (with a numeric comparison operator) will be the same as the result of comparing the resulting values (with a string comparison operator). On platforms which represent doubles with the precisions specified by IEEE 754, this will be the case. If the representation of doubles is more precise, it is possible that two very close doubles will be mapped to the same string, so will compare equal.

Note also that both zero and -zero will be converted to the same representation: since these compare equal, this satisfies the comparison constraint, but it's worth knowing this if you wish to use the encoding in some situation where this distinction matters.

Handling of NaN isn't (currently) guaranteed to be sensible.

sortable_unserialise SERIALISED_NUMBER

Convert a string encoded using sortable_serialise back to a floating point number.

This expects the input to be a string produced by sortable_serialise(). If the input is not such a string, the value returned is undefined (but no error will be thrown).

The result of the conversion will be exactly the value which was supplied to sortable_serialise() when making the string on platforms which represent doubles with the precisions specified by IEEE 754, but may be a different (nearby) value on other platforms.
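
The following sketch illustrates the sort-order property: the numbers are serialised, the encoded strings are sorted with Perl's default string sort, and the results are decoded again. The sample values are made up, and the functions are assumed to be callable fully qualified in the Xapian namespace.

```perl
use strict;
use warnings;
use Xapian;

my @numbers = ( 19.99, -3.5, 0.5, 1000.25, 2.25 );

# Encode each number as a string whose byte order follows numeric order.
my @encoded = map { Xapian::sortable_serialise($_) } @numbers;

# A plain string sort of the encoded values...
my @sorted = sort @encoded;

# ...should decode to the numbers in ascending numeric order.
my @decoded = map { Xapian::sortable_unserialise($_) } @sorted;
print join( ', ', @decoded ), "\n";
```

The encoded strings are binary data, so they are best stored as document values rather than printed.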

TODO

Documentation

Add POD documentation for all classes, where possible just adapted from Xapian docs.

Unwrapped classes

The following Xapian classes are not yet wrapped: ErrorHandler, user-defined Weight subclasses.

CREDITS

These SWIG-generated Perl bindings were originally implemented by Kosei Moriyama in GSoC 2009, and made their debut in the 1.2.4 release.

They take a lot of inspiration and some code from Search::Xapian, a set of hand-written XS bindings, originally written by Alex Bowley, and later maintained by Olly Betts.

Search::Xapian owed thanks to Tye McQueen <tye@metronet.com> for explaining the finer points of how best to write XS frontends to C++ libraries, and James Aylett <james@tartarus.org> for clarifying the less obvious aspects of the Xapian API. Patches for wrapping missing classes and other things were contributed by Olly Betts, Tim Brody, Marcus Ramberg, Peter Karman, Benjamin Smith, Rusty Conover, Frank Lichtenheld, Henry Combrinck, Jess Robinson, David F. Skoll, Dave O'Neill, Andreas Marienborg, Adam Sjøgren, Dmitry Karasik, and Val Rosca.

AUTHOR

Please report any bugs/suggestions to <xapian-discuss@lists.xapian.org> or use the Xapian bug tracker https://xapian.org/bugs. Please do NOT use the CPAN bug tracker or mail contributors individually.

LICENSE

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

Xapian::BM25Weight, Xapian::BoolWeight, Xapian::Database, Xapian::Document, Xapian::Enquire, Xapian::MultiValueSorter, Xapian::PositionIterator, Xapian::PostingIterator, Xapian::Query, Xapian::QueryParser, Xapian::Stem, Xapian::TermGenerator, Xapian::TermIterator, Xapian::TradWeight, Xapian::ValueIterator, Xapian::Weight, Xapian::WritableDatabase, and https://xapian.org/.