xapian-core  1.4.27
Xapian::Enquire Class Reference

This class provides an interface to the information retrieval system for the purpose of searching.

#include <enquire.h>

Public Types

enum  docid_order { ASCENDING = 1 , DESCENDING = 0 , DONT_CARE = 2 }
 Ordering of docids.
 

Public Member Functions

 Enquire (const Enquire &other)
 Copying is allowed (and is cheap).
 
void operator= (const Enquire &other)
 Assignment is allowed (and is cheap).
 
 Enquire (const Database &database)
 Create a Xapian::Enquire object.
 
 Enquire (const Database &database, ErrorHandler *errorhandler_)
 Create a Xapian::Enquire object.
 
 ~Enquire ()
 Close the Xapian::Enquire object.
 
void set_query (const Xapian::Query &query, Xapian::termcount qlen=0)
 Set the query to run.
 
const Xapian::Query & get_query () const
 Get the current query.
 
void add_matchspy (MatchSpy *spy)
 Add a matchspy.
 
void clear_matchspies ()
 Remove all the matchspies.
 
void set_weighting_scheme (const Weight &weight_)
 Set the weighting scheme to use for queries.
 
void set_expansion_scheme (const std::string &eweightname_, double expand_k_=1.0) const
 Set the weighting scheme to use for expansion.
 
void set_collapse_key (Xapian::valueno collapse_key, Xapian::doccount collapse_max=1)
 Set the collapse key to use for queries.
 
void set_docid_order (docid_order order)
 Set sort order for document IDs.
 
void set_cutoff (int percent_cutoff, double weight_cutoff=0)
 Set the percentage and/or weight cutoffs.
 
void set_sort_by_relevance ()
 Set the sorting to be by relevance only.
 
void set_sort_by_value (Xapian::valueno sort_key, bool reverse)
 Set the sorting to be by value only.
 
void set_sort_by_key (Xapian::KeyMaker *sorter, bool reverse)
 Set the sorting to be by key generated from values only.
 
void set_sort_by_value_then_relevance (Xapian::valueno sort_key, bool reverse)
 Set the sorting to be by value, then by relevance for documents with the same value.
 
void set_sort_by_key_then_relevance (Xapian::KeyMaker *sorter, bool reverse)
 Set the sorting to be by keys generated from values, then by relevance for documents with identical keys.
 
void set_sort_by_relevance_then_value (Xapian::valueno sort_key, bool reverse)
 Set the sorting to be by relevance then value.
 
void set_sort_by_relevance_then_key (Xapian::KeyMaker *sorter, bool reverse)
 Set the sorting to be by relevance, then by keys generated from values.
 
void set_time_limit (double time_limit)
 Set a time limit for the match.
 
MSet get_mset (Xapian::doccount first, Xapian::doccount maxitems, Xapian::doccount checkatleast=0, const RSet *omrset=0, const MatchDecider *mdecider=0) const
 Get (a portion of) the match set for the current query.
 
MSet get_mset (Xapian::doccount first, Xapian::doccount maxitems, const RSet *omrset, const MatchDecider *mdecider=0) const
 Get (a portion of) the match set for the current query.
 
ESet get_eset (Xapian::termcount maxitems, const RSet &omrset, int flags=0, const Xapian::ExpandDecider *edecider=0, double min_wt=0.0) const
 Get the expand set for the given rset.
 
ESet get_eset (Xapian::termcount maxitems, const RSet &omrset, const Xapian::ExpandDecider *edecider) const
 Get the expand set for the given rset.
 
ESet get_eset (Xapian::termcount maxitems, const RSet &rset, int flags, double k, const Xapian::ExpandDecider *edecider=NULL, double min_wt=0.0) const
 Get the expand set for the given rset.
 
TermIterator get_matching_terms_begin (Xapian::docid did) const
 Get terms which match a given document, by document id.
 
TermIterator get_matching_terms_end (Xapian::docid) const
 End iterator corresponding to get_matching_terms_begin()
 
TermIterator get_matching_terms_begin (const MSetIterator &it) const
 Get terms which match a given document, by match set item.
 
TermIterator get_matching_terms_end (const MSetIterator &) const
 End iterator corresponding to get_matching_terms_begin()
 
std::string get_description () const
 Return a string describing this object.
 

Static Public Attributes

static const int INCLUDE_QUERY_TERMS = 1
 Terms in the query may be returned by get_eset().
 
static const int USE_EXACT_TERMFREQ = 2
 Calculate exact term frequencies in get_eset().
 

Detailed Description

This class provides an interface to the information retrieval system for the purpose of searching.

Databases are usually opened lazily, so exceptions may not be thrown where you would expect them to be. You should catch Xapian::Error exceptions when calling any method in Xapian::Enquire.

Exceptions
Xapian::InvalidArgumentError will be thrown if an invalid argument is supplied, for example, an unknown database type.

Member Enumeration Documentation

◆ docid_order

Ordering of docids.

Parameter to Enquire::set_docid_order().

Enumerator
ASCENDING: docids sort in ascending order (default).
DESCENDING: docids sort in descending order.
DONT_CARE: docids sort in whatever order is most efficient for the backend.

Constructor & Destructor Documentation

◆ Enquire() [1/2]

Xapian::Enquire::Enquire ( const Database & database)
explicit

Create a Xapian::Enquire object.

This specification cannot be changed once the Xapian::Enquire is opened: you must create a new Xapian::Enquire object to access a different database, or set of databases.

The database supplied must have been initialised (i.e., it must not be the result of calling the default Database::Database() constructor). If you need to gracefully handle a situation where you have no databases, a database created with DB_BACKEND_INMEMORY can be passed here to provide a completely empty database.

Parameters
database: Specification of the database or databases to use.
Exceptions
Xapian::InvalidArgumentError will be thrown if an empty Database object is supplied.

◆ Enquire() [2/2]

Xapian::Enquire::Enquire ( const Database & database,
ErrorHandler * errorhandler_ 
)

Create a Xapian::Enquire object.

This specification cannot be changed once the Xapian::Enquire is opened: you must create a new Xapian::Enquire object to access a different database, or set of databases.

The database supplied must have been initialised (i.e., it must not be the result of calling the default Database::Database() constructor). If you need to gracefully handle a situation where you have no databases, a database created with DB_BACKEND_INMEMORY can be passed here to provide a completely empty database.

Parameters
database: Specification of the database or databases to use.
errorhandler_: This parameter is deprecated (since Xapian 1.3.1), and as of 1.3.5 it's ignored completely.
Exceptions
Xapian::InvalidArgumentError will be thrown if an empty Database object is supplied.

Member Function Documentation

◆ add_matchspy()

void Xapian::Enquire::add_matchspy ( MatchSpy * spy)

Add a matchspy.

This matchspy will be called with some of the documents which match the query, during the match process. Exactly which of the matching documents are passed to it depends on exactly when certain optimisations occur during the match process, but it can be controlled to some extent by setting the checkatleast parameter to get_mset().

In particular, if there are enough matching documents, at least the number specified by checkatleast will be passed to the matchspy. This means that you can force the matchspy to be shown all matching documents by setting checkatleast to the number of documents in the database.

Parameters
spy: The MatchSpy subclass to add. The caller must ensure that this remains valid while the Enquire object remains active, or until clear_matchspies() is called, or else allocate the MatchSpy object with new and then disown it by calling spy->release() before passing it in.

◆ get_eset() [1/3]

ESet Xapian::Enquire::get_eset ( Xapian::termcount  maxitems,
const RSet & omrset,
const Xapian::ExpandDecider * edecider 
) const
inline

Get the expand set for the given rset.

Parameters
maxitems: the maximum number of items to return.
omrset: the relevance set to use when performing the expand operation.
edecider: a decision functor to use to decide whether a given term should be put in the ESet.
Returns
An ESet object containing the results of the expand.
Exceptions
Xapian::InvalidArgumentError: See class documentation.

◆ get_eset() [2/3]

ESet Xapian::Enquire::get_eset ( Xapian::termcount  maxitems,
const RSet & omrset,
int  flags = 0,
const Xapian::ExpandDecider * edecider = 0,
double  min_wt = 0.0 
) const

Get the expand set for the given rset.

Parameters
maxitems: the maximum number of items to return.
omrset: the relevance set to use when performing the expand operation.
flags: zero or more of these values |-ed together:
  • Xapian::Enquire::INCLUDE_QUERY_TERMS: terms in the query may be returned by get_eset().
  • Xapian::Enquire::USE_EXACT_TERMFREQ: calculate exact term frequencies in get_eset().
edecider: a decision functor to use to decide whether a given term should be put in the ESet.
min_wt: the minimum weight for included terms.
Returns
An ESet object containing the results of the expand.
Exceptions
Xapian::InvalidArgumentError: See class documentation.

◆ get_eset() [3/3]

ESet Xapian::Enquire::get_eset ( Xapian::termcount  maxitems,
const RSet & rset,
int  flags,
double  k,
const Xapian::ExpandDecider * edecider = NULL,
double  min_wt = 0.0 
) const
inline

Get the expand set for the given rset.

Parameters
maxitems: the maximum number of items to return.
rset: the relevance set to use when performing the expand operation.
flags: zero or more of these values |-ed together:
  • Xapian::Enquire::INCLUDE_QUERY_TERMS: terms in the query may be returned by get_eset().
  • Xapian::Enquire::USE_EXACT_TERMFREQ: calculate exact term frequencies in get_eset().
k: the parameter k in the query expansion algorithm (default is 1.0).
edecider: a decision functor to use to decide whether a given term should be put in the ESet.
min_wt: the minimum weight for included terms.
Returns
An ESet object containing the results of the expand.
Exceptions
Xapian::InvalidArgumentError: See class documentation.

◆ get_matching_terms_begin() [1/2]

TermIterator Xapian::Enquire::get_matching_terms_begin ( const MSetIterator & it) const

Get terms which match a given document, by match set item.

This method returns the terms in the current query which match the given document.

If the underlying database has suitable support, using this call (rather than passing a Xapian::docid) will enable the system to ensure that the correct data is returned, and that the document has not been deleted or changed since the query was performed.

Parameters
it: The iterator for which to retrieve the matching terms.
Returns
An iterator returning the terms which match the document. The terms will be returned (as far as this makes any sense) in the same order as the terms in the query. Terms will not occur more than once, even if they do in the query.
Exceptions
Xapian::InvalidArgumentError: See class documentation.
Xapian::DocNotFoundError: The document specified could not be found in the database.
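
The ordering and de-duplication described under Returns can be modelled in plain C++. This is only a sketch of the documented behaviour, not Xapian's implementation: the function `matching_terms` and its container-based arguments are hypothetical stand-ins for the query and the document's term list.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model (not a Xapian function): return the query terms that
// also occur in the document, in query order, each term at most once.
std::vector<std::string> matching_terms(
    const std::vector<std::string>& query_terms,
    const std::set<std::string>& doc_terms) {
    std::vector<std::string> out;
    std::set<std::string> seen;
    for (const std::string& term : query_terms) {
        // Keep the term only if the document contains it and it has not
        // already been emitted.
        if (doc_terms.count(term) != 0 && seen.insert(term).second) {
            out.push_back(term);
        }
    }
    return out;
}
```

With query terms `b a b c` and a document containing `a` and `b`, this model yields `b` then `a`: the repeated `b` appears once and `c` is dropped because the document does not contain it.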

◆ get_matching_terms_begin() [2/2]

TermIterator Xapian::Enquire::get_matching_terms_begin ( Xapian::docid  did) const

Get terms which match a given document, by document id.

This method returns the terms in the current query which match the given document.

It is possible for the document to have been removed from the database between the time it is returned in an MSet, and the time that this call is made. If possible, you should specify an MSetIterator instead of a Xapian::docid, since this will enable database backends with suitable support to prevent this occurring.

Note that a query does not need to have been run in order to make this call.

Parameters
did: The document id for which to retrieve the matching terms.
Returns
An iterator returning the terms which match the document. The terms will be returned (as far as this makes any sense) in the same order as the terms in the query. Terms will not occur more than once, even if they do in the query.
Exceptions
Xapian::InvalidArgumentError: See class documentation.
Xapian::DocNotFoundError: The document specified could not be found in the database.

◆ get_mset() [1/2]

MSet Xapian::Enquire::get_mset ( Xapian::doccount  first,
Xapian::doccount  maxitems,
const RSet * omrset,
const MatchDecider * mdecider = 0 
) const
inline

Get (a portion of) the match set for the current query.

Parameters
first: the first item in the result set to return. A value of zero corresponds to the first item returned being that with the highest score. A value of 10 corresponds to the first 10 items being ignored, and the returned items starting at the eleventh.
maxitems: the maximum number of items to return. If you want all matches, then you can pass the result of calling get_doccount() on the Database object (though if you are doing this so you can filter results, you are likely to get much better performance by using Xapian's match-time filtering features instead). You can pass 0 for maxitems which will give you an empty MSet with valid statistics (such as get_matches_estimated()) calculated without looking at any postings, which is very quick, but means the estimates may be more approximate and the bounds may be much looser.
omrset: the relevance set to use when performing the query.
mdecider: a decision functor to use to decide whether a given document should be put in the MSet.
Returns
A Xapian::MSet object containing the results of the query.
Exceptions
Xapian::InvalidArgumentError: See class documentation.

◆ get_mset() [2/2]

MSet Xapian::Enquire::get_mset ( Xapian::doccount  first,
Xapian::doccount  maxitems,
Xapian::doccount  checkatleast = 0,
const RSet * omrset = 0,
const MatchDecider * mdecider = 0 
) const

Get (a portion of) the match set for the current query.

Parameters
first: the first item in the result set to return. A value of zero corresponds to the first item returned being that with the highest score. A value of 10 corresponds to the first 10 items being ignored, and the returned items starting at the eleventh.
maxitems: the maximum number of items to return. If you want all matches, then you can pass the result of calling get_doccount() on the Database object (though if you are doing this so you can filter results, you are likely to get much better performance by using Xapian's match-time filtering features instead). You can pass 0 for maxitems which will give you an empty MSet with valid statistics (such as get_matches_estimated()) calculated without looking at any postings, which is very quick, but means the estimates may be more approximate and the bounds may be much looser.
checkatleast: the minimum number of items to check. Because the matcher optimises, it won't consider every document which might match, so the total number of matches is estimated. Setting checkatleast forces it to consider at least this many matches and so allows for reliable paging links.
omrset: the relevance set to use when performing the query.
mdecider: a decision functor to use to decide whether a given document should be put in the MSet.
Returns
A Xapian::MSet object containing the results of the query.
Exceptions
Xapian::InvalidArgumentError: See class documentation.
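
As an illustration of how first, maxitems and checkatleast relate when paging through results, the sketch below computes the three values for a zero-based page number. The names `PageRequest` and `page_request` are hypothetical, plain `unsigned` stands in for Xapian::doccount, and checking one item beyond the end of the page is an assumed convention for deciding whether a "next page" link is needed, not something the library mandates.

```cpp
#include <cassert>

// Hypothetical holder for the first three get_mset() arguments.
struct PageRequest {
    unsigned first;         // index of the first result to return
    unsigned maxitems;      // number of results on the page
    unsigned checkatleast;  // minimum number of matches to consider
};

// Compute the arguments for a zero-based page number.
PageRequest page_request(unsigned page, unsigned page_size) {
    PageRequest r;
    r.first = page * page_size;  // skip all earlier pages
    r.maxitems = page_size;
    // Assumed convention: look one match past this page so the caller can
    // tell whether a further page exists.
    r.checkatleast = r.first + page_size + 1;
    return r;
}
```

For the third page of ten results (`page_request(2, 10)`), this gives first = 20, maxitems = 10 and checkatleast = 31, which a caller could then pass as `enquire.get_mset(r.first, r.maxitems, r.checkatleast)`.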

◆ get_query()

const Xapian::Query & Xapian::Enquire::get_query ( ) const

Get the current query.

If called before set_query(), this will return a default initialised Query object.

◆ set_collapse_key()

void Xapian::Enquire::set_collapse_key ( Xapian::valueno  collapse_key,
Xapian::doccount  collapse_max = 1 
)

Set the collapse key to use for queries.

Parameters
collapse_key: value number to collapse on - at most one MSet entry with each particular value will be returned (default is Xapian::BAD_VALUENO which means no collapsing).
collapse_max: maximum number of items with the same key to leave after collapsing (default 1).

The MSet returned by get_mset() will have only the "best" (at most) collapse_max entries with each particular value of collapse_key ("best" being highest ranked - i.e. highest weight or highest sorting key).

An example use might be to create a value for each document containing an MD5 hash of the document contents. Then duplicate documents from different sources can be eliminated at search time by collapsing with collapse_max = 1 (it's better to eliminate duplicates at index time, but this may not always be possible - for example the search may be over more than one Xapian database).

Another use is to group matches in a particular category (e.g. you might collapse a mailing list search on the Subject: so that there's only one result per discussion thread). In this case you can use get_collapse_count() to give the user some idea how many other results there are. And if you index the Subject: as a boolean term as well as putting it in a value, you can offer a link to a non-collapsed search restricted to that thread using a boolean filter.
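
The effect of collapse_max can be modelled on a list of hits that is already in rank order. The sketch below is a simplified model of the behaviour described above, not Xapian's matcher: `RankedHit` and `collapse` are hypothetical names, and the key string stands in for the document value stored in the collapse_key slot.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical ranked hit: a document id plus its collapse key value.
struct RankedHit {
    unsigned docid;
    std::string key;
};

// Model of collapsing: walk the hits best-first and keep at most
// collapse_max entries for each distinct key.
std::vector<RankedHit> collapse(const std::vector<RankedHit>& ranked,
                                unsigned collapse_max = 1) {
    std::map<std::string, unsigned> kept;  // entries kept so far, per key
    std::vector<RankedHit> out;
    for (const RankedHit& hit : ranked) {
        unsigned& count = kept[hit.key];
        if (count < collapse_max) {
            ++count;
            out.push_back(hit);
        }
    }
    return out;
}
```

Because the input is ordered best-first, the entries that survive for each key are the highest-ranked ones, matching the "best" wording above.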

◆ set_cutoff()

void Xapian::Enquire::set_cutoff ( int  percent_cutoff,
double  weight_cutoff = 0 
)

Set the percentage and/or weight cutoffs.

Parameters
percent_cutoff: Minimum percentage score for returned documents. If a document has a lower percentage score than this, it will not appear in the MSet. If your intention is to return only matches which contain all the terms in the query, then it's more efficient to use Xapian::Query::OP_AND instead of Xapian::Query::OP_OR in the query than to use set_cutoff(100). (default 0 => no percentage cut-off).
weight_cutoff: Minimum weight for a document to be returned. If a document has a lower score than this, it will not appear in the MSet. It is usually only possible to choose an appropriate weight for cutoff based on the results of a previous run of the same query; this is thus mainly useful for alerting operations. The other potential use is with a user specified weighting scheme. (default 0 => no weight cut-off).

◆ set_docid_order()

void Xapian::Enquire::set_docid_order ( docid_order  order)

Set sort order for document IDs.

This order only has an effect on documents which would otherwise have equal rank. When ordering by relevance without a sort key, this means documents with equal weight. For a boolean match with no sort key, this means all documents. And if a sort key is used, this means documents with the same sort key (and also equal weight if ordering on relevance before or after the sort key).

Parameters
order: This can be:
  • Xapian::Enquire::ASCENDING: docids sort in ascending order (default)
  • Xapian::Enquire::DESCENDING: docids sort in descending order
  • Xapian::Enquire::DONT_CARE: docids sort in whatever order is most efficient for the backend

    Note: If you add documents in strict date order, then a boolean search - i.e. set_weighting_scheme(Xapian::BoolWeight()) - with set_docid_order(Xapian::Enquire::DESCENDING) is an efficient way to perform "sort by date, newest first", and with set_docid_order(Xapian::Enquire::ASCENDING) a very efficient way to perform "sort by date, oldest first".
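
The tie-breaking role of the docid order can be modelled with an ordinary sort. This is a sketch of the documented semantics only: `DocidOrder`, `WeightedHit` and `rank_hits` are hypothetical names, and DONT_CARE is left out because its result depends on the backend.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the two deterministic docid_order values.
enum class DocidOrder { Ascending, Descending };

struct WeightedHit {
    unsigned docid;
    double weight;
};

// Rank by weight (highest first); the docid order only decides between
// hits whose weights are equal.
std::vector<unsigned> rank_hits(std::vector<WeightedHit> hits,
                                DocidOrder order) {
    std::sort(hits.begin(), hits.end(),
              [order](const WeightedHit& a, const WeightedHit& b) {
                  if (a.weight != b.weight) return a.weight > b.weight;
                  return order == DocidOrder::Ascending ? a.docid < b.docid
                                                        : a.docid > b.docid;
              });
    std::vector<unsigned> ids;
    for (const WeightedHit& h : hits) ids.push_back(h.docid);
    return ids;
}
```

When every hit has the same weight, as in a boolean match, the docid order alone determines the ranking, which is why adding documents in date order makes a descending docid order behave like "newest first".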

◆ set_expansion_scheme()

void Xapian::Enquire::set_expansion_scheme ( const std::string &  eweightname_,
double  expand_k_ = 1.0 
) const

Set the weighting scheme to use for expansion.

If you don't call this method, the default is as if you'd used:

set_expansion_scheme("prob");

Parameters
eweightname_: A string in lowercase specifying the name of the scheme to be used. The following schemes are currently available:
  • "bo1": Bose-Einstein 1 model from the Divergence From Randomness framework.
  • "prob": Probabilistic model (since 1.4.26).
  • "trad": Older alias for "prob".
expand_k_: Parameter k for probabilistic query expansion. A default value of 1.0 is used if none is specified.

◆ set_query()

void Xapian::Enquire::set_query ( const Xapian::Query & query,
Xapian::termcount  qlen = 0 
)

Set the query to run.

Parameters
query: the new query to run.
qlen: the query length to use in weight calculations - by default the sum of the wqf of all terms is used.

◆ set_sort_by_key()

void Xapian::Enquire::set_sort_by_key ( Xapian::KeyMaker * sorter,
bool  reverse 
)

Set the sorting to be by key generated from values only.

Parameters
sorter: The functor to use for generating keys.
reverse: If true, reverses the sort order.

◆ set_sort_by_key_then_relevance()

void Xapian::Enquire::set_sort_by_key_then_relevance ( Xapian::KeyMaker * sorter,
bool  reverse 
)

Set the sorting to be by keys generated from values, then by relevance for documents with identical keys.

Parameters
sorter: The functor to use for generating keys.
reverse: If true, reverses the sort order.

◆ set_sort_by_relevance()

void Xapian::Enquire::set_sort_by_relevance ( )

Set the sorting to be by relevance only.

This is the default.

◆ set_sort_by_relevance_then_key()

void Xapian::Enquire::set_sort_by_relevance_then_key ( Xapian::KeyMaker * sorter,
bool  reverse 
)

Set the sorting to be by relevance, then by keys generated from values.

Note that with the default BM25 weighting scheme parameters, non-identical documents will rarely have the same weight, so this setting will give very similar results to set_sort_by_relevance(). It becomes more useful with particular BM25 parameter settings (e.g. BM25Weight(1,0,1,0,0)) or custom weighting schemes.

Parameters
sorter: The functor to use for generating keys.
reverse: If true, reverses the sort order of the generated keys. Beware that in 1.2.16 and earlier, the sense of this parameter was incorrectly inverted and inconsistent with the other set_sort_by_... methods. This was fixed in 1.2.17, so make that version a minimum requirement if this detail matters to your application.

◆ set_sort_by_relevance_then_value()

void Xapian::Enquire::set_sort_by_relevance_then_value ( Xapian::valueno  sort_key,
bool  reverse 
)

Set the sorting to be by relevance then value.

Note that sorting by values uses a string comparison, so to use this to sort by a numeric value you'll need to store the numeric values in a manner which sorts appropriately. For example, you could use Xapian::sortable_serialise() (which works for floating point numbers as well as integers), or store numbers padded with leading zeros or spaces, or with the number of digits prepended.

Note that with the default BM25 weighting scheme parameters, non-identical documents will rarely have the same weight, so this setting will give very similar results to set_sort_by_relevance(). It becomes more useful with particular BM25 parameter settings (e.g. BM25Weight(1,0,1,0,0)) or custom weighting schemes.

Parameters
sort_key: value number to sort on.
reverse: If true, reverses the sort order of sort_key. Beware that in 1.2.16 and earlier, the sense of this parameter was incorrectly inverted and inconsistent with the other set_sort_by_... methods. This was fixed in 1.2.17, so make that version a minimum requirement if this detail matters to your application.

◆ set_sort_by_value()

void Xapian::Enquire::set_sort_by_value ( Xapian::valueno  sort_key,
bool  reverse 
)

Set the sorting to be by value only.

Note that sorting by values uses a string comparison, so to use this to sort by a numeric value you'll need to store the numeric values in a manner which sorts appropriately. For example, you could use Xapian::sortable_serialise() (which works for floating point numbers as well as integers), or store numbers padded with leading zeros or spaces, or with the number of digits prepended.

Parameters
sort_key: value number to sort on.
reverse: If true, reverses the sort order.
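
To see why numeric values need a sortable encoding, the sketch below uses the zero-padding approach mentioned above. `pad_number` and `sort_via_padded_strings` are hypothetical helpers, not part of Xapian, and the fixed width of 10 digits is an assumption chosen to fit any 32-bit unsigned value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: encode a 32-bit unsigned integer as a zero-padded
// decimal string so that string comparison agrees with numeric comparison.
// 10 digits is enough for the largest 32-bit value (4294967295).
std::string pad_number(std::uint32_t n) {
    std::ostringstream out;
    out << std::setw(10) << std::setfill('0') << n;
    return out.str();
}

// Order numbers by comparing their padded string forms, which is how a
// string-based value sort would see them.
std::vector<std::uint32_t> sort_via_padded_strings(
    std::vector<std::uint32_t> nums) {
    std::sort(nums.begin(), nums.end(),
              [](std::uint32_t a, std::uint32_t b) {
                  return pad_number(a) < pad_number(b);
              });
    return nums;
}
```

Unpadded, the string "9" compares greater than "10"; padded, "0000000009" sorts before "0000000010". A string produced this way is what would be stored in the value slot at index time. Padding only handles non-negative integers of bounded size, so Xapian::sortable_serialise() is the more general choice.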

◆ set_sort_by_value_then_relevance()

void Xapian::Enquire::set_sort_by_value_then_relevance ( Xapian::valueno  sort_key,
bool  reverse 
)

Set the sorting to be by value, then by relevance for documents with the same value.

Note that sorting by values uses a string comparison, so to use this to sort by a numeric value you'll need to store the numeric values in a manner which sorts appropriately. For example, you could use Xapian::sortable_serialise() (which works for floating point numbers as well as integers), or store numbers padded with leading zeros or spaces, or with the number of digits prepended.

Parameters
sort_key: value number to sort on.
reverse: If true, reverses the sort order.
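
The "number of digits prepended" encoding mentioned above can be sketched as follows. `length_prefixed` is a hypothetical helper, not a Xapian function, and encoding the digit count as a single letter ('A' for one digit, 'B' for two, and so on) is an assumption of this sketch that keeps the prefix one character wide.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: prefix the decimal form of n with one character that
// encodes its digit count, so shorter numbers always sort first and numbers
// of equal length compare digit by digit.
std::string length_prefixed(std::uint64_t n) {
    const std::string digits = std::to_string(n);
    // A 64-bit value has at most 20 digits, so the prefix stays in 'A'..'T'.
    const char prefix = static_cast<char>('A' + (digits.size() - 1));
    return prefix + digits;
}
```

Here 9 encodes as "A9" and 10 as "B10", so the string order matches the numeric order without padding every value to a fixed width.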

◆ set_time_limit()

void Xapian::Enquire::set_time_limit ( double  time_limit)

Set a time limit for the match.

Matches with checkatleast (the get_mset() parameter) set high can take a long time in some cases. You can set a time limit on this, after which checkatleast will be turned off.

Parameters
time_limit: time in seconds after which to disable checkatleast (default: 0.0 which means no time limit).

Limitations:

This feature is currently supported on platforms which support POSIX interval timers. Interaction with the remote backend when using multiple databases may have bugs. There's not currently a way to force the match to end after a certain time.

◆ set_weighting_scheme()

void Xapian::Enquire::set_weighting_scheme ( const Weight & weight_)

Set the weighting scheme to use for queries.

Parameters
weight_: the new weighting scheme. If no weighting scheme is specified, the default is BM25 with the default parameters.

Member Data Documentation

◆ INCLUDE_QUERY_TERMS

const int Xapian::Enquire::INCLUDE_QUERY_TERMS = 1
static

Terms in the query may be returned by get_eset().

The original intended use for Enquire::get_eset() is for query expansion - suggesting terms to add to the query, generally with the aim of improving recall (i.e. finding more of the relevant documents), so by default terms already in the query won't be returned in the ESet. For some uses you might want to consider all terms, and this flag allows you to specify that.

◆ USE_EXACT_TERMFREQ

const int Xapian::Enquire::USE_EXACT_TERMFREQ = 2
static

Calculate exact term frequencies in get_eset().

By default, when working over multiple databases, Enquire::get_eset() uses an approximation to the termfreq to improve efficiency. This should still return good results, but if you want to calculate the exact combined termfreq then you can use this flag.
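
Since the two flags have the values 1 and 2, they occupy separate bits and can be combined with bitwise OR. The sketch below uses local constants that mirror the documented values so it compiles on its own. `expand_flags` and `has_flag` are hypothetical helpers; in real code the constants would be Xapian::Enquire::INCLUDE_QUERY_TERMS and Xapian::Enquire::USE_EXACT_TERMFREQ.

```cpp
#include <cassert>

// Local stand-ins mirroring the documented flag values (assumption: real
// code would use the Xapian::Enquire constants instead).
const int INCLUDE_QUERY_TERMS = 1;
const int USE_EXACT_TERMFREQ = 2;

// Build a flags argument for get_eset() from two boolean choices.
int expand_flags(bool include_query_terms, bool exact_termfreq) {
    int flags = 0;
    if (include_query_terms) flags |= INCLUDE_QUERY_TERMS;
    if (exact_termfreq) flags |= USE_EXACT_TERMFREQ;
    return flags;
}

// Test whether a particular flag is set in a combined value.
bool has_flag(int flags, int flag) {
    return (flags & flag) != 0;
}
```

Passing no flags (0) keeps both defaults: query terms are excluded from the ESet and approximate term frequencies are used across multiple databases.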

