xapian-core
1.4.26
Class representing a query.
#include <query.h>
Classes | |
---|---|
class Internal | |

Public Types | |
---|---|
enum op { OP_AND = 0, OP_OR = 1, OP_AND_NOT = 2, OP_XOR = 3, OP_AND_MAYBE = 4, OP_FILTER = 5, OP_NEAR = 6, OP_PHRASE = 7, OP_VALUE_RANGE = 8, OP_SCALE_WEIGHT = 9, OP_ELITE_SET = 10, OP_VALUE_GE = 11, OP_VALUE_LE = 12, OP_SYNONYM = 13, OP_MAX = 14, OP_WILDCARD = 15, OP_INVALID = 99, LEAF_TERM = 100, LEAF_POSTING_SOURCE, LEAF_MATCH_ALL, LEAF_MATCH_NOTHING } | Query operators. |
enum { WILDCARD_LIMIT_ERROR, WILDCARD_LIMIT_FIRST, WILDCARD_LIMIT_MOST_FREQUENT } | |

Public Member Functions | |
---|---|
Query () | Construct a query matching no documents. |
~Query () | Destructor. |
Query (const Query &o) | Copying is allowed. |
Query & operator= (const Query &o) | Copying is allowed. |
Query (const std::string &term, Xapian::termcount wqf=1, Xapian::termpos pos=0) | Construct a Query object for a term. |
Query (Xapian::PostingSource *source) | Construct a Query object for a PostingSource. |
Query (double factor, const Xapian::Query &subquery) | Scale using OP_SCALE_WEIGHT. |
Query (op op_, const Xapian::Query &subquery, double factor) | Scale using OP_SCALE_WEIGHT. |
Query (op op_, const Xapian::Query &a, const Xapian::Query &b) | Construct a Query object by combining two others. |
Query (op op_, const std::string &a, const std::string &b) | Construct a Query object by combining two terms. |
Query (op op_, Xapian::valueno slot, const std::string &range_limit) | Construct a Query object for a single-ended value range. |
Query (op op_, Xapian::valueno slot, const std::string &range_lower, const std::string &range_upper) | Construct a Query object for a value range. |
Query (op op_, const std::string &pattern, Xapian::termcount max_expansion=0, int max_type=WILDCARD_LIMIT_ERROR, op combiner=OP_SYNONYM) | Query constructor for OP_WILDCARD queries. |
template<typename I > Query (op op_, I begin, I end, Xapian::termcount window=0) | Construct a Query object from a begin/end iterator pair. |
const TermIterator get_terms_begin () const | Begin iterator for terms in the query object. |
const TermIterator get_terms_end () const | End iterator for terms in the query object. |
const TermIterator get_unique_terms_begin () const | Begin iterator for unique terms in the query object. |
const TermIterator get_unique_terms_end () const | End iterator for unique terms in the query object. |
Xapian::termcount get_length () const | Return the length of this query object. |
bool empty () const | Check if this query is Xapian::Query::MatchNothing. |
std::string serialise () const | Serialise this object into a string. |
op get_type () const | Get the type of the top level of the query. |
size_t get_num_subqueries () const | Get the number of subqueries of the top level query. |
const Query get_subquery (size_t n) const | Read a top level subquery. |
std::string get_description () const | Return a string describing this object. |
const Query operator&= (const Query &o) | Combine with another Xapian::Query object using OP_AND. |
const Query operator\|= (const Query &o) | Combine with another Xapian::Query object using OP_OR. |
const Query operator^= (const Query &o) | Combine with another Xapian::Query object using OP_XOR. |
const Query operator*= (double factor) | Scale using OP_SCALE_WEIGHT. |
const Query operator/= (double factor) | Inverse scale using OP_SCALE_WEIGHT. |
Query (Query::op op_) | Construct with just an operator. |

Static Public Member Functions | |
---|---|
static const Query unserialise (const std::string &serialised, const Registry &reg=Registry()) | Unserialise a string and return a Query object. |

Static Public Attributes | |
---|---|
static const Xapian::Query MatchNothing | A query matching no documents. |
static const Xapian::Query MatchAll | A query matching all documents. |

Private Member Functions | |
---|---|
Query (Internal *internal_) | |
void init (Query::op op_, size_t n_subqueries, Xapian::termcount window=0) | |
template<typename I > void init (Query::op op_, Xapian::termcount window, const I &begin, const I &end, std::random_access_iterator_tag) | |
template<typename I > void init (Query::op op_, Xapian::termcount window, const I &, const I &, std::input_iterator_tag) | |
void add_subquery (bool positional, const Xapian::Query &subquery) | |
void add_subquery (bool, const std::string &subquery) | |
void add_subquery (bool positional, const Xapian::Query *subquery) | |
void done () | |

Private Attributes | |
---|---|
Xapian::Internal::intrusive_ptr< Internal > internal | |
anonymous enum
Enumerator | |
---|---|
WILDCARD_LIMIT_ERROR | Throw an error if OP_WILDCARD exceeds its expansion limit. Xapian::WildcardError will be thrown when the query is actually run. |
WILDCARD_LIMIT_FIRST | Stop expanding when OP_WILDCARD reaches its expansion limit. This makes the wildcard expand to only the first N terms (sorted by byte order). |
WILDCARD_LIMIT_MOST_FREQUENT | Limit OP_WILDCARD expansion to the most frequent terms. If OP_WILDCARD would expand to more than its expansion limit, the most frequent terms are taken. This approach works well for cases such as expanding a partial term at the end of a query string which the user hasn't finished typing yet. As well as being less expensive to evaluate than the full expansion, using only the most frequent terms tends to give better results too. |
enum Xapian::Query::op
Query operators.
Enumerator | |
---|---|
OP_AND | Match only documents which all subqueries match. When used in a weighted context, the weight is the sum of the weights for all the subqueries. |
OP_OR | Match documents which at least one subquery matches. When used in a weighted context, the weight is the sum of the weights for matching subqueries (so additional matching subqueries will mean a higher weight). |
OP_AND_NOT | Match documents which the first subquery matches but no others do. When used in a weighted context, the weight is just the weight of the first subquery. |
OP_XOR | Match documents which an odd number of subqueries match. When used in a weighted context, the weight is the sum of the weights for matching subqueries (so additional matching subqueries will mean a higher weight). |
OP_AND_MAYBE | Match the first subquery taking extra weight from other subqueries. When used in a weighted context, the weight is the sum of the weights for matching subqueries (so additional matching subqueries will mean a higher weight). Because only the first subquery determines which documents are matched, in a non-weighted context only the first subquery matters. |
OP_FILTER | Match like OP_AND but only taking weight from the first subquery. When used in a non-weighted context, OP_FILTER and OP_AND are equivalent. In older 1.4.x, the third and subsequent subqueries were ignored in some situations. This was fixed in 1.4.15. |
OP_NEAR | Match only documents where all subqueries match near each other. The subqueries must match at term positions within the specified window size, in any order. Currently subqueries must be terms or terms composed with OP_OR. When used in a weighted context, the weight is the sum of the weights for all the subqueries. |
OP_PHRASE | Match only documents where all subqueries match near and in order. The subqueries must match at term positions within the specified window size, in the same term position order as subquery order. Currently subqueries must be terms or terms composed with OP_OR. When used in a weighted context, the weight is the sum of the weights for all the subqueries. |
OP_VALUE_RANGE | Match only documents where a value slot is within a given range. This operator never contributes weight. |
OP_SCALE_WEIGHT | Scale the weight contributed by a subquery. The weight is the weight of the subquery multiplied by the specified non-negative scale factor (so if the scale factor is zero then the subquery contributes no weight). |
OP_ELITE_SET | Pick the best N subqueries and combine with OP_OR. If you want to implement a feature which finds documents similar to a piece of text, an obvious approach is to build an "OR" query from all the terms in the text, and run this query against a database containing the documents. However, such a query can contain a lot of terms and be quite slow to perform, yet many of these terms don't contribute usefully to the results. The OP_ELITE_SET operator can be used instead of OP_OR in this situation. OP_ELITE_SET selects the most important N terms and then acts as an OP_OR query with just these, ignoring any other terms. This will usually return results just as good as the full OP_OR query, but much faster. In general, the OP_ELITE_SET operator can be used when you have a large OR query, but it doesn't matter if the search completely ignores some of the less important terms in the query. The subqueries don't have to be terms. If they aren't, then OP_ELITE_SET could potentially pick a subset which doesn't actually match any documents even if the full OR would match some (because OP_ELITE_SET currently selects those subqueries which can return the highest weights). This is probably rare in practice though. You can specify a parameter to the query constructor which controls the number of subqueries which OP_ELITE_SET will pick. If not specified, this defaults to 10 (Xapian used to default to …). For example, this picks the best 7 subqueries: Xapian::Query query(Xapian::Query::OP_ELITE_SET, subqs.begin(), subqs.end(), 7); If the number of subqueries is less than this threshold, OP_ELITE_SET behaves identically to OP_OR. When used with a sharded database, OP_ELITE_SET currently picks the subqueries to use separately for each shard based on the maximum weight they can return in that shard. This means it probably won't select exactly the same terms, and so the results of the search may not be exactly the same as for a single database with equivalent contents. |
OP_VALUE_GE | Match only documents where a value slot is >= a given value. Similar to OP_VALUE_RANGE, but open-ended. This operator never contributes weight. |
OP_VALUE_LE | Match only documents where a value slot is <= a given value. Similar to OP_VALUE_RANGE, but open-ended. This operator never contributes weight. |
OP_SYNONYM | Match like OP_OR but weighting as if a single term. The weight is calculated combining the statistics for the subqueries to approximate the weight of a single term occurring with those statistics. |
OP_MAX | Pick the maximum weight of any subquery. Matches the same documents as OP_OR, but the weight contributed is the maximum weight from any matching subquery (for OP_OR, it's the sum of the weights from the matching subqueries). Added in Xapian 1.3.2. |
OP_WILDCARD | Wildcard expansion. Added in Xapian 1.3.3. |
OP_INVALID | Construct an invalid query. This can be useful as a placeholder - for example RangeProcessor uses it as a return value to indicate that a range hasn't been recognised. |
LEAF_TERM | Value returned by get_type() for a term. |
LEAF_POSTING_SOURCE | Value returned by get_type() for a PostingSource. |
LEAF_MATCH_ALL | Value returned by get_type() for MatchAll or equivalent. This is returned for any Xapian::Query(std::string()) object. |
LEAF_MATCH_NOTHING | Value returned by get_type() for MatchNothing or equivalent. This is returned for any Xapian::Query() object. |
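The matching and weighting rules in the enumerator table above are easiest to compare side by side. The following C++ sketch is a toy model, not Xapian's implementation: it represents each subquery only by the documents it matches and the non-negative weight it contributes to each, then applies the rules described for OP_AND, OP_OR, OP_AND_NOT, OP_XOR, OP_AND_MAYBE, OP_FILTER, OP_MAX, OP_SCALE_WEIGHT and OP_ELITE_SET. The names ToyOp, Matches, combine(), scale_weight() and elite_set() are hypothetical, and for OP_ELITE_SET it assumes "best" means the highest weight a subquery can return, as that entry describes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Toy model, not Xapian's implementation: a subquery is represented by the
// documents it matches and the non-negative weight it contributes to each.
using DocId = unsigned;
using Matches = std::map<DocId, double>;

enum class ToyOp { AND, OR, AND_NOT, XOR, AND_MAYBE, FILTER, MAX };

Matches combine(ToyOp op, const std::vector<Matches>& subs) {
    // For each document, record which subqueries (by ascending index) match it.
    std::map<DocId, std::vector<std::size_t>> hits;
    for (std::size_t i = 0; i < subs.size(); ++i)
        for (const auto& entry : subs[i]) hits[entry.first].push_back(i);

    Matches result;
    for (const auto& h : hits) {
        const DocId doc = h.first;
        const std::vector<std::size_t>& idx = h.second;
        const bool in_first = idx.front() == 0;
        const bool in_all = idx.size() == subs.size();
        double sum = 0.0, best = 0.0;
        for (std::size_t i : idx) {
            sum += subs[i].at(doc);
            best = std::max(best, subs[i].at(doc));
        }
        switch (op) {
            case ToyOp::AND:       if (in_all) result[doc] = sum; break;
            case ToyOp::OR:        result[doc] = sum; break;
            // Matched by the first subquery only; weight is the first's.
            case ToyOp::AND_NOT:   if (in_first && idx.size() == 1) result[doc] = subs[0].at(doc); break;
            case ToyOp::XOR:       if (idx.size() % 2 == 1) result[doc] = sum; break;
            case ToyOp::AND_MAYBE: if (in_first) result[doc] = sum; break;
            // Matches like AND, but only the first subquery's weight counts.
            case ToyOp::FILTER:    if (in_all) result[doc] = subs[0].at(doc); break;
            case ToyOp::MAX:       result[doc] = best; break;
        }
    }
    return result;
}

// OP_SCALE_WEIGHT: same documents, each weight multiplied by factor.
Matches scale_weight(const Matches& sub, double factor) {
    assert(factor >= 0.0);  // the factor must be non-negative
    Matches result = sub;
    for (auto& entry : result) entry.second *= factor;
    return result;
}

// OP_ELITE_SET: keep the n subqueries with the highest maximum weight,
// then combine just those with OP_OR.
Matches elite_set(const std::vector<Matches>& subs, std::size_t n) {
    std::vector<std::pair<double, std::size_t>> ranked;  // (max weight, index)
    for (std::size_t i = 0; i < subs.size(); ++i) {
        double best = 0.0;
        for (const auto& entry : subs[i]) best = std::max(best, entry.second);
        ranked.emplace_back(best, i);
    }
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const auto& a, const auto& b) { return a.first > b.first; });
    std::vector<Matches> chosen;
    for (std::size_t k = 0; k < ranked.size() && k < n; ++k)
        chosen.push_back(subs[ranked[k].second]);
    return combine(ToyOp::OR, chosen);
}
```

For example, if three subqueries match document 2 with weights 2, 3 and 0.5, combine() gives that document weight 5.5 under AND, 3 under MAX and 2 under FILTER, which only takes weight from the first subquery.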
Xapian::Query::Query ( ) [inline]
Construct a query matching no documents.
MatchNothing is a static instance of this.
When combined with other Query objects using the various supported operators, Query() works like false in boolean logic, so Query() & q is Query(), while Query() | q is q.
Definition at line 319 of file query.h.
Referenced by unserialise().
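The identities above follow from viewing a query purely as the set of documents it matches, with the empty set playing the role of false. A minimal sketch of that view, using std::set rather than Xapian and ignoring weights (DocSet, match_nothing, query_and() and query_or() are hypothetical names):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>

// Toy model: a query is represented only by the set of documents it
// matches (weights are ignored).
using DocSet = std::set<unsigned>;

// Query() / MatchNothing matches no documents: the empty set, i.e. "false".
const DocSet match_nothing{};

// OP_AND keeps documents matched by both sides (set intersection).
DocSet query_and(const DocSet& a, const DocSet& b) {
    DocSet out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.end()));
    return out;
}

// OP_OR keeps documents matched by either side (set union).
DocSet query_or(const DocSet& a, const DocSet& b) {
    DocSet out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(out, out.end()));
    return out;
}
```

Here query_and(match_nothing, q) is always empty and query_or(match_nothing, q) always equals q, mirroring Query() & q and Query() | q.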
Xapian::Query::Query ( const std::string & term, Xapian::termcount wqf = 1, Xapian::termpos pos = 0 )
Construct a Query object for a term.
term | The term. An empty string constructs a query matching all documents (MatchAll is a static instance of this). |
wqf | The within-query frequency. (default: 1) |
pos | The query position. Currently this is mainly used to determine the order of terms obtained via get_terms_begin(). (default: 0) |
Definition at line 43 of file query.cc.
References LOGCALL_CTOR.
Xapian::Query::Query ( Xapian::PostingSource * source ) [explicit]
Construct a Query object for a PostingSource.
Definition at line 49 of file query.cc.
References LOGCALL_CTOR.
Xapian::Query::Query ( double factor, const Xapian::Query & subquery )
Scale using OP_SCALE_WEIGHT.
factor | Non-negative real number to multiply weights by. |
subquery | Query object to scale weights from. |
Definition at line 55 of file query.cc.
References empty(), and LOGCALL_CTOR.
Xapian::Query::Query ( op op_, const Xapian::Query & subquery, double factor )
Scale using OP_SCALE_WEIGHT.
In this form, the op_ parameter is totally redundant - use Query(factor, subquery) in preference.
op_ | Must be OP_SCALE_WEIGHT. |
factor | Non-negative real number to multiply weights by. |
subquery | Query object to scale weights from. |
Definition at line 63 of file query.cc.
References internal, LOGCALL_CTOR, OP_SCALE_WEIGHT, OP_VALUE_GE, OP_VALUE_LE, OP_VALUE_RANGE, and rare.
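Xapian::Query::Query ( op op_, const Xapian::Query & a, const Xapian::Query & b ) [inline]
Construct a Query object by combining two others.
Xapian::Query::Query ( op op_, const std::string & a, const std::string & b ) [inline]
Construct a Query object by combining two terms.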
Xapian::Query::Query ( op op_, Xapian::valueno slot, const std::string & range_limit )
Construct a Query object for a single-ended value range.
op_ | Must be OP_VALUE_LE or OP_VALUE_GE currently. |
slot | The value slot to work over. |
range_limit | The limit of the range. |
Definition at line 86 of file query.cc.
References LOGCALL_CTOR, OP_VALUE_GE, OP_VALUE_LE, and usual.
Xapian::Query::Query ( op op_, Xapian::valueno slot, const std::string & range_lower, const std::string & range_upper )
Construct a Query object for a value range.
op_ | Must be OP_VALUE_RANGE currently. |
slot | The value slot to work over. |
range_lower | Lower end of the range. |
range_upper | Upper end of the range. |
Definition at line 102 of file query.cc.
References LOGCALL_CTOR, OP_VALUE_RANGE, rare, and usual.
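The three value-range operators act as filters on a single value slot and never contribute weight. The sketch below is a toy C++ model, not Xapian's implementation. It assumes each document holds at most one value per slot as a byte string, that comparisons use std::string ordering, and that OP_VALUE_RANGE includes both ends (consistent with the >= and <= used by OP_VALUE_GE and OP_VALUE_LE). SlotValues, value_ge(), value_le() and value_range() are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model of value-slot filtering, not Xapian's implementation.
// SlotValues maps a document id to the value stored in one slot; documents
// with no value in the slot are simply absent. Values compare byte-wise
// (std::string ordering). These filters select documents but assign no weight.
using DocId = unsigned;
using SlotValues = std::map<DocId, std::string>;

// Return the ids (ascending) of documents whose value satisfies keep().
template <typename Pred>
std::vector<DocId> filter_slot(const SlotValues& slot, Pred keep) {
    std::vector<DocId> out;
    for (const auto& entry : slot)
        if (keep(entry.second)) out.push_back(entry.first);
    return out;
}

// Like OP_VALUE_GE: value >= range_limit.
std::vector<DocId> value_ge(const SlotValues& slot, const std::string& range_limit) {
    return filter_slot(slot, [&](const std::string& v) { return v >= range_limit; });
}

// Like OP_VALUE_LE: value <= range_limit.
std::vector<DocId> value_le(const SlotValues& slot, const std::string& range_limit) {
    return filter_slot(slot, [&](const std::string& v) { return v <= range_limit; });
}

// Like OP_VALUE_RANGE, assuming inclusive ends: range_lower <= value <= range_upper.
std::vector<DocId> value_range(const SlotValues& slot, const std::string& range_lower,
                               const std::string& range_upper) {
    return filter_slot(slot, [&](const std::string& v) {
        return range_lower <= v && v <= range_upper;
    });
}
```

Because the comparison in this model is byte-wise, "10" sorts before "9", so value_range(slot, "10", "9") matches both of those values; numeric values would need a string encoding whose byte order matches numeric order.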
Xapian::Query::Query ( op op_, const std::string & pattern, Xapian::termcount max_expansion = 0, int max_type = WILDCARD_LIMIT_ERROR, op combiner = OP_SYNONYM )
Query constructor for OP_WILDCARD queries.
op_ | Must be OP_WILDCARD. |
pattern | The wildcard pattern - currently this is just a string and the wildcard expands to terms which start with exactly this string. |
max_expansion | The maximum number of terms to expand to (default: 0, which means no limit). |
max_type | How to enforce max_expansion - one of WILDCARD_LIMIT_ERROR (the default), WILDCARD_LIMIT_FIRST or WILDCARD_LIMIT_MOST_FREQUENT. When searching multiple databases, the expansion limit is currently applied independently for each database, so the total number of terms may be higher than the limit. This is arguably a bug, and may change in future versions. |
combiner | The Query::op to combine the terms with - one of OP_SYNONYM (the default), OP_OR or OP_MAX. |
Definition at line 117 of file query.cc.
References LOGCALL_CTOR, OP_MAX, OP_OR, OP_SYNONYM, OP_WILDCARD, and rare.
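A toy model of how pattern, max_expansion and max_type interact for a single database. It follows the descriptions above: the pattern is treated as a plain prefix, a limit of 0 means no limit, and the three strategies mirror WILDCARD_LIMIT_ERROR, WILDCARD_LIMIT_FIRST and WILDCARD_LIMIT_MOST_FREQUENT. Lexicon, LimitType and expand_wildcard() are hypothetical, and std::runtime_error stands in for Xapian::WildcardError, which Xapian throws when the query is actually run rather than at construction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy model of OP_WILDCARD expansion on one database, not Xapian's code.
// The lexicon maps each term to its frequency; std::map keeps terms in
// byte order, like a database's list of terms.
using Lexicon = std::map<std::string, unsigned>;

// Mirrors WILDCARD_LIMIT_ERROR, WILDCARD_LIMIT_FIRST, WILDCARD_LIMIT_MOST_FREQUENT.
enum class LimitType { Error, First, MostFrequent };

std::vector<std::string> expand_wildcard(const Lexicon& lex, const std::string& pattern,
                                         std::size_t max_expansion = 0,
                                         LimitType max_type = LimitType::Error) {
    // The pattern is a plain prefix: collect every term that starts with it.
    std::vector<std::pair<std::string, unsigned>> found;
    for (auto it = lex.lower_bound(pattern);
         it != lex.end() && it->first.compare(0, pattern.size(), pattern) == 0; ++it)
        found.emplace_back(it->first, it->second);

    // max_expansion == 0 means "no limit".
    if (max_expansion != 0 && found.size() > max_expansion) {
        switch (max_type) {
            case LimitType::Error:
                // Stand-in for Xapian::WildcardError (thrown when the query runs).
                throw std::runtime_error("wildcard expansion limit exceeded");
            case LimitType::First:
                found.resize(max_expansion);  // first N terms in byte order
                break;
            case LimitType::MostFrequent:
                // Keep the N most frequent terms, then restore byte order.
                std::stable_sort(found.begin(), found.end(),
                                 [](const auto& a, const auto& b) { return a.second > b.second; });
                found.resize(max_expansion);
                std::sort(found.begin(), found.end());
                break;
        }
    }

    std::vector<std::string> terms;
    for (const auto& f : found) terms.push_back(f.first);
    return terms;
}
```

In Xapian the expanded terms would then be combined using combiner (OP_SYNONYM by default). With the lexicon {wild: 5, wildcard: 2, wilder: 9, wildest: 1} and a limit of 2, this model keeps "wild" and "wildcard" under First, but "wild" and "wilder" under MostFrequent.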
template<typename I > Xapian::Query::Query ( op op_, I begin, I end, Xapian::termcount window = 0 ) [inline]
Construct a Query object from a begin/end iterator pair.
Dereferencing the iterator should return a Xapian::Query, a non-NULL Xapian::Query*, a std::string or a type which converts to one of these (e.g. const char*).
If begin == end then there are no subqueries and the resulting Query won't match anything.
op_ | The operator to combine the queries with. |
begin | Begin iterator. |
end | End iterator. |
window | Window size for OP_NEAR and OP_PHRASE, or 0 to use the number of subqueries as the window size (default: 0). |
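For OP_PHRASE and OP_NEAR, the window decides how far apart the subqueries may match. The sketch below is a brute-force toy model, not Xapian's algorithm. It assumes a window of w means all matched positions fit within w consecutive term positions, and, like this constructor, it treats a window of 0 as the number of subqueries. Positions, match_from() and positional_match() are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of OP_PHRASE / OP_NEAR matching in one document, not Xapian's
// algorithm. positions[i] lists the term positions where subquery i matches.
// Assumption: a window of w means all chosen positions fit within w
// consecutive positions (max - min + 1 <= w).
using Positions = std::vector<std::vector<unsigned>>;

// Try every combination of positions (brute force, fine for a sketch).
bool match_from(const Positions& positions, std::size_t i, std::vector<unsigned>& chosen,
                unsigned window, bool ordered) {
    if (i == positions.size()) {
        const auto mm = std::minmax_element(chosen.begin(), chosen.end());
        return *mm.second - *mm.first + 1 <= window;
    }
    for (unsigned p : positions[i]) {
        // OP_PHRASE: subqueries must match in the same order as given.
        if (ordered && i > 0 && p <= chosen[i - 1]) continue;
        chosen[i] = p;
        if (match_from(positions, i + 1, chosen, window, ordered)) return true;
    }
    return false;
}

// ordered = true models OP_PHRASE, false models OP_NEAR. A window of 0
// means "use the number of subqueries", as for the iterator constructor.
bool positional_match(const Positions& positions, bool ordered, unsigned window = 0) {
    if (positions.empty()) return false;  // no subqueries: nothing matches
    if (window == 0) window = static_cast<unsigned>(positions.size());
    std::vector<unsigned> chosen(positions.size());
    return match_from(positions, 0, chosen, window, ordered);
}
```

For "the quick brown fox" (positions 1 to 4), positional_match({{2}, {3}, {4}}, true) accepts the exact phrase "quick brown fox" with the default window of 3, while positional_match({{2}, {4}}, true) rejects "quick fox" until the window is widened to 3.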
Xapian::Query::Query ( Internal * internal_ ) [inline, explicit, private]
Xapian::Query::Query ( Query::op op_ ) [inline, explicit]
Construct with just an operator.
op_ | The operator to use - currently only OP_INVALID is useful. |
Definition at line 607 of file query.h.
References OP_INVALID.
void Xapian::Query::add_subquery ( bool positional, const Xapian::Query & subquery ) [private]
Definition at line 296 of file query.cc.
References Xapian::Internal::QueryBranch::add_subquery(), Assert, get_type(), LEAF_MATCH_ALL, LEAF_MATCH_NOTHING, LEAF_POSTING_SOURCE, LEAF_TERM, MatchNothing, and OP_OR.
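void Xapian::Query::add_subquery ( bool, const std::string & subquery ) [inline, private]
void Xapian::Query::add_subquery ( bool positional, const Xapian::Query * subquery ) [inline, private]
void Xapian::Query::done ( ) [private]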
Definition at line 328 of file query.cc.
References Xapian::Internal::QueryBranch::done().
bool Xapian::Query::empty ( ) const [inline]
Check if this query is Xapian::Query::MatchNothing.
Definition at line 524 of file query.h.
References Xapian::operator&=(), and XAPIAN_PURE_FUNCTION.
Referenced by ProbQuery::append_filter(), DEFINE_TESTCASE(), Xapian::Enquire::Internal::get_eset(), Xapian::Enquire::Internal::get_matching_terms(), MultiMatch::get_mset(), LocalSubMatch::get_postlist(), Xapian::Query::Internal::Internal(), MultiMatch::MultiMatch(), operator^=(), operator|=(), Query(), and yy_reduce().
string Xapian::Query::get_description ( ) const
Return a string describing this object.
Definition at line 232 of file query.cc.
Referenced by DEFINE_TESTCASE(), Xapian::Enquire::Internal::get_description(), main(), and PerfTestLogger::search_end().
Xapian::termcount Xapian::Query::get_length ( ) const
Return the length of this query object.
Definition at line 187 of file query.cc.
References internal.
Referenced by DEFINE_TESTCASE(), and Xapian::Enquire::Internal::set_query().
size_t Xapian::Query::get_num_subqueries ( ) const
Get the number of subqueries of the top level query.
Definition at line 220 of file query.cc.
Referenced by Xapian::check_query(), and DEFINE_TESTCASE().
const Query Xapian::Query::get_subquery ( size_t n ) const
Read a top level subquery.
n | Return the n-th subquery (starting from 0) - only valid when 0 <= n < get_num_subqueries(). |
Definition at line 226 of file query.cc.
References get_subquery().
Referenced by Xapian::Internal::QueryAndNot::add_subquery(), Xapian::check_query(), DEFINE_TESTCASE(), and get_subquery().
const TermIterator Xapian::Query::get_terms_begin ( ) const
Begin iterator for terms in the query object.
The iterator returns terms in ascending query position order, and will return the same term in each unique position it occurs in. If you want the terms in sorted order and without duplicates, see get_unique_terms_begin().
Definition at line 135 of file query.cc.
Referenced by DEFINE_TESTCASE(), Xapian::Enquire::Internal::get_eset(), Xapian::Enquire::Internal::get_matching_terms(), and main().
const TermIterator Xapian::Query::get_terms_end ( ) const [inline]
End iterator for terms in the query object.
Definition at line 502 of file query.h.
Referenced by DEFINE_TESTCASE(), Xapian::Enquire::Internal::get_eset(), Xapian::Enquire::Internal::get_matching_terms(), and main().
Xapian::Query::op Xapian::Query::get_type ( ) const
Get the type of the top level of the query.
Definition at line 212 of file query.cc.
References LEAF_MATCH_NOTHING.
Referenced by Xapian::Internal::QueryAndNot::add_subquery(), add_subquery(), Xapian::check_query(), DEFINE_TESTCASE(), and State::range().
const TermIterator Xapian::Query::get_unique_terms_begin ( ) const
Begin iterator for unique terms in the query object.
Terms are sorted and terms with the same name removed from the list.
If you want the terms in ascending query position order, see get_terms_begin().
Definition at line 160 of file query.cc.
Referenced by Xapian::Weight::Internal::accumulate_stats(), DEFINE_TESTCASE(), GlassDatabase::readahead_for_query(), and ChertDatabase::readahead_for_query().
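The difference between the two term iterators comes down to ordering and de-duplication. The sketch below is a toy C++ model, not Xapian's code. It records each leaf term as a (query position, term) pair, where the position is the pos argument of the term constructor, and it assumes terms sharing a position are ordered by term. QueryTerm, terms_in_position_order() and unique_terms() are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy model, not Xapian's code: each leaf term is recorded as
// (query position, term). Ties at one position are assumed ordered by term.
using QueryTerm = std::pair<unsigned, std::string>;

// Like get_terms_begin(): ascending position order; a term is repeated once
// per distinct position it occurs at (exact duplicates are dropped).
std::vector<std::string> terms_in_position_order(std::vector<QueryTerm> leaves) {
    std::sort(leaves.begin(), leaves.end());
    leaves.erase(std::unique(leaves.begin(), leaves.end()), leaves.end());
    std::vector<std::string> out;
    for (const auto& leaf : leaves) out.push_back(leaf.second);
    return out;
}

// Like get_unique_terms_begin(): sorted by term, duplicates removed.
std::vector<std::string> unique_terms(const std::vector<QueryTerm>& leaves) {
    std::vector<std::string> out;
    for (const auto& leaf : leaves) out.push_back(leaf.second);
    std::sort(out.begin(), out.end());
    out.erase(std::unique(out.begin(), out.end()), out.end());
    return out;
}
```

For the query "to be or not to be" with positions 1 to 6, the first function yields to, be, or, not, to, be, and the second yields be, not, or, to.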
const TermIterator Xapian::Query::get_unique_terms_end ( ) const [inline]
End iterator for unique terms in the query object.
Definition at line 516 of file query.h.
References XAPIAN_PURE_FUNCTION.
Referenced by DEFINE_TESTCASE().
void Xapian::Query::init ( Query::op op_, size_t n_subqueries, Xapian::termcount window = 0 ) [private]
Definition at line 242 of file query.cc.
References OP_AND, OP_AND_MAYBE, OP_AND_NOT, OP_ELITE_SET, OP_FILTER, OP_INVALID, OP_MAX, OP_NEAR, OP_OR, OP_PHRASE, OP_SYNONYM, and OP_XOR.
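template<typename I > void Xapian::Query::init ( Query::op op_, Xapian::termcount window, const I & begin, const I & end, std::random_access_iterator_tag ) [inline, private]
template<typename I > void Xapian::Query::init ( Query::op op_, Xapian::termcount window, const I &, const I &, std::input_iterator_tag ) [inline, private]
const Query Xapian::Query::operator&= ( const Query & o )
Combine with another Xapian::Query object using OP_AND.
Referenced by Xapian::Query::Internal::Internal().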
const Query Xapian::Query::operator|= ( const Query & o )
Combine with another Xapian::Query object using OP_OR.
Definition at line 811 of file query.h.
References empty().
string Xapian::Query::serialise ( ) const
Serialise this object into a string.
Definition at line 193 of file query.cc.
Referenced by DEFINE_TESTCASE(), and RemoteDatabase::set_query().
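const Query Xapian::Query::unserialise ( const std::string & serialised, const Registry & reg = Registry() ) [static]
Unserialise a string and return a Query object.
serialised | The string to unserialise. |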
reg | Xapian::Registry object to use to unserialise user-subclasses of Xapian::PostingSource (default: standard registry). |
Definition at line 202 of file query.cc.
References AssertEq, Query(), and Xapian::Query::Internal::unserialise().
Referenced by DEFINE_TESTCASE(), and RemoteServer::msg_query().
Xapian::Internal::intrusive_ptr< Internal > Xapian::Query::internal [private]
Reference counted internals.
Definition at line 49 of file query.h.
Referenced by Xapian::Internal::QueryAndLike::add_subquery(), Xapian::Internal::QueryOrLike::add_subquery(), Xapian::Internal::QueryAndNot::add_subquery(), Xapian::Internal::QueryAndMaybe::add_subquery(), Xapian::check_query(), Xapian::Internal::QueryScaleWeight::gather_terms(), Xapian::Internal::QueryScaleWeight::get_description(), Xapian::Internal::QueryScaleWeight::get_length(), get_length(), LocalSubMatch::get_postlist(), operator=(), operator^=(), Xapian::Internal::QueryScaleWeight::postlist(), Query(), State::range(), and Xapian::Internal::QueryScaleWeight::serialise().
const Xapian::Query Xapian::Query::MatchAll [static]
A query matching all documents.
This is a static instance of Xapian::Query(std::string()). If you are constructing Query objects which use MatchAll in different threads, then the reference counting of the static object can get messed up by concurrent access, so you should instead use Xapian::Query(std::string()) directly.
Definition at line 75 of file query.h.
Referenced by DEFINE_TESTCASE(), TestRangeProcessor::operator()(), TitleFieldProcessor::operator()(), and HostFieldProcessor::operator()().
const Xapian::Query Xapian::Query::MatchNothing [static]
A query matching no documents.
This is a static instance of a default-constructed Xapian::Query object. It is safe to use concurrently from different threads, unlike MatchAll (this is because MatchNothing has a NULL internal object so there's no reference counting happening).
When combined with other Query objects using the various supported operators, MatchNothing works like false in boolean logic, so MatchNothing & q is MatchNothing, while MatchNothing | q is q.
Definition at line 65 of file query.h.
Referenced by add_subquery(), and DEFINE_TESTCASE().