xapian-core
1.4.27
Build a Xapian::Query object from a user query string.
#include <queryparser.h>
Classes
class | Internal |
Public Types
enum | feature_flag { FLAG_BOOLEAN = 1, FLAG_PHRASE = 2, FLAG_LOVEHATE = 4, FLAG_BOOLEAN_ANY_CASE = 8, FLAG_WILDCARD = 16, FLAG_PURE_NOT = 32, FLAG_PARTIAL = 64, FLAG_SPELLING_CORRECTION = 128, FLAG_SYNONYM = 256, FLAG_AUTO_SYNONYMS = 512, FLAG_AUTO_MULTIWORD_SYNONYMS = 1024, FLAG_NGRAMS = 2048, FLAG_CJK_NGRAM = FLAG_NGRAMS, FLAG_ACCUMULATE = 65536, FLAG_NO_POSITIONS = 0x20000, FLAG_DEFAULT = FLAG_PHRASE|FLAG_BOOLEAN|FLAG_LOVEHATE } |
Enum of feature flags.
enum | stem_strategy { STEM_NONE, STEM_SOME, STEM_ALL, STEM_ALL_Z, STEM_SOME_FULL_POS } |
Stemming strategies, for use with set_stemming_strategy().
Public Member Functions
QueryParser (const QueryParser &o) | |
Copy constructor.
QueryParser & | operator= (const QueryParser &o) |
Assignment.
QueryParser () | |
Default constructor.
~QueryParser () | |
Destructor.
void | set_stemmer (const Xapian::Stem &stemmer) |
Set the stemmer.
void | set_stemming_strategy (stem_strategy strategy) |
Set the stemming strategy.
void | set_stopper (const Stopper *stop=NULL) |
Set the stopper.
void | set_default_op (Query::op default_op) |
Set the default operator.
Query::op | get_default_op () const |
Get the current default operator.
void | set_database (const Database &db) |
Specify the database being searched.
void | set_max_expansion (Xapian::termcount max_expansion, int max_type=Xapian::Query::WILDCARD_LIMIT_ERROR, unsigned flags=FLAG_WILDCARD|FLAG_PARTIAL) |
Specify the maximum expansion of a wildcard and/or partial term.
void | set_max_wildcard_expansion (Xapian::termcount) |
Specify the maximum expansion of a wildcard.
Query | parse_query (const std::string &query_string, unsigned flags=FLAG_DEFAULT, const std::string &default_prefix=std::string()) |
Parse a query.
void | add_prefix (const std::string &field, const std::string &prefix) |
Add a free-text field term prefix.
void | add_prefix (const std::string &field, Xapian::FieldProcessor *proc) |
Register a FieldProcessor.
void | add_boolean_prefix (const std::string &field, const std::string &prefix, const std::string *grouping=NULL) |
Add a boolean term prefix allowing the user to restrict a search with a boolean filter specified in the free text query.
void | add_boolean_prefix (const std::string &field, const std::string &prefix, bool exclusive) |
Add a boolean term prefix allowing the user to restrict a search with a boolean filter specified in the free text query.
void | add_boolean_prefix (const std::string &field, Xapian::FieldProcessor *proc, const std::string *grouping=NULL) |
Register a FieldProcessor for a boolean prefix.
void | add_boolean_prefix (const std::string &field, Xapian::FieldProcessor *proc, bool exclusive) |
Register a FieldProcessor for a boolean prefix.
TermIterator | stoplist_begin () const |
Begin iterator over terms omitted from the query as stopwords.
TermIterator | stoplist_end () const |
End iterator over terms omitted from the query as stopwords.
TermIterator | unstem_begin (const std::string &term) const |
Begin iterator over unstemmed forms of the given stemmed query term.
TermIterator | unstem_end (const std::string &) const |
End iterator over unstemmed forms of the given stemmed query term.
void | add_rangeprocessor (Xapian::RangeProcessor *range_proc, const std::string *grouping=NULL) |
Register a RangeProcessor.
void | add_valuerangeprocessor (Xapian::ValueRangeProcessor *vrproc) |
Register a ValueRangeProcessor.
std::string | get_corrected_query_string () const |
Get the spelling-corrected query string.
std::string | get_description () const |
Return a string describing this object.
Private Attributes | |
Xapian::Internal::intrusive_ptr< Internal > | internal |
Build a Xapian::Query object from a user query string.
Definition at line 778 of file queryparser.h.
Enum of feature flags.
Enumerator | |
---|---|
FLAG_BOOLEAN | Support AND, OR, etc and bracketed subexpressions. |
FLAG_PHRASE | Support quoted phrases. |
FLAG_LOVEHATE | Support + and -. |
FLAG_BOOLEAN_ANY_CASE | Support AND, OR, etc even if they aren't in ALLCAPS. |
FLAG_WILDCARD | Support wildcards. At present only right truncation (e.g. Xap*) is supported. Currently you can't use wildcards with boolean filter prefixes, or in a phrase (either an explicitly quoted one, or one implicitly generated by hyphens or other punctuation). In Xapian 1.2.x, you needed to tell the QueryParser object which database to expand wildcards from by calling set_database(). In Xapian 1.3.3, OP_WILDCARD was added and wildcards are now expanded when Enquire::get_mset() is called, with the expansion using the database being searched. |
FLAG_PURE_NOT | Allow queries such as 'NOT apples'. These require the use of a list of all documents in the database which is potentially expensive, so this feature isn't enabled by default. |
FLAG_PARTIAL | Enable partial matching. Partial matching causes the parser to treat the query as a "partially entered" search. This will automatically treat the final word as a wildcarded match, unless it is followed by whitespace, to produce more stable results from interactive searches. Currently FLAG_PARTIAL doesn't do anything if the final word in the query has a boolean filter prefix, or if it is in a phrase (either an explicitly quoted one, or one implicitly generated by hyphens or other punctuation). It also doesn't do anything if the final word is part of a value range. In Xapian 1.2.x, you needed to tell the QueryParser object which database to expand wildcards from by calling set_database(). In Xapian 1.3.3, OP_WILDCARD was added and wildcards are now expanded when Enquire::get_mset() is called, with the expansion using the database being searched. |
FLAG_SPELLING_CORRECTION | Enable spelling correction. For each word in the query which doesn't exist as a term in the database, Database::get_spelling_suggestion() will be called and if a suggestion is returned, a corrected version of the query string will be built up which can be read using QueryParser::get_corrected_query_string(). The query returned is based on the uncorrected query string however - if you want a parsed query based on the corrected query string, you must call QueryParser::parse_query() again. NB: You must also call set_database() for this to work. |
FLAG_SYNONYM | Enable synonym operator '~'. NB: You must also call set_database() for this to work. |
FLAG_AUTO_SYNONYMS | Enable automatic use of synonyms for single terms. NB: You must also call set_database() for this to work. |
FLAG_AUTO_MULTIWORD_SYNONYMS | Enable automatic use of synonyms for single terms and groups of terms. NB: You must also call set_database() for this to work. |
FLAG_NGRAMS | Generate n-grams for scripts without explicit word breaks. Spans of characters in such scripts are split into unigrams and bigrams, with the unigrams carrying positional information. Text in other scripts is split into words as normal. The TermGenerator::FLAG_NGRAMS flag needs to have been used at index time. This mode can also be enabled in 1.2.8 and later by setting environment variable XAPIAN_CJK_NGRAM to a non-empty value (but doing so was deprecated in 1.4.11). In 1.4.x this feature was specific to CJK (Chinese, Japanese and Korean), but in 1.5.0 it's been extended to other languages. To reflect this change the new and preferred name is FLAG_NGRAMS, which was added as an alias for forward compatibility in Xapian 1.4.23. Use FLAG_CJK_NGRAM instead if you aim to support Xapian < 1.4.23. |
FLAG_CJK_NGRAM | Generate n-grams for scripts without explicit word breaks. Old name - use FLAG_NGRAMS instead unless you aim to support Xapian < 1.4.23. |
FLAG_ACCUMULATE | Accumulate unstem and stoplist results. By default, the unstem and stoplist data is reset by a call to parse_query(), which makes sense if you use the same QueryParser object to parse a series of independent queries. If you're using the same QueryParser object to parse several fields on the same query form, you may want to have the unstem and stoplist data combined for all of them, in which case you can use this flag to prevent this data from being reset. |
FLAG_NO_POSITIONS | Produce a query which doesn't use positional information. With this flag enabled, no positional information will be used and any query operations which would use it are replaced by the nearest equivalent which doesn't (so phrase searches, NEAR and ADJ will result in OP_AND). |
FLAG_DEFAULT | The default flags. Used if you don't explicitly pass any to parse_query(). The default flags are FLAG_PHRASE|FLAG_BOOLEAN|FLAG_LOVEHATE. Added in Xapian 1.0.11. |
Definition at line 786 of file queryparser.h.
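The flags are bit values, so a flag set is built with bitwise-or and tested with bitwise-and. The following is a minimal standalone sketch of that arithmetic: it mirrors a few of the enumerator values listed above in a local enum and does not include or call Xapian itself. The helper names interactive_search_flags() and has_flag() are hypothetical.

```cpp
// Local mirror of a few QueryParser::feature_flag values from the table above.
// This is an illustration only; real code would use Xapian::QueryParser::FLAG_*.
enum feature_flag : unsigned {
    FLAG_BOOLEAN = 1,
    FLAG_PHRASE = 2,
    FLAG_LOVEHATE = 4,
    FLAG_WILDCARD = 16,
    FLAG_PARTIAL = 64,
    FLAG_DEFAULT = FLAG_PHRASE | FLAG_BOOLEAN | FLAG_LOVEHATE
};

// Hypothetical helper: the defaults plus wildcard and partial matching,
// as might suit a search-as-you-type box.
inline unsigned interactive_search_flags() {
    return FLAG_DEFAULT | FLAG_WILDCARD | FLAG_PARTIAL;
}

// Hypothetical helper: test whether a feature is enabled in a flag set.
inline bool has_flag(unsigned flags, feature_flag f) {
    return (flags & f) != 0;
}
```

A value built this way is what would be passed as the flags argument of parse_query(). Note that passing flags explicitly replaces FLAG_DEFAULT rather than adding to it, which is why the sketch ors FLAG_DEFAULT back in.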
Stemming strategies, for use with set_stemming_strategy().
Enumerator | |
---|---|
STEM_NONE | |
STEM_SOME | |
STEM_ALL | |
STEM_ALL_Z | |
STEM_SOME_FULL_POS |
Definition at line 943 of file queryparser.h.
QueryParser::QueryParser | ( | const QueryParser & | o | ) |
Copy constructor.
Definition at line 66 of file queryparser.cc.
QueryParser::QueryParser | ( | ) |
QueryParser::~QueryParser | ( | ) |
Destructor.
Definition at line 82 of file queryparser.cc.
void QueryParser::add_boolean_prefix | ( | const std::string & | field, |
const std::string & | prefix, | ||
const std::string * | grouping = NULL |
||
) |
Add a boolean term prefix allowing the user to restrict a search with a boolean filter specified in the free text query.
For example, mapping the field site to the prefix H allows the user to restrict a search with site:xapian.org, which will be converted to Hxapian.org combined with any weighted query with Xapian::Query::OP_FILTER.
If multiple boolean filters are specified in a query for the same prefix, they will be combined with the Xapian::Query::OP_OR operator. Then, if there are boolean filters for different prefixes, they will be combined with the Xapian::Query::OP_AND operator.
Multiple fields can be mapped to the same prefix (so for example you can make site: and domain: aliases for each other). Instances of fields with different aliases but the same prefix will still be combined with the OR operator.
For example, if "site" and "domain" map to "H", but author maps to "A", a search for "site:foo domain:bar author:Fred" will map to "(Hfoo OR Hbar) AND Afred".
As of 1.0.4, you can call this method multiple times with the same value of field to allow a single field to be mapped to multiple prefixes. Multiple terms are then generated for such a field and combined with Xapian::Query::OP_OR.
Calling this method with an empty string for field will cause a Xapian::InvalidArgumentError.
If you call add_prefix() and add_boolean_prefix() for the same value of field, a Xapian::InvalidOperationError exception will be thrown.
In 1.0.3 and earlier, subsequent calls to this method with the same value of field had no effect.
field | The user visible field name, which may not be empty for a boolean filter. Currently this needs to consist of characters for which Xapian::Unicode::is_wordchar() is true (approximately alphanumerics plus connector punctuation such as _ ). Since 1.4.26 it can optionally end in a : for consistency with how range prefixes are specified. |
prefix | The term prefix to map this to |
grouping | Controls how multiple filters are combined - filters with the same grouping value are combined with OP_OR, then the resulting queries are combined with OP_AND. If NULL, then field is used for grouping. If an empty string, then a unique grouping is created for each filter (this is sometimes useful when each document can have multiple terms with this prefix). [default: NULL] |
Definition at line 206 of file queryparser.cc.
References Assert, and endswith().
Referenced by DEFINE_TESTCASE(), and main().
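The combining rule described for the grouping parameter (OR within a grouping, AND between groupings, and a separate grouping per filter when the grouping is empty) can be illustrated with a small standalone sketch. It does not use Xapian: the Filter struct and combine_filters() are hypothetical names, the grouping keys are supplied by the caller, and the result is rendered as a string purely for readability.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// One boolean filter after prefix mapping: its grouping key and its term.
// (Hypothetical type for illustration; not part of the Xapian API.)
struct Filter {
    std::string grouping;
    std::string term;
};

// OR together filters that share a non-empty grouping, then AND the groups.
// An empty grouping always starts a new group of its own.
std::string combine_filters(const std::vector<Filter>& filters) {
    // Groups are kept in first-seen order.
    std::vector<std::pair<std::string, std::vector<std::string>>> groups;
    for (const Filter& f : filters) {
        auto it = groups.end();
        if (!f.grouping.empty()) {
            it = std::find_if(groups.begin(), groups.end(),
                              [&](const auto& g) { return g.first == f.grouping; });
        }
        if (it == groups.end()) {
            groups.emplace_back(f.grouping, std::vector<std::string>{f.term});
        } else {
            it->second.push_back(f.term);
        }
    }

    std::string result;
    for (const auto& g : groups) {
        std::string part;
        for (const std::string& t : g.second) {
            if (!part.empty()) part += " OR ";
            part += t;
        }
        if (g.second.size() > 1) part = "(" + part + ")";
        if (!result.empty()) result += " AND ";
        result += part;
    }
    return result;
}
```

With the filters Hfoo and Hbar sharing one grouping and Afred in another, this yields "(Hfoo OR Hbar) AND Afred", matching the example above. Giving two filters an empty grouping yields an AND of the two, which is the behaviour described as useful when each document can carry several terms with the same prefix.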
void QueryParser::add_boolean_prefix (const std::string &field, const std::string &prefix, bool exclusive) | inline
Add a boolean term prefix allowing the user to restrict a search with a boolean filter specified in the free text query.
This is an older version of this method - use the version with the grouping parameter in preference to this one.
field | The user visible field name, which may not be empty for a boolean filter. Currently this needs to consist of characters for which Xapian::Unicode::is_wordchar() is true (approximately alphanumerics plus connector punctuation such as _ ). Since 1.4.26 it can optionally end in a : for consistency with how range prefixes are specified. |
prefix | The term prefix to map this to |
exclusive | Controls how multiple filters are combined. If true then prefix is used as the grouping value, so terms with the same prefix are combined with OP_OR, then the resulting queries are combined with OP_AND. If false, then a unique grouping is created for each filter (this is sometimes useful when each document can have multiple terms with this prefix). |
Definition at line 1245 of file queryparser.h.
void QueryParser::add_boolean_prefix | ( | const std::string & | field, |
Xapian::FieldProcessor * | proc, | ||
const std::string * | grouping = NULL |
||
) |
Register a FieldProcessor for a boolean prefix.
Definition at line 219 of file queryparser.cc.
References Assert, and endswith().
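void QueryParser::add_boolean_prefix (const std::string &field, Xapian::FieldProcessor *proc, bool exclusive) | inline
Register a FieldProcessor for a boolean prefix.
This is an older version of this method - use the version with the grouping parameter in preference to this one.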
Definition at line 1265 of file queryparser.h.
void QueryParser::add_prefix | ( | const std::string & | field, |
const std::string & | prefix | ||
) |
Add a free-text field term prefix.
For example, mapping the field author to the prefix A allows the user to search for author:Orwell, which will be converted to a search for the term "Aorwell".
Multiple fields can be mapped to the same prefix. For example, you can make title: and subject: aliases for each other.
As of 1.0.4, you can call this method multiple times with the same value of field to allow a single field to be mapped to multiple prefixes. Multiple terms are then generated for such a field and combined with Xapian::Query::OP_OR.
If any prefixes are specified for the empty field name (i.e. you call this method with an empty string as the first parameter) these prefixes will be used for terms without a field specifier. If you do this and also specify the default_prefix parameter to parse_query(), then the default_prefix parameter will override.
If the prefix parameter is empty, then "field:word" will produce the term "word" (and this can be one of several prefixes for a particular field, or for terms without a field specifier).
If you call add_prefix() and add_boolean_prefix() for the same value of field, a Xapian::InvalidOperationError exception will be thrown.
In 1.0.3 and earlier, subsequent calls to this method with the same value of field had no effect.
field | The user visible field name. Currently this needs to consist of characters for which Xapian::Unicode::is_wordchar() is true (approximately alphanumerics plus connector punctuation such as _ ). Since 1.4.26 it can optionally end in a : for consistency with how range prefixes are specified. |
prefix | The term prefix to map this to. |
Definition at line 184 of file queryparser.cc.
References Assert, and endswith().
Referenced by DEFINE_TESTCASE(), and main().
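How one field can fan out to several terms is easier to see in miniature. The sketch below is a standalone stand-in for the field table, not Xapian code: the PrefixTable class and its terms_for() method are hypothetical, the word is simply ASCII-lowercased, and stemming, stopwords and unknown fields are deliberately left out.

```cpp
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the parser's free-text field table.
class PrefixTable {
    std::map<std::string, std::vector<std::string>> prefixes_;

public:
    // Repeated calls for the same field accumulate prefixes
    // (the behaviour described above for 1.0.4 and later).
    void add_prefix(const std::string& field, const std::string& prefix) {
        prefixes_[field].push_back(prefix);
    }

    // Terms generated for "field:word". In the real parser these would be
    // combined with OP_OR; here they are just returned as a list.
    // Fields with no registered prefix return an empty list in this sketch.
    std::vector<std::string> terms_for(const std::string& field, std::string word) const {
        for (char& c : word) {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        std::vector<std::string> terms;
        auto it = prefixes_.find(field);
        if (it == prefixes_.end()) return terms;
        for (const std::string& p : it->second) terms.push_back(p + word);
        return terms;
    }
};
```

Registering "author" with prefix "A" makes terms_for("author", "Orwell") return the single term "Aorwell". Registering "title" with both "S" and the empty prefix returns two terms for each title word, one prefixed and one bare, mirroring the empty-prefix rule above.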
void QueryParser::add_prefix | ( | const std::string & | field, |
Xapian::FieldProcessor * | proc | ||
) |
Register a FieldProcessor.
Definition at line 195 of file queryparser.cc.
References Assert, and endswith().
void QueryParser::add_rangeprocessor | ( | Xapian::RangeProcessor * | range_proc, |
const std::string * | grouping = NULL |
||
) |
Register a RangeProcessor.
Definition at line 253 of file queryparser.cc.
References Assert.
Referenced by DEFINE_TESTCASE().
void QueryParser::add_valuerangeprocessor (Xapian::ValueRangeProcessor *vrproc) | inline
Register a ValueRangeProcessor.
This method is provided for API compatibility with Xapian 1.2.x and is deprecated - use add_rangeprocessor() with a RangeProcessor instead.
Compatibility shim.
Definition at line 1300 of file queryparser.h.
References Xapian::BAD_VALUENO, Xapian::Query::OP_INVALID, and Xapian::RangeProcessor::operator()().
Referenced by DEFINE_TESTCASE().
string QueryParser::get_corrected_query_string | ( | ) | const |
Get the spelling-corrected query string.
This will only be set if FLAG_SPELLING_CORRECTION is specified when QueryParser::parse_query() was last called.
If there were no corrections, an empty string is returned.
Definition at line 261 of file queryparser.cc.
Referenced by DEFINE_TESTCASE(), and main().
Query::op QueryParser::get_default_op | ( | ) | const |
Get the current default operator.
Definition at line 136 of file queryparser.cc.
Referenced by DEFINE_TESTCASE().
string QueryParser::get_description | ( | ) | const |
Return a string describing this object.
Definition at line 267 of file queryparser.cc.
QueryParser & QueryParser::operator= (const QueryParser &o) | default
Assignment.
Query QueryParser::parse_query | ( | const std::string & | query_string, |
unsigned | flags = FLAG_DEFAULT , |
||
const std::string & | default_prefix = std::string() |
||
) |
Parse a query.
query_string | A free-text query as entered by a user |
flags | Zero or more QueryParser::feature_flag specifying what features the QueryParser should support. Combine multiple values with bitwise-or (|) (default FLAG_DEFAULT). |
default_prefix | The default term prefix to use (default none). For example, you can pass "A" when parsing an "Author" field. |
Xapian::QueryParserError | Thrown if the query string can't be parsed. You can get an English error message to report to the user by catching it and calling get_msg() on the caught exception. The current possible values (in case you want to translate them) are: |
Definition at line 162 of file queryparser.cc.
References FLAG_ACCUMULATE, FLAG_NGRAMS, FLAG_NO_POSITIONS, and internal.
Referenced by DEFINE_TESTCASE(), main(), test_qp_flag_wildcard3_helper(), and time_query_parse().
void QueryParser::set_database | ( | const Database & | db | ) |
Specify the database being searched.
db | The database to use for spelling correction (FLAG_SPELLING_CORRECTION), and synonyms (FLAG_SYNONYM, FLAG_AUTO_SYNONYMS, and FLAG_AUTO_MULTIWORD_SYNONYMS). |
Definition at line 142 of file queryparser.cc.
Referenced by DEFINE_TESTCASE(), main(), test_qp_flag_wildcard3_helper(), and time_query_parse().
void QueryParser::set_default_op | ( | Query::op | default_op | ) |
Set the default operator.
default_op | The operator to use to combine non-filter query items when no explicit operator is used. |
So for example, 'weather forecast' is parsed as if it were 'weather OR forecast' by default.
The most useful values for this are OP_OR (the default) and OP_AND. OP_NEAR, OP_PHRASE, OP_ELITE_SET, OP_SYNONYM and OP_MAX are also permitted. Passing other values will result in InvalidArgumentError being thrown.
Definition at line 103 of file queryparser.cc.
References Xapian::Query::OP_AND, Xapian::Query::OP_ELITE_SET, Xapian::Query::OP_MAX, Xapian::Query::OP_NEAR, Xapian::Query::OP_OR, Xapian::Query::OP_PHRASE, and Xapian::Query::OP_SYNONYM.
Referenced by DEFINE_TESTCASE(), and main().
void QueryParser::set_max_expansion | ( | Xapian::termcount | max_expansion, |
int | max_type = Xapian::Query::WILDCARD_LIMIT_ERROR , |
||
unsigned | flags = FLAG_WILDCARD|FLAG_PARTIAL |
||
) |
Specify the maximum expansion of a wildcard and/or partial term.
Note: you must also set FLAG_WILDCARD and/or FLAG_PARTIAL in the flags parameter to parse_query() for this setting to have any effect.
If you don't call this method, the default settings are no limit on wildcard expansion, and partial terms expanding to the most frequent 100 terms - i.e. as if you'd called:
set_max_expansion(0); set_max_expansion(100, Xapian::Query::WILDCARD_LIMIT_MOST_FREQUENT, Xapian::QueryParser::FLAG_PARTIAL);
max_expansion | The maximum number of terms each wildcard in the query can expand to, or 0 for no limit (which is the default). |
max_type | Xapian::Query::WILDCARD_LIMIT_ERROR, Xapian::Query::WILDCARD_LIMIT_FIRST or Xapian::Query::WILDCARD_LIMIT_MOST_FREQUENT (default: Xapian::Query::WILDCARD_LIMIT_ERROR). |
flags | What to set the limit for (default: FLAG_WILDCARD|FLAG_PARTIAL, setting the limit for both wildcards and partial terms). |
Definition at line 147 of file queryparser.cc.
References FLAG_PARTIAL, and FLAG_WILDCARD.
Referenced by test_qp_flag_wildcard3_helper().
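The three max_type values differ in what happens once a wildcard matches more than max_expansion terms. The standalone sketch below models that choice on a plain list of (term, frequency) candidates. It is not Xapian code: LimitType and limit_expansion() are hypothetical names, std::length_error stands in for whatever exception the library throws, and the behaviour shown for each mode (throw, keep the first terms, keep the most frequent terms) is an assumption based on the names of the constants.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for Xapian::Query::WILDCARD_LIMIT_ERROR / _FIRST / _MOST_FREQUENT.
enum class LimitType { Error, First, MostFrequent };

// A candidate expansion: the term and how often it occurs.
using TermFreq = std::pair<std::string, unsigned>;

// Apply an expansion limit to the candidates; max_expansion == 0 means no limit.
std::vector<std::string> limit_expansion(std::vector<TermFreq> candidates,
                                         std::size_t max_expansion,
                                         LimitType type) {
    if (max_expansion != 0 && candidates.size() > max_expansion) {
        switch (type) {
        case LimitType::Error:
            throw std::length_error("wildcard expands to too many terms");
        case LimitType::First:
            // Keep the first max_expansion terms in their existing order.
            candidates.resize(max_expansion);
            break;
        case LimitType::MostFrequent:
            // Keep the max_expansion terms with the highest frequency.
            std::stable_sort(candidates.begin(), candidates.end(),
                             [](const TermFreq& a, const TermFreq& b) {
                                 return a.second > b.second;
                             });
            candidates.resize(max_expansion);
            break;
        }
    }
    std::vector<std::string> terms;
    for (const TermFreq& c : candidates) terms.push_back(c.first);
    return terms;
}
```

In these terms, the defaults described above correspond to no limit for wildcards (max_expansion of 0) and LimitType::MostFrequent with a limit of 100 for partial terms.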
void QueryParser::set_max_wildcard_expansion (Xapian::termcount max_expansion) | inline
Specify the maximum expansion of a wildcard.
If any wildcard expands to more than max_expansion terms, an exception will be thrown.
This method is provided for API compatibility with Xapian 1.2.x and is deprecated - replace it with:
set_max_expansion(max_expansion, Xapian::Query::WILDCARD_LIMIT_ERROR, Xapian::QueryParser::FLAG_WILDCARD);
Definition at line 1345 of file queryparser.h.
References Xapian::sortable_serialise_(), Xapian::Query::WILDCARD_LIMIT_ERROR, and XAPIAN_VISIBILITY_DEFAULT.
void QueryParser::set_stemmer | ( | const Xapian::Stem & | stemmer | ) |
Set the stemmer.
This sets the stemming algorithm which will be used by the query parser. The stemming algorithm will be used according to the stemming strategy set by set_stemming_strategy(). As of 1.3.1, this defaults to STEM_SOME, but in earlier versions the default was STEM_NONE. If you want to work with older versions, you should explicitly set a stemming strategy as well as setting a stemmer, otherwise your stemmer won't actually be used.
stemmer | The Xapian::Stem object to set. |
Definition at line 85 of file queryparser.cc.
References stemmer.
Referenced by DEFINE_TESTCASE(), and main().
void QueryParser::set_stemming_strategy | ( | stem_strategy | strategy | ) |
Set the stemming strategy.
This controls how the query parser will apply the stemming algorithm. Note that the stemming algorithm is only applied to words in free-text fields - boolean filter terms are never stemmed.
strategy | The strategy to use - one of the stem_strategy values (STEM_NONE, STEM_SOME, STEM_ALL, STEM_ALL_Z or STEM_SOME_FULL_POS). |
Definition at line 91 of file queryparser.cc.
Referenced by DEFINE_TESTCASE(), and main().
void QueryParser::set_stopper | ( | const Stopper * | stop = NULL | ) |
Set the stopper.
stop | The Stopper object to set (default NULL, which means no stopwords). |
Definition at line 97 of file queryparser.cc.
Referenced by DEFINE_TESTCASE(), and main().
TermIterator QueryParser::stoplist_begin | ( | ) | const |
Begin iterator over terms omitted from the query as stopwords.
Definition at line 233 of file queryparser.cc.
Referenced by DEFINE_TESTCASE().
TermIterator QueryParser::stoplist_end () const | inline
End iterator over terms omitted from the query as stopwords.
Definition at line 1279 of file queryparser.h.
Referenced by DEFINE_TESTCASE().
TermIterator QueryParser::unstem_begin | ( | const std::string & | term | ) | const |
Begin iterator over unstemmed forms of the given stemmed query term.
Definition at line 240 of file queryparser.cc.
References Xapian::operator*().
Referenced by DEFINE_TESTCASE().
TermIterator QueryParser::unstem_end (const std::string &) const | inline
End iterator over unstemmed forms of the given stemmed query term.
Definition at line 1287 of file queryparser.h.
Referenced by DEFINE_TESTCASE().
Xapian::Internal::intrusive_ptr< Internal > QueryParser::internal | private
Reference counted internals.
Definition at line 781 of file queryparser.h.
Referenced by operator=(), and parse_query().