xapian-core
2.0.0
Parses a piece of text and generates terms.
#include <termgenerator.h>
Classes | |
| class | Internal |
Public Types | |
| enum | { FLAG_SPELLING = 128, FLAG_NGRAMS = 2048, FLAG_CJK_NGRAM = FLAG_NGRAMS, FLAG_WORD_BREAKS = 4096 } |
| Flags to OR together and pass to TermGenerator::set_flags(). | |
| enum | stem_strategy { STEM_NONE, STEM_SOME, STEM_ALL, STEM_ALL_Z, STEM_SOME_FULL_POS } |
| Stemming strategies, for use with set_stemming_strategy(). | |
| enum | stop_strategy { STOP_NONE, STOP_ALL, STOP_STEMMED } |
| Stopper strategies, for use with set_stopper_strategy(). | |
| typedef int | flags |
| For backward compatibility with Xapian 1.2. | |
Public Member Functions | |
| | TermGenerator (const TermGenerator &o) |
| Copy constructor. | |
| TermGenerator & | operator= (const TermGenerator &o) |
| Assignment. | |
| | TermGenerator (TermGenerator &&o) |
| Move constructor. | |
| TermGenerator & | operator= (TermGenerator &&o) |
| Move assignment operator. | |
| | TermGenerator () |
| Default constructor. | |
| | ~TermGenerator () |
| Destructor. | |
| void | set_stemmer (const Xapian::Stem &stemmer) |
| Set the Xapian::Stem object to be used for generating stemmed terms. | |
| void | set_stopper (const Xapian::Stopper *stop=NULL) |
| Set the Xapian::Stopper object to be used for identifying stopwords. | |
| void | set_document (const Xapian::Document &doc) |
| Set the current document. | |
| const Xapian::Document & | get_document () const |
| Get the current document. | |
| void | set_database (const Xapian::WritableDatabase &db) |
| Set the database to index spelling data to. | |
| flags | set_flags (flags toggle, flags mask=flags(0)) |
| Set flags. | |
| void | set_stemming_strategy (stem_strategy strategy) |
| Set the stemming strategy. | |
| void | set_stopper_strategy (stop_strategy strategy) |
| Set the stopper strategy. | |
| void | set_max_word_length (unsigned max_word_length) |
| Set the maximum length of a word to index. | |
| void | index_text (const Xapian::Utf8Iterator &itor, Xapian::termcount wdf_inc=1, std::string_view prefix={}) |
| Index some text. | |
| void | index_text (std::string_view text, Xapian::termcount wdf_inc=1, std::string_view prefix={}) |
| Index some text. | |
| void | index_text_without_positions (const Xapian::Utf8Iterator &itor, Xapian::termcount wdf_inc=1, std::string_view prefix={}) |
| Index some text without positional information. | |
| void | index_text_without_positions (std::string_view text, Xapian::termcount wdf_inc=1, std::string_view prefix={}) |
| Index some text without positional information. | |
| void | increase_termpos (Xapian::termpos delta=100) |
| Increase the term position used by index_text. | |
| Xapian::termpos | get_termpos () const |
| Get the current term position. | |
| void | set_termpos (Xapian::termpos termpos) |
| Set the current term position. | |
| void | set_termpos_limit (Xapian::termpos termpos_limit) |
| Set the term position limit. | |
| std::string | get_description () const |
| Return a string describing this object. | |
Private Attributes | |
| Xapian::Internal::intrusive_ptr_nonnull< Internal > | internal |
Parses a piece of text and generates terms.
This module takes a piece of text and parses it to produce words which are then used to generate suitable terms for indexing. The terms generated are suitable for use with Query objects produced by the QueryParser class.
Definition at line 49 of file termgenerator.h.
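To make the description above concrete, here is a deliberately simplified, hypothetical model in plain C++ (no Xapian dependency). It splits ASCII text on non-alphanumeric characters, lowercases each word, and records each term's wdf (within-document frequency) and term positions. `TermInfo`, `TermMap` and `toy_index_text` are names invented for this sketch; the real TermGenerator additionally handles Unicode word boundaries, stemming, stopwords and prefixes, and writes its terms into the Xapian::Document passed to set_document().

```cpp
// Hypothetical, simplified model of text -> terms; NOT Xapian's implementation.
#include <cctype>
#include <map>
#include <string>
#include <vector>

struct TermInfo {
    unsigned wdf = 0;                 // number of occurrences of the term
    std::vector<unsigned> positions;  // positions at which it occurred
};

using TermMap = std::map<std::string, TermInfo>;

// Index ASCII text; words get positions pos+1, pos+2, ...
// Returns the last position used.
unsigned toy_index_text(const std::string& text, TermMap& terms, unsigned pos = 0) {
    std::string word;
    auto flush = [&]() {
        if (word.empty()) return;
        TermInfo& info = terms[word];
        ++info.wdf;
        info.positions.push_back(++pos);
        word.clear();
    };
    for (char ch : text) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (std::isalnum(c)) {
            word += static_cast<char>(std::tolower(c));  // case-fold the word
        } else {
            flush();  // any other character ends the current word
        }
    }
    flush();  // emit a trailing word, if any
    return pos;
}
```

For "The cat sat on the mat." this model yields six positions, with "the" at positions 1 and 5 and a wdf of 2.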
| typedef int Xapian::TermGenerator::flags |
For backward compatibility with Xapian 1.2.
Definition at line 97 of file termgenerator.h.
| anonymous enum |
Flags to OR together and pass to TermGenerator::set_flags().
Definition at line 100 of file termgenerator.h.
| enum Xapian::TermGenerator::stem_strategy |
Stemming strategies, for use with set_stemming_strategy().
| Enumerator | |
|---|---|
| STEM_NONE | |
| STEM_SOME | |
| STEM_ALL | |
| STEM_ALL_Z | |
| STEM_SOME_FULL_POS | |
Definition at line 153 of file termgenerator.h.
| enum Xapian::TermGenerator::stop_strategy |
Stopper strategies, for use with set_stopper_strategy().
| Enumerator | |
|---|---|
| STOP_NONE | |
| STOP_ALL | |
| STOP_STEMMED | |
Definition at line 158 of file termgenerator.h.
| Xapian::TermGenerator::TermGenerator | ( | const TermGenerator & | o | ) | default |
Copy constructor.
| Xapian::TermGenerator::TermGenerator | ( | TermGenerator && | o | ) | default |
Move constructor.
| TermGenerator::TermGenerator | ( | ) |
Default constructor.
Definition at line 46 of file termgenerator.cc.
| TermGenerator::~TermGenerator | ( | ) |
Destructor.
Definition at line 48 of file termgenerator.cc.
| string TermGenerator::get_description | ( | ) | const |
Return a string describing this object.
Definition at line 148 of file termgenerator.cc.
References Xapian::TermGenerator::Internal::cur_pos, internal, Xapian::TermGenerator::Internal::stopper, and Xapian::Internal::str().
Referenced by DEFINE_TESTCASE().
| const Xapian::Document & TermGenerator::get_document | ( | ) | const |
Get the current document.
Definition at line 70 of file termgenerator.cc.
| Xapian::termpos TermGenerator::get_termpos | ( | ) | const |
Get the current term position.
Definition at line 130 of file termgenerator.cc.
Referenced by DEFINE_TESTCASE().
| void TermGenerator::increase_termpos | ( | Xapian::termpos | delta = 100 | ) |
Increase the term position used by index_text.
This can be used between indexing text from different fields or other places to prevent phrase searches from spanning between them (e.g. between the title and body text, or between two chapters in a book).
| delta | Amount to increase the term position by (default: 100). |
Definition at line 124 of file termgenerator.cc.
Referenced by DEFINE_TESTCASE().
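In real code the pattern is `tg.index_text(title); tg.increase_termpos(); tg.index_text(body);`. Why the gap helps can be shown with a hypothetical toy model (not Xapian code): a two-word phrase matches when the second word occurs at a position exactly one greater than the first. `Positions`, `add_words` and `phrase_matches` are invented for this sketch, and the `+= 100` mimics increase_termpos() with its default delta.

```cpp
// Hypothetical model of why a position gap blocks cross-field phrase matches.
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Positions = std::map<std::string, std::vector<unsigned>>;

// Append whitespace-separated words, advancing the current position.
void add_words(const std::string& text, Positions& pos, unsigned& cur_pos) {
    std::istringstream in(text);
    std::string word;
    while (in >> word) pos[word].push_back(++cur_pos);
}

// True if `second` occurs immediately after `first` somewhere.
bool phrase_matches(const Positions& pos, const std::string& first,
                    const std::string& second) {
    auto a = pos.find(first);
    auto b = pos.find(second);
    if (a == pos.end() || b == pos.end()) return false;
    for (unsigned pa : a->second)
        for (unsigned pb : b->second)
            if (pb == pa + 1) return true;
    return false;
}
```

Without the gap, the last title word and the first body word are adjacent, so a phrase spanning the two fields matches; after advancing the position by 100 it no longer does, while phrases inside each field are unaffected.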
| void TermGenerator::index_text | ( | const Xapian::Utf8Iterator & | itor, | Xapian::termcount | wdf_inc = 1, | std::string_view | prefix = {} | ) |
Index some text.
| itor | Utf8Iterator pointing to the text to index. |
| wdf_inc | The wdf increment (default 1). |
| prefix | The term prefix to use (default is no prefix). |
Definition at line 108 of file termgenerator.cc.
Referenced by DEFINE_TESTCASE(), main(), make_netstats1_db(), and make_tg_db().
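The `wdf_inc` and `prefix` parameters can be illustrated with a hypothetical model (not Xapian code): each occurrence of a word adds `wdf_inc` to the wdf of the term formed by `prefix` followed by the word. A real call might look like `tg.index_text(title, 2, "S")`; the prefix "S" and the increment 2 are illustrative choices for this sketch, and `toy_index` and `WdfMap` are invented names.

```cpp
// Hypothetical model of index_text()'s wdf_inc and prefix parameters.
#include <map>
#include <sstream>
#include <string>
#include <string_view>

using WdfMap = std::map<std::string, unsigned>;

void toy_index(std::string_view text, WdfMap& wdf,
               unsigned wdf_inc = 1, std::string_view prefix = {}) {
    std::istringstream in{std::string(text)};
    std::string word;
    while (in >> word) {
        // Term = prefix + word; each occurrence adds wdf_inc to its wdf.
        wdf[std::string(prefix) + word] += wdf_inc;
    }
}
```

Using a prefix keeps field-specific terms (here "Sxapian") separate from unprefixed body terms ("xapian"), while a larger `wdf_inc` gives that field's words more weight.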
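| void Xapian::TermGenerator::index_text | ( | std::string_view | text, | Xapian::termcount | wdf_inc = 1, | std::string_view | prefix = {} | ) | inline |
Index some text.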
| text | The text to index. |
| wdf_inc | The wdf increment (default 1). |
| prefix | The term prefix to use (default is no prefix). |
Definition at line 249 of file termgenerator.h.
| void TermGenerator::index_text_without_positions | ( | const Xapian::Utf8Iterator & | itor, | Xapian::termcount | wdf_inc = 1, | std::string_view | prefix = {} | ) |
Index some text without positional information.
Just like index_text, but no positional information is generated. This means that the database will be significantly smaller, but that phrase searching and NEAR won't be supported.
| itor | Utf8Iterator pointing to the text to index. |
| wdf_inc | The wdf increment (default 1). |
| prefix | The term prefix to use (default is no prefix). |
Definition at line 116 of file termgenerator.cc.
Referenced by DEFINE_TESTCASE().
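The trade-off described above (smaller database, no phrase or NEAR support) comes down to what is stored per term. This hypothetical model (not Xapian code; `Posting`, `Postings` and `toy_index` are invented names) updates the wdf in both modes but records positions only when asked to. As an assumption of the sketch, `cur_pos` is left unchanged when positions are off.

```cpp
// Hypothetical model: index_text() vs index_text_without_positions().
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct Posting {
    unsigned wdf = 0;
    std::vector<unsigned> positions;  // stays empty without positions
};

using Postings = std::map<std::string, Posting>;

void toy_index(const std::string& text, Postings& out,
               unsigned& cur_pos, bool with_positions) {
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        Posting& p = out[word];
        ++p.wdf;  // wdf is maintained either way
        if (with_positions) p.positions.push_back(++cur_pos);
    }
}
```

With positions off, the terms still carry wdf information for ranking, but there is nothing for a phrase or NEAR query to compare.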
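| void Xapian::TermGenerator::index_text_without_positions | ( | std::string_view | text, | Xapian::termcount | wdf_inc = 1, | std::string_view | prefix = {} | ) | inline |
Index some text without positional information.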
Just like index_text, but no positional information is generated. This means that the database will be significantly smaller, but that phrase searching and NEAR won't be supported.
| text | The text to index. |
| wdf_inc | The wdf increment (default 1). |
| prefix | The term prefix to use (default is no prefix). |
Definition at line 279 of file termgenerator.h.
| TermGenerator & Xapian::TermGenerator::operator= | ( | const TermGenerator & | o | ) | default |
Assignment.
| TermGenerator & Xapian::TermGenerator::operator= | ( | TermGenerator && | o | ) | default |
Move assignment operator.
| void TermGenerator::set_database | ( | const Xapian::WritableDatabase & | db | ) |
Set the database to index spelling data to.
Definition at line 76 of file termgenerator.cc.
Referenced by DEFINE_TESTCASE().
| void TermGenerator::set_document | ( | const Xapian::Document & | doc | ) |
Set the current document.
Definition at line 63 of file termgenerator.cc.
Referenced by DEFINE_TESTCASE(), main(), make_netstats1_db(), and make_tg_db().
| TermGenerator::flags TermGenerator::set_flags | ( | flags | toggle, | flags | mask = flags(0) | ) |
Set flags.
The new value of flags is: (flags & mask) ^ toggle
To just set the flags, pass the new flags in toggle and the default value for mask.
| toggle | Flags to XOR. |
| mask | Flags to AND with first. |
Definition at line 82 of file termgenerator.cc.
Referenced by DEFINE_TESTCASE().
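The update rule `(flags & mask) ^ toggle` can be checked at compile time with a small sketch. The enum values are copied from the flags enum on this page; `apply_set_flags` is a hypothetical helper that computes the new value, whereas the real set_flags() applies the rule to the TermGenerator's own flags (e.g. `tg.set_flags(Xapian::TermGenerator::FLAG_SPELLING)`).

```cpp
// Sketch of set_flags()'s documented rule: new = (old & mask) ^ toggle.
enum : int {
    FLAG_SPELLING = 128,
    FLAG_NGRAMS = 2048,
    FLAG_CJK_NGRAM = FLAG_NGRAMS,  // alias: same bit as FLAG_NGRAMS
    FLAG_WORD_BREAKS = 4096
};

// Hypothetical helper, not part of Xapian.
constexpr int apply_set_flags(int old_flags, int toggle, int mask = 0) {
    return (old_flags & mask) ^ toggle;
}

// Default mask (0): old flags are discarded and replaced by `toggle`.
static_assert(apply_set_flags(FLAG_SPELLING, FLAG_NGRAMS) == FLAG_NGRAMS);
// mask = ~0 keeps existing flags, and `toggle` flips one on...
static_assert(apply_set_flags(FLAG_NGRAMS, FLAG_SPELLING, ~0) ==
              (FLAG_NGRAMS | FLAG_SPELLING));
// ...while mask = ~flag with toggle = 0 clears just that flag.
static_assert(apply_set_flags(FLAG_NGRAMS | FLAG_SPELLING, 0, ~FLAG_SPELLING) ==
              FLAG_NGRAMS);
```

Because every flag is a distinct bit, OR-ing them combines options independently; FLAG_CJK_NGRAM adds nothing on top of FLAG_NGRAMS.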
| void TermGenerator::set_max_word_length | ( | unsigned | max_word_length | ) |
Set the maximum length of a word to index.
The limit is on the length of a word prior to stemming and prior to adding any term prefix.
The backends mostly impose a limit on the length of terms (often about 240 bytes), but it's generally useful to have a lower limit to help prevent the index being bloated by useless junk terms from trying to index things like binary data, uuencoded data, ASCII art, etc.
| max_word_length | The maximum length word to index, in bytes in UTF-8 representation. Default is 64. |
Definition at line 102 of file termgenerator.cc.
Referenced by DEFINE_TESTCASE().
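Because the limit is measured in bytes of UTF-8, words containing non-ASCII characters hit it sooner than their character count suggests. The hypothetical check below (not Xapian code; `toy_word_is_indexable` and `kDefaultMaxWordLength` are invented names) treats a word as indexable when its byte length does not exceed the limit, measured before stemming and before any prefix is added, as documented above.

```cpp
// Hypothetical model of the max word length check (bytes, not characters).
#include <string>
#include <string_view>

constexpr unsigned kDefaultMaxWordLength = 64;  // documented default

bool toy_word_is_indexable(std::string_view utf8_word,
                           unsigned max_word_length = kDefaultMaxWordLength) {
    // size() counts bytes: "café" is 4 characters but 5 bytes in UTF-8.
    return utf8_word.size() <= max_word_length;
}
```

So with a limit of 4, "café" (5 bytes) would be rejected even though it is only 4 characters long.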
| void TermGenerator::set_stemmer | ( | const Xapian::Stem & | stemmer | ) |
Set the Xapian::Stem object to be used for generating stemmed terms.
Definition at line 51 of file termgenerator.cc.
References stemmer.
Referenced by DEFINE_TESTCASE(), main(), make_netstats1_db(), and make_tg_db().
| void TermGenerator::set_stemming_strategy | ( | stem_strategy | strategy | ) |
Set the stemming strategy.
This method controls how the stemming algorithm is applied.
| strategy | The strategy to use: one of STEM_NONE, STEM_SOME, STEM_ALL, STEM_ALL_Z or STEM_SOME_FULL_POS (see the stem_strategy enum). |
Definition at line 90 of file termgenerator.cc.
Referenced by DEFINE_TESTCASE(), main(), and make_netstats1_db().
| void TermGenerator::set_stopper | ( | const Xapian::Stopper * | stop = NULL | ) |
Set the Xapian::Stopper object to be used for identifying stopwords.
Stemmed forms of stopwords aren't indexed, but unstemmed forms still are, so that searches for phrases including stopwords still work.
| stop | The Stopper object to set (default NULL, which means no stopwords). |
Definition at line 57 of file termgenerator.cc.
Referenced by DEFINE_TESTCASE().
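The stopword behaviour described above can be sketched with a hypothetical model (not Xapian code): every word produces its unstemmed term, but a stemmed term is produced only for words that are not stopwords. `toy_stem` merely strips a trailing 's' and is not a real stemming algorithm, and the "Z" prefix on stemmed terms mirrors the naming of the STEM_ALL_Z strategy; both are assumptions of this sketch. In real code the stopper would be supplied with something like `tg.set_stopper(&stopper)`.

```cpp
// Hypothetical model: stopwords keep their unstemmed term only.
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Stand-in "stemmer": strips one trailing 's' (illustrative only).
std::string toy_stem(const std::string& word) {
    if (word.size() > 1 && word.back() == 's')
        return word.substr(0, word.size() - 1);
    return word;
}

std::vector<std::string> toy_terms(const std::string& text,
                                   const std::set<std::string>& stopwords) {
    std::vector<std::string> terms;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        terms.push_back(word);  // unstemmed form: always indexed
        if (stopwords.count(word) == 0)
            terms.push_back("Z" + toy_stem(word));  // stemmed form: not for stopwords
    }
    return terms;
}
```

Keeping "the" as an unstemmed term is what lets a phrase search such as "the cats" still match.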
| void TermGenerator::set_stopper_strategy | ( | stop_strategy | strategy | ) |
Set the stopper strategy.
This method controls how the stopper is used.
You also need to call set_stopper() for this to have any effect.
| strategy | The strategy to use: one of STOP_NONE, STOP_ALL or STOP_STEMMED (see the stop_strategy enum). |
Definition at line 96 of file termgenerator.cc.
Referenced by DEFINE_TESTCASE().
| void TermGenerator::set_termpos | ( | Xapian::termpos | termpos | ) |
Set the current term position.
| termpos | The new term position to set. |
Definition at line 136 of file termgenerator.cc.
Referenced by DEFINE_TESTCASE().
| void TermGenerator::set_termpos_limit | ( | Xapian::termpos | termpos_limit | ) |
Set the term position limit.
| termpos_limit | Upper bound on term positions that can be added. |
By default the only limit is the maximum value of the Xapian::termpos type.
Definition at line 142 of file termgenerator.cc.
Referenced by DEFINE_TESTCASE().
| Xapian::Internal::intrusive_ptr_nonnull<Internal> Xapian::TermGenerator::internal | private |
Reference counted internals.
Definition at line 54 of file termgenerator.h.
Referenced by get_description().