Xapian::TermGenerator Class Reference

Parses a piece of text and generates terms.

#include <termgenerator.h>

Classes

class  Internal

Public Types

enum  flags { FLAG_SPELLING = 128 }
 Flags to OR together and pass to TermGenerator::set_flags().
enum  stem_strategy { STEM_NONE, STEM_SOME, STEM_ALL, STEM_ALL_Z }
 Stemming strategies, for use with set_stemming_strategy().

Public Member Functions

 TermGenerator (const TermGenerator &o)
 Copy constructor.
TermGenerator & operator= (const TermGenerator &o)
 Assignment.
 TermGenerator ()
 Default constructor.
 ~TermGenerator ()
 Destructor.
void set_stemmer (const Xapian::Stem &stemmer)
 Set the Xapian::Stem object to be used for generating stemmed terms.
void set_stopper (const Xapian::Stopper *stop=NULL)
 Set the Xapian::Stopper object to be used for identifying stopwords.
void set_document (const Xapian::Document &doc)
 Set the current document.
const Xapian::Document & get_document () const
 Get the current document.
void set_database (const Xapian::WritableDatabase &db)
 Set the database to index spelling data to.
flags set_flags (flags toggle, flags mask=flags(0))
 Set flags.
void set_stemming_strategy (stem_strategy strategy)
 Set the stemming strategy.
void set_max_word_length (unsigned max_word_length)
 Set the maximum length of a word to index.
void index_text (const Xapian::Utf8Iterator &itor, Xapian::termcount wdf_inc=1, const std::string &prefix=std::string())
 Index some text.
void index_text (const std::string &text, Xapian::termcount wdf_inc=1, const std::string &prefix=std::string())
 Index some text in a std::string.
void index_text_without_positions (const Xapian::Utf8Iterator &itor, Xapian::termcount wdf_inc=1, const std::string &prefix=std::string())
 Index some text without positional information.
void index_text_without_positions (const std::string &text, Xapian::termcount wdf_inc=1, const std::string &prefix=std::string())
 Index some text in a std::string without positional information.
void increase_termpos (Xapian::termcount delta=100)
 Increase the term position used by index_text.
Xapian::termcount get_termpos () const
 Get the current term position.
void set_termpos (Xapian::termcount termpos)
 Set the current term position.
std::string get_description () const
 Return a string describing this object.

Private Attributes

Xapian::Internal::RefCntPtr< Internal > internal


Detailed Description

Parses a piece of text and generates terms.

This class takes a piece of text and parses it to produce words, which are then used to generate terms suitable for indexing. The generated terms are suitable for use with Query objects produced by the QueryParser class.

Definition at line 44 of file termgenerator.h.


Member Enumeration Documentation

Flags to OR together and pass to TermGenerator::set_flags().

Enumerator:
FLAG_SPELLING  Index data required for spelling correction.

Definition at line 86 of file termgenerator.h.

Stemming strategies, for use with set_stemming_strategy() (see that method for a description of each value).

Enumerator:
STEM_NONE 
STEM_SOME 
STEM_ALL 
STEM_ALL_Z 

Definition at line 92 of file termgenerator.h.


Constructor & Destructor Documentation

TermGenerator::TermGenerator ( const TermGenerator &  o  ) 

Copy constructor.

Definition at line 34 of file termgenerator.cc.

TermGenerator::TermGenerator (  ) 

Default constructor.

Definition at line 42 of file termgenerator.cc.

TermGenerator::~TermGenerator (  ) 

Destructor.

Definition at line 44 of file termgenerator.cc.


Member Function Documentation

string TermGenerator::get_description (  )  const

Return a string describing this object.

Definition at line 132 of file termgenerator.cc.

References internal, and Xapian::Internal::str().

Referenced by DEFINE_TESTCASE().

const Xapian::Document & TermGenerator::get_document (  )  const

Get the current document.

Definition at line 66 of file termgenerator.cc.

Xapian::termcount TermGenerator::get_termpos (  )  const

Get the current term position.

Definition at line 120 of file termgenerator.cc.

void TermGenerator::increase_termpos ( Xapian::termcount  delta = 100  ) 

Increase the term position used by index_text.

This can be used between indexing text from different fields or other places to prevent phrase searches from spanning between them (e.g. between the title and body text, or between two chapters in a book).

Parameters:
delta Amount to increase the term position by (default: 100).
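The effect of the position gap can be illustrated with a small standalone model. This is a hypothetical sketch, not Xapian code: PositionModel and its methods are illustrative names that only mirror index_text() and increase_termpos(). It assumes each whitespace-separated word is assigned the next term position, and that a two-word phrase matches only when the second word sits exactly one position after the first.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical model of positional indexing (NOT Xapian's implementation).
// Each word gets the next term position; increase_termpos() inserts a gap.
struct PositionModel {
    unsigned termpos = 0;
    std::map<std::string, std::vector<unsigned>> positions;

    // Assign consecutive positions to whitespace-separated words.
    void index_text(const std::string& text) {
        std::istringstream in(text);
        std::string word;
        while (in >> word) positions[word].push_back(++termpos);
    }

    // Skip ahead so the next word is not adjacent to the previous one.
    void increase_termpos(unsigned delta = 100) { termpos += delta; }

    // A two-word phrase matches if some occurrence of `second`
    // is exactly one position after some occurrence of `first`.
    bool phrase_matches(const std::string& first, const std::string& second) const {
        auto a = positions.find(first), b = positions.find(second);
        if (a == positions.end() || b == positions.end()) return false;
        for (unsigned p : a->second)
            for (unsigned q : b->second)
                if (q == p + 1) return true;
        return false;
    }
};
```

For example, indexing "first title", calling increase_termpos(), then indexing "body text" leaves "title" at position 2 and "body" at position 103, so the phrase "title body" no longer matches across the field boundary; without the gap the two words would be adjacent and it would.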

Definition at line 114 of file termgenerator.cc.

Referenced by test_termgen1().

void Xapian::TermGenerator::index_text ( const std::string &  text,
Xapian::termcount  wdf_inc = 1,
const std::string &  prefix = std::string() 
) [inline]

Index some text in a std::string.

Parameters:
text The text to index.
wdf_inc The wdf increment (default 1).
prefix The term prefix to use (default is no prefix).
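The roles of wdf_inc and prefix can be sketched with a toy model. model_index_text() is a hypothetical function, not Xapian's tokenizer: it splits on whitespace and lowercases ASCII only, whereas TermGenerator performs Unicode-aware word splitting and case folding, and it ignores stemming and positions entirely. It assumes each word yields the term prefix + word and that every occurrence adds wdf_inc to that term's within-document frequency (wdf).

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Hypothetical model (not Xapian code): each word becomes prefix + lowercased
// word, and every occurrence adds wdf_inc to that term's wdf.
// Splitting is on whitespace only and only ASCII letters are lowercased.
void model_index_text(std::map<std::string, unsigned>& wdf,
                      const std::string& text,
                      unsigned wdf_inc = 1,
                      const std::string& prefix = std::string()) {
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        for (char& c : word)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        wdf[prefix + word] += wdf_inc;
    }
}
```

Calling it with "Hello hello world" gives "hello" a wdf of 2, and a second call with wdf_inc = 5 and prefix "S" adds a separate term "Shello" with wdf 5. Using a larger wdf_inc for one field, such as a title, is one way to make its terms weigh more than body text.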

Definition at line 156 of file termgenerator.h.

void TermGenerator::index_text ( const Xapian::Utf8Iterator &  itor,
Xapian::termcount  wdf_inc = 1,
const std::string &  prefix = std::string() 
)

Index some text.

Parameters:
itor Utf8Iterator pointing to the text to index.
wdf_inc The wdf increment (default 1).
prefix The term prefix to use (default is no prefix).

Definition at line 98 of file termgenerator.cc.

Referenced by main(), test_termgen1(), test_tg_max_word_length1(), test_tg_spell1(), and test_tg_spell2().

void Xapian::TermGenerator::index_text_without_positions ( const std::string &  text,
Xapian::termcount  wdf_inc = 1,
const std::string &  prefix = std::string() 
) [inline]

Index some text in a std::string without positional information.

Just like index_text, but no positional information is generated. This means that the database will be significantly smaller, but that phrase searching and NEAR won't be supported.

Parameters:
text The text to index.
wdf_inc The wdf increment (default 1).
prefix The term prefix to use (default is no prefix).

Definition at line 186 of file termgenerator.h.

void TermGenerator::index_text_without_positions ( const Xapian::Utf8Iterator &  itor,
Xapian::termcount  wdf_inc = 1,
const std::string &  prefix = std::string() 
)

Index some text without positional information.

Just like index_text, but no positional information is generated. This means that the database will be significantly smaller, but that phrase searching and NEAR won't be supported.

Parameters:
itor Utf8Iterator pointing to the text to index.
wdf_inc The wdf increment (default 1).
prefix The term prefix to use (default is no prefix).

Definition at line 106 of file termgenerator.cc.

Referenced by test_termgen1().

TermGenerator & TermGenerator::operator= ( const TermGenerator &  o  ) 

Assignment.

Definition at line 37 of file termgenerator.cc.

References internal.

void TermGenerator::set_database ( const Xapian::WritableDatabase &  db  ) 

Set the database to index spelling data to.

Definition at line 72 of file termgenerator.cc.

Referenced by test_tg_spell1().

void TermGenerator::set_document ( const Xapian::Document &  doc  ) 

Set the current document.

Definition at line 59 of file termgenerator.cc.

Referenced by main(), test_termgen1(), test_tg_max_word_length1(), test_tg_spell1(), and test_tg_spell2().

TermGenerator::flags TermGenerator::set_flags ( flags  toggle,
flags  mask = flags(0) 
)

Set flags.

The new value of flags is: (flags & mask) ^ toggle

To just set the flags, pass the new flags in toggle and the default value for mask.

Parameters:
toggle Flags to XOR.
mask Flags to AND with first.
Returns:
The old flags setting.
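The update rule can be checked with a standalone model. model_set_flags() is a hypothetical function, not part of Xapian. It reproduces only the documented formula and return value, and its local flags enum copies just the FLAG_SPELLING value listed above.

```cpp
#include <cassert>

// Local stand-in for TermGenerator::flags (only the documented value).
enum flags { FLAG_SPELLING = 128 };

// Hypothetical model of the documented rule: new = (old & mask) ^ toggle.
// Returns the previous setting, as set_flags() is documented to do.
flags model_set_flags(flags& current, flags toggle, flags mask = flags(0)) {
    flags old = current;
    current = flags((current & mask) ^ toggle);
    return old;
}
```

With the default mask of 0 the previous flags are discarded, so toggle becomes the new setting outright. To flip one flag while preserving others, pass it as toggle and pass a mask that includes both it and every flag to keep; for instance, passing FLAG_SPELLING as both toggle and mask turns spelling indexing off again.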

Definition at line 78 of file termgenerator.cc.

Referenced by test_tg_spell1(), and test_tg_spell2().

void TermGenerator::set_max_word_length ( unsigned  max_word_length  ) 

Set the maximum length of a word to index.

The limit is on the length of a word prior to stemming and prior to adding any term prefix.

The backends mostly impose a limit on the length of terms (often about 240 bytes), but it is generally useful to set a lower limit to help prevent the index from being bloated by useless junk terms produced when indexing things like binary data, uuencoded data, ASCII art, etc.

This method was new in Xapian 1.3.1.

Parameters:
max_word_length The maximum length of a word to index, in bytes of its UTF-8 representation. Default is 64.
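As a rough illustration of where the limit applies, the hypothetical helper below (not part of the Xapian API) compares a word's length in bytes, which is what std::string::size() returns for UTF-8 text, against the limit. It assumes the word passed in is the raw word before stemming and before any prefix is added, and it treats the limit as inclusive, which is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper modelling the documented rule: a word is indexed only
// if its UTF-8 byte length (measured before stemming and before any term
// prefix is added) does not exceed max_word_length.
bool within_max_word_length(const std::string& utf8_word,
                            unsigned max_word_length = 64) {
    // size() counts bytes, not characters, matching how the limit is defined.
    return utf8_word.size() <= max_word_length;
}
```

Because the limit counts bytes, a word with non-ASCII characters can exceed it with fewer characters: "héllo" has 5 characters but takes 6 bytes in UTF-8.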

Definition at line 92 of file termgenerator.cc.

Referenced by test_tg_max_word_length1().

void TermGenerator::set_stemmer ( const Xapian::Stem &  stemmer  ) 

Set the Xapian::Stem object to be used for generating stemmed terms.

Definition at line 47 of file termgenerator.cc.

Referenced by main(), test_termgen1(), and test_tg_max_word_length1().

void TermGenerator::set_stemming_strategy ( stem_strategy  strategy  ) 

Set the stemming strategy.

This method controls how the stemming algorithm is applied. It was new in Xapian 1.3.1.

Parameters:
strategy The strategy to use - possible values are:
  • STEM_NONE: Don't perform any stemming - only unstemmed terms are generated.
  • STEM_SOME: Generate both stemmed (with a "Z" prefix) and unstemmed terms. This is the default strategy.
  • STEM_ALL: Generate only stemmed terms (but without a "Z" prefix).
  • STEM_ALL_Z: Generate only stemmed terms (with a "Z" prefix).
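The four strategies can be summarised with a small model of the terms produced for a single word. terms_for_word() is a hypothetical function, not Xapian code: the stemmed argument stands in for the output of the configured Xapian::Stem, and details such as case folding, term prefixes and positions are deliberately left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Local copy of the strategy values listed above.
enum stem_strategy { STEM_NONE, STEM_SOME, STEM_ALL, STEM_ALL_Z };

// Hypothetical model of the documented output of each strategy for one word.
// `stemmed` is assumed to be what the configured stemmer returns for `word`.
std::vector<std::string> terms_for_word(const std::string& word,
                                        const std::string& stemmed,
                                        stem_strategy strategy) {
    switch (strategy) {
        case STEM_NONE:  return {word};                 // unstemmed only
        case STEM_SOME:  return {word, "Z" + stemmed};  // both forms
        case STEM_ALL:   return {stemmed};              // stemmed, no "Z"
        case STEM_ALL_Z: return {"Z" + stemmed};        // stemmed, with "Z"
    }
    return {};
}
```

For "running" with a stem of "run", STEM_SOME yields "running" and "Zrun", while STEM_ALL yields just "run".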

Definition at line 86 of file termgenerator.cc.

Referenced by test_termgen1().

void TermGenerator::set_stopper ( const Xapian::Stopper *  stop = NULL  ) 

Set the Xapian::Stopper object to be used for identifying stopwords.

Stemmed forms of stopwords aren't indexed, but unstemmed forms still are, so that searches for phrases containing stopwords still work.

Parameters:
stop The Stopper object to set (default NULL, which means no stopwords).
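The stopword rule can be sketched in the same style. terms_with_stopper() is a hypothetical function, not Xapian code; it assumes the default STEM_SOME strategy, uses a plain std::set in place of a Xapian::Stopper, and takes the stemmed form as an argument rather than calling a real stemmer.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the documented stopword rule under STEM_SOME:
// the unstemmed term is always generated (so phrase searches still match),
// but no "Z"-prefixed stemmed term is generated for a stopword.
std::vector<std::string> terms_with_stopper(const std::string& word,
                                            const std::string& stemmed,
                                            const std::set<std::string>& stopwords) {
    std::vector<std::string> terms{word};  // unstemmed form is kept
    if (stopwords.count(word) == 0)
        terms.push_back("Z" + stemmed);    // stemmed form only for non-stopwords
    return terms;
}
```

With "the" in the stopword set, indexing "the" produces only the unstemmed term "the", while "cats" still produces both "cats" and "Zcat".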

Definition at line 53 of file termgenerator.cc.

void TermGenerator::set_termpos ( Xapian::termcount  termpos  ) 

Set the current term position.

Parameters:
termpos The new term position to set.

Definition at line 126 of file termgenerator.cc.


Member Data Documentation

For internal use only.

Reference counted internals.

Definition at line 47 of file termgenerator.h.

Referenced by get_description(), and operator=().


The documentation for this class was generated from the following files:
  • termgenerator.h
  • termgenerator.cc

Documentation for Xapian (version 1.2.13).
Generated on 9 Jan 2013 by Doxygen 1.5.9.