xapian-core  1.4.19
CJKTokenIterator Class Reference

Iterator returning unigrams and bigrams.

#include <cjk-tokenizer.h>


Public Member Functions

 CJKTokenIterator (const std::string &s)
 
 CJKTokenIterator (const Xapian::Utf8Iterator &it_)
 
 CJKTokenIterator ()
 
const std::string & operator* () const
 
CJKTokenIterator & operator++ ()
 
bool unigram () const
 Is this a unigram?
 
const Xapian::Utf8Iterator & get_utf8iterator () const
 
bool operator== (const CJKTokenIterator &other) const
 
bool operator!= (const CJKTokenIterator &other) const
 

Private Member Functions

void init ()
 Call to set current_token at the start.
 

Private Attributes

Xapian::Utf8Iterator it
 
unsigned offset = 0
 Offset to penultimate Unicode character in current_token.
 
std::string current_token
 

Detailed Description

Iterator returning unigrams and bigrams.

Definition at line 56 of file cjk-tokenizer.h.
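The C++ sketch below models the token sequence an iterator of this kind produces. It is not Xapian's implementation, and several simplifications are assumptions: the input is already decoded to UTF-32 (`std::u32string`), only the CJK Unified Ideographs block U+4E00..U+9FFF counts as CJK (a stand-in for `CJK::codepoint_is_cjk()` plus `Xapian::Unicode::is_wordchar()`), and unigrams are told from bigrams by token length rather than by `offset`. It also assumes each character is emitted first as a unigram and then as the first half of a bigram with the following character. `CjkTokenIteratorSketch` and `position()` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified model of a unigram/bigram iterator (NOT Xapian's code).
// Assumptions: input is already decoded to UTF-32, and only the CJK
// Unified Ideographs block U+4E00..U+9FFF counts as CJK.
class CjkTokenIteratorSketch {
  public:
    // A default-constructed iterator has an empty token and acts as "end".
    CjkTokenIteratorSketch() = default;

    explicit CjkTokenIteratorSketch(const std::u32string& s) : text_(s) {
        init();
    }

    const std::u32string& operator*() const { return token_; }

    // One character means unigram; two characters means bigram.
    bool unigram() const { return token_.size() == 1; }

    // Index of the first character not yet consumed (hypothetical
    // counterpart of get_utf8iterator()).
    std::size_t position() const { return pos_; }

    CjkTokenIteratorSketch& operator++() {
        assert(!token_.empty() && "cannot increment an end iterator");
        if (token_.size() == 2) {
            // Bigram: drop its first character, leaving a unigram.
            token_.erase(0, 1);
        } else if (pos_ < text_.size() && is_cjk(text_[pos_])) {
            // Unigram followed by a CJK character: extend it to a bigram.
            token_ += text_[pos_++];
        } else {
            // The CJK run has ended: become equal to the end iterator.
            token_.clear();
        }
        return *this;
    }

    // Compare only the current token, so every exhausted iterator
    // equals a default-constructed one.
    bool operator==(const CjkTokenIteratorSketch& other) const {
        return token_ == other.token_;
    }
    bool operator!=(const CjkTokenIteratorSketch& other) const {
        return !(*this == other);
    }

  private:
    static bool is_cjk(char32_t ch) { return ch >= 0x4E00 && ch <= 0x9FFF; }

    // Start with the first character as a unigram if it is CJK.
    void init() {
        if (pos_ < text_.size() && is_cjk(text_[pos_]))
            token_ = text_.substr(pos_++, 1);
    }

    std::u32string text_;
    std::size_t pos_ = 0;
    std::u32string token_;
};
```

For the input 中文字 followed by a space, a loop of the form `for (CjkTokenIteratorSketch t(text); t != CjkTokenIteratorSketch(); ++t)` visits 中, 中文, 文, 文字, 字 and stops with `position()` at the space. Because equality compares only the current token, any exhausted iterator equals a default-constructed one, which is what lets the default constructor serve as the end marker.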

Constructor & Destructor Documentation

◆ CJKTokenIterator() [1/3]

CJKTokenIterator::CJKTokenIterator ( const std::string &  s)
inline explicit

Definition at line 71 of file cjk-tokenizer.h.

◆ CJKTokenIterator() [2/3]

CJKTokenIterator::CJKTokenIterator ( const Xapian::Utf8Iterator &  it_)
inline explicit

Definition at line 75 of file cjk-tokenizer.h.

◆ CJKTokenIterator() [3/3]

CJKTokenIterator::CJKTokenIterator ( )
inline

Definition at line 79 of file cjk-tokenizer.h.

Member Function Documentation

◆ get_utf8iterator()

const Xapian::Utf8Iterator& CJKTokenIterator::get_utf8iterator ( ) const
inline

Definition at line 90 of file cjk-tokenizer.h.

Referenced by Xapian::parse_terms().

◆ init()

void CJKTokenIterator::init ( )
private

Call to set current_token at the start.

Definition at line 96 of file cjk-tokenizer.cc.

References Xapian::Unicode::append_utf8(), CJK::codepoint_is_cjk(), and Xapian::Unicode::is_wordchar().
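A hedged sketch of what such a start-up step could look like follows. All three helpers are hypothetical: `append_utf8_sketch()` is a plain UTF-8 encoder written in the spirit of `Xapian::Unicode::append_utf8()`, `is_cjk_wordchar_sketch()` narrows "CJK word character" to U+4E00..U+9FFF, and `init_token_sketch()` assumes the code points are already decoded into a `std::u32string` rather than read through a `Xapian::Utf8Iterator`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical UTF-8 encoder, similar in purpose to
// Xapian::Unicode::append_utf8(): append code point `ch` to `s`.
void append_utf8_sketch(std::string& s, char32_t ch) {
    assert(ch <= 0x10FFFF && "not a Unicode code point");
    if (ch < 0x80) {
        s += static_cast<char>(ch);
    } else if (ch < 0x800) {
        s += static_cast<char>(0xC0 | (ch >> 6));
        s += static_cast<char>(0x80 | (ch & 0x3F));
    } else if (ch < 0x10000) {
        s += static_cast<char>(0xE0 | (ch >> 12));
        s += static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (ch & 0x3F));
    } else {
        s += static_cast<char>(0xF0 | (ch >> 18));
        s += static_cast<char>(0x80 | ((ch >> 12) & 0x3F));
        s += static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (ch & 0x3F));
    }
}

// Simplified, hypothetical stand-in for the combined test
// CJK::codepoint_is_cjk(ch) && Xapian::Unicode::is_wordchar(ch).
bool is_cjk_wordchar_sketch(char32_t ch) {
    return ch >= 0x4E00 && ch <= 0x9FFF;
}

// Sketch of the start-up step: if the first code point is a CJK word
// character, store its UTF-8 bytes as the first (unigram) token and
// return 1 (one code point consumed); otherwise leave the token empty,
// meaning the iterator is already at its end, and return 0.
std::size_t init_token_sketch(const std::u32string& cps,
                              std::string& current_token) {
    current_token.clear();
    if (!cps.empty() && is_cjk_wordchar_sketch(cps[0])) {
        append_utf8_sketch(current_token, cps[0]);
        return 1;
    }
    return 0;
}
```

With the code points 中文 as input, the token becomes the three bytes E4 B8 AD and scanning resumes at index 1; with `abc` the token stays empty, so an iterator set up this way starts out already equal to its end.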

◆ operator!=()

bool CJKTokenIterator::operator!= ( const CJKTokenIterator &  other) const
inline

Definition at line 98 of file cjk-tokenizer.h.

◆ operator*()

const std::string& CJKTokenIterator::operator* ( ) const
inline

Definition at line 81 of file cjk-tokenizer.h.

◆ operator++()

CJKTokenIterator & CJKTokenIterator::operator++ ( )

◆ operator==()

bool CJKTokenIterator::operator== ( const CJKTokenIterator &  other) const
inline

Definition at line 92 of file cjk-tokenizer.h.

References current_token.

◆ unigram()

bool CJKTokenIterator::unigram ( ) const
inline

Is this a unigram?

Definition at line 88 of file cjk-tokenizer.h.

Referenced by Xapian::parse_terms().

Member Data Documentation

◆ current_token

std::string CJKTokenIterator::current_token
private

Definition at line 65 of file cjk-tokenizer.h.

Referenced by operator==().

◆ it

Xapian::Utf8Iterator CJKTokenIterator::it
private

Definition at line 57 of file cjk-tokenizer.h.

◆ offset

unsigned CJKTokenIterator::offset = 0
private

Offset to penultimate Unicode character in current_token.

If current_token has one Unicode character, this is 0.

Definition at line 63 of file cjk-tokenizer.h.
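To make the `offset` bookkeeping concrete, the sketch below models only `current_token` and `offset`, assuming (as the description suggests) that `offset` is a byte offset into the UTF-8 string. `TokenStateSketch`, `extend()` and `shrink()` are hypothetical names; in the real class these steps are presumably performed inside `operator++()`.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the current_token/offset pair (NOT Xapian's code).
// current_token holds UTF-8 bytes; offset is the byte position of the
// penultimate character, or 0 when the token holds a single character.
struct TokenStateSketch {
    std::string current_token;
    unsigned offset = 0;

    bool unigram() const { return offset == 0; }

    // Turn a unigram into a bigram by appending the next character's
    // UTF-8 bytes; the old length is where the penultimate character starts.
    void extend(const std::string& next_char_utf8) {
        assert(offset == 0 && "token is already a bigram");
        offset = static_cast<unsigned>(current_token.size());
        current_token += next_char_utf8;
    }

    // Turn a bigram into a unigram by erasing its first character.
    void shrink() {
        assert(offset != 0 && "token is already a unigram");
        current_token.erase(0, offset);
        offset = 0;
    }
};
```

Starting from the unigram 中 (3 bytes), `extend()` with the bytes of 文 yields the 6-byte bigram 中文 with `offset == 3`, and `shrink()` erases those first 3 bytes to leave the unigram 文 with `offset == 0`. Keeping a byte offset lets a bigram be cut back to its second character with a single `erase()` and no UTF-8 re-decoding.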

