xapian-core 1.4.22
CJKTokenIterator Class Reference
Iterator returning unigrams and bigrams.
#include <cjk-tokenizer.h>
Public Member Functions
| Type | Member | Description |
|---|---|---|
| | CJKTokenIterator (const std::string &s) | |
| | CJKTokenIterator (const Xapian::Utf8Iterator &it_) | |
| | CJKTokenIterator () | |
| const std::string & | operator* () const | |
| CJKTokenIterator & | operator++ () | |
| bool | unigram () const | Is this a unigram? |
| const Xapian::Utf8Iterator & | get_utf8iterator () const | |
| bool | operator== (const CJKTokenIterator &other) const | |
| bool | operator!= (const CJKTokenIterator &other) const | |

Private Member Functions
| Type | Member | Description |
|---|---|---|
| void | init () | Call to set current_token at the start. |

Private Attributes
| Type | Member | Description |
|---|---|---|
| Xapian::Utf8Iterator | it | |
| unsigned | offset = 0 | Offset to penultimate Unicode character in current_token. |
| std::string | current_token | |

Detailed Description
Iterator returning unigrams and bigrams.
Definition at line 56 of file cjk-tokenizer.h.
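The description above says only that the iterator returns unigrams and bigrams. As a hedged illustration, here is a standalone model (not Xapian code) of one token order consistent with that description: each character is emitted as a unigram, followed by the bigram it forms with the next character. The function name `unigrams_and_bigrams` is hypothetical, the input is assumed to be a single run of CJK characters already decoded to code points, and the exact ordering is an assumption rather than something this page states.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model, not Xapian code: for a run of CJK code points, list
// each character as a unigram, followed by the bigram it forms with the
// next character (the assumed order).
std::vector<std::u32string> unigrams_and_bigrams(const std::u32string& run) {
    std::vector<std::u32string> tokens;
    for (std::size_t i = 0; i < run.size(); ++i) {
        tokens.push_back(run.substr(i, 1));      // unigram
        if (i + 1 < run.size()) {
            tokens.push_back(run.substr(i, 2));  // bigram with the next char
        }
    }
    return tokens;
}
```

Under this model, a three-character run ABC (letters standing in for CJK characters) yields A, AB, B, BC, C, so every adjacent pair is indexed as a bigram while each single character remains searchable on its own.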

Constructor & Destructor Documentation

CJKTokenIterator::CJKTokenIterator (const std::string &s) [inline], [explicit]
Definition at line 71 of file cjk-tokenizer.h.

CJKTokenIterator::CJKTokenIterator (const Xapian::Utf8Iterator &it_) [inline], [explicit]
Definition at line 75 of file cjk-tokenizer.h.

CJKTokenIterator::CJKTokenIterator () [inline]
Definition at line 79 of file cjk-tokenizer.h.
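
Member Function Documentation

const Xapian::Utf8Iterator & CJKTokenIterator::get_utf8iterator () const [inline]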
Definition at line 90 of file cjk-tokenizer.h.
Referenced by Xapian::parse_terms().

void CJKTokenIterator::init () [private]
Call to set current_token at the start.
Definition at line 96 of file cjk-tokenizer.cc.
References Xapian::Unicode::append_utf8(), CJK::codepoint_is_cjk(), and Xapian::Unicode::is_wordchar().
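init() is documented as referencing CJK::codepoint_is_cjk() and Xapian::Unicode::is_wordchar(), but the ranges Xapian treats as CJK are not listed on this page. Purely as an illustration of the kind of test the first name suggests, here is a hypothetical, simplified code point check covering a handful of well-known Unicode blocks. The name `is_cjk_codepoint_simplified` is an assumption, not part of Xapian, and the real function may cover more ranges.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for a CJK code point test; NOT Xapian's
// CJK::codepoint_is_cjk(). It checks only a few well-known Unicode blocks.
bool is_cjk_codepoint_simplified(std::uint32_t ch) {
    return (ch >= 0x2E80 && ch <= 0x2FDF) ||  // CJK Radicals Supplement, Kangxi Radicals
           (ch >= 0x3040 && ch <= 0x30FF) ||  // Hiragana, Katakana
           (ch >= 0x3400 && ch <= 0x4DBF) ||  // CJK Unified Ideographs Extension A
           (ch >= 0x4E00 && ch <= 0x9FFF) ||  // CJK Unified Ideographs
           (ch >= 0xAC00 && ch <= 0xD7AF);    // Hangul Syllables
}
```

A range test like this only decides whether a character belongs to a CJK script; a separate word-character check (the role suggested by the is_wordchar() reference) would still be needed to exclude CJK punctuation and symbols.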

bool CJKTokenIterator::operator!= (const CJKTokenIterator &other) const [inline]
Definition at line 98 of file cjk-tokenizer.h.

const std::string & CJKTokenIterator::operator* () const [inline]
Definition at line 81 of file cjk-tokenizer.h.

CJKTokenIterator & CJKTokenIterator::operator++ ()
Definition at line 109 of file cjk-tokenizer.cc.
References Xapian::Unicode::append_utf8(), CJK::codepoint_is_cjk(), and Xapian::Unicode::is_wordchar().
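The private state listed for this class (the pending token in current_token and an offset that is 0 when the token has one character) suggests how operator++ can alternate between unigrams and bigrams. The following is a hypothetical toy, `ToyCJKIterator`, and not Xapian's implementation. It reads pre-decoded code points instead of a Xapian::Utf8Iterator, treats every input character as a CJK word character, assumes `offset` holds the byte length of a bigram's first character, and assumes two iterators compare equal when both tokens are empty, so a default-constructed object serves as the end marker. The helper `collect` is likewise hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy, not Xapian's implementation. Input is already-decoded
// code points, and every code point is treated as a CJK word character.
class ToyCJKIterator {
    std::u32string input;
    std::size_t pos = 0;        // index of the next code point to consume
    unsigned offset = 0;        // assumed: byte length of a bigram's first char; 0 for a unigram
    std::string current_token;  // UTF-8 encoded; empty means "at end"

    // Append one code point to s as UTF-8.
    static void append_utf8(std::string& s, char32_t ch) {
        if (ch < 0x80) {
            s += static_cast<char>(ch);
        } else if (ch < 0x800) {
            s += static_cast<char>(0xC0 | (ch >> 6));
            s += static_cast<char>(0x80 | (ch & 0x3F));
        } else if (ch < 0x10000) {
            s += static_cast<char>(0xE0 | (ch >> 12));
            s += static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
            s += static_cast<char>(0x80 | (ch & 0x3F));
        } else {
            s += static_cast<char>(0xF0 | (ch >> 18));
            s += static_cast<char>(0x80 | ((ch >> 12) & 0x3F));
            s += static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
            s += static_cast<char>(0x80 | (ch & 0x3F));
        }
    }

  public:
    explicit ToyCJKIterator(std::u32string s) : input(std::move(s)) {
        // Start with the first character as a unigram (the job init() is described as doing).
        if (pos < input.size()) append_utf8(current_token, input[pos++]);
    }
    ToyCJKIterator() = default;  // end iterator: empty current_token

    const std::string& operator*() const { return current_token; }
    bool unigram() const { return offset == 0; }

    ToyCJKIterator& operator++() {
        if (offset == 0) {
            if (pos < input.size()) {
                // Unigram -> bigram: remember where the new character starts, then append it.
                offset = static_cast<unsigned>(current_token.size());
                append_utf8(current_token, input[pos++]);
            } else {
                current_token.clear();  // input exhausted: become an end iterator
            }
        } else {
            // Bigram -> unigram: drop the first character, keep the second.
            current_token.erase(0, offset);
            offset = 0;
        }
        return *this;
    }

    // Equal only when both tokens are empty, so comparing against a
    // default-constructed iterator works as the loop's end test.
    bool operator==(const ToyCJKIterator& other) const {
        return current_token.empty() && other.current_token.empty();
    }
    bool operator!=(const ToyCJKIterator& other) const { return !(*this == other); }
};

// Collect every token, paired with whether it is a unigram.
std::vector<std::pair<std::string, bool>> collect(const std::u32string& s) {
    std::vector<std::pair<std::string, bool>> out;
    for (ToyCJKIterator t(s); t != ToyCJKIterator(); ++t) {
        out.emplace_back(*t, t.unigram());
    }
    return out;
}
```

With this toy, collect(U"ABC") yields A, AB, B, BC, C, with the unigram flag set on the single-character tokens. Keeping the token UTF-8 encoded and erasing only `offset` bytes from its front lets a bigram fall back to a unigram without decoding it again.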

bool CJKTokenIterator::operator== (const CJKTokenIterator &other) const [inline]
Definition at line 92 of file cjk-tokenizer.h.
References current_token.

bool CJKTokenIterator::unigram () const [inline]
Is this a unigram?
Definition at line 88 of file cjk-tokenizer.h.
Referenced by Xapian::parse_terms().

Member Data Documentation

std::string CJKTokenIterator::current_token [private]
Definition at line 65 of file cjk-tokenizer.h.
Referenced by operator==().

Xapian::Utf8Iterator CJKTokenIterator::it [private]
Definition at line 57 of file cjk-tokenizer.h.

unsigned CJKTokenIterator::offset = 0 [private]
Offset to penultimate Unicode character in current_token.
If current_token has one Unicode character, this is 0.
Definition at line 63 of file cjk-tokenizer.h.