xapian-core 1.4.30
Iterator returning unigrams and bigrams.
#include <word-breaker.h>
Public Member Functions
| | NgramIterator (const std::string &s) |
| | NgramIterator (const Xapian::Utf8Iterator &it_) |
| | NgramIterator () |
| const std::string & | operator* () const |
| NgramIterator & | operator++ () |
| bool | unigram () const |
| | Is this a unigram? |
| const Xapian::Utf8Iterator & | get_utf8iterator () const |
| bool | operator== (const NgramIterator &other) const |
| bool | operator!= (const NgramIterator &other) const |
Private Member Functions
| void | init () |
| | Call to set current_token at the start. |
Private Attributes
| Xapian::Utf8Iterator | it |
| unsigned | offset = 0 |
| | Offset to penultimate Unicode character in current_token. |
| std::string | current_token |
Iterator returning unigrams and bigrams.
Definition at line 52 of file word-breaker.h.
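To make "unigrams and bigrams" concrete, here is a minimal standalone sketch of an iterator with the same public shape. It is not Xapian's implementation: `ToyNgramIterator` and `toy_ngrams` are hypothetical names, the sketch works on UTF-32 code points instead of UTF-8, it skips the script and word-character checks, and the emission order (each unigram followed by the bigram starting at the same character) is an assumption that this page does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for NgramIterator, NOT Xapian's code.
// Assumptions: input is UTF-32, every character belongs to one unbroken
// run, and tokens are emitted as unigram, bigram, unigram, bigram, ...
class ToyNgramIterator {
    std::u32string text;           // input characters
    std::size_t pos = 0;           // index of the current token's first character
    bool bigram = false;           // true if the current token has two characters
    std::u32string current_token;  // empty means "at end"

    // Rebuild current_token from pos and bigram.
    void update() {
        if (pos >= text.size()) {
            current_token.clear();
            return;
        }
        current_token.assign(text, pos, bigram ? 2 : 1);
    }

  public:
    explicit ToyNgramIterator(const std::u32string& s) : text(s) { update(); }

    // A default-constructed iterator acts as the end iterator.
    ToyNgramIterator() {}

    const std::u32string& operator*() const { return current_token; }

    ToyNgramIterator& operator++() {
        assert(!current_token.empty());  // must not advance past the end
        if (!bigram && pos + 1 < text.size()) {
            bigram = true;   // extend the unigram to a bigram
        } else {
            bigram = false;  // move to the next character's unigram
            ++pos;
        }
        update();
        return *this;
    }

    bool unigram() const { return !bigram; }

    // Assumed: only comparisons against the end iterator matter,
    // so comparing the current tokens is enough.
    bool operator==(const ToyNgramIterator& other) const {
        return current_token == other.current_token;
    }
    bool operator!=(const ToyNgramIterator& other) const {
        return !(*this == other);
    }
};

// Collect every token in iteration order.
std::vector<std::u32string> toy_ngrams(const std::u32string& s) {
    std::vector<std::u32string> out;
    for (ToyNgramIterator i(s); i != ToyNgramIterator(); ++i) {
        out.push_back(*i);
    }
    return out;
}
```

Under these assumptions, `toy_ngrams(U"ABC")` yields the five tokens A, AB, B, BC, C, and a caller can use `unigram()` inside the loop to treat single characters differently from pairs.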
NgramIterator::NgramIterator (const std::string &s) [inline, explicit]
Definition at line 67 of file word-breaker.h.
References init().
NgramIterator::NgramIterator (const Xapian::Utf8Iterator &it_) [inline, explicit]
Definition at line 71 of file word-breaker.h.
References init().
NgramIterator::NgramIterator () [inline]
Definition at line 75 of file word-breaker.h.
const Xapian::Utf8Iterator & NgramIterator::get_utf8iterator () const [inline]
void NgramIterator::init () [private]
Call to set current_token at the start.
Definition at line 96 of file word-breaker.cc.
References Xapian::Unicode::append_utf8(), is_unbroken_script(), and Xapian::Unicode::is_wordchar().
Referenced by NgramIterator().
bool NgramIterator::operator!= (const NgramIterator &other) const [inline]
Definition at line 94 of file word-breaker.h.
const std::string & NgramIterator::operator* () const [inline]
Definition at line 77 of file word-breaker.h.
References current_token.
NgramIterator & NgramIterator::operator++ ()
Definition at line 110 of file word-breaker.cc.
References Xapian::Unicode::append_utf8(), is_unbroken_script(), and Xapian::Unicode::is_wordchar().
bool NgramIterator::operator== (const NgramIterator &other) const [inline]
Definition at line 88 of file word-breaker.h.
References current_token.
bool NgramIterator::unigram () const [inline]
Is this a unigram?
Definition at line 84 of file word-breaker.h.
References offset.
Referenced by Xapian::parse_terms().
std::string NgramIterator::current_token [private]
Definition at line 61 of file word-breaker.h.
Referenced by operator*(), and operator==().
Xapian::Utf8Iterator NgramIterator::it [private]
Definition at line 53 of file word-breaker.h.
Referenced by get_utf8iterator().
unsigned NgramIterator::offset = 0 [private]
Offset to penultimate Unicode character in current_token.
If current_token has one Unicode character, this is 0.
Definition at line 59 of file word-breaker.h.
Referenced by unigram().