An iterator which returns Unicode character values from a UTF-8 encoded string.
#include <unicode.h>
typedef std::input_iterator_tag iterator_category
typedef unsigned value_type
typedef size_t difference_type
typedef const unsigned * pointer
typedef const unsigned & reference
These typedefs implement the semantics of an STL input_iterator.
const unsigned char * p
const unsigned char * end
unsigned seqlen
Definition at line 38 of file unicode.h.
◆ difference_type
We implement the semantics of an STL input_iterator.
Definition at line 206 of file unicode.h.
◆ iterator_category
We implement the semantics of an STL input_iterator.
Definition at line 204 of file unicode.h.
◆ pointer
We implement the semantics of an STL input_iterator.
Definition at line 207 of file unicode.h.
◆ reference
We implement the semantics of an STL input_iterator.
Definition at line 208 of file unicode.h.
◆ value_type
We implement the semantics of an STL input_iterator.
Definition at line 205 of file unicode.h.
◆ Utf8Iterator() [1/5]
Xapian::Utf8Iterator::Utf8Iterator(const unsigned char * p_, const unsigned char * end_, unsigned seqlen_) [inline, private]
◆ Utf8Iterator() [2/5]
Xapian::Utf8Iterator::Utf8Iterator(const char * p_) [explicit]
Create an iterator given a pointer to a null terminated string.
The iterator will return characters from the start of the string when next called. The string is not copied into the iterator, so it must remain valid while the iteration is in progress.
Parameters
p_ | A pointer to the start of the null-terminated string to read.
Definition at line 67 of file utf8itor.cc.
◆ Utf8Iterator() [3/5]
Xapian::Utf8Iterator::Utf8Iterator(const char * p_, size_t len) [inline]
Create an iterator given a pointer and a length.
The iterator will return characters from the start of the string when next called. The string is not copied into the iterator, so it must remain valid while the iteration is in progress.
Parameters
p_ | A pointer to the start of the string to read.
len | The length of the string to read.
Definition at line 114 of file unicode.h.
◆ Utf8Iterator() [4/5]
Xapian::Utf8Iterator::Utf8Iterator(const std::string & s) [inline]
Create an iterator given a string.
The iterator will return characters from the start of the string when next called. The string is not copied into the iterator, so it must remain valid while the iteration is in progress.
Parameters
s | The string to read. Must not be modified while the iteration is in progress.
Definition at line 125 of file unicode.h.
◆ Utf8Iterator() [5/5]
Xapian::Utf8Iterator::Utf8Iterator() [inline]
Create an iterator which is at the end of its iteration.
This can be compared to another iterator to check if the other iterator has reached its end.
Definition at line 132 of file unicode.h.
References XAPIAN_PURE_FUNCTION.
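The end-of-iteration comparison described above can be illustrated without Xapian itself. The following is a minimal sketch, not Xapian's implementation: `SketchCursor` and `sketch_count` are hypothetical names, the cursor steps byte by byte rather than decoding UTF-8, and representing "at the end" as a null position pointer is an assumption made for illustration (the documentation only states that the comparison operators reference `p`).

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for illustration only; it is NOT Xapian::Utf8Iterator.
// It walks raw bytes and does not decode UTF-8.
class SketchCursor {
    const unsigned char* p;    // current position; null means "at the end" (assumption)
    const unsigned char* end;  // one past the last byte of the borrowed buffer

  public:
    // A default-constructed cursor is the end-of-iteration sentinel.
    SketchCursor() : p(nullptr), end(nullptr) {}

    // The string is borrowed, not copied, so it must outlive the cursor.
    explicit SketchCursor(const std::string& s) : p(nullptr), end(nullptr) {
        assign(s);
    }

    // Forget the previous buffer and restart at the beginning of s.
    void assign(const std::string& s) {
        if (s.empty()) {
            p = nullptr;
            end = nullptr;
        } else {
            p = reinterpret_cast<const unsigned char*>(s.data());
            end = p + s.size();
        }
    }

    // Mirrors the documented convention of returning unsigned(-1) at the end.
    unsigned operator*() const { return p ? *p : unsigned(-1); }

    SketchCursor& operator++() {
        if (p && ++p == end) p = nullptr;  // collapse to the sentinel state
        return *this;
    }

    bool operator==(const SketchCursor& other) const { return p == other.p; }
    bool operator!=(const SketchCursor& other) const { return p != other.p; }
};

// Count the steps needed to exhaust the cursor, comparing against the sentinel.
inline std::size_t sketch_count(const std::string& s) {
    std::size_t n = 0;
    for (SketchCursor it(s); it != SketchCursor(); ++it) ++n;
    return n;
}
```

With the real class the loop has the same shape: construct an iterator over the string, then advance it until it compares equal to a default-constructed `Utf8Iterator`.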
◆ assign() [1/2]
void Xapian::Utf8Iterator::assign(const char * p_, size_t len) [inline]
Assign a new string to the iterator.
The iterator will forget the string it was iterating through, and return characters from the start of the new string when next called. The string is not copied into the iterator, so it must remain valid while the iteration is in progress.
Parameters
p_ | A pointer to the start of the string to read.
len | The length of the string to read.
Definition at line 72 of file unicode.h.
◆ assign() [2/2]
void Xapian::Utf8Iterator::assign(const std::string & s) [inline]
Assign a new string to the iterator.
The iterator will forget the string it was iterating through, and return characters from the start of the new string when next called. The string is not copied into the iterator, so it must remain valid while the iteration is in progress.
Parameters
s | The string to read. Must not be modified while the iteration is in progress.
Definition at line 92 of file unicode.h.
References assign().
Referenced by assign().
◆ calculate_sequence_length()
bool Xapian::Utf8Iterator::calculate_sequence_length() const [private]
◆ get_char()
unsigned Xapian::Utf8Iterator::get_char() const [private]
◆ left()
size_t Xapian::Utf8Iterator::left() const [inline]
◆ operator!=()
bool Xapian::Utf8Iterator::operator!=(const Utf8Iterator & other) const [inline]
Test two Utf8Iterators for inequality.
Parameters
other | The Utf8Iterator to compare this one with.
Returns
true iff the iterators do not point to the same position.
Definition at line 198 of file unicode.h.
References p.
◆ operator*()
unsigned Xapian::Utf8Iterator::operator*() const
Get the current Unicode character value pointed to by the iterator.
If an invalid UTF-8 sequence is encountered, then the byte values comprising it are returned until valid UTF-8 or the end of the input is reached.
Returns unsigned(-1) if the iterator has reached the end of its buffer.
Definition at line 115 of file utf8itor.cc.
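The fallback behaviour described for `operator*()` can be sketched as a standalone decoder. This is an illustration under stated assumptions, not the code in utf8itor.cc: the names `sketch_sequence_length`, `sketch_decode_lenient` and `sketch_first_char` are hypothetical, and the validity rules follow the Unicode standard's well-formed UTF-8 table, which may differ in detail from what Xapian accepts.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Xapian): length in bytes of the well-formed
// UTF-8 sequence starting at p, or 0 if the bytes at p are not valid UTF-8.
// Precondition: p < end.
inline std::size_t sketch_sequence_length(const unsigned char* p,
                                          const unsigned char* end) {
    const unsigned char lead = p[0];
    if (lead < 0x80) return 1;                    // ASCII
    std::size_t len;
    if (lead >= 0xC2 && lead <= 0xDF) len = 2;
    else if (lead >= 0xE0 && lead <= 0xEF) len = 3;
    else if (lead >= 0xF0 && lead <= 0xF4) len = 4;
    else return 0;                                // stray continuation or bad lead byte
    if (static_cast<std::size_t>(end - p) < len) return 0;  // truncated sequence
    for (std::size_t i = 1; i < len; ++i)
        if ((p[i] & 0xC0) != 0x80) return 0;      // not a continuation byte
    if (lead == 0xE0 && p[1] < 0xA0) return 0;    // overlong 3-byte form
    if (lead == 0xED && p[1] > 0x9F) return 0;    // UTF-16 surrogate range
    if (lead == 0xF0 && p[1] < 0x90) return 0;    // overlong 4-byte form
    if (lead == 0xF4 && p[1] > 0x8F) return 0;    // above U+10FFFF
    return len;
}

// Decode a whole string. Bytes that do not start a valid sequence are emitted
// as their own byte values, one at a time, until valid UTF-8 resumes.
inline std::vector<unsigned> sketch_decode_lenient(const std::string& s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const unsigned char* end = p + s.size();
    std::vector<unsigned> out;
    while (p != end) {
        const std::size_t len = sketch_sequence_length(p, end);
        if (len == 0) {
            out.push_back(*p++);                  // invalid: pass the raw byte through
            continue;
        }
        // Payload bits of the lead byte: 7, 5, 4 or 3 depending on the length.
        static const unsigned lead_mask[] = {0, 0x7F, 0x1F, 0x0F, 0x07};
        unsigned ch = p[0] & lead_mask[len];
        for (std::size_t i = 1; i < len; ++i)
            ch = (ch << 6) | (p[i] & 0x3Fu);      // append 6 bits per continuation byte
        out.push_back(ch);
        p += len;
    }
    return out;
}

// Mirrors the documented end-of-buffer convention: unsigned(-1) when there is
// nothing left to read.
inline unsigned sketch_first_char(const std::string& s) {
    if (s.empty()) return unsigned(-1);
    return sketch_decode_lenient(s).front();
}
```

Note that with this lenient behaviour a caller cannot tell a raw invalid byte such as 0xE9 apart from a validly encoded U+00E9; the private `strict_deref()` variant documented further down exists to make that distinction.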
◆ operator++() [1/2]
Move forward to the next Unicode character (postfix form).
Returns
An iterator pointing to the position before the move.
Definition at line 161 of file unicode.h.
◆ operator++() [2/2]
Move forward to the next Unicode character (prefix form).
Returns
A reference to this object.
Definition at line 176 of file unicode.h.
◆ operator==()
bool Xapian::Utf8Iterator::operator==(const Utf8Iterator & other) const [inline]
Test two Utf8Iterators for equality.
Parameters
other | The Utf8Iterator to compare this one with.
Returns
true iff the iterators point to the same position.
Definition at line 189 of file unicode.h.
References p.
◆ raw()
const char* Xapian::Utf8Iterator::raw() const [inline]
◆ strict_deref()
unsigned Xapian::Utf8Iterator::strict_deref() const [private]
Get the current Unicode character value pointed to by the iterator.
If an invalid UTF-8 sequence is encountered, then the byte values comprising it are returned with the top bit set (so the caller can differentiate these from the same values arising from valid UTF-8) until valid UTF-8 or the end of the input is reached.
Returns unsigned(-1) if the iterator has reached the end of its buffer.
Definition at line 128 of file utf8itor.cc.
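Because `strict_deref()` is private, the following is only a sketch of how values flagged in this way could be interpreted. It assumes that "top bit" means the most significant bit of `unsigned`; all names (`kSketchInvalidFlag`, `sketch_flag_invalid_byte`, `sketch_is_invalid_byte`, `sketch_original_byte`) are hypothetical and not part of Xapian.

```cpp
#include <climits>

// Assumption: "top bit" is the most significant bit of unsigned.
const unsigned kSketchInvalidFlag = 1u << (sizeof(unsigned) * CHAR_BIT - 1);

// Tag a byte taken from an invalid sequence so it cannot be mistaken for a
// code point decoded from valid UTF-8.
inline unsigned sketch_flag_invalid_byte(unsigned char byte) {
    return byte | kSketchInvalidFlag;
}

// True if value is a flagged invalid byte. The end-of-buffer marker
// unsigned(-1) also has its top bit set, so it must be excluded first.
inline bool sketch_is_invalid_byte(unsigned value) {
    return value != unsigned(-1) && (value & kSketchInvalidFlag) != 0;
}

// Recover the original byte from a flagged value.
inline unsigned char sketch_original_byte(unsigned value) {
    return static_cast<unsigned char>(value & 0xFFu);
}
```

The ordering of the checks matters: a caller that tests the flag before testing for `unsigned(-1)` would misread the end of the buffer as an invalid byte.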
◆ end
const unsigned char* Xapian::Utf8Iterator::end [private]
◆ p
const unsigned char* Xapian::Utf8Iterator::p [private]
◆ seqlen
unsigned Xapian::Utf8Iterator::seqlen [mutable, private]