xapian-core  1.4.26
Xapian::Utf8Iterator Class Reference

An iterator which returns Unicode character values from a UTF-8 encoded string.

#include <unicode.h>

Public Types

typedef std::input_iterator_tag iterator_category
 We implement the semantics of an STL input_iterator.
 
typedef unsigned value_type
 We implement the semantics of an STL input_iterator.
 
typedef size_t difference_type
 We implement the semantics of an STL input_iterator.
 
typedef const unsigned * pointer
 We implement the semantics of an STL input_iterator.
 
typedef const unsigned & reference
 We implement the semantics of an STL input_iterator.
 

Public Member Functions

const char * raw () const
 Return the raw const char* pointer for the current position.
 
size_t left () const
 Return the number of bytes left in the iterator's buffer.
 
void assign (const char *p_, size_t len)
 Assign a new string to the iterator.
 
void assign (const std::string &s)
 Assign a new string to the iterator.
 
 Utf8Iterator (const char *p_)
 Create an iterator given a pointer to a null terminated string.
 
 Utf8Iterator (const char *p_, size_t len)
 Create an iterator given a pointer and a length.
 
 Utf8Iterator (const std::string &s)
 Create an iterator given a string.
 
 Utf8Iterator ()
 Create an iterator which is at the end of its iteration.
 
unsigned operator* () const
 Get the current Unicode character value pointed to by the iterator.
 
Utf8Iterator operator++ (int)
 Move forward to the next Unicode character.
 
Utf8Iterator & operator++ ()
 Move forward to the next Unicode character.
 
bool operator== (const Utf8Iterator &other) const
 Test two Utf8Iterators for equality.
 
bool operator!= (const Utf8Iterator &other) const
 Test two Utf8Iterators for inequality.
 

Private Member Functions

bool calculate_sequence_length () const
 
unsigned get_char () const
 
 Utf8Iterator (const unsigned char *p_, const unsigned char *end_, unsigned seqlen_)
 
unsigned strict_deref () const
 

Private Attributes

const unsigned char * p
 
const unsigned char * end
 
unsigned seqlen
 

Detailed Description

An iterator which returns Unicode character values from a UTF-8 encoded string.

Definition at line 38 of file unicode.h.

Member Typedef Documentation

◆ difference_type

typedef size_t Xapian::Utf8Iterator::difference_type

We implement the semantics of an STL input_iterator.

Definition at line 206 of file unicode.h.

◆ iterator_category

typedef std::input_iterator_tag Xapian::Utf8Iterator::iterator_category

We implement the semantics of an STL input_iterator.

Definition at line 204 of file unicode.h.

◆ pointer

typedef const unsigned* Xapian::Utf8Iterator::pointer

We implement the semantics of an STL input_iterator.

Definition at line 207 of file unicode.h.

◆ reference

typedef const unsigned& Xapian::Utf8Iterator::reference

We implement the semantics of an STL input_iterator.

Definition at line 208 of file unicode.h.

◆ value_type

typedef unsigned Xapian::Utf8Iterator::value_type

We implement the semantics of an STL input_iterator.

Definition at line 205 of file unicode.h.

Constructor & Destructor Documentation

◆ Utf8Iterator() [1/5]

Xapian::Utf8Iterator::Utf8Iterator ( const unsigned char * p_, const unsigned char * end_, unsigned seqlen_ )
inline private

Definition at line 47 of file unicode.h.

◆ Utf8Iterator() [2/5]

Xapian::Utf8Iterator::Utf8Iterator ( const char *  p_)
explicit

Create an iterator given a pointer to a null terminated string.

The iterator will return characters from the start of the string when next called. The string is not copied into the iterator, so it must remain valid while the iteration is in progress.

Parameters
p_   A pointer to the start of the null terminated string to read.

Definition at line 67 of file utf8itor.cc.

◆ Utf8Iterator() [3/5]

Xapian::Utf8Iterator::Utf8Iterator ( const char * p_, size_t len )
inline

Create an iterator given a pointer and a length.

The iterator will return characters from the start of the string when next called. The string is not copied into the iterator, so it must remain valid while the iteration is in progress.

Parameters
p_   A pointer to the start of the string to read.
len  The length of the string to read.

Definition at line 114 of file unicode.h.

◆ Utf8Iterator() [4/5]

Xapian::Utf8Iterator::Utf8Iterator ( const std::string &  s)
inline

Create an iterator given a string.

The iterator will return characters from the start of the string when next called. The string is not copied into the iterator, so it must remain valid while the iteration is in progress.

Parameters
s    The string to read. Must not be modified while the iteration is in progress.

Definition at line 125 of file unicode.h.

◆ Utf8Iterator() [5/5]

Xapian::Utf8Iterator::Utf8Iterator ( )
inline

Create an iterator which is at the end of its iteration.

This can be compared to another iterator to check if the other iterator has reached its end.

Definition at line 132 of file unicode.h.

References XAPIAN_PURE_FUNCTION.
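
The end-iterator comparison described above can be sketched with a small stand-in class. This is an illustration only: `ByteCursor` and `count_units` are hypothetical names, the class steps over bytes instead of decoding UTF-8, and representing the end state as a null position pointer is an assumption of the sketch (the reference only states that `operator==` and `operator!=` compare `p`).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for illustration only; it is not Xapian's class and
// it steps over bytes instead of decoding UTF-8.
class ByteCursor {
    const unsigned char* p;    // current position, or nullptr once at the end
    const unsigned char* end;  // one past the last byte

  public:
    // Default construction yields the "end" state directly.
    ByteCursor() : p(nullptr), end(nullptr) {}

    // Non-owning: `s` must outlive the cursor and must not be modified.
    explicit ByteCursor(const std::string& s)
        : p(reinterpret_cast<const unsigned char*>(s.data())),
          end(p + s.size()) {
        if (p == end) p = nullptr;  // empty input starts at the end
    }

    // Returns unsigned(-1) once the end has been reached.
    unsigned operator*() const { return p ? *p : unsigned(-1); }

    ByteCursor& operator++() {
        assert(p != nullptr);           // advancing past the end is an error
        if (++p == end) p = nullptr;    // collapse to the shared end state
        return *this;
    }

    bool operator==(const ByteCursor& other) const { return p == other.p; }
    bool operator!=(const ByteCursor& other) const { return p != other.p; }
};

// Count the units produced by walking from the start to the end sentinel.
std::size_t count_units(const std::string& s) {
    std::size_t n = 0;
    for (ByteCursor it(s); it != ByteCursor(); ++it) ++n;
    return n;
}
```

The loop in `count_units` has the same shape as a loop over a real `Utf8Iterator`: construct from the string, compare against a default-constructed iterator, and advance with `++`.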

Member Function Documentation

◆ assign() [1/2]

void Xapian::Utf8Iterator::assign ( const char * p_, size_t len )
inline

Assign a new string to the iterator.

The iterator will forget the string it was iterating through, and return characters from the start of the new string when next called. The string is not copied into the iterator, so it must remain valid while the iteration is in progress.

Parameters
p_   A pointer to the start of the string to read.
len  The length of the string to read.

Definition at line 72 of file unicode.h.

◆ assign() [2/2]

void Xapian::Utf8Iterator::assign ( const std::string &  s)
inline

Assign a new string to the iterator.

The iterator will forget the string it was iterating through, and return characters from the start of the new string when next called. The string is not copied into the iterator, so it must remain valid while the iteration is in progress.

Parameters
s    The string to read. Must not be modified while the iteration is in progress.

Definition at line 92 of file unicode.h.

References assign().

Referenced by assign().

◆ calculate_sequence_length()

bool Xapian::Utf8Iterator::calculate_sequence_length ( ) const
private

Definition at line 73 of file utf8itor.cc.

References bad_cont().

◆ get_char()

unsigned Xapian::Utf8Iterator::get_char ( ) const
private

◆ left()

size_t Xapian::Utf8Iterator::left ( ) const
inline

Return the number of bytes left in the iterator's buffer.

Definition at line 59 of file unicode.h.

Referenced by Xapian::MSet::Internal::snippet().

◆ operator!=()

bool Xapian::Utf8Iterator::operator!= ( const Utf8Iterator & other ) const
inline

Test two Utf8Iterators for inequality.

Parameters
other  The Utf8Iterator to compare this one with.
Returns
true iff the iterators do not point to the same position.

Definition at line 198 of file unicode.h.

References p.

◆ operator*()

unsigned Xapian::Utf8Iterator::operator* ( ) const

Get the current Unicode character value pointed to by the iterator.

If an invalid UTF-8 sequence is encountered, then the byte values comprising it are returned until valid UTF-8 or the end of the input is reached.

Returns unsigned(-1) if the iterator has reached the end of its buffer.

Definition at line 115 of file utf8itor.cc.
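
The fallback described above (invalid sequences come back as their individual byte values, and `unsigned(-1)` marks the end of the buffer) can be sketched with a stand-alone decoder. This is a simplified illustration under stated assumptions, not Xapian's implementation: `decode_one` and `decode_all` are hypothetical helpers, and the exact set of sequences Xapian treats as invalid may differ from the checks used here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of Xapian: decode one value at the front of
// [p, end) and report the number of bytes consumed through `used`.
unsigned decode_one(const unsigned char* p, const unsigned char* end,
                    std::size_t& used) {
    assert(p <= end);
    if (p == end) {            // end of buffer
        used = 0;
        return unsigned(-1);
    }
    const unsigned char b0 = p[0];
    used = 1;                  // every fallback path consumes a single byte
    if (b0 < 0x80) return b0;  // ASCII

    std::size_t need = 0;      // continuation bytes expected after b0
    unsigned value = 0;
    if (b0 >= 0xC2 && b0 <= 0xDF)      { need = 1; value = b0 & 0x1F; }
    else if (b0 >= 0xE0 && b0 <= 0xEF) { need = 2; value = b0 & 0x0F; }
    else if (b0 >= 0xF0 && b0 <= 0xF4) { need = 3; value = b0 & 0x07; }
    else return b0;            // stray continuation or invalid lead byte

    if (std::size_t(end - p) <= need) return b0;  // truncated sequence
    for (std::size_t i = 1; i <= need; ++i) {
        if ((p[i] & 0xC0) != 0x80) return b0;     // not a continuation byte
        value = (value << 6) | (p[i] & 0x3F);
    }
    // Reject overlong encodings and values beyond U+10FFFF.
    if (need == 2 && value < 0x800) return b0;
    if (need == 3 && (value < 0x10000 || value > 0x10FFFF)) return b0;

    used = need + 1;
    return value;
}

// Decode a whole string, collecting one value per step.
std::vector<unsigned> decode_all(const std::string& s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const unsigned char* end = p + s.size();
    std::vector<unsigned> out;
    while (p != end) {
        std::size_t used = 0;
        out.push_back(decode_one(p, end, used));
        p += used;
    }
    return out;
}
```

With this sketch, a lead byte followed by a non-continuation byte yields the lead byte's own value and then the value decoded from the following byte, so no input is skipped.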

◆ operator++() [1/2]

Utf8Iterator Xapian::Utf8Iterator::operator++ ( int  )
inline

Move forward to the next Unicode character.

Returns
An iterator pointing to the position before the move.

Definition at line 161 of file unicode.h.

◆ operator++() [2/2]

Utf8Iterator& Xapian::Utf8Iterator::operator++ ( )
inline

Move forward to the next Unicode character.

Returns
A reference to this object.

Definition at line 176 of file unicode.h.

◆ operator==()

bool Xapian::Utf8Iterator::operator== ( const Utf8Iterator & other ) const
inline

Test two Utf8Iterators for equality.

Parameters
other  The Utf8Iterator to compare this one with.
Returns
true iff the iterators point to the same position.

Definition at line 189 of file unicode.h.

References p.

◆ raw()

const char* Xapian::Utf8Iterator::raw ( ) const
inline

Return the raw const char* pointer for the current position.

Definition at line 54 of file unicode.h.

Referenced by Xapian::SnipPipe::drain(), and Xapian::QueryParser::Internal::parse_term().

◆ strict_deref()

unsigned Xapian::Utf8Iterator::strict_deref ( ) const
private

Get the current Unicode character value pointed to by the iterator.

If an invalid UTF-8 sequence is encountered, then the byte values comprising it are returned with the top bit set (so the caller can differentiate these from the same values arising from valid UTF-8) until valid UTF-8 or the end of the input is reached.

Returns unsigned(-1) if the iterator has reached the end of its buffer.

Definition at line 128 of file utf8itor.cc.
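
A caller of a function with this contract needs to tell three cases apart: end of input, a flagged invalid byte, and an ordinary code point. The sketch below shows one way to do that. It assumes a 32-bit `unsigned` with "top bit" meaning bit 31; `Kind`, `classify` and `raw_byte` are hypothetical names, and `strict_deref()` itself is private, so this only illustrates the encoding of the return value.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: unsigned is 32 bits wide and "top bit" means bit 31.
const std::uint32_t kInvalidFlag = 0x80000000u;
const std::uint32_t kEndOfInput  = 0xFFFFFFFFu;  // unsigned(-1) at 32 bits

enum class Kind { End, InvalidByte, CodePoint };

// Classify a value that follows the strict_deref() contract described above.
Kind classify(std::uint32_t v) {
    // Test for the end marker first: it also has the top bit set.
    if (v == kEndOfInput) return Kind::End;
    if (v & kInvalidFlag) return Kind::InvalidByte;
    return Kind::CodePoint;
}

// Recover the original byte from a flagged invalid value.
std::uint32_t raw_byte(std::uint32_t v) {
    assert(classify(v) == Kind::InvalidByte);
    return v & 0xFFu;
}
```

The flag is what separates an invalid byte 0xC3 from the code point U+00C3 produced by valid UTF-8, which `operator*()` alone cannot distinguish.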

Member Data Documentation

◆ end

const unsigned char* Xapian::Utf8Iterator::end
private

Definition at line 40 of file unicode.h.

◆ p

const unsigned char* Xapian::Utf8Iterator::p
private

Definition at line 39 of file unicode.h.

Referenced by operator!=(), and operator==().

◆ seqlen

unsigned Xapian::Utf8Iterator::seqlen
mutable private

Definition at line 41 of file unicode.h.


The documentation for this class was generated from the following files: