xapian-core 2.0.0
Abstract base class for a document.
#include <documentinternal.h>
Public Types | |
| enum | remove_posting_result { OK , NO_TERM , NO_POS } |
Public Member Functions | |
| Internal () | |
| Construct an empty document. | |
| virtual | ~Internal () |
| We have virtual methods and want to be able to delete objects of derived classes using a pointer to the base class, so we need a virtual destructor. | |
| bool | data_modified () const |
| Return true if the document data might have been modified. | |
| bool | terms_modified () const |
| Return true if the document's terms might have been modified. | |
| bool | values_modified () const |
| Return true if the document's values might have been modified. | |
| bool | modified () const |
| Return true if the document might have been modified in any way. | |
| bool | positions_modified () const |
| Return true if the document's term positions might have been modified. | |
| Xapian::docid | get_docid () const |
| Get the document ID this document came from. | |
| Xapian::doccount | get_index () const |
| Internal method used by MSet::diversify(). | |
| void | set_index (Xapian::doccount new_index) |
| Internal method used by MSet::diversify(). | |
| std::string | get_data () const |
| Get the document data. | |
| void | set_data (std::string_view data_) |
| Set the document data. | |
| void | add_term (std::string_view term, Xapian::termcount wdf_inc) |
| Add a term to this document. | |
| bool | remove_term (std::string_view term) |
| Remove a term from this document. | |
| void | add_posting (std::string_view term, Xapian::termpos term_pos, Xapian::termcount wdf_inc) |
| Add a posting for a term. | |
| remove_posting_result | remove_posting (std::string_view term, Xapian::termpos term_pos, Xapian::termcount wdf_dec) |
| Remove a posting for a term. | |
| remove_posting_result | remove_postings (std::string_view term, Xapian::termpos term_pos_first, Xapian::termpos term_pos_last, Xapian::termcount wdf_dec, Xapian::termpos &n_removed) |
| Remove a range of postings for a term. | |
| void | clear_terms () |
| Clear all terms from the document. | |
| Xapian::termcount | termlist_count () const |
| Return the number of distinct terms in this document. | |
| TermList * | open_term_list () const |
| Start iterating the terms in this document. | |
| std::string | get_value (Xapian::valueno slot) const |
| Read a value slot in this document. | |
| void | add_value (Xapian::valueno slot, std::string_view value) |
| Add a value to a slot in this document. | |
| void | clear_values () |
| Clear all value slots in this document. | |
| Xapian::valueno | values_count () const |
| Count the value slots used in this document. | |
| Xapian::ValueIterator | values_begin () const |
| std::string | get_description () const |
| Return a string describing this object. | |
Public Member Functions inherited from Xapian::Internal::intrusive_base | |
| intrusive_base () | |
| Construct with no references. | |
Protected Member Functions | |
| Internal (Xapian::Internal::intrusive_ptr< const Xapian::Database::Internal > database_, Xapian::docid did_) | |
| Constructor used by subclasses. | |
| Internal (const Xapian::Database::Internal *database_, Xapian::docid did_, std::string &&data_, std::map< Xapian::valueno, std::string > &&values_) | |
| Constructor used by RemoteDocument subclass. | |
| virtual std::string | fetch_data () const |
| Fetch the document data from the database. | |
| virtual void | fetch_all_values (std::map< Xapian::valueno, std::string > &values_) const |
| Fetch all set values from the database. | |
| virtual std::string | fetch_value (Xapian::valueno slot) const |
| Fetch a single value from the database. | |
Protected Attributes | |
| std::unique_ptr< std::map< Xapian::valueno, std::string > > | values |
| Document value slots and their contents. | |
| Xapian::Internal::intrusive_ptr< const Xapian::Database::Internal > | database |
| Database this document came from. | |
| Xapian::docid | did |
| The document ID this document came from in database. | |
Private Member Functions | |
Private Attributes | |
| std::unique_ptr< std::string > | data |
| The document data. | |
| std::unique_ptr< std::map< std::string, TermInfo, std::less<> > > | terms |
| Terms in the document and their associated metadata. | |
| Xapian::termcount | termlist_size |
| The number of distinct terms in terms. | |
| Xapian::doccount | index: 31 |
| An index value, unused by Document itself. | |
| bool | positions_modified_: 1 |
| Are there any changes to term positions in terms? | |
Friends | |
| class | ::DocumentTermList |
| class | ::DocumentValueList |
| class | ::GlassValueManager |
| class | ::HoneyValueManager |
| class | ::ValueStreamDocument |
Additional Inherited Members | |
Public Attributes inherited from Xapian::Internal::intrusive_base | |
| unsigned | _refs |
| Reference count. | |
Abstract base class for a document.
Definition at line 49 of file documentinternal.h.
| enum Xapian::Document::Internal::remove_posting_result |
| Enumerator | |
|---|---|
| OK | |
| NO_TERM | |
| NO_POS | |
Definition at line 322 of file documentinternal.h.
| Xapian::Document::Internal::Internal | ( | const Internal & | ) | private | delete |
Don't allow copying.
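| Xapian::Document::Internal::Internal | ( | Xapian::Internal::intrusive_ptr< const Xapian::Database::Internal > database_, Xapian::docid did_ | ) | inline | protected |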
Constructor used by subclasses.
Definition at line 158 of file documentinternal.h.
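| Xapian::Document::Internal::Internal | ( | const Xapian::Database::Internal * database_, Xapian::docid did_, std::string && data_, std::map< Xapian::valueno, std::string > && values_ | ) | inline | protected |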
Constructor used by RemoteDocument subclass.
Definition at line 163 of file documentinternal.h.
| Xapian::Document::Internal::Internal | ( | ) | inline |
Construct an empty document.
Definition at line 197 of file documentinternal.h.
| Xapian::Document::Internal::~Internal | ( | ) | virtual |
We have virtual methods and want to be able to delete objects of derived classes using a pointer to the base class, so we need a virtual destructor.
Definition at line 96 of file documentinternal.cc.
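| void Xapian::Document::Internal::add_posting | ( | std::string_view term, Xapian::termpos term_pos, Xapian::termcount wdf_inc | ) | inline |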
Add a posting for a term.
Definition at line 306 of file documentinternal.h.
References ensure_terms_fetched(), positions_modified_, term, termlist_size, and terms.
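| void Xapian::Document::Internal::add_term | ( | std::string_view term, Xapian::termcount wdf_inc | ) | inline |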
Add a term to this document.
Definition at line 274 of file documentinternal.h.
References ensure_terms_fetched(), term, termlist_size, and terms.
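| void Xapian::Document::Internal::add_value | ( | Xapian::valueno slot, std::string_view value | ) | inline |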
Add a value to a slot in this document.
Definition at line 428 of file documentinternal.h.
References ensure_values_fetched(), and values.
| void Xapian::Document::Internal::clear_terms | ( | ) | inline |
Clear all terms from the document.
Definition at line 376 of file documentinternal.h.
References database, Xapian::Database::Internal::has_positions(), positions_modified_, termlist_size, and terms.
| void Xapian::Document::Internal::clear_values | ( | ) | inline |
Clear all value slots in this document.
Definition at line 441 of file documentinternal.h.
| bool Xapian::Document::Internal::data_modified | ( | ) | const | inline |
Return true if the document data might have been modified.
If the document is from a database, this means modifications compared to the version read, otherwise it means modifications compared to an empty document.
Definition at line 210 of file documentinternal.h.
References data.
Referenced by modified().
| void Xapian::Document::Internal::ensure_terms_fetched | ( | ) | const | private |
Ensure terms have been fetched from database.
After this call, terms will be non-NULL. If database is NULL and terms was NULL, terms will be initialised to an empty map.
Definition at line 39 of file documentinternal.cc.
References Xapian::PositionIterator::Internal::get_position(), Xapian::PositionIterator::Internal::next(), p, and term.
Referenced by add_posting(), add_term(), remove_posting(), remove_postings(), and remove_term().
| void Xapian::Document::Internal::ensure_values_fetched | ( | ) | const | private |
Ensure values have been fetched from database.
After this call, values will be non-NULL. If database is NULL and values was NULL, values will be initialised to an empty map.
Definition at line 66 of file documentinternal.cc.
Referenced by add_value(), and values_count().
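The lazy-initialisation pattern described here can be illustrated with a minimal standalone sketch. This is not Xapian's implementation: `ToyLazyValues`, `has_database` and `fetch_all()` are hypothetical stand-ins, and the fake backend simply returns one stored value.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a document's value storage; not Xapian code.
class ToyLazyValues {
    // Null until the values are first needed (fetched or set).
    mutable std::unique_ptr<std::map<unsigned, std::string>> values;
    bool has_database;

    // Stand-in for reading all stored values from a backend.
    void fetch_all(std::map<unsigned, std::string>& out) const {
        out.clear();
        if (has_database) out[0] = "stored";
    }

    // After this call, `values` is guaranteed to be non-null.
    void ensure_values_fetched() const {
        if (values) return;
        values = std::make_unique<std::map<unsigned, std::string>>();
        fetch_all(*values);
    }

  public:
    explicit ToyLazyValues(bool has_database_) : has_database(has_database_) {}

    void add_value(unsigned slot, const std::string& value) {
        ensure_values_fetched();
        (*values)[slot] = value;
    }

    std::size_t values_count() const {
        ensure_values_fetched();
        return values->size();
    }
};
```

Making the pointer `mutable` lets const accessors such as `values_count()` trigger the fetch on first use.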
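| void Xapian::Document::Internal::fetch_all_values | ( | std::map< Xapian::valueno, std::string > & values_ | ) | const | protected | virtual |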
Fetch all set values from the database.
The default implementation (used when there's no associated database) clears values_.
Reimplemented in ValueStreamDocument, RemoteDocument, InMemoryDocument, HoneyDocument, and GlassDocument.
Definition at line 84 of file documentinternal.cc.
| std::string Xapian::Document::Internal::fetch_data | ( | ) | const | protected | virtual |
Fetch the document data from the database.
The default implementation (used when there's no associated database) returns an empty string.
Reimplemented in ValueStreamDocument, RemoteDocument, InMemoryDocument, HoneyDocument, and GlassDocument.
Definition at line 78 of file documentinternal.cc.
Referenced by get_data().
| std::string Xapian::Document::Internal::fetch_value | ( | Xapian::valueno slot | ) | const | protected | virtual |
Fetch a single value from the database.
The default implementation (used when there's no associated database) returns an empty string.
Reimplemented in ValueStreamDocument, RemoteDocument, InMemoryDocument, HoneyDocument, and GlassDocument.
Definition at line 91 of file documentinternal.cc.
Referenced by get_value().
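The relationship between a public getter and an overridable fetch hook can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: `ToyDocument`, `ToyBackendDocument` and `set_local_values()` are hypothetical names, and the rule that locally held values take priority over the backend is an assumption of the sketch.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class; not Xapian code.
class ToyDocument {
    // Null means no values are held locally.
    std::unique_ptr<std::map<unsigned, std::string>> values;

  protected:
    // Default used when there is no backend: every slot is empty.
    virtual std::string fetch_value(unsigned /*slot*/) const {
        return std::string();
    }

  public:
    virtual ~ToyDocument() = default;

    void set_local_values(std::map<unsigned, std::string> v) {
        values = std::make_unique<std::map<unsigned, std::string>>(std::move(v));
    }

    // Local values win (an assumption); otherwise ask the fetch hook.
    std::string get_value(unsigned slot) const {
        if (values) {
            auto it = values->find(slot);
            return it == values->end() ? std::string() : it->second;
        }
        return fetch_value(slot);
    }
};

// A subclass supplies backend-specific fetching by overriding the hook.
class ToyBackendDocument : public ToyDocument {
  protected:
    std::string fetch_value(unsigned slot) const override {
        return "stored-" + std::to_string(slot);
    }
};
```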
| std::string Xapian::Document::Internal::get_data | ( | ) | const | inline |
Get the document data.
Definition at line 262 of file documentinternal.h.
References data, and fetch_data().
| string Xapian::Document::Internal::get_description | ( | ) | const |
Return a string describing this object.
Definition at line 129 of file documentinternal.cc.
References description_append(), and Xapian::Internal::str().
| Xapian::docid Xapian::Document::Internal::get_docid | ( | ) | const | inline |
Get the document ID this document came from.
If this document didn't come from a database, this will be 0.
Note that this is the docid in the sub-database when multiple databases are being searched.
Definition at line 253 of file documentinternal.h.
References did.
| Xapian::doccount Xapian::Document::Internal::get_index | ( | ) | const | inline |
Internal method used by MSet::diversify().
Definition at line 256 of file documentinternal.h.
References index.
| std::string Xapian::Document::Internal::get_value | ( | Xapian::valueno slot | ) | const | inline |
Read a value slot in this document.
Definition at line 416 of file documentinternal.h.
References fetch_value(), and values.
Referenced by Collapser::check().
| bool Xapian::Document::Internal::modified | ( | ) | const | inline |
Return true if the document might have been modified in any way.
If the document is from a database, this means modifications compared to the version read, otherwise it means modifications compared to an empty document.
Definition at line 234 of file documentinternal.h.
References data_modified(), terms_modified(), and values_modified().
| TermList * Xapian::Document::Internal::open_term_list | ( | ) | const |
Start iterating the terms in this document.
Definition at line 103 of file documentinternal.cc.
Referenced by Xapian::Document::termlist_begin().
| void Xapian::Document::Internal::operator= | ( | const Internal & | ) | private | delete |
Don't allow assignment.
| bool Xapian::Document::Internal::positions_modified | ( | ) | const | inline |
Return true if the document's term positions might have been modified.
If the document is from a database, this means modifications compared to the version read, otherwise it means modifications compared to an empty document.
Definition at line 244 of file documentinternal.h.
References positions_modified_.
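| remove_posting_result Xapian::Document::Internal::remove_posting | ( | std::string_view term, Xapian::termpos term_pos, Xapian::termcount wdf_dec | ) | inline |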
Remove a posting for a term.
Definition at line 326 of file documentinternal.h.
References ensure_terms_fetched(), positions_modified_, term, termlist_size, and terms.
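| remove_posting_result Xapian::Document::Internal::remove_postings | ( | std::string_view term, Xapian::termpos term_pos_first, Xapian::termpos term_pos_last, Xapian::termcount wdf_dec, Xapian::termpos & n_removed | ) | inline |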
Remove a range of postings for a term.
Can only return OK or NO_TERM.
Definition at line 349 of file documentinternal.h.
References ensure_terms_fetched(), mul_overflows(), positions_modified_, term, termlist_size, and terms.
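A rough standalone sketch of how the three result codes could arise, and why the range form reports only OK or NO_TERM. The types `ToyTermInfo` and `ToyTerms` and the helper `decrease_wdf()` are hypothetical. Saturating the wdf at zero, and scaling the decrement by the number of removed positions, are assumptions of this sketch rather than documented behaviour.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

enum remove_posting_result { OK, NO_TERM, NO_POS };

// Hypothetical per-term record: a wdf counter plus a sorted position set.
struct ToyTermInfo {
    unsigned wdf;
    std::set<unsigned> positions;
};

using ToyTerms = std::map<std::string, ToyTermInfo>;

// Subtract `dec` from `wdf`, stopping at zero (an assumption of this sketch).
inline void decrease_wdf(unsigned& wdf, unsigned long long dec) {
    wdf = (wdf > dec) ? static_cast<unsigned>(wdf - dec) : 0;
}

// Remove a single position, reporting why nothing was removed if it fails.
inline remove_posting_result
remove_posting(ToyTerms& terms, const std::string& term,
               unsigned pos, unsigned wdf_dec) {
    auto it = terms.find(term);
    if (it == terms.end()) return NO_TERM;
    if (it->second.positions.erase(pos) == 0) return NO_POS;
    decrease_wdf(it->second.wdf, wdf_dec);
    return OK;
}

// Remove every position in [first, last]; an empty match is not an error,
// so NO_POS is never returned and the count goes in `n_removed` instead.
inline remove_posting_result
remove_postings(ToyTerms& terms, const std::string& term,
                unsigned first, unsigned last, unsigned wdf_dec,
                unsigned& n_removed) {
    n_removed = 0;
    auto it = terms.find(term);
    if (it == terms.end()) return NO_TERM;
    if (first > last) return OK;  // Empty range: nothing to remove.
    auto& positions = it->second.positions;
    auto lo = positions.lower_bound(first);
    auto hi = positions.upper_bound(last);
    for (auto p = lo; p != hi; ++p) ++n_removed;
    positions.erase(lo, hi);
    // Use a 64-bit product so wdf_dec * n_removed cannot overflow.
    decrease_wdf(it->second.wdf,
                 static_cast<unsigned long long>(wdf_dec) * n_removed);
    return OK;
}
```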
| bool Xapian::Document::Internal::remove_term | ( | std::string_view term | ) | inline |
Remove a term from this document.
Definition at line 288 of file documentinternal.h.
References ensure_terms_fetched(), positions_modified_, term, termlist_size, and terms.
Referenced by Xapian::Document::remove_term().
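| void Xapian::Document::Internal::set_data | ( | std::string_view data_ | ) | inline |
Set the document data.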
| void Xapian::Document::Internal::set_index | ( | Xapian::doccount new_index | ) | inline |
Internal method used by MSet::diversify().
Definition at line 259 of file documentinternal.h.
References index.
| Xapian::termcount Xapian::Document::Internal::termlist_count | ( | ) | const | inline |
Return the number of distinct terms in this document.
Definition at line 393 of file documentinternal.h.
References database, did, Xapian::Database::Internal::open_term_list(), termlist_size, and terms.
| bool Xapian::Document::Internal::terms_modified | ( | ) | const | inline |
Return true if the document's terms might have been modified.
If the document is from a database, this means modifications compared to the version read, otherwise it means modifications compared to an empty document.
Definition at line 218 of file documentinternal.h.
References terms.
Referenced by modified().
| Xapian::ValueIterator Xapian::Document::Internal::values_begin | ( | ) | const |
Definition at line 115 of file documentinternal.cc.
| Xapian::valueno Xapian::Document::Internal::values_count | ( | ) | const | inline |
Count the value slots used in this document.
Definition at line 455 of file documentinternal.h.
References ensure_values_fetched(), and values.
| bool Xapian::Document::Internal::values_modified | ( | ) | const | inline |
Return true if the document's values might have been modified.
If the document is from a database, this means modifications compared to the version read, otherwise it means modifications compared to an empty document.
Definition at line 226 of file documentinternal.h.
References values.
Referenced by modified().
| friend class ::DocumentTermList | friend |
Definition at line 50 of file documentinternal.h.
| friend class ::DocumentValueList | friend |
Definition at line 51 of file documentinternal.h.
| friend class ::GlassValueManager | friend |
Definition at line 53 of file documentinternal.h.
| friend class ::HoneyValueManager | friend |
Definition at line 54 of file documentinternal.h.
| friend class ::ValueStreamDocument | friend |
Definition at line 55 of file documentinternal.h.
| std::unique_ptr< std::string > Xapian::Document::Internal::data | private |
The document data.
If NULL, this hasn't been fetched or set yet.
Definition at line 67 of file documentinternal.h.
Referenced by data_modified(), get_data(), and set_data().
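| Xapian::Internal::intrusive_ptr< const Xapian::Database::Internal > Xapian::Document::Internal::database | protected |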
Database this document came from.
If this document didn't come from a database, this will be NULL.
Definition at line 146 of file documentinternal.h.
Referenced by clear_terms(), clear_values(), and termlist_count().
| Xapian::docid Xapian::Document::Internal::did | protected |
The document ID this document came from in database.
If this document didn't come from a database, this will be 0.
Note that this is the docid in the sub-database when multiple databases are being searched.
Definition at line 155 of file documentinternal.h.
Referenced by get_docid(), ValueStreamDocument::set_shard_document(), and termlist_count().
| Xapian::doccount Xapian::Document::Internal::index: 31 | private |
An index value, unused by Document itself.
This is used by the diversification code.
It is in a bit field with a bool flag so that it doesn't incur any additional space cost for cases where it isn't used.
The bool flag is stored in the top bit, which is likely to be very cheap to check (since it's the sign bit for a signed integer value).
We initialise this in the constructors to stop Valgrind warning that positions_modified_ is used uninitialised. Valgrind is meant to track undefined-ness at the bit level, so this shouldn't be needed. FIXME: Investigate!
Definition at line 103 of file documentinternal.h.
Referenced by get_index(), and set_index().
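The packing described above can be sketched with a small standalone struct. `ToyPacked` is a hypothetical name, and `unsigned` stands in for `Xapian::doccount`. Bit-field layout is implementation-defined, so the sketch only relies on the two fields being independent, not on the total size or on which bit holds the flag.

```cpp
#include <cassert>

// Hypothetical illustration of a 31-bit index sharing storage with a flag.
struct ToyPacked {
    unsigned index : 31;          // Room for values up to 2^31 - 1.
    bool positions_modified : 1;  // Single-bit flag alongside the index.

    // Initialise both fields explicitly, as the text above describes.
    ToyPacked() : index(0), positions_modified(false) {}
};
```

On common compilers the two fields share one 32-bit word, which is why the index adds no space when it is unused.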
| bool Xapian::Document::Internal::positions_modified_: 1 | mutable | private |
Are there any changes to term positions in terms?
If a document is read from a database, modified and then replaced at the same docid, then we can save a lot of work if we know when there are no changes to term positions, even if there are changes to terms (a common example is adding filter terms to an existing document).
It's OK for this to be true when there aren't any modifications (it just means that the backend can't shortcut as directly).
Definition at line 115 of file documentinternal.h.
Referenced by add_posting(), clear_terms(), positions_modified(), remove_posting(), remove_postings(), and remove_term().
| Xapian::termcount Xapian::Document::Internal::termlist_size | mutable | private |
The number of distinct terms in terms.
Only valid when terms is non-NULL.
This may be less than terms.size() if any terms have been deleted.
Definition at line 86 of file documentinternal.h.
Referenced by add_posting(), add_term(), clear_terms(), remove_posting(), remove_postings(), remove_term(), and termlist_count().
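One way a live-term counter can fall below the map's size is if removed terms are marked as deleted rather than erased. The sketch below assumes that scheme; `ToyTerm`, `ToyTermMap`, the `deleted` flag and `raw_size()` are hypothetical and not taken from the Xapian sources. It needs C++17 for `std::string_view`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <string_view>

// Hypothetical per-term record with a tombstone flag.
struct ToyTerm {
    unsigned wdf;
    bool deleted;
};

class ToyTermMap {
    // std::less<> enables lookup by string_view without building a string.
    std::map<std::string, ToyTerm, std::less<>> terms;
    unsigned termlist_size = 0;  // Number of entries not marked deleted.

  public:
    void add_term(std::string_view term, unsigned wdf_inc) {
        auto it = terms.find(term);
        if (it == terms.end()) {
            terms.emplace(std::string(term), ToyTerm{wdf_inc, false});
            ++termlist_size;
        } else if (it->second.deleted) {
            it->second = ToyTerm{wdf_inc, false};  // Revive the tombstone.
            ++termlist_size;
        } else {
            it->second.wdf += wdf_inc;
        }
    }

    bool remove_term(std::string_view term) {
        auto it = terms.find(term);
        if (it == terms.end() || it->second.deleted) return false;
        it->second.deleted = true;  // Entry stays in the map.
        --termlist_size;
        return true;
    }

    unsigned termlist_count() const { return termlist_size; }
    std::size_t raw_size() const { return terms.size(); }
};
```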
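| std::unique_ptr< std::map< std::string, TermInfo, std::less<> > > Xapian::Document::Internal::terms | mutable | private |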
Terms in the document and their associated metadata.
If NULL, the terms haven't been fetched or set yet.
We use std::map<> rather than std::unordered_map<> because the latter invalidates existing iterators upon insert() if rehashing occurs, whereas existing iterators remain valid for std::map<>.
Definition at line 78 of file documentinternal.h.
Referenced by add_posting(), add_term(), clear_terms(), remove_posting(), remove_postings(), remove_term(), termlist_count(), and terms_modified().
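| std::unique_ptr< std::map< Xapian::valueno, std::string > > Xapian::Document::Internal::values | mutable | protected |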
Document value slots and their contents.
If NULL, the values haven't been fetched or set yet.
We use std::map<> rather than std::unordered_map<> because the latter invalidates existing iterators upon insert() if rehashing occurs, whereas existing iterators remain valid for std::map<>.
Definition at line 140 of file documentinternal.h.
Referenced by add_value(), clear_values(), get_value(), values_count(), and values_modified().
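The iterator-stability property behind the choice of std::map<> can be demonstrated with a short standalone check. The function name `map_iterator_survives_inserts()` is made up for this illustration. The same test is not safe with std::unordered_map<>, where a rehash during insertion invalidates existing iterators.

```cpp
#include <cassert>
#include <map>
#include <string>

// Returns true if an iterator obtained before many insertions still
// refers to the same element afterwards (guaranteed for std::map).
inline bool map_iterator_survives_inserts() {
    std::map<unsigned, std::string> values;
    values[7] = "seven";
    auto it = values.find(7);
    for (unsigned slot = 100; slot < 10000; ++slot) {
        values[slot] = "x";  // Inserting never invalidates `it`.
    }
    return it->first == 7 && it->second == "seven";
}
```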