xapian-core 2.0.0
Class representing a document.
#include <document.h>
Classes | |
| class | Internal |
| Abstract base class for a document. | |
Public Member Functions | |
| Document (const Document &o) | |
| Copy constructor. | |
| Document & | operator= (const Document &o) |
| Assignment operator. | |
| Document (Document &&o) | |
| Move constructor. | |
| Document & | operator= (Document &&o) |
| Move assignment operator. | |
| Document () | |
| Default constructor. | |
| ~Document () | |
| Destructor. | |
| Xapian::docid | get_docid () const |
| Get the document ID this document came from. | |
| std::string | get_data () const |
| Get the document data. | |
| void | set_data (std::string_view data) |
| Set the document data. | |
| void | add_term (std::string_view term, Xapian::termcount wdf_inc=1) |
| Add a term to this document. | |
| void | add_boolean_term (std::string_view term) |
| Add a boolean filter term to the document. | |
| void | remove_term (std::string_view term) |
| Remove a term from this document. | |
| void | add_posting (std::string_view term, Xapian::termpos term_pos, Xapian::termcount wdf_inc=1) |
| Add a posting for a term. | |
| void | remove_posting (std::string_view term, Xapian::termpos term_pos, Xapian::termcount wdf_dec=1) |
| Remove posting for a term. | |
| Xapian::termpos | remove_postings (std::string_view term, Xapian::termpos term_pos_first, Xapian::termpos term_pos_last, Xapian::termcount wdf_dec=1) |
| Remove a range of postings for a term. | |
| void | clear_terms () |
| Clear all terms from the document. | |
| Xapian::termcount | termlist_count () const |
| Return the number of distinct terms in this document. | |
| TermIterator | termlist_begin () const |
| Start iterating the terms in this document. | |
| TermIterator | termlist_end () const noexcept |
| End iterator corresponding to termlist_begin(). | |
| std::string | get_value (Xapian::valueno slot) const |
| Read a value slot in this document. | |
| void | add_value (Xapian::valueno slot, std::string_view value) |
| Add a value to a slot in this document. | |
| void | remove_value (Xapian::valueno slot) |
| Remove any value from the specified slot. | |
| void | clear_values () |
| Clear all value slots in this document. | |
| Xapian::valueno | values_count () const |
| Count the value slots used in this document. | |
| ValueIterator | values_begin () const |
| Start iterating the values in this document. | |
| ValueIterator | values_end () const noexcept |
| End iterator corresponding to values_begin(). | |
| void | swap (Document &o) |
| Efficiently swap this Document object with another. | |
| std::string | serialise () const |
| Serialise document into a string. | |
| std::string | get_description () const |
| Return a string describing this object. | |
Static Public Member Functions | |
| static Document | unserialise (std::string_view serialised) |
| Unserialise a document from a string produced by serialise(). | |
Private Member Functions | |
| Document (Internal *) | |
Private Attributes | |
| Xapian::Internal::intrusive_ptr_nonnull< Internal > | internal |
Class representing a document.
The term "document" shouldn't be taken too literally - really it's a "thing to retrieve", as the list of search results is essentially a list of documents.
Document objects fetch information from the database lazily. Usually this behaviour isn't visible to users (except for the speed benefits), but if the document in the database is modified or deleted then preexisting Document objects may return the old or new versions of data (or throw Xapian::DocNotFoundError in the case of deletion).
Since Database objects work on a snapshot of the database's state, the situation above can only happen with a WritableDatabase object, or if you call Database::reopen() on the Database object which you got the Document from.
We recommend you avoid designs where this behaviour is an issue, but if you need a way to make a non-lazy version of a Document object, you can do this like so:
doc = Xapian::Document::unserialise(doc.serialise());
Definition at line 64 of file document.h.
| Xapian::Document::Document | ( | Internal * | ) | explicit private |
Wrap an existing Internal.
Definition at line 45 of file document.cc.
| Xapian::Document::Document | ( | const Document & | o | ) | default |
Copy constructor.
The internals are reference counted, so copying is cheap.
| Xapian::Document::Document | ( | Document && | o | ) | default |
Move constructor.
| Xapian::Document::Document | ( | ) |
Default constructor.
| Xapian::Document::~Document | ( | ) |
Destructor.
Definition at line 64 of file document.cc.
| void Xapian::Document::add_boolean_term | ( | std::string_view | term | ) | inline |
Add a boolean filter term to the document.
This method adds term to the document with a wdf (within-document frequency) of 0. This is generally what you want for a term used for boolean filtering, because the wdf of such terms is ignored and it doesn't make sense for them to contribute to the document's length.
If the specified term already indexes this document, this method has no effect.
It is exactly the same as add_term(term, 0) and is provided as a way to make a common operation more explicit.
| term | The term to add. |
Definition at line 145 of file document.h.
References term.
Referenced by DEFINE_TESTCASE(), gen_uniqterms_gt_doclen_db(), and make_topercent7_db().
| void Xapian::Document::add_posting | ( | std::string_view | term, |
| Xapian::termpos | term_pos, | ||
| Xapian::termcount | wdf_inc = 1 |
||
| ) |
Add a posting for a term.
Definition at line 111 of file document.cc.
References term, and throw_invalid_arg_empty_term().
Referenced by DEFINE_TESTCASE(), gen_longpositionlist1_db(), gen_uniqterms_gt_doclen_db(), FileIndexer::index_to(), make_phrasebug1_db(), and unserialise_document().
| void Xapian::Document::add_term | ( | std::string_view | term, |
| Xapian::termcount | wdf_inc = 1 |
||
| ) |
Add a term to this document.
Definition at line 87 of file document.cc.
References term, and throw_invalid_arg_empty_term().
Referenced by basic_doc(), builddb_queries1(), builddb_valuestest1(), DEFINE_TESTCASE(), gen_consistency2_db(), gen_decvalwtsource3_db(), gen_lazytablebug1_db(), gen_multicharwildcard1_db(), gen_qp_flag_partial1_db(), gen_singlecharwildcard1_db(), gen_subdbwithoutpos1_db(), gen_uniqterms_gt_doclen_db(), gen_wdf_eq_doclen_db(), make_all_tables(), make_all_tables2(), make_matchspy2_db(), make_missing_tables(), make_msize1_db(), make_msize2_db(), make_multichunk_db(), make_orcheck_db(), make_ordecay_db(), make_sparse_db(), make_topercent7_db(), make_xordecay1_db(), and unserialise_document().
| void Xapian::Document::add_value | ( | Xapian::valueno | slot, |
| std::string_view | value | ||
| ) |
Add a value to a slot in this document.
| slot | The slot to set |
| value | The new value |
Definition at line 191 of file document.cc.
Referenced by bigoaddvalue1_helper(), builddb_coords1(), builddb_valuestest1(), DEFINE_TESTCASE(), gen_consistency2_db(), gen_decvalwtsource3_db(), gen_decvalwtsource5_db(), gen_qp_range3_db(), gen_valuestats6_db(), gen_valueweightsource5_db(), FileIndexer::index_to(), make_matchspy2_db(), make_msize1_db(), make_msize2_db(), make_orcheck_db(), make_singularvalue_db(), make_valprefixbounds_db(), make_valuerange5(), and unserialise_document().
| void Xapian::Document::clear_terms | ( | ) |
Clear all terms from the document.
Definition at line 168 of file document.cc.
Referenced by DEFINE_TESTCASE().
| void Xapian::Document::clear_values | ( | ) |
Clear all value slots in this document.
Definition at line 197 of file document.cc.
Referenced by DEFINE_TESTCASE().
| string Xapian::Document::get_data | ( | ) | const |
Get the document data.
Definition at line 75 of file document.cc.
Referenced by InMemoryDatabase::add_document(), GlassWritableDatabase::add_document_(), DEFINE_TESTCASE(), main(), RemoteServer::msg_document(), GrepMatchDecider::operator()(), SimpleMatchSpy::operator()(), closedb1_iterators::perform(), remotefailure1_iterators::perform(), GlassWritableDatabase::replace_document(), InMemoryDatabase::replace_document(), serialise_document(), and show_docdata().
| string Xapian::Document::get_description | ( | ) | const |
Return a string describing this object.
Definition at line 226 of file document.cc.
Referenced by DEFINE_TESTCASE().
| Xapian::docid Xapian::Document::get_docid | ( | ) | const |
Get the document ID this document came from.
If this document didn't come from a database, this will be 0 (in Xapian 1.0.22/1.2.4 or later; prior to this the returned value was uninitialised in this case).
Note that if the document came from a sharded database, this is the docid in the shard it came from, not the docid in the combined database.
Definition at line 69 of file document.cc.
Referenced by DEFINE_TESTCASE(), HoneyValueManager::replace_document(), and GlassValueManager::replace_document().
| string Xapian::Document::get_value | ( | Xapian::valueno | slot | ) | const |
Read a value slot in this document.
| slot | The slot to read the value from |
Definition at line 185 of file document.cc.
Referenced by check_vals(), DEFINE_TESTCASE(), Xapian::LatLongDistanceKeyMaker::operator()(), Xapian::MultiValueKeyMaker::operator()(), Xapian::ValueSetMatchDecider::operator()(), Xapian::ValueCountMatchSpy::operator()(), closedb1_iterators::perform(), remotefailure1_iterators::perform(), and show_value().
| Document & Xapian::Document::operator= | ( | const Document & | o | ) |
Assignment operator.
The internals are reference counted, so assignment is cheap.
| void Xapian::Document::remove_posting | ( | std::string_view | term, |
| Xapian::termpos | term_pos, | ||
| Xapian::termcount | wdf_dec = 1 |
||
| ) |
Remove posting for a term.
The instance of the specified term at position term_pos will be removed, and the wdf reduced by wdf_dec. The wdf never goes below zero: if the reduction would take it below zero, the resulting wdf is clamped to zero.
If the term doesn't occur at position term_pos then Xapian::InvalidArgumentError is thrown. To remove a single position which may not be present without triggering an exception, call remove_postings(term, pos, pos) instead.
Since 2.0.0, if the final position is removed and the wdf becomes zero then the term will be removed from the document.
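The clamping rule can be written out as a small piece of arithmetic. This is a standalone model, not Xapian's implementation: `termcount` here is a local stand-in for Xapian::termcount, assumed to be an unsigned integer type, and `clamped_wdf` is a hypothetical name.

```cpp
#include <cassert>

// Local stand-in for Xapian::termcount (assumed to be unsigned).
using termcount = unsigned;

// Model of the documented rule: reduce wdf by wdf_dec, clamping at zero.
// A plain `wdf - wdf_dec` on unsigned values would wrap around instead.
constexpr termcount clamped_wdf(termcount wdf, termcount wdf_dec = 1) {
    return wdf_dec >= wdf ? 0 : wdf - wdf_dec;
}

// Example usage of the model.
inline void clamped_wdf_example() {
    assert(clamped_wdf(3) == 2);     // ordinary decrement
    assert(clamped_wdf(1, 5) == 0);  // would go negative: clamped to zero
}
```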
Definition at line 122 of file document.cc.
References Xapian::Document::Internal::NO_TERM, Xapian::Document::Internal::OK, Xapian::Internal::str(), term, and throw_invalid_arg_empty_term().
Referenced by DEFINE_TESTCASE().
| Xapian::termpos Xapian::Document::remove_postings | ( | std::string_view | term, |
| Xapian::termpos | term_pos_first, | ||
| Xapian::termpos | term_pos_last, | ||
| Xapian::termcount | wdf_dec = 1 |
||
| ) |
Remove a range of postings for a term.
Any instances of the term at positions >= term_pos_first and <= term_pos_last will be removed, and the wdf reduced by wdf_dec for each instance removed. The wdf never goes below zero: if the reduction would take it below zero, the resulting wdf is clamped to zero.
If the term doesn't occur in the range of positions specified (including if term_pos_first > term_pos_last) then this method does nothing (unlike remove_posting() which throws an exception if the specified position is not present).
Since 2.0.0, if all remaining positions are removed and the wdf becomes zero then the term will be removed from the document. This only happens if the call actually removes some positions: calling this method on a term which has no positions and zero wdf won't remove that term.
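The rules for range removal can be modelled with a toy data structure. This is a standalone sketch of the documented behaviour, not Xapian's internals: `TermModel` and `remove_range` are hypothetical names, and returning the number of positions removed is an assumption of the sketch.

```cpp
#include <cassert>
#include <iterator>
#include <set>

// Toy stand-in for one term's entry in a document (not Xapian's internals).
struct TermModel {
    std::set<unsigned> positions;  // positions at which the term occurs
    unsigned wdf = 0;              // within-document frequency
    bool present = true;           // whether the term still indexes the document
};

// Model of the documented rules. Returning the count of removed positions
// is an assumption of this sketch.
inline unsigned remove_range(TermModel& t, unsigned first, unsigned last,
                             unsigned wdf_dec = 1) {
    if (first > last) return 0;                // empty range: do nothing
    auto lo = t.positions.lower_bound(first);  // first position >= first
    auto hi = t.positions.upper_bound(last);   // first position > last
    unsigned removed = static_cast<unsigned>(std::distance(lo, hi));
    t.positions.erase(lo, hi);
    // Reduce wdf by wdf_dec per removed position, clamping at zero.
    unsigned long long dec = 1ULL * removed * wdf_dec;
    t.wdf = dec >= t.wdf ? 0 : t.wdf - static_cast<unsigned>(dec);
    // 2.0.0 rule: the term is dropped only if this call removed something.
    if (removed > 0 && t.positions.empty() && t.wdf == 0) t.present = false;
    return removed;
}

// Example usage: a term with no positions and zero wdf is left in place.
inline void remove_range_example() {
    TermModel empty_term;
    assert(remove_range(empty_term, 0, 10) == 0);
    assert(empty_term.present);
}
```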
Definition at line 144 of file document.cc.
References Xapian::Document::Internal::OK, rare, term, and throw_invalid_arg_empty_term().
Referenced by DEFINE_TESTCASE().
| void Xapian::Document::remove_term | ( | std::string_view | term | ) |
Remove a term from this document.
Definition at line 96 of file document.cc.
References internal, Xapian::Document::Internal::remove_term(), term, and throw_invalid_arg_empty_term().
Referenced by DEFINE_TESTCASE().
| void Xapian::Document::remove_value | ( | Xapian::valueno | slot | ) | inline |
Remove any value from the specified slot.
| slot | The slot to remove any value from. |
Definition at line 242 of file document.h.
Referenced by DEFINE_TESTCASE().
| string Xapian::Document::serialise | ( | ) | const |
Serialise document into a string.
The document representation may change between Xapian releases, even between minor versions. However, it is guaranteed not to change if the remote database protocol has not changed between releases.
Definition at line 214 of file document.cc.
References serialise_document().
Referenced by DEFINE_TESTCASE().
| void Xapian::Document::set_data | ( | std::string_view | data | ) |
Set the document data.
This is an opaque blob as far as Xapian is concerned - it's up to you to impose whatever structure you want on it. If you want to store structured data, consider using something like protocol buffers.
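As one possible way to impose structure on the data blob, the sketch below packs string fields into "key=value" lines and unpacks them again. The encoding and the names `encode_fields`, `decode_fields`, and `fields_example` are purely illustrative and not part of Xapian; a real application might prefer an established serialisation format.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical encoding: one "key=value" pair per line. Keys must not contain
// '=' or '\n', and values must not contain '\n'; this sketch does no escaping.
inline std::string encode_fields(const std::map<std::string, std::string>& fields) {
    std::string out;
    for (const auto& kv : fields) {
        out += kv.first;
        out += '=';
        out += kv.second;
        out += '\n';
    }
    return out;
}

// Inverse of encode_fields(): split on newlines, then on the first '='.
inline std::map<std::string, std::string> decode_fields(const std::string& data) {
    std::map<std::string, std::string> fields;
    std::istringstream in(data);
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type eq = line.find('=');
        if (eq == std::string::npos) continue;  // skip malformed lines
        fields[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return fields;
}

// Example usage: a round trip preserves the fields.
inline void fields_example() {
    std::map<std::string, std::string> f;
    f["title"] = "Hello";
    assert(decode_fields(encode_fields(f)) == f);
}
```

The encoded string would be passed to set_data(), and the result of get_data() passed to `decode_fields()` when displaying a search result.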
Definition at line 81 of file document.cc.
Referenced by builddb_valuestest1(), DEFINE_TESTCASE(), gen_longpositionlist1_db(), FileIndexer::index_to(), main(), make_matchspy2_db(), make_sparse_db(), make_tg_db(), make_topercent7_db(), and unserialise_document().
| void Xapian::Document::swap | ( | Document & | o | ) | inline |
Efficiently swap this Document object with another.
Definition at line 267 of file document.h.
References internal.
| TermIterator Xapian::Document::termlist_begin | ( | ) | const |
Start iterating the terms in this document.
The terms are returned in ascending string order (by byte value).
Note that if the Document object came from a sharded database, the TermIterator returned by this method only knows about the shard the document came from, so calling get_termfreq() on it gives the term frequency in that shard rather than in the combined database.
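To make "ascending string order (by byte value)" concrete, here is a standalone sketch that sorts a few sample terms the same way. It does not call Xapian; `byte_order` is a hypothetical helper, and it relies on `std::string` comparison treating characters as unsigned bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Sort terms in ascending byte-value order. std::string's operator< compares
// characters as unsigned bytes, so the default std::sort gives this order.
inline std::vector<std::string> byte_order(std::vector<std::string> terms) {
    std::sort(terms.begin(), terms.end());
    return terms;
}

// Example usage: uppercase ASCII sorts before lowercase ASCII.
inline void byte_order_example() {
    std::vector<std::string> sorted = byte_order({"apple", "Zebra"});
    assert(sorted[0] == "Zebra");
    assert(sorted[1] == "apple");
}
```

One consequence is that terms starting with an uppercase ASCII letter are visited before lowercase ones, and terms starting with a non-ASCII UTF-8 byte come after both.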
Definition at line 179 of file document.cc.
References internal, and Xapian::Document::Internal::open_term_list().
Referenced by GlassWritableDatabase::add_document_(), dbcheck(), DEFINE_TESTCASE(), InMemoryDatabase::finish_add_doc(), format_doc_termlist(), closedb1_iterators::perform(), remotefailure1_iterators::perform(), GlassWritableDatabase::replace_document(), serialise_document(), GlassTermListTable::set_termlist(), and HoneyTermListTable::set_termlist().
| Xapian::termcount Xapian::Document::termlist_count | ( | ) | const |
Return the number of distinct terms in this document.
Definition at line 174 of file document.cc.
Referenced by dbcheck(), DEFINE_TESTCASE(), serialise_document(), GlassTermListTable::set_termlist(), and HoneyTermListTable::set_termlist().
| TermIterator Xapian::Document::termlist_end | ( | ) | const | inline noexcept |
End iterator corresponding to termlist_begin().
Definition at line 219 of file document.h.
Referenced by GlassWritableDatabase::add_document_(), dbcheck(), DEFINE_TESTCASE(), InMemoryDatabase::finish_add_doc(), format_doc_termlist(), GlassWritableDatabase::replace_document(), serialise_document(), GlassTermListTable::set_termlist(), and HoneyTermListTable::set_termlist().
| Document Xapian::Document::unserialise | ( | std::string_view | serialised | ) | static |
Unserialise a document from a string produced by serialise().
Definition at line 220 of file document.cc.
References unserialise_document().
Referenced by DEFINE_TESTCASE().
| ValueIterator Xapian::Document::values_begin | ( | ) | const |
Start iterating the values in this document.
The values are returned in ascending numerical slot order.
Definition at line 208 of file document.cc.
Referenced by HoneyValueManager::add_document(), GlassValueManager::add_document(), check_vals(), dbcheck(), DEFINE_TESTCASE(), InMemoryDatabase::finish_add_doc(), RemoteServer::msg_document(), serialise_document(), and show_values().
| Xapian::valueno Xapian::Document::values_count | ( | ) | const |
Count the value slots used in this document.
Definition at line 203 of file document.cc.
Referenced by check_vals(), dbcheck(), DEFINE_TESTCASE(), and serialise_document().
| ValueIterator Xapian::Document::values_end | ( | ) | const | inline noexcept |
End iterator corresponding to values_begin().
Definition at line 259 of file document.h.
Referenced by HoneyValueManager::add_document(), GlassValueManager::add_document(), check_vals(), dbcheck(), DEFINE_TESTCASE(), InMemoryDatabase::finish_add_doc(), RemoteServer::msg_document(), serialise_document(), and show_values().
| Xapian::Internal::intrusive_ptr_nonnull< Internal > Xapian::Document::internal | private |
Reference counted internals.
Definition at line 69 of file document.h.
Referenced by HoneyValueManager::add_document(), remove_term(), HoneyValueManager::replace_document(), GlassValueManager::replace_document(), GlassWritableDatabase::replace_document(), swap(), and termlist_begin().