xapian-core
1.4.26
A handle representing a document in a Xapian database.
#include <document.h>
Classes
class Internal
    A document in the database, possibly plus modifications.
Public Member Functions
Document(const Document &other)
    Copying is allowed.
void operator=(const Document &other)
    Assignment is allowed.
Document()
    Make a new empty Document.
~Document()
    Destructor.
std::string get_value(Xapian::valueno slot) const
    Get value by number.
void add_value(Xapian::valueno slot, const std::string &value)
    Add a new value.
void remove_value(Xapian::valueno slot)
    Remove any value with the given number.
void clear_values()
    Remove all values associated with the document.
std::string get_data() const
    Get data stored in the document.
void set_data(const std::string &data)
    Set data stored in the document.
void add_posting(const std::string &tname, Xapian::termpos tpos, Xapian::termcount wdfinc=1)
    Add an occurrence of a term at a particular position.
void add_term(const std::string &tname, Xapian::termcount wdfinc=1)
    Add a term to the document, without positional information.
void add_boolean_term(const std::string &term)
    Add a boolean filter term to the document.
void remove_posting(const std::string &tname, Xapian::termpos tpos, Xapian::termcount wdfdec=1)
    Remove a posting of a term from the document.
Xapian::termpos remove_postings(const std::string &term, Xapian::termpos term_pos_first, Xapian::termpos term_pos_last, Xapian::termcount wdf_dec=1)
    Remove a range of postings for a term.
void remove_term(const std::string &tname)
    Remove a term and all postings associated with it.
void clear_terms()
    Remove all terms (and postings) from the document.
Xapian::termcount termlist_count() const
    The length of the termlist, i.e. the number of different terms which index this document.
TermIterator termlist_begin() const
    Start iterating the terms in this document.
TermIterator termlist_end() const
    Equivalent end iterator for termlist_begin().
Xapian::termcount values_count() const
    Count the values in this document.
ValueIterator values_begin() const
    Iterator for the values in this document.
ValueIterator values_end() const
    Equivalent end iterator for values_begin().
docid get_docid() const
    Get the document id which is associated with this document (if any).
std::string serialise() const
    Serialise document into a string.
std::string get_description() const
    Return a string describing this object.
Static Public Member Functions
static Document unserialise(const std::string &serialised)
    Unserialise a document from a string produced by serialise().
Private Member Functions
Document(Internal *internal_)
Private Attributes
Xapian::Internal::intrusive_ptr<Internal> internal
A handle representing a document in a Xapian database.
The Document class fetches information from the database lazily. Usually this behaviour isn't visible to users (except for the speed benefits), but if the document in the database is modified or deleted, then preexisting Document objects may return the old or new versions of data (or throw Xapian::DocNotFoundError in the case of deletion).
Since Database objects work on a snapshot of the database's state, the situation above can only happen with a WritableDatabase object, or if you call Database::reopen() on a Database object.
We recommend you avoid designs where this behaviour is an issue, but if you need a way to make a non-lazy version of a Document object, you can do this like so:
doc = Xapian::Document::unserialise(doc.serialise());
Definition at line 61 of file document.h.
Xapian::Document::Document(Internal * internal_)   [explicit, private]
Constructor is only used by internal classes.
Parameters:
    internal_  Pointer to the internal opaque class.
Definition at line 50 of file omdocument.cc.
References Document(), and operator=().
Xapian::Document::Document(const Document & other)
Copying is allowed.
The internals are reference counted, so copying is cheap.
Parameters:
    other  The object to copy.
Definition at line 91 of file omdocument.cc.
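Why copying is cheap can be sketched with std::shared_ptr standing in for Xapian's intrusive_ptr. ToyHandle and ToyInternal are hypothetical names, not Xapian classes. The sketch only shows that a copy shares the internals and bumps a reference count. It makes no further claim about how Xapian::Document is implemented.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical model of a reference-counted handle class. Xapian itself uses
// Xapian::Internal::intrusive_ptr; std::shared_ptr stands in for it here.
struct ToyInternal {
    std::string data;  // stands in for the (possibly large) document state
};

class ToyHandle {
  public:
    ToyHandle() : internal(std::make_shared<ToyInternal>()) {}

    // Copying or assigning only copies a pointer and adjusts reference
    // counts, so the cost does not depend on the size of ToyInternal.
    ToyHandle(const ToyHandle&) = default;
    ToyHandle& operator=(const ToyHandle&) = default;

    long use_count() const { return internal.use_count(); }

    bool shares_internals_with(const ToyHandle& other) const {
        return internal == other.internal;
    }

  private:
    std::shared_ptr<ToyInternal> internal;
};
```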
Xapian::Document::Document()
Make a new empty Document.
Xapian::Document::~Document()
Destructor.
Definition at line 96 of file omdocument.cc.
void Xapian::Document::add_boolean_term(const std::string & term)   [inline]
Add a boolean filter term to the document.
This method adds term to the document with a wdf of 0. This is generally what you want for a term used for boolean filtering: the wdf of such terms is ignored, and it doesn't make sense for them to contribute to the document's length.
If the specified term already indexes this document, this method has no effect.
It is exactly the same as add_term(term, 0).
This method was added in Xapian 1.0.18.
Parameters:
    term  The term to add.
Definition at line 192 of file document.h.
Referenced by DEFINE_TESTCASE(), gen_uniqterms_gt_doclen_db(), and make_topercent7_db().
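The wdf bookkeeping described for add_boolean_term() can be sketched with a small stdlib-only model. ToyDoc is hypothetical and is not Xapian's implementation. It mirrors two documented points: add_boolean_term(term) is add_term(term, 0), and a term with wdf 0 adds nothing to the document length, modelled here as the sum of wdfs.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stdlib-only model of the wdf bookkeeping described above;
// it is not Xapian's implementation.
class ToyDoc {
  public:
    void add_term(const std::string& term, unsigned wdfinc = 1) {
        terms[term] += wdfinc;  // a new term starts at wdf 0, then is increased
    }

    // Mirrors the documented behaviour: exactly add_term(term, 0).
    void add_boolean_term(const std::string& term) { add_term(term, 0); }

    bool has_term(const std::string& term) const {
        return terms.count(term) != 0;
    }

    unsigned wdf(const std::string& term) const {
        auto it = terms.find(term);
        return it == terms.end() ? 0 : it->second;
    }

    // Document length in this model: the sum of the wdfs of all terms.
    unsigned length() const {
        unsigned total = 0;
        for (const auto& entry : terms) total += entry.second;
        return total;
    }

  private:
    std::map<std::string, unsigned> terms;
};
```

Adding a boolean term for a term that is already present leaves its wdf unchanged, matching the "no effect" note above.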
void Xapian::Document::add_posting(const std::string & tname,
                                   Xapian::termpos tpos,
                                   Xapian::termcount wdfinc = 1)
Add an occurrence of a term at a particular position.
Multiple occurrences of the term at the same position are represented only once in the positional information, but do increase the wdf.
If the term is not already in the document, it will be added to it.
Parameters:
    tname   The name of the term.
    tpos    The position of the term.
    wdfinc  The increment that will be applied to the wdf for this term.
Definition at line 128 of file omdocument.cc.
References LOGCALL_VOID.
Referenced by DEFINE_TESTCASE(), gen_longpositionlist1_db(), gen_uniqterms_gt_doclen_db(), FileIndexer::index_to(), make_phrasebug1_db(), and unserialise_document().
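The rule that repeated occurrences at one position are stored once but still raise the wdf can be sketched with a toy model. ToyDoc and ToyTermEntry are hypothetical names, not Xapian types. A std::set models the position list because it naturally stores each position once.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical toy model (not Xapian's implementation) of the position and
// wdf bookkeeping that add_posting() is documented to perform.
struct ToyTermEntry {
    unsigned wdf = 0;
    std::set<unsigned> positions;  // a set stores each position only once
};

class ToyDoc {
  public:
    void add_posting(const std::string& tname, unsigned tpos,
                     unsigned wdfinc = 1) {
        ToyTermEntry& entry = terms[tname];  // adds the term if it is new
        entry.positions.insert(tpos);        // repeated positions collapse
        entry.wdf += wdfinc;                 // ...but the wdf still grows
    }

    const ToyTermEntry& term(const std::string& tname) const {
        return terms.at(tname);
    }

  private:
    std::map<std::string, ToyTermEntry> terms;
};
```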
void Xapian::Document::add_term(const std::string & tname,
                                Xapian::termcount wdfinc = 1)
Add a term to the document, without positional information.
Any existing positional information for the term will be left unmodified.
Parameters:
    tname   The name of the term.
    wdfinc  The increment that will be applied to the wdf for this term (default: 1).
Definition at line 140 of file omdocument.cc.
References LOGCALL_VOID.
Referenced by basic_doc(), builddb_queries1(), builddb_valuestest1(), DEFINE_TESTCASE(), gen_consistency2_db(), gen_decvalwtsource3_db(), gen_lazytablebug1_db(), gen_qp_flag_partial1_db(), gen_subdbwithoutpos1_db(), gen_uniqterms_gt_doclen_db(), gen_wdf_eq_doclen_db(), make_all_tables(), make_all_tables2(), make_matchspy2_db(), make_missing_tables(), make_msize1_db(), make_msize2_db(), make_multichunk_db(), make_orcheck_db(), make_ordecay_db(), make_sparse_db(), make_topercent7_db(), make_xordecay1_db(), and unserialise_document().
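The contrast between add_term() and add_posting() can be sketched with a hypothetical toy model (not Xapian code). add_term() raises only the wdf, so any positions recorded earlier by add_posting() are left exactly as they were.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical toy model, not Xapian code: add_term() only touches the wdf.
struct ToyTermEntry {
    unsigned wdf = 0;
    std::set<unsigned> positions;
};

class ToyDoc {
  public:
    void add_posting(const std::string& tname, unsigned tpos,
                     unsigned wdfinc = 1) {
        ToyTermEntry& entry = terms[tname];
        entry.positions.insert(tpos);
        entry.wdf += wdfinc;
    }

    void add_term(const std::string& tname, unsigned wdfinc = 1) {
        terms[tname].wdf += wdfinc;  // existing positions are left as they are
    }

    const ToyTermEntry& term(const std::string& tname) const {
        return terms.at(tname);
    }

  private:
    std::map<std::string, ToyTermEntry> terms;
};
```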
void Xapian::Document::add_value(Xapian::valueno slot,
                                 const std::string & value)
Add a new value.
The new value will replace any existing value with the same number (or if the new value is empty, it will remove any existing value with the same number).
Parameters:
    slot   The value slot to add the value in.
    value  The value to set.
Definition at line 107 of file omdocument.cc.
References LOGCALL_VOID.
Referenced by bigoaddvalue1_helper(), builddb_coords1(), builddb_valuestest1(), DEFINE_TESTCASE(), gen_consistency2_db(), gen_decvalwtsource3_db(), gen_decvalwtsource5_db(), gen_qp_range3_db(), gen_valueweightsource5_db(), FileIndexer::index_to(), make_matchspy2_db(), make_matchtimelimit1_db(), make_msize1_db(), make_msize2_db(), make_singularvalue_db(), make_valprefixbounds_db(), make_valuerange5(), and unserialise_document().
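The slot rules for add_value() and get_value() can be sketched with a map from slot number to string. ToyValues is a hypothetical stand-in, not Xapian's implementation. A non-empty value replaces whatever the slot held, and an empty value removes the slot. That is consistent with get_value() returning an empty string for a missing slot.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical toy model (not Xapian code) of value-slot semantics: one value
// per slot, and an empty string means "no value".
class ToyValues {
  public:
    void add_value(unsigned slot, const std::string& value) {
        if (value.empty()) {
            slots.erase(slot);    // an empty value removes the slot
        } else {
            slots[slot] = value;  // replaces any existing value
        }
    }

    // Like get_value(): an absent slot reads back as an empty string.
    std::string get_value(unsigned slot) const {
        auto it = slots.find(slot);
        return it == slots.end() ? std::string() : it->second;
    }

    std::size_t values_count() const { return slots.size(); }

  private:
    std::map<unsigned, std::string> slots;
};
```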
void Xapian::Document::clear_terms()
Remove all terms (and postings) from the document.
Definition at line 184 of file omdocument.cc.
References LOGCALL_VOID.
Referenced by DEFINE_TESTCASE().
void Xapian::Document::clear_values()
Remove all values associated with the document.
Definition at line 121 of file omdocument.cc.
References LOGCALL_VOID.
Referenced by DEFINE_TESTCASE().
std::string Xapian::Document::get_data() const
Get data stored in the document.
This is potentially a relatively expensive operation, and shouldn't normally be used during the match (e.g. in a PostingSource or match decider functor). Put data for use by match deciders in a value instead.
Definition at line 71 of file omdocument.cc.
References internal, LOGCALL, and RETURN.
Referenced by InMemoryDatabase::add_document(), GlassWritableDatabase::add_document_(), ChertWritableDatabase::add_document_(), DEFINE_TESTCASE(), main(), RemoteServer::msg_document(), SimpleMatchSpy::operator()(), GrepMatchDecider::operator()(), closedb1_iterators::perform(), InMemoryDatabase::replace_document(), GlassWritableDatabase::replace_document(), ChertWritableDatabase::replace_document(), serialise_document(), and show_docdata().
std::string Xapian::Document::get_description() const
Return a string describing this object.
Definition at line 101 of file omdocument.cc.
Referenced by DEFINE_TESTCASE().
Xapian::docid Xapian::Document::get_docid() const
Get the document id which is associated with this document (if any).
NB: If multiple databases are being searched together, this will be the document id in the individual database, not in the merged database!
Definition at line 220 of file omdocument.cc.
References internal, LOGCALL, and RETURN.
Referenced by DEFINE_TESTCASE(), and GlassValueManager::replace_document().
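To relate the per-database id returned by get_docid() to an id in the combined database, the sketch below ASSUMES round-robin interleaving across shards. That is our understanding of how Xapian combines databases; this page does not state it. The helper names merged_docid() and split_docid() are hypothetical, and the local docid alias only stands in for Xapian::docid.

```cpp
#include <cassert>
#include <utility>

using docid = unsigned;  // stands in for Xapian::docid

// ASSUMPTION: shard docids are interleaved round-robin, with 0-based shard
// indexes, when n_shards databases are searched together.
docid merged_docid(docid shard_did, unsigned shard_index, unsigned n_shards) {
    return (shard_did - 1) * n_shards + shard_index + 1;
}

// Inverse mapping: recover (shard docid, shard index) from a merged docid.
std::pair<docid, unsigned> split_docid(docid merged, unsigned n_shards) {
    return {(merged - 1) / n_shards + 1, (merged - 1) % n_shards};
}
```

Under this assumption, get_docid() reports the first element of split_docid(), not the merged id.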
std::string Xapian::Document::get_value(Xapian::valueno slot) const
Get value by number.
Returns an empty string if no value with the given number is present in the document.
Parameters:
    slot  The number of the value.
Definition at line 64 of file omdocument.cc.
References internal, LOGCALL, and RETURN.
Referenced by check_vals(), DEFINE_TESTCASE(), Xapian::ValueSetMatchDecider::operator()(), Xapian::MultiValueKeyMaker::operator()(), Xapian::ValueCountMatchSpy::operator()(), Xapian::LatLongDistanceKeyMaker::operator()(), closedb1_iterators::perform(), and show_value().
void Xapian::Document::operator=(const Document & other)   [default]
Assignment is allowed.
The internals are reference counted, so assignment is cheap.
Parameters:
    other  The object to copy.
Definition at line 85 of file omdocument.cc.
References internal.
Referenced by Document().
void Xapian::Document::remove_posting(const std::string & tname,
                                      Xapian::termpos tpos,
                                      Xapian::termcount wdfdec = 1)
Remove a posting of a term from the document.
Note that the term will still index the document even if all occurrences are removed. To remove a term from a document completely, use remove_term().
Parameters:
    tname   The name of the term.
    tpos    The position of the term.
    wdfdec  The decrement that will be applied to the wdf when removing this posting. The wdf will not go below 0.
Exceptions:
    Xapian::InvalidArgumentError  Thrown if the term is not at the specified position in this document's position list for the term.
    Xapian::InvalidArgumentError  Thrown if the term is not in the document.
Definition at line 150 of file omdocument.cc.
References LOGCALL_VOID.
Referenced by DEFINE_TESTCASE().
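The documented rules for remove_posting() can be sketched with a toy model. The posting must exist, the wdf is clamped at zero, and the term stays in the document even when no positions remain. ToyDoc is hypothetical, and std::invalid_argument stands in for Xapian::InvalidArgumentError.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical toy model, not Xapian code. std::invalid_argument stands in
// for Xapian::InvalidArgumentError.
struct ToyTermEntry {
    unsigned wdf = 0;
    std::set<unsigned> positions;
};

class ToyDoc {
  public:
    void add_posting(const std::string& tname, unsigned tpos,
                     unsigned wdfinc = 1) {
        ToyTermEntry& entry = terms[tname];
        entry.positions.insert(tpos);
        entry.wdf += wdfinc;
    }

    void remove_posting(const std::string& tname, unsigned tpos,
                        unsigned wdfdec = 1) {
        auto it = terms.find(tname);
        if (it == terms.end())
            throw std::invalid_argument("term not in document");
        if (it->second.positions.erase(tpos) == 0)
            throw std::invalid_argument("term not at that position");
        unsigned& wdf = it->second.wdf;
        wdf = (wdfdec > wdf) ? 0 : wdf - wdfdec;  // never goes below zero
        // The term entry is kept even when no positions remain.
    }

    bool has_term(const std::string& tname) const {
        return terms.count(tname) != 0;
    }

    const ToyTermEntry& term(const std::string& tname) const {
        return terms.at(tname);
    }

  private:
    std::map<std::string, ToyTermEntry> terms;
};
```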
Xapian::termpos Xapian::Document::remove_postings(const std::string & term,
                                                  Xapian::termpos term_pos_first,
                                                  Xapian::termpos term_pos_last,
                                                  Xapian::termcount wdf_dec = 1)
Remove a range of postings for a term.
Any instances of the term at positions >= term_pos_first and <= term_pos_last will be removed, and the wdf will be reduced by wdf_dec for each instance removed (though the wdf will never go below zero).
It's OK if the term doesn't occur in the range of positions specified (unlike remove_posting()). And if term_pos_first > term_pos_last, this method does nothing.
Exceptions:
    Xapian::InvalidArgumentError  Thrown if the term is not in the document.
Definition at line 161 of file omdocument.cc.
References rare.
Referenced by DEFINE_TESTCASE().
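The range semantics of remove_postings() can be sketched with std::set::lower_bound and upper_bound, which select exactly the positions in [term_pos_first, term_pos_last]. This is a hypothetical toy model, not Xapian code. A few details are choices made for the model: returning the number of positions removed, and checking for an empty range before looking the term up. std::invalid_argument stands in for Xapian::InvalidArgumentError.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical toy model, not Xapian code.
struct ToyTermEntry {
    unsigned wdf = 0;
    std::set<unsigned> positions;
};

class ToyDoc {
  public:
    void add_posting(const std::string& tname, unsigned tpos,
                     unsigned wdfinc = 1) {
        ToyTermEntry& entry = terms[tname];
        entry.positions.insert(tpos);
        entry.wdf += wdfinc;
    }

    // Returns the number of positions removed (a choice made for this model).
    unsigned remove_postings(const std::string& term, unsigned first,
                             unsigned last, unsigned wdf_dec = 1) {
        if (first > last) return 0;  // empty range: do nothing
        auto it = terms.find(term);
        if (it == terms.end())
            throw std::invalid_argument("term not in document");
        std::set<unsigned>& pos = it->second.positions;
        auto lo = pos.lower_bound(first);  // first position >= first
        auto hi = pos.upper_bound(last);   // first position > last
        auto removed = static_cast<unsigned>(std::distance(lo, hi));
        pos.erase(lo, hi);  // removing nothing is fine, unlike remove_posting()
        unsigned long long dec = 1ULL * removed * wdf_dec;
        unsigned& wdf = it->second.wdf;
        wdf = (dec >= wdf) ? 0 : wdf - static_cast<unsigned>(dec);
        return removed;
    }

    const ToyTermEntry& term(const std::string& tname) const {
        return terms.at(tname);
    }

  private:
    std::map<std::string, ToyTermEntry> terms;
};
```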
void Xapian::Document::remove_term(const std::string & tname)
Remove a term and all postings associated with it.
Parameters:
    tname  The name of the term.
Exceptions:
    Xapian::InvalidArgumentError  Thrown if the term is not in the document.
Definition at line 177 of file omdocument.cc.
References LOGCALL_VOID.
Referenced by DEFINE_TESTCASE().
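Unlike remove_posting(), remove_term() takes the term out of the document completely. The hypothetical toy model below (not Xapian code) maps each term to its wdf, and std::invalid_argument stands in for Xapian::InvalidArgumentError when the term is absent.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical toy model, not Xapian code: each term maps to its wdf.
// std::invalid_argument stands in for Xapian::InvalidArgumentError.
class ToyDoc {
  public:
    void add_term(const std::string& tname, unsigned wdfinc = 1) {
        terms[tname] += wdfinc;
    }

    void remove_term(const std::string& tname) {
        // For a std::map, erase(key) returns how many entries it removed (0 or 1).
        if (terms.erase(tname) == 0)
            throw std::invalid_argument("term not in document");
    }

    bool has_term(const std::string& tname) const {
        return terms.count(tname) != 0;
    }

  private:
    std::map<std::string, unsigned> terms;
};
```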
void Xapian::Document::remove_value(Xapian::valueno slot)
Remove any value with the given number.
Definition at line 114 of file omdocument.cc.
References LOGCALL_VOID.
Referenced by DEFINE_TESTCASE().
std::string Xapian::Document::serialise() const
Serialise document into a string.
The document representation may change between Xapian releases: even between minor versions. However, it is guaranteed not to change if the remote database protocol has not changed between releases.
Definition at line 227 of file omdocument.cc.
References LOGCALL, RETURN, and serialise_document().
Referenced by DEFINE_TESTCASE().
void Xapian::Document::set_data(const std::string & data)
Set data stored in the document.
This is an opaque blob as far as Xapian is concerned: it's up to you to impose whatever structure you want on it. If you want to store structured data, consider using something like protocol buffers.
Parameters:
    data  The data to store.
Definition at line 78 of file omdocument.cc.
References LOGCALL_VOID.
Referenced by builddb_valuestest1(), DEFINE_TESTCASE(), gen_longpositionlist1_db(), FileIndexer::index_to(), main(), make_matchspy2_db(), make_sparse_db(), make_tg_db(), make_topercent7_db(), and unserialise_document().
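Because the data blob is opaque, any structure comes from the application. For a tiny record, a lightweight alternative to protocol buffers could be the hypothetical "key=value" line format sketched here. encode_record() and decode_record() are illustrative names, not Xapian functions. The format assumes keys contain no '=' and that neither keys nor values contain '\n'. The encoded string would be passed to set_data(), and the result of get_data() would be decoded.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: encode a small record as "key=value" lines to use as
// document data. Assumes keys contain no '=' and no field contains '\n'.
std::string encode_record(const std::map<std::string, std::string>& fields) {
    std::string out;
    for (const auto& kv : fields) {
        out += kv.first;
        out += '=';
        out += kv.second;
        out += '\n';
    }
    return out;
}

// Inverse of encode_record(); splits each line at its first '='.
std::map<std::string, std::string> decode_record(const std::string& blob) {
    std::map<std::string, std::string> fields;
    std::istringstream in(blob);
    std::string line;
    while (std::getline(in, line)) {
        auto eq = line.find('=');
        if (eq == std::string::npos) continue;  // skip malformed lines
        fields[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return fields;
}
```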
TermIterator Xapian::Document::termlist_begin() const
Start iterating the terms in this document.
The terms are returned in ascending string order (by byte value).
Note that if the Document object came from a sharded database, the TermIterator returned by this method only knows about the shard the document came from. Calling get_termfreq() on it will therefore give you the term frequency in that shard rather than in the combined database.
Definition at line 197 of file omdocument.cc.
References internal, LOGCALL, and RETURN.
Referenced by GlassWritableDatabase::add_document_(), ChertWritableDatabase::add_document_(), DEFINE_TESTCASE(), InMemoryDatabase::finish_add_doc(), format_doc_termlist(), closedb1_iterators::perform(), GlassWritableDatabase::replace_document(), ChertWritableDatabase::replace_document(), serialise_document(), ChertTermListTable::set_termlist(), and GlassTermListTable::set_termlist().
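The "ascending string order (by byte value)" above differs from alphabetical order, which matters when comparing a termlist against your own expectations. std::string comparison treats each char as unsigned char, so a std::set<std::string> reproduces that byte order. in_termlist_order() is a hypothetical helper, not part of Xapian.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: lists terms in the order termlist_begin() is
// documented to return them, ascending by byte value with each distinct term
// once. std::string compares chars as unsigned char, so std::set gives byte order.
std::vector<std::string> in_termlist_order(const std::vector<std::string>& terms) {
    std::set<std::string> unique(terms.begin(), terms.end());
    return std::vector<std::string>(unique.begin(), unique.end());
}
```

Uppercase-prefixed terms such as "XTAGred" sort before lowercase words, and UTF-8 multi-byte terms (lead byte 0xC3 and above) sort after plain ASCII ones.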
Xapian::termcount Xapian::Document::termlist_count() const
The length of the termlist, i.e. the number of different terms which index this document.
Definition at line 191 of file omdocument.cc.
References internal, LOGCALL, and RETURN.
Referenced by DEFINE_TESTCASE(), serialise_document(), ChertTermListTable::set_termlist(), and GlassTermListTable::set_termlist().
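termlist_count() counts distinct terms, which is not the same as the document length (the sum of wdfs). The hypothetical toy model below (not Xapian code) shows the difference. A boolean-style term with wdf 0 still counts as one term, and adding an existing term again does not change the count.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical toy model, not Xapian code: each term maps to its wdf.
class ToyDoc {
  public:
    void add_term(const std::string& tname, unsigned wdfinc = 1) {
        terms[tname] += wdfinc;
    }

    // Number of distinct terms, which is what termlist_count() reports.
    std::size_t termlist_count() const { return terms.size(); }

    // Document length (sum of wdfs), shown only for contrast.
    unsigned long length() const {
        unsigned long total = 0;
        for (const auto& entry : terms) total += entry.second;
        return total;
    }

  private:
    std::map<std::string, unsigned> terms;
};
```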
TermIterator Xapian::Document::termlist_end() const   [inline]
Equivalent end iterator for termlist_begin().
Definition at line 270 of file document.h.
Referenced by GlassWritableDatabase::add_document_(), ChertWritableDatabase::add_document_(), DEFINE_TESTCASE(), InMemoryDatabase::finish_add_doc(), format_doc_termlist(), GlassWritableDatabase::replace_document(), ChertWritableDatabase::replace_document(), serialise_document(), ChertTermListTable::set_termlist(), and GlassTermListTable::set_termlist().
Xapian::Document Xapian::Document::unserialise(const std::string & serialised)   [static]
Unserialise a document from a string produced by serialise().
Definition at line 234 of file omdocument.cc.
References LOGCALL_STATIC, RETURN, and unserialise_document().
Referenced by DEFINE_TESTCASE().
ValueIterator Xapian::Document::values_begin() const
Iterator for the values in this document.
Definition at line 210 of file omdocument.cc.
References internal, LOGCALL, and RETURN.
Referenced by ChertValueManager::add_document(), GlassValueManager::add_document(), check_vals(), DEFINE_TESTCASE(), InMemoryDatabase::finish_add_doc(), RemoteServer::msg_document(), serialise_document(), and show_values().
Xapian::termcount Xapian::Document::values_count() const
Count the values in this document.
Definition at line 204 of file omdocument.cc.
References internal, LOGCALL, and RETURN.
Referenced by check_vals(), DEFINE_TESTCASE(), and serialise_document().
ValueIterator Xapian::Document::values_end() const   [inline]
Equivalent end iterator for values_begin().
Definition at line 281 of file document.h.
Referenced by ChertValueManager::add_document(), GlassValueManager::add_document(), check_vals(), DEFINE_TESTCASE(), InMemoryDatabase::finish_add_doc(), RemoteServer::msg_document(), serialise_document(), and show_values().
Xapian::Internal::intrusive_ptr<Internal> Xapian::Document::internal   [private]
Reference counted internals.
Definition at line 63 of file document.h.
Referenced by get_data(), get_docid(), get_value(), operator=(), ChertValueManager::replace_document(), GlassValueManager::replace_document(), GlassWritableDatabase::replace_document(), ChertWritableDatabase::replace_document(), termlist_begin(), termlist_count(), values_begin(), and values_count().