|
xapian-core
2.0.0
|
Class representing a list of search results. More...
#include <mset.h>
Classes | |
| class | Internal |
| Xapian::MSet internals. More... | |
Public Types | |
| enum | { SNIPPET_BACKGROUND_MODEL = 1 , SNIPPET_EXHAUSTIVE = 2 , SNIPPET_EMPTY_WITHOUT_MATCH = 4 , SNIPPET_NGRAMS = 2048 , SNIPPET_CJK_NGRAM = SNIPPET_NGRAMS , SNIPPET_WORD_BREAKS = 4096 } |
Public Member Functions | |
| MSet (const MSet &o) | |
| Copying is allowed. More... | |
| MSet & | operator= (const MSet &o) |
| Copying is allowed. More... | |
| MSet (MSet &&o) | |
| Move constructor. More... | |
| MSet & | operator= (MSet &&o) |
| Move assignment operator. More... | |
| MSet () | |
| Default constructor. More... | |
| ~MSet () | |
| Destructor. More... | |
| template<typename Iterator > | |
| void | replace_weights (Iterator first, Iterator last) |
| Assigns new weights and updates MSet. More... | |
| void | sort_by_relevance () |
| Sorts the list of documents in MSet according to their weights. More... | |
| int | convert_to_percent (double weight) const |
| Convert a weight to a percentage. More... | |
| int | convert_to_percent (const MSetIterator &it) const |
| Convert the weight of the current iterator position to a percentage. More... | |
| Xapian::doccount | get_termfreq (std::string_view term) const |
| Get the termfreq of a term. More... | |
| double | get_termweight (std::string_view term) const |
| Get the term weight of a term. More... | |
| Xapian::doccount | get_firstitem () const |
| Rank of first item in this MSet. More... | |
| Xapian::doccount | get_matches_lower_bound () const |
| Lower bound on the total number of matching documents. More... | |
| Xapian::doccount | get_matches_estimated () const |
| Estimate of the total number of matching documents. More... | |
| Xapian::doccount | get_matches_upper_bound () const |
| Upper bound on the total number of matching documents. More... | |
| Xapian::doccount | get_uncollapsed_matches_lower_bound () const |
| Lower bound on the total number of matching documents before collapsing. More... | |
| Xapian::doccount | get_uncollapsed_matches_estimated () const |
| Estimate of the total number of matching documents before collapsing. More... | |
| Xapian::doccount | get_uncollapsed_matches_upper_bound () const |
| Upper bound on the total number of matching documents before collapsing. More... | |
| double | get_max_attained () const |
| The maximum weight attained by any document. More... | |
| double | get_max_possible () const |
| The maximum possible weight any document could achieve. More... | |
| std::string | snippet (std::string_view text, size_t length=500, const Xapian::Stem &stemmer=Xapian::Stem(), unsigned flags=SNIPPET_BACKGROUND_MODEL|SNIPPET_EXHAUSTIVE, std::string_view hi_start="<b>", std::string_view hi_end="</b>", std::string_view omit="...") const |
| Generate a snippet. More... | |
| void | fetch (const MSetIterator &begin, const MSetIterator &end) const |
| Prefetch hint a range of items. More... | |
| void | fetch (const MSetIterator &item) const |
| Prefetch hint a single MSet item. More... | |
| void | fetch () const |
| Prefetch hint the whole MSet. More... | |
| Xapian::doccount | size () const |
| Return number of items in this MSet object. More... | |
| bool | empty () const |
| Return true if this MSet object is empty. More... | |
| void | swap (MSet &o) |
| Efficiently swap this MSet object with another. More... | |
| MSetIterator | begin () const |
| Return iterator pointing to the first item in this MSet. More... | |
| MSetIterator | end () const |
| Return iterator pointing to just after the last item in this MSet. More... | |
| MSetIterator | operator[] (Xapian::doccount i) const |
| Return iterator pointing to the i-th object in this MSet. More... | |
| MSetIterator | back () const |
| Return iterator pointing to the last object in this MSet. More... | |
| std::string | get_description () const |
| Return a string describing this object. More... | |
Private Types | |
| typedef Xapian::MSetIterator | value_type |
| typedef Xapian::doccount | size_type |
| typedef Xapian::doccount_diff | difference_type |
| typedef Xapian::MSetIterator | iterator |
| typedef Xapian::MSetIterator | const_iterator |
| typedef value_type * | pointer |
| typedef const value_type * | const_pointer |
| typedef value_type | reference |
| typedef const value_type | const_reference |
Private Member Functions | |
| void | fetch_ (Xapian::doccount first, Xapian::doccount last) const |
| void | set_item_weight (Xapian::doccount i, double wt) |
| Update the weight corresponding to the document indexed at position i with wt. More... | |
| MSet (Internal *internal_) | |
| Xapian::doccount | max_size () const |
Private Attributes | |
| Xapian::Internal::intrusive_ptr_nonnull< Internal > | internal |
Friends | |
| class | MSetIterator |
|
private |
MSet is what the C++ STL calls a container.
The following typedefs allow the class to be used in templates in the same way the standard containers can be.
These are deliberately hidden from the Doxygen-generated docs, as the machinery here isn't interesting to API users. They just need to know that Xapian container classes are compatible with the STL.
See "The C++ Programming Language", 3rd ed., section 16.3.1.
| anonymous enum |
| Enumerator | |
|---|---|
| SNIPPET_BACKGROUND_MODEL | Model the relevancy of non-query terms in MSet::snippet(). Non-query terms will be assigned a small weight, and snippet generation will tend to prefer candidates which contain a more interesting background (where the query term content is equivalent). |
| SNIPPET_EXHAUSTIVE | Exhaustively evaluate candidate snippets in MSet::snippet(). Without this flag, snippet generation will stop once it thinks it has found a "good enough" snippet, which will generally reduce the time taken to generate a snippet. |
| SNIPPET_EMPTY_WITHOUT_MATCH | Return the empty string if no term got matched. If enabled, snippet() returns an empty string if not a single match was found in text. If not enabled, snippet() returns a (sub)string of text without any highlighted terms. |
| SNIPPET_NGRAMS | Generate n-grams for scripts without explicit word breaks. Text in other scripts is split into words as normal. Enable this option to highlight search results for queries parsed with the QueryParser::FLAG_NGRAMS flag. The TermGenerator::FLAG_NGRAMS flag needs to have been used at index time. This mode can also be enabled by setting environment variable XAPIAN_CJK_NGRAM to a non-empty value (but doing so was deprecated in 1.4.11). In 1.4.x this feature was specific to CJK (Chinese, Japanese and Korean), but in 2.0.0 it's been extended to other languages. To reflect this change the new and preferred name is SNIPPET_NGRAMS, which was added as an alias for forward compatibility in Xapian 1.4.23. Use SNIPPET_CJK_NGRAM instead if you aim to support Xapian < 1.4.23. Since: Added in Xapian 1.4.23. |
| SNIPPET_CJK_NGRAM | Generate n-grams for scripts without explicit word breaks. Old name - use SNIPPET_NGRAMS instead unless you aim to support Xapian < 1.4.23. Since: Added in Xapian 1.4.11. |
| SNIPPET_WORD_BREAKS | Find word breaks for text in scripts without explicit word breaks. Enable this option to highlight search results for queries parsed with the QueryParser::FLAG_WORD_BREAKS flag. Spans of text written in such scripts are split into words using ICU (which uses heuristics and/or dictionaries to do so). Text in other scripts is split into words as normal. The TermGenerator::FLAG_WORD_BREAKS flag needs to have been used at index time. Since: Added in Xapian 2.0.0. |
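The enumerators above are bit values meant to be combined with bitwise OR, as the default flags argument of MSet::snippet() does. The following is a minimal sketch of that idiom. It copies the documented values into a local enum so that it compiles without the Xapian headers; the namespace snippet_sketch, the constants default_flags and strict_flags, and the helper has_flag() are hypothetical names introduced for illustration only. Real code would use the Xapian::MSet enumerators directly.

```cpp
namespace snippet_sketch {

// Local copy of the values documented above; NOT the Xapian declarations.
enum : unsigned {
    SNIPPET_BACKGROUND_MODEL = 1,
    SNIPPET_EXHAUSTIVE = 2,
    SNIPPET_EMPTY_WITHOUT_MATCH = 4,
    SNIPPET_NGRAMS = 2048,
    SNIPPET_CJK_NGRAM = SNIPPET_NGRAMS,  // old name, same bit
    SNIPPET_WORD_BREAKS = 4096
};

// The documented default for the flags parameter of MSet::snippet().
constexpr unsigned default_flags =
    SNIPPET_BACKGROUND_MODEL | SNIPPET_EXHAUSTIVE;

// Hypothetical helper: true if every bit of `flag` is set in `flags`.
constexpr bool has_flag(unsigned flags, unsigned flag) {
    return (flags & flag) == flag;
}

// Keep the defaults and also request an empty result when nothing matches.
constexpr unsigned strict_flags =
    default_flags | SNIPPET_EMPTY_WITHOUT_MATCH;

}  // namespace snippet_sketch
```

Because SNIPPET_CJK_NGRAM is an alias for SNIPPET_NGRAMS, testing for either name checks the same bit.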
|
default |
Copying is allowed.
The internals are reference counted, so copying is cheap.
|
default |
Move constructor.
| Xapian::MSet::MSet | ( | ) |
|
explicit private |
|
inline |
Return iterator pointing to the last object in this MSet.
Definition at line 803 of file mset.h.
References MSetIterator.
Referenced by DEFINE_TESTCASE().
|
inline |
Return iterator pointing to the first item in this MSet.
Definition at line 786 of file mset.h.
References MSetIterator, and size().
Referenced by DEFINE_TESTCASE(), main(), print_mset_percentages(), print_mset_weights(), and test_mset_order_equal().
|
inline |
Convert the weight of the current iterator position to a percentage.
If the weighting scheme gives everything zero weight (like Xapian::BoolWeight does) then all results will score 100%.
Otherwise the percentage is calculated as a linear scaling of the relevance weight, with the scale factor determined by the matching document with the highest weight. This result scores 100% if it matches all the weighted query terms, and proportionally less if it only matches some.
The returned percentage is an integer. If the calculated percentage before rounding is non-zero but less than 1% it is rounded up to 1% so that a result scoring 0% means it has zero weight.
Similarly, percentages over 99% but less than 100% are always rounded down, so a result scoring 100% means it matches all weighted query terms.
Note that these generally aren't percentages of anything meaningful (unless you use a custom weighting formula where they are!) but like the weights they are based on, higher values should indicate more relevant results.
Definition at line 808 of file mset.h.
References convert_to_percent(), and Xapian::MSetIterator::get_weight().
| int Xapian::MSet::convert_to_percent | ( | double | weight | ) | const |
Convert a weight to a percentage.
If the weighting scheme gives everything zero weight (like Xapian::BoolWeight does) then all results will score 100%.
Otherwise the percentage is calculated as a linear scaling of the relevance weight, with the scale factor determined by the matching document with the highest weight. This result scores 100% if it matches all the weighted query terms, and proportionally less if it only matches some.
The returned percentage is an integer. If the calculated percentage before rounding is non-zero but less than 1% it is rounded up to 1% so that a result scoring 0% means it has zero weight.
Similarly, percentages over 99% but less than 100% are always rounded down, so a result scoring 100% means it matches all weighted query terms.
Note that these generally aren't percentages of anything meaningful (unless you use a custom weighting formula where they are!) but like the weights they are based on, higher values should indicate more relevant results.
Definition at line 275 of file mset.cc.
Referenced by convert_to_percent(), DEFINE_TESTCASE(), Xapian::MSetIterator::get_percent(), and print_mset_percentages().
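The rounding rules described above can be illustrated with a small standalone function. This is only a sketch of the documented behaviour, not the code in mset.cc: the name sketch_percent and the parameter full_weight (the weight assumed to correspond to 100%) are hypothetical, and round-to-nearest for the middle of the range is an assumption, since the text above does not say how values between 1% and 99% are rounded.

```cpp
#include <cmath>

// Hypothetical helper, not part of Xapian. Maps `weight` to an integer
// percentage, where `full_weight` is the weight assumed to score 100%.
int sketch_percent(double weight, double full_weight) {
    // A scheme which gives everything zero weight scores everything 100%.
    if (full_weight <= 0.0) return 100;

    double pct = 100.0 * weight / full_weight;
    if (pct <= 0.0) return 0;       // 0% is reserved for zero weight
    if (pct < 1.0) return 1;        // non-zero but under 1%: round up to 1%
    if (pct >= 100.0) return 100;
    if (pct > 99.0) return 99;      // over 99% but under 100%: round down
    // Assumption: round to nearest elsewhere.
    return static_cast<int>(std::lround(pct));
}
```

With these rules, sketch_percent(0.004, 1.0) gives 1 rather than 0, and sketch_percent(0.995, 1.0) gives 99 rather than 100, so the values 0 and 100 keep the meanings described above.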
|
inline |
Return true if this MSet object is empty.
Definition at line 467 of file mset.h.
Referenced by DEFINE_TESTCASE(), Matcher::get_mset(), and operator==().
|
inline |
Return iterator pointing to just after the last item in this MSet.
Definition at line 791 of file mset.h.
References MSetIterator.
Referenced by DEFINE_TESTCASE(), main(), print_mset_percentages(), print_mset_weights(), and test_mset_order_equal().
|
inline |
Prefetch hint the whole MSet.
For a remote database, this may start a pipelined fetch of the requested documents from the remote server.
For a disk-based database, this may send prefetch hints to the operating system such that the disk blocks the requested documents are stored in are more likely to be in the cache when we come to actually read them.
|
inline |
Prefetch hint a range of items.
For a remote database, this may start a pipelined fetch of the requested documents from the remote server.
For a disk-based database, this may send prefetch hints to the operating system such that the disk blocks the requested documents are stored in are more likely to be in the cache when we come to actually read them.
Definition at line 774 of file mset.h.
References fetch_(), and Xapian::MSetIterator::off_from_end.
Referenced by DEFINE_TESTCASE().
|
inline |
Prefetch hint a single MSet item.
For a remote database, this may start a pipelined fetch of the requested documents from the remote server.
For a disk-based database, this may send prefetch hints to the operating system such that the disk blocks the requested documents are stored in are more likely to be in the cache when we come to actually read them.
Definition at line 780 of file mset.h.
References fetch_(), and Xapian::MSetIterator::off_from_end.
|
private |
| std::string Xapian::MSet::get_description | ( | ) | const |
Return a string describing this object.
Definition at line 394 of file mset.cc.
Referenced by DEFINE_TESTCASE().
| Xapian::doccount Xapian::MSet::get_firstitem | ( | ) | const |
Rank of first item in this MSet.
This is the parameter first passed to Xapian::Enquire::get_mset().
Definition at line 312 of file mset.cc.
Referenced by DEFINE_TESTCASE(), and Xapian::MSetIterator::get_rank().
| Xapian::doccount Xapian::MSet::get_matches_estimated | ( | ) | const |
Estimate of the total number of matching documents.
Definition at line 324 of file mset.cc.
References internal, Xapian::MSet::Internal::matches_estimated, Xapian::MSet::Internal::matches_lower_bound, Xapian::MSet::Internal::matches_upper_bound, and round_estimate().
Referenced by DEFINE_TESTCASE(), main(), operator==(), and PerfTestLogger::search_end().
| Xapian::doccount Xapian::MSet::get_matches_lower_bound | ( | ) | const |
Lower bound on the total number of matching documents.
Definition at line 318 of file mset.cc.
Referenced by DEFINE_TESTCASE(), main(), operator==(), and PerfTestLogger::search_end().
| Xapian::doccount Xapian::MSet::get_matches_upper_bound | ( | ) | const |
Upper bound on the total number of matching documents.
Definition at line 334 of file mset.cc.
Referenced by DEFINE_TESTCASE(), main(), operator==(), and PerfTestLogger::search_end().
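One way an application might use the three match counts together is to show an exact figure when the bounds agree and an approximate one otherwise. The sketch below takes the three values as plain integers so that it compiles without Xapian; the function name describe_match_count is hypothetical, and it assumes the estimate lies between the lower and upper bounds.

```cpp
#include <string>

// Hypothetical helper, not part of Xapian. The arguments stand for the
// values returned by get_matches_lower_bound(), get_matches_estimated()
// and get_matches_upper_bound() on the same MSet.
std::string describe_match_count(unsigned lower,
                                 unsigned estimated,
                                 unsigned upper) {
    if (lower == upper) {
        // The bounds agree, so the count is known exactly.
        return std::to_string(lower) + " results";
    }
    // Otherwise only an estimate is available.
    return "about " + std::to_string(estimated) + " results";
}
```

For example, describe_match_count(42, 42, 42) yields "42 results", while describe_match_count(10, 57, 300) yields "about 57 results".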
| double Xapian::MSet::get_max_attained | ( | ) | const |
The maximum weight attained by any document.
Definition at line 362 of file mset.cc.
Referenced by DEFINE_TESTCASE().
| double Xapian::MSet::get_max_possible | ( | ) | const |
The maximum possible weight any document could achieve.
Definition at line 368 of file mset.cc.
Referenced by DEFINE_TESTCASE(), and operator==().
| Xapian::doccount Xapian::MSet::get_termfreq | ( | std::string_view | term | ) | const |
Get the termfreq of a term.
Gives the same answer as db.get_termfreq(term) (but is more efficient for query terms as it returns a value cached during the search).
Since 2.0.0, this method returns 0 if called on an MSet which is not associated with a database (which is consistent with Database::get_termfreq() returning 0 when called on a Database with no sub-databases); in earlier versions, Xapian::InvalidOperationError was thrown in this case.
Definition at line 281 of file mset.cc.
References Xapian::MSet::Internal::enquire, internal, rare, Xapian::MSet::Internal::stats, term, and usual.
Referenced by DEFINE_TESTCASE(), and main().
| double Xapian::MSet::get_termweight | ( | std::string_view | term | ) | const |
Get the term weight of a term.
Since 2.0.0, this method returns 0.0 if called on an MSet which is not associated with a database, or with a term which wasn't present in the query (since in both cases the term contributes no weight to any matching documents); in earlier versions, Xapian::InvalidOperationError was thrown for the first case, and Xapian::InvalidArgumentError for the second.
Definition at line 300 of file mset.cc.
References internal, Xapian::MSet::Internal::stats, term, and usual.
Referenced by DEFINE_TESTCASE().
| Xapian::doccount Xapian::MSet::get_uncollapsed_matches_estimated | ( | ) | const |
Estimate of the total number of matching documents before collapsing.
Conceptually the same as get_matches_estimated() for the same query without any collapse part (though the actual value may differ).
Definition at line 346 of file mset.cc.
References internal, round_estimate(), Xapian::MSet::Internal::uncollapsed_estimated, Xapian::MSet::Internal::uncollapsed_lower_bound, and Xapian::MSet::Internal::uncollapsed_upper_bound.
Referenced by DEFINE_TESTCASE().
| Xapian::doccount Xapian::MSet::get_uncollapsed_matches_lower_bound | ( | ) | const |
Lower bound on the total number of matching documents before collapsing.
Conceptually the same as get_matches_lower_bound() for the same query without any collapse part (though the actual value may differ).
Definition at line 340 of file mset.cc.
Referenced by DEFINE_TESTCASE().
| Xapian::doccount Xapian::MSet::get_uncollapsed_matches_upper_bound | ( | ) | const |
Upper bound on the total number of matching documents before collapsing.
Conceptually the same as get_matches_upper_bound() for the same query without any collapse part (though the actual value may differ).
Definition at line 356 of file mset.cc.
Referenced by DEFINE_TESTCASE().
|
inline private |
MSet is what the C++ STL calls a container.
The following methods allow the class to be used in templates in the same way the standard containers can be.
These are deliberately hidden from the Doxygen-generated docs, as the machinery here isn't interesting to API users. They just need to know that Xapian container classes are compatible with the STL.
Copying is allowed.
The internals are reference counted, so assignment is cheap.
|
inline |
Return iterator pointing to the i-th object in this MSet.
Definition at line 798 of file mset.h.
References MSetIterator, and size().
|
inline |
Assigns new weights and updates MSet.
Dereferencing the Iterator should return a double.
The weights returned by the iterator are assigned to elements of the MSet in rank order.
| first | Begin iterator. |
| last | End iterator. |
| Xapian::InvalidArgumentError | is thrown if the total number of elements in the input doesn't match the total number of documents in the MSet. |
Definition at line 130 of file mset.h.
Referenced by DEFINE_TESTCASE().
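The contract described above (weights assigned in rank order, an error if the counts differ, then a re-sort with sort_by_relevance()) can be sketched with a toy container. ToyItem and ToyMSet below are hypothetical stand-ins which compile without Xapian and are not the library's implementation. The sketch throws std::invalid_argument where Xapian throws its own exception class, and it assumes forward iterators so that the range can be counted before it is read.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <stdexcept>
#include <vector>

// Toy stand-in for one MSet entry (hypothetical, not a Xapian type).
struct ToyItem {
    unsigned docid;
    double weight;
};

// Toy stand-in for MSet (hypothetical, not the Xapian implementation).
struct ToyMSet {
    std::vector<ToyItem> items;  // held in rank order

    // Assign one weight per item, in rank order.
    template<typename Iterator>
    void replace_weights(Iterator first, Iterator last) {
        auto n = static_cast<std::size_t>(std::distance(first, last));
        if (n != items.size())
            throw std::invalid_argument("weight count != item count");
        for (ToyItem& item : items) item.weight = *first++;
    }

    // Re-order the items by descending weight.
    void sort_by_relevance() {
        std::stable_sort(items.begin(), items.end(),
                         [](const ToyItem& a, const ToyItem& b) {
                             return a.weight > b.weight;
                         });
    }
};
```

After replace_weights() the items are still in their old order; only the call to sort_by_relevance() moves the item with the highest new weight to the front.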
|
private |
Update the weight corresponding to the document indexed at position i with wt.
The MSet's max_possible and max_attained are also updated.
This method must be called to update the weight of every document in the MSet for i = 0 to mset.size() - 1 in ascending order to avoid miscalculation of max_attained and max_possible.
| i | MSet index to update |
| wt | new weight to assign to the document at index i |
| Xapian::doccount Xapian::MSet::size | ( | ) | const |
Return number of items in this MSet object.
Definition at line 374 of file mset.cc.
Referenced by begin(), check_msets_contain_same_docs(), DEFINE_TESTCASE(), PerfTestLogger::diversify_end(), Xapian::MSetIterator::get_rank(), Xapian::iterator_rewind(), Xapian::iterator_rewound(), main(), mset_range_is_same(), mset_range_is_same_weights(), operator==(), operator[](), PerfTestLogger::search_end(), and test_mset_order_equal().
| std::string Xapian::MSet::snippet | ( | std::string_view | text, |
| size_t | length = 500, |
||
| const Xapian::Stem & | stemmer = Xapian::Stem(), |
||
| unsigned | flags = SNIPPET_BACKGROUND_MODEL|SNIPPET_EXHAUSTIVE, |
||
| std::string_view | hi_start = "<b>", |
||
| std::string_view | hi_end = "</b>", |
||
| std::string_view | omit = "..." |
||
| ) | const |
Generate a snippet.
This method selects a continuous run of words from text, based mainly on where the query matches (currently terms, exact phrases and wildcards are taken into account). If flag SNIPPET_BACKGROUND_MODEL is used (which it is by default) then the selection algorithm also considers the non-query terms in the text with the aim of showing a context which provides more useful information.
The size of the text selected can be controlled by the length parameter, which specifies a number of bytes of text to aim to select. However, slightly more text may be selected, and the size of any escaping, highlighting or omission markers is not counted.
The returned text is escaped to make it suitable for use in HTML (though beware that in upstream releases 1.4.5 and earlier this escaping was sometimes incomplete), and matches with the query will be highlighted using hi_start and hi_end.
If the snippet seems to start or end mid-sentence, then omit is prepended or appended (respectively) to indicate this.
The same stemming algorithm which was used to build the query should be specified in stemmer.
flags is a bitwise OR of SNIPPET_* flags controlling behaviour.
Definition at line 380 of file mset.cc.
References stemmer.
Referenced by DEFINE_TESTCASE().
| void Xapian::MSet::sort_by_relevance | ( | ) |
Sorts the list of documents in MSet according to their weights.
Use after calling MSet::replace_weights.
This invalidates any MSetIterator objects active on this MSet.
Definition at line 268 of file mset.cc.
References get_msetcmp_function(), internal, Xapian::MSet::Internal::items, Xapian::Enquire::Internal::REL, and Heap::sort().
Referenced by DEFINE_TESTCASE().
|
inline |
|
friend |
|
private |
Reference counted internals.
Definition at line 80 of file mset.h.
Referenced by get_matches_estimated(), RemoteDatabase::get_mset(), Xapian::Enquire::Internal::get_mset(), Matcher::get_mset(), get_termfreq(), get_termweight(), get_uncollapsed_matches_estimated(), RemoteServer::msg_query(), sort_by_relevance(), and swap().