xapian-core
2.0.0
Xapian::Weight subclass implementing the BM25+ probabilistic formula.
#include <weight.h>
Public Member Functions
| | BM25PlusWeight (double k1, double k2, double k3, double b, double min_normlen, double delta) | Construct a BM25PlusWeight. |
| | BM25PlusWeight () | |
| std::string | name () const | Return the name of this weighting scheme, e.g. "bm25+". |
| std::string | serialise () const | Return this object's parameters serialised as a single string. |
| BM25PlusWeight * | unserialise (const std::string &serialised) const | Unserialise parameters. |
| double | get_sumpart (Xapian::termcount wdf, Xapian::termcount doclen, Xapian::termcount uniqterms, Xapian::termcount wdfdocmax) const | Calculate the weight contribution for this object's term to a document. |
| double | get_maxpart () const | Return an upper bound on what get_sumpart() can return for any document. |
| double | get_sumextra (Xapian::termcount doclen, Xapian::termcount uniqterms, Xapian::termcount wdfdocmax) const | Calculate the term-independent weight component for a document. |
| double | get_maxextra () const | Return an upper bound on what get_sumextra() can return for any document. |
| BM25PlusWeight * | create_from_parameters (const char *params) const | Create from a human-readable parameter string. |
Public Member Functions inherited from Xapian::Weight
| | Weight () | Default constructor, needed by subclass constructors. |
| virtual | ~Weight () | Virtual destructor, because we have virtual methods. |
Private Member Functions
| BM25PlusWeight * | clone () const | Clone this object. |
| void | init (double factor) | Allow the subclass to perform any initialisation it needs to. |
Private Attributes
| Xapian::doclength | len_factor | Factor to multiply the document length by. |
| double | termweight | Factor combining all the document-independent factors. |
| double | param_k1 | The BM25+ parameters. |
| double | param_k2 | |
| double | param_k3 | |
| double | param_b | |
| Xapian::doclength | param_min_normlen | The minimum normalised document length value. |
| double | param_delta | Additional parameter delta in the BM25+ formula. |
Additional Inherited Members
Static Public Member Functions inherited from Xapian::Weight
| static const Weight * | create (const std::string &scheme, const Registry &reg=Registry()) | Return the appropriate weighting scheme object. |
Protected Types inherited from Xapian::Weight
| enum | stat_flags { COLLECTION_SIZE = 0 , RSET_SIZE = 0 , AVERAGE_LENGTH = 4 , TERMFREQ = 1 , RELTERMFREQ = 1 , QUERY_LENGTH = 0 , WQF = 0 , WDF = 2 , DOC_LENGTH = 8 , DOC_LENGTH_MIN = 16 , DOC_LENGTH_MAX = 32 , WDF_MAX = 64 , COLLECTION_FREQ = 1 , UNIQUE_TERMS = 128 , TOTAL_LENGTH = 256 , WDF_DOC_MAX = 512 , UNIQUE_TERMS_MIN = 1024 , UNIQUE_TERMS_MAX = 2048 , DB_DOC_LENGTH_MIN = 4096 , DB_DOC_LENGTH_MAX = 8192 , DB_UNIQUE_TERMS_MIN = 16384 , DB_UNIQUE_TERMS_MAX = 32768 , DB_WDF_MAX = 65536 , IS_BOOLWEIGHT_ = static_cast<int>(0x80000000) } | Stats which the weighting scheme can use (see need_stat()). |
Protected Member Functions inherited from Xapian::Weight
| void | need_stat (stat_flags flag) | Tell Xapian that your subclass will want a particular statistic. |
| | Weight (const Weight &) | Don't allow copying. |
| Xapian::doccount | get_collection_size () const | The number of documents in the collection. |
| Xapian::doccount | get_rset_size () const | The number of documents marked as relevant. |
| Xapian::doclength | get_average_length () const | The average length of a document in the collection. |
| Xapian::doccount | get_termfreq () const | The number of documents which this term indexes. |
| Xapian::doccount | get_reltermfreq () const | The number of relevant documents which this term indexes. |
| Xapian::termcount | get_collection_freq () const | The collection frequency of the term. |
| Xapian::termcount | get_query_length () const | The length of the query. |
| Xapian::termcount | get_wqf () const | The within-query-frequency of this term. |
| Xapian::termcount | get_doclength_upper_bound () const | An upper bound on the maximum length of any document in the shard. |
| Xapian::termcount | get_doclength_lower_bound () const | A lower bound on the minimum length of any document in the shard. |
| Xapian::termcount | get_wdf_upper_bound () const | An upper bound on the wdf of this term in the shard. |
| Xapian::totallength | get_total_length () const | Total length of all documents in the collection. |
| Xapian::termcount | get_unique_terms_upper_bound () const | An upper bound on the number of unique terms in any document in the shard. |
| Xapian::termcount | get_unique_terms_lower_bound () const | A lower bound on the number of unique terms in any document in the shard. |
| Xapian::termcount | get_db_doclength_upper_bound () const | An upper bound on the maximum length of any document in the database. |
| Xapian::termcount | get_db_doclength_lower_bound () const | A lower bound on the minimum length of any document in the database. |
| Xapian::termcount | get_db_unique_terms_upper_bound () const | An upper bound on the number of unique terms in any document in the database. |
| Xapian::termcount | get_db_unique_terms_lower_bound () const | A lower bound on the number of unique terms in any document in the database. |
| Xapian::termcount | get_db_wdf_upper_bound () const | An upper bound on the wdf of this term in the database. |
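The stat_flags values listed above are mostly powers of two, which suggests they are combined as bit flags when a subclass calls need_stat(). The sketch below is a hypothetical illustration, not Xapian code: StatMask is an invented accumulator, and only a subset of the enumerator values (copied from the list above) is reproduced. It shows that enumerators whose value is 0 add no bits at all; reading that as "always available" is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Subset of the stat_flags values listed above (IS_BOOLWEIGHT_ omitted).
enum stat_flags : std::uint32_t {
    COLLECTION_SIZE = 0,
    TERMFREQ = 1,
    WDF = 2,
    AVERAGE_LENGTH = 4,
    DOC_LENGTH = 8,
    DOC_LENGTH_MIN = 16,
    WDF_MAX = 64,
    WQF = 0
};

// Hypothetical accumulator in the spirit of need_stat(): each request
// ORs the flag's bits into a mask of statistics the scheme will use.
struct StatMask {
    std::uint32_t bits = 0;

    void need_stat(stat_flags flag) { bits |= flag; }

    // True if every bit of `flag` has been requested. A zero-valued flag
    // has no bits, so this is trivially true for it.
    bool has(stat_flags flag) const {
        return (bits & flag) == static_cast<std::uint32_t>(flag);
    }
};
```

For example, requesting WDF, DOC_LENGTH and COLLECTION_SIZE leaves a mask of 2 | 8 = 10; the COLLECTION_SIZE request changes nothing.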
Xapian::Weight subclass implementing the BM25+ probabilistic formula.
BM25PlusWeight (double k1, double k2, double k3, double b, double min_normlen, double delta) [inline]
Construct a BM25PlusWeight.
| k1 | A non-negative parameter controlling how influential within-document-frequency (wdf) is. k1=0 means that wdf doesn't affect the weights. The larger k1 is, the more wdf influences the weights. (default 1) |
| k2 | A non-negative parameter which controls the strength of a correction factor which depends upon query length and normalised document length. k2=0 disables this factor; larger k2 makes it stronger. The paper which describes BM25+ ignores BM25's term-independent component (so implicitly k2=0), but we support non-zero k2 too. (default 0) |
| k3 | A non-negative parameter controlling how influential within-query-frequency (wqf) is. k3=0 means that wqf doesn't affect the weights. The larger k3 is, the more wqf influences the weights. (default 1) |
| b | A parameter between 0 and 1, controlling how strong the document length normalisation of wdf is. 0 means no normalisation; 1 means full normalisation. (default 0.5) |
| min_normlen | A parameter specifying a minimum value for normalised document length. Normalised document length values less than this will be clamped to this value, helping to prevent very short documents from getting large weights. (default 0.5) |
| delta | A pseudo tf value which controls the scale of the tf lower bound. δ can be tuned, for example over the range 0.0 to 1.5, but BM25+ can still work effectively across collections with a fixed δ = 1.0. (default 1.0) |
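The parameter table above can be summarised as a plain struct. This is a hypothetical sketch (BM25PlusParams and normalised_length are not Xapian names). It records the documented defaults, clamps out-of-range values into the documented domains (an assumption: the real constructor may treat invalid values differently), and shows how min_normlen clamps the normalised document length, taken here to be doclen divided by the average document length.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical parameter bundle holding the documented defaults.
struct BM25PlusParams {
    double k1 = 1.0;
    double k2 = 0.0;
    double k3 = 1.0;
    double b = 0.5;
    double min_normlen = 0.5;
    double delta = 1.0;

    // Assumption: clamp into the documented ranges instead of rejecting.
    void clamp() {
        k1 = std::max(k1, 0.0);
        k2 = std::max(k2, 0.0);
        k3 = std::max(k3, 0.0);
        b = std::min(std::max(b, 0.0), 1.0);
    }
};

// Normalised length = doclen / average length, clamped below by
// min_normlen so very short documents cannot get a tiny value.
double normalised_length(double doclen, double avg_length,
                         double min_normlen) {
    assert(avg_length > 0.0);
    return std::max(doclen / avg_length, min_normlen);
}
```

With the defaults, a document one tenth of the average length gets a normalised length of 0.5 rather than 0.1.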
BM25PlusWeight * clone () const [private, virtual]
Clone this object.
This method allocates and returns a copy of the object it is called on.
If your subclass is called FooWeight and has parameters a and b, then you would implement FooWeight::clone() like so:
FooWeight * FooWeight::clone() const { return new FooWeight(a, b); }
Note that the returned object will be deallocated by Xapian after use with "delete". If you want to handle the deletion in a special way (for example when wrapping the Xapian API for use from another language) then you can define a static operator delete method in your subclass as shown here: https://trac.xapian.org/ticket/554#comment:1
Implements Xapian::Weight.
Definition at line 41 of file bm25plusweight.cc.
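The clone() contract described above (allocate with new, and the caller later deallocates with delete) can be illustrated without Xapian. In this sketch WeightLike and FooWeight are hypothetical stand-ins for Xapian::Weight and a user subclass; the caller takes ownership of the raw pointer with std::unique_ptr so that delete runs exactly once.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a polymorphic weighting base class.
struct WeightLike {
    virtual ~WeightLike() = default;  // needed: copies are deleted via base
    virtual WeightLike* clone() const = 0;
    virtual double param_sum() const = 0;
};

// Hypothetical subclass with parameters a and b, as in the text above.
class FooWeight : public WeightLike {
    double a, b;

public:
    FooWeight(double a_, double b_) : a(a_), b(b_) {}

    // Covariant return type: FooWeight* is allowed where WeightLike* is
    // declared. The new object is owned (and later deleted) by the caller.
    FooWeight* clone() const override { return new FooWeight(a, b); }

    double param_sum() const override { return a + b; }
};
```

Because the copy is released with plain delete through a base-class pointer, the base must have a virtual destructor, which Xapian::Weight provides (see ~Weight() above).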
BM25PlusWeight * create_from_parameters (const char *params) const [virtual]
Create from a human-readable parameter string.
| params | string containing weighting scheme parameter values. |
Reimplemented from Xapian::Weight.
Definition at line 198 of file bm25plusweight.cc.
References Xapian::Weight::Internal::double_param(), p, and parameter_error().
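The exact syntax accepted by create_from_parameters() is not spelled out here. Assuming the parameters are whitespace-separated numbers in constructor order (k1 k2 k3 b min_normlen delta), with omitted trailing values falling back to the documented defaults, a parser could look like the hypothetical parse_bm25plus_params() below. It throws std::invalid_argument for a non-numeric or surplus token, loosely playing the role that parameter_error() plays in the real implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Constructor order: k1, k2, k3, b, min_normlen, delta.
using Params = std::array<double, 6>;

// Hypothetical parser; the accepted syntax is an assumption.
Params parse_bm25plus_params(const char* params) {
    Params out = {1.0, 0.0, 1.0, 0.5, 0.5, 1.0};  // documented defaults
    std::istringstream in(params ? params : "");
    std::string token;
    std::size_t i = 0;
    while (in >> token) {
        if (i == out.size())
            throw std::invalid_argument("too many parameters");
        std::size_t used = 0;
        double value = 0.0;
        try {
            value = std::stod(token, &used);
        } catch (const std::exception&) {
            used = 0;  // std::stod found no number at all
        }
        if (used != token.size())  // reject "x" and also "1.5x"
            throw std::invalid_argument("bad parameter: " + token);
        out[i++] = value;
    }
    return out;
}
```

For instance, "2 0.5" would set k1 = 2 and k2 = 0.5 and keep the defaults for the remaining four parameters.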
double get_maxextra () const [virtual]
Return an upper bound on what get_sumextra() can return for any document.
The default implementation always returns 0 (in Xapian < 2.0.0 this was a pure virtual method).
This information is used by the matcher to perform various optimisations, so strive to make the bound as tight as possible.
Reimplemented from Xapian::Weight.
Definition at line 180 of file bm25plusweight.cc.
double get_maxpart () const [virtual]
Return an upper bound on what get_sumpart() can return for any document.
This information is used by the matcher to perform various optimisations, so strive to make the bound as tight as possible.
Implements Xapian::Weight.
Definition at line 131 of file bm25plusweight.cc.
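In the published BM25+ formula the wdf-dependent factor is (k1 + 1) * wdf / (K + wdf) + δ, with K = k1 * ((1 - b) + b * normlen). It grows with wdf and shrinks as normlen grows, so evaluating it at the largest possible wdf and the smallest possible normalised length gives a valid upper bound. The sketch below assumes that textbook form; the function names and the wdf_upper input are hypothetical, and this is not Xapian's get_maxpart() code (which also multiplies in the document-independent termweight).

```cpp
#include <cassert>

// BM25+ term-frequency factor, assuming K = k1 * ((1 - b) + b * normlen).
// The "+ delta" is the lower bound that distinguishes BM25+ from BM25.
double bm25plus_tf_part(double wdf, double normlen,
                        double k1, double b, double delta) {
    double denom = k1 * ((1.0 - b) + b * normlen) + wdf;
    assert(denom > 0.0);
    return (k1 + 1.0) * wdf / denom + delta;
}

// Upper bound: the factor is non-decreasing in wdf and non-increasing in
// normlen, so use the largest wdf and the smallest admissible normlen.
double bm25plus_tf_upper_bound(double wdf_upper, double min_normlen,
                               double k1, double b, double delta) {
    return bm25plus_tf_part(wdf_upper, min_normlen, k1, b, delta);
}
```

If a tighter lower bound on document length is known (for example from get_doclength_lower_bound()), the normalised value derived from it can replace min_normlen whenever it is larger, which tightens the bound.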
double get_sumextra (Xapian::termcount doclen, Xapian::termcount uniqterms, Xapian::termcount wdfdocmax) const [virtual]
Calculate the term-independent weight component for a document.
The default implementation always returns 0 (in Xapian < 2.0.0 this was a pure virtual method).
The parameters give information about the document which may be used in the calculations:
| doclen | The document's length (unnormalised). You need to call need_stat(DOC_LENGTH) if you use this value. |
| uniqterms | Number of unique terms in the document. You need to call need_stat(UNIQUE_TERMS) if you use this value. |
| wdfdocmax | Maximum wdf value in the document. You need to call need_stat(WDF_DOC_MAX) if you use this value. |
Reimplemented from Xapian::Weight.
Definition at line 170 of file bm25plusweight.cc.
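BM25's term-independent k2 correction is usually written k2 * nq * (1 - L) / (1 + L), where nq is the query length and L the normalised document length, and it can be negative. The hypothetical sketch below uses the shifted form 2 * k2 * nq / (1 + L). Since (1 - L) / (1 + L) = 2 / (1 + L) - 1, the two differ only by the per-query constant k2 * nq, so they rank documents identically while the shifted one stays non-negative. Whether Xapian uses exactly this form is an assumption. Because L is clamped to at least min_normlen, the maximum occurs at L = min_normlen, which is the value get_maxextra() must report.

```cpp
#include <algorithm>
#include <cassert>

// Shifted, non-negative form of BM25's k2 correction (an assumption).
double k2_extra(double k2, double query_length, double doclen,
                double avg_length, double min_normlen) {
    assert(avg_length > 0.0);
    double normlen = std::max(doclen / avg_length, min_normlen);
    return 2.0 * k2 * query_length / (1.0 + normlen);
}

// Upper bound: the smallest admissible normalised length maximises it.
double k2_extra_max(double k2, double query_length, double min_normlen) {
    return 2.0 * k2 * query_length / (1.0 + min_normlen);
}
```

With the default k2 = 0 both functions return 0, consistent with the paper's implicit k2 = 0.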
double get_sumpart (Xapian::termcount wdf, Xapian::termcount doclen, Xapian::termcount uniqterms, Xapian::termcount wdfdocmax) const [virtual]
Calculate the weight contribution for this object's term to a document.
The parameters give information about the document which may be used in the calculations:
| wdf | The within-document frequency of the term in the document. You need to call need_stat(WDF) if you use this value. |
| doclen | The document's length (unnormalised). You need to call need_stat(DOC_LENGTH) if you use this value. |
| uniqterms | Number of unique terms in the document. You need to call need_stat(UNIQUE_TERMS) if you use this value. |
| wdfdocmax | Maximum wdf value in the document. You need to call need_stat(WDF_DOC_MAX) if you use this value. |
You can rely on wdf <= doclen if you call both need_stat(WDF) and need_stat(DOC_LENGTH). This is trivially true for terms, but Xapian also ensures it's true for OP_SYNONYM, where the wdf is approximated.
Implements Xapian::Weight.
Definition at line 115 of file bm25plusweight.cc.
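Putting the pieces together, the published BM25+ per-term score multiplies an idf factor, a k3-controlled query-frequency factor and the lower-bounded term-frequency factor. The standalone sketch below is illustrative only and rests on stated assumptions: idf is taken as log((N + 1) / termfreq), the query factor as (k3 + 1) * wqf / (k3 + wqf), and the init() scaling factor is omitted. TermStats and bm25plus_term_weight are hypothetical names; Xapian precomputes the document-independent parts into termweight and may use a different idf.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical bundle of the statistics this sketch needs.
struct TermStats {
    double collection_size;  // N, number of documents
    double termfreq;         // documents containing the term
    double wqf;              // within-query frequency
    double avg_length;       // average document length
};

double bm25plus_term_weight(double wdf, double doclen, const TermStats& s,
                            double k1 = 1.0, double k3 = 1.0, double b = 0.5,
                            double min_normlen = 0.5, double delta = 1.0) {
    assert(s.termfreq > 0.0 && s.avg_length > 0.0);
    double idf = std::log((s.collection_size + 1.0) / s.termfreq);
    double qtf = (k3 + 1.0) * s.wqf / (k3 + s.wqf);
    double normlen = std::max(doclen / s.avg_length, min_normlen);
    double tf = (k1 + 1.0) * wdf / (k1 * ((1.0 - b) + b * normlen) + wdf);
    // delta guarantees at least idf * qtf * delta for any matching document,
    // however long it is: the "lower bound" that BM25+ adds to BM25.
    return idf * qtf * (tf + delta);
}
```

The delta term is what keeps a very long document that contains the term scoring above idf * qtf * δ, rather than sinking towards zero as it would under plain BM25.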
void init (double factor) [private, virtual]
Allow the subclass to perform any initialisation it needs to.
| factor | Any scaling factor (e.g. from OP_SCALE_WEIGHT). If the Weight object is for the term-independent weight supplied by get_sumextra()/get_maxextra(), then init(0.0) is called (starting from Xapian 1.2.11 and 1.3.1 - earlier versions failed to call init() for such Weight objects). |
Implements Xapian::Weight.
Definition at line 48 of file bm25plusweight.cc.
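init() is where a subclass can precompute per-term constants once instead of on every get_sumpart() call. The hypothetical sketch below mirrors the two private attributes documented above: len_factor is taken as 1 / average length (so doclen * len_factor is the normalised length) and termweight as factor times the document-independent parts of the formula. The formula details (idf and k3 factor) are assumptions; the point illustrated is that init(0.0), which is used for the term-independent Weight object, yields a zero termweight.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical precomputation in the spirit of init(double factor).
struct PrecomputedTerm {
    double len_factor = 0.0;  // multiply doclen by this to get normlen
    double termweight = 0.0;  // product of the document-independent factors

    void init(double factor, double avg_length, double collection_size,
              double termfreq, double wqf, double k3) {
        assert(avg_length > 0.0 && termfreq > 0.0);
        len_factor = 1.0 / avg_length;
        double idf = std::log((collection_size + 1.0) / termfreq);
        double qtf = (k3 + 1.0) * wqf / (k3 + wqf);
        termweight = factor * idf * qtf;  // factor comes from OP_SCALE_WEIGHT etc.
    }
};
```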
std::string name () const [virtual]
Return the name of this weighting scheme, e.g. "bm25+".
This is the name that the weighting scheme gets registered under when passed to Xapian::Registry::register_weighting_scheme().
As a result, this is the name used to select the scheme via Weight::create() and the remote backend.
For 1.4.x and earlier we recommended returning the full namespace-qualified name of your class here, but now we recommend returning just the name in lower case, e.g. "foo" instead of "FooWeight", "bm25+" instead of "Xapian::BM25PlusWeight".
If you don't want to support creation via Weight::create() or the remote backend, you can use the default implementation which simply returns an empty string.
Reimplemented from Xapian::Weight.
Definition at line 81 of file bm25plusweight.cc.
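Registration by name can be pictured as a map from name() to a prototype object that is cloned on lookup. The sketch below is a hypothetical miniature registry (MiniRegistry, Scheme and BM25PlusLike are not Xapian types). It shows why name() should return a short, lower-case string such as "bm25+": that string is the lookup key, and a scheme returning an empty name simply cannot be found.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct Scheme {  // hypothetical weighting-scheme base
    virtual ~Scheme() = default;
    virtual std::string name() const = 0;
    virtual Scheme* clone() const = 0;
};

struct BM25PlusLike : Scheme {  // hypothetical stand-in
    std::string name() const override { return "bm25+"; }
    BM25PlusLike* clone() const override { return new BM25PlusLike(*this); }
};

class MiniRegistry {
    std::map<std::string, std::unique_ptr<Scheme>> by_name;

public:
    // Store a private copy of the prototype under its name().
    void register_scheme(const Scheme& s) {
        std::string key = s.name();
        if (!key.empty())  // an empty name cannot be looked up later
            by_name[key].reset(s.clone());
    }

    // Return a fresh copy of the registered prototype, or null if unknown.
    std::unique_ptr<Scheme> create(const std::string& name) const {
        auto it = by_name.find(name);
        if (it == by_name.end()) return nullptr;
        return std::unique_ptr<Scheme>(it->second->clone());
    }
};
```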
std::string serialise () const [virtual]
Return this object's parameters serialised as a single string.
If you don't want to support the remote backend, you can use the default implementation which simply throws Xapian::UnimplementedError.
Reimplemented from Xapian::Weight.
Definition at line 87 of file bm25plusweight.cc.
References serialise_double().
BM25PlusWeight * unserialise (const std::string &serialised) const [virtual]
Unserialise parameters.
This method unserialises parameters serialised by the serialise() method and allocates and returns a new object initialised with them.
If you don't want to support the remote backend, you can use the default implementation which simply throws Xapian::UnimplementedError.
Note that the returned object will be deallocated by Xapian after use with "delete". If you want to handle the deletion in a special way (for example when wrapping the Xapian API for use from another language) then you can define a static operator delete method in your subclass as shown here: https://trac.xapian.org/ticket/554#comment:1
| serialised | A string containing the serialised parameters. |
Reimplemented from Xapian::Weight.
Definition at line 99 of file bm25plusweight.cc.
References rare, and unserialise_double().
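serialise() and unserialise() must round-trip the six parameters exactly. The sketch below is a hypothetical stand-in for that pair. It packs each double's raw bytes with std::memcpy, which is not Xapian's serialise_double() encoding (that format is not described here) and is not portable between machines with different double layouts. It also rejects input of the wrong length, as a real unserialise() must reject malformed data.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

using Params = std::array<double, 6>;  // k1, k2, k3, b, min_normlen, delta

// Hypothetical encoding: the raw bytes of each double, concatenated.
std::string serialise_params(const Params& p) {
    std::string out;
    for (double v : p) {
        char buf[sizeof(double)];
        std::memcpy(buf, &v, sizeof v);
        out.append(buf, sizeof buf);
    }
    return out;
}

Params unserialise_params(const std::string& s) {
    if (s.size() != 6 * sizeof(double))
        throw std::runtime_error("bad serialised BM25+ parameters");
    Params p;
    for (std::size_t i = 0; i < p.size(); ++i)
        std::memcpy(&p[i], s.data() + i * sizeof(double), sizeof(double));
    return p;
}
```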