PL2Weight::PL2Weight(double c) : param_c(c)
double base_change(1.0 / log(2.0));
P1 = mean * base_change + 0.5 * log2(2.0 * M_PI);
P2 = log2(mean) + base_change;
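The two per-term constants come from the Poisson model behind PL2, and `base_change` is 1 / ln 2, i.e. log2(e). The sketch below is minimal and rests on one assumption: the excerpt above does not define `mean`, so the sketch takes it to be the term's collection frequency divided by the number of documents, as in the standard PL2 formulation. `PL2Constants` and `compute_pl2_constants` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical holder for the two per-term PL2 constants.
struct PL2Constants {
    double P1;
    double P2;
};

// Sketch: compute P1 and P2 for one term.
// ASSUMPTION: mean = collection_freq / collection_size (Poisson lambda).
PL2Constants compute_pl2_constants(double collection_freq,
                                   double collection_size) {
    assert(collection_freq > 0 && collection_size > 0);
    const double pi = std::acos(-1.0);               // M_PI is not standard C++
    const double base_change = 1.0 / std::log(2.0);  // 1 / ln 2 == log2(e)
    const double mean = collection_freq / collection_size;
    PL2Constants k;
    k.P1 = mean * base_change + 0.5 * std::log2(2.0 * pi);
    k.P2 = std::log2(mean) + base_change;
    return k;
}
```

Take `compute_pl2_constants(100.0, 100.0)` as an example. It gives mean = 1, so P2 reduces to log2(e) because log2(1) = 0. A rare term such as `compute_pl2_constants(1.0, 1000.0)` gives a strongly negative P2, which makes P1 + P2 negative. That sign is what the upper-bound code below tests.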
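double P_max2a = (wdfn_upper + 0.5) * log2(wdfn_upper) / (wdfn_upper + 1.0);
// (P1 - P2 * wdfn) / (wdfn + 1) has derivative -(P1 + P2) / (wdfn + 1)^2, so it
// is maximised at wdfn_lower when P1 + P2 > 0 and at wdfn_upper otherwise.
double wdfn_optb = P1 + P2 > 0 ? wdfn_lower : wdfn_upper;
double P_max2b = (P1 - P2 * wdfn_optb) / (wdfn_optb + 1.0);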
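The bound splits P / (wdfn + 1) into two parts and maximises each one separately:

- (a) (wdfn + 0.5) log2(wdfn) / (wdfn + 1). This part increases with wdfn for wdfn > 0.
- (b) (P1 - P2 wdfn) / (wdfn + 1). Its derivative is -(P1 + P2) / (wdfn + 1)^2, so it peaks at one end of the range.

The sum of the two maxima is therefore at least the maximum of the sum. The sketch below is minimal. It treats `wdfn_lower` and `wdfn_upper` as given inputs, because the excerpt does not show how they are derived. It also assumes that a negative bound may be raised to zero, since `get_sumpart()` never returns a negative value. `pl2_term` and `pl2_upper_bound` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// P / (wdfn + 1) for a single normalised wdf value (before clamping/scaling).
double pl2_term(double P1, double P2, double wdfn) {
    double P = P1 + (wdfn + 0.5) * std::log2(wdfn) - P2 * wdfn;
    return P / (wdfn + 1.0);
}

// Sketch of an upper bound on pl2_term() over [wdfn_lower, wdfn_upper],
// obtained by maximising parts (a) and (b) independently.
double pl2_upper_bound(double P1, double P2,
                       double wdfn_lower, double wdfn_upper) {
    assert(0.0 < wdfn_lower && wdfn_lower <= wdfn_upper);
    // (a) is increasing in wdfn, so its maximum is at wdfn_upper.
    double max_a = (wdfn_upper + 0.5) * std::log2(wdfn_upper) /
                   (wdfn_upper + 1.0);
    // (b) decreases when P1 + P2 > 0 and increases otherwise.
    double wdfn_optb = P1 + P2 > 0 ? wdfn_lower : wdfn_upper;
    double max_b = (P1 - P2 * wdfn_optb) / (wdfn_optb + 1.0);
    // Weights are clamped at zero, so a negative bound can be raised to 0.
    return std::max(0.0, max_a + max_b);
}
```

If the endpoint choice for (b) is swapped, the code evaluates (b) at its minimum rather than its maximum. The resulting "bound" can then fall below real weights.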
const char *ptr = s.data();
const char *end = ptr + s.size();
if (rare(ptr != end))
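The `ptr != end` check rejects a serialised string that still has bytes left over once the parameter has been decoded. The sketch below shows that decode-then-check pattern under one assumption: it uses a hypothetical fixed-width encoding, the raw 8 bytes of the double copied with `std::memcpy`. This is not the format produced by Xapian's `serialise_double()`. `serialise_c`, `decode_c` and `unserialise_c` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical encoding: the raw bytes of the double (NOT Xapian's format).
std::string serialise_c(double c) {
    std::string out(sizeof(double), '\0');
    std::memcpy(&out[0], &c, sizeof(double));
    return out;
}

// Decode one double starting at *p and advance *p past it.
double decode_c(const char **p, const char *end) {
    assert(p != nullptr && *p != nullptr);
    if (end - *p < static_cast<std::ptrdiff_t>(sizeof(double)))
        throw std::runtime_error("Not enough data for parameter");
    double c;
    std::memcpy(&c, *p, sizeof(double));
    *p += sizeof(double);
    return c;
}

// Same structure as unserialise(): decode, then reject any leftover bytes.
double unserialise_c(const std::string &s) {
    const char *ptr = s.data();
    const char *end = ptr + s.size();
    double c = decode_c(&ptr, end);
    if (ptr != end)
        throw std::runtime_error("Extra data after parameter");
    return c;
}
```

`unserialise_c(serialise_c(1.5))` round-trips to 1.5. Appending even one byte to the serialised string makes `unserialise_c()` throw.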
if (wdf == 0) return 0.0;
double wdfn = wdf * log2(1 + cl / len);
double P = P1 + (wdfn + 0.5) * log2(wdfn) - P2 * wdfn;
if (rare(P <= 0)) return 0.0;
return factor * P / (wdfn + 1.0);
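Taken together, the per-document contribution works in four steps:

1. Normalise the wdf against the document length.
2. Evaluate the PL2 information content P.
3. Clamp non-positive P to zero.
4. Scale by `factor` and divide by wdfn + 1.

The sketch below is a minimal standalone version of those steps. `pl2_sumpart` is a hypothetical free function. It takes the values that `init()` stores (`factor`, `cl`, `P1`, `P2`) as explicit arguments instead of reading them from members.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical free-function version of the sumpart computation.
//   factor : multiplier applied to the final weight
//   cl     : param_c * average document length
//   P1, P2 : per-term constants computed in init()
double pl2_sumpart(unsigned wdf, unsigned len,
                   double factor, double cl, double P1, double P2) {
    if (wdf == 0) return 0.0;   // term absent: no contribution
    assert(len > 0);            // a document containing the term is non-empty
    // Normalise wdf by log2(1 + cl / len).
    double wdfn = wdf * std::log2(1 + cl / len);
    double P = P1 + (wdfn + 0.5) * std::log2(wdfn) - P2 * wdfn;
    if (P <= 0) return 0.0;     // never return a negative weight
    return factor * P / (wdfn + 1.0);
}
```

Here is a worked case. With cl = len = 1 and wdf = 1, wdfn = log2(2) = 1 and the log2(wdfn) term vanishes. Setting P1 = 3 and P2 = 1 then gives P = 2, and with factor = 2 the weight is 2 * 2 / 2 = 2.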
const char *p = params;
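`create_from_parameters()` builds a `PL2Weight` from a human-readable parameter string. The helpers `double_param()` and `parameter_error()` listed on this page suggest the usual shape: parse one number, then reject anything left over. The sketch below shows that shape using `std::strtod`, and several parts of it are assumptions rather than Xapian's behaviour:

- the name `parse_pl2_params`;
- returning 1.0 for an empty string;
- requiring c to be finite and positive (so that log2(1 + cl / len) stays positive);
- throwing `std::invalid_argument`.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical parser for a PL2 parameter string such as "1.5".
// ASSUMPTIONS: empty string -> c = 1.0; c must be finite and positive.
double parse_pl2_params(const char *params) {
    assert(params != nullptr);
    if (*params == '\0') return 1.0;
    const char *p = params;
    char *endp = nullptr;
    errno = 0;
    double c = std::strtod(p, &endp);
    if (endp == p || errno == ERANGE)
        throw std::invalid_argument(std::string("Parameter is invalid: ") + params);
    if (*endp != '\0')
        throw std::invalid_argument(std::string("Extra data after parameter: ") + params);
    if (!(std::isfinite(c) && c > 0))
        throw std::invalid_argument("Parameter c must be finite and positive");
    return c;
}
```

With this sketch, `parse_pl2_params("2.5")` returns 2.5. Inputs such as `"2.5x"`, `"abc"` or `"-1"` are rejected.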
InvalidArgumentError indicates an invalid parameter value was passed to the API.
This class implements the PL2 weighting scheme.
PL2Weight * unserialise(const std::string &serialised) const
Unserialise parameters.
double get_sumpart(Xapian::termcount wdf, Xapian::termcount doclen, Xapian::termcount uniqterms, Xapian::termcount wdfdocmax) const
Calculate the weight contribution for this object's term to a document.
double upper_bound
The upper bound on the weight.
void init(double factor_)
Allow the subclass to perform any initialisation it needs to.
double param_c
The wdf normalization parameter in the formula.
PL2Weight * clone() const
Clone this object.
std::string serialise() const
Return this object's parameters serialised as a single string.
std::string name() const
Return the name of this weighting scheme.
double cl
Set by init() to param_c * get_average_length().
double P1
Constants for a given term in a given query.
double factor
The factor to multiply weights by.
double get_maxpart() const
Return an upper bound on what get_sumpart() can return for any document.
PL2Weight * create_from_parameters(const char *params) const
Create from a human-readable parameter string.
Indicates an error in the std::string serialisation of an object.
static void parameter_error(const char *msg, const std::string &scheme, const char *params)
static bool double_param(const char **p, double *ptr_val)
Xapian::termcount get_doclength_lower_bound() const
A lower bound on the minimum length of any document in the shard.
void need_stat(stat_flags flag)
Tell Xapian that your subclass will want a particular statistic.
Xapian::termcount get_wqf() const
The within-query-frequency of this term.
Xapian::termcount get_collection_freq() const
The collection frequency of the term.
Xapian::doccount get_collection_size() const
The number of documents in the collection.
Xapian::doclength get_average_length() const
The average length of a document in the collection.
Xapian::termcount get_doclength_upper_bound() const
An upper bound on the maximum length of any document in the shard.
AVERAGE_LENGTH
Average length of documents in the collection.
DOC_LENGTH_MAX
Upper bound on document lengths.
DOC_LENGTH
Length of the current document (sum wdf).
WQF
Within-query-frequency of the current term.
COLLECTION_SIZE
Number of documents in the collection.
WDF_MAX
Upper bound on wdf.
DOC_LENGTH_MIN
Lower bound on (non-zero) document lengths.
COLLECTION_FREQ
Sum of wdf over the whole collection for the current term.
WDF
Within-document-frequency of the current term in the current document.
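`need_stat()` lets a weighting scheme declare which of these statistics it uses. The flags listed above correspond to statistics referenced elsewhere on this page, such as `get_average_length()`, `get_collection_freq()` and `get_wdf_upper_bound()`. The sketch below shows one way such requests can be accumulated as a bitmask. The `Stat` enum values, the `StatRequests` struct and `pl2_stat_requests()` are hypothetical stand-ins, not Xapian's `stat_flags` definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical bit values; Xapian's real stat_flags values may differ.
enum Stat : std::uint32_t {
    COLLECTION_SIZE = 1u << 0,
    AVERAGE_LENGTH  = 1u << 1,
    COLLECTION_FREQ = 1u << 2,
    WDF             = 1u << 3,
    WQF             = 1u << 4,
    WDF_MAX         = 1u << 5,
    DOC_LENGTH      = 1u << 6,
    DOC_LENGTH_MIN  = 1u << 7,
    DOC_LENGTH_MAX  = 1u << 8,
};

// Accumulates the statistics a scheme has asked for.
struct StatRequests {
    std::uint32_t flags = 0;
    void need_stat(Stat flag) { flags |= flag; }
    bool wants(Stat flag) const { return (flags & flag) != 0; }
};

// Sketch: request every statistic in the list above.
StatRequests pl2_stat_requests() {
    StatRequests r;
    for (Stat s : {AVERAGE_LENGTH, DOC_LENGTH, DOC_LENGTH_MIN, DOC_LENGTH_MAX,
                   COLLECTION_SIZE, COLLECTION_FREQ, WDF, WDF_MAX, WQF})
        r.need_stat(s);
    assert(r.flags == 0x1FFu);  // all nine hypothetical bits set
    return r;
}
```

A bitmask keeps the request set cheap to store and test. The caller can then skip computing any statistic whose bit is unset.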
Xapian::termcount get_wdf_upper_bound() const
An upper bound on the wdf of this term in the shard.
Hierarchy of classes which Xapian can throw as exceptions.
The Xapian namespace contains public interfaces for the Xapian library.
unsigned XAPIAN_TERMCOUNT_BASE_TYPE termcount
A count of terms.
static void parameter_error(const char *message, const char *params)
string serialise_double(double v)
Serialise a double to a string.
double unserialise_double(const char **p, const char *end)
Unserialise a double serialised by serialise_double.
Functions to serialise and unserialise a double.
Xapian::Weight::Internal class, holding database and term statistics.