#ifndef XAPIAN_INCLUDED_EXPANDWEIGHT_H
#define XAPIAN_INCLUDED_EXPANDWEIGHT_H

if (wdf == 0) wdf = 1;

if (shard_index >= dbs_seen.size()) {

bool use_exact_termfreq_,
bool want_collection_freq_,
double expand_k_ = 0.0)

bool use_exact_termfreq_,

bool use_exact_termfreq_)
An indexed database of documents.
This class implements the Bo1 scheme for query expansion.
double get_weight() const
Calculate the weight.
Bo1EWeight(const Xapian::Database &db_, Xapian::doccount rsize_, bool use_exact_termfreq_)
Constructor.
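As an illustration of what a Bo1 weight looks like, the sketch below uses the standard Bo1 formula from the divergence-from-randomness literature: `w = tf_x * log2((1 + P_n) / P_n) + log2(1 + P_n)`, with `P_n = F / N`. It is not taken from this header. The free function `bo1_weight` and its parameter names are hypothetical, and mapping `tf_x` to `rcollection_freq`, `F` to the collection frequency, and `N` to the database size is an assumption based on the member descriptions in this file.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of Xapian): the standard Bo1 term weight.
//   rcollection_freq : occurrences of the term in the RSet (tf_x)
//   collection_freq  : occurrences of the term in the whole collection (F)
//   dbsize           : number of documents in the collection (N)
double bo1_weight(double rcollection_freq,
                  double collection_freq,
                  double dbsize)
{
    // The formula divides by the mean, so both inputs must be positive.
    assert(collection_freq > 0.0 && dbsize > 0.0);

    // P_n: mean number of occurrences of the term per document.
    const double mean = collection_freq / dbsize;

    return rcollection_freq * std::log2((1.0 + mean) / mean)
           + std::log2(1.0 + mean);
}
```

A term that is frequent in the RSet but rare in the collection (small `mean`) gets a large weight, which is why the scheme needs collection frequency rather than only document counts.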
Collates statistics while calculating term weight in an ESet.
Xapian::doclength avlen
Average document length in the whole database.
Xapian::doccount termfreq
Term frequency (for a multidb, may be for a subset of the databases).
ExpandStats(Xapian::doclength avlen_, double expand_k_=0.0)
Constructor.
void accumulate(size_t shard_index, Xapian::termcount wdf, Xapian::termcount doclen, Xapian::doccount subtf, Xapian::doccount subdbsize)
double get_average_length() const
Return the average document length in the database.
void clear_stats()
Reset for the next term.
Xapian::doccount rtermfreq
The number of documents from the RSet indexed by the current term (r).
double multiplier
The multiplier to be used in probabilistic query expansion.
Xapian::doccount dbsize
Size of the subset of a multidb to which the value in termfreq applies.
Xapian::termcount rcollection_freq
The number of times the term occurs in the rset.
double expand_k
The parameter k to be used for probabilistic query expansion.
std::vector< bool > dbs_seen
Which databases in a multidb are included in termfreq.
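The members above suggest how per-term statistics might be accumulated across shards. The following is a minimal sketch, not the header's actual code: `ExpandStatsSketch` is a hypothetical stand-in that uses plain `unsigned` and `double` in place of the Xapian typedefs. The `wdf == 0` adjustment and the `shard_index >= dbs_seen.size()` check come from the fragments in this file. The `multiplier` update and the resize-then-mark handling of `dbs_seen` are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ExpandStats, compilable without Xapian.
struct ExpandStatsSketch {
    double avlen;                   // average document length
    double expand_k;                // parameter k
    unsigned termfreq = 0;          // term frequency over the shards seen
    unsigned dbsize = 0;            // size of the shards counted in termfreq
    unsigned rtermfreq = 0;         // RSet documents indexed by the term
    unsigned rcollection_freq = 0;  // occurrences of the term in the RSet
    double multiplier = 0.0;
    std::vector<bool> dbs_seen;     // shards already counted in termfreq

    explicit ExpandStatsSketch(double avlen_, double expand_k_ = 0.0)
        : avlen(avlen_), expand_k(expand_k_) {}

    void accumulate(std::size_t shard_index, unsigned wdf, unsigned doclen,
                    unsigned subtf, unsigned subdbsize)
    {
        assert(avlen > 0.0);
        // Treat a zero wdf as 1 so the term still contributes.
        if (wdf == 0) wdf = 1;
        ++rtermfreq;
        rcollection_freq += wdf;
        // Assumption: a length-normalised, BM25-style saturation term.
        multiplier += (expand_k + 1) * wdf / (expand_k * doclen / avlen + wdf);
        // Grow the bitmap if this shard index has not been reached yet.
        if (shard_index >= dbs_seen.size()) {
            dbs_seen.resize(shard_index + 1, false);
        }
        // Count each shard's termfreq and size only once per term.
        if (!dbs_seen[shard_index]) {
            dbs_seen[shard_index] = true;
            termfreq += subtf;
            dbsize += subdbsize;
        }
    }

    // Reset for the next term.
    void clear_stats()
    {
        dbs_seen.clear();
        termfreq = dbsize = rtermfreq = rcollection_freq = 0;
        multiplier = 0.0;
    }
};
```

Recording which shards have been seen keeps `termfreq` and `dbsize` consistent with each other: when only some shards contribute documents to the RSet, both values describe the same subset of the multidb.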
Class for calculating ESet term weights.
void collect_stats(TermList *merger, const std::string &term)
Get the term statistics.
Xapian::doccount dbsize
The number of documents in the whole database.
Xapian::totallength get_collection_len() const
Return the length of the collection.
bool use_exact_termfreq
Should we calculate the exact term frequency when generating an ESet?
Xapian::doccount get_rsize() const
Return the number of documents in the RSet.
double get_average_length() const
Return the average length of the database.
virtual double get_weight() const =0
Calculate the weight.
ExpandWeight(const Xapian::Database &db_, Xapian::doccount rsize_, bool use_exact_termfreq_, bool want_collection_freq_, double expand_k_=0.0)
Constructor.
const Xapian::Database db
The combined database.
bool want_collection_freq
Does the expansion scheme use collection frequency?
Xapian::doccount get_dbsize() const
Return the size of the database.
Xapian::totallength collection_len
The total length of the database.
Xapian::termcount collection_freq
The collection frequency of the term.
Xapian::termcount get_collection_freq() const
Return the collection frequency of the term.
ExpandStats stats
ExpandStats object to accumulate statistics.
Xapian::doccount rsize
The number of documents in the RSet.
This class implements the probabilistic scheme for query expansion.
double get_weight() const
Calculate the weight.
ProbEWeight(const Xapian::Database &db_, Xapian::doccount rsize_, bool use_exact_termfreq_, double expand_k_)
Constructor.
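For orientation, a probabilistic expansion weight is commonly built on the Robertson/Sparck Jones relevance weight. The sketch below assumes that form and scales it by the accumulated `multiplier`. The listing here does not show the body of `get_weight()`, so both the formula and the hypothetical function `prob_expand_weight` are assumptions, not the library's confirmed implementation.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of Xapian): relevance weight with the
// usual 0.5 smoothing, scaled by a precomputed multiplier.
//   N : documents in the database      (dbsize)
//   R : documents in the RSet          (rsize)
//   n : documents indexed by the term  (termfreq)
//   r : RSet documents indexed by it   (rtermfreq)
double prob_expand_weight(double N, double R, double n, double r,
                          double multiplier)
{
    // Sanity checks: the counts must describe a consistent contingency table.
    assert(r <= n && r <= R && n <= N);
    assert(N - R - n + r >= 0.0);

    const double num = (r + 0.5) * (N - R - n + r + 0.5);
    const double den = (n - r + 0.5) * (R - r + 0.5);
    return multiplier * std::log(num / den);
}
```

With the other counts held fixed, the weight grows as `r` approaches `n`, so terms concentrated in the relevant documents rank highest in the ESet.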
Abstract base class for termlists.
The Xapian namespace contains public interfaces for the Xapian library.
unsigned XAPIAN_TERMCOUNT_BASE_TYPE termcount
A count of terms.
double doclength
A normalised document length.
unsigned XAPIAN_DOCID_BASE_TYPE doccount
A count of documents.
XAPIAN_TOTALLENGTH_TYPE totallength
The total length of all documents in a database.