xapian-core  2.0.0
Xapian::KMeans Class Reference

KMeans clusterer: implements the K-Means clustering algorithm.

#include <cluster.h>


Public Member Functions

 KMeans (unsigned int k_, unsigned int max_iters_=0)
 Constructor specifying number of clusters and maximum iterations.
 
ClusterSet cluster (const MSet &mset) override
 Implements the KMeans clustering algorithm.
 
void set_stopper (const Xapian::Stopper *stop=NULL)
 Set the Xapian::Stopper object to be used for identifying stopwords.
 
std::string get_description () const override
 Return a string describing this object.
 
Public Member Functions inherited from Xapian::Clusterer
virtual ~Clusterer ()
 Destructor.
 
Clusterer * release ()
 Start reference counting this object.
 
const Clusterer * release () const
 Start reference counting this object.
 
Public Member Functions inherited from Xapian::Internal::opt_intrusive_base
 opt_intrusive_base (const opt_intrusive_base &)
 
opt_intrusive_base & operator= (const opt_intrusive_base &)
 
 opt_intrusive_base ()
 Construct object which is initially not reference counted.
 
virtual ~opt_intrusive_base ()
 
void ref () const
 
void unref () const
 

Private Member Functions

void initialise_clusters (ClusterSet &cset, Xapian::doccount num_of_points)
 Initialise 'k' clusters by selecting 'k' centroids and assigning them to different clusters.
 
void initialise_points (const MSet &source)
 Initialise the Points to be fed into the Clusterer with the MSet object 'source'.
 

Private Attributes

std::vector< Point > points
 Contains the initialised points that are to be clustered.
 
unsigned int k
 Specifies that the clusterer needs to form 'k' clusters.
 
unsigned int max_iters
 Specifies the maximum number of iterations that KMeans will run.
 
Xapian::Internal::opt_intrusive_ptr< const Xapian::Stopper > stopper
 Pointer to stopper object for identifying stopwords.
 

Additional Inherited Members

Public Attributes inherited from Xapian::Internal::opt_intrusive_base
unsigned _refs
 Reference count.
 
Protected Member Functions inherited from Xapian::Internal::opt_intrusive_base
void release () const
 Start reference counting.
 

Detailed Description

KMeans clusterer: implements the K-Means clustering algorithm.

Definition at line 596 of file cluster.h.

Constructor & Destructor Documentation

◆ KMeans()

Xapian::KMeans::KMeans ( unsigned int  k_,
unsigned int  max_iters_ = 0 
)
explicit

Constructor specifying number of clusters and maximum iterations.

Parameters
k_          Number of required clusters
max_iters_  The maximum number of iterations for which KMeans will run if it doesn't converge

Member Function Documentation

◆ cluster()

ClusterSet Xapian::KMeans::cluster ( const MSet & mset)
override virtual

Implements the KMeans clustering algorithm.

Parameters
mset  MSet object containing the documents that are to be clustered

Implements Xapian::Clusterer.
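
As a rough illustration of what a K-Means pass involves, here is a minimal, self-contained sketch in plain C++. It is not Xapian's implementation: it works on dense `std::vector<double>` points with squared Euclidean distance and seeds the centroids from the first `k` points, all of which are assumptions made for brevity. The names `Vec`, `sq_dist` and `kmeans_sketch` are hypothetical and do not appear in cluster.h.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical dense point type; Xapian's own Point class is not used here.
using Vec = std::vector<double>;

// Squared Euclidean distance between two vectors of equal length.
double sq_dist(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

// Returns, for each point, the index of the cluster it ends up in.
// Assumes 0 < k <= points.size() and max_iters > 0.
std::vector<std::size_t> kmeans_sketch(const std::vector<Vec>& points,
                                       unsigned k, unsigned max_iters) {
    assert(k > 0 && k <= points.size() && max_iters > 0);
    // Seed centroids with the first k points (an assumption of this sketch).
    std::vector<Vec> centroids(points.begin(), points.begin() + k);
    std::vector<std::size_t> assign(points.size(), 0);

    for (unsigned iter = 0; iter < max_iters; ++iter) {
        bool changed = false;
        // Assignment step: attach each point to its nearest centroid.
        for (std::size_t p = 0; p < points.size(); ++p) {
            std::size_t best = 0;
            double best_d = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < k; ++c) {
                double d = sq_dist(points[p], centroids[c]);
                if (d < best_d) { best_d = d; best = c; }
            }
            if (assign[p] != best) { assign[p] = best; changed = true; }
        }
        // Converged: no point moved since the centroids were last updated.
        if (!changed && iter > 0) break;
        // Update step: move each centroid to the mean of its members.
        for (std::size_t c = 0; c < k; ++c) {
            Vec sum(points[0].size(), 0.0);
            std::size_t n = 0;
            for (std::size_t p = 0; p < points.size(); ++p) {
                if (assign[p] != c) continue;
                for (std::size_t i = 0; i < sum.size(); ++i) sum[i] += points[p][i];
                ++n;
            }
            if (n == 0) continue;  // Leave an empty cluster's centroid as is.
            for (double& s : sum) s /= static_cast<double>(n);
            centroids[c] = sum;
        }
    }
    return assign;
}
```

The loop stops either when an iteration leaves every assignment unchanged or when `max_iters` iterations have run, which mirrors the role the constructor documentation gives to `max_iters_`.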

◆ get_description()

std::string Xapian::KMeans::get_description ( ) const
override virtual

Return a string describing this object.

Implements Xapian::Clusterer.

◆ initialise_clusters()

void Xapian::KMeans::initialise_clusters ( ClusterSet & cset,
Xapian::doccount  num_of_points 
)
private

Initialise 'k' clusters by selecting 'k' centroids and assigning them to different clusters.

Parameters
cset           ClusterSet object to be initialised by assigning centroids to each cluster
num_of_points  Number of points passed to clusterer
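
This page does not say how the 'k' initial centroids are chosen. One plausible strategy, sketched below purely as an assumption, is to take points at roughly even intervals across the input so the seeds are distinct and spread out. The helper name `pick_initial_centroids` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pick k distinct indices spread at roughly even intervals over
// [0, num_of_points). Hypothetical helper; the selection strategy is an
// assumption of this sketch, not something the reference page specifies.
std::vector<std::size_t> pick_initial_centroids(std::size_t k,
                                                std::size_t num_of_points) {
    assert(k > 0 && k <= num_of_points);
    std::vector<std::size_t> chosen;
    chosen.reserve(k);
    for (std::size_t i = 0; i < k; ++i) {
        // Integer arithmetic: index i maps to the start of the i-th of k
        // equal-width slices, so indices strictly increase when k <= n.
        chosen.push_back(i * num_of_points / k);
    }
    return chosen;
}
```

Each returned index would then identify the point used as the starting centroid of one cluster.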

◆ initialise_points()

void Xapian::KMeans::initialise_points ( const MSet & source)
private

Initialise the Points to be fed into the Clusterer with the MSet object 'source'.

The TF-IDF weights for the documents are calculated and stored within the Points to be used later during distance calculations.

Parameters
source  MSet object containing the documents which will be used to create document vectors that are represented as Point objects
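
To make the idea of TF-IDF document vectors concrete, the following sketch computes one weight map per document from plain term lists. The weighting formula used here (raw term count multiplied by the natural log of N/df) is an assumption chosen for simplicity; this page does not state which TF-IDF variant Xapian applies. The names `TermList`, `WeightMap` and `tfidf_sketch` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using TermList = std::vector<std::string>;
using WeightMap = std::map<std::string, double>;

// Compute one TF-IDF weight map per document.
// Weighting used here (an assumption): weight = tf * ln(N / df), where tf is
// the raw count of the term in the document, N the number of documents and
// df the number of documents containing the term.
std::vector<WeightMap> tfidf_sketch(const std::vector<TermList>& docs) {
    const double n_docs = static_cast<double>(docs.size());
    std::vector<WeightMap> tf(docs.size());
    std::map<std::string, double> df;
    for (std::size_t d = 0; d < docs.size(); ++d) {
        for (const std::string& term : docs[d]) tf[d][term] += 1.0;
        // Each distinct term in this document adds one to its df.
        for (const auto& entry : tf[d]) df[entry.first] += 1.0;
    }
    // Scale every term count by its inverse document frequency.
    for (WeightMap& weights : tf) {
        for (auto& entry : weights) {
            entry.second *= std::log(n_docs / df[entry.first]);
        }
    }
    return tf;
}
```

Under this weighting, a term that occurs in every document gets weight zero, so it contributes nothing to later distance calculations.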

◆ set_stopper()

void Xapian::KMeans::set_stopper ( const Xapian::Stopper * stop = NULL)
inline

Set the Xapian::Stopper object to be used for identifying stopwords.

Stopwords are discarded while calculating term frequency for terms.

Parameters
stop  The Stopper object to set (default NULL, which means no stopwords)

Definition at line 651 of file cluster.h.
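
The effect of a stopper on term-frequency counting can be sketched without Xapian as follows. A plain predicate stands in for the Stopper object, and an empty predicate plays the role of the NULL default. `StopPredicate` and `term_frequencies` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Stand-in for a stopper: returns true when a term is a stopword.
using StopPredicate = std::function<bool(const std::string&)>;

// Count term frequencies, skipping stopwords. An empty predicate plays the
// role of the NULL default: no term is treated as a stopword.
std::map<std::string, unsigned> term_frequencies(
        const std::vector<std::string>& terms,
        const StopPredicate& is_stopword = nullptr) {
    std::map<std::string, unsigned> tf;
    for (const std::string& term : terms) {
        if (is_stopword && is_stopword(term)) continue;
        ++tf[term];
    }
    return tf;
}
```

Calling `term_frequencies(terms)` counts every term, while passing a predicate that flags, say, "the" leaves that term out of the resulting map entirely.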

Member Data Documentation

◆ k

unsigned int Xapian::KMeans::k
private

Specifies that the clusterer needs to form 'k' clusters.

Definition at line 601 of file cluster.h.

◆ max_iters

unsigned int Xapian::KMeans::max_iters
private

Specifies the maximum number of iterations that KMeans will run.

Definition at line 604 of file cluster.h.

◆ points

std::vector<Point> Xapian::KMeans::points
private

Contains the initialised points that are to be clustered.

Definition at line 598 of file cluster.h.

◆ stopper

Xapian::Internal::opt_intrusive_ptr<const Xapian::Stopper> Xapian::KMeans::stopper
private

Pointer to stopper object for identifying stopwords.

Definition at line 607 of file cluster.h.


The documentation for this class was generated from the following file: