NODElib Documentation

By Gary William Flake

NAME

kmeans.h - perform k-means clustering on a DATASET

SYNOPSIS

The routines in this module will compute clusters of a DATASET with the k-means clustering algorithm. The resulting clusters are returned in another DATASET.

#include <kmeans.h>

DATASET *kmeans(DATASET *set, unsigned nclus, double minfrac,
    unsigned maxiters, int clusinit);

DATASET *kmeans_online(DATASET *set, unsigned nclus,
    unsigned maxiters, int clusinit);

DESCRIPTION

Function Definitions

The following function prototypes are given in the header file kmeans.h.

DATASET *kmeans(DATASET *set, unsigned nclus, double minfrac,
    unsigned maxiters, int clusinit);

This routine takes all of the points in set and attempts to group them into nclus clusters via the k-means clustering algorithm. Termination is determined by minfrac and maxiters: minfrac is a tolerance on the fractional decrease in the distortion over one iteration, and maxiters is the maximum number of iterations. If minfrac is less than zero, the routine runs for maxiters iterations; if maxiters is zero, the routine considers only minfrac. It is an error to set minfrac less than zero with maxiters set to zero. If clusinit is zero, the clusters are initialized to random exemplars. If clusinit is one, the clusters are initialized to the means of a random partitioning of the entire data set. If clusinit is two, the clusters are initialized to the means of a sequential partitioning of the entire data set. The resulting clusters are returned in a new DATASET. If any error occurs, NULL is returned.
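The interaction between minfrac and maxiters can be sketched as a small stand-alone predicate. This is only an illustration under stated assumptions, not NODElib's implementation: the helper name kmeans_should_stop is hypothetical, and the fractional decrease is assumed to be (previous - current) / previous, which the text above does not spell out.

```c
#include <assert.h>

/* Hypothetical helper, NOT part of NODElib.
 * Decides whether a batch k-means loop should stop after iteration
 * `iter` (counted from 1), given the distortion before and after
 * that iteration.
 * Returns 1 to stop, 0 to continue, and -1 for the invalid
 * combination minfrac < 0 with maxiters == 0. */
int kmeans_should_stop(double prev_dist, double curr_dist,
                       double minfrac, unsigned maxiters, unsigned iter)
{
  /* Neither criterion is active: documented as an error. */
  if (minfrac < 0.0 && maxiters == 0)
    return -1;

  /* Iteration cap; maxiters == 0 means "no cap". */
  if (maxiters != 0 && iter >= maxiters)
    return 1;

  /* Tolerance test; minfrac < 0 means "ignore the tolerance". */
  if (minfrac >= 0.0) {
    if (prev_dist <= 0.0)
      return 1;                 /* nothing left to improve */
    /* ASSUMPTION: fractional decrease = (prev - curr) / prev. */
    if ((prev_dist - curr_dist) / prev_dist <= minfrac)
      return 1;
  }
  return 0;
}
```

With the documented signature, a call such as kmeans(set, 4, 0.001, 100, 0) would enable both criteria, kmeans(set, 4, -1.0, 100, 0) would use the iteration cap only, and kmeans(set, 4, 0.001, 0, 0) would use the tolerance only; the specific values here are arbitrary examples.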

DATASET *kmeans_online(DATASET *set, unsigned nclus,
    unsigned maxiters, int clusinit);

Given set (which has n x-dimensional data points), this routine performs online k-means clustering and returns nclus clusters in the returned DATASET. The algorithm operates on maxiters shuffled samples, reshuffling the data set as necessary. If clusinit is zero, the clusters are initialized to random exemplars; this is the usual choice for online k-means. If clusinit is one, the clusters are initialized to the means of a random partitioning of the entire data set. If clusinit is two, the clusters are initialized to the means of a sequential partitioning of the entire data set. If any error occurs, NULL is returned.
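For readers unfamiliar with the online variant, a single update step might look like the sketch below. It is a generic illustration, not NODElib's code: the function name online_kmeans_step, the flat row-major layout of the centers, and the 1/count learning rate (MacQueen's rule) are all assumptions, since the text above does not state which learning rate kmeans_online uses.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of NODElib.
 * Performs one online k-means step for the sample x (length dim):
 * finds the nearest center and moves only that center toward x.
 * centers holds nclus rows of dim values each (row-major);
 * counts[k] is the number of samples assigned to center k so far.
 * Returns the index of the center that was updated. */
size_t online_kmeans_step(double *centers, unsigned *counts,
                          size_t nclus, size_t dim, const double *x)
{
  size_t best = 0;
  double best_d = DBL_MAX;

  /* Find the center with the smallest squared Euclidean distance. */
  for (size_t k = 0; k < nclus; k++) {
    double d = 0.0;
    for (size_t j = 0; j < dim; j++) {
      double diff = x[j] - centers[k * dim + j];
      d += diff * diff;
    }
    if (d < best_d) {
      best_d = d;
      best = k;
    }
  }

  /* ASSUMPTION: learning rate 1/count, so each center tracks the
   * running mean of the samples assigned to it. */
  counts[best]++;
  double rate = 1.0 / (double)counts[best];
  for (size_t j = 0; j < dim; j++)
    centers[best * dim + j] += rate * (x[j] - centers[best * dim + j]);

  return best;
}
```

An online run in this style would call the step once per sample, for maxiters samples in total, drawing the samples from a shuffled copy of the data and reshuffling when the copy is exhausted. Initializing each count to one treats the initial exemplar as the first member of its cluster.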

AUTHOR

Gary William Flake (gary.flake@usa.net).

CREDITS

The original source code for this package came from Chris Darken (darken@scr.siemens.com). Thanks Chris.

SEE ALSO

series(3) and dsmethod(3).