R/generics.R, R/compounds-cluster.R
compounds-cluster.Rd
Perform hierarchical clustering of structure candidates based on chemical similarity and obtain overall structural information based on the maximum common substructure (MCS).
makeHCluster(obj, method = "complete", ...)
# S4 method for class 'compounds'
makeHCluster(
obj,
method,
fpType = "extended",
fpSimMethod = "tanimoto",
maxTreeHeight = 1,
deepSplit = TRUE,
minModuleSize = 1
)
obj
The compounds object to be clustered.
method
The clustering method passed to hclust.
...
Further arguments passed to methods.
fpType
The type of structural fingerprint that should be calculated. See the type argument of the get.fingerprint function of rcdk.
fpSimMethod
The method for calculating similarities (i.e. not dissimilarity!). See the method argument of the fp.sim.matrix function of the fingerprint package.
maxTreeHeight, deepSplit, minModuleSize
Arguments used by cutreeDynamicTree.
makeHCluster returns a compoundsCluster object.
Often many possible chemical structure candidates are found for each feature group when performing compound annotation. Therefore, it may be useful to obtain an overview of their general structural properties. One strategy is to perform hierarchical clustering based on their chemical (dis)similarity, for instance, using the Tanimoto score. The resulting clusters can then be characterized by evaluating their maximum common substructure (MCS).
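The strategy described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the fingerprints are made-up bit vectors for four hypothetical candidates, the Tanimoto score is computed by hand, and a fixed-height cutree() cut stands in for dynamic tree cutting.

```r
# Toy binary fingerprints (rows = hypothetical candidates, columns = bits).
fps <- rbind(
  candA = c(1, 1, 0, 1, 0, 0),
  candB = c(1, 1, 0, 0, 0, 0),
  candC = c(0, 0, 1, 0, 1, 1),
  candD = c(0, 0, 1, 0, 1, 0)
)

# Tanimoto similarity: shared on-bits divided by on-bits set in either vector.
tanimoto <- function(a, b) sum(a & b) / sum(a | b)

# Pairwise similarity matrix.
n <- nrow(fps)
sim <- matrix(1, n, n, dimnames = list(rownames(fps), rownames(fps)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    sim[i, j] <- tanimoto(fps[i, ], fps[j, ])
  }
}

# hclust() expects dissimilarities, so convert similarity to distance first.
hc <- hclust(as.dist(1 - sim), method = "complete")

# Fixed-height cut; an assumption used here in place of dynamic tree cutting.
clusters <- cutree(hc, h = 0.5)
```

With these toy vectors, candA and candB share most of their bits, as do candC and candD, so the cut separates the two pairs into different clusters.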
makeHCluster performs hierarchical clustering of all
structure candidates for each feature group within a
compounds object. The resulting dendrograms are automatically
cut using the cutreeDynamicTree function from the
dynamicTreeCut package. The returned
compoundsCluster object can then be used, for instance, for
plotting dendrograms and MCS structures and manually re-cutting specific
clusters.
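A call might look like the following sketch. The wrapper name clusterCandidates is hypothetical and only serves to keep the example self-contained; it assumes that the package providing makeHCluster is attached and that its argument is an existing compounds object. The argument values simply repeat the defaults listed in the usage section.

```r
# Hypothetical wrapper around makeHCluster(); `compounds` must be a
# compounds object obtained from a prior compound annotation step.
clusterCandidates <- function(compounds) {
  makeHCluster(
    compounds,
    method = "complete",       # linkage method passed to hclust
    fpType = "extended",       # fingerprint type (see rcdk get.fingerprint)
    fpSimMethod = "tanimoto",  # similarity method (see fp.sim.matrix)
    maxTreeHeight = 1,         # the next three go to cutreeDynamicTree
    deepSplit = TRUE,
    minModuleSize = 1
  )
}
```

The result would be a compoundsCluster object, which can then be inspected or re-cut as described above.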
The methodology applied here has been largely derived from
chemclust.R from the metfRag package and the package vignette
of rcdk.
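The general workflow that these sources describe could be sketched roughly as follows. This is an assumption-laden outline and not the code used by makeHCluster: the function name clusterSmilesSketch is hypothetical, it takes a character vector of SMILES as input, and it requires the rcdk, fingerprint and dynamicTreeCut packages to be installed.

```r
# Hypothetical helper: cluster structures given as SMILES strings.
clusterSmilesSketch <- function(smiles, fpType = "extended",
                                fpSimMethod = "tanimoto",
                                method = "complete") {
  # Parse the SMILES and calculate one fingerprint per molecule.
  mols <- rcdk::parse.smiles(smiles)
  fps <- lapply(mols, rcdk::get.fingerprint, type = fpType)

  # Pairwise similarities, converted to dissimilarities for hclust().
  sim <- fingerprint::fp.sim.matrix(fps, method = fpSimMethod)
  hc <- stats::hclust(stats::as.dist(1 - sim), method = method)

  # Cut the dendrogram automatically; returns one cluster label per molecule.
  dynamicTreeCut::cutreeDynamicTree(
    hc, maxTreeHeight = 1, deepSplit = TRUE, minModuleSize = 1
  )
}
```

Because the package calls are namespace-qualified, defining the function does not load anything; the packages are only needed once it is called.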
Guha R (2007). “Chemical Informatics Functionality in R.” Journal of Statistical Software, 18(6).
compoundsCluster