Perform hierarchical clustering of structure candidates based on chemical similarity and obtain overall structural information based on the maximum common substructure (MCS).

makeHCluster(obj, method = "complete", ...)

# S4 method for class 'compounds'
makeHCluster(
  obj,
  method,
  fpType = "extended",
  fpSimMethod = "tanimoto",
  maxTreeHeight = 1,
  deepSplit = TRUE,
  minModuleSize = 1
)

Arguments

obj

The compounds object to be clustered.

method

The clustering method passed to hclust.

...

Further arguments passed to methods.

fpType

The type of structural fingerprint that should be calculated. See the type argument of the get.fingerprint function of rcdk.

fpSimMethod

The method for calculating similarities (i.e. not dissimilarity!). See the method argument of the fp.sim.matrix function of the fingerprint package.

maxTreeHeight, deepSplit, minModuleSize

Arguments used by cutreeDynamicTree.

Value

makeHCluster returns a compoundsCluster object.

Details

Often many possible chemical structure candidates are found for each feature group when performing compound annotation. Therefore, it may be useful to obtain an overview of their general structural properties. One strategy is to perform hierarchical clustering based on their chemical (dis)similarity, for instance, using the Tanimoto score. The resulting clusters can then be characterized by evaluating their maximum common substructure (MCS).
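
The general idea can be illustrated with a minimal base R sketch. This is not the code used by makeHCluster: the fingerprints below are made-up bit vectors, and the tanimoto helper and candidate names are hypothetical. It only shows how a Tanimoto similarity matrix is converted to dissimilarities before being passed to hclust.

```r
# Toy binary fingerprints (rows = candidates, columns = fingerprint bits).
# These values are invented for illustration only.
fps <- rbind(
  candA = c(1, 1, 1, 0, 0, 0),
  candB = c(1, 1, 0, 0, 0, 0),
  candC = c(0, 0, 0, 1, 1, 1),
  candD = c(0, 0, 1, 1, 1, 1)
)

# Hypothetical helper: Tanimoto similarity of two bit vectors, i.e. the
# number of shared on-bits divided by the number of bits set in either.
tanimoto <- function(a, b) sum(a & b) / sum(a | b)

# Pairwise similarity matrix.
n <- nrow(fps)
sim <- matrix(1, n, n, dimnames = list(rownames(fps), rownames(fps)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    sim[i, j] <- tanimoto(fps[i, ], fps[j, ])
  }
}

# hclust() expects dissimilarities, so convert similarity to distance.
hc <- hclust(as.dist(1 - sim), method = "complete")

# Cut into a fixed number of clusters (a simple stand-in for the
# dynamic tree cut applied by makeHCluster).
clusters <- cutree(hc, k = 2)
```

In this toy example candA and candB end up in one cluster and candC and candD in the other, since each pair shares most of its on-bits.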

makeHCluster performs hierarchical clustering of all structure candidates for each feature group within a compounds object. The resulting dendrograms are automatically cut using the cutreeDynamicTree function from the dynamicTreeCut package. The returned compoundsCluster object can then be used, for instance, for plotting dendrograms and MCS structures and manually re-cutting specific clusters.
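
To give a feel for what re-cutting a dendrogram means, the following base R sketch cuts the same tree at two heights. It uses stats::cutree on an invented dissimilarity matrix as a simple stand-in; it does not use the compoundsCluster object or cutreeDynamicTree, and the candidate names are hypothetical.

```r
# Toy dissimilarity matrix (1 - similarity); values are illustrative only.
d <- matrix(c(
  0.00, 0.30, 0.90, 1.00,
  0.30, 0.00, 1.00, 0.95,
  0.90, 1.00, 0.00, 0.25,
  1.00, 0.95, 0.25, 0.00
), nrow = 4, byrow = TRUE,
dimnames = list(paste0("cand", 1:4), paste0("cand", 1:4)))

hc <- hclust(as.dist(d), method = "complete")

# A high cut keeps both tight pairs together: two clusters.
coarse <- cutree(hc, h = 0.5)

# A lower cut splits cand1 and cand2 (joined at height 0.30) while
# cand3 and cand4 (joined at height 0.25) stay together: three clusters.
fine <- cutree(hc, h = 0.28)
```

Lowering the cut height yields more, smaller clusters, which generally makes the common substructure within each cluster larger and more specific.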

Source

The methodology applied here has been largely derived from chemclust.R from the metfRag package and the package vignette of rcdk.

See also

compoundsCluster