4.4 Componentization

In patRoon, componentization refers to grouping related feature groups together into components. Components can be generated with different methodologies:

Similarity of chromatographic elution profiles: feature groups with similar chromatographic behaviour, which are assumed to originate from the same chemical compound (e.g. adducts or isotopologues).
Homologous series: features with increasing m/z and retention time.
Intensity profiles: features that follow a similar intensity profile in the analyses.
MS/MS similarity: feature groups with similar MS/MS spectra are clustered.
Transformation products: Components are formed by grouping feature groups that have a parent/transformation product relationship. This is further discussed in its own chapter.

The following algorithms are currently supported:

Algorithm	Usage	Remarks
CAMERA	`generateComponents(algorithm = "camera", ...)`	Clusters feature groups with similar chromatographic elution profiles and annotates them using known chemical rules (adducts, isotopologues, in-source fragments).
RAMClustR	`generateComponents(algorithm = "ramclustr", ...)`	As above.
cliqueMS	`generateComponents(algorithm = "cliquems", ...)`	As above, but using feature components.
OpenMS	`generateComponents(algorithm = "openms", ...)`	As above. Uses MetaboliteAdductDecharger.
nontarget	`generateComponents(algorithm = "nontarget", ...)`	Uses the nontarget R package to perform unsupervised homologous series detection.
Intensity clustering	`generateComponents(algorithm = "intclust", ...)`	Groups features with similar intensity profiles across analyses by hierarchical clustering.
MS/MS clustering	`generateComponents(algorithm = "specclust", ...)`	Clusters feature groups with similar MS/MS spectra.
Transformation products	`generateComponents(algorithm = "tp", ...)`	Discussed in its own chapter.

4.4.1 Features with similar chromatographic behaviour

Isotopes, adducts and in-source fragments typically result in the detection of multiple mass peaks by the mass spectrometer for a single chemical compound. While some feature finding algorithms already try to collapse (some of) these into a single feature, this process is often incomplete (if performed at all), and it is not uncommon that multiple features describe the same compound. To overcome this complexity, several algorithms can be used to group features that show highly similar chromatographic behaviour but have different m/z values. Basic chemical rules are then applied to the resulting components to annotate adducts, in-source fragments and isotopologues, which may be highly useful for general identification purposes.

Note that some algorithms were primarily designed for datasets where features are generally present in the majority of the analyses (as is relatively common in metabolomics). For environmental analyses, however, this is often not the case. For instance, consider the following situation with three feature groups that chromatographically overlap and therefore could be considered a component:

Feature group	m/z	analysis 1	analysis 2	analysis 3
#1	100.08827	Present	Present	Absent
#2	122.07021	Present	Present	Absent
#3	138.04415	Absent	Absent	Present

Based on the mass differences in this example, a cluster of [M+H]+, [M+Na]+ and [M+K]+ could be assumed. However, no features of the first two feature groups were detected in the third sample analysis, whereas the third feature group was not detected in the first two sample analyses. It therefore seems unlikely that feature group #3 should be part of the component.
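The reasoning in this example can be reproduced with a few lines of base R. The sketch below is only an illustration and does not use any patRoon functionality: the atomic masses, the tolerance `mzTol` and all variable names are assumptions chosen for this example.

```r
# m/z values of the three feature groups from the table above
mz <- c(fg1 = 100.08827, fg2 = 122.07021, fg3 = 138.04415)

# monoisotopic atomic masses (Da); the electron mass cancels out in the differences
atomMass <- c(H = 1.00782503, Na = 22.98976928, K = 38.96370668)

# expected m/z differences between the adducts
expected <- c(
    "[M+Na]+ vs [M+H]+" = atomMass[["Na"]] - atomMass[["H"]],
    "[M+K]+ vs [M+H]+" = atomMass[["K"]] - atomMass[["H"]]
)
observed <- c(mz[["fg2"]] - mz[["fg1"]], mz[["fg3"]] - mz[["fg1"]])
names(observed) <- names(expected)

# assumed absolute m/z tolerance for this illustration
mzTol <- 0.002
adductMatch <- abs(observed - expected) <= mzTol
print(round(cbind(observed, expected), 5))
print(adductMatch)

# detection of each feature group (rows) in each analysis (columns)
present <- rbind(
    fg1 = c(TRUE, TRUE, FALSE),
    fg2 = c(TRUE, TRUE, FALSE),
    fg3 = c(FALSE, FALSE, TRUE)
)
# number of analyses in which two feature groups were both detected
sharedAnalyses <- tcrossprod(present * 1)
print(sharedAnalyses)
```

Both mass differences fit the assumed adducts within the tolerance, but feature group #3 shares no analysis with the other two, which is why the annotation is doubtful.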

For the algorithms that operate on a ‘feature group level’ (CAMERA and RAMClustR), the relMinReplicates argument can be used to remove feature groups from a component that are detected in too few replicate groups. For instance, when this value is 0.5 (the default) and the features of a component were detected in four different replicate groups in total, only those feature groups are kept whose features were detected in at least two different replicate groups (i.e. half of four).
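The effect of this threshold can be sketched in base R as follows. This is not the code used by patRoon, only a minimal illustration of the rule described above; the detection matrix and all variable names are hypothetical.

```r
# rows: feature groups of one component; columns: replicate groups (hypothetical data)
detected <- rbind(
    fgA = c(TRUE, TRUE, TRUE, TRUE),
    fgB = c(TRUE, TRUE, FALSE, FALSE),
    fgC = c(FALSE, FALSE, FALSE, TRUE)
)
colnames(detected) <- paste0("rg", 1:4)

relMinReplicates <- 0.5

# replicate groups in which at least one feature of the component was detected
nReplicateGroups <- sum(colSums(detected) > 0)

# minimum number of replicate groups a feature group must be detected in
minCount <- relMinReplicates * nReplicateGroups

keep <- rowSums(detected) >= minCount
kept <- rownames(detected)[keep]
print(kept)
```

In this example the component spans four replicate groups, so fgA and fgB (detected in four and two replicate groups) are kept, while fgC (detected in one) is removed.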

Another approach to reduce unlikely adduct annotations is to use algorithms that operate on a ‘feature level’ (cliqueMS and OpenMS). These algorithms generate components for each sample analysis individually. The ‘feature components’ are then merged by a consensus approach where unlikely annotations are removed (the algorithm is described further in the reference manual, ?generateComponents).

Each algorithm supports many different parameters that may significantly influence the (quality of the) output. For instance, care has to be taken to avoid ‘over-clustering’ of feature groups which do not belong in the same component. This is often easily visible, since the chromatographic peaks then overlap poorly or are shaped differently. The checkComponents function (discussed here) can be used to quickly verify componentization results. For a complete listing of all arguments see the reference manual (e.g. ?generateComponents).

Once components with adduct and isotope annotations have been generated, these data can be used for prioritization and to improve the workflow.

Some example usage is shown below.

# Use CAMERA with defaults
componCAM <- generateComponents(fGroups, "camera", ionization = "positive")

# CAMERA with customized settings
componCAM2 <- generateComponents(fGroups, "camera", ionization = "positive",
                                 extraOpts = list(mzabs = 0.001, sigma = 5))

# Use RAMClustR with customized parameters
componRC <- generateComponents(fGroups, "ramclustr", ionization = "positive", hmax = 0.4,
                               extraOptsRC = list(cor.method = "spearman"),
                               extraOptsFM = list(ppm.error = 5))

# OpenMS with customized parameters
componOpenMS <- generateComponents(fGroups, "openms", ionization = "positive", chargeMax = 2,
                                   absMzDev = 0.002)

# cliqueMS with default parameters
componCliqueMS <- generateComponents(fGroups, "cliquems", ionization = "negative")

4.4.2 Homologous series

Homologous series can be automatically detected by interfacing with the nontarget R package. Components are made from feature groups that show increasing m/z and retention time values. Series are first detected within each replicate group. Afterwards, series from all replicate groups are linked if they (partially) overlap and this overlap consists of the same feature groups (see figure below). Finally, linked series are merged if this does not cause any conflicts with other series; such a conflict typically occurs when two series are not only linked to each other.

Figure 4.2: Linking of homologous series. Top: the series partially overlap and will be linked; bottom: no linkage, because the overlapping series contain a different feature.
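The basic idea of a homologous series, i.e. feature groups separated by a constant m/z increment with increasing retention times, can be illustrated with a small base R sketch. It is not the algorithm of the nontarget package: the feature table, the CH2 repeating unit and the tolerance values are assumptions made for illustration, and the variables `absMzDev` and `rtRange` merely mirror the argument names used in patRoon.

```r
# hypothetical feature groups with m/z and retention time (s)
feat <- data.frame(
    group = c("A", "B", "C", "D", "E"),
    mz = c(200.1000, 214.11565, 228.1313, 242.14695, 230.0500),
    rt = c(300, 330, 362, 391, 310),
    stringsAsFactors = FALSE
)

ch2 <- 14.01565      # monoisotopic mass of a CH2 repeating unit
absMzDev <- 0.002    # assumed m/z tolerance
rtRange <- c(5, 60)  # assumed allowed retention time increment (s)

# pairwise differences: element [i, j] is value of j minus value of i
dmz <- outer(feat$mz, feat$mz, function(a, b) b - a)
drt <- outer(feat$rt, feat$rt, function(a, b) b - a)

# linked[i, j] is TRUE if j is the next homologue of i
linked <- abs(dmz - ch2) <= absMzDev & drt >= rtRange[1] & drt <= rtRange[2]
dimnames(linked) <- list(feat$group, feat$group)

# a series starts at a feature group with a successor but no predecessor
starts <- rownames(linked)[rowSums(linked) > 0 & colSums(linked) == 0]

# follow the links from each start to build the series
series <- lapply(starts, function(s) {
    members <- s
    while (any(linked[tail(members, 1), ])) {
        nextMember <- colnames(linked)[which(linked[tail(members, 1), ])[1]]
        members <- c(members, nextMember)
    }
    members
})
print(series)
```

Here feature groups A to D form one series, whereas E matches neither the m/z increment nor any series and is left out.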

The series that are linked can be interactively explored with the plotGraph() function (discussed here).

Common function arguments to generateComponents() are listed below.

Argument	Remarks
`ionization`	Ionization mode: `"positive"` or `"negative"`. Not needed if adduct annotations are available.
`rtRange`, `mzRange`	Retention time and m/z increment range. Retention time increments can be negative to allow series with increasing m/z values and decreasing retention times.
`elements`	Vector with elements to consider.
`rtDev`, `absMzDev`	Maximum retention time and m/z deviation.
`...`	Further arguments passed to the `homol.search()` function.

# default settings
componNT <- generateComponents(fGroups, "nontarget", ionization = "positive")

# customized settings
componNT2 <- generateComponents(fGroups, "nontarget", ionization = "positive",
                                elements = c("C", "H"), rtRange = c(-60, 60))

4.4.3 Intensity and MS/MS similarity

The previous componentization methods utilized chemical properties to relate features. The two componentization algorithms described in this section use a statistical approach based on hierarchical clustering. The first algorithm normalizes all feature intensities and then clusters features with similar intensity profiles across sample analyses together. The second algorithm compares all MS/MS spectra from all feature groups, and then uses hierarchical clustering to generate components from feature groups that have a high MS/MS spectrum similarity.
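The intensity based approach can be sketched with base R functions only. The example below is a simplified illustration under stated assumptions and not the patRoon implementation: the intensity table is hypothetical, intensities are normalized to the maximum of each feature group, and the number of clusters is fixed manually.

```r
# hypothetical intensities: feature groups (rows) across analyses (columns)
intensities <- rbind(
    fg1 = c(100, 200, 400, 800),
    fg2 = c(10, 21, 39, 82),
    fg3 = c(900, 450, 220, 100),
    fg4 = c(80, 42, 19, 11)
)
colnames(intensities) <- paste0("analysis", 1:4)

# normalize each feature group to its maximum intensity
normInts <- intensities / apply(intensities, 1, max)

# distance matrix and hierarchical clustering
distMat <- dist(normInts, method = "euclidean")
hc <- hclust(distMat, method = "complete")

# assign clusters by cutting the tree into two groups
clusters <- cutree(hc, k = 2)
print(clusters)
```

Feature groups fg1 and fg2 (increasing trend) end up in one cluster and fg3 and fg4 (decreasing trend) in the other, even though their absolute intensities differ by an order of magnitude.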

Some common arguments to generateComponents() are listed below. It is recommended to test various settings (especially for method) to optimize the clustering results.

Argument	Algorithm	Default	Remarks
`method`	All	`"complete"`	Clustering method. See `?hclust`
`metric`	`intclust`	`"euclidean"`	Metric used to calculate the distance matrix. See `?daisy`.

The components are generated by automatically assigning clusters using the dynamicTreeCut R package. However, the cluster assignment can be performed manually or with different parameters, as is demonstrated below.

The resulting components are stored in an object from the componentsIntClust or componentsSpecClust S4 class, which are both derived from the componentsClust class (which in turn is derived from the components class). Several methods are defined that can be used on these objects to re-assign clusters, perform plotting operations and so on. Below are some examples. For plotting see the relevant visualization section. More info can be found in the reference manual (e.g. ?componentsIntClust, ?componentsSpecClust and ?componentsClust).

# generate intensity profile components with default settings
componInt <- generateComponents(fGroups, "intclust")

# manually re-assign clusters
componInt <- treeCut(componInt, k = 10)

# automatic re-assignment of clusters (adjusted max tree height)
componInt <- treeCutDynamic(componInt, maxTreeHeight = 0.7)

# MS/MS similarity components
componMSMS <- generateComponents(fGroups, "specclust", MSPeakLists = mslists)
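
The principle of clustering by MS/MS similarity can likewise be illustrated in base R. This is a sketch under assumptions and not how patRoon calculates spectral similarities: the spectra are hypothetical, peaks are aligned by rounding m/z values to two decimals, similarity is the plain cosine similarity, and the cut height of 0.5 is chosen arbitrarily.

```r
# hypothetical MS/MS spectra for three feature groups
spectra <- list(
    fg1 = data.frame(mz = c(77.04, 105.03, 133.06), intensity = c(30, 100, 55)),
    fg2 = data.frame(mz = c(77.04, 105.03, 147.08), intensity = c(25, 100, 40)),
    fg3 = data.frame(mz = c(58.07, 86.10, 114.09), intensity = c(100, 60, 20))
)

# align all peaks on a common set of (rounded) m/z bins
allBins <- sort(unique(unlist(lapply(spectra, function(s) round(s$mz, 2)))))
specMat <- t(sapply(spectra, function(s) {
    v <- numeric(length(allBins))
    v[match(round(s$mz, 2), allBins)] <- s$intensity
    v
}))

# pairwise cosine similarity between the aligned spectra
norms <- sqrt(rowSums(specMat^2))
cosSim <- (specMat %*% t(specMat)) / (norms %o% norms)

# convert to distances, cluster, and cut the tree at a fixed height
specDist <- as.dist(1 - cosSim)
hc <- hclust(specDist, method = "complete")
clusters <- cutree(hc, h = 0.5)
print(round(cosSim, 2))
print(clusters)
```

The spectra of fg1 and fg2 share their two most intense peaks and are clustered together, whereas fg3 has no peaks in common with either and forms its own cluster.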