7.2 Linking parent and transformation product features

This section discusses one of the most important steps in a TP screening workflow: linking the feature groups of parents with those of candidate transformation products. During this step, components are made, where each component consists of one or more feature groups of detected TPs for a particular parent. Componentization was introduced earlier for very different algorithms, but the data format for TP componentization is highly similar. After componentization, several filters are available to clean and prioritize the data. These even allow workflows that do not obtain potential TPs in advance, which is discussed in the subsection on omitting transformation product input.

7.2.1 Componentization

As with other componentization algorithms, TP components are generated with the generateComponents generic function, in this case by setting the algorithm argument to "tp".

The following arguments are of importance:

Argument                          Remarks
fGroups                           The input feature groups for the parents
fGroupsTPs                        The input feature groups for the TPs
ignoreParents                     Set to TRUE to ignore feature groups in fGroupsTPs that also occur in fGroups
TPs                               The input transformation products, i.e. as generated by generateTPs()
MSPeakLists, formulas, compounds  Annotation objects used for similarity calculation between the parent and its TPs
minRTDiff                         The minimum retention time difference (seconds) of a TP for it to be considered to elute differently than its parent

7.2.1.1 Feature group input

The fGroups, fGroupsTPs and ignoreParents arguments are used by the componentization algorithm to identify which feature groups can be considered as parents and which as TPs. Three scenarios are possible:

  1. fGroups=fGroupsTPs and ignoreParents=FALSE: in this case no distinction is made, and all feature groups are considered a parent or TP (default if fGroupsTPs is not specified).
  2. fGroups and fGroupsTPs contain different subsets of the same featureGroups object and ignoreParents=FALSE: only the feature groups in fGroups/fGroupsTPs are considered as parents/TPs.
  3. As above, but with ignoreParents=TRUE: the same distinction is made as above, but any feature groups in fGroupsTPs are ignored if also present in fGroups.

The first scenario is often used if it is unknown which feature groups may be parents or which are TPs. Furthermore, this scenario may also be used if the dataset is sufficiently simple, for instance, because a suspect screening with the results from convertToSuspects (discussed in the previous section) would reliably discriminate between parents and TPs. A workflow with the first scenario is demonstrated in the second example.

In all other cases it is recommended to use either the second or third scenario, since making a prior distinction between parent and TP feature groups greatly simplifies the dataset and reduces false positives. A relatively simple example where this can be used is when there are two sample groups: before and after treatment.

componTP <- generateComponents(algorithm = "tp",
                               fGroups = fGroups[rGroups = "before"],
                               fGroupsTPs = fGroups[rGroups = "after"])

In this example, only those feature groups present in the “before” replicate group are considered as parents, and those in “after” may be considered as a TP. Since it is likely that there will be some overlap in feature groups between both sample groups, the ignoreParents flag can be used to not consider any of the overlap for TP assignments:

componTP <- generateComponents(algorithm = "tp",
                               fGroups = fGroups[rGroups = "before"],
                               fGroupsTPs = fGroups[rGroups = "after"],
                               ignoreParents = TRUE)

More sophisticated ways to make an upfront distinction between parent and TP feature groups are of course possible. The fourth example demonstrates a workflow where fold changes are used.

NOTE The feature groups specified for fGroups/fGroupsTPs must always originate from the same featureGroups object.

For the library and biotransformer algorithms it is mandatory that a suspect screening of parents and TPs is performed prior to componentization. This is necessary for the componentization algorithm to map the feature groups that belong to a particular parent or TP. To do so, the convertToSuspects function is used to prepare the suspect list:

# set includeParents to TRUE since both the parents and TPs are needed
suspects <- convertToSuspects(TPs, includeParents = TRUE)
fGroupsScr <- screenSuspects(fGroups, suspects, onlyHits = TRUE)

# do the componentization
# a similar distinction between fGroups/fGroupsScr as discussed above can of course also be done
componTP <- generateComponents(fGroups = fGroupsScr, ...)

If a parent screening was already performed in advance, for instance when the input parents to generateTPs are screening results, the screening results for parents and TPs can also be combined. The second example demonstrates this.

Note that in the case a parent suspect is matched to multiple feature groups, a component is made for each match. Similarly, if multiple feature groups match to a TP suspect, all of them will be incorporated in the component.

When TPs were generated with the logic algorithm, a suspect screening must also be carried out in advance. However, in this case it is not necessary to include the parents: since each parent equals a feature group, no mapping is necessary. The onlyHits argument to screenSuspects must not be set to TRUE, so that feature groups without a TP hit (the potential parents) are kept.

# only screen for TPs
suspects <- convertToSuspects(TPs, includeParents = FALSE)
# but keep all other feature groups as these may be parents
fGroupsScr <- screenSuspects(fGroups, suspects, onlyHits = FALSE)

# do the componentization...

7.2.1.2 Annotation similarity calculation

If additional annotation data for parents and TPs is given to the componentization algorithm, it will be used to calculate various similarity properties. Often, the chemical structure for a transformation product is similar to that of its parent. Hence, there is a good chance that a parent and its TPs also share similar MS/MS data.

Firstly, if MS peak lists are provided, the spectrum similarity is calculated between each parent and its potential TP candidates. This is performed with all three alignment shifts (see the spectrum similarity section for more details).

If formulas and/or compounds objects are specified, a parent/TP comparison is made by counting the number of fragments and neutral losses that they share (using the formula annotations). This property is mainly used for non-target workflows where the identity of a parent and TP is not yet well established. For this reason, the fragments and neutral losses reported for all candidates of the parent/TP feature group are considered. Hence, it is highly recommended to pre-treat the annotation objects, for instance, with the topMost filter. If both formulas and compounds are given, the results are pooled. Note that each unique fragment/neutral loss is only counted once, so multiple formula/compound candidates with the same annotations will not skew the results.
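The counting described above can be illustrated with a minimal base R sketch. It is not the patRoon implementation: countSharedAnnotations is a hypothetical helper, and the fragment formulas are made up for illustration. It only shows how pooling all candidates and de-duplicating prevents repeated annotations from inflating the count.

```r
# Hypothetical helper (not part of patRoon): count the unique annotations
# (fragment or neutral loss formulas) shared by a parent and a TP.
# Each argument is a list with one character vector per candidate.
countSharedAnnotations <- function(parentCandidates, TPCandidates)
{
    # pool the annotations of all candidates and keep each one only once
    parentPool <- unique(unlist(parentCandidates))
    TPPool <- unique(unlist(TPCandidates))
    length(intersect(parentPool, TPPool))
}

# made-up fragment formulas for two parent and two TP candidates
parentFrags <- list(cand1 = c("C6H5", "C3H7N", "C2H4"),
                    cand2 = c("C6H5", "C4H9"))
TPFrags <- list(cand1 = c("C6H5", "C2H4", "CH3O"),
                cand2 = c("C6H5", "C2H4"))

# C6H5 and C2H4 are shared; their repetition across candidates is ignored
fragMatches <- countSharedAnnotations(parentFrags, TPFrags)
print(fragMatches)
```

Because all candidates contribute to the pools, a long candidate list increases the chance of coincidental matches, which is why reducing the candidates beforehand is advisable.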

7.2.2 Processing data

The output of TP componentization is an object of the componentsTPs class. This class derives from the ‘regular’ components class, so all the data processing functionality described before (extraction, subsetting, filtering etc.) is also available for TP components.

Several additional filters are available to prioritize the data:

Filter                                      Remarks
retDirMatch                                 If TRUE only keep TPs with an expected chromatographic retention direction compared to the parent.
minSpecSim, minSpecSimPrec, minSpecSimBoth  The minimum spectrum similarity between the parent and TP. Calculated with no, "precursor" and "both" alignment shifting (see spectrum similarity).
minFragMatches, minNLMatches                Minimum number of formula fragment/neutral loss matches between parent and TP (discussed in previous section).
formulas                                    A formulas object used to further verify candidate TPs that were generated by the logic algorithm.

The retDirMatch filter compares the expected and observed retention time direction of a TP to decide whether it should be kept. The direction is a value of either -1 (TP elutes before the parent), +1 (TP elutes after the parent) or 0 (TP elutes very close to the parent or its direction is unknown). The expected directions are taken from the generated transformation products. For the library and biotransformer algorithms, the log P values of a TP and its parent are compared. Here, it is assumed that lower log P values result in earlier elution (i.e. typical for reversed phase LC). For the logic algorithm, the retention time direction is taken from the transformation rules table. Note that specifying a large enough value for the minRTDiff argument to generateComponents is important to ensure that some tolerance exists when comparing the retention time directions of parents and TPs. This filter does nothing if either the observed or expected direction is zero.
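The decision logic of this filter can be sketched in a few lines of base R. The functions observedRetDir and keepByRetDir are hypothetical helpers written for illustration and are not patRoon functions; in particular, treating a difference strictly below minRTDiff as "very close" is an assumption about the boundary case.

```r
# Hypothetical helper: derive the observed retention direction of a TP.
# Assumption: differences smaller than minRTDiff count as "very close" (0).
observedRetDir <- function(retParent, retTP, minRTDiff)
{
    rtDiff <- retTP - retParent
    if (abs(rtDiff) < minRTDiff)
        return(0)
    sign(rtDiff) # -1: TP elutes earlier, +1: TP elutes later
}

# Hypothetical helper: keep a TP unless both directions are known and differ.
keepByRetDir <- function(expectedDir, observedDir)
{
    if (expectedDir == 0 || observedDir == 0)
        return(TRUE) # unknown or too close: the filter does nothing
    expectedDir == observedDir
}

# parent at 300 s; the TP is expected to elute earlier (e.g. lower log P)
expected <- -1
keepEarly <- keepByRetDir(expected, observedRetDir(300, 250, minRTDiff = 20))
keepClose <- keepByRetDir(expected, observedRetDir(300, 310, minRTDiff = 20))
keepLate <- keepByRetDir(expected, observedRetDir(300, 400, minRTDiff = 20))
print(c(early = keepEarly, close = keepClose, late = keepLate))
```

In this sketch only the TP that elutes clearly later than expected would be removed. A larger minRTDiff widens the range in which the direction is treated as undetermined, so fewer TPs are rejected on small retention time differences.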

When the TP data was generated with the logic algorithm, it is recommended to use the formulas filter. This filter uses formula annotations to verify that (1) the parent feature group contains the elements that are subtracted during the transformation and (2) the TP feature group contains the elements that are added during the transformation. Since the ‘right’ candidate formula is most likely not yet known, this filter looks at all candidates. Therefore, it is recommended to first reduce the formulas object, for instance, with the topMost filter.
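The two checks can be illustrated with a simplified base R sketch. This is an assumption about the general idea rather than the patRoon implementation: anyCandidateContains is a hypothetical helper, formulas are represented as named element count vectors instead of being parsed from strings, and the transformation (loss of Cl, gain of O and H) is only an example.

```r
# Hypothetical helper: TRUE if at least one candidate formula contains
# at least the given element counts. Formulas are named count vectors.
anyCandidateContains <- function(candidates, elements)
{
    any(vapply(candidates, function(cand)
    {
        counts <- cand[names(elements)]
        counts[is.na(counts)] <- 0 # element absent from this candidate
        all(counts >= elements)
    }, logical(1)))
}

# example transformation: subtracts Cl, adds O and H
subtracted <- c(Cl = 1)
added <- c(O = 1, H = 1)

# made-up candidate formulas for the parent and TP feature groups
parentCands <- list(c(C = 8, H = 14, Cl = 1, N = 5))
TPCands <- list(c(C = 9, H = 19, N = 5),        # lacks O
                c(C = 8, H = 15, N = 5, O = 1)) # contains O and H

# (1) parent must contain what is subtracted, (2) TP what is added
keepTP <- anyCandidateContains(parentCands, subtracted) &&
    anyCandidateContains(TPCands, added)
print(keepTP)
```

Since a single fitting candidate is enough to pass, long candidate lists make the check more permissive, which is the reason for reducing the formulas object beforehand.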

Finally, the plotGraph() method, which was introduced when exploring transformation hierarchies for structure TPs, can also incorporate componentization results to simplify the plot and mark TP hits:

plotGraph(TPsBT, which = "Atrazine", components = componTP)

7.2.3 Omitting transformation product input

The TPs argument to generateComponents can also be omitted. In this case every feature group of fGroupsTPs is considered to be a potential TP of the potential parents specified for fGroups. An advantage is that the screening workflow is not limited to any known TPs or transformations. However, such a workflow places high demands on the prioritization steps before and after componentization to rule out the many false positives that may occur.

When no transformation data is supplied it is crucial to make a prior distinction between parent and TP feature groups. Afterwards, the MS/MS spectral and other annotation similarity filters mentioned in the previous section may be a powerful way to further prioritize data.

The fourth example demonstrates such a workflow.

7.2.4 Reporting TP components

The TP components can be reported with the report function. This is done by setting the components function argument (i.e. in the same way as for all other component types). The results are displayed in a customized format that allows easy exploration of each parent with its TPs. In addition, the TPs argument can be set to include additional data such as transformation pathways.

report(fGroups, components = componTP, TPs = TPs)