7.2 Linking parent and transformation product features
This section discusses one of the most important steps in a TP screening workflow: linking the feature groups of parents with those of candidate transformation products. During this step components are made, where each component consists of one or more feature groups of detected TPs for a particular parent. Componentization was already introduced before, albeit for very different algorithms; the data format for TP componentization is nevertheless highly similar. After componentization, several filters are available to clean and prioritize the data. These even allow workflows that do not obtain potential TPs in advance, which is discussed in a later subsection.
7.2.1 Componentization
Like other algorithms, the generateComponents generic function is used to generate TP components, by setting the algorithm parameter to "tp".
The following arguments are of importance:
Argument | Remarks |
---|---|
fGroups | The input feature groups for the parents |
fGroupsTPs | The input feature groups for the TPs |
ignoreParents | Set to TRUE to ignore feature groups in fGroupsTPs that also occur in fGroups |
TPs | The input transformation products, i.e. as generated by generateTPs() |
MSPeakLists, formulas, compounds | Annotation objects used for similarity calculation between the parent and its TPs |
minRTDiff | The minimum retention time difference (seconds) of a TP for it to be considered to elute differently than its parent. |
7.2.1.1 Feature group input
The fGroups, fGroupsTPs and ignoreParents arguments are used by the componentization algorithm to identify which feature groups can be considered as parents and which as TPs. Three scenarios are possible:
- fGroups=fGroupsTPs and ignoreParents=FALSE: in this case no distinction is made, and all feature groups are considered a parent or TP (default if fGroupsTPs is not specified).
- fGroups and fGroupsTPs contain different subsets of the same featureGroups object and ignoreParents=FALSE: only the feature groups in fGroups/fGroupsTPs are considered as parents/TPs.
- As above, but with ignoreParents=TRUE: the same distinction is made as above, but any feature groups in fGroupsTPs are ignored if also present in fGroups.
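The three scenarios can be pictured with plain set operations on feature group names. The sketch below is only an illustration and not patRoon code: candidateSets is a hypothetical helper and the feature group names are made up.

```r
# Illustration only: mimics how the three scenarios select parent and TP
# candidates. 'candidateSets' is a hypothetical helper, not a patRoon function.
candidateSets <- function(fGroups, fGroupsTPs = fGroups, ignoreParents = FALSE) {
    # with ignoreParents, groups that are also parents are dropped as TP candidates
    TPs <- if (ignoreParents) setdiff(fGroupsTPs, fGroups) else fGroupsTPs
    list(parents = fGroups, TPs = TPs)
}

before <- c("M120_R60", "M150_R90", "M200_R120") # made-up feature group names
after <- c("M150_R90", "M136_R45", "M216_R100")

s1 <- candidateSets(union(before, after))                # scenario 1
s2 <- candidateSets(before, after)                       # scenario 2
s3 <- candidateSets(before, after, ignoreParents = TRUE) # scenario 3
```

In this toy data set the group M150_R90 occurs in both subsets, so it is a TP candidate in the second scenario but not in the third.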
The first scenario is often used if it is unknown which feature groups may be parents and which may be TPs. Furthermore, this scenario may also be used if the dataset is sufficiently simple, for instance, because a suspect screening with the results from convertToSuspects (discussed in the previous section) would reliably discriminate between parents and TPs. A workflow with the first scenario is demonstrated in the second example.
In all other cases it is recommended to use either the second or third scenario, since making a prior distinction between parent and TP feature groups greatly simplifies the dataset and reduces false positives. A relatively simple example where this can be used is when there are two sample groups: before and after treatment.
componTP <- generateComponents(algorithm = "tp",
                               fGroups = fGroups[rGroups = "before"],
                               fGroupsTPs = fGroups[rGroups = "after"])
In this example, only those feature groups present in the “before” replicate group are considered as parents, and those in “after” may be considered as a TP. Since it is likely that there will be some overlap in feature groups between both sample groups, the ignoreParents flag can be used to not consider any of the overlap for TP assignments:
componTP <- generateComponents(algorithm = "tp",
                               fGroups = fGroups[rGroups = "before"],
                               fGroupsTPs = fGroups[rGroups = "after"],
                               ignoreParents = TRUE)
More sophisticated ways are of course possible to provide an upfront distinction between parent/TP feature groups. In the fourth example a workflow is demonstrated where fold changes are used.
NOTE The feature groups specified for fGroups/fGroupsTPs must always originate from the same featureGroups object.
For the library and biotransformer algorithms it is mandatory that a suspect screening of parents and TPs is performed prior to componentization. This is necessary for the componentization algorithm to map the feature groups that belong to a particular parent or TP. To do so, the convertToSuspects function is used to prepare the suspect list:
# set includeParents to TRUE since both the parents and TPs are needed
suspects <- convertToSuspects(TPs, includeParents = TRUE)
fGroupsScr <- screenSuspects(fGroups, suspects, onlyHits = TRUE)
# do the componentization
# a similar distinction between fGroups/fGroupsScr as discussed above can of course also be done
componTP <- generateComponents(fGroups = fGroupsScr, ...)
If a parent screening was already performed in advance, for instance when the input parents to generateTPs are screening results, the screening results for parents and TPs can also be combined. The second example demonstrates this.
Note that if a parent suspect is matched to multiple feature groups, a component is made for each match. Similarly, if multiple feature groups match a TP suspect, all of them will be incorporated in the component.
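The effect of multiple matches can be illustrated with a small base R sketch. It does not reflect patRoon internals; the screening hits, suspect names and feature group names are all made up.

```r
# Illustration only (not patRoon internals): made-up screening hits that map
# suspects to feature groups.
hits <- data.frame(
    suspect = c("parentA", "parentA", "TP1", "TP1", "TP2"),
    group = c("FG1", "FG2", "FG3", "FG4", "FG5"),
    stringsAsFactors = FALSE
)
parentSuspect <- "parentA"
tpSuspects <- c("TP1", "TP2") # TPs assumed to be predicted for parentA

parentGroups <- hits$group[hits$suspect == parentSuspect]
tpGroups <- hits$group[hits$suspect %in% tpSuspects]

# one component per matched parent feature group, each holding all TP matches
components <- setNames(lapply(parentGroups, function(pg) tpGroups), parentGroups)
```

Here the parent suspect matches two feature groups, so two components result, and both contain the two feature groups that matched TP1.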
When TPs were generated with the logic algorithm, a suspect screening must also be carried out in advance. However, in this case it is not necessary to include the parents (since each parent equals a feature group, no mapping is necessary). The onlyHits argument to screenSuspects must not be set, so that the parents are kept.
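A minimal sketch of these steps for the logic algorithm is given below. The wrapper componentizeLogicTPs is a hypothetical helper introduced only for illustration; it assumes that fGroups and TPs are the feature groups and the logic TPs obtained earlier in the workflow, and that patRoon is installed.

```r
# Sketch only: 'componentizeLogicTPs' is a hypothetical helper.
componentizeLogicTPs <- function(fGroups, TPs) {
    # parents are not included in the suspect list (includeParents is not set)
    suspects <- patRoon::convertToSuspects(TPs)
    # onlyHits is left unset so that the parent feature groups are kept
    fGroupsScr <- patRoon::screenSuspects(fGroups, suspects)
    patRoon::generateComponents(algorithm = "tp", fGroups = fGroupsScr, TPs = TPs)
}
```

It could then be called as componTP <- componentizeLogicTPs(fGroups, TPs).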
7.2.1.2 Annotation similarity calculation
If additional annotation data for parents and TPs is given to the componentization algorithm, it will be used to calculate various similarity properties. Often, the chemical structure for a transformation product is similar to that of its parent. Hence, there is a good chance that a parent and its TPs also share similar MS/MS data.
Firstly, if MS peak lists are provided, the spectrum similarity is calculated between each parent and its potential TP candidates. This is performed with all three alignment shifts (see the spectrum similarity section for more details).
In case formulas and/or compounds objects are specified, a parent/TP comparison is made by counting the number of fragments and neutral losses that they share (by using the formula annotations). This property is mainly used for non-target workflows where the identity of a parent and TP is not yet well established. For this reason, fragments and neutral losses reported for all candidates of the parent/TP feature group are considered. Hence, it is highly recommended to pre-treat the annotation objects, for instance, with the topMost filter. If both formulas and compounds are given, the results are pooled. Note that each unique fragment/neutral loss is only counted once, thus multiple formula/compound candidates with the same annotations will not skew the results.
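The pooling and counting described above can be sketched in base R as follows. This is an illustration under simplified assumptions, not the patRoon implementation: countMatches is a hypothetical helper and the fragment formulas are made up.

```r
# Illustration only: count shared fragment formulas, pooling the annotations
# of all candidates per feature group. Data are made up.
parentCandidates <- list(
    cand1 = c("C6H5", "C7H7", "C2H4N"),
    cand2 = c("C6H5", "C7H7") # same annotations as part of cand1
)
tpCandidates <- list(
    cand1 = c("C6H5", "C7H7O"),
    cand2 = c("C6H5", "C2H4N")
)

countMatches <- function(candsA, candsB) {
    # unique() ensures each fragment/neutral loss is counted only once
    length(intersect(unique(unlist(candsA)), unique(unlist(candsB))))
}
fragMatches <- countMatches(parentCandidates, tpCandidates)
```

Because the annotations are reduced to unique values first, repeating a candidate with identical annotations does not change the count.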
7.2.2 Processing data
The output of TP componentization is an object of the componentsTPs class. This derives from the ‘regular’ components class; therefore, all the data processing functionality described before (extraction, subsetting, filtering etc) is also valid for TP components.
Several additional filters are available to prioritize the data:
Filter | Remarks |
---|---|
retDirMatch | If TRUE only keep TPs with an expected chromatographic retention direction compared to the parent. |
minSpecSim, minSpecPrec, minSpecSimBoth | The minimum spectrum similarity between the parent and TP. Calculated with no, "precursor" and "both" alignment shifting (see spectrum similarity). |
minFragMatches, minNLMatches | Minimum number of formula fragment/neutral loss matches between parent and TP (discussed in previous section). |
formulas | A formulas object used to further verify candidate TPs that were generated by the logic algorithm. |
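As a sketch of how several of these filters might be combined, the hypothetical wrapper below applies the retention direction, spectrum similarity and fragment match filters in one call. The threshold defaults are arbitrary example values, and the wrapper assumes that componTP is a componentsTPs object that was generated with the annotation data needed by the filters.

```r
# Sketch only: 'prioritizeTPComponents' is a hypothetical wrapper and the
# threshold defaults are arbitrary example values.
prioritizeTPComponents <- function(componTP, minSim = 0.5, minFrags = 1) {
    patRoon::filter(componTP, retDirMatch = TRUE, minSpecSim = minSim,
                    minFragMatches = minFrags)
}
```

It could then be called as componTPF <- prioritizeTPComponents(componTP).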
The retDirMatch filter compares the expected and observed retention time direction of a TP in order to decide if it should be kept. The direction is a value of either -1 (TP elutes before parent), +1 (TP elutes after parent) or 0 (TP elutes very close to the parent or its direction is unknown). The expected directions are taken from the generated transformation products. For the library and biotransformer algorithms the log P values of a TP and its parent are compared. Here, it is assumed that lower log P values result in earlier elution (i.e. typical for reversed phase LC). For the logic algorithm the retention time direction is taken from the transformation rules table. Note that specifying a large enough value for the minRTDiff argument to generateComponents is important to ensure that some tolerance exists while comparing retention time directions of parents and TPs. This filter does nothing if either the observed or expected direction is zero.
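The direction logic can be sketched in base R. The helpers retDir and keepByRetDir are hypothetical and only mimic the behaviour described above; in particular, whether the comparison with minRTDiff is strict or inclusive is an assumption, and all retention times are made up.

```r
# Illustration only: how a retention direction filter could work.
# 'retDir' and 'keepByRetDir' are hypothetical helpers, not patRoon functions.
retDir <- function(parentRT, tpRT, minRTDiff) {
    d <- tpRT - parentRT
    ifelse(abs(d) < minRTDiff, 0, sign(d)) # 0: too close to the parent
}
keepByRetDir <- function(observed, expected) {
    # a zero in either direction means "unknown", so the TP is kept
    observed == 0 | expected == 0 | observed == expected
}

parentRT <- 300           # seconds, made-up values
tpRT <- c(240, 310, 380)
expected <- c(-1, -1, -1) # e.g. TPs with a lower log P than the parent
observed <- retDir(parentRT, tpRT, minRTDiff = 20)
keep <- keepByRetDir(observed, expected)
```

With these values the second TP elutes within 20 seconds of the parent and is kept because its observed direction is zero, whereas the third TP elutes later than expected and is removed.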
When TP data were generated with the logic algorithm it is recommended to use the formulas filter. This filter uses formula annotations to verify that (1) a parent feature group contains the elements that are subtracted during the transformation and (2) the TP feature group contains the elements that were added during the transformation. Since the ‘right’ candidate formula is most likely not yet known, this filter looks at all candidates. Therefore, it is recommended to filter the formulas object, for instance, with the topMost filter.
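The two checks can be illustrated with element counts stored as named vectors. This is a simplified sketch rather than the patRoon implementation: hasElements is a hypothetical helper, and the candidate formulas and the transformation (loss of Cl, gain of O and H) are made up.

```r
# Illustration only. Element counts are made up; 'hasElements' is hypothetical.
hasElements <- function(candidates, required) {
    # TRUE if any candidate formula holds at least the required element counts
    any(vapply(candidates, function(f) {
        have <- f[names(required)]
        have[is.na(have)] <- 0 # element absent from this candidate
        all(have >= required)
    }, logical(1)))
}

subtracted <- c(Cl = 1)   # elements removed by the made-up transformation
added <- c(O = 1, H = 1)  # elements added by the made-up transformation

parentCands <- list(c(C = 8, H = 9, Cl = 1, N = 2), c(C = 9, H = 12, N = 2, O = 1))
tpCands <- list(c(C = 8, H = 10, N = 2, O = 1))

parentOK <- hasElements(parentCands, subtracted) # check (1)
tpOK <- hasElements(tpCands, added)              # check (2)
```

Because all candidates are considered, a single candidate that fits is enough to pass a check, which is why reducing the number of candidates beforehand makes the filter more selective.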
Finally, the plotGraph() method function, which was introduced for exploring transformation hierarchies of structure TPs, can also incorporate componentization results to simplify the plot and mark TP hits.
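A call might look like the sketch below. The wrapper plotTPHierarchy is a hypothetical helper, and the argument names which and components are assumptions that should be checked against the patRoon reference manual for plotGraph.

```r
# Sketch only: 'plotTPHierarchy' is a hypothetical helper. The 'which' and
# 'components' argument names are assumptions, see the reference manual.
plotTPHierarchy <- function(TPs, componTP, parent = 1) {
    patRoon::plotGraph(TPs, which = parent, components = componTP)
}
```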
7.2.3 Omitting transformation product input
The TPs argument to generateComponents can also be omitted. In this case every feature group of fGroupsTPs is considered to be a potential TP for the potential parents specified for fGroups. An advantage is that the screening workflow is not limited to any known TPs or transformations. However, such a workflow places high demands on the prioritization steps before and after componentization to rule out the many false positives that may occur.
When no transformation data is supplied it is crucial to make a prior distinction between parent and TP feature groups. Afterwards, the MS/MS spectral and other annotation similarity filters mentioned in the previous section may be a powerful way to further prioritize data.
The fourth example demonstrates such a workflow.
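A minimal sketch of such a workflow is shown below. The wrapper screenWithoutTPs is a hypothetical helper, the replicate group names "before" and "after" follow the earlier example in this section, and the similarity threshold is an arbitrary example value.

```r
# Sketch only: 'screenWithoutTPs' is a hypothetical helper; the threshold
# default is an arbitrary example value.
screenWithoutTPs <- function(fGroups, MSPeakLists, minSim = 0.7) {
    # no TPs argument: every feature group in fGroupsTPs is a potential TP
    componTP <- patRoon::generateComponents(
        algorithm = "tp",
        fGroups = fGroups[rGroups = "before"],
        fGroupsTPs = fGroups[rGroups = "after"],
        ignoreParents = TRUE,
        MSPeakLists = MSPeakLists
    )
    # prioritize by MS/MS spectrum similarity between parent and TP
    patRoon::filter(componTP, minSpecSim = minSim)
}
```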
7.2.4 Reporting TP components
The TP components can be reported with the report function. This is done by setting the components function argument (i.e. in the same way as for all other component types). The results will be displayed with a customized format that allows easy exploration of each parent with its TPs. In addition, the TPs argument can be set to include additional data such as transformation pathways.
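As a sketch, and assuming the objects fGroups, componTP and TPs from the workflow above, reporting could be wrapped as follows. The helper reportTPs is hypothetical; any further report arguments are left at their defaults.

```r
# Sketch only: 'reportTPs' is a hypothetical helper.
reportTPs <- function(fGroups, componTP, TPs) {
    # components adds the TP components, TPs adds e.g. transformation pathways
    patRoon::report(fGroups, components = componTP, TPs = TPs)
}
```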