6.2 Generating sets workflow data
As was shown in the previous section, the generation of workflow data in a sets workflow largely follows what was discussed in the previous chapters. The same generator functions are used:
Workflow step | Function | Output S4 class |
---|---|---|
Grouping features | `groupFeatures()` | `featureGroupsSet` |
Suspect screening | `screenSuspects()` | `featureGroupsScreeningSet` |
MS peak lists | `generateMSPeakLists()` | `MSPeakListsSet` |
Formula annotation | `generateFormulas()` | `formulasSet` |
Compound annotation | `generateCompounds()` | `compoundsSet` |
Componentization | `generateComponents()` | algorithm dependent |
(the data pre-treatment and feature finding steps have been omitted as they are not specific to sets workflows).
While the same generic functions are used to generate data, the class of the output objects differs (e.g. `formulasSet` instead of `formulas`). However, since all these classes inherit from their non-sets workflow counterparts, using the workflow data in a sets workflow is nearly identical to what was discussed in the previous chapters (this is further discussed in the next section).
As discussed before, an important step is the neutralization of features. Other workflow steps also have internal mechanisms to deal with data from different sets:
Workflow step | Handling of set data |
---|---|
Finding/Grouping features | Neutralization of m/z values |
Suspect screening | Merging results from screening performed for each set |
Componentization | Algorithm dependent (discussed below) |
MS peak lists | MS data is obtained and stored per set. The final peak lists are combined (not averaged) |
Formula/Compound annotation | Annotation is performed for each set separately and used to generate a final consensus |
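As a rough illustration of m/z neutralization, the base R sketch below converts the m/z values of a protonated and a deprotonated ion back to one neutral mass. It is a minimal sketch under stated assumptions: only [M+H]+ and [M-H]- are handled, the helper `neutralMass()` is hypothetical (not part of patRoon) and the m/z values are made-up example numbers; real workflows involve many more adduct types.

```r
# Mass of a proton (Da), added/removed for [M+H]+ and [M-H]- ions.
protonMass <- 1.007276

# Hypothetical helper (not a patRoon function): convert a measured m/z value
# to a neutral mass. Only the two simple (de)protonated adducts are handled.
neutralMass <- function(mz, adduct = c("[M+H]+", "[M-H]-")) {
  adduct <- match.arg(adduct)
  if (adduct == "[M+H]+") mz - protonMass else mz + protonMass
}

# Made-up m/z values of one compound measured in both polarities
mzPos <- 181.070664  # positive set, [M+H]+
mzNeg <- 179.056112  # negative set, [M-H]-

mPos <- neutralMass(mzPos, "[M+H]+")
mNeg <- neutralMass(mzNeg, "[M-H]-")

# Before neutralization the m/z values differ by ~2 Da; afterwards both
# features share (almost) the same mass and can be grouped across sets.
abs(mPos - mNeg) < 0.001
```

Grouping on neutral masses rather than raw m/z values is what allows features from a positive and a negative set to end up in the same feature group.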
In most cases, the algorithms of the workflow steps are first performed separately for each set, and the resulting data is then merged. To illustrate the importance of this, consider these examples:
- A suspect screening with a suspect list that contains known MS/MS fragments
- Annotation where MS/MS fragments are used to predict the chemical formula
- Componentization in order to establish adduct assignments for the features
In all these cases, the data used is highly dependent on the MS method (e.g. polarity) that was used to acquire the sample data. Nevertheless, all the steps needed to obtain and combine set data are performed automatically in the background, and are therefore largely invisible to the user.
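The general pattern of running an algorithm per set and merging its results afterwards can be sketched in base R. The helper `applyPerSet()` and the toy data are hypothetical and only illustrate the idea; the simple intensity filter stands in for a real, MS method dependent algorithm such as suspect screening.

```r
# Hypothetical helper: apply `fun` to the data of each set separately and
# stack the results, recording the originating set in a `set` column.
applyPerSet <- function(setData, fun) {
  results <- lapply(names(setData), function(s) {
    res <- fun(setData[[s]])
    if (nrow(res) == 0) return(NULL)  # nothing found for this set
    data.frame(set = s, res, stringsAsFactors = FALSE)
  })
  do.call(rbind, results)
}

# Toy data: feature groups and their intensities in each set
setData <- list(
  positive = data.frame(group = c("G1", "G2", "G3"),
                        intensity = c(5e4, 2e3, 8e4), stringsAsFactors = FALSE),
  negative = data.frame(group = c("G1", "G3"),
                        intensity = c(1e3, 6e4), stringsAsFactors = FALSE)
)

# Per-set step (stand-in for a real algorithm): keep intense feature groups
hits <- applyPerSet(setData, function(d) d[d$intensity > 1e4, , drop = FALSE])

# Merge step: for each feature group, list the sets in which it was found
foundIn <- tapply(hits$set, hits$group, paste, collapse = ",")
foundIn
```

Because the per-set step only ever sees data from one set, it can safely rely on set-specific properties such as polarity; only the merge step needs to know about multiple sets.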
NOTE Because feature groups in sets workflows always have adduct annotations, it is never required to specify the adduct or ionization mode when generating annotations or components, or when performing suspect screening (i.e. the `adduct`/`ionization` arguments should not be specified).
6.2.1 Componentization
When the componentization algorithms related to adduct/isotope annotation (e.g. CAMERA, RAMClustR and cliqueMS) or the nontarget algorithm are used, componentization is performed per set and the final object (a `componentsSet` or `componentsNTSet`) contains the components of all sets together. Since these algorithms are highly dependent upon the polarity of the MS data, no attempt is made to merge components from different sets.
The other componentization algorithms work on the complete data. For more details, see the reference manual (`?generateComponents`).
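To illustrate that per-set components are collected rather than merged, the base R sketch below concatenates hypothetical per-set component lists into one list. The helper `combineComponentSets()`, the toy components and the set-suffixed naming scheme are illustrative assumptions and do not reflect how patRoon names or stores components internally.

```r
# Hypothetical per-set componentization results: a named list of components
# per set, each listing feature groups with their adduct annotations.
compsPos <- list(
  CMP1 = data.frame(group = c("G1", "G4"), adduct = c("[M+H]+", "[M+Na]+"),
                    stringsAsFactors = FALSE)
)
compsNeg <- list(
  CMP1 = data.frame(group = c("G1", "G5"), adduct = c("[M-H]-", "[M+Cl]-"),
                    stringsAsFactors = FALSE)
)

# Hypothetical helper: gather the components of all sets into one list.
# No merging is attempted; names get a set suffix to keep them unique.
combineComponentSets <- function(compsList) {
  out <- list()
  for (s in names(compsList)) {
    comps <- compsList[[s]]
    names(comps) <- paste0(names(comps), "-", s)
    out <- c(out, comps)
  }
  out
}

allComps <- combineComponentSets(list(positive = compsPos, negative = compsNeg))
names(allComps)
```

Feature group `G1` occurs in a component of each set, yet the two components stay separate, since their adduct annotations are only meaningful within the polarity of their own set.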
6.2.2 Formula and compound annotation
For formula and compound annotation, the data generated for each set is combined into a set consensus: the annotation tables are merged, scores are averaged and candidates are re-ranked. More details can be found in the reference manual (e.g. `?generateCompounds`). In addition, it is possible to only keep candidates that exist in a minimum number of sets. For this, the `setThreshold` and `setThresholdAnn` arguments can be used:
```r
# candidate must be present in all sets
formulas <- generateFormulas(fGroups, mslists, "genform", setThreshold = 1)
# candidate must be present in all sets with annotation data
compounds <- generateCompounds(fGroups, mslists, "metfrag", setThresholdAnn = 1)
```
In the first example, a formula candidate for a feature group is only kept if it was found for all of the sets. In the second example, a compound candidate is only kept if it was present in all of the sets with annotation data available. The following examples of a common positive/negative sets workflow illustrate the differences:
Candidate | Sets with annotation data | Sets with candidate | `setThreshold = 1` | `setThresholdAnn = 1` |
---|---|---|---|---|
#1 | +, - | +, - | Keep | Keep |
#2 | +, - | + | Remove | Remove |
#3 | + | + | Remove | Keep |
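The decisions in the table above can be reproduced with a small base R sketch. The helper `keepCandidate()` is hypothetical and only mimics the described logic; it assumes both thresholds are fractions between 0 and 1, consistent with a value of 1 meaning "all sets", and its defaults of 0 are chosen for this sketch only.

```r
# Hypothetical helper mimicking the set threshold logic described above.
# Thresholds are treated as fractions (0-1), so 1 means "all sets".
#   candSets: sets in which the candidate was found
#   annSets:  sets with annotation data for the feature group
#   allSets:  all sets of the workflow
keepCandidate <- function(candSets, annSets, allSets,
                          setThreshold = 0, setThresholdAnn = 0) {
  fracAll <- length(intersect(candSets, allSets)) / length(allSets)
  fracAnn <- length(intersect(candSets, annSets)) / length(annSets)
  fracAll >= setThreshold && fracAnn >= setThresholdAnn
}

allSets <- c("positive", "negative")

# The three candidates from the table
cands <- list(
  "#1" = list(ann = c("positive", "negative"), cand = c("positive", "negative")),
  "#2" = list(ann = c("positive", "negative"), cand = "positive"),
  "#3" = list(ann = "positive", cand = "positive")
)

decide <- function(keep) if (keep) "Keep" else "Remove"
decisions <- data.frame(
  candidate = names(cands),
  setThreshold = sapply(cands, function(x)
    decide(keepCandidate(x$cand, x$ann, allSets, setThreshold = 1))),
  setThresholdAnn = sapply(cands, function(x)
    decide(keepCandidate(x$cand, x$ann, allSets, setThresholdAnn = 1))),
  row.names = NULL, stringsAsFactors = FALSE
)
decisions
```

The only difference between the two thresholds is the denominator: all sets for `setThreshold`, and only the sets with annotation data for `setThresholdAnn`.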
For more information, refer to the reference manual (e.g. `?generateCompounds`).
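The consensus step described at the start of this subsection (merging the annotation tables, averaging scores and re-ranking) can be sketched in base R as follows. The helper `consensus()` and the candidate tables are hypothetical; in particular, averaging a score only over the sets in which the candidate was found is an assumption of this sketch, not necessarily how patRoon computes its consensus scores.

```r
# Hypothetical per-set candidate tables for a single feature group
candPos <- data.frame(candidate = c("cand1", "cand2"), score = c(0.9, 0.6),
                      stringsAsFactors = FALSE)
candNeg <- data.frame(candidate = c("cand1", "cand3"), score = c(0.7, 0.75),
                      stringsAsFactors = FALSE)

# Hypothetical helper: merge per-set tables, average scores and re-rank.
# Assumption: scores are averaged over the sets in which a candidate was
# found (sets without the candidate are ignored, not counted as zero).
consensus <- function(tables) {
  merged <- do.call(rbind, tables)
  avgScore <- tapply(merged$score, merged$candidate, mean)
  nSets <- tapply(merged$score, merged$candidate, length)
  res <- data.frame(candidate = names(avgScore),
                    score = as.vector(avgScore),
                    nSets = as.vector(nSets),
                    stringsAsFactors = FALSE)
  res <- res[order(-res$score), ]  # highest consensus score first
  res$rank <- seq_len(nrow(res))
  rownames(res) <- NULL
  res
}

res <- consensus(list(positive = candPos, negative = candNeg))
res
```

Counting a missing set as a score of zero instead would push candidates found in only one polarity down the ranking; set thresholds such as those shown above offer a more explicit way to handle such candidates.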