Automatically calculate chemical formulae for all feature groups.
generateFormulas(fGroups, MSPeakLists, algorithm, ...)
# S4 method for class 'featureGroups'
generateFormulas(fGroups, MSPeakLists, algorithm, ...)
featureGroups object for which formulae should be generated. This should be the same or a subset of the object that was used to create the specified MSPeakLists. In the case of a subset, only the remaining feature groups in the subset are considered.
An MSPeakLists object that was generated for the supplied fGroups.
A character string describing the algorithm that should be used: "bruker", "genform" or "sirius".
Any parameters to be passed to the selected formula generation algorithm.
A formulas object containing all generated formulae.
Several algorithms are provided to automatically generate formulae for given feature groups. All algorithms use the accurate mass of a feature to back-calculate candidate formulae. Depending on the algorithm and data availability, other data such as isotopic pattern and MS/MS fragments may be used to further improve formula assignment and ranking.
generateFormulas is a generic function that generates formulae with one of the supported algorithms. The actual functionality is provided by algorithm specific functions such as generateFormulasDA and generateFormulasGenForm. While these functions may be called directly, generateFormulas provides a generic interface and is therefore usually preferred.
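The relation between the generic interface and the algorithm specific functions can be pictured with a small base R sketch. This is not the package's implementation: the three stub functions and the exampleArg argument are hypothetical stand-ins, and the sketch only shows how an algorithm string could select a backend while forwarding any remaining arguments.

```r
# Hypothetical stand-ins for the algorithm specific functions. Each one only
# records which backend was selected and which extra arguments it received.
stubDA <- function(fGroups, MSPeakLists, ...) {
    list(algorithm = "bruker", args = list(...))
}
stubGenForm <- function(fGroups, MSPeakLists, ...) {
    list(algorithm = "genform", args = list(...))
}
stubSIRIUS <- function(fGroups, MSPeakLists, ...) {
    list(algorithm = "sirius", args = list(...))
}

# Sketch of a generic entry point: pick the backend by name, pass '...' on.
generateFormulasSketch <- function(fGroups, MSPeakLists, algorithm, ...) {
    backend <- switch(algorithm,
                      bruker = stubDA,
                      genform = stubGenForm,
                      sirius = stubSIRIUS,
                      stop("Unknown algorithm: ", algorithm))
    backend(fGroups, MSPeakLists, ...)
}

# NULL placeholders are used instead of real featureGroups/MSPeakLists objects.
res <- generateFormulasSketch(fGroups = NULL, MSPeakLists = NULL,
                              algorithm = "genform", exampleArg = 5)
print(res$algorithm)
```

With this layout the calling code stays the same when switching algorithms; only the algorithm string and the algorithm specific arguments change.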
Formula candidate assignment occurs in one of the following ways:
1. Candidates are first generated for each feature and then pooled to form consensus candidates for the feature group.
2. Candidates are directly generated for each feature group from group averaged MS peak list data.
With approach (1), scorings and mass errors are averaged and outliers are removed (controlled by the featThreshold and featThresholdAnn arguments). Candidate properties that cannot be averaged are taken from the feature of the analysis specified in the "analysis" column of the results. The second approach generates candidate formulae only once for every feature group and is therefore generally much faster. However, this inherently prevents removal of outliers.
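The pooling of the first approach can be illustrated with a minimal base R sketch. It rests on assumptions that are not confirmed by this page: featThreshold is interpreted as the minimum fraction of features in which a candidate must occur, featThresholdAnn is not modelled, and the toy data and the poolCandidates helper are hypothetical.

```r
# Toy per-feature candidates for one feature group (all values are made up).
cands <- data.frame(
    analysis = c("a1", "a1", "a2", "a2", "a3"),
    formula  = c("C6H6O", "C5H2O2", "C6H6O", "C5H2O2", "C6H6O"),
    score    = c(0.9, 0.4, 0.8, 0.2, 1.0),
    error    = c(1.0, 3.0, 2.0, 5.0, 3.0),
    stringsAsFactors = FALSE
)

# Hypothetical helper: pool feature candidates into feature group candidates.
poolCandidates <- function(cands, featThreshold) {
    nFeatures <- length(unique(cands$analysis))
    # Average the scoring and mass error of each formula over its features.
    pooled <- aggregate(cbind(score, error) ~ formula, data = cands, FUN = mean)
    # Fraction of features in which each formula was found.
    counts <- table(cands$formula)
    pooled$coverage <- as.numeric(counts[as.character(pooled$formula)]) / nFeatures
    # Assumption: candidates found in too few features count as outliers.
    pooled <- pooled[pooled$coverage >= featThreshold, , drop = FALSE]
    # Best scoring candidate first.
    pooled[order(-pooled$score), , drop = FALSE]
}

consensus <- poolCandidates(cands, featThreshold = 0.75)
print(consensus)
```

In this toy data C5H2O2 occurs in only two of the three features, so it falls below the threshold of 0.75 and is dropped. The second approach has no per-feature candidates to compare, which is why it cannot remove such outliers.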
Note that with either approach, subsequent workflow steps that use formula data (e.g. addFormulaScoring and reporting functions) only use formula data that was eventually assigned to feature groups.
Each algorithm implements its own scoring system. The names of the scorings have been harmonized where possible. An overview is obtained with the formulaScorings function:
name | genform | sirius | bruker | description |
combMatch | comb_match | - | - | MS and MS/MS combined match value |
isoScore | MS_match | isoScore | - | How well the isotopic pattern matches |
mSigma | - | - | mSigma | Deviation of the isotopic pattern |
MSMSScore | MSMS_match | treeScore | - | How well MS/MS data matches |
score | - | score | Score | Overall MS formula score |
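As an illustration of how the harmonized names relate to the algorithm specific ones, the table above can be written out as a plain data frame and used as a lookup. This is only a sketch: the scorings data frame is copied by hand from the table rather than obtained from formulaScorings, and the harmonize helper is hypothetical.

```r
# The overview table, typed in by hand; NA marks a scoring that the
# algorithm does not provide ("-" in the table).
scorings <- data.frame(
    name    = c("combMatch", "isoScore", "mSigma", "MSMSScore", "score"),
    genform = c("comb_match", "MS_match", NA, "MSMS_match", NA),
    sirius  = c(NA, "isoScore", NA, "treeScore", "score"),
    bruker  = c(NA, NA, "mSigma", NA, "Score"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: translate algorithm specific scoring names into the
# harmonized names. Unknown names yield NA.
harmonize <- function(algoNames, algorithm) {
    scorings$name[match(algoNames, scorings[[algorithm]])]
}

genformNames <- harmonize(c("MS_match", "MSMS_match"), "genform")
siriusNames <- harmonize("treeScore", "sirius")
print(genformNames)
print(siriusNames)
```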
With a sets workflow, annotation is first performed for each set. This is important, since the annotation algorithms typically cannot work with data from mixed ionization modes. The annotation results are then combined to generate a sets consensus:
The annotation tables for each feature group from the set specific data are combined. Rows with overlapping candidates (determined by the neutral formula) are merged.
Set specific data (e.g. the ionic formula) is retained by renaming their columns with set specific names.
The MS/MS fragment annotations (fragInfo column) from each set are combined.
The scorings for each set are averaged to calculate overall scores. If setAvgSpecificScores=FALSE then scorings that are considered set specific (e.g. MS/MS and isotopic pattern match) are not averaged.
The candidates are re-ranked based on their average ranking among the set data (if a candidate is absent in a set it is assigned the poorest rank in that set).
The coverage of each candidate among sets is calculated. Depending on the setThreshold and setThresholdAnn arguments, candidates with low abundance are removed.
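The steps above can be sketched in base R for a single feature group with two sets. Several simplifications are assumed here and are not confirmed by this page: scores are averaged over the sets in which a candidate is present, setThreshold is interpreted as the minimum fraction of sets that must contain a candidate, and setAvgSpecificScores, setThresholdAnn and fragInfo are not modelled. The toy data, the column naming and the setsConsensus helper are hypothetical.

```r
# Toy annotation tables of one feature group for two sets (made-up values).
pos <- data.frame(neutral_formula = c("C6H6O", "C5H2O2", "C4H10N2"),
                  score = c(0.9, 0.5, 0.3), stringsAsFactors = FALSE)
neg <- data.frame(neutral_formula = c("C6H6O", "C5H2O2"),
                  score = c(0.8, 0.6), stringsAsFactors = FALSE)

# Hypothetical helper that combines per-set results into a sets consensus.
setsConsensus <- function(sets, setThreshold = 0) {
    perSet <- lapply(names(sets), function(s) {
        tab <- sets[[s]]
        # Rank within the set: the highest score gets rank 1.
        tab$rank <- rank(-tab$score, ties.method = "min")
        # Retain set specific data under set specific column names.
        isSpecific <- names(tab) != "neutral_formula"
        names(tab)[isSpecific] <- paste0(names(tab)[isSpecific], "_", s)
        tab
    })
    # Merge rows with the same neutral formula (full outer join).
    merged <- Reduce(function(x, y) merge(x, y, by = "neutral_formula", all = TRUE),
                     perSet)
    scores <- as.matrix(merged[, paste0("score_", names(sets)), drop = FALSE])
    ranks <- as.matrix(merged[, paste0("rank_", names(sets)), drop = FALSE])
    # Coverage: fraction of sets in which the candidate was found.
    merged$coverage <- rowMeans(!is.na(scores))
    # Assumption: average scores over the sets where the candidate is present.
    merged$score <- rowMeans(scores, na.rm = TRUE)
    # A candidate that is absent in a set gets the poorest rank of that set.
    for (j in seq_len(ncol(ranks))) {
        ranks[is.na(ranks[, j]), j] <- max(ranks[, j], na.rm = TRUE)
    }
    merged$avgRank <- rowMeans(ranks)
    # Remove candidates with low coverage, then re-rank by average rank.
    merged <- merged[merged$coverage >= setThreshold, , drop = FALSE]
    merged[order(merged$avgRank), , drop = FALSE]
}

setsRes <- setsConsensus(list(positive = pos, negative = neg), setThreshold = 0.75)
print(setsRes)
```

Here C4H10N2 is only present in the positive set, so its coverage is 0.5 and it is removed with a threshold of 0.75. With setThreshold = 0 it would be kept and ranked last, since it receives the poorest rank of the negative set.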
The formulas output class and its methods and the algorithm specific functions: generateFormulasDA, generateFormulasGenForm, generateFormulasSIRIUS
The GenForm manual (also known as MOLGEN-MSMS).