Uses SIRIUS to generate chemical formula candidates.

generateFormulasSIRIUS(fGroups, ...)

# S4 method for featureGroups
generateFormulasSIRIUS(
  fGroups,
  MSPeakLists,
  relMzDev = 5,
  adduct = NULL,
  projectPath = NULL,
  elements = "CHNOP",
  profile = "qtof",
  database = NULL,
  noise = NULL,
  cores = NULL,
  getFingerprints = FALSE,
  topMost = 100,
  token = NULL,
  extraOptsGeneral = NULL,
  extraOptsFormula = NULL,
  calculateFeatures = TRUE,
  featThreshold = 0,
  featThresholdAnn = 0.75,
  absAlignMzDev = 0.002,
  verbose = TRUE,
  splitBatches = FALSE,
  dryRun = FALSE
)

# S4 method for featureGroupsSet
generateFormulasSIRIUS(
  fGroups,
  MSPeakLists,
  relMzDev = 5,
  adduct = NULL,
  projectPath = NULL,
  ...,
  setThreshold = 0,
  setThresholdAnn = 0,
  setAvgSpecificScores = FALSE
)

Arguments

fGroups

featureGroups object for which formulae should be generated. This should be the same or a subset of the object that was used to create the specified MSPeakLists. In the case of a subset only the remaining feature groups in the subset are considered.

...

(sets workflow) Further arguments passed to the non-sets workflow method.

MSPeakLists

An MSPeakLists object that was generated for the supplied fGroups.

relMzDev

Maximum relative deviation between the measured and candidate formula m/z values (in ppm). Sets the --ppm-max command line option.
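
As a rough illustration of what a relative deviation in ppm means, the sketch below computes it for a pair of m/z values and checks it against the default relMzDev of 5. The helper ppmDev and both m/z values are hypothetical; how SIRIUS applies the tolerance internally is not described here.

```r
# Relative m/z deviation (in ppm) between a measured and a candidate m/z.
# ppmDev is a hypothetical helper, not part of patRoon or SIRIUS.
ppmDev <- function(measured, theoretical) {
  (measured - theoretical) / theoretical * 1e6
}

measured <- 285.0795     # hypothetical measured m/z
theoretical <- 285.0790  # hypothetical candidate formula m/z

dev <- ppmDev(measured, theoretical)
withinTolerance <- abs(dev) <= 5  # compare against relMzDev = 5
```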

adduct

An adduct object (or something that can be converted to it with as.adduct). Examples: "[M-H]-", "[M+Na]+". If the featureGroups object has adduct annotations then these are used if adduct=NULL.

(sets workflow) The adduct argument is not supported for sets workflows, since the adduct annotations will then always be used.

projectPath, dryRun

These are mainly for internal purposes. projectPath sets the output directory for the SIRIUS output (a temporary directory if NULL). If dryRun is TRUE then no computations are done and only the results from projectPath are processed.

(sets workflow) projectPath should be a character vector specifying the path for each set.

elements

Elements to be considered for formulae calculation. This heavily affects the number of candidates! Always try to work with a minimal set by excluding elements you don't expect. The minimum/maximum number of atoms per element can also be specified, for example: a value of "C[5]H[10-15]O" will only consider formulae with up to five carbon atoms, between ten and fifteen hydrogen atoms and any number of oxygen atoms. Sets the --elements command line option.

profile

Name of the configuration profile, for example: "qtof", "orbitrap", "fticr". Sets the --profile command line option.

database

If not NULL, use a database for retrieval of formula candidates. Possible values are: "pubchem", "bio", "kegg", "hmdb". Sets the --database command line option.

noise

Median intensity of the noise (NULL ignores this parameter). Sets the --noise command line option.

cores

The number of cores SIRIUS will use. If NULL then the default of all cores will be used.

getFingerprints

Set to TRUE to load SIRIUS-CSI:FingerID MS/MS fingerprints for the formula candidates. This is currently only supported with calculateFeatures=FALSE to avoid heavy server traffic. The fingerprints are stored in the fingerprints slot of the returned formulasSIRIUS object, and are used by the predictTox and predictRespFactors methods.

topMost

Only keep this number of candidates (per feature group) with the highest score. Sets the --candidates command line option.

token

A character string with the refresh token to be used for logging in with SIRIUS (from version 5 only). The token can be obtained with the getSIRIUSToken function, or by running SIRIUS directly (e.g. with the login command). See the SIRIUS website for more information. If NULL then the login is skipped.

extraOptsGeneral, extraOptsFormula

A character vector with any extra command line parameters for SIRIUS. For SIRIUS versions <4.4 there is no distinction between general and formula options. Otherwise, command line options specified in extraOptsGeneral are added before the formula command, while options specified in extraOptsFormula are added after it. See the SIRIUS manual for more details. Set to NULL to ignore.

calculateFeatures

If TRUE, formulae are first calculated for all features prior to feature group assignment (see Candidate assignment in generateFormulas).

featThreshold

If calculateFeatures=TRUE: minimum presence (0-1) of a formula in all features before it is considered as a candidate for a feature group. For instance, featThreshold=0.75 dictates that a formula should be present in at least 75% of the features inside a feature group.

featThresholdAnn

As featThreshold, but only considers features with annotations. For instance, featThresholdAnn=0.75 dictates that a formula should be present in at least 75% of the features with annotations inside a feature group.
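
The difference between featThreshold and featThresholdAnn can be illustrated with a small base R sketch. It is not patRoon's implementation: the feature names and formulae are made up, and applying both thresholds together is an assumption made for illustration only.

```r
# Hypothetical formula annotations for the four features of one feature group.
featFormulas <- list(
  f1 = c("C6H6O", "C5H2O2"),
  f2 = c("C6H6O"),
  f3 = character(0),          # feature without any annotations
  f4 = c("C6H6O", "C5H2O2")
)

featThreshold <- 0        # default from the usage section
featThresholdAnn <- 0.75  # default from the usage section

allForms <- unique(unlist(featFormulas))

# Number of features in which each formula occurs.
counts <- sapply(allForms, function(f) {
  sum(sapply(featFormulas, function(x) f %in% x))
})

nAll <- length(featFormulas)            # all features in the group
nAnn <- sum(lengths(featFormulas) > 0)  # only features with annotations

presenceAll <- counts / nAll  # compared against featThreshold
presenceAnn <- counts / nAnn  # compared against featThresholdAnn

# Assumption: a candidate must pass both thresholds to be kept.
keep <- names(counts)[presenceAll >= featThreshold &
                        presenceAnn >= featThresholdAnn]
```

In this toy data "C5H2O2" occurs in two of the three annotated features, so it falls below featThresholdAnn=0.75 and is dropped, while "C6H6O" is kept.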

absAlignMzDev

When the group formula annotation consensus is made from feature annotations, the m/z values of annotated MS/MS fragments may slightly deviate from those of the corresponding group MS/MS peak list. The absAlignMzDev argument specifies the maximum m/z window used to re-align the mass peaks.

verbose

If TRUE then more output is shown in the terminal.

splitBatches

If TRUE then the calculations done by SIRIUS will be evenly split over multiple SIRIUS calls (which may be run in parallel depending on the set package options). If splitBatches=FALSE then all feature calculations are performed from a single SIRIUS execution, which is often fastest when calculations are performed on a single computer.

setThreshold

(sets workflow) Minimum abundance for a candidate among all sets (0-1). For instance, a value of 1 means that the candidate needs to be present in all the set data.

setThresholdAnn

(sets workflow) As setThreshold, but only taking into account the set data that contain annotations for the feature group of the candidate.
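
A minimal base R sketch of how the two set abundances differ is shown below. The set names, candidates and presence matrix are hypothetical, and this is not the code patRoon uses internally.

```r
# Rows: candidate formulae; columns: sets. TRUE means the candidate was
# found in the data of that set. All values are made up for illustration.
present <- rbind(
  C6H6O  = c(set1 = TRUE, set2 = TRUE,  set3 = FALSE),
  C5H2O2 = c(set1 = TRUE, set2 = FALSE, set3 = FALSE)
)

# Which sets contain any annotations for this feature group (set3 does not).
setHasAnn <- c(set1 = TRUE, set2 = TRUE, set3 = FALSE)

# Abundance over all sets (compared against setThreshold).
abundance <- rowMeans(present)

# Abundance over sets with annotations only (compared against setThresholdAnn).
abundanceAnn <- rowMeans(present[, setHasAnn, drop = FALSE])

setThresholdAnn <- 1
kept <- rownames(present)[abundanceAnn >= setThresholdAnn]
```

Here no candidate would survive setThreshold=1, because set3 has no annotations at all, whereas setThresholdAnn=1 still keeps "C6H6O".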

setAvgSpecificScores

(sets workflow) If TRUE then set specific scorings (e.g. MS/MS match) are also averaged.

Value

A formulasSIRIUS object.

Details

This function uses SIRIUS to generate formula candidates. It is called when generateFormulas is invoked with algorithm="sirius".

Similarity of measured and theoretical isotopic patterns will be used for scoring candidates. Note that SIRIUS requires availability of MS/MS data.
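
A typical call through the generic interface might look like the sketch below. The wrapper annotateWithSIRIUS is hypothetical, and it assumes that fGroups and mslists are an existing featureGroups object and the MSPeakLists object generated for it. The argument values are the defaults from the usage section, except calculateFeatures, which is set to FALSE purely as an example. Running it requires a working SIRIUS installation.

```r
# Hypothetical convenience wrapper; nothing is executed until it is called
# with real featureGroups and MSPeakLists objects.
annotateWithSIRIUS <- function(fGroups, mslists) {
  patRoon::generateFormulas(
    fGroups, mslists,
    algorithm = "sirius",       # dispatches to generateFormulasSIRIUS
    relMzDev = 5,               # ppm tolerance
    elements = "CHNOP",         # keep this set as small as possible
    profile = "qtof",
    calculateFeatures = FALSE,  # example choice: annotate feature groups directly
    topMost = 100
  )
}
```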

Note

For annotations performed with SIRIUS it is often the fastest to keep the default splitBatches=FALSE. In this case, all SIRIUS output will be printed to the terminal (unless verbose=FALSE or patRoon.MP.method="future"). Furthermore, please note that only annotations to be performed for the same adduct are grouped in a single batch execution.

Parallelization

generateFormulasSIRIUS uses multiprocessing to parallelize computations. See the parallelization section in the handbook for more details, and patRoon options for the available configuration options.
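
As a small sketch, the patRoon.MP.method option mentioned in the Note can be set with base R's options(). Only the "future" value is taken from this page. Any further setup that method may need, and the other available options, should be checked in the handbook.

```r
# Switch the multiprocessing method for subsequent patRoon calls and keep
# the previous setting so it can be restored afterwards.
oldOpts <- options(patRoon.MP.method = "future")

currentMethod <- getOption("patRoon.MP.method")

# Restore the previous setting when done.
options(oldOpts)
```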

References

Dührkop K, Fleischauer M, Ludwig M, Aksenov AA, Melnik AV, Meusel M, Dorrestein PC, Rousu J, Böcker S (2019). “SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information.” Nature Methods, 16(4), 299–302. doi:10.1038/s41592-019-0344-8.

Dührkop K, Böcker S (2015). “Fragmentation Trees Reloaded.” In Przytycka TM (ed.), Research in Computational Molecular Biology, 65–79. ISBN 978-3-319-16706-0.

Dührkop K, Shen H, Meusel M, Rousu J, Böcker S (2015). “Searching molecular structure databases with tandem mass spectra using CSI:FingerID.” Proceedings of the National Academy of Sciences, 112(41), 12580–12585. doi:10.1073/pnas.1509788112.

Böcker S, Letzel MC, Lipták Z, Pervukhin A (2008). “SIRIUS: decomposing isotope patterns for metabolite identification.” Bioinformatics, 25(2), 218–224. doi:10.1093/bioinformatics/btn603.

See also

generateFormulas for more details and other algorithms.