Uses SIRIUS in combination with CSI:FingerID for compound annotation.

generateCompoundsSIRIUS(fGroups, ...)

# S4 method for featureGroups
generateCompoundsSIRIUS(
  fGroups,
  MSPeakLists,
  relMzDev = 5,
  adduct = NULL,
  projectPath = NULL,
  elements = "CHNOP",
  profile = "qtof",
  formulaDatabase = NULL,
  fingerIDDatabase = "pubchem",
  noise = NULL,
  cores = NULL,
  topMost = 100,
  topMostFormulas = 5,
  token = NULL,
  extraOptsGeneral = NULL,
  extraOptsFormula = NULL,
  verbose = TRUE,
  splitBatches = FALSE,
  dryRun = FALSE
)

# S4 method for featureGroupsSet
generateCompoundsSIRIUS(
  fGroups,
  MSPeakLists,
  relMzDev = 5,
  adduct = NULL,
  projectPath = NULL,
  ...,
  setThreshold = 0,
  setThresholdAnn = 0,
  setAvgSpecificScores = FALSE
)

Arguments

fGroups

featureGroups object which should be annotated. This should be the same or a subset of the object that was used to create the specified MSPeakLists. In the case of a subset only the remaining feature groups in the subset are considered.

...

(sets workflow) Further arguments passed to the non-sets workflow method.

MSPeakLists

An MSPeakLists object that was generated for the supplied fGroups.

relMzDev

Maximum relative deviation between the measured and candidate formula m/z values (in ppm). Sets the --ppm-max command line option.

adduct

An adduct object (or something that can be converted to it with as.adduct). Examples: "[M-H]-", "[M+Na]+". If the featureGroups object has adduct annotations then these are used if adduct=NULL.

(sets workflow) The adduct argument is not supported for sets workflows, since the adduct annotations will then always be used.

projectPath, dryRun

These are mainly for internal purposes. projectPath sets the output directory for the SIRIUS output (a temporary directory if NULL). If dryRun is TRUE then no computations are done and only the results from projectPath are processed.

(sets workflow) projectPath should be a character vector specifying the path for each set.

elements

Elements to be considered for formula calculation. This heavily affects the number of candidates! Always try to work with a minimal set by excluding elements you don't expect. The minimum/maximum number of atoms per element can also be specified, for example: a value of "C[5]H[10-15]O" will only consider formulae with up to five carbon atoms, between ten and fifteen hydrogen atoms and any number of oxygen atoms. Sets the --elements command line option.

profile

Name of the configuration profile, for example: "qtof", "orbitrap", "fticr". Sets the --profile command line option.

formulaDatabase

If not NULL, use a database for retrieval of formula candidates. Possible values are: "pubchem", "bio", "kegg", "hmdb". Sets the --database command line option.

fingerIDDatabase

Database specifically used for CSI:FingerID. If NULL, the value of the formulaDatabase parameter will be used or "pubchem" when that is also NULL. Sets the --fingerid-db option.

noise

Median intensity of the noise (NULL ignores this parameter). Sets the --noise command line option.

cores

The number of cores SIRIUS will use. If NULL then the default of all cores will be used.

topMost

Only keep this number of candidates (per feature group) with the highest score. Set to NULL to always keep all candidates. Note, however, that this may use significant CPU/RAM resources when the number of candidates is large.

topMostFormulas

Do not return more than this number of candidate formulae. Note that only compounds for these formulae will be searched. Sets the --candidates command line option.

token

A character string with the refresh token to be used for logging in to SIRIUS (from version 5 only). The token can be obtained with the getSIRIUSToken function, or by running SIRIUS directly (e.g. with the login command). See the SIRIUS website for more information. If NULL then the login is skipped.

extraOptsGeneral, extraOptsFormula

A character vector with any extra command line parameters for SIRIUS. For SIRIUS versions <4.4 there is no distinction between general and formula options. Otherwise, command line options specified in extraOptsGeneral are added before the formula command, while options specified in extraOptsFormula are added after it. See the SIRIUS manual for more details. Set to NULL to ignore.

verbose

If TRUE then more output is shown in the terminal.

splitBatches

If TRUE then the calculations done by SIRIUS will be evenly split over multiple SIRIUS calls (which may be run in parallel depending on the set package options). If splitBatches=FALSE then all feature calculations are performed in a single SIRIUS execution, which is often fastest when calculations are performed on a single computer.

setThreshold

(sets workflow) Minimum abundance for a candidate among all sets (0-1). For instance, a value of 1 means that the candidate needs to be present in all the set data.

setThresholdAnn

(sets workflow) As setThreshold, but only taking into account the set data that contain annotations for the feature group of the candidate.

setAvgSpecificScores

(sets workflow) If TRUE then set-specific scorings (e.g. MS/MS match) are also averaged.
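
The two set thresholds can be pictured with a small base R sketch. The presence table, set names and threshold values below are made up for illustration, and the code only mirrors the descriptions of setThreshold and setThresholdAnn given above. It is not how patRoon implements the filtering.

```r
# Made-up presence table for one feature group: rows are candidates,
# columns are sets; TRUE means the candidate was found in that set.
presence <- rbind(
  X = c(setA = TRUE, setB = TRUE, setC = FALSE),
  Y = c(setA = TRUE, setB = FALSE, setC = FALSE)
)

# Sets that hold at least one annotation for this feature group
# (setC has none in this example).
annotatedSets <- colSums(presence) > 0

# Abundance over all sets: the quantity compared against setThreshold.
abundance <- rowMeans(presence)

# Abundance over annotated sets only: compared against setThresholdAnn.
abundanceAnn <- rowMeans(presence[, annotatedSets, drop = FALSE])

# Candidates that would pass with illustrative thresholds.
keptAll <- names(abundance)[abundance >= 0.5]         # setThreshold = 0.5
keptAnn <- names(abundanceAnn)[abundanceAnn >= 0.75]  # setThresholdAnn = 0.75
```

In this example candidate Y is present in one of three sets overall, but in one of the two sets that carry annotations, so the two thresholds see different abundances for it.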

Value

A compoundsSIRIUS object.

Details

This function uses SIRIUS to generate compound candidates. It is called by generateCompounds when algorithm="sirius".

Similar to generateFormulasSIRIUS, candidate formulae are generated with SIRIUS. These results are then fed to CSI:FingerID to acquire candidate structures. This method requires the availability of MS/MS data, and feature groups without it will be ignored.
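
A minimal sketch of a typical call is given below. It assumes that patRoon and SIRIUS are installed, and that a featureGroups object and its MSPeakLists object were created earlier in the workflow. The wrapper name annotateWithSIRIUS, the argument name mslists and the chosen argument values (adduct, element set, topMost) are illustrative assumptions, not recommendations.

```r
# Hypothetical wrapper; defining it does not run SIRIUS. Calling it requires
# patRoon and a working SIRIUS installation.
# fGroups: a featureGroups object
# mslists: the MSPeakLists object generated for fGroups
annotateWithSIRIUS <- function(fGroups, mslists) {
  patRoon::generateCompoundsSIRIUS(
    fGroups, mslists,
    relMzDev = 5,                  # maximum m/z deviation in ppm
    adduct = "[M+H]+",             # assumed ionization, adjust to your data
    elements = "CHNOPS",           # keep the element set as small as possible
    profile = "qtof",
    fingerIDDatabase = "pubchem",
    topMost = 25                   # keep the 25 best candidates per feature group
  )
}

# The same algorithm can be selected through the generic interface by passing
# algorithm = "sirius" to generateCompounds, as described above.
```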

Note

For annotations performed with SIRIUS it is often fastest to keep the default splitBatches=FALSE. In this case, all SIRIUS output will be printed to the terminal (unless verbose=FALSE or patRoon.MP.method="future"). Furthermore, please note that only annotations to be performed for the same adduct are grouped in a single batch execution.

Parallelization

generateCompoundsSIRIUS uses multiprocessing to parallelize computations. See the parallelization section in the handbook for more details, and patRoon options for the available configuration options.
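
The sketch below shows how batch splitting might be combined with the multiprocessing option. The option name patRoon.MP.method is taken from the note above; the value "classic" is assumed here to be the regular (non-future) method, and the wrapper name annotateInBatches is hypothetical. Check the handbook before relying on either.

```r
# Assumption: "classic" selects the regular multiprocessing method.
options(patRoon.MP.method = "classic")

# Hypothetical wrapper; calling it requires patRoon and SIRIUS.
annotateInBatches <- function(fGroups, mslists) {
  patRoon::generateCompoundsSIRIUS(
    fGroups, mslists,
    adduct = "[M+H]+",     # assumed ionization, adjust to your data
    splitBatches = TRUE,   # split the work over multiple SIRIUS calls
    verbose = FALSE        # suppress the extra terminal output
  )
}
```

On a single computer the default splitBatches=FALSE is often faster, so a setup like this is mainly of interest when the calls can be distributed.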

References

Dührkop K, Fleischauer M, Ludwig M, Aksenov AA, Melnik AV, Meusel M, Dorrestein PC, Rousu J, Böcker S (2019). “SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information.” Nature Methods, 16(4), 299–302. doi:10.1038/s41592-019-0344-8.

Dührkop K, Böcker S (2015). “Fragmentation Trees Reloaded.” In Przytycka TM (ed.), Research in Computational Molecular Biology, 65–79. ISBN 978-3-319-16706-0.

Dührkop K, Shen H, Meusel M, Rousu J, Böcker S (2015). “Searching molecular structure databases with tandem mass spectra using CSI:FingerID.” Proceedings of the National Academy of Sciences, 112(41), 12580–12585. doi:10.1073/pnas.1509788112.

Böcker S, Letzel MC, Lipták Z, Pervukhin A (2008). “SIRIUS: decomposing isotope patterns for metabolite identification.” Bioinformatics, 25(2), 218–224. doi:10.1093/bioinformatics/btn603.

See also

generateCompounds for more details and other algorithms.