Uses SIRIUS to generate chemical formula candidates.
generateFormulasSIRIUS(fGroups, ...)
# S4 method for class 'featureGroups'
generateFormulasSIRIUS(
fGroups,
MSPeakLists,
relMzDev = 5,
adduct = NULL,
projectPath = NULL,
elements = "CHNOP",
profile = "qtof",
database = NULL,
noise = NULL,
cores = NULL,
getFingerprints = FALSE,
topMost = 100,
login = FALSE,
alwaysLogin = FALSE,
extraOptsGeneral = NULL,
extraOptsFormula = NULL,
calculateFeatures = TRUE,
featThreshold = 0,
featThresholdAnn = 0.75,
absAlignMzDev = 0.002,
verbose = TRUE,
splitBatches = FALSE,
dryRun = FALSE
)
# S4 method for class 'featureGroupsSet'
generateFormulasSIRIUS(
fGroups,
MSPeakLists,
relMzDev = 5,
adduct = NULL,
projectPath = NULL,
...,
setThreshold = 0,
setThresholdAnn = 0,
setAvgSpecificScores = FALSE
)
featureGroups
object for which formulae should be generated. This should be the same or
a subset of the object that was used to create the specified MSPeakLists
. In the case of a subset only the
remaining feature groups in the subset are considered.
Further arguments passed to the non-sets workflow method.
An MSPeakLists
object that was generated for the supplied fGroups
.
Maximum relative deviation between the measured and candidate formula m/z values (in ppm). Sets the --ppm-max command line option.
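As a rough illustration of what a relative deviation in ppm means, the following base R sketch computes it for one measured and one theoretical m/z. The helper name ppmDev and the two m/z values are made up for this example and are not part of patRoon or SIRIUS.

```r
# Relative m/z deviation (in ppm) of a measured value from a theoretical one.
# ppmDev is a hypothetical helper written for this illustration only.
ppmDev <- function(measured, theoretical) {
    (measured - theoretical) / theoretical * 1e6
}

# Made-up example values
theoretical <- 195.0877
measured <- 195.0882

dev <- ppmDev(measured, theoretical)

# With the default relMzDev = 5, a candidate at this deviation is inside the window
withinTolerance <- abs(dev) <= 5
```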
An adduct
object (or something that can be converted to it with as.adduct
).
Examples: "[M-H]-"
, "[M+Na]+"
. If the featureGroups
object has
adduct annotations then these are used if adduct=NULL
.
The adduct
argument is not supported for sets workflows, since the
adduct annotations will then always be used.
These are mainly for internal purposes. projectPath
sets the output directory for
the SIRIUS
output (a temporary directory if NULL
). If dryRun
is TRUE
then no
computations are done and only the results from projectPath
are processed.
projectPath
should be a character
specifying the paths for each set.
Elements to be considered for formulae calculation. This heavily affects the number of
candidates! Always try to work with a minimal set by excluding elements you don't expect. The minimum/maximum
number of elements can also be specified, for example: a value of "C[5]H[10-15]O"
will only consider
formulae with up to five carbon atoms, between ten and fifteen hydrogen atoms and any amount of oxygen atoms. Sets
the --elements command line option.
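To make the element syntax more concrete, the sketch below splits an elements string into per-element bounds using base R. It only mirrors the reading given above (a single number is an upper bound, a range is minimum-maximum, no brackets means unrestricted); parseElements is a hypothetical helper and not how SIRIUS or patRoon parses the option.

```r
# Illustrative parser for strings such as "C[5]H[10-15]O".
# Assumption: "[n]" means "up to n" and "[a-b]" means "between a and b".
parseElements <- function(elements) {
    pattern <- "[A-Z][a-z]?(\\[[0-9]+(-[0-9]+)?\\])?"
    tokens <- regmatches(elements, gregexpr(pattern, elements))[[1]]
    do.call(rbind, lapply(tokens, function(tok) {
        symbol <- sub("\\[.*$", "", tok)
        if (grepl("[", tok, fixed = TRUE)) {
            rng <- sub("^.*\\[(.*)\\]$", "\\1", tok)
            bounds <- as.numeric(strsplit(rng, "-", fixed = TRUE)[[1]])
            if (length(bounds) == 1)
                bounds <- c(0, bounds)  # single number: upper bound only
        } else {
            bounds <- c(NA_real_, NA_real_)  # no restriction given
        }
        data.frame(element = symbol, min = bounds[1], max = bounds[2],
                   stringsAsFactors = FALSE)
    }))
}

limits <- parseElements("C[5]H[10-15]O")
print(limits)
```

Inspecting such a table before a run can help to check that the restriction is as tight as intended.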
Name of the configuration profile, for example: "qtof", "orbitrap", "fticr". Sets the --profile command line option.
If not NULL
, use a database for retrieval of formula
candidates. Possible values are: "pubchem", "bio", "kegg", "hmdb". Sets the
--database command line option.
Median intensity of the noise (NULL
ignores this parameter). Sets the --noise
command line option.
The number of cores SIRIUS
will use. If NULL
then the default of all cores will be
used.
Set to TRUE
to load SIRIUS-CSI:FingerID
MS/MS fingerprints for the formula
candidates. This is currently only supported with calculateFeatures=FALSE
to avoid heavy server traffic. The
fingerprints are stored in the fingerprints
slot of the returned formulasSIRIUS
object, and
are used by the predictTox
and predictRespFactors
methods.
Only keep this number of candidates (per feature group) with highest score. Sets the --candidates command line option.
Specifies if and how the SIRIUS account login should be handled:
login=FALSE
: no automatic login is performed and the active login status is not checked.
login="check"
: aborts if no active login is present.
login="interactive"
: interactively ask for login (using getPass).
login=c(username="...", password="...")
: perform the login with the given details. For security reasons,
please do not enter the details directly, but use e.g. environment variables or store/retrieve them with the
keyring package.
If alwaysLogin=TRUE
then a login is always performed, otherwise only if SIRIUS reports no active login.
See the SIRIUS website and patRoon handbook for more information.
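One way to keep credentials out of scripts, as recommended above, is to read them from environment variables. The sketch below is only an illustration: the helper getSiriusLogin and the variable names SIRIUS_USER and SIRIUS_PASS are assumptions made for this example. If either variable is missing it falls back to login="check".

```r
# Build a value for the login argument from environment variables.
# getSiriusLogin, SIRIUS_USER and SIRIUS_PASS are hypothetical names.
getSiriusLogin <- function(userVar = "SIRIUS_USER", passVar = "SIRIUS_PASS") {
    user <- Sys.getenv(userVar, unset = NA)
    pass <- Sys.getenv(passVar, unset = NA)
    if (is.na(user) || is.na(pass))
        return("check")  # no credentials available: only verify an active login
    c(username = user, password = pass)
}

login <- getSiriusLogin()
```

The result could then be passed on as generateFormulasSIRIUS(..., login = login).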
a character
vector with any extra command line parameters for
SIRIUS
. For SIRIUS
versions <4.4
there is no distinction between general and formula
options. Otherwise command line options specified in extraOptsGeneral
are added prior to the formula
command, while options specified in extraOptsFormula
are added afterwards. See the SIRIUS
manual for more details. Set to NULL
to ignore.
If TRUE
formulae are first calculated for all features
prior to feature group assignment (see Candidate assignment
in generateFormulas
).
If calculateFeatures=TRUE
: minimum presence (0-1) of a formula in all features
before it is considered as a candidate for a feature group. For instance, featThreshold=0.75
dictates that a
formula should be present in at least 75% of the features inside a feature group.
As featThreshold
, but only considers features with annotations. For instance,
featThresholdAnn=0.75
dictates that a formula should be present in at least 75% of the features with
annotations inside a feature group.
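The difference between featThreshold and featThresholdAnn can be seen with a small worked example in base R. The feature group, feature names and formulae below are invented, and the calculation is only a sketch of the two presence fractions as described above, not patRoon's internal implementation.

```r
# Hypothetical feature group with four features; f3 has no annotations
featFormulas <- list(
    f1 = c("C8H10N4O2", "C7H6N6O"),
    f2 = "C8H10N4O2",
    f3 = character(0),
    f4 = "C8H10N4O2"
)

candidates <- unique(unlist(featFormulas))
annotated <- lengths(featFormulas) > 0

# Number of features in which each candidate formula occurs
presence <- vapply(candidates, function(f) {
    sum(vapply(featFormulas, function(x) f %in% x, logical(1)))
}, numeric(1))

fracAll <- presence / length(featFormulas)  # compared against featThreshold
fracAnn <- presence / sum(annotated)        # compared against featThresholdAnn

print(rbind(fracAll, fracAnn))
```

Here the first formula occurs in 3 of 4 features but in 3 of 3 annotated features, so it passes a threshold of 0.75 under either definition, whereas the second formula passes neither.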
When the group formula annotation consensus is made from feature annotations, the m/z
values of annotated MS/MS fragments may slightly deviate from those of the corresponding group MS/MS peak list. The
absAlignMzDev
argument specifies the maximum m/z window used to re-align the mass peaks.
If TRUE
then more output is shown in the terminal.
If TRUE
then the calculations done by SIRIUS
will be evenly split over multiple
SIRIUS
calls (which may be run in parallel depending on the set package
options). If splitBatches=FALSE
then all feature calculations are performed from a single SIRIUS
execution, which is often fastest when calculations are performed on a single computer.
Minimum abundance for a candidate among all sets (0-1). For instance, a value of 1 means that the candidate needs to be present in all the set data.
As setThreshold
, but only taking into account the set data that contain
annotations for the feature group of the candidate.
If TRUE
then set specific scorings (e.g. MS/MS match) are also
averaged.
A formulasSIRIUS
object.
This function uses SIRIUS to generate formula candidates. It is invoked when calling generateFormulas
with
algorithm="sirius"
.
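A minimal sketch of a direct call, using only arguments from the usage section, could look as follows. The wrapper name annotateWithSirius and the chosen argument values are assumptions for illustration. The call is wrapped in a function so that nothing runs until suitable featureGroups and MSPeakLists objects, a patRoon installation and a working SIRIUS setup are available.

```r
# Hypothetical wrapper; fGroups and MSPeakLists must come from an existing
# patRoon workflow. Argument values are examples, not recommendations.
annotateWithSirius <- function(fGroups, MSPeakLists) {
    patRoon::generateFormulasSIRIUS(
        fGroups, MSPeakLists,
        relMzDev = 5,               # ppm window for candidate formulae
        adduct = "[M-H]-",          # ignored in favour of annotations in sets workflows
        elements = "CHNOP",         # keep this set as small as possible
        profile = "qtof",
        topMost = 25,               # keep fewer candidates than the default of 100
        calculateFeatures = FALSE   # annotate feature groups directly
    )
}
```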
Similarity of measured and theoretical isotopic patterns will be used for scoring candidates. Note that
SIRIUS
requires availability of MS/MS data.
For annotations performed with SIRIUS
it is often fastest to keep the default
splitBatches=FALSE
. In this case, all SIRIUS
output will be printed to the terminal (unless
verbose=FALSE
or patRoon.MP.method="future"). Furthermore, please note that only annotations to be
performed for the same adduct are grouped in a single batch execution.
generateFormulasSIRIUS uses multiprocessing to parallelize computations. Please see the parallelization section in the handbook for more details and patRoon options for configuration options.
generateFormulas
for more details and other algorithms.