R/generics.R, R/main.R, R/formulas.R, and 6 more
id-conf.Rd
Functions to estimate the identification confidence for suspects and annotation candidates.
estimateIDConfidence(obj, ...)
# S4 method for class 'formulas'
estimateIDConfidence(
obj,
absMzDev = defaultLim("mz", "medium"),
normalizeScores = "max",
IDFile = system.file("misc", "IDLevelRules.yml", package = "patRoon"),
logPath = NULL
)
# S4 method for class 'compounds'
estimateIDConfidence(
obj,
absMzDev = defaultLim("mz", "medium"),
MSPeakLists = NULL,
formulas = NULL,
specSimParams = getDefSpecSimParams(removePrecursor = TRUE),
formulasNormalizeScores = "max",
compoundsNormalizeScores = "max",
IDFile = system.file("misc", "IDLevelRules.yml", package = "patRoon"),
logPath = NULL
)
# S4 method for class 'featureGroupsScreening'
estimateIDConfidence(
obj,
MSPeakLists = NULL,
formulas = NULL,
compounds = NULL,
absMzDev = defaultLim("mz", "medium"),
checkFragments = c("mz", "formula", "compound"),
formulasNormalizeScores = "max",
compoundsNormalizeScores = "max",
IDFile = system.file("misc", "IDLevelRules.yml", package = "patRoon"),
logPath = file.path("log", "ident")
)
# S4 method for class 'featureGroupsScreeningSet'
estimateIDConfidence(
obj,
MSPeakLists = NULL,
formulas = NULL,
compounds = NULL,
absMzDev = defaultLim("mz", "medium"),
checkFragments = c("mz", "formula", "compound"),
formulasNormalizeScores = "max",
compoundsNormalizeScores = "max",
IDFile = system.file("misc", "IDLevelRules.yml", package = "patRoon"),
logPath = file.path("log", "ident")
)
# S4 method for class 'compoundsSet'
estimateIDConfidence(
obj,
absMzDev = defaultLim("mz", "medium"),
MSPeakLists = NULL,
formulas = NULL,
formulasNormalizeScores = "max",
compoundsNormalizeScores = "max",
IDFile = system.file("misc", "IDLevelRules.yml", package = "patRoon"),
logPath = NULL
)
# S4 method for class 'formulasSet'
estimateIDConfidence(
obj,
absMzDev = defaultLim("mz", "medium"),
normalizeScores = "max",
IDFile = system.file("misc", "IDLevelRules.yml", package = "patRoon"),
logPath = NULL
)
numericIDLevel(level)
genIDLevelRulesFile(out, inLevels = NULL, exLevels = NULL)
obj The object for which identification confidence should be estimated.
... Method-specific arguments.
absMzDev Maximum absolute m/z deviation.
normalizeScores, formulasNormalizeScores, compoundsNormalizeScores A character that specifies how
annotation scorings are normalized. Either
"max" (normalize to the maximum value) or "minmax" (perform min-max
normalization). Note that negative scores (e.g. output by
SIRIUS) are always normalized by min-max. Furthermore,
normalization for compounds currently uses the original min/max scoring
values from when the candidates were generated. Thus, for
compounds scoring, normalization is not affected when candidate
results were removed after they were generated (e.g. by use of
filter).
IDFile A file path to a YAML file with rules used for estimation of identification levels. See the
Identification level rules section for more details. If not specified then a default rules file will be used.
logPath A directory path to store logging information. If NULL then logging is disabled. NOTE:
To avoid slowdowns from logging a potentially large number of candidates, logging is disabled by default for the
formulas and compounds methods.
MSPeakLists, formulas, compounds Annotation data (MSPeakLists, formulas and
compounds). All arguments can be NULL, but it is recommended to set them if possible to allow
the most complete estimations.
specSimParams A named list with parameters that influence the calculation of MS spectra similarities.
See the spectral similarity parameters documentation for more details.
checkFragments Which type(s) of MS/MS fragments from workflow data should be checked to evaluate the number of
suspect fragment matches (i.e. from the fragments_mz/fragments_formula columns in the suspect
list). Valid values are: "mz", "formula", "compound". The first uses m/z values in
the specified MSPeakLists object, whereas the others use the formulae that were annotated to MS/MS peaks in
the given formulas or compounds objects. Multiple values are possible: in this case the maximum
number of fragment matches will be reported.
level The identification level to be converted.
out The file path to the target file.
inLevels, exLevels A regular expression for the identification levels to include or exclude,
respectively. For instance, exLevels="4|5" would exclude levels 4 and 5 from the output file. Set to
NULL to ignore.
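The two normalization modes described for normalizeScores can be illustrated with a small base R sketch. The helper normalizeSketch is hypothetical and not part of patRoon; it only mirrors the behaviour described above, including the fallback to min-max when negative scores are present. The handling of constant or all-zero scores is an assumption, and the sketch does not reproduce the use of original min/max values for compounds.

```r
# Hypothetical helper (not a patRoon function): illustrates "max" vs "minmax".
normalizeSketch <- function(scores, method = c("max", "minmax")) {
    method <- match.arg(method)
    rng <- range(scores, na.rm = TRUE)
    # Negative scores are always min-max normalized, as described in the text.
    if (method == "minmax" || rng[1] < 0) {
        if (rng[1] == rng[2])
            return(rep(1, length(scores))) # assumption: constant scores map to 1
        return((scores - rng[1]) / (rng[2] - rng[1]))
    }
    if (rng[2] == 0)
        return(scores) # assumption: all-zero scores are left untouched
    scores / rng[2]
}

normalizeSketch(c(1, 2, 4), "max")    # 0.25 0.50 1.00
normalizeSketch(c(1, 2, 4), "minmax") # 0, 1/3, 1
normalizeSketch(c(-2, 0, 2), "max")   # negative input: falls back to min-max
```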
estimateIDConfidence amends the input object with the identification confidence properties described below.
The estimateIDConfidence methods estimate various properties that indicate the confidence of
identifications assigned to suspects and feature annotation candidates. These functions are typically executed after
running screenSuspects, generateFormulas and generateCompounds. Afterwards,
the following columns are added to the result tables (obtained with e.g. screenInfo,
annotations and as.data.table):
annSim The annotation similarity, defined as the similarity between the MS/MS peak list of a
feature with (a) only the peaks that were annotated and (b) all the peaks. Thus, a value of one means that all
MS/MS peaks were annotated. The similarity calculation is configured with the specSimParams argument to
estimateIDConfidence.
annSimForm The annotation similarity specifically for formula annotations (equaling the annSim
column from formula annotations). Only calculated for suspects and
compounds.
annSimBoth The annotation similarity calculated with the combined set of annotated MS/MS peaks from
formula and compound annotations. Only calculated for suspects and
compounds.
estIDLevel Provides an estimation of the identification level, roughly following that of
(Schymanski et al. 2014). However, please note that this value is only an estimation, and manual
interpretation is still necessary to assign final identification levels. The estimation is done through a set of
rules; see the Identification level rules section below.
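The idea behind annSim can be sketched in base R. The example below assumes a plain cosine similarity without m/z weighting, binning or precursor removal, so it will not reproduce the exact values that patRoon calculates with specSimParams. The helper annSimSketch is hypothetical.

```r
# Hypothetical helper: plain cosine similarity between the full MS/MS peak list
# and the same list restricted to annotated peaks (no m/z weighting or binning).
annSimSketch <- function(intensity, annotated) {
    stopifnot(length(intensity) == length(annotated))
    annotatedOnly <- ifelse(annotated, intensity, 0)
    if (sum(annotatedOnly) == 0)
        return(0) # no annotated peaks: nothing in common
    sum(intensity * annotatedOnly) /
        sqrt(sum(intensity^2) * sum(annotatedOnly^2))
}

peaks <- c(100, 50, 20, 10)
annSimSketch(peaks, c(TRUE, TRUE, TRUE, TRUE))   # 1: every peak is annotated
annSimSketch(peaks, c(TRUE, TRUE, FALSE, FALSE)) # below 1: two peaks unexplained
```

Because unannotated peaks only appear in the full peak list, the value drops below one as soon as any peak lacks an annotation, and more so when the unexplained peaks are intense.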
In addition, the following columns are specifically added to suspect screening results:
annSimComp The annotation similarity specifically for compound annotations (this equals the
annSim column in compound annotations).
formRank,compRank The rank of the suspect within the formula/compound annotation results.
maxFrags The maximum number of MS/MS fragments that can be matched for this suspect (based on the
fragments_* columns from the suspect list).
maxFragMatches,maxFragMatchesRel The absolute and relative number of experimental MS/MS peaks
that were matched from the fragments specified in the suspect list. The value for maxFragMatchesRel is
relative to the value for maxFrags. The calculation of this column is influenced by the
checkFragments argument to estimateIDConfidence.
The data for these columns is only calculated if estimateIDConfidence has the required data to do so. For
instance, annSimForm and formRank are only calculated if the formulas argument is set, and
levels for estIDLevel will be poor if no compound annotations are available.
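How maxFrags, maxFragMatches and maxFragMatchesRel relate to each other can be illustrated for the "mz" fragment type. The sketch below is a simplified assumption of the matching step: the helper countFragMatchesSketch is hypothetical, and the tolerance of 0.005 is only an example value and not necessarily what defaultLim("mz", "medium") returns.

```r
# Hypothetical helper: count suspect fragments (m/z) found in an MS/MS peak list.
countFragMatchesSketch <- function(suspFragMZ, peakMZ, absMzDev = 0.005) {
    # A suspect fragment counts as matched if any experimental peak lies
    # within the absolute m/z tolerance.
    matched <- vapply(suspFragMZ,
                      function(f) any(abs(peakMZ - f) <= absMzDev),
                      logical(1))
    maxFrags <- length(suspFragMZ)
    nMatched <- sum(matched)
    list(maxFrags = maxFrags,
         maxFragMatches = nMatched,
         maxFragMatchesRel = if (maxFrags > 0) nMatched / maxFrags else NA_real_)
}

# Three fragments in the suspect list, only the first is within tolerance.
countFragMatchesSketch(c(100, 150, 200), c(100.002, 175.5, 199.990))
```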
numericIDLevel Extracts the numeric part of a given identification level (e.g. "3a"
becomes 3).
genIDLevelRulesFile Generates a template YAML file that is used to configure the rules for automatic
estimation of identification levels. This file can then be used as input for estimateIDConfidence.
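The conversion performed by numericIDLevel can be mimicked with a regular expression. The helper below is a hypothetical stand-in written in base R, not the patRoon implementation, and the integer return type is an assumption.

```r
# Hypothetical stand-in for numericIDLevel(): keep only the leading digits.
numericIDLevelSketch <- function(level) {
    as.integer(sub("^([0-9]+).*$", "\\1", level))
}

numericIDLevelSketch(c("1", "3a", "4b")) # 1 3 4
```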
The estimation of identification levels is configured through a YAML file which specifies the rules for each level. The default file is shown below.
1:
    suspectFragments: 3
    retention: 12
2a:
    or:
        - individualMoNAScore:
            min: 0.9
            higherThanNext: .inf
        - libMatch:
            min: 0.9
            higherThanNext: .inf
    rank:
        max: 1
        type: compound
3a:
    or:
        - individualMoNAScore: 0.7
        - libMatch: 0.7
3b:
    suspectFragments: 3
3c:
    annMSMSSim:
        type: compound
        min: 0.7
4a:
    annMSMSSim:
        type: formula
        min: 0.7
    isoScore:
        min: 0.5
        higherThanNext: 0.2
    rank:
        max: 1
        type: formula
4b:
    isoScore:
        min: 0.9
        higherThanNext: 0.2
    rank:
        max: 1
        type: formula
5:
    all: yes
Most of the file should be self-explanatory. Some notes:
Each rule is either a field of suspectFragments (minimum number of MS/MS fragments matched from
suspect list), retention (maximum retention deviation from suspect list), rank (the maximum
annotation rank from formula or compound annotations), all (this level is always matched) or any of the
scorings available from the formula or compound annotations.
In case any of the rules could be applied to either formula or compound annotations, the annotation type must
be specified with the type field (formula or compound).
Identification levels should start with a number and may optionally be followed by an alphabetic character. The lowest levels are checked first.
If relative=yes then the relative scoring will be used for testing.
For suspectFragments: if the number of fragments from the suspect list (maxFrags column) is
less than the minimum rule value, the minimum is adjusted to the number of available fragments.
The or and and keywords can be used to combine multiple conditions.
Any conditions that require suspect data (e.g. suspectFragments) are only met with the suspects
method for estimateIDConfidence.
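The adjustment of the suspectFragments minimum noted above can be written out as follows. This is a simplified sketch with a hypothetical helper; requiring at least one fragment in the suspect list is an assumption and not stated in the notes.

```r
# Hypothetical helper: test a suspectFragments rule for one suspect.
suspectFragmentsMetSketch <- function(maxFragMatches, maxFrags, ruleMin) {
    if (maxFrags < 1)
        return(FALSE) # assumption: no suspect fragments means the rule cannot be met
    # Lower the threshold if the suspect list holds fewer fragments than the rule asks for.
    effectiveMin <- min(ruleMin, maxFrags)
    maxFragMatches >= effectiveMin
}

suspectFragmentsMetSketch(2, maxFrags = 2, ruleMin = 3) # TRUE: minimum lowered to 2
suspectFragmentsMetSketch(2, maxFrags = 5, ruleMin = 3) # FALSE: 3 matches required
```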
A template rules file can be generated with the genIDLevelRulesFile function, and this file can
subsequently be passed to estimateIDConfidence. The file format is highly flexible and (sub)levels can be added
or removed if desired. Note that the default file is currently only suitable when annotation is performed with
GenForm and MetFrag; for other algorithms it is crucial to modify the rules.
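A typical way to combine both functions is sketched below. It requires patRoon to be installed, and the wrapper estimateWithCustomRules as well as the objects fGroupsSusp, mslists, formulas and compounds are hypothetical placeholders for existing workflow data. Only arguments listed in the usage above are used.

```r
# Sketch only: needs patRoon and existing workflow objects when called.
# fGroupsSusp: suspect screening results (e.g. from screenSuspects)
# mslists, formulas, compounds: MS peak lists and annotation results
estimateWithCustomRules <- function(fGroupsSusp, mslists, formulas, compounds,
                                    rulesFile = file.path(tempdir(), "IDLevelRules.yml")) {
    # Write a template rules file, leaving out levels 4 and 5 (regular expression).
    patRoon::genIDLevelRulesFile(rulesFile, exLevels = "4|5")
    # The YAML file can be edited by hand at this point to change thresholds.
    patRoon::estimateIDConfidence(fGroupsSusp, MSPeakLists = mslists,
                                  formulas = formulas, compounds = compounds,
                                  IDFile = rulesFile)
}
```

The returned object can then be inspected with e.g. screenInfo to review the estIDLevel column.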
estimateIDConfidence performs its estimations per set. In addition, the
following overall (not set specific) columns are calculated:
formRank and compRank based on the ranking of the formula/compound in the set consensus data.
estIDLevel: based on the 'best' estimated identification level among the sets data (i.e. the
lowest). In case there is a tie between sub-levels (e.g. 3a and 3b), then the sub-level is
stripped (e.g. 3).
Annotation similarities: taken as the maximum value from the data for each set.
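The overall estIDLevel for sets workflows can be sketched as follows. This is one possible reading of the description above, with a hypothetical helper: the numerically lowest level wins, and the sub-level is dropped when different sub-levels of that level occur among the sets.

```r
# Hypothetical helper: combine per-set identification levels into one overall level.
consensusIDLevelSketch <- function(levels) {
    levels <- levels[!is.na(levels)]
    if (length(levels) == 0)
        return(NA_character_)
    nums <- as.integer(sub("^([0-9]+).*$", "\\1", levels))
    best <- unique(levels[nums == min(nums)])
    # Different sub-levels of the best level (e.g. 3a and 3b): strip the sub-level.
    if (length(best) > 1) as.character(min(nums)) else best
}

consensusIDLevelSketch(c("3a", "3b", "4a")) # "3"
consensusIDLevelSketch(c("2a", "3b"))       # "2a"
```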
Schymanski EL, Jeon J, Gulde R, Fenner K, Ruff M, Singer HP, Hollender J (2014).
“Identifying Small Molecules via High Resolution Mass Spectrometry: Communicating Confidence.”
Environmental Science and Technology, 48(4), 2097–2098.
doi:10.1021/es5002105.
Stein SE, Scott DR (1994).
“Optimization and testing of mass spectral library search algorithms for compound identification.”
Journal of the American Society for Mass Spectrometry, 5(9), 859–866.
doi:10.1016/1044-0305(94)87009-8.