R/generics.R, R/feature_groups-screening.R, R/feature_groups-screening-set.R
    featureGroupsScreening-class.Rd
This class derives from featureGroups and adds suspect screening information.
screenInfo(obj)
annotateSuspects(
  fGroups,
  MSPeakLists = NULL,
  formulas = NULL,
  compounds = NULL,
  ...
)
# S4 method for class 'featureGroupsScreening'
screenInfo(obj)
# S4 method for class 'featureGroupsScreening'
show(object)
# S4 method for class 'featureGroupsScreening,ANY,ANY,missing'
x[i, j, ..., rGroups, suspects = NULL, drop = TRUE]
# S4 method for class 'featureGroupsScreening'
delete(obj, i = NULL, j = NULL, ...)
# S4 method for class 'featureGroupsScreening'
as.data.table(x, ..., collapseSuspects = ",", onlyHits = FALSE)
# S4 method for class 'featureGroupsScreening'
annotateSuspects(
  fGroups,
  MSPeakLists,
  formulas,
  compounds,
  absMzDev = 0.005,
  specSimParams = getDefSpecSimParams(removePrecursor = TRUE),
  checkFragments = c("mz", "formula", "compound"),
  formulasNormalizeScores = "max",
  compoundsNormalizeScores = "max",
  IDFile = system.file("misc", "IDLevelRules.yml", package = "patRoon"),
  logPath = file.path("log", "ident")
)
# S4 method for class 'featureGroupsScreening'
filter(
  obj,
  ...,
  onlyHits = NULL,
  selectHitsBy = NULL,
  selectBestFGroups = FALSE,
  maxLevel = NULL,
  maxFormRank = NULL,
  maxCompRank = NULL,
  minAnnSimForm = NULL,
  minAnnSimComp = NULL,
  minAnnSimBoth = NULL,
  absMinFragMatches = NULL,
  relMinFragMatches = NULL,
  minRF = NULL,
  maxLC50 = NULL,
  negate = FALSE
)
# S4 method for class 'featureGroupsScreeningSet'
screenInfo(obj)
# S4 method for class 'featureGroupsScreeningSet'
show(object)
# S4 method for class 'featureGroupsScreeningSet,ANY,ANY,missing'
x[i, j, ..., rGroups, suspects = NULL, sets = NULL, drop = TRUE]
# S4 method for class 'featureGroupsScreeningSet'
delete(obj, i = NULL, j = NULL, ...)
# S4 method for class 'featureGroupsScreeningSet'
as.data.table(x, ..., collapseSuspects = ",", onlyHits = FALSE)
# S4 method for class 'featureGroupsScreeningSet'
annotateSuspects(
  fGroups,
  MSPeakLists,
  formulas,
  compounds,
  absMzDev = 0.005,
  specSimParams = getDefSpecSimParams(removePrecursor = TRUE),
  checkFragments = c("mz", "formula", "compound"),
  formulasNormalizeScores = "max",
  compoundsNormalizeScores = "max",
  IDFile = system.file("misc", "IDLevelRules.yml", package = "patRoon"),
  logPath = file.path("log", "ident")
)
# S4 method for class 'featureGroupsScreeningSet'
filter(
  obj,
  ...,
  onlyHits = NULL,
  selectHitsBy = NULL,
  selectBestFGroups = FALSE,
  maxLevel = NULL,
  maxFormRank = NULL,
  maxCompRank = NULL,
  minAnnSimForm = NULL,
  minAnnSimComp = NULL,
  minAnnSimBoth = NULL,
  absMinFragMatches = NULL,
  relMinFragMatches = NULL,
  minRF = NULL,
  maxLC50 = NULL,
  negate = FALSE
)
# S4 method for class 'featureGroupsScreeningSet'
unset(obj, set)
The featureGroupsScreening object.
Annotation data (MSPeakLists, formulas and
compounds) obtained for this featureGroupsScreening object. Any of these arguments can be NULL
to exclude that data from the annotation.
Further arguments passed to the base method.
Used for subsetting data analyses, feature groups and
replicate groups, see featureGroups.
An optional character vector with suspect names. If
specified, only featureGroups will be kept that are assigned to
these suspects.
Ignored.
If a character then any suspects that were matched to the same feature group are
collapsed to a single row and suspect names are separated by the value of collapseSuspects. If NULL
then no collapsing occurs, and each suspect match is reported on a separate row. See the Suspect collapsing
section below for additional details.
For as.data.table: if TRUE then only feature groups with suspect hits are reported.
For filter:
if negate=FALSE and onlyHits=TRUE then all feature groups without suspect hits will be removed.
  Otherwise nothing will be done.
if negate=TRUE then onlyHits=TRUE will select feature groups without suspect hits,
  onlyHits=FALSE will only retain feature groups with suspect matches and this filter is ignored if
  onlyHits=NULL.
Maximum absolute m/z deviation.
A named list with parameters that influence the calculation of MS spectra similarities.
See the spectral similarity parameters documentation for more details.
Which type(s) of MS/MS fragments from workflow data should be checked to evaluate the number of
suspect fragment matches (i.e. from the fragments_mz/fragments_formula columns in the suspect
list). Valid values are: "mz", "formula", "compound". The first uses m/z values in
the specified MSPeakLists object, whereas the others use the formulae that were annotated to MS/MS peaks in
the given formulas or compounds objects. Multiple values are possible: in this case the maximum
number of fragment matches will be reported.
A character that specifies how normalization of
annotation scorings occurs. Either
"max" (normalize to max value) or "minmax" (perform min-max
normalization). Note that normalization of negative scores (e.g. output by
SIRIUS) is always performed as min-max. Furthermore, currently
normalization for compounds takes the original min/max scoring
values into account when candidates were generated. Thus, for
compounds scoring, normalization is not affected when candidate
results were removed after they were generated (e.g. by use of
filter).
A file path to a YAML file with rules used for estimation of identification levels. See the
Suspect annotation section for more details. If not specified then a default rules file will be used.
A directory path to store logging information. If NULL then logging is disabled.
Should be "intensity" or "level". For cases where the same suspect is matched to
multiple feature groups, only the hit for the feature group with the highest mean intensity
(selectHitsBy="intensity") or best identification level (selectHitsBy="level") is kept. In case of
ties only the first hit is kept. Set to NULL to ignore this filter. If negate=TRUE then only those
hits with lowest mean intensity/poorest identification level are kept.
If TRUE then for any cases where a single feature group is matched to several
suspects only the suspect assigned to the feature group with best identification score is kept. In case of ties
only the first is kept.
Filter suspects by maximum
identification level (e.g. "3a"), formula/compound rank or with minimum formula/compound/combined
annotation similarity. Set to NULL to ignore.
Only retain suspects with this minimum number of MS/MS matches with the
fragments specified in the suspect list (i.e. fragments_mz/fragments_formula).
relMinFragMatches sets the minimum that is relative (0-1) to the maximum number of MS/MS fragments
specified in the fragments_* columns of the suspect list. Set to NULL to ignore.
Filter suspect hits by the given minimum predicted response factor (as calculated by
predictRespFactors). Set to NULL to ignore.
Filter suspect hits by the given maximum toxicity (LC50) (as calculated by
predictTox). Set to NULL to ignore.
If set to TRUE then filtering operations are performed in the opposite manner.
(sets workflow) A character with name(s) of the sets to keep (or remove if negate=TRUE).
(sets workflow) The name of the set.
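As a rough illustration of the two normalization modes accepted by formulasNormalizeScores and compoundsNormalizeScores, the base R sketch below normalizes a plain numeric vector. The helper name normalizeScores is hypothetical and this is not patRoon's internal implementation. Switching to min-max whenever a negative score is present follows the argument description above, and the handling of constant scores is an assumption.

```r
# Hypothetical helper (not part of patRoon) illustrating "max" and "minmax".
normalizeScores <- function(scores, method = c("max", "minmax")) {
    method <- match.arg(method)
    # Per the argument description, negative scores are always min-max normalized
    if (any(scores < 0, na.rm = TRUE))
        method <- "minmax"
    rng <- range(scores, na.rm = TRUE)
    if (method == "max")
        return(scores / rng[2])
    if (rng[1] == rng[2])
        return(rep(1, length(scores))) # assumption: constant scores map to 1
    (scores - rng[1]) / (rng[2] - rng[1])
}

normalizeScores(c(2, 4, 8), "max")     # 0.25 0.50 1.00
normalizeScores(c(2, 4, 8), "minmax")  # 0, 1/3, 1
normalizeScores(c(-10, -5, 0), "max")  # negative scores present: min-max is used
```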
annotateSuspects returns a featureGroupsScreening object, which is a
  featureGroups object amended with annotation data.
filter returns a filtered featureGroupsScreening object.
screenInfo(featureGroupsScreening): Returns a table with screening information
(see screenInfo slot).
show(featureGroupsScreening): Shows summary information for this object.
x[i, j, ..., rGroups, suspects = NULL, drop = TRUE]: Subset on analyses, feature groups and/or
suspects.
as.data.table(featureGroupsScreening): Obtain a summary table (a data.table) with retention, m/z,
intensity and optionally other feature data. Furthermore, the output table will be merged with information from
screenInfo, such as suspect names and other properties and annotation data.
annotateSuspects(featureGroupsScreening): Incorporates annotation data obtained during the workflow to annotate suspects
with matched known MS/MS fragments, formula/candidate ranks and automatic estimation of identification levels. See
the Suspect annotation section for more details. The estimation of identification levels for each suspect is
logged in the log/ident directory.
filter(featureGroupsScreening): Performs rule based filtering. This method builds on the comprehensive filter
functionality from the base filter,featureGroups-method. It adds several filters to select
e.g. the best ranked suspects or those with a minimum estimated identification level. NOTE: most
filters only affect suspect hits, not feature groups. Set onlyHits=TRUE to subsequently remove any
feature groups that lost all their suspect matches due to other filter steps.
screenInfo: A data.table with results from suspect screening. This table will be amended with
annotation data when annotateSuspects is run.
MS2QuantMeta: Metadata from MS2Quant filled in by predictRespFactors.
filter removes suspect hits with NA values when any of the filters related to minimum or maximum
  values are applied (unless negate=TRUE).
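The NA handling described in this note can be sketched with a logical mask in base R. The function filterMin and the example values are hypothetical and only mimic a minimum-value filter such as minAnnSimComp. The assumption here is that negate=TRUE inverts the comparison and keeps hits with NA values.

```r
# Hypothetical sketch of a minimum-value filter on suspect hits (not patRoon code).
# Returns TRUE for hits that are kept.
filterMin <- function(values, minValue, negate = FALSE) {
    if (negate)
        return(is.na(values) | values < minValue) # assumption: NA hits are kept
    !is.na(values) & values >= minValue           # NA hits are removed
}

annSimComp <- c(0.9, NA, 0.4)
filterMin(annSimComp, 0.7)                 # TRUE FALSE FALSE
filterMin(annSimComp, 0.7, negate = TRUE)  # FALSE TRUE TRUE
```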
The annotateSuspects method is used to annotate suspects after
  screenSuspects was used to collect suspect screening results and other workflow steps such as formula
  and compound annotation steps have been completed. The annotation results, which can be acquired with the
  as.data.table and screenInfo methods, amend the current screening data with the following columns:
formRank,compRank The rank of the suspect within the formula/compound annotation results.
annSimForm,annSimComp,annSimBoth A similarity measure between measured and annotated
  MS/MS peaks from annotation of formulae, compounds or both. The similarity is calculated as the spectral similarity
  between a peaklist with (a) all MS/MS peaks and (b) only annotated peaks. Thus, a value of one means that all MS/MS
  peaks were annotated. If both formula and compound annotations are available then annSimBoth is calculated
  after combining all the annotated peaks, otherwise annSimBoth equals the available value for
  annSimForm or annSimComp. The similarity calculation can be configured with the specSimParams
  argument to annotateSuspects. Note for annotation with generateCompoundsLibrary results: the method
  and default parameters for the annSimComp calculation differ slightly from those used for the spectral similarity
  calculated with compound annotation (libMatch score), hence small differences in results are typically
  observed.
maxFrags The maximum number of MS/MS fragments that can be matched for this suspect (based on the
  fragments_* columns from the suspect list).
maxFragMatches,maxFragMatchesRel The absolute and relative amount of experimental MS/MS peaks
  that were matched from the fragments specified in the suspect list. The value for maxFragMatchesRel is
  relative to the value for maxFrags. The calculation of this column is influenced by the
  checkFragments argument to annotateSuspects.
estIDLevel Provides an estimation of the identification level, roughly following that of
  (Schymanski et al. 2014). However, please note that this value is only an estimation, and manual
  interpretation is still necessary to assign final identification levels. The estimation is done through a set of
  rules, see the Identification level rules section below.
Note that columns are only present if sufficient data is available for their calculation.
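To make the relation between maxFrags, maxFragMatches and maxFragMatchesRel concrete, the following base R sketch counts suspect fragments by m/z. The helper countFragMatches and the m/z values are hypothetical. Using absMzDev as the matching tolerance is an assumption, and the sketch covers only the "mz" case of checkFragments, not formula-based matching.

```r
# Hypothetical helper (not patRoon's implementation): count how many suspect
# fragment m/z values have at least one measured MS/MS peak within absMzDev.
countFragMatches <- function(suspectFragMz, peakMz, absMzDev = 0.005) {
    matched <- vapply(suspectFragMz,
                      function(mz) any(abs(peakMz - mz) <= absMzDev),
                      logical(1))
    maxFrags <- length(suspectFragMz)
    list(maxFrags = maxFrags,
         maxFragMatches = sum(matched),
         maxFragMatchesRel = sum(matched) / maxFrags)
}

res <- countFragMatches(suspectFragMz = c(91.054, 119.049, 147.044),
                        peakMz = c(65.039, 91.0542, 147.060))
res$maxFragMatches     # 1 (only the fragment near m/z 91.054 is within tolerance)
res$maxFragMatchesRel  # 1/3
```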
The estimation of identification levels is configured through a YAML file which specifies the rules for each level. The default file is shown below.
1:
    suspectFragments: 3
    retention: 12
2a:
    or:
        - individualMoNAScore:
            min: 0.9
            higherThanNext: .inf
        - libMatch:
            min: 0.9
            higherThanNext: .inf
    rank:
        max: 1
        type: compound
3a:
    or:
        - individualMoNAScore: 0.4
        - libMatch: 0.4
3b:
    suspectFragments: 3
3c:
    annMSMSSim:
        type: compound
        min: 0.7
4a:
    annMSMSSim:
        type: formula
        min: 0.7
    isoScore:
        min: 0.5
        higherThanNext: 0.2
    rank:
        max: 1
        type: formula
4b:
    isoScore:
        min: 0.9
        higherThanNext: 0.2
    rank:
        max: 1
        type: formula
5:
    all: yes
Most of the file should be self-explanatory. Some notes:
Each rule is either a field of suspectFragments (minimum number of MS/MS fragments matched from
  suspect list), retention (maximum retention deviation from suspect list), rank (the maximum
  annotation rank from formula or compound annotations), all (this level is always matched) or any of the
  scorings available from the formula or compound annotations.
In case any of the rules could be applied to either formula or compound annotations, the annotation type must
  be specified with the type field (formula or compound).
Identification levels should start with a number and may optionally be followed by an alphabetic character. The lowest levels are checked first.
If relative=yes then the relative scoring will be used for testing.
For suspectFragments: if the number of fragments from the suspect list (maxFrags column) is
  less than the minimum rule value, the minimum is adjusted to the number of available fragments.
The or and and keywords can be used to combine multiple conditions.
A template rules file can be generated with the genIDLevelRulesFile function, and this file can
  subsequently be passed to annotateSuspects. The file format is highly flexible and (sub)levels can be added or
  removed if desired. Note that the default file is currently only suitable when annotation is performed with GenForm
  and MetFrag, for other algorithms it is crucial to modify the rules.
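The adjustment described for the suspectFragments rule can be sketched in a few lines of base R. The function checkSuspectFragments is hypothetical and not part of patRoon. It only shows how the required minimum shrinks when the suspect list holds fewer fragments than the rule value, and it does not model suspects without any listed fragments.

```r
# Hypothetical check for a suspectFragments rule (not patRoon code).
checkSuspectFragments <- function(maxFragMatches, maxFrags, ruleMin = 3) {
    # If the suspect list holds fewer fragments than the rule asks for,
    # the minimum is lowered to the number of available fragments.
    effectiveMin <- min(ruleMin, maxFrags)
    maxFragMatches >= effectiveMin
}

checkSuspectFragments(maxFragMatches = 2, maxFrags = 2)  # TRUE: minimum lowered to 2
checkSuspectFragments(maxFragMatches = 2, maxFrags = 5)  # FALSE: 3 matches required
```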
featureGroupsScreening
featureGroupsSetScreeningUnset
The as.data.table method for featureGroupsScreening supports an
  additional format where each suspect hit is reported on a separate row (enabled by setting
  collapseSuspects=NULL). In this format the suspect
  properties from the screenInfo method are merged with each suspect row. Alternatively, if suspect
  collapsing is enabled (the default) then the regular as.data.table format is used, and amended with the
  names of all suspects matched to a feature group (separated by the value of the collapseSuspects argument).
Suspect collapsing also influences how calculated feature concentrations/toxicities are reported (i.e.
  obtained with calculateConcs/calculateTox). If these values were directly predicted for
  suspects, i.e. by using predictRespFactors/predictTox on the feature groups
  object, and suspects are not collapsed, then the calculated concentration/toxicity reported for each
  suspect row is not aggregated and specific for that suspect (unless not available). Hence, this allows you to
  obtain specific concentration/toxicity values for each suspect/feature group pair.
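The difference between the collapsed and non-collapsed formats can be illustrated with a small base R sketch. The data frame, its column names (group, suspect) and the helper collapseHits are made up for illustration and do not reflect the actual columns returned by as.data.table.

```r
# Hypothetical suspect hits: feature group "fg1" was matched to two suspects.
hits <- data.frame(group = c("fg1", "fg1", "fg2"),
                   suspect = c("suspectA", "suspectB", "suspectC"),
                   stringsAsFactors = FALSE)

# Hypothetical helper mimicking the effect of the collapseSuspects argument.
collapseHits <- function(hits, collapseSuspects = ",") {
    if (is.null(collapseSuspects))
        return(hits) # one row per suspect hit
    # one row per feature group, suspect names pasted together
    aggregate(suspect ~ group, data = hits, FUN = paste,
              collapse = collapseSuspects)
}

collapseHits(hits)                           # fg1: "suspectA,suspectB"; fg2: "suspectC"
collapseHits(hits, collapseSuspects = NULL)  # three rows, one per suspect hit
```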
The featureGroupsScreeningSet class is applicable for sets workflows. This class is derived from featureGroupsScreening and therefore largely follows the same user interface.
The following methods are specifically defined for sets workflows:
unset Converts the object data for a specified set into a 'non-set' object (featureGroupsSetScreeningUnset), which allows it to be used in 'regular' workflows. Only the screening results present in the specified set are kept.
The following methods are changed or have new functionality:
annotateSuspects Suspect annotation is performed per set. Thus, formula/compound ranks, estimated
  identification levels etc are calculated for each set. Subsequently, these results are merged in the final
  screenInfo. In addition, an overall formRank and compRank column is created based on the
  rankings of the suspect candidate in the set consensus data. Furthermore, an overall estIDLevel is generated
  that is based on the 'best' estimated identification level among the sets data (i.e. the lowest). In case
  there is a tie between sub-levels (e.g. 3a and 3b), then the sub-level is stripped
  (e.g. 3).
filter All filters related to estimated identification levels and formula/compound rankings are
  applied to the overall set data (see above). All others are applied to set specific data: in this case candidates
  are only removed if none of the set data conforms to the filter.
This class also derives from featureGroupsSet. Please see its documentation for more details
  on sets workflows.
Note that the formRank and compRank columns are not updated when the data is subset.
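The way an overall estIDLevel is derived from per-set levels in sets workflows can be sketched as follows. The helper overallIDLevel is hypothetical and not patRoon code. It assumes that a "tie" means that the best levels share the same number but differ in sub-level, and that sets without a level are reported as NA.

```r
# Hypothetical helper (not patRoon code): merge per-set levels into one overall level.
overallIDLevel <- function(setLevels) {
    setLevels <- setLevels[!is.na(setLevels)]
    if (length(setLevels) == 0)
        return(NA_character_)
    num <- as.integer(sub("[[:alpha:]]+$", "", setLevels)) # "3a" -> 3
    best <- unique(setLevels[num == min(num)])
    # Same number but different sub-levels: report the number only
    if (length(best) > 1) as.character(min(num)) else best
}

overallIDLevel(c(positive = "3a", negative = "3b"))  # "3"
overallIDLevel(c(positive = "2a", negative = "4b"))  # "2a"
```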
Schymanski EL, Jeon J, Gulde R, Fenner K, Ruff M, Singer HP, Hollender J (2014).
“Identifying Small Molecules via High Resolution Mass Spectrometry: Communicating Confidence.”
Environmental Science and Technology, 48(4), 2097–2098.
doi:10.1021/es5002105.

Stein SE, Scott DR (1994).
“Optimization and testing of mass spectral library search algorithms for compound identification.”
Journal of the American Society for Mass Spectrometry, 5(9), 859–866.
doi:10.1016/1044-0305(94)87009-8.