Class for suspect screened feature groups.

This class derives from featureGroups and adds suspect screening information.

screenInfo(obj)

annotateSuspects(
  fGroups,
  MSPeakLists = NULL,
  formulas = NULL,
  compounds = NULL,
  ...
)

# S4 method for class 'featureGroupsScreening'
screenInfo(obj)

# S4 method for class 'featureGroupsScreening'
show(object)

# S4 method for class 'featureGroupsScreening,ANY,ANY,missing'
x[i, j, ..., rGroups, suspects = NULL, drop = TRUE]

# S4 method for class 'featureGroupsScreening'
delete(obj, i = NULL, j = NULL, ...)

# S4 method for class 'featureGroupsScreening'
as.data.table(x, ..., collapseSuspects = ",", onlyHits = FALSE)

# S4 method for class 'featureGroupsScreening'
annotateSuspects(
  fGroups,
  MSPeakLists,
  formulas,
  compounds,
  absMzDev = 0.005,
  specSimParams = getDefSpecSimParams(removePrecursor = TRUE),
  checkFragments = c("mz", "formula", "compound"),
  formulasNormalizeScores = "max",
  compoundsNormalizeScores = "max",
  IDFile = system.file("misc", "IDLevelRules.yml", package = "patRoon"),
  logPath = file.path("log", "ident")
)

# S4 method for class 'featureGroupsScreening'
filter(
  obj,
  ...,
  onlyHits = NULL,
  selectHitsBy = NULL,
  selectBestFGroups = FALSE,
  maxLevel = NULL,
  maxFormRank = NULL,
  maxCompRank = NULL,
  minAnnSimForm = NULL,
  minAnnSimComp = NULL,
  minAnnSimBoth = NULL,
  absMinFragMatches = NULL,
  relMinFragMatches = NULL,
  minRF = NULL,
  maxLC50 = NULL,
  negate = FALSE
)

# S4 method for class 'featureGroupsScreeningSet'
screenInfo(obj)

# S4 method for class 'featureGroupsScreeningSet'
show(object)

# S4 method for class 'featureGroupsScreeningSet,ANY,ANY,missing'
x[i, j, ..., rGroups, suspects = NULL, sets = NULL, drop = TRUE]

# S4 method for class 'featureGroupsScreeningSet'
delete(obj, i = NULL, j = NULL, ...)

# S4 method for class 'featureGroupsScreeningSet'
as.data.table(x, ..., collapseSuspects = ",", onlyHits = FALSE)

# S4 method for class 'featureGroupsScreeningSet'
annotateSuspects(
  fGroups,
  MSPeakLists,
  formulas,
  compounds,
  absMzDev = 0.005,
  specSimParams = getDefSpecSimParams(removePrecursor = TRUE),
  checkFragments = c("mz", "formula", "compound"),
  formulasNormalizeScores = "max",
  compoundsNormalizeScores = "max",
  IDFile = system.file("misc", "IDLevelRules.yml", package = "patRoon"),
  logPath = file.path("log", "ident")
)

# S4 method for class 'featureGroupsScreeningSet'
filter(
  obj,
  ...,
  onlyHits = NULL,
  selectHitsBy = NULL,
  selectBestFGroups = FALSE,
  maxLevel = NULL,
  maxFormRank = NULL,
  maxCompRank = NULL,
  minAnnSimForm = NULL,
  minAnnSimComp = NULL,
  minAnnSimBoth = NULL,
  absMinFragMatches = NULL,
  relMinFragMatches = NULL,
  minRF = NULL,
  maxLC50 = NULL,
  negate = FALSE
)

# S4 method for class 'featureGroupsScreeningSet'
unset(obj, set)

Arguments

obj, object, x, fGroups

The featureGroupsScreening object.

MSPeakLists, formulas, compounds

Annotation data (MSPeakLists, formulas and compounds) obtained for this featureGroupsScreening object. Any of these arguments can be NULL to exclude that data from the annotation.

...

Further arguments passed to the base method.

i, j, rGroups

Used for subsetting data analyses, feature groups and replicate groups, see featureGroups.

suspects

An optional character vector with suspect names. If specified, only feature groups that are assigned to these suspects will be kept.

drop

Ignored.

collapseSuspects

If a character then any suspects that were matched to the same feature group are collapsed to a single row and suspect names are separated by the value of collapseSuspects. If NULL then no collapsing occurs, and each suspect match is reported on a separate row. See the Suspect collapsing section below for additional details.

onlyHits

For as.data.table: if TRUE then only feature groups with suspect hits are reported.

For filter:

if negate=FALSE and onlyHits=TRUE then all feature groups without suspect hits will be removed. Otherwise (onlyHits=FALSE or onlyHits=NULL) nothing will be done.
if negate=TRUE then onlyHits=TRUE will select feature groups without suspect hits, onlyHits=FALSE will only retain feature groups with suspect matches, and the filter is ignored if onlyHits=NULL.
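
The interplay between onlyHits and negate can be summarized with a small base R sketch. The helper keepByOnlyHits is hypothetical and not part of patRoon; it only mirrors the rules described above for a logical vector that flags which feature groups have at least one suspect hit.

```r
# Hypothetical helper (not part of patRoon): returns TRUE for each feature
# group that would be kept by the onlyHits filter.
# hasHits: logical vector, TRUE if the feature group has >= 1 suspect hit.
keepByOnlyHits <- function(hasHits, onlyHits = NULL, negate = FALSE)
{
    keepAll <- rep(TRUE, length(hasHits))
    if (is.null(onlyHits))
        return(keepAll) # filter is ignored
    if (!negate)
    {
        # onlyHits=TRUE removes groups without hits; FALSE does nothing
        if (onlyHits) hasHits else keepAll
    }
    else
    {
        # negated: TRUE selects groups without hits, FALSE those with hits
        if (onlyHits) !hasHits else hasHits
    }
}

hasHits <- c(TRUE, FALSE, TRUE)
keepByOnlyHits(hasHits, onlyHits = TRUE)                 # TRUE FALSE TRUE
keepByOnlyHits(hasHits, onlyHits = TRUE, negate = TRUE)  # FALSE TRUE FALSE
```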

absMzDev

Maximum absolute m/z deviation.

specSimParams

A named list with parameters that influence the calculation of MS spectra similarities. See the spectral similarity parameters documentation for more details.

checkFragments

Which type(s) of MS/MS fragments from workflow data should be checked to evaluate the number of suspect fragment matches (i.e. from the fragments_mz/fragments_formula columns in the suspect list). Valid values are: "mz", "formula", "compound". The first uses m/z values in the specified MSPeakLists object, whereas the others use the formulae that were annotated to MS/MS peaks in the given formulas or compounds objects. Multiple values are possible: in this case the maximum number of fragment matches will be reported.

compoundsNormalizeScores, formulasNormalizeScores

A character that specifies how normalization of annotation scorings occurs. Either "max" (normalize to max value) or "minmax" (perform min-max normalization). Note that normalization of negative scores (e.g. output by SIRIUS) is always performed as min-max. Furthermore, normalization for compounds currently takes into account the original min/max scoring values from when the candidates were generated. Thus, for compounds scoring, normalization is not affected when candidate results were removed after they were generated (e.g. by use of filter).

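As an illustration of the two normalization modes for compoundsNormalizeScores/formulasNormalizeScores, the following base R sketch normalizes a plain numeric score vector. The function normalizeScores is hypothetical and not the patRoon implementation; the handling of identical or all-zero scores is an assumption made here to avoid division by zero.

```r
# Hypothetical helper (not part of patRoon): normalize a vector of scores.
normalizeScores <- function(scores, method = c("max", "minmax"))
{
    method <- match.arg(method)
    # negative scores are always min-max normalized, as stated above
    if (method == "minmax" || any(scores < 0))
    {
        rng <- range(scores)
        if (rng[1] == rng[2])
            return(rep(1, length(scores))) # assumption: identical scores map to 1
        return((scores - rng[1]) / (rng[2] - rng[1]))
    }
    if (max(scores) == 0)
        return(scores) # assumption: all-zero scores are left untouched
    scores / max(scores)
}

normalizeScores(c(2, 4, 8))            # 0.25 0.50 1.00
normalizeScores(c(2, 4, 8), "minmax")  # smallest score becomes 0, largest 1
normalizeScores(c(-10, -5, 0))         # 0.0 0.5 1.0 (min-max is forced)
```
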
IDFile

A file path to a YAML file with rules used for estimation of identification levels. See the Suspect annotation section for more details. If not specified then a default rules file will be used.

logPath

A directory path to store logging information. If NULL then logging is disabled.

selectHitsBy

Should be "intensity" or "level". For cases where the same suspect is matched to multiple feature groups, only the hit for the feature group with the highest mean intensity (selectHitsBy="intensity") or the best identification level (selectHitsBy="level") is kept. In case of ties only the first hit is kept. Set to NULL to ignore this filter. If negate=TRUE then only those hits with the lowest mean intensity/poorest identification level are kept.
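
The selection logic for selectHitsBy="intensity" can be sketched in base R as follows. The table, its column names and the feature group names are made up for illustration and do not reflect the actual screenInfo layout.

```r
# Made-up hit table: 'atrazine' is matched to two feature groups.
hits <- data.frame(
    suspect = c("atrazine", "atrazine", "caffeine"),
    group = c("M216_R300", "M216_R450", "M195_R120"),
    meanIntensity = c(1e5, 4e5, 2e4),
    stringsAsFactors = FALSE
)

# For each suspect, keep the row with the highest mean intensity.
# which.max() returns the first maximum, so ties resolve to the first hit.
rowsBySuspect <- split(seq_len(nrow(hits)), hits$suspect)
keep <- vapply(rowsBySuspect, function(rows)
{
    rows[which.max(hits$meanIntensity[rows])]
}, integer(1))

selected <- hits[sort(keep), ]
selected$group # "M216_R450" "M195_R120"
```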

selectBestFGroups

If TRUE, then whenever a single feature group is matched to several suspects, only the suspect with the best identification score is kept. In case of ties only the first is kept.

maxLevel, maxFormRank, maxCompRank, minAnnSimForm, minAnnSimComp, minAnnSimBoth

Filter suspects by maximum identification level (e.g. "3a"), formula/compound rank or with minimum formula/compound/combined annotation similarity. Set to NULL to ignore.

absMinFragMatches, relMinFragMatches

Only retain suspects with at least this number of MS/MS matches with the fragments specified in the suspect list (i.e. fragments_mz/fragments_formula). relMinFragMatches sets a minimum that is relative (0-1) to the maximum number of MS/MS fragments specified in the fragments_* columns of the suspect list. Set to NULL to ignore.
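
A minimal base R sketch of how the absolute and relative thresholds relate is given below. The function passesFragFilter is hypothetical; it also removes NA values, in line with the note on NA handling for minimum/maximum filters elsewhere on this page.

```r
# Hypothetical helper (not part of patRoon).
# matches: number of matched fragments per suspect hit
# maxFrags: number of fragments listed for that suspect in the suspect list
passesFragFilter <- function(matches, maxFrags, absMin = NULL, relMin = NULL)
{
    keep <- rep(TRUE, length(matches))
    if (!is.null(absMin))
        keep <- keep & !is.na(matches) & matches >= absMin
    if (!is.null(relMin))
    {
        rel <- matches / maxFrags # relative to the listed fragments (0-1)
        keep <- keep & !is.na(rel) & rel >= relMin
    }
    keep
}

matches <- c(3, 1, NA)
maxFrags <- c(4, 4, 2)
passesFragFilter(matches, maxFrags, absMin = 2)   # TRUE FALSE FALSE
passesFragFilter(matches, maxFrags, relMin = 0.5) # TRUE FALSE FALSE
```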

minRF

Filter suspect hits by the given minimum predicted response factor (as calculated by predictRespFactors). Set to NULL to ignore.

maxLC50

Filter suspect hits by the given maximum toxicity (LC50) (as calculated by predictTox). Set to NULL to ignore.

negate

If set to TRUE then filtering operations are performed in opposite manner.

sets

(sets workflow) A character with name(s) of the sets to keep (or remove if negate=TRUE).

set

(sets workflow) The name of the set.

Value

annotateSuspects returns a featureGroupsScreening object, which is a featureGroups object amended with annotation data.

filter returns a filtered featureGroupsScreening object.

Methods (by generic)

screenInfo(featureGroupsScreening): Returns a table with screening information (see screenInfo slot).
show(featureGroupsScreening): Shows summary information for this object.
x[i, j, ...]: Subset on analyses, feature groups and/or suspects.
as.data.table(featureGroupsScreening): Obtain a summary table (a data.table) with retention, m/z, intensity and optionally other feature data. Furthermore, the output table will be merged with information from screenInfo, such as suspect names and other properties and annotation data.
annotateSuspects(featureGroupsScreening): Incorporates annotation data obtained during the workflow to annotate suspects with matched known MS/MS fragments, formula/candidate ranks and automatic estimation of identification levels. See the Suspect annotation section for more details. The estimation of identification levels for each suspect is logged in the log/ident directory.
filter(featureGroupsScreening): Performs rule based filtering. This method builds on the comprehensive filter functionality from the base filter,featureGroups-method. It adds several filters to select e.g. the best ranked suspects or those with a minimum estimated identification level. NOTE: most filters only affect suspect hits, not feature groups. Set onlyHits=TRUE to subsequently remove any feature groups that lost all their suspect matches due to other filter steps.

Slots

screenInfo: A data.table with results from suspect screening. This table will be amended with annotation data when annotateSuspects is run.
MS2QuantMeta: Metadata from MS2Quant filled in by predictRespFactors.

Note

filter removes suspect hits with NA values when any of the filters related to minimum or maximum values are applied (unless negate=TRUE).

Suspect annotation

The annotateSuspects method is used to annotate suspects after screenSuspects was used to collect suspect screening results and other workflow steps, such as formula and compound annotation, have been completed. The annotation results, which can be obtained with the as.data.table and screenInfo methods, amend the current screening data with the following columns:

formRank,compRank The rank of the suspect within the formula/compound annotation results.
annSimForm,annSimComp,annSimBoth A similarity measure between measured and annotated MS/MS peaks from annotation of formulae, compounds or both. The similarity is calculated as the spectral similarity between a peaklist with (a) all MS/MS peaks and (b) only annotated peaks. Thus, a value of one means that all MS/MS peaks were annotated. If both formula and compound annotations are available then annSimBoth is calculated after combining all the annotated peaks, otherwise annSimBoth equals the available value for annSimForm or annSimComp. The similarity calculation can be configured with the specSimParams argument to annotateSuspects. Note for annotation with generateCompoundsLibrary results: the method and default parameters for the annSimComp calculation differ slightly from those of the spectral similarity calculated during compound annotation (libMatch score), hence small differences in results are typically observed.
maxFrags The maximum number of MS/MS fragments that can be matched for this suspect (based on the fragments_* columns from the suspect list).
maxFragMatches,maxFragMatchesRel The absolute and relative number of experimental MS/MS peaks that were matched from the fragments specified in the suspect list. The value for maxFragMatchesRel is relative to the value for maxFrags. The calculation of these columns is influenced by the checkFragments argument to annotateSuspects.
estIDLevel Provides an estimation of the identification level, roughly following that of Schymanski et al. (2014). However, please note that this value is only an estimation, and manual interpretation is still necessary to assign final identification levels. The estimation is done through a set of rules, see the Identification level rules section below.

Note that columns are only present if sufficient data is available for their calculation.
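
The idea behind the annSim* columns (comparing a spectrum with all peaks against the same spectrum with only annotated peaks) can be illustrated with a plain cosine similarity in base R. This is a simplified sketch and an assumption: the actual calculation is controlled by specSimParams and may differ, for instance in weighting, binning or precursor removal. The function names are hypothetical.

```r
# Plain cosine similarity between two equally long intensity vectors.
cosineSim <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Hypothetical helper (not part of patRoon).
# intensity: MS/MS peak intensities; annotated: TRUE for annotated peaks
annSim <- function(intensity, annotated)
{
    if (!any(annotated))
        return(0) # assumption: no annotated peaks gives zero similarity
    # spectrum (b): same peaks, but non-annotated intensities set to zero
    annotatedOnly <- ifelse(annotated, intensity, 0)
    cosineSim(intensity, annotatedOnly)
}

intensity <- c(100, 50, 20)
annSim(intensity, c(TRUE, TRUE, TRUE))   # 1: all peaks were annotated
annSim(intensity, c(TRUE, FALSE, FALSE)) # below 1: two peaks lack annotation
```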

Identification level rules

The estimation of identification levels is configured through a YAML file which specifies the rules for each level. The default file is shown below.

1:
    suspectFragments: 3
    retention: 12
2a:
    or:
        - individualMoNAScore:
            min: 0.9
            higherThanNext: .inf
        - libMatch:
            min: 0.9
            higherThanNext: .inf
    rank:
        max: 1
        type: compound
3a:
    or:
        - individualMoNAScore: 0.4
        - libMatch: 0.4
3b:
    suspectFragments: 3
3c:
    annMSMSSim:
        type: compound
        min: 0.7
4a:
    annMSMSSim:
        type: formula
        min: 0.7
    isoScore:
        min: 0.5
        higherThanNext: 0.2
    rank:
        max: 1
        type: formula
4b:
    isoScore:
        min: 0.9
        higherThanNext: 0.2
    rank:
        max: 1
        type: formula
5:
    all: yes

Most of the file should be self-explanatory. Some notes:

Each rule is either a field of suspectFragments (minimum number of MS/MS fragments matched from suspect list), retention (maximum retention deviation from suspect list), rank (the maximum annotation rank from formula or compound annotations), all (this level is always matched) or any of the scorings available from the formula or compound annotations.
In case any of the rules could be applied to either formula or compound annotations, the annotation type must be specified with the type field (formula or compound).
Identification levels should start with a number and may optionally be followed by an alphabetic character. The lowest levels are checked first.
If relative=yes then the relative scoring will be used for testing.
For suspectFragments: if the number of fragments from the suspect list (maxFrags column) is less than the minimum rule value, the minimum is adjusted to the number of available fragments.
The or and and keywords can be used to combine multiple conditions.
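
Two of the notes above can be sketched in base R: the order in which levels are checked, and the adjustment of the suspectFragments minimum. The helper passesSuspectFragments is hypothetical, and the sketch assumes that at least one fragment is listed for the suspect (maxFrags >= 1); how suspects without listed fragments are handled is not covered here.

```r
# Order level names numerically first, then by the optional sub-level letter.
idLevels <- c("3b", "1", "4a", "2a", "5", "3a")
levelNum <- as.integer(sub("[[:alpha:]]$", "", idLevels))
checkOrder <- idLevels[order(levelNum, idLevels)]
checkOrder # "1" "2a" "3a" "3b" "4a" "5"

# Hypothetical helper (not part of patRoon): the rule minimum is lowered to
# the number of fragments that are actually listed for the suspect.
passesSuspectFragments <- function(matches, maxFrags, ruleMin = 3)
{
    matches >= min(ruleMin, maxFrags)
}

passesSuspectFragments(matches = 2, maxFrags = 2) # TRUE: minimum becomes 2
passesSuspectFragments(matches = 2, maxFrags = 5) # FALSE: minimum stays 3
```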

A template rules file can be generated with the genIDLevelRulesFile function, and this file can subsequently be passed to annotateSuspects. The file format is highly flexible and (sub)levels can be added or removed if desired. Note that the default file is currently only suitable when annotation is performed with GenForm and MetFrag; for other algorithms it is crucial to modify the rules.

S4 class hierarchy

featureGroups
- featureGroupsScreening
  - featureGroupsSetScreeningUnset

Suspect collapsing

The as.data.table method for featureGroupsScreening supports an additional format where each suspect hit is reported on a separate row (enabled by setting collapseSuspects=NULL). In this format the suspect properties from the screenInfo method are merged with each suspect row. Alternatively, if suspect collapsing is enabled (the default) then the regular as.data.table format is used, and amended with the names of all suspects matched to a feature group (separated by the value of the collapseSuspects argument).
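
The difference between the two formats can be illustrated with a small base R sketch on a made-up table. The column, feature group and suspect names are hypothetical; the real output contains many more columns.

```r
# Non-collapsed format: one row per suspect hit.
hits <- data.frame(
    group = c("M216_R300", "M216_R300", "M195_R120"),
    suspect = c("suspA", "suspB", "suspC"),
    stringsAsFactors = FALSE
)

# Collapsed format: one row per feature group, with suspect names joined by
# the separator (here ",", the default value of collapseSuspects).
collapsed <- aggregate(suspect ~ group, data = hits, FUN = paste,
                       collapse = ",")

collapsed$suspect[collapsed$group == "M216_R300"] # "suspA,suspB"
```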

Suspect collapsing also influences how calculated feature concentrations/toxicities are reported (i.e. obtained with calculateConcs/calculateTox). If these values were directly predicted for suspects, i.e. by using predictRespFactors/predictTox on the feature groups object, and suspects are not collapsed, then the calculated concentration/toxicity reported for each suspect row is not aggregated and is specific to that suspect (unless not available). Hence, this allows you to obtain specific concentration/toxicity values for each suspect/feature group pair.

Sets workflows

The featureGroupsScreeningSet class is applicable for sets workflows. This class is derived from featureGroupsScreening and therefore largely follows the same user interface.

The following methods are specifically defined for sets workflows:

unset Converts the object data for a specified set into a 'non-set' object (featureGroupsSetScreeningUnset), which allows it to be used in 'regular' workflows. Only the screening results present in the specified set are kept.

The following methods are changed or have new functionality:

annotateSuspects Suspect annotation is performed per set. Thus, formula/compound ranks, estimated identification levels etc are calculated for each set. Subsequently, these results are merged in the final screenInfo. In addition, an overall formRank and compRank column is created based on the rankings of the suspect candidate in the set consensus data. Furthermore, an overall estIDLevel is generated that is based on the 'best' estimated identification level among the sets data (i.e. the lowest). In case there is a tie between sub-levels (e.g. 3a and 3b), then the sub-level is stripped (e.g. 3).
filter All filters related to estimated identification levels and formula/compound rankings are applied to the overall set data (see above). All others are applied to set specific data: in this case candidates are only removed if none of the set data conforms to the filter.
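
The derivation of the overall estIDLevel from the per-set levels, as described for annotateSuspects above, could look roughly like the following base R sketch. The function overallIDLevel is hypothetical and reflects one reading of the description; the handling of sets without a level (NA) is an assumption.

```r
# Hypothetical helper (not part of patRoon): combine per-set ID levels.
overallIDLevel <- function(setLevels)
{
    setLevels <- setLevels[!is.na(setLevels)] # assumption: ignore missing
    if (length(setLevels) == 0)
        return(NA_character_)
    nums <- as.integer(sub("[[:alpha:]]$", "", setLevels))
    # candidates sharing the best (lowest) numeric level
    best <- unique(setLevels[nums == min(nums)])
    # differing sub-levels are a tie: strip the sub-level
    if (length(best) > 1) as.character(min(nums)) else best
}

overallIDLevel(c("3a", "3b")) # "3"
overallIDLevel(c("3a", "4a")) # "3a"
overallIDLevel(c("2a", "2a")) # "2a"
```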

This class also derives from featureGroupsSet. Please see its documentation for more details on sets workflows.

Note that the formRank and compRank columns are not updated when the data is subset.

References

Schymanski EL, Jeon J, Gulde R, Fenner K, Ruff M, Singer HP, Hollender J (2014). “Identifying Small Molecules via High Resolution Mass Spectrometry: Communicating Confidence.” Environmental Science and Technology, 48(4), 2097–2098. doi:10.1021/es5002105.

Stein SE, Scott DR (1994). “Optimization and testing of mass spectral library search algorithms for compound identification.” Journal of the American Society for Mass Spectrometry, 5(9), 859–866. doi:10.1016/1044-0305(94)87009-8.

Author

Rick Helmus <r.helmus@uva.nl>, Emma Schymanski <emma.schymanski@uni.lu> (contributions to identification level rules), Bas van de Velde (contributions to spectral similarity calculation).