Functions to predict toxicities from SMILES and/or SIRIUS+CSI:FingerID fingerprints using the MS2Tox package.

# S4 method for featureGroups
calculateTox(fGroups, featureAnn)

# S4 method for featureGroupsSet
calculateTox(fGroups, featureAnn)

# S4 method for compounds
predictTox(
  obj,
  LC50Mode = "static",
  concUnit = "ugL",
  updateScore = FALSE,
  scoreWeight = 1,
  parallel = TRUE
)

# S4 method for featureGroupsScreening
predictTox(obj, LC50Mode = "static", concUnit = "ugL")

# S4 method for featureGroupsScreening
calculateTox(fGroups, featureAnn = NULL)

# S4 method for featureGroupsScreeningSet
predictTox(obj, ...)

# S4 method for featureGroupsScreeningSet
calculateTox(fGroups, featureAnn = NULL)

# S4 method for compoundsSet
predictTox(obj, ...)

# S4 method for compoundsSIRIUS
predictTox(obj, type = "FP", LC50Mode = "static", concUnit = "ugL")

# S4 method for formulasSet
predictTox(obj, ...)

# S4 method for formulasSIRIUS
predictTox(obj, LC50Mode = "static", concUnit = "ugL")

Arguments

fGroups

For predictTox methods for feature annotations: The featureGroups object for which the annotations were performed.

For calculateTox: The featureGroups object for which toxicities should be assigned.

featureAnn

A featureAnnotations object (e.g. formulasSIRIUS or compounds) which contains toxicities. Optional if calculateTox is called on suspect screening results (i.e. featureGroupsScreening method).

obj

The workflow object for which predictions should be performed, e.g. feature groups with screening results (featureGroupsScreening) or compound annotations (compounds).

LC50Mode

The mode used for predictions: should be "static" or "flow".

concUnit

The concentration unit for calculated toxicities. Can be molar based ("nM", "uM", "mM", "M") or mass based ("ngL", "ugL", "mgL", "gL"). Each unit can additionally be prefixed with "log " for logarithmic concentrations (e.g. "log mM").

updateScore, scoreWeight

If updateScore=TRUE then the annotation score column is updated by adding normalized values of the predicted toxicity (weighted by scoreWeight). Currently, this only makes sense for annotations performed with MetFrag!

parallel

If set to TRUE then code is executed in parallel through the future package. Please see the parallelization section in the handbook for more details.

...

(sets workflow) Further arguments passed to the non-sets workflow method.

type

Which types of predictions should be performed: should be "FP" (SIRIUS+CSI:FingerID fingerprints), "SMILES" or "both". Only relevant for compoundsSIRIUS method.

Value

predictTox returns an object amended with LC50 values (LC50_SMILES/LC50_SIRFP columns).

calculateTox returns a featureGroups-based object amended with toxicity values for each feature group (accessed with the toxicities method).

Details

The MS2Tox R package predicts toxicities from SMILES and/or MS/MS fingerprints obtained with SIRIUS+CSI:FingerID. The predictTox method functions interface with this package to predict toxicities, which can then be assigned to feature groups with the calculateTox method function.

Note

The rcdk package and OpenBabel tool are used internally to calculate molecular weights. Please make sure that OpenBabel is installed.
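Molecular weights are what link the molar and mass based options of concUnit. The arithmetic can be sketched as follows. The helper functions uMToUgL and ugLToUM are hypothetical and not part of patRoon or MS2Tox, the LC50 value is purely illustrative, and treating the "log " prefix as a base-10 logarithm is an assumption made for this example.

```r
# Hypothetical helpers (not part of patRoon or MS2Tox) that illustrate the
# arithmetic behind molar and mass based concentration units.

# Convert a molar concentration in uM to a mass concentration in ug/L.
# 1 uM = 1e-6 mol/L; multiplying by the molecular weight (g/mol) gives
# MW * 1e-6 g/L, i.e. MW ug/L.
uMToUgL <- function(concUM, molWeight) concUM * molWeight

# The reverse conversion: ug/L to uM.
ugLToUM <- function(concUgL, molWeight) concUgL / molWeight

# Example: caffeine has a molecular weight of about 194.19 g/mol
mw <- 194.19
lc50UM <- 10                      # illustrative value, not a real prediction
lc50UgL <- uMToUgL(lc50UM, mw)    # about 1941.9 ug/L
lc50mM <- lc50UM / 1000           # 0.01 mM

# Assumption: the "log " prefix is taken here as a base-10 logarithm
logLC50mM <- log10(lc50mM)        # about -2

print(c(ugL = lc50UgL, mM = lc50mM, log_mM = logLC50mM))
```

The same value therefore reads very differently depending on the unit, so it helps to use one concUnit consistently when comparing results.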

Predicting toxicities

The toxicities are predicted with the predictTox generic function, which accepts the following input:

  • Suspect screening results. The SMILES data is used to predict toxicities for suspect hits.

  • Formula annotation data obtained with the "sirius" algorithm (generateFormulasSIRIUS). The predictions are performed for each formula candidate using SIRIUS+CSI:FingerID fingerprints. For this reason, the getFingerprint argument must be set to TRUE when generating the formula data.

  • Compound annotation data obtained with the "sirius" algorithm (generateCompoundsSIRIUS). The predictions are performed for each annotation candidate using its SMILES and/or SIRIUS+CSI:FingerID fingerprints. Fingerprint-based predictions are performed on a per-formula basis; hence, these toxicities will be equal for isomers.

  • Compound annotation data obtained with algorithms other than "sirius". The toxicities are predicted from SMILES data.

When SMILES data is used then predictions of toxicities are generally more accurate. However, calculations with SIRIUS+CSI:FingerID fingerprints are faster and only require the formula and MS/MS spectrum, i.e. not the full structure. Hence, calculations with SMILES are mostly useful in suspect screening workflows, or with high confidence compound annotation data, whereas MS/MS fingerprints are suitable with unknowns.

For annotation data the calculations are performed for all candidates. This can lead to long-running calculations, especially when SMILES data is used. Hence, it is strongly recommended to first prioritize the annotation results, e.g. with the topMost argument to the filter method.

When toxicities are predicted from SIRIUS+CSI:FingerID fingerprints then only formula and MS/MS spectra are used, even if compound annotations are used for input. The major difference is that with formula annotation input all formula candidates for which a fingerprint could be generated are considered, whereas with compound annotations only candidate formulae are considered for which also a structure could be assigned. Hence, the formula annotation input could be more comprehensive, whereas predictions from structure annotations could lead to more representative results as only formulae are considered for which at least one structure could be assigned.
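The recommendation to prioritize candidates before predicting can be sketched as a small wrapper. This is a minimal sketch rather than a prescribed workflow: predictTopTox is a hypothetical helper, compounds is assumed to be an existing compound annotation object, and it is assumed that the filter and predictTox generics can be reached via the patRoon namespace.

```r
# Hypothetical wrapper, not part of patRoon: prioritize candidates first,
# then predict toxicities for the remaining ones.
# 'compounds' is assumed to be an existing compound annotation object.
predictTopTox <- function(compounds, topMost = 5, LC50Mode = "static",
                          concUnit = "ugL")
{
    # The documented prediction modes are "static" and "flow"
    stopifnot(LC50Mode %in% c("static", "flow"))

    # Keep only the best ranked candidates to limit calculation time
    compounds <- patRoon::filter(compounds, topMost = topMost)

    # Amend the remaining candidates with predicted LC50 values
    patRoon::predictTox(compounds, LC50Mode = LC50Mode, concUnit = concUnit)
}
```

A call such as compounds <- predictTopTox(compounds, topMost = 3) would then return the filtered annotations amended with LC50 columns. The value of topMost is a trade-off between calculation time and the number of candidates that receive a prediction.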

Assigning toxicities

The calculateTox generic function is used to assign toxicities to each feature group, using the toxicities predicted as discussed in the previous section. The function takes toxicities from suspect screening results and/or feature annotation data. If multiple toxicities were predicted for the same feature group, for instance when multiple annotation candidates or suspect hits for this feature group are present, then all of these toxicities are assigned to the feature group. These values can later be easily aggregated with e.g. the as.data.table function.
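To illustrate what aggregating multiple toxicities per feature group could look like, the following base R sketch summarizes a toy table. The table layout, column names, group names and values are all made up for illustration and do not reflect the actual output format of patRoon.

```r
# Toy table mimicking several predicted toxicities per feature group.
# Column names, group names and values are made up for illustration only.
tox <- data.frame(
    group = c("M195_R120", "M195_R120", "M230_R305", "M230_R305", "M230_R305"),
    candidate = c("A", "B", "C", "D", "E"),
    LC50 = c(1200, 800, 50, 75, 100)  # e.g. in ug/L
)

# Aggregate per feature group: the lowest LC50 is the most conservative
# (most toxic) estimate, the mean gives a central value
aggTox <- aggregate(LC50 ~ group, data = tox,
                    FUN = function(x) c(min = min(x), mean = mean(x)))

# aggregate() stores both statistics in one matrix column: flatten it
# into the regular columns LC50.min and LC50.mean
aggTox <- do.call(data.frame, aggTox)

print(aggTox)
```

Which statistic is appropriate depends on the purpose: the minimum suits worst-case prioritization, whereas the mean or median is less sensitive to a single outlying candidate.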

References

O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.

Guha, R. (2007). 'Chemical Informatics Functionality in R'. Journal of Statistical Software 6(18)

Peets P, Wang W, MacLeod M, Breitholtz M, Martin JW, Kruve A (2022). “MS2Tox Machine Learning Tool for Predicting the Ecotoxicity of Unidentified Chemicals in Water by Nontarget LC-HRMS.” Environmental Science & Technology, 56(22), 15508-15517. doi:10.1021/acs.est.2c02536, PMID: 36269851, https://doi.org/10.1021/acs.est.2c02536.