generateTPsCTS(
  parents,
  transLibrary,
  generations = 1,
  errorRetries = 3,
  skipInvalid = TRUE,
  prefCalcChemProps = TRUE,
  neutralChemProps = FALSE,
  neutralizeTPs = TRUE,
  calcLogP = "rcdk",
  calcSims = FALSE,
  fpType = "extended",
  fpSimMethod = "tanimoto",
  parallel = TRUE
)

Arguments

parents

The parents for which transformation products should be obtained. This can be (1) a suspect list (see suspect screening for more information), (2) the resulting output of screenSuspects or (3) a compounds annotation object. In the former two cases, the suspects (or suspect hits) are used as parents, whereas in the latter case all candidates are used as parents.

transLibrary

A character specifying which transformation library should be used. Currently supported are: "hydrolysis", "abiotic_reduction", "photolysis_unranked", "photolysis_ranked", "mammalian_metabolism", "combined_abioticreduction_hydrolysis", "combined_photolysis_abiotic_hydrolysis", "pfas_environmental", "pfas_metabolism".

generations

An integer that specifies the number of transformation generations to predict.

errorRetries

The maximum number of connection retries. Sets the times argument of the httr::RETRY function.

skipInvalid

If set to TRUE then parents for which insufficient information (e.g. SMILES) is available will be skipped (with a warning).

prefCalcChemProps

If TRUE then calculated chemical properties such as the formula and InChIKey are preferred over what is already present in the parent suspect list. For efficiency reasons it is recommended to set this to TRUE. See the Validating and calculating chemical properties section for more details.

neutralChemProps

If TRUE then the neutral form of the molecule is considered to calculate SMILES, formulae etc. Enabling this may improve feature matching when considering common adducts (e.g. [M+H]+, [M-H]-). See the Validating and calculating chemical properties section for more details.

neutralizeTPs

If TRUE then all resulting TP structure information is neutralized. This argument has a similar meaning as neutralChemProps. It defaults to TRUE for prediction algorithms, as these may output charged molecules. NOTE: if neutralization results in duplicate TPs, i.e. when the neutral form of the TP was also generated by the algorithm, then the neutralized TP will be removed.

calcLogP

A character specifying whether Log P values should be calculated with rcdk::get.xlogp (calcLogP="rcdk"), OpenBabel (calcLogP="obabel") or not at all (calcLogP="none"). The Log P values are calculated for the parents and TPs to predict their retention order (retDir).

calcSims

If set to TRUE then structural similarities between the parent and its TPs are calculated. A minimum similarity can then be enforced with the filter method. This may be useful under the assumption that parents and TPs with a high structural similarity are also likely to have a high MS/MS spectral similarity (which can be evaluated after componentization with generateComponentsTPs).

fpType

The type of structural fingerprint that should be calculated. See the type argument of the get.fingerprint function of rcdk.

fpSimMethod

The method for calculating similarities (i.e. not dissimilarity!). See the method argument of the fp.sim.matrix function of the fingerprint package.

parallel

If set to TRUE then code is executed in parallel through the future package. Please see the parallelization section in the handbook for more details.
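
To make the similarity arguments more concrete, the following base R sketch computes a Tanimoto similarity between a parent and two TPs. It is an illustration only: the function tanimoto and the bit positions are hypothetical, and the actual calculation is performed on real fingerprints via the rcdk and fingerprint packages referred to by fpType and fpSimMethod.

```r
# Tanimoto (Jaccard) similarity between two fingerprints, each given as a
# vector of 'on' bit positions. Hypothetical helper, not part of patRoon.
tanimoto <- function(bitsA, bitsB) {
  bitsA <- unique(bitsA)
  bitsB <- unique(bitsB)
  nShared <- length(intersect(bitsA, bitsB))
  nTotal <- length(union(bitsA, bitsB))
  if (nTotal == 0) return(NA_real_)  # undefined for two empty fingerprints
  nShared / nTotal
}

# Made-up 'on' bits for a parent and two of its TPs
parentBits <- c(1, 4, 7, 9, 15)
tpBits <- list(TP1 = c(1, 4, 7, 9, 20), TP2 = c(2, 15))

# One similarity value per TP, always relative to the parent
sims <- vapply(tpBits, tanimoto, numeric(1), bitsB = parentBits)
print(round(sims, 2))
```

A value close to 1 means that the TP shares most fingerprint bits with its parent, which is the kind of threshold a subsequent filter step would act on.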

Value

The TPs are stored in an object derived from the transformationProductsStructure class.

Details

This function uses CTS to obtain transformation products. This function is called when calling generateTPs with algorithm="cts".

This function uses the httr package to access the Web API of CTS for automatic TP prediction. Hence, an Internet connection is mandatory. Please take care to not 'abuse' the CTS servers, e.g. by running very large batch calculations in parallel, as this may result in rejected connections.

An important advantage of this algorithm is that it provides structural information for generated TPs. However, this also means that if the input is from a parent suspect list or screening then either SMILES or InChI information must be available for the parents.
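
As a minimal sketch of the above, the example below builds a small parent suspect list with SMILES and passes it to generateTPsCTS. The compounds and the runOnline switch are assumptions made for illustration; the call itself only runs when the switch is enabled, patRoon is installed and an Internet connection is available.

```r
# Minimal parent suspect list: a name plus SMILES (or InChI) per parent.
# The two compounds are arbitrary examples.
suspects <- data.frame(
  name = c("carbamazepine", "diclofenac"),
  SMILES = c("NC(=O)N1c2ccccc2C=Cc2ccccc21",
             "OC(=O)Cc1ccccc1Nc1c(Cl)cccc1Cl"),
  stringsAsFactors = FALSE
)

runOnline <- FALSE  # hypothetical switch: set to TRUE to query the CTS servers

if (runOnline && requireNamespace("patRoon", quietly = TRUE)) {
  TPs <- patRoon::generateTPsCTS(
    parents = suspects,
    transLibrary = "photolysis_unranked",
    generations = 1,
    calcSims = TRUE,
    parallel = FALSE  # sequential requests, to go easy on the CTS servers
  )
  print(TPs)
}
```

Setting parallel = FALSE keeps the number of simultaneous requests low, which seems a reasonable default for small parent lists.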

Note

When the parents argument is a compounds object, the candidate library identifier is used in case the candidate has no defined compoundName.

Validating and calculating chemical properties

Chemical properties such as SMILES, InChIKey and formula in the parent suspect list are automatically validated and calculated if missing/invalid.

The internal validation/calculation process performs the following steps:

  • Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid entries will be set to NA.

  • If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by (de-)protonation (using the --neutralized option of OpenBabel). An additional column molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires either SMILES or InChI data to be available.

  • The SMILES and InChI data are used to calculate missing or invalid SMILES, InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing InChIKey and formula data is overwritten by calculated values whenever possible.

  • The chemical formulae that were not calculated are verified and normalized. This process may be time-consuming and can largely be avoided by setting prefCalcChemProps=TRUE.

  • Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible (prefCalcChemProps=TRUE).
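
The effect of prefCalcChemProps in the steps above can be sketched in base R as follows. This is an illustration of the described logic only, not the internal patRoon code: the helper mergeProps and the example formulae are hypothetical.

```r
# Merge existing and calculated property values (hypothetical helper).
# NA marks a value that is missing or could not be calculated.
mergeProps <- function(existing, calculated, prefCalc) {
  if (prefCalc) {
    # prefer calculated values whenever a calculation succeeded
    ifelse(!is.na(calculated), calculated, existing)
  } else {
    # only fill in what is missing
    ifelse(is.na(existing), calculated, existing)
  }
}

existing   <- c("H6C6", NA, "C2H6O")   # first formula is not normalized
calculated <- c("C6H6", "CH4O", NA)    # last one lacks SMILES/InChI input

withPref    <- mergeProps(existing, calculated, prefCalc = TRUE)
withoutPref <- mergeProps(existing, calculated, prefCalc = FALSE)
print(withPref)
print(withoutPref)
```

With prefCalc = TRUE the first formula is replaced by its calculated form, so no separate verification and normalization of that entry would be needed; with prefCalc = FALSE the existing value is kept and only the missing second entry is filled in.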

Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.

This functionality relies heavily on OpenBabel; please make sure it is installed.

References

rcdk1

OBoyle2011patRoon

Wolfe2016patRoon

TebesStevens2017patRoon

Yuan2020patRoon

Yuan2021patRoon

See also

generateTPs for more details and other algorithms.

The CTS website (https://qed.epa.gov/cts/) and the CTS User guide.