Uses BioTransformer to predict TPs
generateTPsBioTransformer(
parents,
type = "env",
generations = 2,
maxExpGenerations = generations + 2,
extraOpts = NULL,
skipInvalid = TRUE,
prefCalcChemProps = TRUE,
neutralChemProps = FALSE,
neutralizeTPs = TRUE,
calcSims = FALSE,
fpType = "extended",
fpSimMethod = "tanimoto",
MP = FALSE
)
parents: The parents for which transformation products should be obtained. This can be (1) a suspect list (see suspect screening for more information), (2) the resulting output of screenSuspects or (3) a compounds annotation object. In the former two cases the suspects (hits) are used as parents, whereas in the latter case all candidates are used as parents.
type: The type of prediction. Valid values are: "env", "ecbased", "cyp450", "phaseII", "hgut", "superbio" and "allHuman". Sets the -b command line option.
generations: The number of generations (steps) for the predictions. Sets the -s command line option. More generations may be reported, see the Hierarchy expansion section below.
maxExpGenerations: The maximum number of generations during hierarchy expansion, see below.
extraOpts: A character vector with extra command line options passed to the biotransformer.jar tool.
skipInvalid: If set to TRUE, parents for which insufficient information (e.g. SMILES) is available are skipped with a warning.
prefCalcChemProps: If TRUE, calculated chemical properties such as the formula and InChIKey are preferred over what is already present in the parent suspect list. For efficiency reasons it is recommended to set this to TRUE. See the Validating and calculating chemical properties section for more details.
neutralChemProps: If TRUE, the neutral form of the molecule is used to calculate SMILES, formulae etc. Enabling this may improve feature matching when considering common adducts (e.g. [M+H]+, [M-H]-). See the Validating and calculating chemical properties section for more details.
neutralizeTPs: If TRUE, all resulting TP structure information is neutralized. This argument has a similar meaning as neutralChemProps. It defaults to TRUE for prediction algorithms, as these may output charged molecules. NOTE: if neutralization results in duplicate TPs, i.e. when the neutral form of the TP was also generated by the algorithm, the neutralized TP is removed.
calcSims: If set to TRUE, structural similarities between the parent and its TPs are calculated. A minimum similarity can then be enforced with the filter method. This may be useful under the assumption that parents and TPs with a high structural similarity also likely have a high MS/MS spectral similarity (which can be evaluated after componentization with generateComponentsTPs).
fpType: The type of structural fingerprint that should be calculated. See the type argument of the get.fingerprint function of rcdk.
fpSimMethod: The method for calculating similarities (i.e. not dissimilarities!). See the method argument of the fp.sim.matrix function of the fingerprint package.
MP: If TRUE, multiprocessing is enabled. Since BioTransformer supports native parallelization, additional multiprocessing generally does not significantly reduce computational times. Furthermore, enabling multiprocessing can lead to very high CPU/RAM usage.
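The default fpSimMethod="tanimoto" refers to the Tanimoto coefficient. As a rough illustration of what this similarity value expresses, the base R sketch below computes it for two made-up 8-bit fingerprints stored as logical vectors. In patRoon the actual calculation is done by the fingerprint package on rcdk fingerprints, so the function tanimoto here is a hypothetical stand-in and not what patRoon calls.

```r
# Illustrative Tanimoto similarity of two binary fingerprints stored as
# logical vectors: bits set in both divided by bits set in either.
# Hypothetical helper, not the fingerprint package implementation.
tanimoto <- function(fp1, fp2) {
  stopifnot(is.logical(fp1), is.logical(fp2), length(fp1) == length(fp2))
  shared <- sum(fp1 & fp2)  # bits set in both fingerprints
  either <- sum(fp1 | fp2)  # bits set in at least one fingerprint
  # Undefined for two empty fingerprints; returning NA is an arbitrary choice.
  if (either == 0) return(NA_real_)
  shared / either
}

# Made-up fingerprints for a parent and one of its TPs
parentFP <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)
tpFP     <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)
tanimoto(parentFP, tpFP)
```

Here three bits are shared and five bits are set in at least one fingerprint, giving a similarity of 0.6. Identical fingerprints give 1, and fingerprints without common bits give 0.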
The TPs are stored in an object derived from the transformationProductsStructure class.
This function uses BioTransformer to obtain transformation products. It is called when generateTPs is called with algorithm="biotransformer".
In order to use this function, the BioTransformer .jar command line utility should be installed and its location specified in the patRoon.path.BioTransformer option. The .jar file can be obtained via https://bitbucket.org/djoumbou/biotransformer/src/master. Alternatively, the patRoonExt package can be installed to automatically install and configure the necessary files.
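A minimal configuration sketch is shown below. The path is a placeholder for wherever the .jar file was saved. The helper sketchBTArgs is hypothetical (not part of patRoon) and only illustrates how the documented arguments map onto the command line: type sets -b, generations sets -s and extraOpts is appended as-is. Any other options patRoon passes to BioTransformer (e.g. for input and output files) are deliberately left out.

```r
# Point patRoon to the BioTransformer .jar file (placeholder path).
options(patRoon.path.BioTransformer = "/path/to/biotransformer.jar")

# Hypothetical helper: arguments that would follow the java executable.
# Only -b (type) and -s (generations) are taken from this help page;
# extraOpts is appended verbatim.
sketchBTArgs <- function(type = "env", generations = 2, extraOpts = NULL) {
  validTypes <- c("env", "ecbased", "cyp450", "phaseII", "hgut",
                  "superbio", "allHuman")
  type <- match.arg(type, validTypes)
  c("-jar", getOption("patRoon.path.BioTransformer"),
    "-b", type,
    "-s", as.character(generations),
    extraOpts)
}

sketchBTArgs("cyp450", generations = 1)
```

With the option set, TPs would then be obtained with a call such as generateTPs(algorithm = "biotransformer", parents = suspects, type = "env", generations = 2), where suspects is a suspect list containing SMILES or InChI data.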
An important advantage of this algorithm is that it provides structural information for generated TPs. However, this also means that if the input is from a parent suspect list or screening then either SMILES or InChI information must be available for the parents.
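To illustrate what this means for skipInvalid=TRUE, the sketch below drops parents that have neither SMILES nor InChI data and emits a warning. It only checks whether a value is present, whereas patRoon also validates the structure data (see the Validating and calculating chemical properties section). The function skipParentsWithoutStructure and the columns name, SMILES and InChI are assumptions made for this example.

```r
# Hypothetical sketch of skipping parents without structure information.
# A value counts as present if it is neither NA nor an empty string.
skipParentsWithoutStructure <- function(parents) {
  hasValue <- function(x) !is.na(x) & nzchar(x)
  ok <- hasValue(parents$SMILES) | hasValue(parents$InChI)
  if (any(!ok))
    warning("Skipping parents without SMILES/InChI: ",
            paste(parents$name[!ok], collapse = ", "))
  parents[ok, , drop = FALSE]
}

parents <- data.frame(
  name   = c("parentA", "parentB", "parentC"),
  SMILES = c("CCO", NA, ""),
  InChI  = c(NA, "InChI=1S/CH4O/c1-2/h2H,1H3", NA),
  stringsAsFactors = FALSE
)
valid <- suppressWarnings(skipParentsWithoutStructure(parents))
valid$name
```

Here parentC is skipped, since it only has an empty SMILES value and no InChI.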
When the parents argument is a compounds object, the candidate library identifier is used if the candidate has no defined compoundName.
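A minimal sketch of this naming rule, assuming the candidates are available as a data frame with compoundName and identifier columns. The function candidateParentNames and the data are made up for illustration.

```r
# Hypothetical sketch: use compoundName when defined, otherwise fall back
# to the library identifier of the candidate.
candidateParentNames <- function(candidates) {
  noName <- is.na(candidates$compoundName) | !nzchar(candidates$compoundName)
  ifelse(noName, candidates$identifier, candidates$compoundName)
}

cands <- data.frame(
  compoundName = c("caffeine", NA, ""),
  identifier   = c("ID1", "ID2", "ID3"),
  stringsAsFactors = FALSE
)
candidateParentNames(cands)
```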
Hierarchy expansion

BioTransformer only reports the direct parent for a TP, not the complete pathway. For instance, consider the following results:

parent -> TP1
parent -> TP2
TP1 -> TP2
TP2 -> TP3

In this case, TP3 may be formed either as:

parent -> TP1 -> TP2 -> TP3
parent -> TP2 -> TP3
For this reason, patRoon simply expands the hierarchy and assumes that all routes are possible. For instance:

            Parent
           /      \
        TP1        TP2
         |          |
        TP2        TP3
         |
        TP3
Note that this may result in pathways with more generations than defined by the generations argument. Thus, the maxExpGenerations argument is used to avoid excessive expansions.
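The expansion can be illustrated with a small base R sketch. It assumes the BioTransformer results are available as a data frame of direct parent -> TP links; the columns from and to and the function expandPathways are hypothetical. Every route starting at the parent is followed until no further TPs are found or maxExpGenerations generations are reached. This is a sketch of the idea, not patRoon's own implementation.

```r
# Hypothetical sketch of hierarchy expansion: enumerate all routes from
# 'root' through direct parent -> TP links, limited to maxExpGenerations.
expandPathways <- function(links, root, maxExpGenerations) {
  paths <- list()
  walk <- function(path) {
    node <- path[length(path)]
    children <- links$to[links$from == node]
    children <- setdiff(children, path)  # guard against cycles
    gen <- length(path) - 1              # generations reached so far
    if (length(children) == 0 || gen >= maxExpGenerations) {
      paths[[length(paths) + 1]] <<- path  # record a completed route
      return(invisible(NULL))
    }
    for (ch in children)
      walk(c(path, ch))
  }
  walk(root)
  paths
}

# The example results from above
links <- data.frame(
  from = c("parent", "parent", "TP1", "TP2"),
  to   = c("TP1",    "TP2",    "TP2", "TP3"),
  stringsAsFactors = FALSE
)
showPaths <- function(p) vapply(p, paste, character(1), collapse = " -> ")
showPaths(expandPathways(links, "parent", maxExpGenerations = 4))
showPaths(expandPathways(links, "parent", maxExpGenerations = 2))
```

With a limit of 4 (the default maxExpGenerations when generations = 2) both routes to TP3 are found, whereas a limit of 2 cuts the first route off after TP2.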
Validating and calculating chemical properties

Chemical properties such as SMILES, InChIKey and formula in the parent suspect list are automatically validated, and calculated if missing or invalid. The internal validation/calculation process performs the following steps:
1. Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid entries will be set to NA.
2. If neutralChemProps=TRUE, chemical data (SMILES, formulae etc.) is neutralized by (de-)protonation (using the --neutralized option of OpenBabel). An additional column molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires either SMILES or InChI data to be available.
3. The SMILES and InChI data are used to calculate missing or invalid SMILES, InChI, InChIKey and formula data. If prefCalcChemProps=TRUE, existing InChIKey and formula data is overwritten by calculated values whenever possible.
4. The chemical formulae which were not calculated are verified and normalized. This process may be time consuming, and can largely be avoided by setting prefCalcChemProps=TRUE.
5. Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible (prefCalcChemProps=TRUE).
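The effect of prefCalcChemProps in step 3 can be sketched for a single property such as the formula. The function mergeChemProp is hypothetical and assumes invalid values were already set to NA in step 1 and that NA in the calculated values means the calculation was not possible; it only illustrates the precedence rules described above, not patRoon's code.

```r
# Hypothetical sketch of combining existing and calculated property values.
mergeChemProp <- function(existing, calculated, prefCalcChemProps = TRUE) {
  if (prefCalcChemProps) {
    # overwrite existing values whenever a calculated value is available
    ifelse(!is.na(calculated), calculated, existing)
  } else {
    # keep existing values and only fill in those that are missing
    ifelse(!is.na(existing), existing, calculated)
  }
}

existing   <- c("C2H5OH", NA,          "CH4O")
calculated <- c("C2H6O",  "C8H10N4O2", NA)
mergeChemProp(existing, calculated, prefCalcChemProps = TRUE)
mergeChemProp(existing, calculated, prefCalcChemProps = FALSE)
```

With prefCalcChemProps=FALSE the non-normalized formula "C2H5OH" is kept, and would still have to be verified and normalized in step 4. This is the extra work that prefCalcChemProps=TRUE largely avoids.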
Note that calculation of formulae for isotopically labelled molecules is currently only supported for deuterium (2H).
This functionality relies heavily on OpenBabel; please make sure it is installed.
generateTPsBioTransformer uses multiprocessing to parallelize computations. Please see the parallelization section in the handbook for more details, and patRoon options for configuration options.
References: rcdk1; OBoyle2011patRoon; DjoumbouFeunang2019patRoon; Wicker2015patRoon
See also: generateTPs for more details and other algorithms.