Adds calculated mobility and/or CCS data to a suspect list.
# S4 method for class 'data.table'
assignMobilities(
obj,
from = NULL,
matchFromBy = "InChIKey1",
overwrite = FALSE,
adducts = c("[M+H]+", "[M-H]-", NA),
predictAdductOnly = TRUE,
CCSParams = NULL,
prepareChemProps = TRUE,
prefCalcChemProps = TRUE,
neutralChemProps = FALSE,
virtualenv = "patRoon-C3SDB"
)
# S4 method for class 'data.frame'
assignMobilities(obj, ...)
obj: The suspect list to which the mobility and/or CCS data should be added. Should be a data.frame or
data.table.
from: Specifies the source of the IMS data that is added to the suspect list. This can be one of the following:
"pubchemlite": CCS data is matched from predicted values of the
PubChemLite database. Note: this requires a local copy of the
CCS-amended PubChemLite database (see the Handbook for more details),
which is automatically installed by patRoonExt.
"c3sdb": Uses the C3SDB Python package to
predict CCS values. This requires a local installation of C3SDB, e.g. performed with
installC3SDB.
A data.table or data.frame to which IMS data is matched. Should contain the column defined by
matchFromBy and columns storing (non-)adduct specific mobility/CCS columns (see Details).
NULL: No IMS data is added to the suspect list.
Any NA values in from are ignored.
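As a minimal sketch of the table-based option, the example below matches a small in-house CCS table to a suspect list. The ccsLib table and its CCS numbers are placeholders invented for illustration (not measured or predicted values), and matching by name is just one of the matchFromBy options listed for this method. The call is guarded so that the snippet also runs when patRoon is not installed.

```r
# Hypothetical suspect list with a name and SMILES column.
suspects <- data.frame(
    name = c("caffeine", "atrazine"),
    SMILES = c("CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CCNC1=NC(=NC(=N1)Cl)NC(C)C"),
    stringsAsFactors = FALSE
)

# Hypothetical in-house library with adduct specific CCS values.
# The numbers are placeholders, NOT measured or predicted data.
ccsLib <- data.frame(
    name = c("caffeine", "atrazine"),
    "CCS_[M+H]+" = c(140, 150),
    check.names = FALSE, # keep the adduct in the column name as-is
    stringsAsFactors = FALSE
)

if (requireNamespace("patRoon", quietly = TRUE)) {
    # Match the library to the suspects by name; skip mobility <-> CCS conversion.
    suspectsIMS <- patRoon::assignMobilities(
        suspects,
        from = ccsLib,
        matchFromBy = "name",
        adducts = "[M+H]+",
        CCSParams = NULL
    )
}
```

Setting check.names = FALSE matters here: without it, data.frame() would rewrite the adduct specific column name and it would no longer follow the CCS_<adduct> pattern.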
matchFromBy: The column that is used to match the IMS data in from with the suspects. Valid options are
"InChIKey", "InChIKey1" (first block InChIKey), "InChI", "SMILES", "name".
However, this also depends on which columns are available in either of the data sources. InChIKey1 values
are automatically calculated from InChIKeys, if possible.
Matching by InChIKey1 is recommended by default to allow tolerance between different data sources. Note that
most compound annotation algorithms also match by InChIKey1 and current IMS resolving power is generally
insufficient to distinguish the different stereoisomers/tautomers specified by the full InChIKey.
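To illustrate what the first block represents, the sketch below derives it from full InChIKeys with base R. This only demonstrates the string relationship and is not how patRoon computes the values internally. The two example keys are those of L- and D-alanine, which differ only in stereochemistry.

```r
# L- and D-alanine: same connectivity, different stereochemistry.
inchikeys <- c(
    "QNAYBMKLOCPYGJ-REOHZSHNSA-N", # L-alanine
    "QNAYBMKLOCPYGJ-UWTATZPHSA-N"  # D-alanine
)

# Keep everything before the first hyphen (the 14 character first block).
inchikey1 <- sub("-.*$", "", inchikeys)
print(inchikey1)

# Both stereoisomers collapse to the same first block.
print(length(unique(inchikey1)) == 1)
```

Because both keys share the same first block, a table that lists only one stereoisomer would still match a suspect that specifies the other when matchFromBy = "InChIKey1".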
overwrite: Set to TRUE to overwrite any existing suspect IMS data with data from from.
adducts: A character vector with the adduct(s) to consider for assigning mobility data to suspects and mobility
<–> CCS conversions. This may be limited by what is available in the data source specified by
from (see C3SDBAdducts for from="c3sdb"). Inclusion of NA in adducts
allows the use of non-adduct specific values (see Details).
The value for adducts is automatically expanded by the adducts specified in the adduct column of the
suspect list. Hence, adducts can be empty (character()) if no calculations for other adducts are
desired.
predictAdductOnly: If from="c3sdb" and predictAdductOnly=TRUE then predictions are only performed
for the adduct specified in the adduct column of the suspect list (if present).
CCSParams: A list with parameters for mobility <–> CCS conversion. See
getCCSParams for details and for creating such parameter lists. Set to NULL to skip conversions.
prepareChemProps: Set to TRUE to perform chemical property calculation and validation on the suspect
list (described below).
prefCalcChemProps: If TRUE then calculated chemical properties such as the formula and
InChIKey are preferred over what is already present in the suspect data (if prepareChemProps=TRUE) and from data (if a table). For efficiency reasons it is
recommended to set this to TRUE. See the Validating and calculating chemical properties section for
more details.
neutralChemProps: If TRUE then the neutral form of the molecule is considered to calculate
SMILES, formulae etc. Enabling this may improve feature matching when considering common adducts
(e.g. [M+H]+, [M-H]-). See the Validating and calculating chemical properties section
for more details.
virtualenv: The virtual Python environment in which C3SDB is installed. This is passed to
reticulate::use_virtualenv. Set to NULL to skip this step and not set up
the environment.
...: Arguments passed to the data.table method.
The assignMobilities method for suspect lists is used to (1) add IMS data to suspects from predictions or
library data and (2) convert (previously added) mobility <–> CCS values. These steps are controlled by the
from and CCSParams arguments, respectively.
Mobility and CCS values assigned in the suspect list are either adduct specific or not. Adduct specific
values are preferred, as the 'correct' value can be automatically selected during suspect screening based on the
adduct assigned to the feature (or passed as the adduct argument to screenSuspects). The
non-adduct specific values are typically used when the corresponding adduct for the mobility/CCS value is
unknown (or not of interest). These values get precedence over adduct specific values. The adduct specific values are
stored in mobility_<adduct> and CCS_<adduct> columns, where <adduct> is the adduct name
(e.g. [M+H]+, [M-H]-). The mobility and CCS columns store any non-adduct
specific values. The adducts argument ultimately defines the use of adduct and non-adduct specific values.
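The column naming scheme can be sketched with a small helper. The function imsColumnNames is hypothetical (it is not part of patRoon) and only illustrates how adduct names, including NA for non-adduct specific values, map to column names.

```r
# Hypothetical helper: builds the column names described above.
# An NA adduct maps to the plain (non-adduct specific) column.
imsColumnNames <- function(adducts, type = c("mobility", "CCS")) {
    type <- match.arg(type)
    ifelse(is.na(adducts), type, paste0(type, "_", adducts))
}

adducts <- c("[M+H]+", "[M-H]-", NA)
print(imsColumnNames(adducts, "mobility"))
# "mobility_[M+H]+" "mobility_[M-H]-" "mobility"
print(imsColumnNames(adducts, "CCS"))
# "CCS_[M+H]+" "CCS_[M-H]-" "CCS"
```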
The mobility <–> CCS conversions occur both ways, i.e. missing CCS values will be
converted from mobility values and vice versa. If adduct specific values are converted then the charge value
used for these calculations is taken from the corresponding adduct. For non-adduct specific values the charge is
taken from the adduct specified in the suspect list if present, or from the default charge specified in CCSParams
otherwise.
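The two-way filling of missing values can be sketched as follows. The example assumes the textbook Mason-Schamp relation between reduced mobility (K0) and CCS. Which conversion patRoon actually applies is determined by the parameters from getCCSParams, so this is an illustration and not the package implementation. The function masonSchampFactor, the temperature and drift gas defaults and the numbers in the table are all assumptions made for this example. Instruments that report inverse reduced mobility (1/K0) would need their values inverted first.

```r
# Physical constants in SI units.
elemCharge <- 1.602176634e-19   # elementary charge (C)
boltzmann  <- 1.380649e-23      # Boltzmann constant (J/K)
daltonKg   <- 1.66053906660e-27 # kg per Da
loschmidt  <- 2.686780111e25    # gas number density at 273.15 K, 101.325 kPa (1/m^3)

# Factor f such that CCS [A^2] = f / K0 [cm^2/(V s)], and hence K0 = f / CCS.
# temperature (K) and gasMass (Da, N2 drift gas) are assumed example defaults.
masonSchampFactor <- function(mz, charge, temperature = 305, gasMass = 28.0134) {
    z <- abs(charge)
    ionMass <- mz * z # approximate ion mass in Da
    redMass <- ionMass * gasMass / (ionMass + gasMass) # reduced mass in Da
    # The 1e24 converts m^2 to A^2 (1e20) and K0 from cm^2 to m^2 (1e4).
    (3 / 16) * (z * elemCharge / loschmidt) *
        sqrt(2 * pi / (redMass * daltonKg * boltzmann * temperature)) * 1e24
}

# Placeholder values (not measured data): each row lacks one of the two values.
ims <- data.frame(mz = c(195.0877, 216.1011), K0 = c(1.45, NA), CCS = c(NA, 150))

# The charge is taken from the adduct, here 1 for [M+H]+.
f <- masonSchampFactor(ims$mz, charge = 1)

ims$CCS <- ifelse(is.na(ims$CCS), f / ims$K0, ims$CCS) # mobility -> CCS
ims$K0 <- ifelse(is.na(ims$K0), f / ims$CCS, ims$K0)   # CCS -> mobility
print(ims)
```

The charge enters the factor directly, which is why the adduct (or the default charge) must be known before a conversion can be performed in either direction.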
Chemical properties such as SMILES,
InChIKey and formulae in the suspect data (if prepareChemProps=TRUE) and from data (if a table) are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
1. Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid
entries will be set to NA.
2. If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by
(de-)protonation (using the --neutralized option of OpenBabel). An additional column
molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires
either SMILES or InChI data to be available.
3. The SMILES and InChI data are used to calculate missing or invalid SMILES,
InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing
InChIKey and formula data is overwritten by calculated values whenever possible.
4. The chemical formulae which were not calculated are verified and normalized. This process may be time
consuming, and can largely be avoided by setting prefCalcChemProps=TRUE.
5. Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible
(prefCalcChemProps=TRUE).
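The first step (invalid entries are set to NA) can be illustrated with a simple format check for InChIKeys. This is only a sketch of the idea: the helper hasInChIKeyFormat is hypothetical and tests the textual layout of a standard InChIKey (blocks of 14, 10 and 1 uppercase letters), whereas the validation in patRoon relies on OpenBabel and may apply different or stricter checks.

```r
# Hypothetical helper: TRUE if x looks like a full InChIKey (14-10-1 uppercase letters).
hasInChIKeyFormat <- function(x) {
    !is.na(x) & grepl("^[A-Z]{14}-[A-Z]{10}-[A-Z]$", x)
}

# Example data: one well-formed key, one truncated key and one missing key.
suspects <- data.frame(
    name = c("caffeine", "broken", "missing"),
    InChIKey = c("RYYVLZVUVIJVGH-UHFFFAOYSA-N", "RYYVLZVUVIJVGH", NA),
    stringsAsFactors = FALSE
)

# Entries that fail the check are set to NA, so they can be recalculated later.
suspects$InChIKey[!hasInChIKeyFormat(suspects$InChIKey)] <- NA
print(suspects)
```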
Note that formula calculation for isotopically labelled molecules is currently only supported for deuterium (2H).
This functionality relies heavily on OpenBabel; please make sure it is installed.
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
Schymanski EL, Kondić T, Neumann S, Thiessen PA, Zhang J, Bolton EE (2021).
“Empowering large chemical knowledge bases for exposomics: PubChemLite meets MetFrag.”
Journal of Cheminformatics, 13(1).
ISSN 1758-2946, doi:10.1186/s13321-021-00489-0, http://dx.doi.org/10.1186/s13321-021-00489-0.
Elapavalore A, Ross DH, Grouès V, Aurich D, Krinsky AM, Kim S, Thiessen PA, Zhang J, Dodds JN, Baker ES, Bolton EE, Xu L, Schymanski EL (2025).
“PubChemLite Plus Collision Cross Section (CCS) Values for Enhanced Interpretation of Nontarget Environmental Data.”
Environmental Science & Technology Letters, 12(2), 166–174.
ISSN 2328-8930, doi:10.1021/acs.estlett.4c01003, http://dx.doi.org/10.1021/acs.estlett.4c01003.
Ross DH, Cho JH, Xu L (2020).
“Breaking Down Structural Diversity for Comprehensive Prediction of Ion-Neutral Collision Cross Sections.”
Analytical Chemistry, 92(6), 4548–4557.
ISSN 1520-6882, doi:10.1021/acs.analchem.9b05772, http://dx.doi.org/10.1021/acs.analchem.9b05772.