4.6 Annotation
The annotation consists of collecting MS peak lists and then formula and/or compound annotation:
Note that compound annotation is normally not dependent upon formula annotation. However, formula data can be used to improve ranking of candidates afterwards by the addFormulaScoring()
function, which will be discussed later in this section. Furthermore, suspect annotation is not mandatory, and may use data from peak lists, formulae and/or comounds.
4.6.1 MS peak lists
Algorithm | Usage | Remarks |
---|---|---|
mzR | generateMSPeakLists(algorithm = "mzr", ...) |
Uses mzR for spectra retrieval. Recommended default. |
DataAnalysis | generateMSPeakLists(algorithm = "bruker", ...) |
Loads data after automatically generating MS and MS/MS spectra in DataAnalysis |
DataAnalysis FMF | generateMSPeakLists(algorithm = "brukerfmf", ...) |
Uses spectra from the find molecular features algorithm. |
The recommended default algorithm is mzr
: this algorithm is generally faster and is not limited to a vendor data format as it will read the open mzML
and mzXML
file formats. On the other hand, when DataAnalysis is used with Bruker data the spectra can be automatically background subtracted and there is no need for file conversion. Note that the brukerfmf
algorithm only works when findFeatures()
was called with the bruker
algorithm.
When generateMSPeakists()
is called it will
- Find all MS and MS/MS spectra that ‘belong’ to a feature. For MS spectra this means that all spectra close to the retention time of a feature will be collected. In addition, for MS/MS normally only spectra will be considered that have a precursor mass close to that of the feature (however, this can be disabled for data that was recorded with data independent acquisition (DIA, MS^E, bbCID, …)).
- Average all MS and MS/MS spectra to produce peak lists for each feature.
- Average all peak lists for features within the same group.
Data from either (2) or (3) is used for subsequent annotation steps. Formula calculation can use either (as a trade-off between possibly more accurate results by outlier removal vs speed), whereas compound annotation will always use data from (3) since annotating single features (as opposed to their groups) would take a very long time.
There are several common function arguments to generateMSPeakLists()
that can be used to optimize its behaviour:
Argument | Algorithm(s) | Remarks |
---|---|---|
maxMSRtWindow |
mzr , bruker |
Maximum time window +/- the feature retention time (in seconds) to collect spectra for averaging. Higher values may significantly increase processing times. |
precursorMzWindow |
mzr |
Maximum precursor m/z search window to find MS/MS spectra. Set to NULL to disable (i.e. for DIA experiments). |
topMost |
mzr |
Only retain feature data for no more than this amount analyses with highest intensity. For instance, a value of 1 will only keep peak lists for the feature with highest intensity in a feature group. |
bgsubtr |
bruker |
Perform background subtraction (if the spectra type supports this, e.g. MS and bbCID) |
minMSIntensity , minMSMSIntensity |
bruker , brukerfmf |
Minimum MS and MS/MS intensity. Note that DataAnalysis reports many zero intensity peaks so a value of at least 1 is recommended. |
MSMSType |
bruker |
The type of spectra that should be used for MSMS: "BBCID" for bbCID experiments, otherwise "MSMS" (the default). |
In addition, several parameters can be set that affect spectral averaging. These parameters are passed as a list
to the avgFeatParams
(mzr
algorithm only) and avgFGroupParams
arguments, which affect averaging of feature and feature group data, respectively. Some typical parameters include:
clusterMzWindow
: Maximum m/z window used to cluster mass peaks when averaging. The better the MS resolution, the lower this value should be.topMost
: Retain no more than this amount of most intense mass peaks. Useful to filter out ‘noisy’ peaks.minIntensityPre
/minIntensityPost
: Mass peaks below this intensity will be removed before/after averaging.
See ?generateMSPeakLists
for all possible parameters.
A suitable list object to set averaging parameters can be obtained with the getDefAvgPListParams()
function.
4.6.2 Formulae
Formulae can be automatically calculated for all features using the generateFormulas()
function. The following algorithms are currently supported:
Algorithm | Usage | Remarks |
---|---|---|
GenForm | generateFormulas(algorithm = "genform", ...) |
Bundled with patRoon . Reasonable default. |
SIRIUS | generateFormulas(algorithm = "sirius", ...) |
Requires MS/MS data. |
DataAnalysis | generateFormulas(algorithm = "bruker", ...) |
Requires FMF features (i.e. findFeatures(algorithm = "bruker", ...) ). Uses SmartFormula algorithms. |
Calculation with GenForm is often a good default. It is fast and basic rules can be applied to filter out obvious non-existing formulae. A possible drawback of GenForm, however, is that may become slow when many candidates are calculated, for instance, due to a relative high feature m/z (e.g. >600) or loose elemental restricitions. More thorough calculation is performed with SIRIUS: this algorithm often yields fewer and often more plausible results. However, SIRIUS requires MS/MS data (hence features without will not have results) and formula prediction may not work well for compounds that structurally deviate from the training sets used by SIRIUS. Calculation with DataAnalysis is only possible when features are obtained with DataAnalysis as well. An advantage is that analysis files do not have to be converted, however, compared to other algorithms calculation is often relative slow.
There are two methods for formula assignment:
- Formulae are first calculated for each individual feature within a feature group. These results are then pooled, outliers are removed and remaining formulae are assigned to the feature group (i.e.
calculateFeatures = TRUE
). - Formulae are directly calculated for each feature group by using group averaged peak lists (see previous section) (i.e.
calculateFeatures = FALSE
).
The first method is more thorough and the possibility to remove outliers may sometimes result in better formula assignment. However, the second method is much faster and generally recommended for large number of analyses.
By default, formulae are either calculated by only MS/MS data (SIRIUS) or with both MS and MS/MS data (GenForm/Bruker). The latter also allows formula calculation when no MS/MS data is present. Furthermore, with Bruker algorithms, data from both MS and MS/MS formula data can be combined to allow inclusion of candidates that would otherwise be excluded by e.g. poor MS/MS data. However, a disadvantage is that formulae needs to be calculated twice. The MSMode
argument (listed below) can be used to customize this behaviour.
An overview of common parameters that are typically set to customize formula calculation is listed below.
Argument | Algorithm(s) | Remarks |
---|---|---|
relMzDev | genform , sirius |
The maximum relative m/z deviation for a formula to be considered (in ppm). |
elements | genform , sirius |
Which elements to consider. By default "CHNOP" . Try to limit possible elements as much as possible. |
calculateFeatures | genform , sirius |
Whether formulae should be calculated first for all features (see discussion above) (always TRUE with DataAnalysis). |
featThresholdAnn | All | Minimum relative amount (0-1) that a candidate formula for a feature group should be found among all annotated features (e.g. 1 means that a candidate is only considered if it was assigned to all annotated features). |
adduct | All | The adduct to consider for calculation (e.g. "[M+H]+" , "[M-H]-" , more details in the adduct section). Don’t set this when adduct annotations are available. |
MSMode | genform , bruker |
Whether formulae should be generated only from MS data ("ms" ), MS/MS data ("msms" ) or both ("both" ). The latter is default, see discussion above. |
profile | sirius |
Instrument profile, e.g. "qtof" , "orbitrap" , "fticr" . |
Some typical examples:
formulasGF <- generateFormulas(fGroups, mslists, "genform") # GenForm, default settings
formulasGF2 <- generateFormulas(fGroups, mslists, "genform", calculateFeatures = FALSE) # direct feature group assignment (faster)
formulasSIR <- generateFormulas(fGroups, mslists, "sirius", elements = "CHNOPSClBr") # SIRIUS, common elements for pollutant
formulasSIR2 <- generateFormulas(fGroups, mslists, "sirius", adduct = "[M-H]-") # SIRIUS, negative ionization
formulasBr <- generateFormulas(fGroups, mslists, "bruker", MSMode = "MSMS") # Only consider MSMS data (SmartFormula3D)
4.6.3 Compounds
An important step in a typical non-target workflow is structural identification for features of interest, as this information may finally reveal what a feature is. In a first step all possible candidate structures for a feature are obtained from a database (based on e.g. monoisotopic mass or formula). These candidates are then ranked, for instance, by matching the feature MS/MS data with in-silico or library MS/MS spectra or its relevance to the environment.
Structure assignment in patRoon
is performed automatically for all feature groups with the generateCompounds()
function. Currently, this function supports the following algorithms:
Algorithm | Usage | Remarks |
---|---|---|
MetFrag | generateCompounds(algorithm = "metfrag", ...) |
Supports many databases (including offline and custom), matching MS/MS data with in-silico and library MS/MS data, and many other scorings to rank candidates. |
SIRIUS with CSI:FingerID | generateCompounds(algorithm = "sirius", ...) |
Matches with in-silico MS/MS data, incorporates formula annotations to improve candidate selection. |
Library | generateCompounds(algorithm = "library", ...) |
Obtains candidates by matching MS/MS data with an offline MS library, e.g. obtained from MassBank.eu or MoNA. |
All algorithms rank their candidates by matching MS/MS data with in-silico generated MS/MS data (MetFrag and SIRIUS) and/or experimental MS/MS data from an MS library (MetFrag with MoNA scoring and Library algorithm). The latter may yield better candidates, and the Library algorithm is also generally much faster. However, in-silico annotation is not limited by the availability of experimental MS/MS data.
Compound annotation is often a relative time and resource intensive procedure. For this reason, annotation occurs for each feature group and not individual features. Nevertheless, it is not uncommon that this is the most time consuming step in the workflow. For this reason, prioritization of features is highly important, even more so to avoid ‘abusing’ servers when an online database is used for compound retrieval.
4.6.3.1 Database selection for MetFrag and SIRIUS
Selecting the right database is important for proper candidate assignment. If the ‘right’ chemical compound is not present in the used database, it is impossible to assign the correct structure. Luckily, however, several large databases such as PubChem and ChemSpider are openly available which contain tens of millions of compounds. On the other hand, these databases may also lead to many unlikely candidates and therefore more specialized (or custom databases) may be preferred. Which database will be used is dictated by the database
argument to generateCompounds()
, currently the following options exist:
Database | Algorithm(s) | Remarks |
---|---|---|
pubchem |
"metfrag" , "sirius" |
PubChem is currently the largest compound database and is used by default. |
chemspider |
"metfrag" |
ChemSpider is another large database. Requires security token from here (see next section). |
comptox |
"metfrag" |
The EPA CompTox contains many compounds and scorings relevant to environmental studies. Needs manual download (see next section). |
pubchemlite |
"metfrag" |
A specialized subset of the PubChem database. Needs manual download (see next section). |
for-ident |
"metfrag" |
The FOR-IDENT (STOFF-IDENT) database for water related substances. |
kegg |
"metfrag" , "sirius" |
The KEGG database for biological compounds |
hmdb |
"metfrag" , "sirius" |
The HMDB contains many human metabolites. |
bio |
"sirius" |
Selects all supports biological databases. |
csv , psv , sdf |
"metfrag" |
Custom database (see next section). CSV example. |
4.6.3.2 Configuring MetFrag databases and scoring
Some extra configuration may be necessary when using certain databases with MetFrag. In order to use the ChemSpider database a security token should be requested and set with the chemSpiderToken
argument to generateCompounds()
. The CompTox and PubChemLite databases need to be manually downloaded from CompTox (or variations with smoking or wastewater metadata) and PubChemLite (or the PubChem derived OECD PFAS database). The file location of this and other local databases (csv
, psv
, sdf
) needs to be manually configured, see the examples below and/or ?generateCompounds
for more information on how to do this.
# PubChem: the default
compsMF <- generateCompounds(fGroups, mslists, "metfrag", adduct = "[M+H]+")
# ChemSpider: needs security token
compsMF2 <- generateCompounds(fGroups, mslists, "metfrag", database = "chemspider",
chemSpiderToken = "MY_TOKEN_HERE", adduct = "[M+H]+")
# CompTox: set global option to database path
options(patRoon.path.MetFragCompTox = "~/CompTox_17March2019_SelectMetaData.csv")
compsMF3 <- generateCompounds(fGroups, mslists, "metfrag", database = "comptox", adduct = "[M+H]+")
# CompTox: set database location without global option
compsMF4 <- generateCompounds(fGroups, mslists, "metfrag", database = "comptox", adduct = "[M+H]+",
extraOpts = list(LocalDatabasePath = "~/CompTox_17March2019_SelectMetaData.csv"))
# Same, but for custom database
compsMF5 <- generateCompounds(fGroups, mslists, "metfrag", database = "csv", adduct = "[M+H]+",
extraOpts = list(LocalDatabasePath = "~/mydb.csv"))
An example of a custom .csv database can be found here.
With MetFrag compound databases are not only used to retrieve candidate structures but are also used to obtain metadata for further ranking. Each database has its own scorings, a table with currently supported scorings can be obtained with the compoundScorings()
function (some columns omitted):
name | metfrag | database | default |
---|---|---|---|
score | Score | TRUE | |
fragScore | FragmenterScore | TRUE | |
metFusionScore | OfflineMetFusionScore | TRUE | |
individualMoNAScore | OfflineIndividualMoNAScore | TRUE | |
numberPatents | PubChemNumberPatents | pubchem | TRUE |
numberPatents | Patent_Count | pubchemlite | TRUE |
pubMedReferences | PubChemNumberPubMedReferences | pubchem | TRUE |
pubMedReferences | ChemSpiderNumberPubMedReferences | chemspider | TRUE |
pubMedReferences | NUMBER_OF_PUBMED_ARTICLES | comptox | TRUE |
pubMedReferences | PubMed_Count | pubchemlite | TRUE |
extReferenceCount | ChemSpiderNumberExternalReferences | chemspider | TRUE |
dataSourceCount | ChemSpiderDataSourceCount | chemspider | TRUE |
referenceCount | ChemSpiderReferenceCount | chemspider | TRUE |
RSCCount | ChemSpiderRSCCount | chemspider | TRUE |
formulaScore | FALSE | ||
RF_SMILES | FALSE | ||
RF_SIRFP | FALSE | ||
LC50_SMILES | FALSE | ||
LC50_SIRFP | FALSE | ||
smartsInclusionScore | SmartsSubstructureInclusionScore | FALSE | |
smartsExclusionScore | SmartsSubstructureExclusionScore | FALSE | |
suspectListScore | SuspectListScore | FALSE | |
retentionTimeScore | RetentionTimeScore | FALSE | |
CPDATCount | CPDAT_COUNT | comptox | TRUE |
TOXCASTActive | TOXCAST_PERCENT_ACTIVE | comptox | TRUE |
dataSources | DATA_SOURCES | comptox | TRUE |
pubChemDataSources | PUBCHEM_DATA_SOURCES | comptox | TRUE |
EXPOCASTPredExpo | EXPOCAST_MEDIAN_EXPOSURE_PREDICTION_MG/KG-BW/DAY | comptox | TRUE |
ECOTOX | ECOTOX | comptox | TRUE |
NORMANSUSDAT | NORMANSUSDAT | comptox | TRUE |
MASSBANKEU | MASSBANKEU | comptox | TRUE |
TOX21SL | TOX21SL | comptox | TRUE |
TOXCAST | TOXCAST | comptox | TRUE |
KEMIMARKET | KEMIMARKET | comptox | TRUE |
MZCLOUD | MZCLOUD | comptox | TRUE |
pubMedNeuro | PubMedNeuro | comptox | TRUE |
CIGARETTES | CIGARETTES | comptox | TRUE |
INDOORCT16 | INDOORCT16 | comptox | TRUE |
SRM2585DUST | SRM2585DUST | comptox | TRUE |
SLTCHEMDB | SLTCHEMDB | comptox | TRUE |
THSMOKE | THSMOKE | comptox | TRUE |
ITNANTIBIOTIC | ITNANTIBIOTIC | comptox | TRUE |
STOFFIDENT | STOFFIDENT | comptox | TRUE |
KEMIMARKET_EXPO | KEMIMARKET_EXPO | comptox | TRUE |
KEMIMARKET_HAZ | KEMIMARKET_HAZ | comptox | TRUE |
REACH2017 | REACH2017 | comptox | TRUE |
KEMIWW_WDUIndex | KEMIWW_WDUIndex | comptox | TRUE |
KEMIWW_StpSE | KEMIWW_StpSE | comptox | TRUE |
KEMIWW_SEHitsOverDL | KEMIWW_SEHitsOverDL | comptox | TRUE |
ZINC15PHARMA | ZINC15PHARMA | comptox | TRUE |
PFASMASTER | PFASMASTER | comptox | TRUE |
peakFingerprintScore | AutomatedPeakFingerprintAnnotationScore | FALSE | |
lossFingerprintScore | AutomatedLossFingerprintAnnotationScore | FALSE | |
agroChemInfo | AgroChemInfo | pubchemlite | FALSE |
bioPathway | BioPathway | pubchemlite | FALSE |
drugMedicInfo | DrugMedicInfo | pubchemlite | FALSE |
foodRelated | FoodRelated | pubchemlite | FALSE |
pharmacoInfo | PharmacoInfo | pubchemlite | FALSE |
safetyInfo | SafetyInfo | pubchemlite | FALSE |
toxicityInfo | ToxicityInfo | pubchemlite | FALSE |
knownUse | KnownUse | pubchemlite | FALSE |
disorderDisease | DisorderDisease | pubchemlite | FALSE |
identification | Identification | pubchemlite | FALSE |
annoTypeCount | FPSum | pubchemlite | TRUE |
annoTypeCount | AnnoTypeCount | pubchemlite | TRUE |
annotHitCount | AnnotHitCount | pubchemlite | TRUE |
libMatch | TRUE |
The first two columns contain the generic and original MetFrag naming schemes for each scoring type. While both naming schemes can be used, the generic is often shorter and harmonized with other algorithms (e.g. SIRIUS). The database column specifies for which databases a particular scoring is available (empty if not database specific). Most scorings are selected by default (as specified by the default column), however, this behaviour can be customized by using the scoreTypes
argument:
# Only in-silico and PubChem number of patents scorings
compsMF1 <- generateCompounds(fGroups, mslists, "metfrag", adduct = "[M+H]+",
scoreTypes = c("fragScore" "numberPatents"))
# Custom scoring in custom database
compsMF2 <- generateCompounds(fGroups, mslists, "metfrag", adduct = "[M+H]+",
database = "csv",
extraOpts = list(LocalDatabasePath = "~/mydb.csv"),
scoreTypes = c("fragScore", "myScore", "myScore2"))
By default ranking is performed with equal weight (i.e. 1) for all scorings. This can be changed by the scoreWeights
argument, which should be a vector
containing the weights for all scorings following the order of scoreTypes
, for instance:
compsMF <- generateCompounds(fGroups, mslists, "metfrag", adduct = "[M+H]+",
scoreTypes = c("fragScore" "numberPatents"),
scoreWeights = c(1, 2))
Sometimes thousands or more structural candidates are found when annotating a feature group. In this situation processing all these candidates will too involving (especially when external databases are used). To avoid this a default cut-off is set: when the number of candidates exceed a certain amount the search will be aborted and no results will be reported for that feature group. The maximum number of candidates can be set with the maxCandidatesToStop
argument. The default value is relative conservative, especially for local databases it may be useful to increase this number.
4.6.3.3 MetFrag error and timeout handling
The use of online databases has the drawback that an error may occur, for instance, as a result of a connection error or when the aforementioned maximum number of candidates is reached (maxCandidatesToStop
argument). By default, the processing is restarted if an error has occurred (configured by the errorRetries
argument). Similarly, the timeoutRetries
and timeout
arguments can be used to avoid being ‘stuck’ on obtaining results, for instance, due to an unstable internet connection. If no compounds could be assigned due to an error a warning will be issued. In this case it is best to see what went wrong by manually checking the log files, which by default are stored in the log/metfrag folder.
4.6.3.4 Annotation with the Library algorithm
To use the Library algorithm we first need to load an MS library. Currently, MS libraries in the MSP and MoNA JSON formats are supported. Note that the former format is not so well standardized, and the support in patRoon
was mainly tailored for MSP files from MassBank.eu and MoNA. To load the MS library the loadMSLibrary()
function is used:
mslibrary <- loadMSLibrary("~/MassBank_NIST.msp", "msp") # MassBank.eu MSP library
mslibrary <- loadMSLibrary("~/MoNA-export-CASMI_2016.msp", "msp") # MoNA MSP library
mslibrary <- loadMSLibrary("~/MoNA-export-MassBank.json", "json") # MoNA JSON library
NOTE Currently it is only possible to load formula annotated MS/MS peaks with the MoNA JSON format.
Once loaded, the MS library can be post-processed with various filtering, subsetting and export functionality, which may be useful for more tailored compound annotation. This is further discussed in the advanced chapter.
The compound annotation is performed with generateCompounds()
:
4.6.3.5 Formula scoring
Ranking of candidate structures may further be improved by incorporating formula information by using the addFormulaScoring()
function:
Here, corresponding formula and explained fragments will be used to calculate a formulaScore for each candidate. Note that SIRIUS candidates are already based on calculated formulae, hence, running this function on SIRIUS results is less sensible unless scoring from another formula calculation algorithm is desired.
4.6.4 Suspect annotation
The data obtained during the previously described annotation steps can be used to improve a suspect screening workflow. The annotateSuspects()
method uses the annotation data to calculate various annotation properties for each suspect, such as their rank in formula/compound candidates, which fragments from the suspect list were matched, and a rough indication of the identification level according to Schymanski et al. (2014)
fGroupsSusp <- annotateSuspects(fGroupsSusp, MSPeakLists = mslists,
formulas = formulas, compounds = compounds)
#> Annotating 5 suspects...
#> ================================================================================
The calculation of identification levels is performed by a set of pre-defined rules. The genIDLevelRulesFile()
can be used to inspect the default rules or to create your own rules file, which can subsequently passed to annotateSuspects()
with the IDFile
argument. See ?annotateSuspects
for more details on the file format and options. The default identification levels can be summarized as follows:
Level | Description | Rules |
---|---|---|
1 | Target match | Retention time deviates <12 seconds from suspect list. At least 3 (or all if the suspect list contains less) fragments from the suspect list must match. |
2a | Good MS/MS library match | Suspect is top ranked in the compounds results. The individualMoNAScore (MetFrag) or libMatch (Library algorithm) is at least 0.9 and no other candidates were matched with the MS library. |
3a | Fair library match | The individualMoNAScore or libMatch is at least 0.4. |
3b | Known MS/MS match | At least 3 (or all if the suspect list contains less) fragments from the suspect list must match. |
3c | Good in-silico MS/MS match | The annotation MS/MS similarity (annSimComp column) is at least 0.7 . |
4a | Good formula MS/MS match | Suspect is top ranked formula candidate, annotation MS/MS similarity (annSimForm column) is at least 0.7 and isotopic match (isoScore ) of at least 0.5. The latter two scores are at least 0.2 higher than next best ranked candidate. |
4b | Good formula isotopic pattern match | Suspect is top ranked formula candidate and isotopic match (isoScore ) of at least 0.9 and at least 0.2 higher than next best ranked candidate. |
5 | Unknown | All else. |
In general, the more data provided by the suspect list and to annotateSuspects()
, the better identification level estimation works. For instance, when considering the default rules, either the fragments_mz
or fragments_formula
column is necessary to be able assign a level 3b
. Similarly, the suspect list needs retention times (as well as fragment data) to be able to assign level 1
. As you can imagine, providing the annotation workflow objects (i.e. MSPeakLists
, formulas
, compounds
) to annotateSuspects()
is necessary for calculation of most levels.
The annotateSuspects()
function will log decisions for identification level assignments to the log/
sub-directory in the current working directory. This is useful to inspect level assignments and especially useful when you customized any rules.
NOTE: The current identification level rules are only optimized for GenForm and MetFrag annotation algorithms.
4.6.5 Account login for SIRIUS
Recent version of SIRIUS require an active account login to make queries to CSI:FingerID. This is primarily relevant when performing a compound annotation workflow with SIRIUS
or a formula annotation workflow with getFingerprints=TRUE
, e.g. when predicting toxicities or concentrations.
As a first step, please create an account as described in the SIRIUS documentation: https://v6.docs.sirius-ms.io/account-and-license/.
Then, to login there are two options:
- Manually login: either by using the SIRIUS GUI or the CLI. For the latter, see e.g.
sirius.exe login --help
for more details. - Let
patRoon
automatically handle logins.
The login
parameter for generateCompounds()
and generateFormulas()
determines how logins are dealt with by patRoon
. There are four options:
login=FALSE
: no logins are performed and no checks are performed to verify if there is an existing login.login="check"
: no logins are performed, but an active login is required to proceed.login="interactive"
: if no active login is present, then the username and password will be asked interactively and used to automatically login.login=c(username="...", password="...")
: if no active login is present, then the provided username and password will be used to automatically login.
NOTE: For the fourth option, please don’t provide the login details directly as plain-text for security reasons. See below for proper alternatives.
The first two options are primarily meant for manual login. The function parameter alwaysLogin=TRUE
can be set to force a login for the third and fourth options.
The fourth option is primarily useful for e.g. heavy users of SIRIUS or unattended automatic workflows. To securely provide the login details, it is best to store them elsewhere. This webpage provides a detailed overview of how credentials can be safely stored. For instance, you can save the credentials in your .Renviron
file and retreieve them when calling generateCompounds()
:
In your .Renviron
file add:
and then in your R
script:
compounds <- generateCompounds(..., login = c(username = Sys.getenv("SIRIUS_USERNAME"),
password = Sys.getenv("SIRIUS_PASSWORD")))
Alternatively, you could use the keyring package, e.g.
install.packages("keyring") # execute in case you don't have keyring installed yet
keyring::key_set("SIRIUS", username = "myaccount@email.com") # execute this once to store the password
compounds <- generateCompounds(..., login = c(username = "myaccount@email.com",
password = keyring::key_get("SIRIUS", "myaccount@email.com"))))