4.6 Annotation

The annotation consists of collecting MS peak lists and then formula and/or compound annotation:

Note that compound annotation is normally not dependent upon formula annotation. However, formula data can be used to improve ranking of candidates afterwards by the addFormulaScoring() function, which will be discussed later in this section. Furthermore, suspect annotation is not mandatory, and may use data from peak lists, formulae and/or comounds.

4.6.1 MS peak lists

Algorithm	Usage	Remarks
mzR	`generateMSPeakLists(algorithm = "mzr", ...)`	Uses mzR for spectra retrieval. Recommended default.
DataAnalysis	`generateMSPeakLists(algorithm = "bruker", ...)`	Loads data after automatically generating MS and MS/MS spectra in DataAnalysis
DataAnalysis FMF	`generateMSPeakLists(algorithm = "brukerfmf", ...)`	Uses spectra from the find molecular features algorithm.

The recommended default algorithm is mzr: this algorithm is generally faster and is not limited to a vendor data format as it will read the open mzML and mzXML file formats. On the other hand, when DataAnalysis is used with Bruker data the spectra can be automatically background subtracted and there is no need for file conversion. Note that the brukerfmf algorithm only works when findFeatures() was called with the bruker algorithm.

When generateMSPeakists() is called it will

Find all MS and MS/MS spectra that ‘belong’ to a feature. For MS spectra this means that all spectra close to the retention time of a feature will be collected. In addition, for MS/MS normally only spectra will be considered that have a precursor mass close to that of the feature (however, this can be disabled for data that was recorded with data independent acquisition (DIA, MS^E, bbCID, …)).
Average all MS and MS/MS spectra to produce peak lists for each feature.
Average all peak lists for features within the same group.

Data from either (2) or (3) is used for subsequent annotation steps. Formula calculation can use either (as a trade-off between possibly more accurate results by outlier removal vs speed), whereas compound annotation will always use data from (3) since annotating single features (as opposed to their groups) would take a very long time.

There are several common function arguments to generateMSPeakLists() that can be used to optimize its behaviour:

Argument	Algorithm(s)	Remarks
`maxMSRtWindow`	`mzr`, `bruker`	Maximum time window +/- the feature retention time (in seconds) to collect spectra for averaging. Higher values may significantly increase processing times.
`precursorMzWindow`	`mzr`	Maximum precursor m/z search window to find MS/MS spectra. Set to `NULL` to disable (i.e. for DIA experiments).
`topMost`	`mzr`	Only retain feature data for no more than this amount analyses with highest intensity. For instance, a value of 1 will only keep peak lists for the feature with highest intensity in a feature group.
`bgsubtr`	`bruker`	Perform background subtraction (if the spectra type supports this, e.g. MS and bbCID)
`minMSIntensity`, `minMSMSIntensity`	`bruker`, `brukerfmf`	Minimum MS and MS/MS intensity. Note that DataAnalysis reports many zero intensity peaks so a value of at least 1 is recommended.
`MSMSType`	`bruker`	The type of spectra that should be used for MSMS: `"BBCID"` for bbCID experiments, otherwise `"MSMS"` (the default).

In addition, several parameters can be set that affect spectral averaging. These parameters are passed as a list to the avgFeatParams (mzr algorithm only) and avgFGroupParams arguments, which affect averaging of feature and feature group data, respectively. Some typical parameters include:

clusterMzWindow: Maximum m/z window used to cluster mass peaks when averaging. The better the MS resolution, the lower this value should be.
topMost: Retain no more than this amount of most intense mass peaks. Useful to filter out ‘noisy’ peaks.
minIntensityPre / minIntensityPost: Mass peaks below this intensity will be removed before/after averaging.

See ?generateMSPeakLists for all possible parameters.

A suitable list object to set averaging parameters can be obtained with the getDefAvgPListParams() function.

# lower default clustering window, other settings remain default
avgPListParams <- getDefAvgPListParams(clusterMzWindow = 0.001)

# Apply to both feature and feature group averaging
plists <- generateMSPeakLists(fGroups, "mzr", avgFeatParams = avgPListParams, avgFGroupParams = avgPListParams)

4.6.2 Formulae

Formulae can be automatically calculated for all features using the generateFormulas() function. The following algorithms are currently supported:

Algorithm	Usage	Remarks
GenForm	`generateFormulas(algorithm = "genform", ...)`	Bundled with `patRoon`. Reasonable default.
SIRIUS	`generateFormulas(algorithm = "sirius", ...)`	Requires MS/MS data.
DataAnalysis	`generateFormulas(algorithm = "bruker", ...)`	Requires FMF features (i.e. `findFeatures(algorithm = "bruker", ...)`). Uses SmartFormula algorithms.

Calculation with GenForm is often a good default. It is fast and basic rules can be applied to filter out obvious non-existing formulae. A possible drawback of GenForm, however, is that may become slow when many candidates are calculated, for instance, due to a relative high feature m/z (e.g. >600) or loose elemental restricitions. More thorough calculation is performed with SIRIUS: this algorithm often yields fewer and often more plausible results. However, SIRIUS requires MS/MS data (hence features without will not have results) and formula prediction may not work well for compounds that structurally deviate from the training sets used by SIRIUS. Calculation with DataAnalysis is only possible when features are obtained with DataAnalysis as well. An advantage is that analysis files do not have to be converted, however, compared to other algorithms calculation is often relative slow.

There are two methods for formula assignment:

Formulae are first calculated for each individual feature within a feature group. These results are then pooled, outliers are removed and remaining formulae are assigned to the feature group (i.e. calculateFeatures = TRUE).
Formulae are directly calculated for each feature group by using group averaged peak lists (see previous section) (i.e. calculateFeatures = FALSE).

The first method is more thorough and the possibility to remove outliers may sometimes result in better formula assignment. However, the second method is much faster and generally recommended for large number of analyses.

By default, formulae are either calculated by only MS/MS data (SIRIUS) or with both MS and MS/MS data (GenForm/Bruker). The latter also allows formula calculation when no MS/MS data is present. Furthermore, with Bruker algorithms, data from both MS and MS/MS formula data can be combined to allow inclusion of candidates that would otherwise be excluded by e.g. poor MS/MS data. However, a disadvantage is that formulae needs to be calculated twice. The MSMode argument (listed below) can be used to customize this behaviour.

An overview of common parameters that are typically set to customize formula calculation is listed below.

Argument	Algorithm(s)	Remarks
relMzDev	`genform`, `sirius`	The maximum relative m/z deviation for a formula to be considered (in ppm).
elements	`genform`, `sirius`	Which elements to consider. By default `"CHNOP"`. Try to limit possible elements as much as possible.
calculateFeatures	`genform`, `sirius`	Whether formulae should be calculated first for all features (see discussion above) (always `TRUE` with DataAnalysis).
featThresholdAnn	All	Minimum relative amount (0-1) that a candidate formula for a feature group should be found among all annotated features (e.g. 1 means that a candidate is only considered if it was assigned to all annotated features).
adduct	All	The adduct to consider for calculation (e.g. `"[M+H]+"`, `"[M-H]-"`, more details in the adduct section). Don’t set this when adduct annotations are available.
MSMode	`genform`, `bruker`	Whether formulae should be generated only from MS data (`"ms"`), MS/MS data (`"msms"`) or both (`"both"`). The latter is default, see discussion above.
profile	`sirius`	Instrument profile, e.g. `"qtof"`, `"orbitrap"`, `"fticr"`.

Some typical examples:

formulasGF <- generateFormulas(fGroups, mslists, "genform") # GenForm, default settings
formulasGF2 <- generateFormulas(fGroups, mslists, "genform", calculateFeatures = FALSE) # direct feature group assignment (faster)
formulasSIR <- generateFormulas(fGroups, mslists, "sirius", elements = "CHNOPSClBr") # SIRIUS, common elements for pollutant
formulasSIR2 <- generateFormulas(fGroups, mslists, "sirius", adduct = "[M-H]-") # SIRIUS, negative ionization
formulasBr <- generateFormulas(fGroups, mslists, "bruker", MSMode = "MSMS") # Only consider MSMS data (SmartFormula3D)

4.6.3 Compounds

An important step in a typical non-target workflow is structural identification for features of interest, as this information may finally reveal what a feature is. In a first step all possible candidate structures for a feature are obtained from a database (based on e.g. monoisotopic mass or formula). These candidates are then ranked, for instance, by matching the feature MS/MS data with in-silico or library MS/MS spectra or its relevance to the environment.

Structure assignment in patRoon is performed automatically for all feature groups with the generateCompounds() function. Currently, this function supports the following algorithms:

Algorithm	Usage	Remarks
MetFrag	`generateCompounds(algorithm = "metfrag", ...)`	Supports many databases (including offline and custom), matching MS/MS data with in-silico and library MS/MS data, and many other scorings to rank candidates.
SIRIUS with CSI:FingerID	`generateCompounds(algorithm = "sirius", ...)`	Matches with in-silico MS/MS data, incorporates formula annotations to improve candidate selection.
Library	`generateCompounds(algorithm = "library", ...)`	Obtains candidates by matching MS/MS data with an offline MS library, e.g. obtained from MassBank.eu or MoNA.

All algorithms rank their candidates by matching MS/MS data with in-silico generated MS/MS data (MetFrag and SIRIUS) and/or experimental MS/MS data from an MS library (MetFrag with MoNA scoring and Library algorithm). The latter may yield better candidates, and the Library algorithm is also generally much faster. However, in-silico annotation is not limited by the availability of experimental MS/MS data.

Compound annotation is often a relative time and resource intensive procedure. For this reason, annotation occurs for each feature group and not individual features. Nevertheless, it is not uncommon that this is the most time consuming step in the workflow. For this reason, prioritization of features is highly important, even more so to avoid ‘abusing’ servers when an online database is used for compound retrieval.

4.6.3.1 Database selection for MetFrag and SIRIUS

Selecting the right database is important for proper candidate assignment. If the ‘right’ chemical compound is not present in the used database, it is impossible to assign the correct structure. Luckily, however, several large databases such as PubChem and ChemSpider are openly available which contain tens of millions of compounds. On the other hand, these databases may also lead to many unlikely candidates and therefore more specialized (or custom databases) may be preferred. Which database will be used is dictated by the database argument to generateCompounds(), currently the following options exist:

Database	Algorithm(s)	Remarks
`pubchem`	`"metfrag"`, `"sirius"`	PubChem is currently the largest compound database and is used by default.
`chemspider`	`"metfrag"`	ChemSpider is another large database. Requires security token from here (see next section).
`comptox`	`"metfrag"`	The EPA CompTox contains many compounds and scorings relevant to environmental studies. Needs manual download (see next section).
`pubchemlite`	`"metfrag"`	A specialized subset of the PubChem database. Needs manual download (see next section).
`for-ident`	`"metfrag"`	The FOR-IDENT (STOFF-IDENT) database for water related substances.
`kegg`	`"metfrag"`, `"sirius"`	The KEGG database for biological compounds
`hmdb`	`"metfrag"`, `"sirius"`	The HMDB contains many human metabolites.
`bio`	`"sirius"`	Selects all supports biological databases.
`csv`, `psv`, `sdf`	`"metfrag"`	Custom database (see next section). CSV example.

4.6.3.2 Configuring MetFrag databases and scoring

Some extra configuration may be necessary when using certain databases with MetFrag. In order to use the ChemSpider database a security token should be requested and set with the chemSpiderToken argument to generateCompounds(). The CompTox and PubChemLite databases need to be manually downloaded from CompTox (or variations with smoking or wastewater metadata) and PubChemLite (or the PubChem derived OECD PFAS database). The file location of this and other local databases (csv, psv, sdf) needs to be manually configured, see the examples below and/or ?generateCompounds for more information on how to do this.

# PubChem: the default
compsMF <- generateCompounds(fGroups, mslists, "metfrag", adduct = "[M+H]+")

# ChemSpider: needs security token
compsMF2 <- generateCompounds(fGroups, mslists, "metfrag", database = "chemspider",
                              chemSpiderToken = "MY_TOKEN_HERE", adduct = "[M+H]+")

# CompTox: set global option to database path
options(patRoon.path.MetFragCompTox = "~/CompTox_17March2019_SelectMetaData.csv")
compsMF3 <- generateCompounds(fGroups, mslists, "metfrag", database = "comptox", adduct = "[M+H]+")

# CompTox: set database location without global option
compsMF4 <- generateCompounds(fGroups, mslists, "metfrag", database = "comptox", adduct = "[M+H]+",
                              extraOpts = list(LocalDatabasePath = "~/CompTox_17March2019_SelectMetaData.csv"))

# Same, but for custom database
compsMF5 <- generateCompounds(fGroups, mslists, "metfrag", database = "csv", adduct = "[M+H]+",
                              extraOpts = list(LocalDatabasePath = "~/mydb.csv"))

An example of a custom .csv database can be found here.

With MetFrag compound databases are not only used to retrieve candidate structures but are also used to obtain metadata for further ranking. Each database has its own scorings, a table with currently supported scorings can be obtained with the compoundScorings() function (some columns omitted):

name	metfrag	database	default
score	Score		TRUE
fragScore	FragmenterScore		TRUE
metFusionScore	OfflineMetFusionScore		TRUE
individualMoNAScore	OfflineIndividualMoNAScore		TRUE
numberPatents	PubChemNumberPatents	pubchem	TRUE
numberPatents	Patent_Count	pubchemlite	TRUE
pubMedReferences	PubChemNumberPubMedReferences	pubchem	TRUE
pubMedReferences	ChemSpiderNumberPubMedReferences	chemspider	TRUE
pubMedReferences	NUMBER_OF_PUBMED_ARTICLES	comptox	TRUE
pubMedReferences	PubMed_Count	pubchemlite	TRUE
extReferenceCount	ChemSpiderNumberExternalReferences	chemspider	TRUE
dataSourceCount	ChemSpiderDataSourceCount	chemspider	TRUE
referenceCount	ChemSpiderReferenceCount	chemspider	TRUE
RSCCount	ChemSpiderRSCCount	chemspider	TRUE
formulaScore			FALSE
RF_SMILES			FALSE
RF_SIRFP			FALSE
LC50_SMILES			FALSE
LC50_SIRFP			FALSE
smartsInclusionScore	SmartsSubstructureInclusionScore		FALSE
smartsExclusionScore	SmartsSubstructureExclusionScore		FALSE
suspectListScore	SuspectListScore		FALSE
retentionTimeScore	RetentionTimeScore		FALSE
CPDATCount	CPDAT_COUNT	comptox	TRUE
TOXCASTActive	TOXCAST_PERCENT_ACTIVE	comptox	TRUE
dataSources	DATA_SOURCES	comptox	TRUE
pubChemDataSources	PUBCHEM_DATA_SOURCES	comptox	TRUE
EXPOCASTPredExpo	EXPOCAST_MEDIAN_EXPOSURE_PREDICTION_MG/KG-BW/DAY	comptox	TRUE
ECOTOX	ECOTOX	comptox	TRUE
NORMANSUSDAT	NORMANSUSDAT	comptox	TRUE
MASSBANKEU	MASSBANKEU	comptox	TRUE
TOX21SL	TOX21SL	comptox	TRUE
TOXCAST	TOXCAST	comptox	TRUE
KEMIMARKET	KEMIMARKET	comptox	TRUE
MZCLOUD	MZCLOUD	comptox	TRUE
pubMedNeuro	PubMedNeuro	comptox	TRUE
CIGARETTES	CIGARETTES	comptox	TRUE
INDOORCT16	INDOORCT16	comptox	TRUE
SRM2585DUST	SRM2585DUST	comptox	TRUE
SLTCHEMDB	SLTCHEMDB	comptox	TRUE
THSMOKE	THSMOKE	comptox	TRUE
ITNANTIBIOTIC	ITNANTIBIOTIC	comptox	TRUE
STOFFIDENT	STOFFIDENT	comptox	TRUE
KEMIMARKET_EXPO	KEMIMARKET_EXPO	comptox	TRUE
KEMIMARKET_HAZ	KEMIMARKET_HAZ	comptox	TRUE
REACH2017	REACH2017	comptox	TRUE
KEMIWW_WDUIndex	KEMIWW_WDUIndex	comptox	TRUE
KEMIWW_StpSE	KEMIWW_StpSE	comptox	TRUE
KEMIWW_SEHitsOverDL	KEMIWW_SEHitsOverDL	comptox	TRUE
ZINC15PHARMA	ZINC15PHARMA	comptox	TRUE
PFASMASTER	PFASMASTER	comptox	TRUE
peakFingerprintScore	AutomatedPeakFingerprintAnnotationScore		FALSE
lossFingerprintScore	AutomatedLossFingerprintAnnotationScore		FALSE
agroChemInfo	AgroChemInfo	pubchemlite	FALSE
bioPathway	BioPathway	pubchemlite	FALSE
drugMedicInfo	DrugMedicInfo	pubchemlite	FALSE
foodRelated	FoodRelated	pubchemlite	FALSE
pharmacoInfo	PharmacoInfo	pubchemlite	FALSE
safetyInfo	SafetyInfo	pubchemlite	FALSE
toxicityInfo	ToxicityInfo	pubchemlite	FALSE
knownUse	KnownUse	pubchemlite	FALSE
disorderDisease	DisorderDisease	pubchemlite	FALSE
identification	Identification	pubchemlite	FALSE
annoTypeCount	FPSum	pubchemlite	TRUE
annoTypeCount	AnnoTypeCount	pubchemlite	TRUE
annotHitCount	AnnotHitCount	pubchemlite	TRUE
libMatch			TRUE

The first two columns contain the generic and original MetFrag naming schemes for each scoring type. While both naming schemes can be used, the generic is often shorter and harmonized with other algorithms (e.g. SIRIUS). The database column specifies for which databases a particular scoring is available (empty if not database specific). Most scorings are selected by default (as specified by the default column), however, this behaviour can be customized by using the scoreTypes argument:

# Only in-silico and PubChem number of patents scorings
compsMF1 <- generateCompounds(fGroups, mslists, "metfrag", adduct = "[M+H]+",
                              scoreTypes = c("fragScore" "numberPatents"))

# Custom scoring in custom database
compsMF2 <- generateCompounds(fGroups, mslists, "metfrag", adduct = "[M+H]+",
                              database = "csv",
                              extraOpts = list(LocalDatabasePath = "~/mydb.csv"),
                              scoreTypes = c("fragScore", "myScore", "myScore2"))

By default ranking is performed with equal weight (i.e. 1) for all scorings. This can be changed by the scoreWeights argument, which should be a vector containing the weights for all scorings following the order of scoreTypes, for instance:

compsMF <- generateCompounds(fGroups, mslists, "metfrag", adduct = "[M+H]+",
                             scoreTypes = c("fragScore" "numberPatents"),
                             scoreWeights = c(1, 2))

Sometimes thousands or more structural candidates are found when annotating a feature group. In this situation processing all these candidates will too involving (especially when external databases are used). To avoid this a default cut-off is set: when the number of candidates exceed a certain amount the search will be aborted and no results will be reported for that feature group. The maximum number of candidates can be set with the maxCandidatesToStop argument. The default value is relative conservative, especially for local databases it may be useful to increase this number.

4.6.3.3 MetFrag error and timeout handling

The use of online databases has the drawback that an error may occur, for instance, as a result of a connection error or when the aforementioned maximum number of candidates is reached (maxCandidatesToStop argument). By default, the processing is restarted if an error has occurred (configured by the errorRetries argument). Similarly, the timeoutRetries and timeout arguments can be used to avoid being ‘stuck’ on obtaining results, for instance, due to an unstable internet connection. If no compounds could be assigned due to an error a warning will be issued. In this case it is best to see what went wrong by manually checking the log files, which by default are stored in the log/metfrag folder.

4.6.3.4 Annotation with the Library algorithm

To use the Library algorithm we first need to load an MS library. Currently, MS libraries in the MSP and MoNA JSON formats are supported. Note that the former format is not so well standardized, and the support in patRoon was mainly tailored for MSP files from MassBank.eu and MoNA. To load the MS library the loadMSLibrary() function is used:

mslibrary <- loadMSLibrary("~/MassBank_NIST.msp", "msp") # MassBank.eu MSP library
mslibrary <- loadMSLibrary("~/MoNA-export-CASMI_2016.msp", "msp") # MoNA MSP library
mslibrary <- loadMSLibrary("~/MoNA-export-MassBank.json", "json") # MoNA JSON library

NOTE Currently it is only possible to load formula annotated MS/MS peaks with the MoNA JSON format.

Once loaded, the MS library can be post-processed with various filtering, subsetting and export functionality, which may be useful for more tailored compound annotation. This is further discussed in the advanced chapter.

The compound annotation is performed with generateCompounds():

compsLib <- generateCompounds(fGroups, mslists, "library", MSLibrary = mslibrary)

# set minimum MS/MS spectral match for candidates to 0.5
compsLib <- generateCompounds(fGroups, mslists, "library", MSLibrary = mslibrary, minSim = 0.5)

4.6.3.5 Formula scoring

Ranking of candidate structures may further be improved by incorporating formula information by using the addFormulaScoring() function:

comps <- addFormulaScoring(coms, formulas, updateScore = TRUE)

Here, corresponding formula and explained fragments will be used to calculate a formulaScore for each candidate. Note that SIRIUS candidates are already based on calculated formulae, hence, running this function on SIRIUS results is less sensible unless scoring from another formula calculation algorithm is desired.

4.6.3.6 Further options and parameters

There are many more options and parameters that affect compound annotation. For a full overview please have a look at the reference manual (e.g. by running ?generateCompounds).

4.6.4 Suspect annotation

The data obtained during the previously described annotation steps can be used to improve a suspect screening workflow. The annotateSuspects() method uses the annotation data to calculate various annotation properties for each suspect, such as their rank in formula/compound candidates, which fragments from the suspect list were matched, and a rough indication of the identification level according to Schymanski et al. (2014)

fGroupsSusp <- annotateSuspects(fGroupsSusp, MSPeakLists = mslists,
                                formulas = formulas, compounds = compounds)

#> Annotating 5 suspects...
#> ================================================================================

The calculation of identification levels is performed by a set of pre-defined rules. The genIDLevelRulesFile() can be used to inspect the default rules or to create your own rules file, which can subsequently passed to annotateSuspects() with the IDFile argument. See ?annotateSuspects for more details on the file format and options. The default identification levels can be summarized as follows:

Level	Description	Rules
1	Target match	Retention time deviates <12 seconds from suspect list. At least `3` (or all if the suspect list contains less) fragments from the suspect list must match.
2a	Good MS/MS library match	Suspect is top ranked in the `compounds` results. The `individualMoNAScore` (MetFrag) or `libMatch` (Library algorithm) is at least `0.9` and no other candidates were matched with the MS library.
3a	Fair library match	The `individualMoNAScore` or `libMatch` is at least 0.4.
3b	Known MS/MS match	At least `3` (or all if the suspect list contains less) fragments from the suspect list must match.
3c	Good in-silico MS/MS match	The annotation MS/MS similarity (`annSimComp` column) is at least `0.7`.
4a	Good formula MS/MS match	Suspect is top ranked formula candidate, annotation MS/MS similarity (`annSimForm` column) is at least `0.7` and isotopic match (`isoScore`) of at least 0.5. The latter two scores are at least `0.2` higher than next best ranked candidate.
4b	Good formula isotopic pattern match	Suspect is top ranked formula candidate and isotopic match (`isoScore`) of at least `0.9` and at least `0.2` higher than next best ranked candidate.
5	Unknown	All else.

In general, the more data provided by the suspect list and to annotateSuspects(), the better identification level estimation works. For instance, when considering the default rules, either the fragments_mz or fragments_formula column is necessary to be able assign a level 3b. Similarly, the suspect list needs retention times (as well as fragment data) to be able to assign level 1. As you can imagine, providing the annotation workflow objects (i.e. MSPeakLists, formulas, compounds) to annotateSuspects() is necessary for calculation of most levels.

The annotateSuspects() function will log decisions for identification level assignments to the log/ sub-directory in the current working directory. This is useful to inspect level assignments and especially useful when you customized any rules.

NOTE: The current identification level rules are only optimized for GenForm and MetFrag annotation algorithms.

4.6.5 Account login for SIRIUS

Recent version of SIRIUS require an active account login to make queries to CSI:FingerID. This is primarily relevant when performing a compound annotation workflow with SIRIUS or a formula annotation workflow with getFingerprints=TRUE, e.g. when predicting toxicities or concentrations.

As a first step, please create an account as described in the SIRIUS documentation: https://v6.docs.sirius-ms.io/account-and-license/.

Then, to login there are two options:

Manually login: either by using the SIRIUS GUI or the CLI. For the latter, see e.g. sirius.exe login --help for more details.
Let patRoon automatically handle logins.

The login parameter for generateCompounds() and generateFormulas() determines how logins are dealt with by patRoon. There are four options:

login=FALSE: no logins are performed and no checks are performed to verify if there is an existing login.
login="check": no logins are performed, but an active login is required to proceed.
login="interactive": if no active login is present, then the username and password will be asked interactively and used to automatically login.
login=c(username="...", password="..."): if no active login is present, then the provided username and password will be used to automatically login.

NOTE: For the fourth option, please don’t provide the login details directly as plain-text for security reasons. See below for proper alternatives.

The first two options are primarily meant for manual login. The function parameter alwaysLogin=TRUE can be set to force a login for the third and fourth options.

The fourth option is primarily useful for e.g. heavy users of SIRIUS or unattended automatic workflows. To securely provide the login details, it is best to store them elsewhere. This webpage provides a detailed overview of how credentials can be safely stored. For instance, you can save the credentials in your .Renviron file and retreieve them when calling generateCompounds():

In your .Renviron file add:

SIRIUS_USERNAME=MY_USERNAME
SIRIUS_PASSWORD=MY_PASSWORD

and then in your R script:

compounds <- generateCompounds(..., login = c(username = Sys.getenv("SIRIUS_USERNAME"),
                                              password = Sys.getenv("SIRIUS_PASSWORD")))

Alternatively, you could use the keyring package, e.g.

install.packages("keyring") # execute in case you don't have keyring installed yet
keyring::key_set("SIRIUS", username = "myaccount@email.com") # execute this once to store the password

compounds <- generateCompounds(..., login = c(username = "myaccount@email.com",
                                              password = keyring::key_get("SIRIUS", "myaccount@email.com"))))

References

Schymanski, Emma L., Junho Jeon, Rebekka Gulde, Kathrin Fenner, Matthias Ruff, Heinz P. Singer, and Juliane Hollender. 2014. “Identifying Small Molecules via High Resolution Mass Spectrometry: Communicating Confidence.” Environmental Science and Technology 48 (4): 2097–98. https://doi.org/10.1021/es5002105.