4.3 Features
Collecting features from the analyses consists of finding all features, grouping them across analyses (optionally after retention time alignment) and, if desired, suspect screening.
4.3.1 Finding and grouping features
Several algorithms are available for finding features. These are listed in the table below alongside their usage and general remarks.
Algorithm | Usage | Remarks |
---|---|---|
OpenMS | findFeatures(algorithm = "openms", ...) | Uses the FeatureFinderMetabo algorithm |
XCMS | findFeatures(algorithm = "xcms", ...) | Uses xcms::xcmsSet() function |
XCMS (import) | importFeatures(algorithm = "xcms", ...) | Imports an existing xcmsSet object |
XCMS3 | findFeatures(algorithm = "xcms3", ...) | Uses xcms::findChromPeaks() from the new XCMS3 interface |
XCMS3 (import) | importFeatures(algorithm = "xcms3", ...) | Imports an existing XCMSnExp object |
enviPick | findFeatures(algorithm = "envipick", ...) | Uses enviPick::enviPickwrap() |
KPIC2 | findFeatures(algorithm = "kpic2", ...) | Uses the KPIC2 R package |
KPIC2 (import) | importFeatures(algorithm = "kpic2", ...) | Imports features from KPIC2 |
SIRIUS | findFeatures(algorithm = "sirius", ...) | Uses SIRIUS to find features |
SAFD | findFeatures(algorithm = "safd", ...) | Uses the SAFD algorithm (experimental) |
DataAnalysis | findFeatures(algorithm = "bruker", ...) | Uses Find Molecular Features from DataAnalysis (Bruker only) |
The performance of these algorithms usually depends heavily on the data and the parameter settings that are used. Since obtaining a good feature dataset is crucial for the rest of the workflow, it is highly recommended to experiment with different settings (this process can also be automated, see the feature optimization section for more details). Some common parameters to look at are listed in the table below. Many more parameters can be set; please see the reference documentation for these (e.g. ?findFeatures).
Algorithm | Common parameters |
---|---|
OpenMS | noiseThrInt, chromSNR, chromFWHM, mzPPM, minFWHM, maxFWHM (see ?findFeatures) |
XCMS / XCMS3 | peakwidth, mzdiff, prefilter, noise (assuming default centWave algorithm, see ?findPeaks.centWave / ?CentWaveParam) |
enviPick | dmzgap, dmzdens, drtgap, drtsmall, drtdens, drtfill, drttotal, minpeak, minint, maxint (see ?enviPickwrap) |
KPIC2 | kmeans, level, min_snr (see ?findFeatures and ?getPIC / ?getPIC.kmeans) |
SIRIUS | The sirius algorithm is currently parameterless |
SAFD | mzRange, maxNumbIter, resolution, minInt (see ?findFeatures) |
DataAnalysis | See Find -> Parameters… -> Molecular Features in DataAnalysis. |
NOTE Support for SAFD is still experimental and some extra work is required to set everything up. Please see the reference documentation for this algorithm (?findFeatures).
NOTE DataAnalysis feature settings have to be configured in DataAnalysis prior to calling findFeatures().
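Experimenting with different settings can be organised as a small parameter grid. The sketch below is only an illustration: the grid values are arbitrary placeholders, paramGrid and runGrid() are hypothetical names that are not part of patRoon, and anaInfo is assumed to exist as described in the previous section. The findFeatures() calls are only executed when patRoon and anaInfo are available.

```r
# Hypothetical grid of OpenMS settings; the values are placeholders, not recommendations
paramGrid <- expand.grid(noiseThrInt = c(500, 1000, 5000), chromSNR = c(3, 10))

# Hypothetical helper: runs findFeatures() once for every row of the grid
runGrid <- function(anaInfo, grid) {
    lapply(seq_len(nrow(grid)), function(i) {
        patRoon::findFeatures(anaInfo, "openms",
                              noiseThrInt = grid$noiseThrInt[i],
                              chromSNR = grid$chromSNR[i])
    })
}

# Only run the (potentially slow) feature finding if everything needed is present
if (requireNamespace("patRoon", quietly = TRUE) && exists("anaInfo")) {
    fLists <- runGrid(anaInfo, paramGrid)
}
```

Each element of fLists then corresponds to one row of paramGrid and can be inspected individually. For a systematic search, the automated approach from the feature optimization section is likely the better choice.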
Similarly, for grouping features across analyses several algorithms are supported.
Algorithm | Usage | Remarks |
---|---|---|
OpenMS | groupFeatures(algorithm = "openms", ...) | Uses the FeatureLinkerUnlabeled algorithm (and MapAlignerPoseClustering for retention time alignment) |
XCMS | groupFeatures(algorithm = "xcms", ...) | Uses the xcms::group() and xcms::retcor() functions |
XCMS (import) | importFeatureGroupsXCMS(...) | Imports an existing xcmsSet object |
XCMS3 | groupFeatures(algorithm = "xcms3", ...) | Uses the xcms::groupChromPeaks() and xcms::adjustRtime() functions |
XCMS3 (import) | importFeatureGroupsXCMS3(...) | Imports an existing XCMSnExp object |
KPIC2 | groupFeatures(algorithm = "kpic2", ...) | Uses the KPIC2 package |
KPIC2 (import) | importFeatureGroupsKPIC2(...) | Imports a PIC set object |
SIRIUS | groupFeatures(anaInfo, algorithm = "sirius") | Finds and groups features with SIRIUS |
ProfileAnalysis | importFeatureGroups(algorithm = "brukerpa", ...) | Imports a .csv file exported from Bruker ProfileAnalysis |
TASQ | importFeatureGroups(algorithm = "brukertasq", ...) | Imports a Global result table (exported to an Excel file and then saved as a .csv file) |
NOTE: Grouping features with the sirius algorithm will perform both finding and grouping features with SIRIUS. This algorithm cannot work with features from another algorithm.
Just like finding features, each algorithm has its own set of parameters. The defaults are often a good start, but it is recommended to have a look at them. See ?groupFeatures for more details.
When using the XCMS algorithms, both the ‘classical’ interface and the latest XCMS3 interface are supported. Currently, both interfaces are mostly the same regarding functionality and implementation. However, since future development of XCMS is primarily focused on the latter, this interface is recommended.
Some examples of finding and grouping features are shown below.
# The anaInfo variable contains analysis information, see the previous section
# Finding features
fListOMS <- findFeatures(anaInfo, "openms") # OpenMS, with default settings
fListOMS2 <- findFeatures(anaInfo, "openms", noiseThrInt = 500, chromSNR = 10) # OpenMS, adjusted minimum intensity and S/N
fListXCMS <- findFeatures(anaInfo, "xcms", ppm = 10) # XCMS
fListXCMSImp <- importFeatures(anaInfo, "xcms", xset) # import XCMS xcmsSet object
fListXCMS3 <- findFeatures(anaInfo, "xcms3", xcms::CentWaveParam(peakwidth = c(5, 15))) # XCMS3
fListEP <- findFeatures(anaInfo, "envipick", minint = 1E3) # enviPick
fListKPIC2 <- findFeatures(anaInfo, "kpic2", kmeans = TRUE, level = 1E4) # KPIC2
fListSIRIUS <- findFeatures(anaInfo, "sirius") # SIRIUS
# Grouping features
fGroupsOMS <- groupFeatures(fListOMS, "openms") # OpenMS grouping, default settings
fGroupsOMS2 <- groupFeatures(fListOMS2, "openms", rtalign = FALSE) # OpenMS grouping, no RT alignment
fGroupsOMS3 <- groupFeatures(fListXCMS, "openms", maxGroupRT = 6) # group XCMS features with OpenMS, adjusted grouping parameter
# group enviPick features with XCMS3, disable minFraction
fGroupsXCMS <- groupFeatures(fListEP, "xcms3",
                             xcms::PeakDensityParam(sampleGroups = anaInfo$group,
                                                    minFraction = 0))
# group with KPIC2 and set some custom grouping/aligning parameters
fGroupsKPIC2 <- groupFeatures(fListKPIC2, "kpic2", groupArgs = list(tolerance = c(0.002, 18)),
                              alignArgs = list(move = "loess"))
fGroupsSIRIUS <- groupFeatures(anaInfo, "sirius") # find/group features with SIRIUS
4.3.2 Suspect screening
After features have been grouped, a so-called suspect screening step may be performed to find features that may correspond to suspects within a given suspect list. The screenSuspects() function is used for this purpose, for instance:
suspects <- data.frame(name = c("1H-benzotriazole", "N-Phenyl urea", "2-Hydroxyquinoline"),
mz = c(120.0556, 137.0709, 146.0600))
fGroupsSusp <- screenSuspects(fGroups, suspects)
4.3.2.1 Suspect list format
The example above has a very simple suspect list with just three compounds. The format of the suspect list is quite flexible, and can contain the following columns:
- name: The name of the suspect. Mandatory and should be unique and file-name compatible (if not, the name will be automatically renamed to make it compatible).
- rt: The retention time in seconds. Optional. If specified, any feature groups with a different retention time will not be considered to match suspects.
- mz, SMILES, InChI, formula, neutralMass: at least one of these columns must hold data for each suspect row. The mz column specifies the ionized mass of the suspect. If this is not available then data from any of the other columns is used to determine the suspect mass.
- adduct: The adduct of the suspect. Optional. Set this if you are sure that a suspect should be matched by a particular adduct ion and no data in the mz column is available.
- fragments_mz and fragments_formula: optional columns that may assist suspect annotation.
In most cases a suspect list is best made as a csv file, which can then be imported with e.g. the read.csv() function. This is exactly what happens when you specify a suspect list when using the newProject() function.
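As a minimal sketch of this route, the three-compound list from above can be written to a csv file and read back with read.csv(). The temporary file path and the variable names suspFile and suspectsCSV are assumptions made for this illustration; in practice the file would typically be prepared in a spreadsheet or text editor.

```r
# Suspect list with the mandatory name column and the ionized masses (mz)
suspects <- data.frame(name = c("1H-benzotriazole", "N-Phenyl urea", "2-Hydroxyquinoline"),
                       mz = c(120.0556, 137.0709, 146.0600))

# Write the list to a temporary csv file (placeholder path), without row names
suspFile <- tempfile(fileext = ".csv")
write.csv(suspects, suspFile, row.names = FALSE)

# Import it again; the resulting data.frame can be passed to screenSuspects()
suspectsCSV <- read.csv(suspFile, stringsAsFactors = FALSE)
print(suspectsCSV)
```

Setting row.names = FALSE avoids an extra unnamed column in the csv file that would otherwise be read back as part of the suspect list.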
Quite often, the ionized masses are not readily available and have to be calculated. In this case, data in any of the SMILES/InChI/formula/neutralMass columns should be provided. Whenever possible, it is strongly recommended to fill in the SMILES column (or InChI), as this will assist annotation. Applying this to the above example:
suspects <- data.frame(name = c("1H-benzotriazole", "N-Phenyl urea", "2-Hydroxyquinoline"),
SMILES = c("[nH]1nnc2ccccc12", "NC(=O)Nc1ccccc1", "Oc1ccc2ccccc2n1"))
fGroupsSusp <- screenSuspects(fGroups, suspects, adduct = "[M+H]+")
#> Calculating/Validating chemical data... Done!
#> ================================================================================
#> Found 3/3 suspects (100.00%)
NOTE: It is highly recommended to install OpenBabel to automatically validate and amend chemical properties such as SMILES, InChI, formulae etc. in the suspect list.
Since suspect matching now occurs by the neutral mass, the adduct information for the feature groups must be set. This is done either by setting the adduct function argument of screenSuspects() or by feature group adduct annotations.
Finally, when the adduct is known for a suspect it can be specified in the suspect list:
# Aldicarb is measured with a sodium adduct.
suspects <- data.frame(name = c("1H-benzotriazole", "N-Phenyl urea", "Aldicarb"),
SMILES = c("[nH]1nnc2ccccc12", "NC(=O)Nc1ccccc1", "CC(C)(C=NOC(=O)NC)SC"),
adduct = c("[M+H]+", "[M+H]+", "[M+Na]+"))
fGroupsSusp <- screenSuspects(fGroups, suspects)
To summarize:
- If a suspect has data in the mz column it will be directly matched with the m/z value of a feature group.
- Otherwise, if the suspect has data in the adduct column, the m/z value for the suspect is calculated from its neutral mass and the adduct and then matched with the m/z of a feature group.
- Otherwise, suspects and feature groups are matched by their neutral mass.
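These three rules can be sketched in plain R. This is not how patRoon implements the matching: the helper matchTarget(), the neutralMass values (monoisotopic masses rounded to four decimals) and the two adduct mass shifts are assumptions made for this illustration, and mass tolerances, retention times and other adducts are ignored.

```r
# Mass shifts (Da) of two common positive adducts (assumed values, electron mass accounted for)
adductShift <- c("[M+H]+" = 1.007276, "[M+Na]+" = 22.989221)

# Illustrative suspect list: every row triggers a different rule
suspects <- data.frame(name = c("1H-benzotriazole", "Aldicarb", "N-Phenyl urea"),
                       mz = c(120.0556, NA, NA),
                       neutralMass = c(NA, 190.0776, 136.0637),
                       adduct = c(NA, "[M+Na]+", NA),
                       stringsAsFactors = FALSE)

# Hypothetical helper: reports per suspect which kind of value is compared
matchTarget <- function(susp) {
    hasMz <- !is.na(susp$mz)
    hasAdduct <- !is.na(susp$adduct)
    # m/z calculated from the neutral mass and the adduct (NA if no adduct is given)
    mzFromAdduct <- susp$neutralMass + unname(adductShift[susp$adduct])
    data.frame(name = susp$name,
               matchedBy = ifelse(hasMz, "mz",
                                  ifelse(hasAdduct, "mz from adduct", "neutral mass")),
               value = ifelse(hasMz, susp$mz,
                              ifelse(hasAdduct, mzFromAdduct, susp$neutralMass)),
               stringsAsFactors = FALSE)
}

targets <- matchTarget(suspects)
print(targets)
```

In this sketch the first suspect is compared by its mz value, the second by an m/z calculated for the sodium adduct, and the third by its neutral mass, which is why adduct information for the feature groups is needed in the last case.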
The fragments_mz and fragments_formula columns in the suspect list can be used to specify known fragments for a suspect, which can help suspect annotation. The former specifies the ionized m/z of known MS/MS peaks, whereas the latter specifies known formulas. Multiple values can be given by separating them with a semicolon.
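A suspect list with fragment information could look like the sketch below. The fragment formulas and the fragment m/z value are placeholders chosen only to illustrate the semicolon-separated format, not reference data, and the use of empty strings for suspects without known fragments is an assumption of this sketch. The strsplit() call merely shows how multiple values in one cell can be taken apart.

```r
suspects <- data.frame(name = c("1H-benzotriazole", "N-Phenyl urea", "2-Hydroxyquinoline"),
                       SMILES = c("[nH]1nnc2ccccc12", "NC(=O)Nc1ccccc1", "Oc1ccc2ccccc2n1"),
                       # placeholder fragment formulas; two values for the second suspect
                       fragments_formula = c("C6H6N", "C6H8N;C7H6NO", ""),
                       # placeholder fragment m/z; only given for the third suspect
                       fragments_mz = c("", "", "118.0651"),
                       stringsAsFactors = FALSE)

# Split the semicolon-separated formulas into one character vector per suspect
fragFormulas <- strsplit(suspects$fragments_formula, ";", fixed = TRUE)
names(fragFormulas) <- suspects$name
print(fragFormulas)
```

Both fragment columns are stored as character columns so that several values can share a single cell.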
4.3.2.2 Removing feature groups without hits
Note that any feature groups that were not matched to a suspect are not removed by default. If you want to remove these, set the onlyHits parameter of screenSuspects() to TRUE (i.e. onlyHits = TRUE).
The advantage of removing non-hits is that it may significantly reduce the complexity of your dataset. On the other hand, retaining all features allows you to mix a full non-target analysis with a suspect screening workflow. The filter() function (discussed here) can also be used to remove feature groups without a hit at a later stage.
4.3.2.3 Combining screening results
The amend function argument to screenSuspects can be used to combine screening results from different suspect lists.
fGroupsSusp <- screenSuspects(fGroups, suspects)
fGroupsSusp <- screenSuspects(fGroupsSusp, suspects2, onlyHits = TRUE, amend = TRUE)
In this example the suspect lists defined in suspects and suspects2 are both used for screening. By setting amend=TRUE the original screening results (i.e. from suspects) are preserved. Note that onlyHits should only be set in the final call to screenSuspects to ensure that all feature groups are screened.