8.4 Chromatographic peak qualities
The algorithms used by findFeatures detect chromatographic peaks automatically to find the features. However, not all detected features have ‘proper’ chromatographic peaks, and some may be just noise. The MetaClean R package supports various quality measures for chromatographic peaks, including Gaussian fit, symmetry, sharpness and others. In addition, MetaClean averages all feature data for each feature group and adds a few group-specific quality measures (e.g. retention time consistency). Please see Chetnik, Petrick, and Pandey (2020) for more details. The calculations are integrated into patRoon and are easily performed with the calculatePeakQualities() generic function.
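A minimal sketch of the calls that lead to console output like the messages shown next. It assumes that fList and fGroups are the features and featureGroups objects created in earlier workflow steps (e.g. with findFeatures and groupFeatures); they are not defined here.

```r
library(patRoon)

# fList and fGroups are assumed to exist from earlier workflow steps
fList <- calculatePeakQualities(fList)     # qualities and scores for all features
fGroups <- calculatePeakQualities(fGroups) # qualities and scores for features and feature groups
```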
#> Verifying if your data is centroided... Done!
#> Calculating feature peak qualities and scores...
#> Verifying if your data is centroided... Done!
#> Calculating feature peak qualities and scores...
#> Verifying if your data is centroided... Done!
#> Calculating group peak qualities and scores...
#> ================================================================================
Usually only the featureGroups method is used, unless you want to filter features (discussed below) prior to grouping.
As an extension, patRoon uses the qualities to calculate peak scores. The score for each quality measure is calculated by normalizing and scaling the values into a 0-1 range, where zero is the worst and one the best. Note that most scores are relative; hence, the values should only be used to compare features with each other. Finally, a totalScore is calculated, which sums all individual scores and serves as a rough overall score indicator for a feature (group).
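The idea of scaling qualities into scores can be illustrated with a small base R sketch. It is not the patRoon implementation: the min-max scaling, the helper scaleScore, the direction assumed for each quality and all numbers are assumptions made for illustration only.

```r
# Hypothetical raw quality values for four features (illustrative numbers only)
qualities <- data.frame(
    GaussianSimilarity = c(0.20, 0.55, 0.90, 0.75), # assumed: higher is better
    ZigZag = c(4.0, 1.0, 0.5, 2.5)                  # assumed: lower is better
)

# Hypothetical helper: min-max scale a vector into the 0-1 range and
# invert the result when lower raw values are better
scaleScore <- function(x, higherIsBetter = TRUE) {
    s <- (x - min(x)) / (max(x) - min(x))
    if (higherIsBetter) s else 1 - s
}

scores <- data.frame(
    GaussianSimilarityScore = scaleScore(qualities$GaussianSimilarity),
    ZigZagScore = scaleScore(qualities$ZigZag, higherIsBetter = FALSE)
)

# The total score is the sum of the individual scores of each feature
scores$totalScore <- rowSums(scores)
print(scores)
```

Because the scaling in this sketch depends on the minimum and maximum within the data set, the resulting scores only rank features relative to each other, which is why such scores should not be compared across data sets.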
The qualities and scores are easily obtained with the as.data.table() function.
#> GaussianSimilarityScore SharpnessScore TPASRScore ZigZagScore totalScore
#> <num> <num> <num> <num> <num>
#> 1: 0.6314046 3.443351e-02 0.9956949 0.9103221 6.302180
#> 2: 0.9633994 9.900530e-10 0.9944988 0.3565674 6.513205
#> 3: 0.3613087 7.565147e-10 0.8006569 0.9999449 5.651379
#> 4: 0.9151027 8.600747e-03 0.9405262 0.9637153 5.892201
#> 5: 0.3676623 1.000000e+00 0.9907657 0.8435805 5.825267
# the qualities argument is necessary to include the scores.
# valid values are: "quality", "score" or "both"
as.data.table(fGroups, qualities = "both")[1:5, 25:29]
#> TPASRScore ZigZagScore ElutionShiftScore RetentionTimeCorrelationScore totalScore
#> <num> <num> <num> <num> <num>
#> 1: 0.7305554 0.9962254 0.8421657 0.9955769 7.932541
#> 2: 0.0000000 0.9744541 0.9960804 0.7746038 6.029360
#> 3: 0.6140008 0.9171568 0.9015949 0.9776651 7.480675
#> 4: 0.8227904 0.8907734 0.9403958 0.9963785 8.451631
#> 5: 0.9848653 0.8667116 0.5754979 0.9984902 8.740135
The feature quality values can also be reviewed interactively in reports generated with report (see Reporting) and with checkFeatures (see here). The filter function can be used to filter out low-scoring features and feature groups:
# only keep features with at least 0.3 Modality score and 0.5 symmetry score
fList <- filter(fList, qualityRange = list(ModalityScore = c(0.3, Inf),
                                           SymmetryScore = c(0.5, Inf)))
# same as above
fGroups <- filter(fGroups, featQualityRange = list(ModalityScore = c(0.3, Inf),
                                                   SymmetryScore = c(0.5, Inf)))
# filter group averaged data
fGroups <- filter(fGroups, groupQualityRange = list(totalScore = c(0.5, Inf)))
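How such a range list selects features can be sketched in base R on a plain data frame. This only mimics the idea and does not call patRoon: the table, its values and the inclusive treatment of the bounds are assumptions for illustration.

```r
# Hypothetical score table for four features (illustrative numbers only)
scores <- data.frame(
    feature = c("f1", "f2", "f3", "f4"),
    ModalityScore = c(0.10, 0.45, 0.80, 0.35),
    SymmetryScore = c(0.90, 0.40, 0.70, 0.55)
)

# Same shape as the qualityRange argument: a named list of c(min, max) pairs
ranges <- list(ModalityScore = c(0.3, Inf), SymmetryScore = c(0.5, Inf))

# Keep a row only if every listed score lies within its range
# (bounds are assumed to be inclusive in this sketch)
keep <- Reduce(`&`, lapply(names(ranges), function(n) {
    scores[[n]] >= ranges[[n]][1] & scores[[n]] <= ranges[[n]][2]
}))
kept <- scores[keep, ]
print(kept$feature)
```

Using Inf as the upper bound means that only a minimum score is enforced.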
8.4.1 Applying machine learning with MetaClean
An important feature of MetaClean is that it can use the quality measures to train a machine learning model that automatically recognizes ‘good’ and ‘bad’ features. patRoon provides a few extensions to simplify training and using a model. Furthermore, while MetaClean was primarily designed to work with XCMS, the extensions of patRoon allow the usage of data from all the algorithms supported by patRoon.
The getMCTrainData function can be used to convert data from a feature check session to training data that can be used by MetaClean. This allows you to select good/bad peaks interactively. The workflow looks like this:
# untick the 'keep' checkbox for all 'bad' feature groups
checkFeatures(fGroupsTrain, "train_session.yml")
# get train data. This gives data comparable to MetaClean::getPeakQualityMetrics()
trainData <- getMCTrainData(fGroupsTrain, "train_session.yml")
# use the train data with MetaClean: MetaClean::runCrossValidation(),
# MetaClean::getEvaluationMeasures(), MetaClean::trainClassifier() etc
# --> see the MetaClean vignette for details
Once you have created a model with MetaClean it can be used with the predictCheckFeaturesSession() function:
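A minimal sketch of such a call. It assumes that model is a classifier trained with MetaClean as outlined above and that fGroups already contains peak qualities; the session file name is only an example.

```r
library(patRoon)

# 'model' is assumed to be a trained MetaClean classifier;
# "model_session.yml" is an illustrative file name for the new session
predictCheckFeaturesSession(fGroups, "model_session.yml", model)
```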
This will generate another check session file: all the feature groups that are considered good will have a ‘keep’ state, the others will not. As described elsewhere, the checkFeatures function is used to review the results from a session and the filter function can be used to remove unwanted feature groups. Note that calculatePeakQualities() must be called before getMCTrainData/predictCheckFeaturesSession can be used.
NOTE
MetaClean only predicts at the feature group level. Thus, only the kept feature groups from a feature check session will be used for training, and any individual features that were marked as removed will be ignored.