9.11 Feature regression analysis

Basic support in patRoon is available to perform simple linear regression (using y=ax+b with a the slope b the intercept) on feature intensities vs given x values. The x values typically are experimental conditions such as sampling time or initial concentration of a parent in a degradation experiment. The linearity could then be used as a way to prioritize features (as performed in Helmus et al. (2025)). Originally, this functionality was implemented as a very basic method to perform rough calculations of concentrations. However, the next section describes a much better way by using the MS2Quant package.

The x-values should be specified as metadata in the analysis information, for instance:

# create analysis information: for demonstrative purposes we just base it on the example data
anaInfoRegr <- anaInfo
anaInfoRegr$experiment <- c("UV", "UV", "UV", "H2O2", "H2O2", "H2O2")
anaInfoRegr$exposure <- c(0, 30, 60, 0, 30, 60) # time in minutes
anaInfoRegr[, c("analysis", "experiment", "exposure")]
#>         analysis experiment exposure
#> 1  solvent-pos-1         UV        0
#> 2  solvent-pos-2         UV       30
#> 3  solvent-pos-3         UV       60
#> 4 standard-pos-1       H2O2        0
#> 5 standard-pos-2       H2O2       30
#> 6 standard-pos-3       H2O2       60

The x-values can be set to NA in case no experimental conditions are available; the regression properties will be calculated from all non-NA data.

The as.data.table() function (or as.data.frame()) can then be used to calculate regression data:

# use areas for regression calculation and make sure that feature data is reported
# the exposure metadata is used as x values, the experiment metadata is used to calculate the regression for analyses groups
# (only relevant columns are shown for clarity)
as.data.table(fGroupsRegr, areas = TRUE, features = TRUE,
              regression = "exposure", regressionBy = "experiment")[, .(analysis, group, x_reg, RSQ, intercept, slope, p)]
#>             analysis        group     x_reg        RSQ intercept     slope          p
#>               <char>       <char>     <num>      <num>     <num>     <num>      <num>
#>    1:  solvent-pos-1    M99_R14_1 -765.0052 0.99525396   4351423  5176.367 0.04389244
#>    2:  solvent-pos-2    M99_R14_1 -775.8074 0.99525396   4351423  5176.367 0.04389244
#>    3:  solvent-pos-3    M99_R14_1 -778.7245 0.99525396   4351423  5176.367 0.04389244
#>    4: standard-pos-1    M99_R14_1  710.7070 0.90164426   4814974 -6310.333 0.20308172
#>    5: standard-pos-2    M99_R14_1  716.6610 0.90164426   4814974 -6310.333 0.20308172
#>   ---                                                                                
#> 2922:  solvent-pos-2 M433_R10_680 -538.6387 0.99048530   2595803  3329.317 0.06219692
#> 2923:  solvent-pos-3 M433_R10_680 -534.8049 0.99048530   2595803  3329.317 0.06219692
#> 2924: standard-pos-1 M433_R10_680 1635.3435 0.06340353   3078297 -1303.883 0.83795459
#> 2925: standard-pos-2 M433_R10_680 1716.0164 0.06340353   3078297 -1303.883 0.83795459
#> 2926: standard-pos-3 M433_R10_680 1660.7139 0.06340353   3078297 -1303.883 0.83795459

The regressionBy argument in the above example is set to ensure that regression is calculated for each experiment separately. In sets workflows you can set regressionBy="set" to perform the calculations per set. The regressionBy argument can be omitted if the regression should be calculated for all analyses together. The x_reg output column stores the x values calculated by the regression model (only present if features=TRUE). Other regression properties such as the RSQ, slope and p can be used as a basic trend analysis for prioritization:

fGroupsTab <- as.data.table(fGroupsRegr, areas = TRUE, features = TRUE,
                            regression = "exposure", regressionBy = "experiment")
# subset features that appear to increase in intensity with longer UV exposure
increasingFGroups <- fGroupsRegr[, fGroupsTab[RSQ >= 0.8 & p < 0.05 & slope > 0, group]]

The plotInt() function can also work with regression data and is helpful for visualization purposes.

# plot regression for a specific feature group
plotInt(fGroupsRegr[, 13], areas = TRUE, regression = TRUE, xBy = "exposure",
        groupBy = "experiment", showLegend = TRUE)

For more details, see the reference manual for the feature groups methods for as.data.table() and plotInt() (?as.data.table and ?plotInt).

References

Helmus, Rick, Ingrida Bagdonaite, Pim de Voogt, et al. 2025. “Comprehensive Mass Spectrometry Workflows to Systematically Elucidate Transformation Processes of Organic Micropollutants: A Case Study on the Photodegradation of Four Pharmaceuticals.” Environmental Science & Technology 59 (7): 3723–36. https://doi.org/10.1021/acs.est.4c09121.