8.9 Feature regression analysis

patRoon offers basic support for simple linear regression of feature intensities against a given experimental condition. Examples of such conditions are dilution factor, sampling time or the initial concentration of a parent in a degradation experiment. By testing whether the response is significantly linear, features of interest can be isolated in a relatively easy way. Originally, this functionality was implemented as a very basic method to perform rough calculations of concentrations. However, the next section describes a much better way of doing this with the MS2Quant package. Regardless, this functionality still uses ‘concentrations’ as terminology for the experimental conditions of interest. The conditions are specified in the conc column of the analysis information, for instance:

# obtain analysis information as usual, but add some experimental parameters of interest (dilution, time etc).
# The blanks are set to NA, whereas the standards are set to increasing levels.
anaInfo <- generateAnalysisInfo(paths = patRoonData::exampleDataPath(),
                                groups = c(rep("solvent", 3), rep("standard", 3)),
                                blanks = "solvent",
                                concs = c(NA, NA, NA, 1, 2, 3))

If no experimental condition is known for a particular analysis, its conc value should be NA. For these analyses the experimental condition will be calculated using the regression model obtained from the other analyses.
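The idea of estimating a missing condition from the regression model can be sketched in base R. This is only an illustration of the principle and not patRoon's actual implementation: the peak areas are invented, and the names dat, fit and concReg are hypothetical.

```r
# Hypothetical peak areas of one feature group in five analyses.
# The last two analyses have no known condition (conc = NA).
dat <- data.frame(
    conc = c(1, 2, 3, NA, NA),
    area = c(1000, 2100, 2900, 1500, 2500)
)

# Fit area vs conc; with R's default na.action (na.omit), lm() ignores
# the rows where conc is NA, so only the known analyses build the model.
fit <- lm(area ~ conc, data = dat)
intercept <- unname(coef(fit)[1])
slope <- unname(coef(fit)[2])
RSQ <- summary(fit)$r.squared

# Invert the linear model to estimate the condition for every analysis,
# including those without a known conc value.
concReg <- (dat$area - intercept) / slope
```

In this sketch the estimates for the analyses with a known condition will generally differ slightly from the values that were set, since the fitted line does not pass exactly through each point.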

The as.data.table() function (or as.data.frame()) can then be used to calculate regression data:

# use areas for quantitation and make sure that feature data is reported
# (only relevant columns are shown for clarity)
as.data.table(fGroups, areas = TRUE, features = TRUE, regression = TRUE)

#>              group  conc        RSQ intercept  slope   conc_reg
#>             <char> <num>      <num>     <num>  <num>      <num>
#>   1:  M109_R192_20     1 0.71446367 193338.67  -4928  1.3649892
#>   2:  M109_R192_20     2 0.71446367 193338.67  -4928  1.2700216
#>   3:  M109_R192_20     3 0.71446367 193338.67  -4928  3.3649892
#>   4:  M111_R330_23     1 0.08902714  85338.67   -370 -0.8468468
#>   5:  M111_R330_23     2 0.08902714  85338.67   -370  5.6936937
#>  ---                                                           
#> 419: M407_R239_672     2 0.99560719 210036.00 -11734  2.0767002
#> 420: M407_R239_672     3 0.99560719 210036.00 -11734  2.9616499
#> 421: M425_R319_676     1 0.46488086 193198.67  10896  1.6194322
#> 422: M425_R319_676     2 0.46488086 193198.67  10896  0.7611356
#> 423: M425_R319_676     3 0.46488086 193198.67  10896  3.6194322

The calculated experimental conditions are stored in the conc_reg column (this column is only present if features = TRUE). The table also contains other regression data such as RSQ, intercept and slope. For basic trend analysis, the RSQ (i.e. R squared) can be used:

fGroupsTab <- as.data.table(fGroups, areas = TRUE, features = FALSE, regression = TRUE)
# subset fGroups with reasonable correlation
increasingFGroups <- fGroups[, fGroupsTab[RSQ >= 0.8, group]]
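Note that RSQ only measures how well the data follow a straight line and not the direction of the trend. A minimal sketch of how the sign of the slope could be used to separate increasing from decreasing trends is given below. It uses a mock table in base R: the group names and values are invented, and only the column names (group, RSQ, slope) follow the output shown earlier.

```r
# Mock regression table (invented values, hypothetical group names)
regTab <- data.frame(
    group = c("fg_A", "fg_B", "fg_C", "fg_D"),
    RSQ = c(0.71, 0.09, 0.99, 0.95),
    slope = c(-4928, -370, -11734, 10896),
    stringsAsFactors = FALSE
)

# Hypothetical threshold, same value as used in the example above
minRSQ <- 0.8

# Good linearity AND positive slope: increasing trend
increasing <- regTab$group[regTab$RSQ >= minRSQ & regTab$slope > 0]

# Good linearity AND negative slope: decreasing trend
decreasing <- regTab$group[regTab$RSQ >= minRSQ & regTab$slope < 0]
```

With a real regression table, the same kind of condition (for instance RSQ >= 0.8 & slope > 0) could be used in the subsetting expression of the example above.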