5.1 Inspecting results
Several generic functions can be used to inspect the data stored in a workflow object (e.g. features, compounds etc.):
| Generic | Classes | Remarks |
|---|---|---|
| length() | All | Returns the length of the object (e.g. number of features, compounds etc). |
| algorithm() | All | Returns the name of the algorithm used to generate the object. |
| groupNames() | All | Returns all the unique identifiers (or names) of the feature groups for which this object contains results. |
| names() | featureGroups, components | Returns names of the feature groups (similar to groupNames()) or components. |
| show() | All | Prints general information. |
| "[[" / "$" operators | All | Extract general information, see below. |
| as.data.table() / as.data.frame() | All | Convert data to a data.table or data.frame, see below. |
| analysisInfo(), analyses(), replicateGroups() | features, featureGroups | Returns the analysis information, analyses or replicate groups for which this object contains data. |
| groupInfo() | featureGroups | Returns feature group information (m/z and retention time values). |
| screenInfo() | featureGroupsScreening | Returns information on hits from suspect screening. |
| componentInfo() | components | Returns information for all components. |
| annotatedPeakList() | formulas, compounds | Returns a table with annotated mass peaks (see below). |
The common R extraction operators "[[" and "$" can be used to obtain data for a particular feature group, analysis etc.:
#> NULL
#> [1] 264836 245372 216560
#> [1] 264836
#> ID mz intensity precursor
#> <int> <num> <num> <lgcl>
#> 1: 5 105.0698 6183.111 FALSE
#> 2: 6 106.0653 7643.556 FALSE
#> 3: 8 107.0728 7760.667 FALSE
#> 4: 15 120.0556 168522.667 TRUE
#> 5: 17 121.0587 13894.667 FALSE
#> 6: 18 121.0884 10032.889 FALSE
#> 7: 19 122.0964 147667.778 FALSE
#> 8: 20 123.0803 36631.111 FALSE
#> 9: 21 123.0996 15482.444 FALSE
#> 10: 22 124.0805 35580.667 FALSE
#> neutral_formula ion_formula neutralMass ion_formula_mz error dbe isoScore
#> <char> <char> <num> <num> <num> <num> <num>
#> 1: C6H5N3 C6H6N3 119.0483 120.0556 1.8 6 0.92461
#> explainedPeaks score neutralMass SMILES
#> <int> <num> <num> <char>
#> 1: 0 2.9919045 119.0483 C1=CC2=NNN=C2C=C1
#> 2: 0 1.2504308 119.0483 C1=CNC2=CN=CN=C21
#> 3: 0 1.2336169 119.0483 C1=CC2=C(N=C1)N=CN2
#> 4: 0 1.2079701 119.0483 C1=CC2=C(C=NN2)N=C1
#> 5: 0 1.1511570 119.0483 C1=CN2C(=CC=N2)N=C1
#> ---
#> 37: 0 0.9541662 119.0483 CC1=CN=C(N=C1)C#N
#> 38: 0 0.9535093 119.0483 CC1=NC(=NC=C1)C#N
#> 39: 0 0.9499092 119.0483 CC1=NN=C(C=C1)C#N
#> 40: 0 0.8128595 119.0483 C1=CC(=[N+]=[N-])C=CC1=N
#> 41: 0 0.7438038 119.0483 C(C#N)C(CC#N)C#N
#> group ret mz isogroup isonr charge
#> <char> <num> <num> <num> <num> <num>
#> 1: M143_R206_64 205.787 143.0700 NA NA NA
#> 2: M159_R208_103 208.280 159.0650 NA NA NA
#> 3: M161_R208_104 207.582 161.0806 NA NA NA
#> 4: M181_R209_159 208.580 181.0469 NA NA NA
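To make the accessor semantics more concrete, the following is a minimal sketch of how "[[" and "$" methods can behave for an S4 object. The class `toyGroups` and its `intensities` slot are hypothetical and are not part of patRoon; the sketch only mimics the pattern of selecting a feature group by name and, optionally, a single analysis. The three intensity values are copied from the output above, and the analysis names are assumed.

```r
library(methods)

# Hypothetical toy class, NOT a patRoon class: it only stores an intensity
# matrix with analyses in rows and feature groups in columns.
setClass("toyGroups", representation(intensities = "matrix"))

# x[[group]] returns the intensities of a group in all analyses,
# x[[analysis, group]] returns the value for a single analysis.
setMethod("[[", c("toyGroups", "ANY", "ANY"), function(x, i, j, ...) {
    if (missing(j)) {
        unname(x@intensities[, i])
    } else {
        unname(x@intensities[i, j])
    }
})

# x$group is shorthand for x[[group]]
setMethod("$", "toyGroups", function(x, name) x[[name]])

# Analysis names are assumed for illustration
toy <- new("toyGroups", intensities = matrix(
    c(264836, 245372, 216560), ncol = 1,
    dimnames = list(paste0("standard-pos-", 1:3), "M120_R268_30")
))

toy$M120_R268_30         # intensities in all three analyses
toy[[1, "M120_R268_30"]] # only the first analysis
```

The real workflow classes hold far more data; the point here is only that a name selects one feature group and an extra index narrows the result to one analysis.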
A more sophisticated way to obtain data from a workflow object is to use as.data.table() or as.data.frame(). These functions will convert all information within the object to a table (data.table or data.frame) and allow various options to add extra information. An advantage is that this common data format can be used with many other functions within R. The output is in a tidy format.
NOTE If you are not familiar with data.table and want to know more, see the data.table documentation. Briefly, this is a more efficient and largely compatible alternative to the regular data.frame.
NOTE The as.data.frame() methods defined in patRoon simply convert the results from as.data.table(). Hence, both functions are used in the same way and are defined for the same object classes.
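Because the converted results are ordinary tables, they can be processed with regular data.table (or data.frame) syntax. The sketch below does not call patRoon: it builds a small toy table by hand, with a few rows and columns copied from the feature table output in this section, and then filters and summarizes it. The object names `feat`, `late` and `summ` are made up for illustration.

```r
library(data.table)

# Toy table in the same long (tidy) layout as a converted feature table:
# one row per feature, one column per property. Values are copied from
# the feature table output shown in this section.
feat <- data.table(
    analysis = c("solvent-pos-1", "solvent-pos-1", "solvent-pos-1",
                 "standard-pos-3", "standard-pos-3"),
    ret = c(13.176, 7.181, 192.178, 318.892, 9.114),
    mz = c(98.97537, 100.11197, 100.11211, 425.18866, 427.03242),
    intensity = c(391476, 426956, 750532, 232636, 114744)
)

# Keep only features with a retention time above 10
late <- feat[ret > 10]

# Number of features and maximum intensity per analysis
summ <- feat[, .(features = .N, maxIntensity = max(intensity)), by = analysis]
print(summ)

# The same data as a classic data.frame, for functions that expect one
df <- as.data.frame(feat)
```

The same operations should work on the tables returned for real workflow objects, since only the column names differ between object types.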
Some typical examples are shown below.
#> analysis ID ret mz area intensity
#> <char> <char> <num> <num> <num> <num>
#> 1: solvent-pos-1 f_13874967465629495319 13.176 98.97537 4345232.0 391476
#> 2: solvent-pos-1 f_3429221869721733112 7.181 100.11197 797112.1 426956
#> 3: solvent-pos-1 f_10712640495102120078 192.178 100.11211 9609998.0 750532
#> 4: solvent-pos-1 f_13186422858338701184 19.171 100.11217 5784411.0 370376
#> 5: solvent-pos-1 f_10077122379300272915 4.786 100.11220 551723.6 567312
#> ---
#> 2922: standard-pos-3 f_13932917677741629648 318.892 425.18866 666531.5 232636
#> 2923: standard-pos-3 f_7861422627515437412 9.114 427.03242 362024.1 114744
#> 2924: standard-pos-3 f_14988550005726610931 318.892 427.18678 200193.5 77768
#> 2925: standard-pos-3 f_14478742590299124657 382.682 432.23984 217612.9 97648
#> 2926: standard-pos-3 f_17057406572265204390 9.114 433.00457 3086864.0 912920
# Returns group info and intensity values for each feature group
as.data.table(fGroups, average = TRUE) # average intensities for replicates
#> group ret mz standard-pos
#> <char> <num> <num> <num>
#> 1: M109_R192_20 191.8717 109.0759 183482.67
#> 2: M111_R330_23 330.4078 111.0439 84598.67
#> 3: M114_R269_25 268.6906 114.0912 85796.00
#> 4: M116_R317_29 316.7334 116.0527 766888.00
#> 5: M120_R268_30 268.4078 120.0554 242256.00
#> ---
#> 137: M316_R363_635 363.4879 316.1741 89904.00
#> 138: M318_R349_638 349.1072 318.1450 83320.00
#> 139: M352_R335_664 334.9403 352.2019 74986.67
#> 140: M407_R239_672 239.3567 407.2227 186568.00
#> 141: M425_R319_676 319.4944 425.1885 214990.67
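As a quick check of what averaging does, the value for M120_R268_30 in the averaged table above (242256.00) can be reproduced by hand from the three intensities printed for this feature group earlier in this section. The sketch below assumes that those three values are the replicates of the standard-pos replicate group; it uses only base R and does not call patRoon.

```r
# Intensities of M120_R268_30 in the three analyses (from the earlier output)
intensity <- c(264836, 245372, 216560)

# Assumed replicate group assignment: all three analyses belong to standard-pos
replicateGroup <- rep("standard-pos", length(intensity))

# Mean intensity per replicate group
avg <- tapply(intensity, replicateGroup, mean)
avg[["standard-pos"]] # 242256
```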
# As above, but with suspect matches on separate rows and additional screening information
# (select some columns to simplify the output below)
as.data.table(fGroupsSusp, average = TRUE, collapseSuspects = NULL,
              onlyHits = TRUE)[, c("group", "susp_name", "susp_compRank", "susp_annSimBoth", "susp_estIDLevel")]
#> group susp_name susp_compRank susp_annSimBoth susp_estIDLevel
#> <char> <char> <int> <num> <char>
#> 1: M120_R268_30 1H-benzotriazole 1 0.0000000 4b
#> 2: M137_R249_53 N-Phenyl urea 1 0.6443557 3a
#> 3: M146_R309_68 2-Hydroxyquinoline 2 0.9896892 3a
#> 4: M146_R248_69 2-Hydroxyquinoline NA NA 5
#> 5: M146_R225_70 2-Hydroxyquinoline NA NA 5
#> group type ID mz intensity precursor
#> <char> <char> <int> <num> <num> <lgcl>
#> 1: M120_R268_30 MS 1 100.1120 178952.381 FALSE
#> 2: M120_R268_30 MS 2 102.1277 202359.667 FALSE
#> 3: M120_R268_30 MS 3 114.0912 37647.548 FALSE
#> 4: M120_R268_30 MS 4 115.0752 66685.238 FALSE
#> 5: M120_R268_30 MS 5 120.0554 113335.857 TRUE
#> ---
#> 235: M192_R355_191 MS 51 299.1274 44083.126 FALSE
#> 236: M192_R355_191 MS 52 299.1471 7390.267 FALSE
#> 237: M192_R355_191 MSMS 14 119.0496 588372.444 FALSE
#> 238: M192_R355_191 MSMS 18 120.0524 70273.333 FALSE
#> 239: M192_R355_191 MSMS 31 192.1384 71978.667 TRUE
# Returns all formula candidates for each feature group with scoring
# information, neutral loss etc
as.data.table(formulas)[, 1:6]
#> group neutral_formula ion_formula neutralMass ion_formula_mz error
#> <char> <char> <char> <num> <num> <num>
#> 1: M120_R268_30 C6H5N3 C6H6N3 119.0483 120.0556 1.80000000
#> 2: M137_R249_53 C7H8N2O C7H9N2O 136.0637 137.0709 2.90000000
#> 3: M146_R309_68 C9H7NO C9H8NO 145.0528 146.0600 1.66666667
#> 4: M192_R355_191 C12H17NO C12H18NO 191.1310 192.1383 0.03333333
# Returns all compound candidates for each feature group with scoring and other metadata
as.data.table(compounds)[, 1:4]
#> group explainedPeaks score neutralMass
#> <char> <int> <num> <num>
#> 1: M120_R268_30 0 2.991905 119.0483
#> 2: M120_R268_30 0 1.250431 119.0483
#> 3: M120_R268_30 0 1.233617 119.0483
#> 4: M120_R268_30 0 1.207970 119.0483
#> 5: M120_R268_30 0 1.151157 119.0483
#> ---
#> 288: M192_R355_191 1 1.367332 191.1310
#> 289: M192_R355_191 1 1.367220 191.1310
#> 290: M192_R355_191 1 1.366424 191.1310
#> 291: M192_R355_191 1 1.364403 191.1310
#> 292: M192_R355_191 1 1.363116 191.1310
# Returns table with all components (including feature group info, annotations etc)
as.data.table(components)[, 1:6]
#> name cmp_ret cmp_retsd neutral_mass analysis size
#> <char> <num> <num> <char> <char> <int>
#> 1: CMP1 347.2914 0.0000000 <NA> standard-pos-2 2
#> 2: CMP1 347.2914 0.0000000 <NA> standard-pos-2 2
#> 3: CMP2 349.6328 4.6804985 225.1589/188.20157 standard-pos-3 6
#> 4: CMP2 349.6328 4.6804985 225.1589/188.20157 standard-pos-3 6
#> 5: CMP2 349.6328 4.6804985 225.1589/188.20157 standard-pos-3 6
#> ---
#> 88: CMP29 313.3475 0.3105035 <NA> standard-pos-2 3
#> 89: CMP29 313.3475 0.3105035 <NA> standard-pos-2 3
#> 90: CMP30 268.3430 0.3840764 81.08705 standard-pos-1 3
#> 91: CMP30 268.3430 0.3840764 81.08705 standard-pos-1 3
#> 92: CMP30 268.3430 0.3840764 81.08705 standard-pos-1 3
Finally, the annotatedPeakList() function is useful to inspect annotation results for a formula or compound candidate:
# formula annotations for the first formula candidate of feature group M137_R249_53
annotatedPeakList(formulas, index = 1, groupName = "M137_R249_53",
                  MSPeakLists = mslists)
#> ID mz intensity precursor ion_formula dbe ion_formula_mz error neutral_loss annotated
#> <int> <num> <num> <lgcl> <char> <num> <num> <num> <char> <lgcl>
#> 1: 2 94.06500 9406.111 FALSE C6H8N 3.5 94.06513 1.30 CHNO TRUE
#> 2: 6 98.97522 2212.000 FALSE <NA> NA NA NA <NA> FALSE
#> 3: 7 105.06971 1662.111 FALSE <NA> NA NA NA <NA> FALSE
#> 4: 14 120.04434 7176.222 FALSE C7H6NO 5.5 120.04439 0.40 H3N TRUE
#> 5: 19 122.07222 2246.000 FALSE <NA> NA NA NA <NA> FALSE
#> 6: 21 135.08004 1565.556 FALSE <NA> NA NA NA <NA> FALSE
#> 7: 23 137.07039 5348.667 TRUE C7H9N2O 4.5 137.07094 3.35 TRUE
#> 8: 24 137.09572 2026.889 FALSE <NA> NA NA NA <NA> FALSE
#> 9: 26 138.09116 12356.667 FALSE <NA> NA NA NA <NA> FALSE
#> 10: 27 139.07503 5020.667 FALSE <NA> NA NA NA <NA> FALSE
# compound annotation for first candidate of feature group M137_R249_53
annotatedPeakList(compounds, index = 1, groupName = "M137_R249_53",
                  MSPeakLists = mslists)
#> ID mz intensity precursor ion_formula ion_formula_MF neutral_loss score annotated
#> <int> <num> <num> <lgcl> <char> <char> <char> <num> <lgcl>
#> 1: 2 94.06500 9406.111 FALSE C6H8N [C6H6N+H]+H+ CHNO 405 TRUE
#> 2: 6 98.97522 2212.000 FALSE <NA> <NA> <NA> NA FALSE
#> 3: 7 105.06971 1662.111 FALSE <NA> <NA> <NA> NA FALSE
#> 4: 14 120.04434 7176.222 FALSE C7H6NO [C7H6NO]+ H3N 305 TRUE
#> 5: 19 122.07222 2246.000 FALSE <NA> <NA> <NA> NA FALSE
#> 6: 21 135.08004 1565.556 FALSE <NA> <NA> <NA> NA FALSE
#> 7: 23 137.07039 5348.667 TRUE <NA> <NA> <NA> NA FALSE
#> 8: 24 137.09572 2026.889 FALSE <NA> <NA> <NA> NA FALSE
#> 9: 26 138.09116 12356.667 FALSE <NA> <NA> <NA> NA FALSE
#> 10: 27 139.07503 5020.667 FALSE <NA> <NA> <NA> NA FALSE
More advanced examples for these functions are shown below.
# Feature table, can also be accessed by numeric index
fList[[1]]
mslists[["standard-pos-1", "M120_R268_30"]] # feature data (instead of feature group averaged)
formulas[[1, "M120_R268_30"]] # feature data (if available, i.e. calculateFeatures=TRUE)
components[["CMP1", 1]] # only for first feature group in component
as.data.frame(fList) # classic data.frame format, works for all objects
as.data.table(fGroups) # return non-averaged intensities (default)
as.data.table(fGroups, features = TRUE) # include feature information
as.data.table(mslists, averaged = FALSE) # peak lists for each feature
as.data.table(mslists, fGroups = fGroups) # add feature group information
as.data.table(formulas, countElements = c("C", "H")) # include C/H counts (e.g. for van Krevelen plots)
# add various information for organic matter characterization (common elemental
# counts/ratios, classifications etc)
as.data.table(formulas, OM = TRUE)
as.data.table(compounds, fGroups = fGroups) # add feature group information
as.data.table(compounds, fragments = TRUE) # include information of all annotated fragments
annotatedPeakList(formulas, index = 1, groupName = "M120_R268_30",
                  MSPeakLists = mslists, onlyAnnotated = TRUE) # only include annotated peaks
annotatedPeakList(compounds, index = 1, groupName = "M120_R268_30",
                  MSPeakLists = mslists, formulas = formulas) # include formula annotations
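The element counts mentioned above are typically used to derive ratios such as H/C and O/C, which form the axes of a van Krevelen diagram. The sketch below illustrates that calculation in base R for the four neutral formulas shown earlier in this section. The helper `countElement()` is hypothetical and is not how patRoon counts elements; it only handles simple formula strings without brackets, isotopes or charges.

```r
# Hypothetical helper (not part of patRoon): count the atoms of one element in
# a simple formula string such as "C7H8N2O".
countElement <- function(formula, element) {
    # Split into element/count tokens, e.g. "C7", "H8", "N2", "O"
    tokens <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
    symbols <- sub("[0-9]*$", "", tokens)   # element symbol of each token
    counts <- sub("^[A-Za-z]+", "", tokens) # trailing digits of each token
    counts[counts == ""] <- "1"             # no digits means a single atom
    sum(as.integer(counts)[symbols == element])
}

# Neutral formulas taken from the formula table shown earlier
forms <- c("C6H5N3", "C7H8N2O", "C9H7NO", "C12H17NO")

vk <- data.frame(
    neutral_formula = forms,
    C = vapply(forms, countElement, integer(1), element = "C", USE.NAMES = FALSE),
    H = vapply(forms, countElement, integer(1), element = "H", USE.NAMES = FALSE),
    O = vapply(forms, countElement, integer(1), element = "O", USE.NAMES = FALSE)
)

# Ratios used as the axes of a van Krevelen diagram
vk$HC <- vk$H / vk$C
vk$OC <- vk$O / vk$C
print(vk)
```

Plotting `vk$HC` against `vk$OC`, for instance with `plot(vk$OC, vk$HC)`, then gives a basic van Krevelen style scatter plot.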