5.6 MS similarity

Spectral similarity quantifies how alike the spectra of different features are. It is calculated with the spectrumSimilarity function, which operates on MS peak lists and accepts the following arguments:

Argument                  Remarks
MSPeakLists               The MS peak lists object from which peak list data should be taken.
groupName1, groupName2    The name(s) of the first and second feature group(s) to compare.
analysis1, analysis2      The analysis names of the data to be compared. Set this when feature data (instead of feature group data) should be compared.
MSLevel                   The MS level: 1 or 2 for MS and MS/MS, respectively.
specSimParams             Parameters that define how similarities are calculated.
NAToZero                  If TRUE then NA values are converted to zeros. NA values are reported if a comparison cannot be made because of missing peak list data.

The specSimParams argument defines the parameters for similarity calculations. It is a list, and the default values are obtained with the getDefSpecSimParams() function:

getDefSpecSimParams()
#> $method
#> [1] "cosine"
#> 
#> $removePrecursor
#> [1] FALSE
#> 
#> $mzWeight
#> [1] 0
#> 
#> $intWeight
#> [1] 1
#> 
#> $absMzDev
#> [1] 0.005
#> 
#> $relMinIntensity
#> [1] 0.05
#> 
#> $minPeaks
#> [1] 1
#> 
#> $shift
#> [1] "none"
#> 
#> $setCombineMethod
#> [1] "mean"

The method field selects the similarity measure: either "cosine" or "jaccard".
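To make the difference between the two measures concrete, the following base R sketch computes both for two toy peak lists. It is only an illustration under simplifying assumptions, not the calculation performed by spectrumSimilarity: the peak values and the helper names (matchPeaks, cosineSim, jaccardSim) are hypothetical, peaks are matched to the nearest peak within an absolute m/z tolerance, and intensity filtering and weighting are ignored.

```r
# Two toy peak lists (hypothetical values)
spec1 <- data.frame(mz = c(50.010, 77.040, 105.030), intensity = c(100, 40, 10))
spec2 <- data.frame(mz = c(50.012, 91.050, 105.031), intensity = c(80, 50, 20))

# For each peak in 'a', return the index of the closest peak in 'b' if it lies
# within the tolerance, otherwise NA. (Simplification: two peaks of 'a' could
# in principle be matched to the same peak of 'b'.)
matchPeaks <- function(a, b, absMzDev = 0.005) {
    vapply(a$mz, function(m) {
        d <- abs(b$mz - m)
        i <- which.min(d)
        if (d[i] <= absMzDev) i else NA_integer_
    }, integer(1))
}

# Cosine: dot product of matched intensities, normalised by both vector lengths.
# Unmatched peaks contribute zero to the dot product.
cosineSim <- function(a, b, absMzDev = 0.005) {
    idx <- matchPeaks(a, b, absMzDev)
    bAligned <- ifelse(is.na(idx), 0, b$intensity[idx])
    sum(a$intensity * bAligned) /
        (sqrt(sum(a$intensity^2)) * sqrt(sum(b$intensity^2)))
}

# Jaccard: number of shared peaks divided by the number of distinct peaks;
# intensities are not used.
jaccardSim <- function(a, b, absMzDev = 0.005) {
    nMatched <- sum(!is.na(matchPeaks(a, b, absMzDev)))
    nMatched / (nrow(a) + nrow(b) - nMatched)
}

cosSim <- cosineSim(spec1, spec2)
jacSim <- jaccardSim(spec1, spec2) # 2 shared peaks out of 4 distinct peaks
```

In this sketch the cosine measure is dominated by the intense shared peaks, whereas the Jaccard measure only counts how many peaks are shared.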

The shift field is primarily useful when comparing MS/MS data and defines if and how a spectral shift should be performed prior to similarity calculation:

  • "none": The default, no shifting is performed.
  • "precursor": The mass difference between the precursor masses of the two spectra (i.e. the feature masses) is first calculated. This difference is then subtracted from each of the mass peaks of the second spectrum. This shifting increases similarity if the MS fragmentation process itself occurs similarly (i.e. if both features show similar neutral losses).
  • "both": This combines the two strategies above: peaks that have the same mass are aligned first (as with "none"), and the precursor strategy is then applied to the remaining mass peaks. This shifting method yields higher similarities if either fragment masses or neutral losses are similar.
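The effect of the three shift settings can be sketched in base R by counting how many fragment peaks of the first spectrum find a partner in the second. This is a simplified illustration under assumed values, not patRoon's implementation: the masses and the helpers hasMatch and shiftByPrecursor are hypothetical, and only peak matching (not the similarity score itself) is shown.

```r
# Hypothetical precursor and fragment masses of two features that share
# one fragment (91.05) and two neutral losses (18 and 50)
precursor1 <- 200; precursor2 <- 214
frags1 <- c(182, 150, 91.05)
frags2 <- c(196, 164, 91.05)

# TRUE for each mass in 'a' that has a partner in 'b' within the tolerance
hasMatch <- function(a, b, absMzDev = 0.005) {
    vapply(a, function(m) any(abs(b - m) <= absMzDev), logical(1))
}

# Subtract the precursor mass difference from the peaks of the second spectrum
shiftByPrecursor <- function(mz2, precursorMz1, precursorMz2) {
    mz2 - (precursorMz2 - precursorMz1)
}

# "none": only identical fragment masses match
nNone <- sum(hasMatch(frags1, frags2))

# "precursor": only fragments with identical neutral losses match
nPrecursor <- sum(hasMatch(frags1, shiftByPrecursor(frags2, precursor1, precursor2)))

# "both": match identical masses first, then shift the remaining peaks
direct1 <- hasMatch(frags1, frags2)
direct2 <- hasMatch(frags2, frags1)
restShifted <- shiftByPrecursor(frags2[!direct2], precursor1, precursor2)
nBoth <- sum(direct1) + sum(hasMatch(frags1[!direct1], restShifted))
```

With these toy values the shared fragment is only found without shifting, the shared neutral losses are only found after the precursor shift, and the combined strategy finds all three.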

To override a default setting, pass it as a named argument to getDefSpecSimParams:

getDefSpecSimParams(shift = "both")
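The override pattern can be mimicked in base R with modifyList. The function below (defParamsSketch) is a hypothetical stand-in that copies the default values printed above. It is assumed, not confirmed here, that getDefSpecSimParams behaves comparably when several fields are overridden at once.

```r
# Hypothetical stand-in for getDefSpecSimParams(), using the defaults shown above
defParamsSketch <- function(...) {
    defaults <- list(method = "cosine", removePrecursor = FALSE,
                     mzWeight = 0, intWeight = 1, absMzDev = 0.005,
                     relMinIntensity = 0.05, minPeaks = 1,
                     shift = "none", setCombineMethod = "mean")
    # Named arguments replace the matching defaults; all other fields are kept
    modifyList(defaults, list(...))
}

params <- defParamsSketch(method = "jaccard", shift = "both")
str(params[c("method", "shift", "absMzDev")])
```

Only the named fields change, so the remaining parameters keep their default values.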

For more details on the various similarity calculation parameters see the reference manual (?getDefSpecSimParams).

Some examples are shown below:

# similarity between MS spectra with default parameters
spectrumSimilarity(mslists, groupName1 = "M120_R268_30", groupName2 = "M137_R249_53")
#> [1] 0.4088499
# similarity between MS/MS spectra with default parameters
spectrumSimilarity(mslists, groupName1 = "M120_R268_30", groupName2 = "M192_R355_191",
                   MSLevel = 2)
#> [1] 0.08589848
# As above, with jaccard calculation
spectrumSimilarity(mslists, groupName1 = "M120_R268_30", groupName2 = "M192_R355_191",
                   MSLevel = 2, specSimParams = getDefSpecSimParams(method = "jaccard"))
#> [1] 0.1111111
# With shifting
spectrumSimilarity(mslists, groupName1 = "M120_R268_30", groupName2 = "M192_R355_191",
                   MSLevel = 2, specSimParams = getDefSpecSimParams(shift = "both"))
#> [1] 0.08589848

The spectrumSimilarity function can also calculate multiple similarities at once, in which case the results are returned as a matrix. To do so, specify multiple feature group names for the groupName1 and groupName2 arguments. Alternatively, to compare a set of feature groups with each other, pass their names as the groupName1 argument only:

# compare two pairs
spectrumSimilarity(mslists,
                   groupName1 = c("M120_R268_30", "M137_R249_53"),
                   groupName2 = c("M146_R309_68", "M192_R355_191"),
                   MSLevel = 2, specSimParams = getDefSpecSimParams(shift = "both"))
#>              M146_R309_68 M192_R355_191
#> M120_R268_30     0.520052    0.08589848
#> M137_R249_53     0.197720    0.03372542
# compare all
spectrumSimilarity(mslists, groupName1 = groupNames(mslists),
                   MSLevel = 2, specSimParams = getDefSpecSimParams(shift = "both"))
#>               M120_R268_30 M137_R249_53 M146_R309_68 M192_R355_191
#> M120_R268_30    1.00000000   0.20406381   0.52005204    0.08589848
#> M137_R249_53    0.20406381   1.00000000   0.19772004    0.03372542
#> M146_R309_68    0.52005204   0.19772004   1.00000000    0.08524785
#> M192_R355_191   0.08589848   0.03372542   0.08524785    1.00000000
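
A similarity matrix like the one above can be processed further with base R. The sketch below is one possible follow-up and not part of the patRoon workflow described here: it re-enters the printed values by hand, looks up the most similar pair of feature groups, and clusters the groups hierarchically using 1 - similarity as an assumed distance measure.

```r
# Values copied from the "compare all" output above
grps <- c("M120_R268_30", "M137_R249_53", "M146_R309_68", "M192_R355_191")
sims <- matrix(c(1.00000000, 0.20406381, 0.52005204, 0.08589848,
                 0.20406381, 1.00000000, 0.19772004, 0.03372542,
                 0.52005204, 0.19772004, 1.00000000, 0.08524785,
                 0.08589848, 0.03372542, 0.08524785, 1.00000000),
               nrow = 4, byrow = TRUE, dimnames = list(grps, grps))

# Most similar pair: ignore the diagonal (self-comparisons)
offDiag <- sims
diag(offDiag) <- NA
best <- which(offDiag == max(offDiag, na.rm = TRUE), arr.ind = TRUE)[1, ]
bestPair <- c(rownames(sims)[best["row"]], colnames(sims)[best["col"]])

# Hierarchical clustering with 1 - similarity as the distance (an assumption)
hc <- hclust(as.dist(1 - sims), method = "average")
clusters <- cutree(hc, k = 2)
```

The choice of average linkage and of two clusters is arbitrary here and would need to be adapted to the data at hand.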