4.3 Sample analyses
In patRoon a sample analysis, or analysis, refers to a single HRMS measurement of a sample. The raw data for an analysis is typically stored in different file types and file formats, which are discussed in the next section. The analysis information tells patRoon which analyses should be processed and where to find their raw data, and it is also used to store any other metadata. The data pre-treatment section describes how to convert and prepare the raw data.
4.3.1 Analysis file types and formats and the msdata interface
In patRoon a distinction is made between four types of raw data files:
- raw: the original raw data files from the HRMS instrument, with formats such as .raw (Thermo, Waters) or .d (Bruker, Agilent).
- centroid: exported and centroided data files in the mzML or mzXML format.
- profile: exported but not centroided (i.e. profile) data files in the mzML or mzXML format.
- ims: exported ion mobility HRMS data files in the mzML format.
Unfortunately, algorithms within the workflow may require different file types/formats, so it is often necessary to convert raw data to one or more other file types/formats. However, for ‘classical’ (non-IMS) workflows it is usually sufficient to convert raw data to centroid data in the mzML format. In patRoon the choice of file type and format is primarily based on:
- The feature detection algorithm that is used. An overview of requirements is listed in the feature detection section.
- The internal code of patRoon that processes raw data, which is called the msdata interface.
The msdata interface is used throughout many operations within patRoon, such as loading mass spectra for feature annotation and generating extracted ion chromatograms (EICs) for plotting and reporting data. The msdata interface itself supports different backends, each of which supports different file types and formats. By default the most suitable backend is chosen automatically, depending on the available raw data and on which backends are available on your system. The currently supported backends are:
| Backend | Supported file types and formats |
|---|---|
| "opentims" | Uses OpenTIMS for highly efficient reading of raw Bruker TIMS data (only available on Windows and Linux). Requires the Bruker TDF-SDK, see the Installation chapter. |
| "mzr" | Uses mzR to read centroid files in the mzML and mzXML formats. This backend was always used before patRoon 3.0. |
| "mstoolkit" | Uses mstoolkit to read ims files (mzML format) and centroid files in the mzML and mzXML formats. Requires the Rmstoolkitlib R package (see the Installation chapter). |
| "streamcraft" | Uses StreamCraft to read ims files (mzML format) and centroid files in the mzML and mzXML formats. |
NOTE The piek feature detection algorithm uses the msdata interface directly and therefore supports a wide range of raw data file types and formats.
See the reference manual for more details on the msdata interface (?msdata).
4.3.2 Analysis information
In patRoon, the analysis information describes the analyses that are to be processed, where they are located and holds any metadata such as replicate information. The analysis information should be a data.frame and is often stored in a variable called anaInfo (of course you are free to choose a different name!).
The analysis information table has a few mandatory columns:
- path_raw, path_centroid, path_profile, path_ims: the directory paths to the analyses of the raw, centroid, profile and ims file types, respectively. See the previous section for details on the file types. Leave empty if the file type is not present.
- analysis: the name of the analysis. This should be the file name without file extension and without directory path (e.g. C:\MyAnalysis\sample1.d becomes sample1). Each value in the analysis column must be unique.
- replicate: the replicate to which the analysis belongs. Analyses that are replicates of each other get the same replicate name.
- blank: which replicate should be used for blank subtraction. Can be left empty if no subtraction is desired.
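To illustrate the mandatory columns, below is a minimal base R sketch that builds an analysis information table by hand. The file paths, replicate names and the variable name anaInfoManual are hypothetical. The analysis names are derived from the file names with tools::file_path_sans_ext() and basename(). The sanity checks at the end are only illustrative and are not the checks that patRoon itself performs.

```r
# Hypothetical raw data files (Bruker .d directories); replace with your own paths
rawFiles <- c("C:/MyAnalysis/blank-1.d", "C:/MyAnalysis/blank-2.d",
              "C:/MyAnalysis/sample-1.d", "C:/MyAnalysis/sample-2.d")

anaInfoManual <- data.frame(
    # analysis name: file name without directory path and without extension
    analysis = tools::file_path_sans_ext(basename(rawFiles)),
    path_raw = dirname(rawFiles),
    path_centroid = "", # no centroided files available (yet)
    path_profile = "",
    path_ims = "",
    replicate = c("blank", "blank", "sample", "sample"),
    blank = "blank" # all analyses use the "blank" replicate for blank subtraction
)

# Illustrative checks: all mandatory columns present and unique analysis names
mandatoryCols <- c("path_raw", "path_centroid", "path_profile", "path_ims",
                   "analysis", "replicate", "blank")
stopifnot(all(mandatoryCols %in% names(anaInfoManual)),
          anyDuplicated(anaInfoManual$analysis) == 0)
print(anaInfoManual)
```

Forward slashes are used in the paths so that basename() and dirname() split them correctly on all platforms.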
If a workflow requires multiple file formats of the same file type, e.g. centroided mzML and mzXML files, then simply store both file formats in the directory specified in the corresponding path_XXX column. If data needs to be exported (discussed in the next section), assign its destination directory to the respective path_XXX column.
The analysis information table can be constructed manually in R (e.g. by importing a CSV file), through a graphical interface with newProject() (discussed previously) or automatically with the generateAnalysisInfo() function. Here is an example of the latter for the example data in the patRoonData package:
# Take example data from patRoonData package (triplicate solvent blank + triplicate standard)
generateAnalysisInfo(fromCentroid = patRoonData::exampleDataPath(),
replicate = c(rep("solvent-pos", 3), rep("standard-pos", 3)),
blank = "solvent-pos")
#> analysis path_centroid path_raw path_profile path_ims replicate blank
#> 1 solvent-pos-1 /usr/local/lib/R/site-library/patRoonData/extdata/pos solvent-pos solvent-pos
#> 2 solvent-pos-2 /usr/local/lib/R/site-library/patRoonData/extdata/pos solvent-pos solvent-pos
#> 3 solvent-pos-3 /usr/local/lib/R/site-library/patRoonData/extdata/pos solvent-pos solvent-pos
#> 4 standard-pos-1 /usr/local/lib/R/site-library/patRoonData/extdata/pos standard-pos solvent-pos
#> 5 standard-pos-2 /usr/local/lib/R/site-library/patRoonData/extdata/pos standard-pos solvent-pos
#> 6 standard-pos-3 /usr/local/lib/R/site-library/patRoonData/extdata/pos standard-pos solvent-pos
(Note that for the example data the patRoonData::exampleAnalysisInfo() function can also be used.)
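As mentioned above, the analysis information can also be prepared outside R, for instance in a spreadsheet, and then imported from a CSV file. The following base R sketch round-trips a hypothetical table through a temporary CSV file. It reads every column as text, which is a choice made here and not a patRoon requirement. Without it, read.csv() would guess completely empty columns (such as an unused path_profile) as logical NA columns instead of keeping them as empty strings.

```r
# Hypothetical analysis information, e.g. as prepared in a spreadsheet
anaInfoCSV <- data.frame(
    analysis = c("solvent-1", "sample-1"),
    path_raw = "", path_centroid = "centroid_files", path_profile = "", path_ims = "",
    replicate = c("solvent", "sample"),
    blank = "solvent"
)

# Write it to a temporary CSV file (stand-in for a user-provided file)
csvFile <- tempfile(fileext = ".csv")
write.csv(anaInfoCSV, csvFile, row.names = FALSE)

# Read all columns as character, so that empty path columns stay "" instead of NA
anaInfoRead <- read.csv(csvFile, colClasses = "character")
str(anaInfoRead)
```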
It is possible to add more columns to the analysis information: these can be used to attach additional metadata to each sample analysis. These columns can be added later to the table, or specified directly to generateAnalysisInfo():
# As above, but add some (nonsensical) metadata: location and exposure
generateAnalysisInfo(fromCentroid = patRoonData::exampleDataPath(),
replicate = c(rep("solvent-pos", 3), rep("standard-pos", 3)),
blank = "solvent-pos",
location = c("NL", "NL", "NL", "DE", "DE", "DE"),
exposure = c(0, 0, 0, 2, 2, 2))
#> analysis path_centroid path_raw path_profile path_ims replicate blank location exposure
#> 1 solvent-pos-1 /usr/local/lib/R/site-library/patRoonData/extdata/pos solvent-pos solvent-pos NL 0
#> 2 solvent-pos-2 /usr/local/lib/R/site-library/patRoonData/extdata/pos solvent-pos solvent-pos NL 0
#> 3 solvent-pos-3 /usr/local/lib/R/site-library/patRoonData/extdata/pos solvent-pos solvent-pos NL 0
#> 4 standard-pos-1 /usr/local/lib/R/site-library/patRoonData/extdata/pos standard-pos solvent-pos DE 2
#> 5 standard-pos-2 /usr/local/lib/R/site-library/patRoonData/extdata/pos standard-pos solvent-pos DE 2
#> 6 standard-pos-3 /usr/local/lib/R/site-library/patRoonData/extdata/pos standard-pos solvent-pos DE 2
The metadata (location and exposure in the example above) can be used in various ways later in the workflow to process the non-target data.
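Metadata columns can also be added after the table has been created. Below is a minimal base R sketch on a simplified, hypothetical table, with the path columns omitted for brevity. It shows two options: assigning a column directly in row order, or looking up values by analysis name from a separate table with match(). The name siteTable is hypothetical. match() is used here because it keeps the row order of the analysis information, whereas merge() re-sorts rows by the key by default.

```r
# Simplified, hypothetical analysis information (path columns omitted for brevity)
anaInfoMeta <- data.frame(
    analysis = c("solvent-pos-1", "solvent-pos-2", "standard-pos-1"),
    replicate = c("solvent-pos", "solvent-pos", "standard-pos"),
    blank = "solvent-pos"
)

# Option 1: add a metadata column directly (one value per analysis, in row order)
anaInfoMeta$exposure <- c(0, 0, 2)

# Option 2: look up metadata from a separate table keyed by analysis name
siteTable <- data.frame(analysis = c("standard-pos-1", "solvent-pos-1", "solvent-pos-2"),
                        location = c("DE", "NL", "NL"))
anaInfoMeta$location <- siteTable$location[match(anaInfoMeta$analysis, siteTable$analysis)]
print(anaInfoMeta)
```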
See the reference manual for more details on the analysis information and generateAnalysisInfo() (?`analysis-information`).
4.3.3 Data conversion and pre-treatment
As noted in the previous sections, analyses are typically stored in different file types and formats, and algorithms in the workflow often support only some of these. Hence, file conversion is frequently required.
The convertMSFiles() function supports various algorithms to perform the necessary file conversions:
| Algorithm | Usage | Input file types and formats | Output file types and formats | Remarks |
|---|---|---|---|---|
| ProteoWizard | convertMSFiles(algorithm = "pwiz", ...) | all types and formats | all except raw | Most popular and versatile converter. |
| OpenMS | convertMSFiles(algorithm = "openms", ...) | centroid and profile (mzML and mzXML) | centroid and profile (mzML and mzXML) | Does not support centroiding. |
| DataAnalysis | convertMSFiles(algorithm = "bruker", ...) | raw (Bruker .d) | centroid and profile (mzML and mzXML) | |
| IMS collapse | convertMSFiles(algorithm = "imscollapse", ...) | raw (Bruker TIMS) and ims (mzML) (uses msdata) | centroid (mzML and mzXML) | Omits MS/MS data by default. |
| TIMSCONVERT | convertMSFiles(algorithm = "timsconvert", ...) | raw (Bruker TIMS) | centroid, profile, ims (mzML) | |
NOTE For the conversion of IMS to centroided data it is highly recommended to use the IMS collapse or TIMSCONVERT algorithms, as ProteoWizard currently does not support accurate centroiding of IMS data. For the conversion of Agilent IMS data, ProteoWizard can be used to convert the raw .d files to ims (mzML) files, and subsequently IMS collapse can be used to convert these to centroided files.
The convertMSFiles() function uses the analysis information to locate the input files and the destination paths for the output. The path_XXX columns should contain the desired destination directories for those file types that should be exported. For instance:
anaInfoConv <- data.frame(
analysis = c("sample1", "sample2"),
replicate = "replicate",
blank = "",
path_raw = "raw_files", # directory containing the raw HRMS instrument files (.d, .raw, ...)
path_centroid = "centroid_files" # destination directory where the centroided files will be placed
)
anaInfoConv
#> analysis replicate blank path_raw path_centroid
#> 1 sample1 replicate raw_files centroid_files
#> 2 sample2 replicate raw_files centroid_files
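Instead of writing a new table, an existing analysis information table can also be prepared for export by assigning the destination directory to the relevant path_XXX column. The following base R sketch does this for a simplified, hypothetical table and then summarizes which file type paths are set for each analysis. The variable names are illustrative only. The overview matrix is a convenience, not something convertMSFiles() requires.

```r
# Simplified, hypothetical analysis information with only raw data available
anaInfoExport <- data.frame(
    analysis = c("sample1", "sample2"),
    replicate = "replicate",
    blank = "",
    path_raw = "raw_files",
    path_centroid = "",
    path_profile = "",
    path_ims = ""
)

# Request export of centroided data by setting the destination directory
anaInfoExport$path_centroid <- "centroid_files"

# Overview: TRUE where a path is set for a file type
pathCols <- c("path_raw", "path_centroid", "path_profile", "path_ims")
pathSet <- as.matrix(anaInfoExport[pathCols]) != ""
rownames(pathSet) <- anaInfoExport$analysis
print(pathSet)
```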
The convertMSFiles() function then takes the analysis information and performs the necessary conversions:
# Convert Thermo raw files to centroided mzML files
convertMSFiles(anaInfo, typeFrom = "raw", formatFrom = "thermo", typeTo = "centroid", formatTo = "mzML",
               algorithm = "pwiz")
# Convert TIMS data to LC-MS-like centroided data
convertMSFiles(anaInfo, typeFrom = "raw", formatFrom = "bruker_ims", typeTo = "centroid", formatTo = "mzML",
               algorithm = "timsconvert")
# Convert Agilent IMS-HRMS data to ims data in mzML format
convertMSFiles(anaInfo, typeFrom = "raw", formatFrom = "agilent_ims", typeTo = "ims", formatTo = "mzML",
               algorithm = "pwiz")
# ... and then use IMS collapse to convert these to LC-MS-like centroided mzML files
convertMSFiles(anaInfo, typeFrom = "ims", formatFrom = "mzML", typeTo = "centroid", formatTo = "mzML",
               algorithm = "imscollapse")
The newProject() utility can automatically generate a proper analysis information table and the required code to perform the desired file conversions.
NOTE The IMS collapse algorithm omits MS/MS data by default to save space and speed up file conversion. This algorithm is typically used in post-mobility assignment IMS workflows, which do not use MS/MS data from centroided files. Set includeMSMS = TRUE to include MS/MS data.
Besides conversion, other types of data pre-treatment may also need to be performed. For instance, ProteoWizard can be used to apply various data filters, and several utility functions exist to apply mass re-calibration of Bruker data files.
# Use ProteoWizard to perform conversion and apply a filter to only keep MS1 data
# See http://proteowizard.sourceforge.net/tools/msconvert.html for supported filters
convertMSFiles(anaInfo, typeFrom = "raw", formatFrom = "thermo", typeTo = "centroid", formatTo = "mzML",
algorithm = "pwiz", filters = "msLevel 1")
# perform m/z re-calibration of Bruker data (should be performed prior to file conversion!)
# NOTE: this requires Bruker DataAnalysis
setDAMethod(anaInfo, "path/to/DAMethod.m") # configure Bruker files with given method that has automatic calibration configured
recalibrarateDAFiles(anaInfo) # trigger re-calibration for each analysis
getDACalibrationError(anaInfo) # get calibration error for each analysis
Please see the reference manual for more details (?convertMSFiles, ?`bruker-utils`).