Properties for the sample analyses used in the workflow and utilities to automatically generate this information.

generateAnalysisInfo(
  paths,
  groups = "",
  blanks = "",
  concs = NULL,
  norm_concs = NULL,
  formats = MSFileFormats()
)

generateAnalysisInfoFromEnviMass(path)

Arguments

paths

A character vector containing one or more file paths that should be used for finding the analyses.

groups, blanks

An (optional) character vector containing replicate groups and blanks, respectively (will be recycled). If groups is an empty character string ("") the analysis name will be set as replicate group.

concs

An optional numeric vector containing concentration values for each analysis. Can be NA if unknown. If the length of concs is less than the number of analyses the remainders will be set to NA. Set to NULL to not include concentration data.

norm_concs

An optional numeric vector containing concentrations used for feature normalization (see the Feature intensity normalization section in the featureGroups documentation). NA values are allowed for analyses that should not be normalized (e.g. because no IS is present). If the length of norm_concs is less than the number of analyses the remainders will be set to NA. Set to NULL to not include normalization concentration data.

formats

A character vector of analysis file types to consider. Analyses not present in these formats will be ignored. For valid values see MSFileFormats.

path

The path of the enviMass project.

Details

In patRoon a sample analysis, or simply analysis, refers to a single MS analysis file (sometimes also called sample or file). The analysis information summarizes several properties for the analyses, and is used in various steps throughout the workflow, such as findFeatures, averaging intensities of feature groups and blank subtraction. This information should be in a data.frame, with the following columns:

  • path the full path to the directory of the analysis.

  • analysis the file name without extension. Must be unique, even if the path is different.

  • group name of replicate group. A replicate group is used to group analyses together that are replicates of each other. Thus, all analyses belonging to the same replicate group should have the same value in the group column, and this value should differ from that of every other replicate group. Used for e.g. averaging and filter.

  • blank the name of the replicate group whose analyses are used by the featureGroups method of filter for blank subtraction. Multiple replicate groups can be specified by separating them with commas.

  • conc a numeric value specifying the 'concentration' for the analysis. This can actually be any kind of numeric value, such as exposure time, dilution factor or anything else that may be used to form a linear relationship.

  • norm_conc a numeric value specifying the normalization concentration for the analysis. See the Feature intensity normalization section in the featureGroups documentation for more details.
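
As an illustration of the columns above, the analysis information can be built by hand with base R. This is only a minimal sketch: the directory, analysis names and group names are hypothetical, and the final checks merely mirror the requirements stated in this section rather than any validation that patRoon performs itself.

```r
# Hypothetical directory and analysis names, for illustration only.
anaInfo <- data.frame(
    path = "~/data/analyses",   # directory that contains the analysis files
    analysis = c("solvent-1", "solvent-2", "sample-A-1", "sample-A-2"),
    group = c("solvent", "solvent", "sample-A", "sample-A"),
    blank = "solvent",          # several groups could be given as "groupA,groupB"
    conc = c(NA, NA, 1, 2),     # NA where the value is unknown
    stringsAsFactors = FALSE
)

# Checks mirroring the requirements described in this section:
# analysis names must be unique and every analysis needs a non-empty group.
stopifnot(
    !anyDuplicated(anaInfo$analysis),
    all(nzchar(anaInfo$group))
)
```

The single path and blank values are recycled by data.frame over all four rows, so they only need to be written once when every analysis shares them.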

Most workflow steps work with the mzXML and mzML file formats. However, some algorithms only support one format (e.g. findFeaturesOpenMS, findFeaturesEnviPick) or a proprietary format (findFeaturesBruker). To mix such algorithms in the same workflow, the analyses should be present in all required formats within the same directory as specified by the path column.

Each analysis should only be specified once in the analysis information, even if multiple file formats are available. The path and analysis columns are internally used by patRoon to automatically find the path of analysis files with the required format.
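
The role of the path and analysis columns can be illustrated with a small base R sketch. The helper findAnalysisFile is hypothetical and is not how patRoon implements the lookup internally; it only shows how one path and analysis pair can resolve to files in several formats. The placeholder files are empty and exist purely for the demonstration.

```r
# Scratch directory with empty placeholder files (illustration only).
anaDir <- file.path(tempdir(), "lookup-demo")
dir.create(anaDir, showWarnings = FALSE)
invisible(file.create(file.path(
    anaDir, c("sample-1.mzML", "sample-1.mzXML", "sample-2.mzML")
)))

# Hypothetical helper: build the expected file path for one analysis and
# file extension, returning NA if no such file exists.
findAnalysisFile <- function(path, analysis, ext) {
    candidate <- file.path(path, paste0(analysis, ".", ext))
    if (file.exists(candidate)) candidate else NA_character_
}

findAnalysisFile(anaDir, "sample-1", "mzML")   # file is present
findAnalysisFile(anaDir, "sample-1", "mzXML")  # same analysis, other format
findAnalysisFile(anaDir, "sample-2", "mzXML")  # NA: only an mzML file exists
```

In this sketch sample-1 would be listed once in the analysis information, even though two files exist for it.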

The group column is mandatory and needs to be non-empty for each analysis. The blank column should also be present; however, it may be empty ("") for analyses where no blank subtraction should occur. The conc column is only required when regression information should be obtained with the as.data.table method. Similarly, the norm_conc column is only necessary for the normInts method.

generateAnalysisInfo is a utility function that automatically generates a data.frame with analysis information. It scans the directories specified by the paths argument for analyses and uses the results to fill in the analysis and path columns. Furthermore, this function correctly handles analyses that are available in multiple formats.
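
A possible call is sketched below, assuming the function signature shown in the usage above. The scratch directory, file names and group names are hypothetical, and the empty placeholder files only serve to give the function something to find; they are not valid MS data. The call is skipped when patRoon is not installed.

```r
# Scratch directory with empty placeholder files (illustration only).
anaDir <- file.path(tempdir(), "analyses-demo")
dir.create(anaDir, showWarnings = FALSE)
invisible(file.create(file.path(
    anaDir, c("blank-1.mzML", "sample-1.mzML", "sample-2.mzML")
)))

if (requireNamespace("patRoon", quietly = TRUE)) {
    # groups holds one entry per analysis; the single blanks value is recycled.
    anaInfo <- patRoon::generateAnalysisInfo(
        paths = anaDir,
        groups = c("blank", "sample", "sample"),
        blanks = "blank"
    )
    print(anaInfo)
}
```

The groups vector is assumed here to be matched to the analyses in the order in which they are found, so it is worth inspecting the printed table before using it in a workflow.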

generateAnalysisInfoFromEnviMass loads analysis information from an enviMass project. Note: this functionality has only been tested with older versions of enviMass.