StreamFind

StreamFind is an R package for data management, processing, visualization and reporting. This guide uses mass spectrometry (MS) data as an example and is intended to instruct developers on how to implement new processing modules, as well as additional processing algorithms for new or existing processing methods, in StreamFind.

Setup

The R package is hosted in the StreamFind GitHub repository of the ODEA Project. For development, we recommend cloning the repository locally with git for version control. GitHub Desktop can be used to install and configure git with your GitHub account more easily, which is recommended for authoring contributions. Since StreamFind is an R package, the RStudio IDE is recommended for development, although other IDEs (e.g., VS Code) will also work.

When using RStudio, the repository can be cloned by creating a new project, selecting version control, then git, and finally adding the GitHub URL https://github.com/odea-project/StreamFind. This should create a local copy of the StreamFind repository with git tracking, provided that git or GitHub Desktop was properly installed and configured. RStudio should directly identify the project as a package development project, where all tools to support development are available. We recommend enabling the options Use devtools package functions if available and Generate documentation with Roxygen (via the Configure button, select all options), located in the Build tab under Configure Build Tools…. For other IDEs, we recommend using the devtools package.

Once the local copy of the StreamFind repository is set up with git tracking, the first step for development is to create a dedicated branch for the implementation of new processing modules and/or algorithms. The master branch should not be changed directly but modified through pull requests from the dedicated development branch, which gives the opportunity for code review.
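For IDEs other than RStudio, the devtools-based development cycle can be driven from the R console. The helper below is only a sketch: dev_cycle is a hypothetical name that is not part of StreamFind, and it assumes that devtools is installed and that the working directory is the root of the local StreamFind copy.

```r
# Hypothetical helper (not part of StreamFind) wrapping a typical devtools cycle.
dev_cycle <- function(pkg = ".") {
  devtools::load_all(pkg)   # load the package code from source
  devtools::document(pkg)   # regenerate the documentation with roxygen2
  devtools::test(pkg)       # run the unit tests of the package
}

# Usage, from the root of the local repository:
# dev_cycle()
```

Running such a cycle on the dedicated development branch before opening a pull request helps to catch documentation and test failures early.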

Structure

The StreamFind R package is centered around the R6 class system, which provides encapsulated object-oriented programming for R. For MS, the MassSpecEngine R6 class encapsulates both the data and the methods. The data is stored in private fields within the MassSpecEngine and can only be accessed and processed via the public methods. The creation of a MassSpecEngine object and the way to access and change data are briefly shown below.

ms <- MassSpecEngine$new()

ms$add_headers(name = "Example", author = "Person A")

ms$get_headers()
## 
##   ProjectHeaders 
##   file: NA
##   date: 2024-08-01 10:06:11.943092
##   name: Example
##   author: Person A
# print method. Note that MS data files were not yet added!
ms
## 
## MassSpecEngine
## name          Example
## author        Person A
## file          NA
## date          2024-08-01 10:06:11.943092
## 
## Workflow empty 
## 
## Analyses empty 
## 
## Results empty

For the implementation of processing methods, the S3 class ProcessingSettings is used to dispatch settings to processing methods. See the article Evaluation of Wastewater Ozonation with Mass Spectrometry for a demonstration of its usage. In essence, an algorithm to be applied in a processing method is added to the MassSpecEngine as a ProcessingSettings object, which is then used to process the data with the defined settings or parameters. The structure of a ProcessingSettings object is exemplified below for the algorithm openms, to be applied with the method MassSpecEngine$find_features().

## 
##  ProcessingSettings 
##  engine       MassSpec
##  call         FindFeatures
##  algorithm    openms
##  version      0.2.0
##  software     openms
##  developer    Oliver Kohlbacher
##  contact      oliver.kohlbacher@uni-tuebingen.de
##  link         https://openms.de/
##  doi          https://doi.org/10.1038/nmeth.3959
## 
##  parameters: 
##   -  noiseThrInt 1000 
##   -  chromSNR 3 
##   -  chromFWHM 7 
##   -  mzPPM 15 
##   -  reEstimateMTSD TRUE 
##   -  traceTermCriterion sample_rate 
##   -  traceTermOutliers 5 
##   -  minSampleRate 1 
##   -  minTraceLength 4 
##   -  maxTraceLength 70 
##   -  widthFiltering fixed 
##   -  minFWHM 4 
##   -  maxFWHM 35 
##   -  traceSNRFiltering TRUE 
##   -  localRTRange 0 
##   -  localMZRange 0 
##   -  isotopeFilteringModel none 
##   -  MZScoring13C FALSE 
##   -  useSmoothedInts FALSE 
##   -  extraOpts 
##   -  intSearchRTWindow 3 
##   -  useFFMIntensities FALSE 
##   -  verbose FALSE

The constructor of a ProcessingSettings object is always a function named [engine name in upper camel case]Settings_[method name in upper camel case]_[algorithm name] (e.g., MassSpecSettings_FindFeatures_openms for the settings shown above, stored here in the object ffs). More details are given in the Semantics section. The ProcessingSettings object can then be added directly to the MassSpecEngine.

ms$add_settings(ffs)

ms
## 
## MassSpecEngine
## name          Example
## author        Person A
## file          NA
## date          2024-08-01 10:06:11.943092
## 
## Workflow
##  1: FindFeatures (openms)
## 
## Analyses empty 
## 
## Results empty

Alternatively, the ProcessingSettings can be saved to a JSON file and imported from it, as demonstrated below.

save_default_ProcessingSettings(
  engine = "MassSpec",
  call = "FindFeatures",
  algorithm = "xcms3_centwave",
  name = "ffs",
  path = getwd()
)
jsonlite::prettify(readLines("ffs.json"))
## {
##     "engine": [
##         "MassSpec"
##     ],
##     "call": [
##         "FindFeatures"
##     ],
##     "algorithm": [
##         "xcms3_centwave"
##     ],
##     "parameters": {
##         "class": [
##             "CentWaveParam"
##         ],
##         "ppm": [
##             12
##         ],
##         "peakwidth": [
##             5,
##             60
##         ],
##         "snthresh": [
##             15
##         ],
##         "prefilter": [
##             5,
##             1500
##         ],
##         "mzCenterFun": [
##             "wMean"
##         ],
##         "integrate": [
##             1
##         ],
##         "mzdiff": [
##             -0.0002
##         ],
##         "fitgauss": [
##             true
##         ],
##         "noise": [
##             500
##         ],
##         "verboseColumns": [
##             true
##         ],
##         "roiList": [
## 
##         ],
##         "firstBaselineCheck": [
##             false
##         ],
##         "roiScales": [
## 
##         ],
##         "extendLengthMSW": [
##             false
##         ]
##     },
##     "version": [
##         "0.2.0"
##     ],
##     "software": [
##         "xcms"
##     ],
##     "developer": [
##         "Ralf Tautenhahn, Johannes Rainer"
##     ],
##     "contact": [
##         "rtautenh@ipb-halle.de"
##     ],
##     "link": [
##         "https://bioconductor.org/packages/release/bioc/html/xcms.html"
##     ],
##     "doi": [
##         "https://doi.org/10.1186/1471-2105-9-504"
##     ]
## }
## 
ms$import_settings("ffs.json")

# "openms" replaced by "xcms3_centwave"
ms
## 
## MassSpecEngine
## name          Example
## author        Person A
## file          NA
## date          2024-08-01 10:06:11.943092
## 
## Workflow
##  1: FindFeatures (xcms3_centwave)
## 
## Analyses empty 
## 
## Results empty
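The JSON layout above, where every value is wrapped in an array, matches the default behavior of jsonlite::toJSON() for R vectors. The following sketch is independent of StreamFind: it uses a hand-built list with a few of the fields above, which is an assumption about the underlying structure, to illustrate the round trip between a named list and JSON.

```r
library(jsonlite)

# Hand-built stand-in for a settings object (not created by StreamFind).
settings <- list(
  engine = "MassSpec",
  call = "FindFeatures",
  algorithm = "xcms3_centwave",
  parameters = list(ppm = 12, peakwidth = c(5, 60), fitgauss = TRUE)
)

# Without auto_unbox, length-one vectors are written as JSON arrays.
json <- toJSON(settings, pretty = TRUE)

# Reading the string back simplifies the arrays to R vectors again.
restored <- fromJSON(json)

restored$algorithm
restored$parameters$peakwidth
```

Because nested lists map to JSON objects and vectors map to arrays, parameter entries of different lengths and types can be stored in the same file.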

The use of the S3 object system for ProcessingSettings gives flexibility to the list of parameters: each parameter entry can be a single numeric value, a vector of strings or even a full data.frame if required. Each ProcessingSettings constructor (i.e., [engine name in upper camel case]Settings_[method name in upper camel case]_[algorithm name]) has a dedicated validation method to ensure that the parameters and metadata are valid (as shown below). A ProcessingSettings object is always validated before it is applied in a processing method.

## [1] TRUE
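What such a validation might check can be sketched in base R. The function validate_settings_sketch, the object ffs_sketch and the specific checks are assumptions for illustration only; the actual validation methods in StreamFind may be implemented differently.

```r
# Hypothetical validator (not the StreamFind implementation).
validate_settings_sketch <- function(x) {
  # All metadata entries needed for dispatch must be present.
  required <- c("engine", "call", "algorithm", "parameters")
  missing_fields <- setdiff(required, names(x))
  if (length(missing_fields) > 0) {
    warning("Missing entries: ", paste(missing_fields, collapse = ", "))
    return(FALSE)
  }

  # Parameters are kept in a list so entries can differ in type and length.
  if (!is.list(x$parameters)) {
    warning("parameters must be a list")
    return(FALSE)
  }

  # Example of an algorithm-specific check on a single parameter.
  ppm <- x$parameters$mzPPM
  if (!(is.numeric(ppm) && length(ppm) == 1 && !is.na(ppm) && ppm > 0)) {
    warning("mzPPM must be a single positive number")
    return(FALSE)
  }

  TRUE
}

# Hand-built stand-in for a settings object with two openms parameters.
ffs_sketch <- structure(
  list(
    engine = "MassSpec",
    call = "FindFeatures",
    algorithm = "openms",
    parameters = list(mzPPM = 15, chromSNR = 3)
  ),
  class = "ProcessingSettings"
)

validate_settings_sketch(ffs_sketch)
```

Returning a single logical value keeps the check cheap to run before every processing step.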

Besides the S3 class ProcessingSettings, the object receives other class names that are used for S3 method dispatch (i.e., to direct the object to the dedicated S3 method where the actual processing algorithm is applied). The classes of the ffs ProcessingSettings object are shown below. The class FindFeatures_patRoon means that the algorithm openms is applied via the package patRoon. The class MassSpecSettings_FindFeatures_openms directs the object to the right processing method and indicates which algorithm is to be applied. For this, an S3 generic is used in each processing method (e.g., MassSpecEngine$find_features() or MassSpecEngine$group_features()) for the dispatch. This process is not visible to the user but is essential for the developer. The implementation of new processing methods and/or algorithms must consider this structure. The Implementation section describes the process of adding new methods and algorithms in more detail.

class(ffs)
## [1] "ProcessingSettings"                  
## [2] "MassSpecSettings_FindFeatures_openms"
## [3] "FindFeatures_patRoon"
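The dispatch mechanism can be illustrated with a small base R sketch. The generic name follows the .s3_[module method] convention described in the Semantics section, but the function bodies and the object ffs_sketch are hypothetical stand-ins and do not reproduce the StreamFind code.

```r
# Generic: forwards the settings object to the method matching its class.
.s3_FindFeatures <- function(settings, ...) {
  UseMethod(".s3_FindFeatures")
}

# Method for the openms settings class (placeholder body for illustration).
.s3_FindFeatures.MassSpecSettings_FindFeatures_openms <- function(settings, ...) {
  paste0("openms with mzPPM = ", settings$parameters$mzPPM)
}

# Fallback when no algorithm-specific method is found.
.s3_FindFeatures.default <- function(settings, ...) {
  stop("No FindFeatures method for class: ",
       paste(class(settings), collapse = ", "))
}

# Hand-built object carrying the same class vector as ffs above.
ffs_sketch <- structure(
  list(algorithm = "openms", parameters = list(mzPPM = 15)),
  class = c(
    "ProcessingSettings",
    "MassSpecSettings_FindFeatures_openms",
    "FindFeatures_patRoon"
  )
)

# No method exists for "ProcessingSettings", so dispatch moves on to the
# second class in the vector and finds the openms method.
result <- .s3_FindFeatures(ffs_sketch)
result
```

With this pattern, supporting another algorithm only requires a new method for the corresponding settings class, while the generic stays unchanged.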

Semantics

The StreamFind R package aims for consistent semantics. Some of the class and method names were already mentioned above, where the use of underscores to separate words in method names and of Upper Camel Case for class names is visible. This section highlights the rules defined for the most important semantic aspects.

All methods available via the MassSpecEngine class are written with underscores to separate the words (e.g., get_analysis_names() or annotate_features()). The arguments of methods, functions and class constructors are always written in Lower Camel Case when more than one word is needed (e.g., colorBy or minIntensity). Classes are written in Upper Camel Case when two or more words are used (e.g., MassSpecAnalysis or ProjectHeaders). The exception is the constructor functions for the different algorithm settings, which use the syntax [engine name]Settings_[method name]_[algorithm name] (e.g., MassSpecSettings_FilterFeatures_StreamFind or RamanSettings_BinSpectra_StreamFind). This facilitates the association of the settings with the respective engine and processing method.

Functions or methods not available to the user (i.e., not exported via the package NAMESPACE) start with a dot (.) and use underscores to separate words (e.g., .get_colors() or .plot_spectra_ms2_static()). This also applies to the S3 generics of the processing modules, which use the syntax .s3_[module method] (e.g., .s3_FindFeatures or .s3_GroupFeatures).
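The naming rules for settings constructors and S3 generics can be expressed as simple string templates. The two helper functions below are hypothetical, are not part of StreamFind, and serve only to make the patterns explicit.

```r
# Hypothetical helpers illustrating the naming rules (not part of StreamFind).

# [engine name]Settings_[method name]_[algorithm name]
settings_constructor_name <- function(engine, method, algorithm) {
  paste0(engine, "Settings_", method, "_", algorithm)
}

# .s3_[module method]
s3_generic_name <- function(method) {
  paste0(".s3_", method)
}

settings_constructor_name("MassSpec", "FilterFeatures", "StreamFind")
settings_constructor_name("Raman", "BinSpectra", "StreamFind")
s3_generic_name("GroupFeatures")
```

The three calls reproduce the example names given in the text above.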

Files

The file structure of the StreamFind package is in line with the official CRAN package development guidelines. All files relevant for the developer are in the folders R, src, man-roxygen, tests and vignettes. The R folder contains the R scripts; the src folder contains the C++ libraries and the Rcpp interface functions; the man-roxygen folder contains the templates for the documentation of arguments; the tests folder contains the unit tests that should be applied to each processing method and each algorithm implementation; and the vignettes folder contains articles, tutorials and guides for the users.

R file names in the R folder follow a defined naming syntax according to their content or function. Class files are named class_[class type in capitals]_[class name].R. Files with exported MS functions are named fct_[optional associated engine]_[unique name].R. Files with utility functions that are not exported are named utils_[unique name].R. S3 methods for processing modules are placed in files named methods_S3_[engine name]_[method name]_[algorithm].R. ProcessingSettings constructors for a given engine are placed in a file named class_S3_[engine name]Settings.R.
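Applied to a single algorithm, the file naming syntax yields two paths inside the R folder. The helper below is a hypothetical sketch that only builds these paths from the patterns described above; it does not check whether the files exist in the repository.

```r
# Hypothetical helper (not part of StreamFind) that builds the expected paths.
algorithm_files <- function(engine, method, algorithm) {
  c(
    # class_S3_[engine name]Settings.R holds the settings constructors.
    constructor = file.path("R", paste0("class_S3_", engine, "Settings.R")),
    # methods_S3_[engine name]_[method name]_[algorithm].R holds the S3 method.
    s3_method = file.path(
      "R",
      paste0("methods_S3_", engine, "_", method, "_", algorithm, ".R")
    )
  )
}

algorithm_files("MassSpec", "FindFeatures", "openms")
```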

Implementation

The implementation of a new processing method and the implementation of a new algorithm for an existing processing method differ in how much of the package they affect. While adding a new processing method requires changing the main MassSpecEngine class and adding a new S3 generic, adding a new algorithm for an existing module does not require changes to the existing files. Therefore, their implementation is described in two separate sections.

New methods

Under definition.

New algorithms

Under definition.