Evaluation of Wastewater Ozonation with Mass Spectrometry

Introduction

In this article we demonstrate how StreamFind can be used to evaluate ozonation of secondary wastewater effluent (i.e., effluent of the aerated biological treatment) using mass spectrometry (MS). A set of 18 mzML files are used, representing blank, influent and effluent measurements in triplicate for both positive and negative ionization modes.

basename(files)

 [1] "01_tof_ww_is_neg_blank-r001.mzML"        
 [2] "01_tof_ww_is_neg_blank-r002.mzML"        
 [3] "01_tof_ww_is_neg_blank-r003.mzML"        
 [4] "01_tof_ww_is_pos_blank-r001.mzML"        
 [5] "01_tof_ww_is_pos_blank-r002.mzML"        
 [6] "01_tof_ww_is_pos_blank-r003.mzML"        
 [7] "02_tof_ww_is_neg_influent-r001.mzML"     
 [8] "02_tof_ww_is_neg_influent-r002.mzML"     
 [9] "02_tof_ww_is_neg_influent-r003.mzML"     
[10] "02_tof_ww_is_pos_influent-r001.mzML"     
[11] "02_tof_ww_is_pos_influent-r002.mzML"     
[12] "02_tof_ww_is_pos_influent-r003.mzML"     
[13] "03_tof_ww_is_neg_o3sw_effluent-r001.mzML"
[14] "03_tof_ww_is_neg_o3sw_effluent-r002.mzML"
[15] "03_tof_ww_is_neg_o3sw_effluent-r003.mzML"
[16] "03_tof_ww_is_pos_o3sw_effluent-r001.mzML"
[17] "03_tof_ww_is_pos_o3sw_effluent-r002.mzML"
[18] "03_tof_ww_is_pos_o3sw_effluent-r003.mzML"

The showcase will use the StreamFind MassSpecEngine, which encapsulates all tools required for parsing, storing, processing and visualizing MS data. Note that not all methods/functions will be shown as the demonstration focuses of the workflow to assess wastewater ozonation. Other processing methods are available in the StreamFind package and can be found in the StreamFind reference documentation.

MassSpecEngine

The R6 MassSpecEngine class object is created using MassSpecEngine$new(), as shown below. The analyses argument can be used to add the set of ms files directly. In this demonstration, we use mzML files. However, the original ms vendor files can also be used, but they will be converted to mzML format using msConvert from ProteoWizard in the background.

# Creates a MassSpecEngine from mzML files
ms <- MassSpecEngine$new(analyses = files)

# Prints in console a summary of the MassSpecEngine
ms


MassSpecEngine
File: 
NA

Headers:name: NA
author: NA
date: 2024-11-28 12:52:24.50869


Workflow: 
empty

Analyses: 
                               analysis                      replicate  blank
                                 <char>                         <char> <char>
 1:         01_tof_ww_is_neg_blank-r001         01_tof_ww_is_neg_blank   <NA>
 2:         01_tof_ww_is_neg_blank-r002         01_tof_ww_is_neg_blank   <NA>
 3:         01_tof_ww_is_neg_blank-r003         01_tof_ww_is_neg_blank   <NA>
 4:         01_tof_ww_is_pos_blank-r001         01_tof_ww_is_pos_blank   <NA>
 5:         01_tof_ww_is_pos_blank-r002         01_tof_ww_is_pos_blank   <NA>
 6:         01_tof_ww_is_pos_blank-r003         01_tof_ww_is_pos_blank   <NA>
 7:      02_tof_ww_is_neg_influent-r001      02_tof_ww_is_neg_influent   <NA>
 8:      02_tof_ww_is_neg_influent-r002      02_tof_ww_is_neg_influent   <NA>
 9:      02_tof_ww_is_neg_influent-r003      02_tof_ww_is_neg_influent   <NA>
10:      02_tof_ww_is_pos_influent-r001      02_tof_ww_is_pos_influent   <NA>
11:      02_tof_ww_is_pos_influent-r002      02_tof_ww_is_pos_influent   <NA>
12:      02_tof_ww_is_pos_influent-r003      02_tof_ww_is_pos_influent   <NA>
13: 03_tof_ww_is_neg_o3sw_effluent-r001 03_tof_ww_is_neg_o3sw_effluent   <NA>
14: 03_tof_ww_is_neg_o3sw_effluent-r002 03_tof_ww_is_neg_o3sw_effluent   <NA>
15: 03_tof_ww_is_neg_o3sw_effluent-r003 03_tof_ww_is_neg_o3sw_effluent   <NA>
16: 03_tof_ww_is_pos_o3sw_effluent-r001 03_tof_ww_is_pos_o3sw_effluent   <NA>
17: 03_tof_ww_is_pos_o3sw_effluent-r002 03_tof_ww_is_pos_o3sw_effluent   <NA>
18: 03_tof_ww_is_pos_o3sw_effluent-r003 03_tof_ww_is_pos_o3sw_effluent   <NA>
         type polarity spectra chromatograms concentration
       <char>   <char>   <num>         <num>         <num>
 1: MS/MS-DDA negative    1914             0            NA
 2: MS/MS-DDA negative    1932             0            NA
 3: MS/MS-DDA negative    1941             0            NA
 4: MS/MS-DDA positive    2504             0            NA
 5: MS/MS-DDA positive    2491             0            NA
 6: MS/MS-DDA positive    2504             0            NA
 7: MS/MS-DDA negative    1984             0            NA
 8: MS/MS-DDA negative    1995             0            NA
 9: MS/MS-DDA negative    1993             0            NA
10: MS/MS-DDA positive    2131             0            NA
11: MS/MS-DDA positive    2126             0            NA
12: MS/MS-DDA positive    2124             0            NA
13: MS/MS-DDA negative    1909             0            NA
14: MS/MS-DDA negative    1921             0            NA
15: MS/MS-DDA negative    1924             0            NA
16: MS/MS-DDA positive    2046             0            NA
17: MS/MS-DDA positive    2052             0            NA
18: MS/MS-DDA positive    2048             0            NA

ProjectHeaders

Project headers (e.g., name, author and description) can be added to the MassSpecEngine$headers as a list object as shown below. The elements of the list can be anything but must have length one.

# Adds headers to the MassSpecEngine
ms$headers <- list(
  name = "Wastewater Ozonation Showcase",
  author = "Ricardo Cunha",
  description = "Demonstration project"
)

# Gets and shows the headers
show(ms$headers)

name: Wastewater Ozonation Showcase
author: Ricardo Cunha
description: Demonstration project
date: 2024-11-28 12:52:24.90631

# Gets the MassSpecEngine date
ms$headers$date

[1] "2024-11-28 12:52:24 CET"

Replicates and blanks

The analysis replicate names and the associated blank replicate name can be amended in the MassSpecEngine, as shown below. Alternatively, a data.frame with column names file, replicate and blank could be added as the analyses argument in MassSpecEngine$new(analyses = files) to have directly the replicate and blank replicate names assigned (more details here).

# Character vector with analysis replicate names
rpls <- c(
  rep("blank_neg", 3),
  rep("blank_pos", 3),
  rep("influent_neg", 3),
  rep("influent_pos", 3),
  rep("effluent_neg", 3),
  rep("effluent_pos", 3)
)

# Character vector with associated blank replicate names
# Note that the order should match the respective replicate
blks <- c(
  rep("blank_neg", 3),
  rep("blank_pos", 3),
  rep("blank_neg", 3),
  rep("blank_pos", 3),
  rep("blank_neg", 3),
  rep("blank_pos", 3)
)

# Amends replicate and blank names
ms$add_replicate_names(rpls)
ms$add_blank_names(blks)

# Replicates and blanks were amended
ms$analyses$info[, 1:5]

                               analysis    replicate     blank      type
                                 <char>       <char>    <char>    <char>
 1:         01_tof_ww_is_neg_blank-r001    blank_neg blank_neg MS/MS-DDA
 2:         01_tof_ww_is_neg_blank-r002    blank_neg blank_neg MS/MS-DDA
 3:         01_tof_ww_is_neg_blank-r003    blank_neg blank_neg MS/MS-DDA
 4:         01_tof_ww_is_pos_blank-r001    blank_pos blank_pos MS/MS-DDA
 5:         01_tof_ww_is_pos_blank-r002    blank_pos blank_pos MS/MS-DDA
 6:         01_tof_ww_is_pos_blank-r003    blank_pos blank_pos MS/MS-DDA
 7:      02_tof_ww_is_neg_influent-r001 influent_neg blank_neg MS/MS-DDA
 8:      02_tof_ww_is_neg_influent-r002 influent_neg blank_neg MS/MS-DDA
 9:      02_tof_ww_is_neg_influent-r003 influent_neg blank_neg MS/MS-DDA
10:      02_tof_ww_is_pos_influent-r001 influent_pos blank_pos MS/MS-DDA
11:      02_tof_ww_is_pos_influent-r002 influent_pos blank_pos MS/MS-DDA
12:      02_tof_ww_is_pos_influent-r003 influent_pos blank_pos MS/MS-DDA
13: 03_tof_ww_is_neg_o3sw_effluent-r001 effluent_neg blank_neg MS/MS-DDA
14: 03_tof_ww_is_neg_o3sw_effluent-r002 effluent_neg blank_neg MS/MS-DDA
15: 03_tof_ww_is_neg_o3sw_effluent-r003 effluent_neg blank_neg MS/MS-DDA
16: 03_tof_ww_is_pos_o3sw_effluent-r001 effluent_pos blank_pos MS/MS-DDA
17: 03_tof_ww_is_pos_o3sw_effluent-r002 effluent_pos blank_pos MS/MS-DDA
18: 03_tof_ww_is_pos_o3sw_effluent-r003 effluent_pos blank_pos MS/MS-DDA
    polarity
      <char>
 1: negative
 2: negative
 3: negative
 4: positive
 5: positive
 6: positive
 7: negative
 8: negative
 9: negative
10: positive
11: positive
12: positive
13: negative
14: negative
15: negative
16: positive
17: positive
18: positive

ProcessingSettings

Data processing is performed by steps according to ProcessingSettings. S7 ProcessingSettings class objects are obtained via the respective [Engine type]Settings_[module name]_[algorithm name] constructor functions, attributing the respective subclass. Below we obtain the ProcessingSettings for the step FindFeatures using the algorithm openms. The parameters for each processing module can be changed via the constructor arguments. Documentation for each ProcessingSettings subclass can be found in the StreamFind reference documentation.

# Gets ProcessingSettings for finding features using the openms algorithm
ffs <- MassSpecSettings_FindFeatures_openms(
  noiseThrInt = 1000,
  chromSNR = 3,
  chromFWHM = 7,
  mzPPM = 15,
  reEstimateMTSD = TRUE,
  traceTermCriterion = "sample_rate",
  traceTermOutliers = 5,
  minSampleRate = 1,
  minTraceLength = 4,
  maxTraceLength = 70,
  widthFiltering = "fixed",
  minFWHM = 4,
  maxFWHM = 35,
  traceSNRFiltering = TRUE,
  localRTRange = 0,
  localMZRange = 0,
  isotopeFilteringModel = "none",
  MZScoring13C = FALSE,
  useSmoothedInts = FALSE,
  intSearchRTWindow = 3,
  useFFMIntensities = FALSE,
  verbose = FALSE
)

# Prints in console the details of the ProcessingSettings
show(ffs)


 StreamFind::MassSpecSettings_FindFeatures_openms 
 engine       MassSpec
 method       FindFeatures
 algorithm    openms
 version      0.2.0
 software     openms
 developer    Oliver Kohlbacher
 contact      oliver.kohlbacher@uni-tuebingen.de
 link         https://openms.de/
 doi          https://doi.org/10.1038/nmeth.3959

 parameters: 
  -  noiseThrInt 1000 
  -  chromSNR 3 
  -  chromFWHM 7 
  -  mzPPM 15 
  -  reEstimateMTSD TRUE 
  -  traceTermCriterion sample_rate 
  -  traceTermOutliers 5 
  -  minSampleRate 1 
  -  minTraceLength 4 
  -  maxTraceLength 70 
  -  widthFiltering fixed 
  -  minFWHM 4 
  -  maxFWHM 35 
  -  traceSNRFiltering TRUE 
  -  localRTRange 0 
  -  localMZRange 0 
  -  isotopeFilteringModel none 
  -  MZScoring13C FALSE 
  -  useSmoothedInts FALSE 
  -  intSearchRTWindow 3 
  -  useFFMIntensities FALSE 
  -  verbose FALSE

# Creates an ordered list with all processing steps for the MS data
workflow <- list(

  # Find features using the openms algorithm, created above
  ffs,

  # Annotation of natural isotopes and adducts
  MassSpecSettings_AnnotateFeatures_StreamFind(
    rtWindowAlignment = 0.3,
    maxIsotopes = 8,
    maxCharge = 2,
    maxGaps = 1
  ),

  # Excludes annotated isotopes and adducts
  MassSpecSettings_FilterFeatures_StreamFind(
    excludeIsotopes = TRUE,
    excludeAdducts = TRUE
  ),

  # Grouping features across analyses
  MassSpecSettings_GroupFeatures_openms(
    rtalign = FALSE,
    QT = FALSE,
    maxAlignRT = 5,
    maxAlignMZ = 0.008,
    maxGroupRT = 5,
    maxGroupMZ = 0.008,
    verbose = FALSE
  ),

  # Filter feature groups with maximum intensity below 5000 counts
  MassSpecSettings_FilterFeatures_StreamFind(
    minIntensity = 5000
  ),

  # Fill features with missing data
  # Reduces false negatives
  MassSpecSettings_FillFeatures_StreamFind(
    withinReplicate = FALSE,
    rtExpand = 2,
    mzExpand = 0.0005,
    minTracesIntensity = 1000,
    minNumberTraces = 6,
    baseCut = 0.3,
    minSignalToNoiseRatio = 3,
    minGaussianFit = 0.2
  ),

  # Calculate quality metrics for each feature
  MassSpecSettings_CalculateFeaturesQuality_StreamFind(
    filtered = FALSE,
    rtExpand = 2,
    mzExpand = 0.0005,
    minTracesIntensity = 1000,
    minNumberTraces = 6,
    baseCut = 0
  ),

  # Filter features based on minimum signal-to-noise ratio (s/n)
  # The s/n is calculated using the CalculateFeaturesQuality method
  MassSpecSettings_FilterFeatures_StreamFind(
    minSnRatio = 5
  ),

  # Filter features using other parameters via the patRoon package
  MassSpecSettings_FilterFeatures_patRoon(
    maxReplicateIntRSD = 40,
    blankThreshold = 5,
    absMinReplicateAbundance = 3
  ),

  # Finds internal standards in the MS data
  # db_is is a data.table with the
  # name, mass and expected retention time of
  # spiked internal standards, as shown below
  MassSpecSettings_FindInternalStandards_StreamFind(
    database = db_is,
    ppm = 8,
    sec = 10
  ),

  # Loads MS1 for features not filtered
  MassSpecSettings_LoadFeaturesMS1_StreamFind(
    filtered = FALSE
  ),

  # Loads MS2 for features not filtered
  MassSpecSettings_LoadFeaturesMS2_StreamFind(
    filtered = FALSE
  ),

  # Loads feature extracted ion chromatograms (EIC)
  MassSpecSettings_LoadFeaturesEIC_StreamFind(
    filtered = FALSE
  ),

  # Performs suspect screening using the StreamFind algorithm
  # db_with_ms2 is a database with suspect chemical standards
  # includes MS2 data (i.e., fragmentation pattern) from standards
  MassSpecSettings_SuspectScreening_StreamFind(
    database = db_with_ms2,
    ppm = 10,
    sec = 15,
    ppmMS2 = 10,
    minFragments = 3
  )
)

Then, the list can be added to the MassSpecEngine. Note that the order will matter when the workflow is applied!

# Adds the workflow to the engine. The order matters!
ms$workflow <- workflow

# Printing the data processing workflow
ms$print_workflow()

1: FindFeatures (openms)
2: AnnotateFeatures (StreamFind)
3: FilterFeatures (StreamFind)
4: GroupFeatures (openms)
5: FilterFeatures (StreamFind)
6: FillFeatures (StreamFind)
7: CalculateFeaturesQuality (StreamFind)
8: FilterFeatures (StreamFind)
9: FilterFeatures (patRoon)
10: FindInternalStandards (StreamFind)
11: LoadFeaturesMS1 (StreamFind)
12: LoadFeaturesMS2 (StreamFind)
13: LoadFeaturesEIC (StreamFind)
14: SuspectScreening (StreamFind)

`run_workflow()`

The Workflow can be applied by run_workflow(), as demonstrated below. Note that with run_workflow(), the processing modules are applied with the same order as they were added.

# Runs all ProcessingSettings added
ms$run_workflow()

Results

The created features and feature groups can be inspected as data.table objects or plotted by dedicated methods in the MassSpecEngine. Internally, the MassSpecEngine stores the results in the NTS object, which can be accessed with MassSpecEngine$nts. Yet, the engine interface is recommended for accessing the results. However, the NTS object can be used for more advanced operations, such as exporting the results to a database or other formats or use the native objects from other packages (e.g., patRoon) as we demonstrate further in this article.

`data.table` objects

The features and feature groups can be obtained as data.table with the MassSpecEngine$get_features() and MassSpecEngine$get_groups() methods. The methods also allow to look for specific features/feature groups using mass, mass-to-charge ratio, retention time and drift time targets, as show below for a small set of compound targets where mass and retention time expected values are known. Note that drift time is only applicable for MS data with ion mobility separation.

db

                         name       formula     mass    rt    tag
                       <char>        <char>    <num> <int> <char>
 1:     4N-Acetylsulfadiazine   C12H12N4O3S 292.0630   905      S
 2:                Metoprolol     C15H25NO3 267.1834   915      S
 3:          Sulfamethoxazole   C10H11N3O3S 253.0521  1015      S
 4:                Bisoprolol     C18H31NO4 325.2253   955      S
 5: 4N-Acetylsulfamethoxazole   C12H13N3O4S 295.0627  1011      S
 6:             Carbamazepine     C15H12N2O 236.0950  1079      S
 7:                 Terbutryn     C10H19N5S 241.1361  1126      S
 8:                  Losartan   C22H23ClN6O 422.1622  1095      S
 9:               Candesartan    C24H20N6O3 440.1597  1097      S
10:               Isoproturon     C12H18N2O 206.1419  1152      S
11:                    Diuron   C9H10Cl2N2O 232.0170  1160      S
12:                Bezafibrat   C19H20ClNO4 361.1081  1164      S
13:                 Valsartan    C24H29N5O3 435.2270  1177      S
14:              Tebuconazole   C16H22ClN3O 307.1451  1267      S
15:                Diclofenac  C14H11Cl2NO2 295.0167  1255      S
16:             Propiconazole C15H17Cl2N3O2 341.0698  1308      S
17:                Flufenacet C14H13F4N3O2S 363.0665  1296      S
18:                 Ibuprofen      C13H18O2 206.1307  1152      S
19:                      CBZD    C15H14N2O3 270.1004   936      S

# Compounds are searched by monoisotopic mass and retention time
# ppm and sec set the mass (im ppm) and time (in seconds) allowed deviation, respectively
# average applies a mean to the intensities in each analysis replicate group
ms$get_groups(mass = db, ppm = 10, sec = 15, average = TRUE)

             group             name effluent_pos influent_neg influent_pos
            <char>           <char>        <num>        <num>        <num>
1:  M236_R1079_292    Carbamazepine         0.00        0.000    66019.723
2:  M253_R1015_479 Sulfamethoxazole         0.00        0.000     9679.877
3:   M267_R916_636       Metoprolol     15409.08        0.000    49655.950
4: M295_R1256_1106       Diclofenac         0.00     7786.457    19602.253
5:  M325_R957_1536       Bisoprolol     13762.82        0.000    48375.420
6: M440_R1097_2691      Candesartan     22512.84    34920.950   112540.650

Already by inspection of the data.table, it is possible to see compounds detected in the influent but not in the effluent (e.g., Carbamazepine) or compounds that are appear to be reduced during ozonation (e.g., Metoprolol). Since positive and negative ionization mode were combined, there are compounds that appear in both polarities and are grouped by neutral monoisotopic mass (e.g., Candesartan and Diclofenac).

`plot_groups` methods

For a better overview of the results, the method MassSpecEngine$plot_groups() or even more detailed the method MassSpecEngine$plot_groups_overview() can be used.

# set legendNames to TRUE for using the names in db as legend
ms$plot_groups(mass = db, ppm = 10, sec = 15, legendNames = TRUE)

ms$plot_groups_overview(mass = db, ppm = 5, sec = 10, legendNames = TRUE)

Filtered not removed

The FilterFeatures method was applied to filter features according to defined conditions/thresholds. Yet, the filtered features were not removed but just tagged as filtered. For instance, when the method MassSpecEngine$get_groups() is run with filtered argument set to TRUE, the filtered features are also shown. Below, we search for the same compounds as above but with the filtered argument set to TRUE. Potential features from Valsartan are now returned but were filtered due to low intensity. Note that when extracting features based on basic parameters, i.e. mass and time, does not mean that features are identified. The identification of features is a more complex process and requires additional information, such as MS/MS data as in the processing method suspect screening.

# Set filtered to TRUE for showing filtered features/feature groups
ms$get_groups(mass = db, ppm = 5, sec = 10, average = TRUE, filtered = TRUE)

             group             name effluent_neg effluent_pos influent_neg
            <char>           <char>        <num>        <num>        <num>
1:  M236_R1079_292    Carbamazepine        0.000         0.00        0.000
2:  M253_R1015_479 Sulfamethoxazole        0.000         0.00        0.000
3:   M267_R916_636       Metoprolol        0.000     15409.08        0.000
4: M295_R1256_1106       Diclofenac        0.000         0.00     7786.457
5:  M325_R957_1536       Bisoprolol        0.000     13762.82        0.000
6: M435_R1176_2658        Valsartan     3598.697         0.00     3164.295
7: M440_R1097_2691      Candesartan     6750.803     22512.84    34920.950
   influent_pos
          <num>
1:    66019.723
2:     9679.877
3:    49655.950
4:    19602.253
5:    48375.420
6:        0.000
7:   112540.650

Internal Stanards

The method FindInternalStandards was applied for tagging spiked internal standards and the results can be obtained with the dedicated method MassSpecEngine$get_internal_standards() or plotted as a quality overview using the method MassSpecEngine$plot_internal_standards(), as shown below. The plot gives an overview of the mass, retention time and intensity variance of the internal standards across the analyses in the project.

# List of spiked internal standards
db_is

                  name           formula     mass    rt    tag
                <char>            <char>    <num> <int> <char>
1: Cyclophosphamide-D6 C7[2]H6H9Cl2N2O2P 266.0625  1007     IS
2:        Ibuprofen-D3     C13[2]H3H15O2 209.1495  1150     IS
3:      Diclophenac-D4  C14[2]H4H7Cl2NO2 299.0418  1253     IS
4:       Metoprolol-D7    C15[2]H7H18NO3 274.2274   915     IS
5:  Sulfamethoxazol-D4   C10[2]H4H7N3O3S 257.0772  1015     IS
6:      Isoproturon-D6    C12[2]H6H12N2O 212.1796  1149     IS
7:           Diuron-D6   C9[2]H6H4Cl2N2O 238.0547  1157     IS
8:    Carbamazepin-D10    C15[2]H10H2N2O 246.1577  1075     IS
9:         Naproxen-D3     C14[2]H3H11O3 233.1131  1169     IS

# Gets the internal standards evaluation data.table
ms$get_internal_standards()

                   name    rt     mass intensity    area   rtr    mzr error_rt
                 <char> <num>    <num>     <num>   <num> <num>  <num>    <num>
 1:    Carbamazepin-D10  1075 246.1586    456754 2496925  13.6 0.0017      0.0
 2:    Carbamazepin-D10  1075 246.1585    532430 2925329  15.1 0.0021     -0.3
 3:    Carbamazepin-D10  1075 246.1586    510457 2672523  14.3 0.0012     -0.2
 4: Cyclophosphamide-D6  1007 266.0627     51652  245292  13.5 0.0018     -0.1
 5: Cyclophosphamide-D6  1006 266.0627     56586  276668  10.2 0.0020     -1.1
 6: Cyclophosphamide-D6  1007 266.0628     60849  293547  11.6 0.0019     -0.1
 7:      Diclophenac-D4  1254 299.0415     24035  123891  13.7 0.0018      0.6
 8:      Diclophenac-D4  1254 299.0423     51904  265863  11.2 0.0017      0.6
 9:      Diclophenac-D4  1254 299.0414     18131  104910  13.8 0.0027      1.1
10:      Diclophenac-D4  1254 299.0424     48453  262399  12.8 0.0019      1.2
11:      Diclophenac-D4  1254 299.0412     19708  111623  12.5 0.0027      0.8
12:      Diclophenac-D4  1254 299.0423     51395  280790  12.0 0.0016      1.0
13:           Diuron-D6  1157 238.0544     14884   79520  12.1 0.0015     -0.4
14:           Diuron-D6  1157 238.0553    120342  623359  11.5 0.0014      0.0
15:           Diuron-D6  1157 238.0542     14522   75994  11.7 0.0020     -0.3
16:           Diuron-D6  1157 238.0553    130401  729836  13.0 0.0015     -0.1
17:           Diuron-D6  1158 238.0545     15530   91640  10.5 0.0019      0.5
18:           Diuron-D6  1157 238.0554    139260  779077  13.8 0.0012      0.1
19:      Isoproturon-D6  1149 212.1806   1198379 6461139  12.5 0.0012     -0.5
20:      Isoproturon-D6  1149 212.1808   1269561 7284356  14.9 0.0014     -0.5
21:      Isoproturon-D6  1149 212.1807   1322756 7284879  14.8 0.0017     -0.3
22:       Metoprolol-D7   915 274.2291   1581149 8159823  14.6 0.0017      0.1
23:       Metoprolol-D7   914 274.2290   1497131 7504654  14.5 0.0026     -0.7
24:       Metoprolol-D7   915 274.2289   1571305 7921583  14.9 0.0012      0.2
25:  Sulfamethoxazol-D4  1015 257.0767     10741   57457   9.6 0.0015     -0.4
26:  Sulfamethoxazol-D4  1014 257.0776    193814  960954  13.7 0.0015     -0.6
27:  Sulfamethoxazol-D4  1014 257.0773      6566   32506   8.9 0.0027     -1.3
28:  Sulfamethoxazol-D4  1014 257.0777    189113 1005469  14.6 0.0017     -1.2
29:  Sulfamethoxazol-D4  1014 257.0778    187412  893062  13.6 0.0014     -1.1
30:  Sulfamethoxazol-D4  1014 257.0769      7969   41277  10.5 0.0033     -0.8
31:  Sulfamethoxazol-D4  1015 257.0780    204772  990155  14.0 0.0012     -0.2
                   name    rt     mass intensity    area   rtr    mzr error_rt
    error_mass   rec iso_n iso_c    replicate           group  freq
         <num> <num> <num> <num>       <char>          <char> <int>
 1:        3.5   1.0     2    14    blank_pos  M246_R1075_389     3
 2:        3.4   1.2     3    14 influent_pos  M246_R1075_389     3
 3:        3.7   1.1     3    14 effluent_pos  M246_R1075_389     3
 4:        0.8   1.0     3     9    blank_pos  M266_R1007_620     3
 5:        0.8   1.1     2     0 influent_pos  M266_R1007_620     2
 6:        1.1   1.2     2     4 effluent_pos  M266_R1007_620     2
 7:       -1.1   1.0     3    15    blank_neg M299_R1254_1176     3
 8:        1.8   1.0     4    13    blank_pos M299_R1254_1176     3
 9:       -1.3   0.8     3    12 influent_neg M299_R1254_1176     3
10:        1.9   1.0     4    16 influent_pos M299_R1254_1176     3
11:       -1.9   0.9     3    11 effluent_neg M299_R1254_1176     3
12:        1.7   1.1     3    15 effluent_pos M299_R1254_1176     3
13:       -1.4   1.0     1     0    blank_neg  M238_R1157_309     3
14:        2.6   1.0     4     9    blank_pos  M238_R1157_309     3
15:       -2.1   1.0     2     0 influent_neg  M238_R1157_309     3
16:        2.6   1.2     4     9 influent_pos  M238_R1157_309     3
17:       -0.9   1.2     2     0 effluent_neg  M238_R1157_309     3
18:        2.8   1.2     4     8 effluent_pos  M238_R1157_309     3
19:        4.6   1.0     3    11    blank_pos  M212_R1149_103     3
20:        5.6   1.1     3    11 influent_pos  M212_R1149_103     3
21:        5.3   1.1     3    11 effluent_pos  M212_R1149_103     3
22:        6.2   1.0     3    14    blank_pos   M274_R915_734     3
23:        5.7   0.9     3    14 influent_pos   M274_R915_734     3
24:        5.4   1.0     3    13 effluent_pos   M274_R915_734     3
25:       -2.2   1.0     0     0    blank_neg  M257_R1014_514     3
26:        1.7   1.0     3    19    blank_pos  M257_R1014_514     3
27:        0.5   0.6     0     0 influent_neg  M257_R1014_514     2
28:        1.8   1.0     3    19 influent_pos  M257_R1014_514     3
29:        2.3    NA     3    21 influent_pos            <NA>     1
30:       -1.0   0.7     0     0 effluent_neg  M257_R1014_514     3
31:        3.1   1.0     3    18 effluent_pos  M257_R1014_514     3
    error_mass   rec iso_n iso_c    replicate           group  freq
    intensity_sd area_sd rtr_sd mzr_sd error_rt_sd error_mass_sd rec_sd
           <num>   <num>  <num>  <num>       <num>         <num>  <num>
 1:        16944   66611    1.4 0.0004         0.6           0.9    0.0
 2:        16123   84288    1.3 0.0013         0.7           0.2    0.0
 3:         2287   50630    0.1 0.0004         0.9           0.3    0.0
 4:         1266    3523    0.1 0.0004         0.3           0.4    0.0
 5:          326   23429    1.1 0.0011         0.4           0.8    0.1
 6:         3280   19496    0.1 0.0010         0.4           0.3    0.1
 7:         2398    7689    0.5 0.0006         0.6           0.7    0.0
 8:         1420    4215    0.6 0.0002         0.5           0.8    0.0
 9:          602    3947    1.2 0.0008         0.5           0.3    0.0
10:          371   13372    0.4 0.0009         0.3           0.3    0.1
11:         1502    3689    0.2 0.0009         0.9           0.8    0.0
12:         3152    3974    0.2 0.0007         0.4           0.3    0.0
13:          333    2927    1.0 0.0006         0.3           0.8    0.0
14:         1146   25424    0.2 0.0001         0.3           0.6    0.0
15:          717    3788    1.7 0.0005         0.4           0.2    0.0
16:         3699   24125    0.8 0.0004         0.7           0.6    0.0
17:         1642    3607    1.5 0.0001         0.8           0.4    0.0
18:         6556   22184    1.0 0.0003         0.5           0.4    0.0
19:        47806  249601    0.6 0.0002         0.6           1.0    0.0
20:        26888  259717    0.2 0.0002         0.6           1.2    0.0
21:         7909   16502    0.6 0.0005         0.2           0.4    0.0
22:        30940  348656    0.4 0.0004         0.3           0.7    0.0
23:         9479   68476    0.8 0.0011         0.3           0.7    0.0
24:        15361  639985    0.7 0.0003         0.4           0.5    0.1
25:         1711    6948    1.0 0.0002         0.5           0.7    0.0
26:         3299   67601    0.7 0.0002         0.3           0.4    0.0
27:          737    1085    2.3 0.0017         0.3           3.2    0.0
28:         3996   94418    1.9 0.0004         0.2           0.5    0.1
29:           NA      NA     NA     NA          NA            NA     NA
30:          448    4146    1.0 0.0008         0.3           1.0    0.1
31:         9008   60927    0.9 0.0003         0.3           0.8    0.1
    intensity_sd area_sd rtr_sd mzr_sd error_rt_sd error_mass_sd rec_sd
    iso_n_sd iso_c_sd
       <num>    <num>
 1:        0        1
 2:        1        1
 3:        0        1
 4:        0        1
 5:        0        0
 6:        1        6
 7:        0        3
 8:        1        2
 9:        1       10
10:        1        1
11:        1       10
12:        0        1
13:        1        0
14:        0        1
15:        0        0
16:        1        2
17:        0        0
18:        0        1
19:        0        0
20:        0        1
21:        0        0
22:        0        0
23:        0        0
24:        0        1
25:        0        0
26:        0        1
27:       NA       NA
28:        0        0
29:       NA       NA
30:        0        0
31:        0        1
    iso_n_sd iso_c_sd

ms$plot_internal_standards()

Quality control of spiked internal standards

ms$plot_groups_profile(mass = db_is, ppm = 8, sec = 10, filtered = TRUE, legendNames = TRUE)

Internal standards profile across analyses

Components

The method AnnotateFeatures was applied to annotate the natural isotopes and adducts into components. Implementation of annotation for in-source fragments is planned but not yet available with the StreamFind algorithm. The method MassSpecEngine$get_components() can be used to search for components, as shown below for the analysis number 11. Because the filters excludeIsotopes and minIntensity were applied, the isotopic features are likely filtered.

# Components of Diclofenac and Candesartan in analysis 11
ms$get_components(
  analyses = 11,
  mass = db[db$name %in% c("Diclofenac", "Candesartan"), ],
  ppm = 5, sec = 10
)

                         analysis         component           feature
                           <char>            <char>            <char>
1: 02_tof_ww_is_pos_influent-r002 F388_MZ296_RT1256 F388_MZ296_RT1256
2: 02_tof_ww_is_pos_influent-r002 F388_MZ296_RT1256 F403_MZ298_RT1256
3: 02_tof_ww_is_pos_influent-r002 F937_MZ441_RT1097 F937_MZ441_RT1097
4: 02_tof_ww_is_pos_influent-r002 F937_MZ441_RT1097 F940_MZ442_RT1097
5: 02_tof_ww_is_pos_influent-r002 F937_MZ441_RT1097 F946_MZ443_RT1097
             group      rt       mz      area intensity   rtmin   rtmax
            <char>   <num>    <num>     <num>     <num>   <num>   <num>
1: M295_R1256_1106 1255.60 296.0246 116940.30  18776.84 1251.92 1264.44
2: M295_R1256_1106 1255.60 298.0215  79347.27  14199.29 1251.92 1263.00
3: M440_R1097_2691 1096.58 441.1681 588979.40 109887.30 1091.92 1105.25
4: M440_R1097_2691 1096.58 442.1707 151194.10  28418.03 1093.57 1105.25
5: M440_R1097_2691 1096.58 443.1739  23518.97   4544.72 1093.57 1104.00
      mzmin    mzmax isocount polarity     mass adduct filtered filter filled
      <num>    <num>    <int>    <num>    <num> <char>   <lgcl> <char> <lgcl>
1: 296.0238 296.0256        1        1 295.0173 [M+H]+    FALSE   <NA>  FALSE
2: 298.0208 298.0230        1        1 297.0142 [M+H]+     TRUE   <NA>  FALSE
3: 441.1674 441.1690        1        1 440.1609 [M+H]+    FALSE   <NA>  FALSE
4: 442.1691 442.1712        1        1 441.1634 [M+H]+     TRUE   <NA>  FALSE
5: 443.1711 443.1762        1        1 442.1666 [M+H]+     TRUE   <NA>  FALSE
             quality               eic                ms1                 ms2
              <list>            <list>             <list>              <list>
1: <data.table[1x7]> <data.table[1x6]> <data.table[12x7]>  <data.table[25x7]>
2: <data.table[0x0]> <data.table[0x0]>  <data.table[0x0]>   <data.table[0x0]>
3: <data.table[1x7]> <data.table[1x6]>  <data.table[8x7]> <data.table[106x7]>
4: <data.table[0x0]> <data.table[0x0]>  <data.table[0x0]>   <data.table[0x0]>
5: <data.table[0x0]> <data.table[0x0]>  <data.table[0x0]>   <data.table[0x0]>
     istd           suspects        name index iso_size iso_charge iso_step
   <list>             <list>      <char> <int>    <int>      <int>    <int>
1: [NULL] <data.table[1x16]>  Diclofenac   387        2          1        0
2: [NULL]          <list[0]>  Diclofenac   402        2          1        2
3: [NULL] <data.table[1x16]> Candesartan   936        3          1        0
4: [NULL]          <list[0]> Candesartan   939        3          1        1
5: [NULL]          <list[0]> Candesartan   945        3          1        2
   iso_cat  iso_isotope iso_mzr iso_relative_intensity
    <char>       <char>   <num>                  <num>
1:     M+0         37Cl 0.00104                1.00000
2:     M+2         37Cl 0.00220                0.75621
3:     M+0  13C 13C/13C 0.00089                1.00000
4:     M+1          13C 0.00275                0.25861
5:     M+2      13C/13C 0.00275                0.04136
   iso_theoretical_min_relative_intensity
                                    <num>
1:                                0.00000
2:                                0.31977
3:                                0.00000
4:                                0.20921
5:                                0.00000
   iso_theoretical_max_relative_intensity iso_mass_distance
                                    <num>             <num>
1:                                0.00000           0.00000
2:                                3.19766           1.99695
3:                                0.00000           0.00000
4:                                0.31381           1.00256
5:                                0.58819           2.00571
   iso_theoretical_mass_distance iso_mass_error iso_time_error
                           <num>          <num>          <num>
1:                       0.00000        0.00000              0
2:                       1.99705        0.00010              0
3:                       0.00000        0.00000              0
4:                       1.00335        0.00079              0
5:                       2.00671        0.00100              0
   iso_number_carbons adduct_element adduct_cat adduct_time_error
                <num>         <char>     <char>             <num>
1:                  0                                           0
2:                  0                                           0
3:                 24                                           0
4:                 24                                           0
5:                 24                                           0
   adduct_mass_error
               <num>
1:                 0
2:                 0
3:                 0
4:                 0
5:                 0

The components (i.e., isotopes and adducts) can also be visualized with the method MassSpecEngine$map_components(), as shown below for the internal standards added to analysis 11. Note that again the filtered argument was set to TRUE to return also filtered features, as the internal standards were likely excluded by blank subtraction.

ms$map_components(
  analyses = 11,
  mass = db_is,
  ppm = 8, sec = 10,
  filtered = TRUE,
  legendNames = TRUE
)

Components (i.e., isotopes and adducts) of internal standards in analysis 11

Suspects

The methods MassSpecEngine$get_suspects() and MassSpecEngine$plot_suspects() can be used to inspect the suspect screening results. In the plot function, a second plot is added to compare the experimental fragmentation pattern (top) with the fragmentation pattern of the respective reference standard (down) added within the database. The colorBy argument can be set to targets+replicates to legend the plot with combined keys of suspect target names and analysis replicate names.

ms$get_suspects()

             group   name id_level error_mass error_rt shared_fragments
            <char> <char>    <ord>      <num>    <num>            <int>
1:  M236_R1079_292      0        1        1.8      0.2                9
2:  M253_R1015_479      0        1        0.4      0.1                6
3:   M267_R916_636      0        1        0.7      0.5               10
4: M295_R1256_1106      0        1        1.4      0.6                5
5:  M325_R957_1536      0        1        1.1      0.9                7
6: M440_R1097_2691      0        1        1.6      0.3               17
   effluent_pos influent_neg influent_pos           i.name
          <num>        <num>        <num>           <char>
1:         0.00        0.000    66019.723    Carbamazepine
2:         0.00        0.000     9679.877 Sulfamethoxazole
3:     15409.08        0.000    49655.950       Metoprolol
4:         0.00     7786.457    19602.253       Diclofenac
5:     13762.82        0.000    48375.420       Bisoprolol
6:     22512.84    34920.950   112540.650      Candesartan

ms$plot_suspects(colorBy = "targets+replicates")

Methods from patRoon

The NTS object holds the original objects from the patRoon package, which can be accessed directly. For instance, the S4 class features or featureGroups objects are can be accessed via the NTS object with ms$nts$features. The patRoon package provides a set of functions for the native objects and can be applied the ms$nts$features object, as shown below. See more information in the patRoon reference documentation.

# Native patRoon object
ms$nts$features

A featureGroupsSet object
Hierarchy:
featureGroups
    |-- featureGroupsSet
      |-- featureGroupsScreeningSet
---
Object size (indication): 11 MB
Algorithm: openms-set
Feature groups: M199_R1261_4, M201_R1031_22, M204_R969_33, M204_R1050_39, M210_R1034_77, M211_R917_90, ... (216 total)
Features: 867 (4.0 per group)
Has normalized intensities: FALSE
Internal standards used for normalization: no
Predicted concentrations: none
Predicted toxicities: none
Analyses: 01_tof_ww_is_neg_blank-r001, 01_tof_ww_is_neg_blank-r002, 01_tof_ww_is_neg_blank-r003, 01_tof_ww_is_pos_blank-r001, 01_tof_ww_is_pos_blank-r002, 01_tof_ww_is_pos_blank-r003, ... (18 total)
Replicate groups: blank_neg, blank_pos, influent_neg, influent_pos, effluent_neg, effluent_pos (6 total)
Replicate groups used as blank: blank_neg, blank_pos (2 total)
Sets: negative, positive

# Using the native patRoon's plotUpSet method
patRoon::plotUpSet(ms$nts$features)

UpSet plot of features

More to come

Future integration:

Correcting matrix for better comparison between influent and effluent
Annotation of in-source fragments
Screening of transformation products using the biotransformer tool via patRoon
Fold-change and other statistic analysis for inter-sample correlation

Ricardo Cunha

28 November, 2024