Introduction

Differential abundance testing in microbiome data challenges both parametric and non-parametric statistical methods because the data are sparse, highly variable and compositional. Microbiome-specific statistical methods often assume classical distribution models or account for the compositional nature of the data. Their results fall at different points along the sensitivity-specificity trade-off, so type I and type II error rates are difficult to ascertain in real microbiome data when a single method is used. A consensus approach based on multiple differential abundance (DA) methods has recently been suggested to increase robustness.

With dar, you can chain DA methods into dplyr-like pipeable sequences and then apply different consensus strategies to their results. This yields more reliable results in a fast, consistent and reproducible way.

Installation

You can install the development version of dar from GitHub with:

# install.packages("pak")
pak::pkg_install("MicrobialGenomics-IrsicaixaOrg/dar")

Usage

library(dar)
#> Registered S3 methods overwritten by 'vegan':
#>   method         from      
#>   reorder.hclust seriation 
#>   rev.hclust     dendextend
data("metaHIV_phy")

## Define recipe
rec <-
  recipe(metaHIV_phy, var_info = "RiskGroup2", tax_info = "Species") |>
  step_subset_taxa(tax_level = "Kingdom", taxa = c("Bacteria", "Archaea")) |>
  step_filter_taxa(.f = "function(x) sum(x > 0) >= (0.03 * length(x))") |>
  step_maaslin() |>
  step_aldex()

rec
#> ── DAR Recipe ──────────────────────────────────────────────────────────────────
#> Inputs:
#> 
#>      ℹ phyloseq object with 451 taxa and 156 samples 
#>      ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid) 
#>      ℹ taxonomic level Species 
#> 
#> Preporcessing steps:
#> 
#>      ◉ step_subset_taxa() id = subset_taxa__Komaj_sehen 
#>      ◉ step_filter_taxa() id = filter_taxa__Zlebia 
#> 
#> DA steps:
#> 
#>      ◉ step_maaslin() id = maaslin__Mille_feuille 
#>      ◉ step_aldex() id = aldex__Shakarbura

## Prep recipe
da_results <- prep(rec, parallel = TRUE)
da_results
#> ── DAR Results ─────────────────────────────────────────────────────────────────
#> Inputs:
#> 
#>      ℹ phyloseq object with 278 taxa and 156 samples 
#>      ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid) 
#>      ℹ taxonomic level Species 
#> 
#> Results:
#> 
#>      ✔ maaslin__Mille_feuille diff_taxa = 52 
#>      ✔ aldex__Shakarbura diff_taxa = 96 
#> 
#>      ℹ 35 taxa are present in all tested methods

## Consensus strategy
n_methods <- 2
da_results <- bake(da_results, count_cutoff = n_methods)
da_results
#> ── DAR Results ─────────────────────────────────────────────────────────────────
#> Inputs:
#> 
#>      ℹ phyloseq object with 278 taxa and 156 samples 
#>      ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid) 
#>      ℹ taxonomic level Species 
#> 
#> Results:
#> 
#>      ✔ maaslin__Mille_feuille diff_taxa = 52 
#>      ✔ aldex__Shakarbura diff_taxa = 96 
#> 
#>      ℹ 35 taxa are present in all tested methods 
#> 
#> Bakes:
#> 
#>      ◉ 1 -> count_cutoff: 2, weights: NULL, exclude: NULL, id: bake__Birnbrot

## Results
cool(da_results)
#> ℹ Bake for count_cutoff = 2
#> # A tibble: 35 × 2
#>    taxa_id taxa                        
#>    <chr>   <chr>                       
#>  1 Otu_78  Bacteroides_uniformis       
#>  2 Otu_88  Odoribacter_splanchnicus    
#>  3 Otu_119 Alistipes_putredinis        
#>  4 Otu_129 Parabacteroides_merdae      
#>  5 Otu_125 Parabacteroides_distasonis  
#>  6 Otu_82  Barnesiella_intestinihominis
#>  7 Otu_96  Prevotella_copri            
#>  8 Otu_51  Bacteroides_dorei           
#>  9 Otu_332 Catenibacterium_mitsuokai   
#> 10 Otu_62  Bacteroides_ovatus          
#> # ℹ 25 more rows
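
To make the consensus step more concrete, the following base R sketch illustrates the idea behind a count cutoff: a taxon is kept when at least `count_cutoff` methods flag it. This is only an illustration under that assumption, not dar's internal implementation; the `hits` list, the method names and the `consensus()` helper are hypothetical and the taxa are made up.

```r
# Hypothetical per-method results: taxa flagged as differentially abundant.
# These vectors are invented for illustration and are not dar output.
hits <- list(
  method_a = c("Otu_1", "Otu_2", "Otu_3"),
  method_b = c("Otu_2", "Otu_3", "Otu_4"),
  method_c = c("Otu_3", "Otu_5")
)

# Count how many methods flagged each taxon (each method votes once per taxon).
votes <- table(unlist(lapply(hits, unique)))

# Hypothetical helper: keep taxa flagged by at least `count_cutoff` methods.
consensus <- function(votes, count_cutoff) {
  sort(names(votes)[as.vector(votes) >= count_cutoff])
}

consensus(votes, count_cutoff = 2)  # taxa flagged by two or more methods
consensus(votes, count_cutoff = 3)  # taxa flagged by all three methods
```

Setting the cutoff to the number of methods, as in the usage example above, corresponds to the strict intersection; a lower cutoff trades some specificity for sensitivity.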

Contributing

Code of Conduct

Please note that the dar project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.