Introduction
Differential abundance testing in microbiome data challenges both parametric and non-parametric statistical methods because the data are sparse, highly variable and compositional. Microbiome-specific statistical methods often assume classical distribution models or take the compositional nature of the data into account. Their results span the specificity vs sensitivity space in such a way that type I and type II error rates are difficult to ascertain in real microbiome data when a single method is used. A consensus approach based on multiple differential abundance (DA) methods was recently suggested in order to increase robustness.
With dar, you can build dplyr-like pipeable sequences of DA methods and then apply different consensus strategies. This yields more reliable results in a fast, consistent and reproducible way.
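To make the consensus idea concrete, the following base R sketch counts how many methods flag each taxon and keeps those flagged by at least a chosen number of methods. It is only an illustration under stated assumptions: the taxa identifiers and the `method_hits` list are made up, `count_cutoff` is used here as a plain variable name, and none of this is meant to describe how dar implements consensus internally.

```r
# Hypothetical sets of significant taxa from three DA methods (made-up IDs).
method_hits <- list(
  method_a = c("Otu_1", "Otu_2", "Otu_3"),
  method_b = c("Otu_2", "Otu_3", "Otu_4"),
  method_c = c("Otu_3", "Otu_5")
)

# Count in how many methods each taxon was flagged; unique() guards against
# a taxon being listed twice by the same method.
hit_counts <- table(unlist(lapply(method_hits, unique)))

# Keep taxa flagged by at least `count_cutoff` methods.
count_cutoff <- 2
consensus <- names(hit_counts)[as.vector(hit_counts) >= count_cutoff]
consensus
```

With a cutoff of 2, only the taxa reported by two or more of the hypothetical methods remain. Raising the cutoff to the number of methods gives the strict intersection, trading sensitivity for specificity.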
Installation
You can install the development version of dar from GitHub with:
# install.packages("pak")
pak::pkg_install("MicrobialGenomics-IrsicaixaOrg/dar")
Usage
library(dar)
#> Registered S3 methods overwritten by 'vegan':
#> method from
#> reorder.hclust seriation
#> rev.hclust dendextend
data("metaHIV_phy")
## Define recipe
rec <-
  recipe(metaHIV_phy, var_info = "RiskGroup2", tax_info = "Species") |>
  step_subset_taxa(tax_level = "Kingdom", taxa = c("Bacteria", "Archaea")) |>
  step_filter_taxa(.f = "function(x) sum(x > 0) >= (0.03 * length(x))") |>
  step_maaslin() |>
  step_aldex()
rec
#> ── DAR Recipe ──────────────────────────────────────────────────────────────────
#> Inputs:
#>
#> ℹ phyloseq object with 451 taxa and 156 samples
#> ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid)
#> ℹ taxonomic level Species
#>
#> Preporcessing steps:
#>
#> ◉ step_subset_taxa() id = subset_taxa__Komaj_sehen
#> ◉ step_filter_taxa() id = filter_taxa__Zlebia
#>
#> DA steps:
#>
#> ◉ step_maaslin() id = maaslin__Mille_feuille
#> ◉ step_aldex() id = aldex__Shakarbura
## Prep recipe
da_results <- prep(rec, parallel = TRUE)
da_results
#> ── DAR Results ─────────────────────────────────────────────────────────────────
#> Inputs:
#>
#> ℹ phyloseq object with 278 taxa and 156 samples
#> ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid)
#> ℹ taxonomic level Species
#>
#> Results:
#>
#> ✔ maaslin__Mille_feuille diff_taxa = 52
#> ✔ aldex__Shakarbura diff_taxa = 96
#>
#> ℹ 35 taxa are present in all tested methods
## Consensus strategy
n_methods <- 2
da_results <- bake(da_results, count_cutoff = n_methods)
da_results
#> ── DAR Results ─────────────────────────────────────────────────────────────────
#> Inputs:
#>
#> ℹ phyloseq object with 278 taxa and 156 samples
#> ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid)
#> ℹ taxonomic level Species
#>
#> Results:
#>
#> ✔ maaslin__Mille_feuille diff_taxa = 52
#> ✔ aldex__Shakarbura diff_taxa = 96
#>
#> ℹ 35 taxa are present in all tested methods
#>
#> Bakes:
#>
#> ◉ 1 -> count_cutoff: 2, weights: NULL, exclude: NULL, id: bake__Birnbrot
## Results
cool(da_results)
#> ℹ Bake for count_cutoff = 2
#> # A tibble: 35 × 2
#> taxa_id taxa
#> <chr> <chr>
#> 1 Otu_78 Bacteroides_uniformis
#> 2 Otu_88 Odoribacter_splanchnicus
#> 3 Otu_119 Alistipes_putredinis
#> 4 Otu_129 Parabacteroides_merdae
#> 5 Otu_125 Parabacteroides_distasonis
#> 6 Otu_82 Barnesiella_intestinihominis
#> 7 Otu_96 Prevotella_copri
#> 8 Otu_51 Bacteroides_dorei
#> 9 Otu_332 Catenibacterium_mitsuokai
#> 10 Otu_62 Bacteroides_ovatus
#> # ℹ 25 more rows
Contributing
If you think you have encountered a bug, please submit an issue.
When reporting, learn how to create and share a reprex (a minimal, reproducible example) so that you can clearly communicate about your code.
Working on your first Pull Request? You can learn how from the free series "How to Contribute to an Open Source Project on GitHub".
Code of Conduct
Please note that the dar project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
