Determine taxa whose absolute abundances, per unit volume of the ecosystem (e.g., gut), differ significantly with changes in the covariate of interest (e.g., group). The current version of the ancombc2 function implements Analysis of Compositions of Microbiomes with Bias Correction (ANCOM-BC2) for cross-sectional and repeated-measurements data. In addition to two-group comparisons, ANCOM-BC2 supports testing continuous covariates and multi-group comparisons, including the global test, pairwise directional test, Dunnett's type of test, and trend test.
Usage
step_ancom(
rec,
fix_formula = get_var(rec)[[1]],
rand_formula = NULL,
p_adj_method = "holm",
prv_cut = 0.1,
lib_cut = 0,
s0_perc = 0.05,
group = NULL,
struc_zero = FALSE,
neg_lb = FALSE,
alpha = 0.05,
n_cl = 1,
verbose = FALSE,
global = FALSE,
pairwise = FALSE,
dunnet = FALSE,
trend = FALSE,
rarefy = FALSE,
id = rand_id("ancom")
)
# S4 method for class 'Recipe'
step_ancom(
rec,
fix_formula = get_var(rec)[[1]],
rand_formula = NULL,
p_adj_method = "holm",
prv_cut = 0.1,
lib_cut = 0,
s0_perc = 0.05,
group = NULL,
struc_zero = FALSE,
neg_lb = FALSE,
alpha = 0.05,
n_cl = 1,
verbose = FALSE,
global = FALSE,
pairwise = FALSE,
dunnet = FALSE,
trend = FALSE,
rarefy = FALSE,
id = rand_id("ancom")
)
# S4 method for class 'PrepRecipe'
step_ancom(
rec,
fix_formula = get_var(rec)[[1]],
rand_formula = NULL,
p_adj_method = "holm",
prv_cut = 0.1,
lib_cut = 0,
s0_perc = 0.05,
group = NULL,
struc_zero = FALSE,
neg_lb = FALSE,
alpha = 0.05,
n_cl = 1,
verbose = FALSE,
global = FALSE,
pairwise = FALSE,
dunnet = FALSE,
trend = FALSE,
rarefy = FALSE,
id = rand_id("ancom")
)
Arguments
- rec
A Recipe object. The step will be added to the sequence of operations for this Recipe.
- fix_formula
A character string expressing how the microbial absolute abundances for each taxon depend on the fixed effects in metadata. If group is not NULL, make sure the group variable is included in fix_formula.
- rand_formula
A character string expressing how the microbial absolute abundances for each taxon depend on the random effects in metadata. ANCOM-BC2 follows the lmerTest package in formulating the random effects. See ?lmerTest::lmer for more details. Default is NULL.
- p_adj_method
character. Method used to adjust p-values. Default is "holm". Options include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". See ?stats::p.adjust for more details.
- prv_cut
A numerical fraction between 0 and 1. Taxa with prevalence less than prv_cut are excluded from the analysis. For instance, with 100 samples and the default prv_cut, a taxon with nonzero counts in fewer than 10 samples is not analyzed further. Default is 0.10.
- lib_cut
A numerical threshold for filtering samples based on library sizes. Samples with library sizes less than lib_cut are excluded from the analysis. Default is 0, i.e. no samples are discarded.
- s0_perc
A numerical fraction between 0 and 1. Inspired by the Significance Analysis of Microarrays (SAM) methodology, a small positive constant is added to the denominator of the ANCOM-BC2 test statistic for each taxon to avoid significance that is driven only by extremely small standard errors, especially for rare taxa. This constant is chosen as the s0_perc-th percentile of the standard error values for each fixed effect. Default is 0.05 (5th percentile).
- group
character. The name of the group variable in metadata. group should be discrete. Specifying group is required for detecting structural zeros and performing multi-group comparisons (global test, pairwise directional test, Dunnett's type of test, and trend test). Default is NULL. If the group of interest contains only two categories, leave it as NULL.
- struc_zero
logical. Whether to detect structural zeros based on group. Default is FALSE. See Details for a more comprehensive discussion on structural zeros.
- neg_lb
logical. Whether to classify a taxon as a structural zero using its asymptotic lower bound. Default is FALSE.
- alpha
numeric. Level of significance. Default is 0.05.
- n_cl
numeric. The number of nodes to be forked. For details, see ?parallel::makeCluster. Default is 1 (no parallel computing).
- verbose
logical. Whether to generate verbose output during the ANCOM-BC2 fitting process. Default is FALSE.
- global
logical. Whether to perform the global test. Default is FALSE.
- pairwise
logical. Whether to perform the pairwise directional test. Default is FALSE.
- dunnet
logical. Whether to perform Dunnett's type of test. Default is FALSE.
- trend
logical. Whether to perform the trend test. Default is FALSE.
- rarefy
Boolean indicating whether OTU counts should be rarefied. Rarefaction uses the standard R sample function to resample from the abundance values in the otu_table component of the phyloseq object. A major goal of this procedure is often to achieve parity in the total number of counts between samples, as an alternative to other formal normalization procedures, which is why a single value for the sample size is expected. If 'no_seed', rarefaction is performed without a set seed. Default is FALSE.
- id
A character string that is unique to this step to identify it.
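The filtering and regularization arguments above (prv_cut, lib_cut, s0_perc, p_adj_method) can be illustrated with base R on a toy count matrix. This is only a sketch of the ideas as the argument descriptions state them, not the ANCOM-BC2 implementation: the matrix counts, the vectors est, se and p_raw, and the cut-off values are all invented for illustration.

```r
# Toy data: 4 taxa (rows) x 10 samples (columns). All values are made up.
counts <- rbind(
  taxon_a = c(5, 0, 3, 8, 2, 0, 7, 1, 4, 6),
  taxon_b = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 9),
  taxon_c = c(1, 2, 0, 0, 3, 0, 0, 4, 0, 0),
  taxon_d = c(9, 8, 7, 6, 5, 4, 3, 2, 1, 1)
)
colnames(counts) <- paste0("s", 1:10)

# prv_cut: prevalence is the fraction of samples with a nonzero count;
# taxa with prevalence below the cut-off are dropped.
prv_cut <- 0.2  # illustrative value, not the default
prevalence <- rowSums(counts > 0) / ncol(counts)
kept_taxa <- rownames(counts)[prevalence >= prv_cut]

# lib_cut: library size is the total count per sample;
# samples with a library size below the cut-off are dropped.
lib_cut <- 5  # illustrative value, not the default
lib_size <- colSums(counts)
kept_samples <- colnames(counts)[lib_size >= lib_cut]

# s0_perc: a small constant taken as a low percentile of the standard
# errors is added to the denominator of the test statistic.
est <- c(0.50, 0.80, -0.30, 1.20, 0.10)  # hypothetical effect estimates
se <- c(0.02, 0.10, 0.15, 0.20, 0.40)    # hypothetical standard errors
s0_perc <- 0.05
s0 <- quantile(se, probs = s0_perc, names = FALSE)
w_raw <- est / se
w_reg <- est / (se + s0)  # shrinks statistics that rely on a tiny se

# p_adj_method: multiplicity correction via stats::p.adjust.
p_raw <- c(0.001, 0.01, 0.03, 0.04, 0.20)  # hypothetical raw p-values
p_holm <- p.adjust(p_raw, method = "holm")

kept_taxa
kept_samples
cbind(w_raw, w_reg)
p_holm
```

In this toy example the regularized statistics are always smaller in magnitude than the raw ones, and the effect is strongest for the entry with the smallest standard error, which is the situation s0_perc is meant to guard against.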
See also
Other Diff taxa steps: step_aldex(), step_corncob(), step_deseq(), step_lefse(), step_maaslin(), step_metagenomeseq(), step_wilcox()
Examples
data(metaHIV_phy)
## Init Recipe
rec <-
  recipe(metaHIV_phy, "RiskGroup2", "Phylum") |>
  step_subset_taxa(tax_level = "Kingdom", taxa = c("Bacteria", "Archaea")) |>
  step_filter_taxa(.f = "function(x) sum(x > 0) >= (0.4 * length(x))")
rec
#> ── DAR Recipe ──────────────────────────────────────────────────────────────────
#> Inputs:
#>
#> ℹ phyloseq object with 451 taxa and 156 samples
#> ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid)
#> ℹ taxonomic level Phylum
#>
#> Preporcessing steps:
#>
#> ◉ step_subset_taxa() id = subset_taxa__Roti_tissue
#> ◉ step_filter_taxa() id = filter_taxa__Linzer_torte
#>
#> DA steps:
#>
## Define step with default parameters and prep
rec <-
  step_ancom(rec) |>
  prep(parallel = FALSE)
#> Registered S3 methods overwritten by 'proxy':
#> method from
#> print.registry_field registry
#> print.registry_entry registry
#> Warning: The number of taxa used for estimating sample-specific biases is: 6
#> A large number of taxa (>50) is required for the consistent estimation of biases
#> Loading required package: foreach
#> Loading required package: rngtools
#> Warning: The number of taxa used for estimating sample-specific biases is: 6
#> A large number of taxa (>50) is required for the consistent estimation of biases
#> Warning: The number of taxa used for estimating sample-specific biases is: 6
#> A large number of taxa (>50) is required for the consistent estimation of biases
rec
#> ── DAR Results ─────────────────────────────────────────────────────────────────
#> Inputs:
#>
#> ℹ phyloseq object with 76 taxa and 156 samples
#> ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid)
#> ℹ taxonomic level Phylum
#>
#> Results:
#>
#> ✔ ancom__Trdelník diff_taxa = 24
#>
#> ℹ 24 taxa are present in all tested methods
#>
## Using rarefaction only for this step
rec <-
  recipe(metaHIV_phy, "RiskGroup2", "Species") |>
  step_ancom(rarefy = TRUE)
rec
#> ── DAR Recipe ──────────────────────────────────────────────────────────────────
#> Inputs:
#>
#> ℹ phyloseq object with 451 taxa and 156 samples
#> ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid)
#> ℹ taxonomic level Species
#>
#> Preporcessing steps:
#>
#>
#> DA steps:
#>
#> ◉ step_ancom() id = ancom__Pineapple_cake
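
As the description of group notes, the multi-group tests require group to be set. The following is a minimal sketch of how such a step might be defined, assuming the dar package and the metaHIV_phy dataset used in the examples above are available. The name rec_multi is arbitrary, and the step is only added to the recipe, not prepped, so no results are shown.

```r
library(dar)
data(metaHIV_phy)

## Request the global and pairwise directional tests for a three-level group.
## fix_formula keeps its default (the recipe variable), so it already
## contains the group variable, as the fix_formula description asks.
rec_multi <-
  recipe(metaHIV_phy, "RiskGroup2", "Phylum") |>
  step_ancom(
    group = "RiskGroup2",
    global = TRUE,
    pairwise = TRUE
  )

rec_multi
```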