Fast Parallel UniFrac

joey711 edited this page Feb 8, 2012 · 19 revisions

A common approach in microbial ecology studies involving many samples is to calculate the "UniFrac" distance between all pairs of samples, and then perform various analyses on the resulting distance matrix.

The phyloseq package includes a native R implementation of both the original UniFrac algorithm and the faster "Fast UniFrac" algorithm. Both approaches arrive at the same result. There are also two distinct types of UniFrac calculation:

Weighted UniFrac - which takes into account differences in the abundance of species between samples, but takes longer to calculate; and

Unweighted UniFrac - which considers only the presence/absence of species in each pair of samples.

Both can be useful, and each offers slightly different insight. Both weighted and unweighted UniFrac are included, and all UniFrac calculations have the option of running "in parallel" for faster results on computers that have multiple cores/processors available.
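To make the weighted/unweighted distinction concrete, here is a minimal base-R sketch on a toy four-tip tree. It is not phyloseq's implementation. The names `toy_unifrac`, `desc`, `branch_length`, `A` and `B` are hypothetical, and the normalization shown (dividing by the abundance-weighted root-to-tip distance) is an assumption about how normalized weighted UniFrac is usually defined.

```r
# Toy rooted tree ((t1, t2), (t3, t4)); every branch has length 1.
# Each row of `desc` is one branch; a 1 marks the tips below that branch.
tips <- c("t1", "t2", "t3", "t4")
desc <- rbind(
  c(1, 1, 0, 0),  # root -> ancestor of t1 and t2
  c(0, 0, 1, 1),  # root -> ancestor of t3 and t4
  c(1, 0, 0, 0),  # branch leading to t1
  c(0, 1, 0, 0),  # branch leading to t2
  c(0, 0, 1, 0),  # branch leading to t3
  c(0, 0, 0, 1)   # branch leading to t4
)
colnames(desc) <- tips
branch_length <- rep(1, nrow(desc))

# Hypothetical species counts for two samples
A <- c(t1 = 3, t2 = 1, t3 = 0, t4 = 0)
B <- c(t1 = 1, t2 = 0, t3 = 2, t4 = 1)

toy_unifrac <- function(a, b, desc, len, weighted = FALSE, normalized = TRUE) {
  # Total count of each sample found below each branch
  a_branch <- as.vector(desc %*% a)
  b_branch <- as.vector(desc %*% b)
  if (!weighted) {
    # Unweighted: branch length unique to one sample / length seen by either
    in_a <- a_branch > 0
    in_b <- b_branch > 0
    return(sum(len[xor(in_a, in_b)]) / sum(len[in_a | in_b]))
  }
  # Weighted: branch lengths scaled by the difference in relative abundance
  raw <- sum(len * abs(a_branch / sum(a) - b_branch / sum(b)))
  if (!normalized) return(raw)
  # Normalize by the abundance-weighted distance from the root to each tip
  root_to_tip <- as.vector(t(desc) %*% len)
  raw / sum(root_to_tip * (a / sum(a) + b / sum(b)))
}

unweighted   <- toy_unifrac(A, B, desc, branch_length)                   # 4/6
weighted_raw <- toy_unifrac(A, B, desc, branch_length, weighted = TRUE,
                            normalized = FALSE)                          # 3
weighted     <- toy_unifrac(A, B, desc, branch_length, weighted = TRUE)  # 0.75
```

In this toy case the unweighted distance depends only on which branches each sample touches, so changing the counts in `A` without adding or removing a species leaves it unchanged, while the weighted values shift.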

All of this is accessed through a single function call:

UniFrac(physeq, weighted=FALSE, normalized=TRUE, parallel=FALSE, fast=TRUE)
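As a rough usage sketch, the call might look like the following. It assumes the installed phyloseq version ships the small `esophagus` example dataset (any phyloseq object with a rooted tree and an abundance table should work in its place). The `requireNamespace` guard is only there so the snippet can be pasted safely where phyloseq is not installed.

```r
if (requireNamespace("phyloseq", quietly = TRUE)) {
  library(phyloseq)

  # Assumed example dataset: three samples plus a phylogenetic tree
  data(esophagus)

  # Unweighted UniFrac (presence/absence only), computed serially
  d_unweighted <- UniFrac(esophagus, weighted = FALSE)

  # Weighted, normalized UniFrac (abundance-aware)
  d_weighted <- UniFrac(esophagus, weighted = TRUE, normalized = TRUE)

  # Each result is a distance matrix over all pairs of samples
  print(d_unweighted)
  print(d_weighted)
}
```

Passing `parallel=TRUE` is expected to help only once a parallel backend has been registered for the session; which backend package to use depends on the platform and phyloseq version.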

UniFrac time trial
