metannotateScripts

This folder contains various post-processing scripts for metannotate analyses of metagenomes.

MetAnnotate can be found here http://metannotate.uwaterloo.ca/ and here https://bitbucket.org/doxeylab/metannotate

Some basic R commands are below:

Loading and processing output files

Set your working directory. For example, this may be the directory containing the .tsv MetAnnotate results file. E.g.,

setwd("~/Desktop/MetAnnotateResults/")

Load your dataset (here, it is a tab-separated file called data.tsv):

tb <- read.delim("data.tsv",sep='\t',header=T)

You may now want to look at the counts (# hits for all HMMs) for each dataset. To normalize these counts, divide them by the total number of reads (DNA) in each dataset.

sort(table(tb[,1]))

If there are datasets with very few (e.g., 50 or less) counts, you may want to remove them:

print("Removing...")
which(table(tb[,1]) < 50)

for (d in names(which(table(tb[,1]) < 50)))
{
	tb <- tb[-which(tb[,1] == d),]
}

tb[,1] <- as.character(tb[,1])

Plotting

Barplots

The first thing you need to decide on is the data column you want to analyze. For order level taxon assignments predicted using the tree-based method ...

k <- "Closest.Homolog.Order" # can be Species, Genus, Family, Order, Class...

Now run the following code to make your barplot

# a color scheme
pal12 = c("#A6CEE3", "#1F78B4", "#B2DF8A", "#33A02C", "#FB9A99", 
"#E31A1C", "#FDBF6F", "#FF7F00", "#CAB2D6", "#6A3D9A", 
"#FFFF99", "#B15928") 
cols <- colorRampPalette(pal12)(length(levels(as.factor(tb[,k]))))

par(mfrow=c(1,2))

tb.2 <- t(table(tb[,c(1,which(colnames(tb) == k))]))
for (i in 1:ncol(tb.2))
{
	tb.2[,i] <- tb.2[,i] / sum(tb.2[,i])
}

tb.2 <- t(na.omit(t(tb.2)))


## suppose you want to reorder the barplot and show only a subset of datasets in a specific order. Uncomment the following two lines
# datasetOrder <- c("4446153","4477804","4537195","SRR908273a")  # a hypothetical list of datasets
# tb.2 <- tb.2[,match(datasetOrder,colnames(tb.2))]

barplot(tb.2,col=cols,las=3, cex.names=0.5)
plot.new()

legend("left", inset=.05, title="Class",legend =
  	c(levels(as.factor(tb[,k]))), fill = cols,cex=0.2)

Clustered Heatmap

You will need to have the pheatmap package installed.

library(pheatmap)

As before, you need to pick a taxonomic level. This time it is called sColumn

k <- "Closest.Homolog.Genus" #species column for which to do the species breakdown/analysis

Since showing all taxa may be overwhelming, we can choose to show only the most frequent species -- those present greater than some threshold in at least one dataset. This is done with the mpt parameter

mpt <- 0.02 #max fraction of a taxa in any dataset required for inclusion in heat map

The following parameter can also be set to less to reduce the size of labels in the heatmap

cexVal <- 1

Now run...

matr <- table(tb[,k],tb[,1])
matr <- sweep(matr, 2, colSums(matr), "/")
maxPerTaxa <- apply(matr,1,max)
cexVal <- 1 #sets size of labels
ll <- pheatmap((matr[which(maxPerTaxa > mpt),]),cex=cexVal)

Principal coordinates analysis (PCOA)

PCOA is really easy in R. First load the vegan and ape packages

library(vegan)
library(ape)

Just run:

D <- vegdist(t(matr[which(maxPerTaxa > mpt),]))   ## works better when rare species are removed
biplot(pcoa(D))

Generating an OTU table

otu.tb1 <- table(tb[,k],tb[,2])
otu.tb2 <- data.frame(rownames(otu.tb1),as.data.frame.matrix(otu.tb1))
colnames(otu.tb2)[1] <- "OTU"
write.table(otu.tb2,file="otutable.tsv",sep="\t",row.names=F)

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
README.md		README.md
compareToPrimer.pl		compareToPrimer.pl
phylogeneticNovelty.R		phylogeneticNovelty.R
taxPredictionsWithErrorBars.R		taxPredictionsWithErrorBars.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

metannotateScripts

Loading and processing output files

Plotting

Barplots

Clustered Heatmap

Principal coordinates analysis (PCOA)

Generating an OTU table

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

doxeylab/metannotateScripts

Folders and files

Latest commit

History

Repository files navigation

metannotateScripts

Loading and processing output files

Plotting

Barplots

Clustered Heatmap

Principal coordinates analysis (PCOA)

Generating an OTU table

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages