Skip to content

Commit a95e75a

Browse files
author
chakalakka
committed
added section about univariate distributions to vignette
1 parent 0d43efe commit a95e75a

File tree

6 files changed

+39
-6
lines changed

6 files changed

+39
-6
lines changed

R/plotting.R

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -297,15 +297,15 @@ plotHistogram <- function(model, state=NULL, chromosomes=NULL, start=NULL, end=N
297297
### Plot the distributions
298298
if (is.null(state)) {
299299
ggplt <- ggplt + geom_line(data=df, aes_string(x='x', y='y', col='state'), size=linewidth)
300-
ggplt <- ggplt + scale_color_manual(name="components", values=getStateColors(c('unmodified','modified','total')), labels=legend) + theme(legend.justification=c(1,1), legend.position=c(1,1))
300+
ggplt <- ggplt + scale_color_manual(name="components", values=getStateColors(c('unmodified','modified','total')), labels=legend) + theme(legend.justification=c(1,1), legend.position=c(0.99,0.99))
301301
} else {
302302
if (state=="unmodified") {
303303
ggplt <- ggplt + geom_line(data=df[df$state=='unmodified',], aes_string(x='x', y='y', col='state'), size=linewidth)
304-
ggplt <- ggplt + scale_color_manual(name="components", values=getStateColors(c('unmodified')), labels=legend[1]) + theme(legend.justification=c(1,1), legend.position=c(1,1))
304+
ggplt <- ggplt + scale_color_manual(name="components", values=getStateColors(c('unmodified')), labels=legend[1]) + theme(legend.justification=c(1,1), legend.position=c(0.99,0.99))
305305
}
306306
if (state=="modified") {
307307
ggplt <- ggplt + geom_line(data=df[df$state=='modified',], aes_string(x='x', y='y', col='state'), size=linewidth)
308-
ggplt <- ggplt + scale_color_manual(name="components", values=getStateColors(c('modified')), labels=legend[2]) + theme(legend.justification=c(1,1), legend.position=c(1,1))
308+
ggplt <- ggplt + scale_color_manual(name="components", values=getStateColors(c('modified')), labels=legend[2]) + theme(legend.justification=c(1,1), legend.position=c(0.99,0.99))
309309
}
310310
}
311311

24.2 KB
Binary file not shown.
24.2 KB
Binary file not shown.
6.89 KB
Binary file not shown.

vignettes/chromstaR.Rnw

Lines changed: 36 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -82,6 +82,8 @@ exportCounts(model, filename=tempfile())
8282
@
8383
\end{scriptsize}
8484

85+
\textbf{!! It is important that the distributions are fitted correctly !!} Please check section \ref{sec:FAQ_example_histograms} for examples of how this plot should \emph{not} look like and what can be done to get a correct fit.
86+
8587
\subsection{\label{sec:broad}Task 2: Peak calling for a broad histone modification}
8688

8789
Examples of histone modifications with a broad profile are H3K9me3, H3K27me3, H3K36me3, H4K20me1 in most human tissues. These modifications usually cover broad domains of the genome, and the enrichment is best captured with a bin size between 500bp and 2000bp.
@@ -147,6 +149,8 @@ plotHistogram(model) + ggtitle('H4K20me1')
147149
@
148150
\end{scriptsize}
149151

152+
\textbf{!! It is important that the distributions are fitted correctly !!} Please check section \ref{sec:FAQ_example_histograms} for examples of how this plot should \emph{not} look like and what can be done to get a correct fit.
153+
150154
\subsection{Task 3: Peak calling for ATAC-seq, DNase-seq, FAIRE-seq, ...}
151155

152156
Peak calling for ATAC-seq and DNase-seq is similar to the peak calling of a narrow histone modification (section~\ref{sec:narrow}). FAIRE-seq experiments seem to exhibit a broad profile with our model, so the procedure is similar to the domain calling of a broad histone modification (section~\ref{sec:broad}).
@@ -191,7 +195,7 @@ exp <- data.frame(file=files, mark='H3K27me3', condition='SHR', replicate=1:4,
191195
# We use bin size 1000bp and chromosome 12 to keep the example quick
192196
binned.data <- list()
193197
for (file in files) {
194-
binned.data[[basename(file)]] <- binReads(file, binsize=1000, stepsizes=500,
198+
binned.data[[basename(file)]] <- binReads(file, binsize=1000, stepsizes=500,
195199
assembly=rn4_chrominfo, chromosomes='chr12',
196200
experiment.table=exp)
197201
}
@@ -203,9 +207,13 @@ for (file in files) {
203207
models <- list()
204208
for (i1 in 1:length(binned.data)) {
205209
models[[i1]] <- callPeaksUnivariate(binned.data[[i1]], max.time=60)
210+
plotHistogram(models[[i1]])
206211
}
207212
@
208213

214+
\textbf{!! It is important that the distributions are fitted correctly !!} Please check section \ref{sec:FAQ_example_histograms} for examples of how this plot should \emph{not} look like and what can be done to get a correct fit.\\
215+
216+
209217
<<multivariate_replicate_peak_calling, results='markup', message=FALSE, eval=TRUE>>=
210218
## === Step 4: Check replicate correlation ===
211219
# We run a multivariate peak calling on all 4 replicates
@@ -260,7 +268,7 @@ head(rn4_chrominfo)
260268
#=== Step 2: Run Chromstar ===
261269
## Run ChromstaR
262270
Chromstar(inputfolder, experiment.table=experiment_table_H4K20me1,
263-
outputfolder=outputfolder, numCPU=4, binsize=1000, stepsize=500,
271+
outputfolder=outputfolder, numCPU=4, binsize=1000, stepsize=500,
264272
assembly=rn4_chrominfo, prefit.on.chr='chr12', chromosomes='chr12',
265273
mode='differential')
266274
@
@@ -271,6 +279,9 @@ model <- get(load(file.path(outputfolder,'multivariate',
271279
'multivariate_mode-differential_mark-H4K20me1.RData')))
272280
@
273281

282+
\textbf{!! It is important that the distributions in folder outputfolder/PLOTS/univariate-distributions are fitted correctly !!} Please check section \ref{sec:FAQ_example_histograms} for examples of how this plot should \emph{not} look like and what can be done to get a correct fit.\\
283+
284+
274285
<<multivariate_differential_stateBrewer, results='markup', message=TRUE, eval=TRUE>>=
275286
## === Step 3: Construct differential and common states ===
276287
diff.states <- stateBrewer(experiment_table_H4K20me1, mode='differential',
@@ -326,6 +337,9 @@ Chromstar(inputfolder, experiment.table=experiment_table_SHR,
326337
mode='combinatorial')
327338
@
328339

340+
\textbf{!! It is important that the distributions in folder outputfolder/PLOTS/univariate-distributions are fitted correctly !!} Please check section \ref{sec:FAQ_example_histograms} for examples of how this plot should \emph{not} look like and what can be done to get a correct fit.\\
341+
342+
329343
<<multivariate_combinatorial_listfiles, results='markup', message=FALSE, eval=TRUE, fig.width=4, fig.height=3, out.width='0.5\\textwidth'>>=
330344
## Results are stored in 'outputfolder' and can be loaded for further processing
331345
list.files(outputfolder)
@@ -354,6 +368,7 @@ genes <- GRanges(seqnames=paste0('chr',genes$chromosome_name),
354368
ranges=IRanges(start=genes$start, end=genes$end),
355369
strand=genes$strand,
356370
name=genes$external_gene_id, biotype=genes$gene_biotype)
371+
seqlevels(genes)[seqlevels(genes)=='chrMT'] <- 'chrM'
357372
print(genes)
358373
@
359374

@@ -411,6 +426,7 @@ expression.SHR <- GRanges(seqnames=paste0('chr',expr$chromosome_name),
411426
strand=expr$strand, name=expr$external_gene_id,
412427
biotype=expr$gene_biotype,
413428
expression=expr$expression_SHR)
429+
seqlevels(expression.SHR)[seqlevels(expression.SHR)=='chrMT'] <- 'chrM'
414430
# We apply an asinh transformation to reduce the effect of outliers
415431
expression.SHR$expression <- asinh(expression.SHR$expression)
416432
@@ -472,6 +488,9 @@ model <- get(load(file.path(outputfolder,'combined',
472488
'combined_mode-differential.RData')))
473489
@
474490

491+
\textbf{!! It is important that the distributions in folder outputfolder/PLOTS/univariate-distributions are fitted correctly !!} Please check section \ref{sec:FAQ_example_histograms} for examples of how this plot should \emph{not} look like and what can be done to get a correct fit.\\
492+
493+
475494
<<combined_analysis, results='markup', message=FALSE, eval=TRUE>>=
476495
#=== Step 3: Analysis and export ===
477496
## Obtain all genomic regions where the two tissues have different states
@@ -506,6 +525,7 @@ genes <- GRanges(seqnames=paste0('chr',genes$chromosome_name),
506525
ranges=IRanges(start=genes$start, end=genes$end),
507526
strand=genes$strand,
508527
name=genes$external_gene_id, biotype=genes$gene_biotype)
528+
seqlevels(genes)[seqlevels(genes)=='chrMT'] <- 'chrM'
509529
print(genes)
510530
@
511531

@@ -529,7 +549,7 @@ plots[['BN']] + facet_wrap(~ mark) +
529549
tss <- resize(genes, width = 3, fix = 'start')
530550
biotypes <- split(tss, tss$biotype)
531551
plots <- plotFoldEnrichHeatmap(model, annotations=biotypes)
532-
plots[['BN']] + coord_flip() +
552+
plots[['BN']] + coord_flip() +
533553
ggtitle('Fold enrichment with different biotypes')
534554
@
535555
\end{scriptsize}
@@ -607,7 +627,20 @@ heatmapCombinations(marks=c('H3K4me3', 'H3K27me3', 'H3K36me3', 'H3K27Ac'))
607627
@
608628
\end{scriptsize}
609629

630+
\subsection{\label{sec:FAQ_example_histograms}Examples of problematic distributions.}
631+
632+
For the chromstaR peak calling to work correctly it is essential that the Baum-Welch algorithm correctly identifies unmodified (background) and modified (signal/peak) components in the data. Therefore, you should always check the plots in folder \textbf{PLOTS/univariate-distributions} for correct convergence. Here are some plots that indicate failed and succesful fitting procedures:
633+
634+
% p1 <- ggdraw(p1) + draw_label("WRONG", angle = -45, size = 80, alpha = .2, colour = 'red')
635+
% p2 <- ggdraw(p2) + draw_label("CORRECT", angle = -45, size = 80, alpha = .2, colour = 'green')
636+
% cowplt <- plot_grid(p1, p2, labels = letters[1:2])
637+
% ggsave(cowplt, filename = '~/Bioconductor/chromstaR/vignettes/PLOTS/H3K27me3-Adult-rep2_binsize1000.pdf', width=42, height=15, units='cm')
638+
639+
\includegraphics[width=\textwidth]{PLOTS/H3K27me3-Adult-rep2_binsize1000.pdf}
640+
The plot shows data for H3K27me3 at binsize 1000bp. (a) Incorrectly converged fit, where the \textbf{modified} component (red) has lower read counts than the \textbf{unmodified} component (gray). (b) Correctly converged fit. Even here, the fit could be improved by reducing the average number of reads per bin, either by selecting a smaller binsize or by downsampling the data before using chromstaR.
610641

642+
\includegraphics[width=\textwidth]{PLOTS/H3K27me3-Adult-rep2_binsize150.pdf}
643+
The plot shows data for H3K27me3 at binsize 150bp. (a) Incorrectly converged fit, where the \textbf{modified} component (red) has a higher density at zero reads than the \textbf{unmodified} component (gray). (b) Correctly converged fit.
611644

612645
\section{Session Info}
613646
\begin{scriptsize}

vignettes/chromstaR.pdf

689 KB
Binary file not shown.

0 commit comments

Comments
 (0)