October 23, 2014

Extending methylKit: Extract promoters with differentially methylated CpGs

In my previous post, I wrote about the features of methylKit. Here, I will discuss how to extend bisulfite sequencing data analysis beyond methylKit.

Annotation is an important feature of genomic analyses. For bisulfite sequencing analyses such as RRBS or WGBS, methylKit does a pretty good job of identifying differentially methylated individual CpGs or specific regions/tiles. It also performs basic annotation and plots pie charts showing where the differentially methylated CpGs overlap the genic annotations.

The adjacent picture is an example of the kind of annotation performed by methylKit. Using the native functions, methylKit users can annotate the differentially methylated CpGs with respect to genic regions. The picture indicates that, among all the differentially methylated CpGs, 46% overlap with promoters and 27% overlap with intergenic regions.

Another way to look at the annotations is to identify the list of promoters, exons, introns and intergenic regions that overlap differentially methylated CpGs. methylKit has no function that performs the annotation from the point of view of the genic regions. However, it makes this possible by allowing methylKit objects to be coerced into "GRanges" (GenomicRanges) objects. The following script will help methylKit users extract the list of promoters/exons/introns that overlap with differentially methylated CpGs.
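The overlap logic behind such a script can be sketched without any R dependency. The following is a minimal Python illustration, not methylKit code: the function name `promoters_with_dm_cpgs`, the tuple layouts and the toy coordinates are all assumptions made for this example. It shows the "from the point of view of the genic regions" direction, that is, for each promoter, which differentially methylated CpGs fall inside it.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def promoters_with_dm_cpgs(promoters, dm_cpgs):
    """Map each promoter name to the differentially methylated CpGs inside it.

    promoters: iterable of (chrom, start, end, name), 1-based inclusive (assumed layout)
    dm_cpgs:   iterable of (chrom, position)
    Promoters without any overlapping CpG are left out of the result.
    """
    # Group CpG positions per chromosome and sort them for binary search.
    by_chrom = defaultdict(list)
    for chrom, pos in dm_cpgs:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    hits = {}
    for chrom, start, end, name in promoters:
        positions = by_chrom.get(chrom, [])
        lo = bisect_left(positions, start)   # first CpG at or after start
        hi = bisect_right(positions, end)    # one past the last CpG at or before end
        if hi > lo:
            hits[name] = positions[lo:hi]
    return hits


# Toy, made-up coordinates purely for illustration.
promoters = [
    ("chr1", 1000, 3000, "geneA"),
    ("chr1", 5000, 7000, "geneB"),
    ("chr2", 100, 2100, "geneC"),
]
dm_cpgs = [("chr1", 1500), ("chr1", 2999), ("chr1", 4000), ("chr2", 2100), ("chr3", 50)]

hits = promoters_with_dm_cpgs(promoters, dm_cpgs)
for name, positions in hits.items():
    print(name, positions)
```

In an R workflow the same question would normally be answered with the overlap functions that GenomicRanges provides for GRanges objects; the sketch only makes the underlying interval logic explicit.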

October 21, 2014

methylKit for bisulfite sequencing data analysis

I have been relying upon methylKit, an R package, for my RRBS data analysis. It is one of the most highly cited R packages for analysing bisulfite sequencing data. It is straightforward to install, and its vignette details all the major steps of a bisulfite analysis with clarity. Altuna Akalin, the author of methylKit, has been actively helping users (via Google Groups) with the issues they face in using methylKit. Overall, methylKit can be used with little knowledge of R. Interestingly, working with methylKit also helps laboratory researchers learn R.

As with other bisulfite sequencing data analysis packages, methylKit takes charge once the bisulfite reads are aligned to the genome. Here are the tasks one can perform using methylKit:

  • Extract methylation information from aligned data from mappers like Bismark
  • Alternatively, read the methylation information of mapped cytosines from other mappers like BSMAP, or from any other bisulfite mapper, in a specified format
  • Filter the covered CpGs by removing those with excess coverage due to over-amplification/PCR duplication
  • Calculate the methylation status of each CpG covered (or of specified regions or tiles across the genome) and export it in bed or bedcoverage formats for visualization in a genome browser. methylKit also enables merging of strand coverage.
  • Take replicates among control and test samples into account
  • Calculate differential methylation between control and test samples, either at the level of single CpGs or of regions/tiles
  • Run PCA and cluster analysis to assess the overall relationship among the samples from a methylation point of view
  • Annotate differential methylation across CpG islands/shores and multiple genic regions of interest such as promoters, exons and introns
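To make two of these steps concrete, here is a minimal Python sketch of per-CpG percent methylation combined with a coverage filter. It is not methylKit's implementation: the record layout `(chrom, pos, num_c, num_t)`, the function names and the fixed cutoffs `min_cov=10` and `max_cov=500` are assumptions chosen only for illustration.

```python
def percent_methylation(num_c, num_t):
    """Percent methylation at one CpG: methylated calls (C) over all calls (C + T)."""
    coverage = num_c + num_t
    if coverage == 0:
        raise ValueError("CpG has no coverage")
    return 100.0 * num_c / coverage


def filter_and_score(cpgs, min_cov=10, max_cov=500):
    """Drop CpGs with too little or suspiciously high coverage, then score the rest.

    cpgs: iterable of (chrom, pos, num_c, num_t)   (assumed record layout)
    Returns a list of (chrom, pos, coverage, percent_methylation).
    The cutoffs are illustrative defaults, not recommended values.
    """
    kept = []
    for chrom, pos, num_c, num_t in cpgs:
        coverage = num_c + num_t
        if coverage < min_cov or coverage > max_cov:
            continue  # too shallow to trust, or likely inflated by PCR duplication
        kept.append((chrom, pos, coverage, percent_methylation(num_c, num_t)))
    return kept


# Made-up counts for four CpGs.
cpgs = [
    ("chr1", 100, 8, 2),      # coverage 10, kept
    ("chr1", 200, 1, 2),      # coverage 3, dropped as too shallow
    ("chr1", 300, 450, 550),  # coverage 1000, dropped as excess coverage
    ("chr1", 400, 0, 20),     # coverage 20, kept, unmethylated
]
scored = filter_and_score(cpgs)
for row in scored:
    print(row)
```

A fixed upper cutoff is the simplest choice for a sketch; in practice an upper limit is often set relative to the coverage distribution of each sample.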
Any genomic analysis becomes highly customized after a certain number of basic steps. One has to build the customization by combining the options offered by several packages and bridging the gaps by fine-tuning the input and output file formats. methylKit does a fairly good job here by allowing methylKit objects to be coerced into GenomicRanges objects such as GRanges. This feature enables seamless integration with multiple packages from Bioconductor. In future posts, I will detail some examples and R scripts that extend the methylKit analysis.

October 8, 2014

Are post-bisulfite DNA library preparation kits suitable for RRBS?

Recently, a colleague asked me if post-bisulfite DNA library preparation kits are suitable for reduced representation bisulfite sequencing (RRBS). In this blog post I share my views on this concept and its applicability.

The concept of preparing the DNA library after bisulfite conversion was originally introduced by Zymo Research for whole genome Methyl-seq. This is really exciting because the bisulfite reaction is so harsh that, in the traditional protocols, 90% of the library gets fragmented and is not amplifiable. The amplification we see actually comes from the remaining 10% of the library. Other advantages are listed below:
  • Generally, sonication of DNA is performed in a buffer volume of at least 130 µl (the volume of a Covaris micro-tube). After sonication, the DNA needs to be purified/concentrated. So, sonication always brings additional purification steps that lead to loss of DNA (purification by columns will lead to a minimum loss of 10% of the DNA). Additionally, one has to check the DNA concentration and the fragment size before proceeding.
  • Bisulfite conversion of the whole genome is a harsh reaction that introduces random nicks in the DNA, so the DNA is broken down into fragments. Subjecting the whole genome to bisulfite conversion therefore accomplishes two steps at once: fragmentation and bisulfite conversion. This is advantageous because it avoids sonication and the loss of DNA during the purification steps.
  • Another advantage is that there is no fragmentation-induced loss of DNA after ligation. In normal Methyl-seq, the bisulfite reaction is performed post ligation, which generally fragments the ligated library into pieces that cannot be amplified.
Before commenting on the applicability of post-bisulfite library preparation to RRBS, it is wise to understand the library preparation steps that follow bisulfite conversion of the whole genome. The following picture explains the methodology (adapted from the Zymo Research webpage).


An important step in the above workflow is the conversion of bisulfite-converted ssDNA into dsDNA. This is achieved in a process akin to cDNA preparation using random oligos. Since the fragmentation induced by the bisulfite reaction is random, random oligos seem the right choice. This protocol should work fine for preparing a library for whole genome bisulfite sequencing (Methyl-seq).

To find out whether this workflow is suitable for RRBS, let us revisit the basic concepts. RRBS is intended to enrich for CpG-rich regions by digesting the DNA with the MspI restriction enzyme. This results in DNA fragments ranging from as short as 40 bp to multiple kilobases in length, all with identical termini. However, we choose fragments in the size range of 40-480 bp, which seem to be a better representation of the CpG-rich regions and promoters. In the traditional RRBS protocol, we perform bisulfite conversion after adapter ligation. So, we amplify the fragments we 'choose' (with some loss during the bisulfite conversion due to random fragmentation of DNA).
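The digestion and size selection described above can be mimicked in silico. The sketch below is an illustration under simplifying assumptions: it looks at the top strand only, treats every CCGG site as fully cut between the two cytosines (C^CGG), and uses a synthetic toy sequence. The function names are made up for this example.

```python
def mspi_fragments(seq):
    """Cut a sequence at every MspI site (C^CGG) and return the top-strand fragments."""
    seq = seq.upper()
    cuts = []
    start = seq.find("CCGG")
    while start != -1:
        cuts.append(start + 1)            # MspI cuts after the first C of CCGG
        start = seq.find("CCGG", start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments, min_len=40, max_len=480):
    """Keep fragments within the size window used for RRBS in the text (40-480 bp)."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Synthetic toy sequence with three MspI sites.
seq = "A" * 10 + "CCGG" + "T" * 50 + "CCGG" + "A" * 500 + "CCGG" + "T" * 5

fragments = mspi_fragments(seq)
selected = size_select(fragments)

print([len(f) for f in fragments])
print([(f[:3], len(f)) for f in selected])
```

Every fragment that lies between two MspI sites starts with CGG and ends with C on the top strand. These are the identical termini that the argument in this post relies on.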

Let us see what happens if we subject the MspI digested DNA to bisulfite reaction prior to library preparation.

  • The DNA is further broken into smaller fragments.
  • This fragmentation skews the composition of the MspI-digested fragments, so the desired size range of 40-480 bp no longer represents just the CpG-rich regions. This size range can now contain any region of the genome.
  • The termini will no longer be MspI recognition motifs but random nucleotides, due to the chemically induced fragmentation.
  • Because of the random fragmentation, even sequencing data from replicates is likely to represent CpGs from different genomic loci, reducing the overlap among replicates.
  • The QC of RRBS reads is assessed by the 5' termini (CGG/TGG). With random fragmentation, the terminal nucleotides are altered!
  • Even if random fragmentation did not happen, another issue arises during ssDNA to dsDNA conversion. Random oligos are a good fit for randomly fragmented termini. MspI-digested fragments, however, mostly have identical termini, which means random oligos may not convert the DNA with the same efficiency!
Since there are multiple issues involved, I conclude (well, I have not done any experiment yet) that post-bisulfite library preparation kits are not suitable for RRBS. In my view, any company that claims these kits are suitable should run RRBS experiments with replicates before documenting and selling them. I am eager to know if this has worked for RRBS. I would be willing to retract this post then!
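The 5' terminus check mentioned in the list above can be sketched as a simple read-level tally. This is a minimal illustration, assuming a directional library and reads that begin right at the MspI cut site, so that a bisulfite-converted read starts with CGG (methylated CpG) or TGG (unmethylated CpG). The function name and the toy reads are made up for this example.

```python
def mspi_start_fraction(reads):
    """Fraction of reads whose first three bases are CGG or TGG.

    Empty reads are ignored; returns 0.0 if there are no usable reads.
    """
    usable = [r.upper() for r in reads if r]
    if not usable:
        return 0.0
    with_site = sum(r.startswith(("CGG", "TGG")) for r in usable)
    return with_site / len(usable)


# Four made-up bisulfite-converted reads: three start at an MspI site, one does not.
reads = ["CGGTTTAG", "TGGATTTG", "TGGTTAGA", "ATTGGTTA"]
fraction = mspi_start_fraction(reads)
print(f"{fraction:.2f} of reads start with CGG/TGG")
```

For a library made from intact MspI fragments, this fraction is expected to be high. If bisulfite-induced breaks create new random ends before library preparation, as argued above, it would be expected to drop.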