October 15, 2013

Reduced Representation Bisulfite Sequencing Data Analysis

I am writing this post to help researchers who are trying to analyse their own RRBS data. This is a non-technical explanation: just following the steps should work on any Ubuntu/Linux system. Requirements on the computer:
  1. Bowtie (manual and source code are available from the Bowtie website)
  2. Human genome in FASTA format (hg19.2bit can be downloaded from UCSC; a .2bit file is not FASTA, so convert it with UCSC's twoBitToFa)
  3. Cutadapt (manual and installation instructions are in its documentation)
  4. Trim Galore (Perl script available from Babraham Bioinformatics)
  5. Bismark (an excellent resource, also from Babraham Bioinformatics)
  6. methylKit (a useful R package for downstream analysis, hosted on its Google Code page)
  7. Install all of the above programs in a directory that is on your PATH, or add their directories to PATH with export.
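Before running anything, it can help to confirm that each program is actually reachable from your PATH. The following is a minimal Python sketch using the standard library's shutil.which; the executable names in REQUIRED_TOOLS are assumptions about a typical installation (for example, your setup may use bowtie2), and methylKit is an R package, so it is not checked here.

```python
import shutil

# Executable names are assumptions about a typical install; edit to match yours
# (e.g. "bowtie2" instead of "bowtie").
REQUIRED_TOOLS = ["bowtie", "cutadapt", "trim_galore", "bismark"]


def missing_tools(tools):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

If a tool is reported missing even though it is installed, its directory is probably not on PATH yet, which is exactly what step 7 addresses.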

May 4, 2013

Survival tips for biologists navigating Linux

As a biologist, I noticed that there are few resources that teach programming or Linux commands from a biologist's point of view. Here, I summarize the basic Linux commands needed to navigate the folder structure, get help, and perform other basic tasks. The information below is just enough for survival; for more detailed learning, refer to the many other resources available on the internet.

Basic Linux tasks and the related commands for survival (commands are written in parentheses):
1. Getting help on any command (man)
2. Creating directories (mkdir)
3. Changing directories (cd)
4. Removing files or directories (rm; use rm -r for directories)
5. Copying files or directories (cp; use cp -r for directories)
6. Moving files or directories (mv)
7. Renaming a file or a list of files (rename)
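If you later need to repeat these steps inside a script, Python's standard library offers close counterparts, and inside Python, help() plays the role of man. The sketch below is a self-contained demonstration that runs in a throwaway directory; the folder and file names (project, raw, backup, trimmed, sample1.txt) are hypothetical, and the .txt to .fastq rename stands in for the kind of batch renaming the rename command is used for.

```python
import os
import shutil
import tempfile


def survival_demo(workdir):
    """Mirror the basic commands above with Python's standard library.

    Works only inside `workdir`; all folder and file names are hypothetical.
    """
    start = os.getcwd()
    try:
        project = os.path.join(workdir, "project")
        os.makedirs(os.path.join(project, "raw"))      # mkdir -p project/raw
        os.chdir(project)                              # cd project
        for name in ("sample1.txt", "sample2.txt"):    # create two small demo files
            with open(os.path.join("raw", name), "w") as handle:
                handle.write("demo\n")
        shutil.copytree("raw", "backup")               # cp -r raw backup
        os.mkdir("trimmed")                            # mkdir trimmed
        shutil.move(os.path.join("raw", "sample1.txt"), "trimmed")  # mv raw/sample1.txt trimmed/
        # Batch rename *.txt -> *.fastq inside backup (the job `rename` does)
        for name in os.listdir("backup"):
            if name.endswith(".txt"):
                os.rename(os.path.join("backup", name),
                          os.path.join("backup", name[:-len(".txt")] + ".fastq"))
        os.remove(os.path.join("raw", "sample2.txt"))  # rm raw/sample2.txt
        os.rmdir("raw")                                # rmdir raw (now empty)
        return sorted(os.listdir("backup")), sorted(os.listdir("trimmed"))
    finally:
        os.chdir(start)                                # cd back to the starting folder


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(survival_demo(tmp))
```

One practical reason to script renames: the rename command itself differs between distributions. The Perl-based rename common on Ubuntu takes an expression such as 's/\.txt$/.fastq/' *.txt, while the util-linux version takes rename .txt .fastq *.txt.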

March 20, 2013

Scope of Reduced Representation Bisulfite Sequencing Data

The RRBS method explores methylation status across the genome, but only at specific locations defined by MspI recognition sites (CCGG). These sites are enriched within CpG islands. So how can we visualize the scope of RRBS data, that is, the regions of the human genome for which we can expect to obtain methylation status?
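To make "locations defined by MspI recognition sites" concrete, here is a minimal in-silico digest sketch in Python. MspI recognizes CCGG and cuts between the two Cs (C^CGG); the sketch keeps fragments flanked by cut sites on both sides whose length falls in a size-selection window. The default 40 to 220 bp window is an assumption (a range often quoted for RRBS libraries), so substitute the range of your own protocol.

```python
import re

MSPI_SITE = "CCGG"  # MspI recognizes C^CGG and cuts between the two Cs


def mspi_cut_positions(seq):
    """Return 0-based positions where MspI cuts the top strand (after the first C)."""
    seq = seq.upper()
    # CCGG cannot overlap itself, so non-overlapping finditer misses no sites.
    return [m.start() + 1 for m in re.finditer(MSPI_SITE, seq)]


def rrbs_fragments(seq, min_len=40, max_len=220):
    """Return (start, end) of fragments flanked by MspI cuts on both sides
    whose length lies in the assumed size-selection window [min_len, max_len]."""
    cuts = mspi_cut_positions(seq)
    fragments = []
    for start, end in zip(cuts, cuts[1:]):
        if min_len <= end - start <= max_len:
            fragments.append((start, end))
    return fragments


if __name__ == "__main__":
    demo = "AACCGGTTTTCCGGTTCCGGAA"
    print(mspi_cut_positions(demo))                      # [3, 11, 17]
    print(rrbs_fragments(demo, min_len=1, max_len=100))  # [(3, 11), (11, 17)]
```

Run over a real chromosome sequence, the retained fragments cluster where CCGG sites are dense, which is why RRBS coverage concentrates in CpG islands.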

The following graphic shows the locations of CpG islands in the human genome. These are the most likely locations to be sampled by RRBS.

This is a karyogram view of CpG islands on the human chromosomes. Chromosomes are drawn in proportion to their size. Each CpG island is denoted by a single dot at its relative position on the chromosome; CpG islands close to each other (hundreds of closely placed dots) appear to the eye as a continuous line. The picture also shows that certain parts of some chromosomes lack CpG islands (chr1, chr13, chr14, chr15). This is most likely due to incomplete mapping of the human genome in these regions, such as the large heterochromatic block near the chr1 centromere and the short arms of the acrocentric chromosomes 13, 14 and 15.
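The empty stretches visible in the karyogram can also be checked numerically. A minimal Python sketch, assuming CpG island coordinates are available as (chrom, start, end) tuples (for example, exported from the UCSC cpgIslandExt table) and chromosome lengths as a dict; the values in the example are toy numbers, not real hg19 data.

```python
from collections import defaultdict


def largest_gaps(islands, chrom_sizes):
    """For each chromosome, return (gap_start, gap_end) of the longest stretch
    with no CpG island, including the stretches before the first island and
    after the last one.

    islands: iterable of (chrom, start, end) tuples, 0-based half-open.
    chrom_sizes: dict mapping chromosome name to length in bp.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in islands:
        by_chrom[chrom].append((start, end))

    result = {}
    for chrom, size in chrom_sizes.items():
        intervals = sorted(by_chrom.get(chrom, []))
        best = (0, 0)
        previous_end = 0
        # A zero-length sentinel at the chromosome end closes the final gap.
        for start, end in intervals + [(size, size)]:
            if start - previous_end > best[1] - best[0]:
                best = (previous_end, start)
            previous_end = max(previous_end, end)  # max() tolerates overlaps
        result[chrom] = best
    return result


if __name__ == "__main__":
    toy_islands = [("chr1", 10, 20), ("chr1", 50, 60), ("chr1", 70, 80)]
    print(largest_gaps(toy_islands, {"chr1": 100, "chr2": 30}))
```

Dividing each gap by the chromosome length gives the same relative scale the karyogram uses, so large values flag the regions that look empty in the plot.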

I made this graphic using ggbio and ggplot2.

Methylation Status of CpG Islands Across the Human Genome

Researchers are aware that the majority of CpG islands are unmethylated. How can this fact be represented in a graphic?

The picture above, created with ggplot2, shows how methylation of CpG islands is distributed on each chromosome. Each CpG island is shown as a single dot (.), and its methylation level is indicated by a color gradient. Most CpG islands are unmethylated: their overlapping dots appear as a line on the left side of the picture. The comparatively few methylated CpG islands appear as scattered blue dots on the right side.
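A figure like this needs one methylation value per CpG island. Below is a minimal Python sketch of one way to get it, assuming per-CpG methylated and total read counts (the kind of information found in Bismark's coverage output or a methylKit object) and non-overlapping island intervals. Pooling reads across all CpGs in an island and the 20%/80% cut-offs in classify are assumptions for illustration, not the method used to draw the figure.

```python
import bisect
from collections import defaultdict


def island_methylation(islands, cpg_calls):
    """Pool CpG-level read counts into one methylation level per CpG island.

    islands: list of (chrom, start, end), 0-based half-open, non-overlapping.
    cpg_calls: iterable of (chrom, position, methylated_reads, total_reads).
    Returns {(chrom, start, end): fraction methylated} for covered islands.
    """
    starts = defaultdict(list)   # island start positions per chromosome, sorted
    ordered = defaultdict(list)  # islands per chromosome, in the same order
    for island in sorted(islands):
        starts[island[0]].append(island[1])
        ordered[island[0]].append(island)

    methylated = defaultdict(int)
    total = defaultdict(int)
    for chrom, pos, meth, cov in cpg_calls:
        # Last island starting at or before pos; keep the CpG only if inside it.
        i = bisect.bisect_right(starts[chrom], pos) - 1
        if i >= 0:
            island = ordered[chrom][i]
            if pos < island[2]:
                methylated[island] += meth
                total[island] += cov
    return {isl: methylated[isl] / total[isl] for isl in total if total[isl] > 0}


def classify(level, low=0.2, high=0.8):
    """Hypothetical cut-offs: below low is unmethylated, above high is methylated."""
    if level < low:
        return "unmethylated"
    if level > high:
        return "methylated"
    return "intermediate"
```

The resulting per-island values can then be plotted by chromosome, with the level mapped to a color gradient as in the figure.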