February 25, 2014

Create a GENCODE transcript database in R

This gist helps researchers create a GENCODE transcript database using Bioconductor packages. It assumes that the "GenomicRanges" and "GenomicFeatures" packages are already installed on the user's computer. The script does the following:
  • loads the required Bioconductor packages
  • explains how to create the intermediate files needed to generate the database
  • briefly explains each step in the procedure
  • creates the transcript database, then saves it and reloads it when needed
  • extracts each feature type (gene, CDS, transcript, exon, intron, intergenic region) as a 'GRanges' object, sorted with 'sort' when needed
  • saves all the extracted features into a combined object that can be loaded in the future
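
The steps above can be sketched roughly as follows. This is a minimal illustration and not the gist itself: the file names ("gencode.annotation.gtf", "gencode_txdb.sqlite", "gencode_features.RData") and the helper name build_gencode_features are hypothetical, and the sketch assumes a GENCODE GTF file has already been downloaded and decompressed. In recent Bioconductor releases makeTxDbFromGFF may need to be loaded from the "txdbmaker" package instead of "GenomicFeatures".

```r
library(GenomicRanges)
library(GenomicFeatures)

# Hypothetical helper: builds the database from a GTF file, saves it,
# extracts each feature type, and saves them as one combined object.
build_gencode_features <- function(gtf_file, db_file, rdata_file) {
  # Build the transcript database from the annotation file
  txdb <- makeTxDbFromGFF(gtf_file, format = "gtf", dataSource = "GENCODE")

  # Save the database to disk and reload it (as one would in a later session)
  AnnotationDbi::saveDb(txdb, file = db_file)
  txdb <- AnnotationDbi::loadDb(db_file)

  # Extract each feature type as a sorted GRanges object
  gene_gr   <- sort(genes(txdb))
  tx_gr     <- sort(transcripts(txdb))
  exon_gr   <- sort(exons(txdb))
  cds_gr    <- sort(cds(txdb))
  intron_gr <- sort(unlist(intronsByTranscript(txdb)))

  # Intergenic regions (one possible definition): gaps between genes,
  # ignoring strand
  gene_unstranded <- gene_gr
  strand(gene_unstranded) <- "*"
  intergenic_gr <- gaps(reduce(gene_unstranded))
  intergenic_gr <- intergenic_gr[strand(intergenic_gr) == "*"]

  # Combine everything into one named list and save it for future use
  features <- list(gene = gene_gr, transcript = tx_gr, exon = exon_gr,
                   cds = cds_gr, intron = intron_gr,
                   intergenic = intergenic_gr)
  save(features, file = rdata_file)
  invisible(features)
}

# Hypothetical input path; the call runs only if the file is present
gtf_file <- "gencode.annotation.gtf"
if (file.exists(gtf_file)) {
  build_gencode_features(gtf_file, "gencode_txdb.sqlite",
                         "gencode_features.RData")
}
```

In a later session, load("gencode_features.RData") should restore the combined list under the name features. Note that a GTF file usually carries no chromosome lengths, so without them the intergenic ranges computed this way stop at the last gene on each chromosome.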