Genome size

posted on 02:07 PM on Tuesday 29 July 2014

When computing genome coverage, a BED file containing the sizes of the chromosomes is very useful. One can be created easily on the command line as follows:

# get the hg19 sizes from UCSC
wget "http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz"

# gunzip and pipe to awk to process the file to a standard bed file
gunzip -c chromInfo.txt.gz | awk -F $'\t' 'BEGIN{OFS=FS}{print $1, 0, $2}' > hg19.bed
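As a quick sanity check of the conversion, here is a minimal sketch run on a made-up two-line stand-in for chromInfo.txt. The file names, chromosome names and sizes are hypothetical, and the third column is assumed to be a path that the awk command ignores. Summing the end column of the resulting BED file gives the total genome size.

```shell
# Toy stand-in for chromInfo.txt: chrom, size, file name (all values are made up)
printf 'chrA\t1000\ttoy.2bit\nchrB\t500\ttoy.2bit\n' > toy_chromInfo.txt

# Same conversion as above: chrom, start 0, end = chromosome size
awk -F '\t' 'BEGIN{OFS=FS}{print $1, 0, $2}' toy_chromInfo.txt > toy.bed

# Total genome size is the sum of the end column
awk -F '\t' '{total += $3} END{printf "%d\n", total}' toy.bed
```

For the toy input this prints 1500. The same last command can be run on hg19.bed.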

Coverage of the features in another BED file can then be computed using bedtools as follows:

bedtools coverage -a some_features.bed -b hg19.bed

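This will produce an output showing the coverage on a per-chromosome basis. Note that from bedtools 2.24.0 onwards, coverage is reported for the intervals in the -a file instead, so with newer versions the two files need to be swapped.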
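To make "coverage on a per-chromosome basis" concrete, here is a rough awk-only sketch of the same idea. It is not what bedtools does internally, and it assumes the features do not overlap, so that their lengths can simply be added up. All file names and values are hypothetical toy data.

```shell
# Toy chromosome sizes in BED form (made-up values, not hg19)
printf 'chrA\t0\t1000\nchrB\t0\t500\n' > toy_sizes.bed

# Toy features, assumed non-overlapping
printf 'chrA\t100\t200\nchrA\t300\t450\nchrB\t0\t50\n' > toy_features.bed

# First file: sum feature lengths per chromosome
# Second file: print chrom, covered bases, chromosome size, covered fraction
awk -F '\t' 'BEGIN{OFS=FS}
  NR==FNR {covered[$1] += $3 - $2; next}
  {print $1, covered[$1] + 0, $3, (covered[$1] + 0) / $3}' \
  toy_features.bed toy_sizes.bed > toy_coverage.txt

cat toy_coverage.txt
```

For the toy data, chrA comes out as 250 covered bases out of 1000 (0.25) and chrB as 50 out of 500 (0.1). With overlapping features the sums would be too high, so the features would need merging first, or bedtools should be used.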

bernett.net