Monthly Archives: March 2016

Changing the ID of VCF files

I typically use the format “chr:pos:ref:alt” as the ID for genotyping data. I find this better than an rsID because it is descriptive enough that I can tell the location and alleles at a glance. Furthermore, there is no ambiguity when matching or comparing SNPs.
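As a small illustration of the format, the sketch below builds such an ID from a single tab-delimited VCF record using awk. The function name make_id and the sample record are hypothetical, and the "chr" prefix assumes the CHROM column holds values like "1" rather than "chr1".

```shell
# Hypothetical helper: build a "chr:pos:ref:alt" ID from one VCF data line.
# VCF columns are CHROM, POS, ID, REF, ALT, so REF is field 4 and ALT is field 5.
make_id() {
  printf '%s\n' "$1" | awk -F'\t' '{ print "chr" $1 ":" $2 ":" $4 ":" $5 }'
}

# Made-up record: chromosome 1, position 12345, rsID rs99, A>G.
make_id "$(printf '1\t12345\trs99\tA\tG')"
```

For this record the old ID rs99 is replaced by an ID that spells out the position and both alleles.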

To rewrite the ID column of a VCF file in this format, run the following commands, which use bcftools together with bgzip and tabix.

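# Write a tab-delimited table: CHROM, POS, REF, ALT, old ID, new ID.
# The literal "chr" prefix assumes CHROM values like "1" rather than "chr1".
bcftools query -f "%CHROM\t%POS\t%REF\t%ALT\t%ID\tchr%CHROM:%POS:%REF:%ALT\n" test.vcf.gz > oids.txt
# Compress and index the table so bcftools annotate can read it.
bgzip -f oids.txt
tabix -s1 -b2 -e2 oids.txt.gz
# Set ID to the new value and keep the old ID in INFO/OID (the <(...) header needs bash).
bcftools annotate -a oids.txt.gz -c CHROM,POS,REF,ALT,OID,ID -h <(echo '##INFO=<ID=OID,Number=1,Type=String,Description="Original ID">') test.vcf.gz > new.vcf
# Compress and index the new VCF.
bgzip -f new.vcf
tabix new.vcf.gz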

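The first command writes a tab-delimited file that lists, for each variant, the chromosome, position, alleles, the old ID and the new ID. This file is then compressed and indexed so that bcftools annotate can use it to write a new VCF file, in which the ID column holds the new ID and the original ID is kept in the INFO field OID. Finally, the new VCF file is compressed and indexed.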
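One optional sanity check, sketched here as an assumption rather than part of the procedure above: if the input VCF contains the same variant more than once, the new IDs will repeat. The hypothetical helper duplicate_ids reads the mapping table on standard input and prints any new ID (column 6) that appears more than once. The demo uses made-up rows; for the real table, bgzip output can be read with gzip -dc.

```shell
# Hypothetical helper: print new IDs (column 6 of the mapping table) seen more than once.
duplicate_ids() {
  cut -f6 | sort | uniq -d
}

# Demo on made-up rows: the first two rows share the same chr:pos:ref:alt.
printf '1\t10\tA\tG\trs1\tchr1:10:A:G\n1\t10\tA\tG\trs2\tchr1:10:A:G\n1\t20\tC\tT\trs3\tchr1:20:C:T\n' | duplicate_ids

# Against the real table this would be:
#   gzip -dc oids.txt.gz | duplicate_ids
```

An empty result means every new ID in the table is unique.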