Changing the ID of VCF files

posted on 03:11 PM on Tuesday 08 March 2016

I typically like to use the format "chr:pos:ref:alt" as the ID for genotyping data. I feel that this is better than an rsID because it is descriptive enough that I can tell the location and alleles at a glance. Furthermore, there is no ambiguity when matching or comparing SNPs across datasets, since the ID itself encodes the position and alleles. To change a VCF file to use this ID, do the following using bcftools.

bcftools query -f "%CHROM\t%POS\t%REF\t%ALT\t%ID\tchr%CHROM:%POS:%REF:%ALT\n" test.vcf.gz > oids.txt
bgzip -f oids.txt
tabix -s1 -b2 -e2 oids.txt.gz
bcftools annotate -a oids.txt.gz -c CHROM,POS,REF,ALT,OID,ID -h <(echo '##INFO=<ID=OID,Number=1,Type=String,Description="Original variant ID">') test.vcf.gz > new.vcf
bgzip -f new.vcf
tabix new.vcf.gz

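The first command writes a tab-delimited file listing each variant's chromosome, position, reference and alternate alleles, old ID, and new ID. This file is then compressed with bgzip and indexed with tabix (sequence name in column 1, start and end position in column 2). bcftools annotate uses it to match records on CHROM, POS, REF and ALT, stores the old ID in a new INFO field called OID, and replaces the ID column with the new ID. The resulting VCF is then compressed and indexed.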
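To make it concrete what the annotate step does to each record, here is a minimal sketch that performs the same rewrite with plain awk instead of bcftools. It is only an illustration under stated assumptions, not the method above: the input is a small, hypothetical, uncompressed VCF (toy.vcf, with made-up IDs rs1111 and rs2222), each record has a single ALT allele, and the OID header description is an assumed wording.

```shell
#!/bin/sh
# Illustrative sketch only: mimics the ID rewrite with awk instead of bcftools.
# toy.vcf is a hypothetical uncompressed VCF with one ALT allele per record.
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t100\trs1111\tA\tAC\t.\tPASS\t.\n'
  printf '1\t200\trs2222\tT\tTA\t.\tPASS\tAF=0.01\n'
} > toy.vcf

awk 'BEGIN { FS = OFS = "\t" }
/^##/ { print; next }                       # pass meta-information lines through
/^#CHROM/ {
    # Declare the INFO field that will hold the original ID (assumed wording)
    print "##INFO=<ID=OID,Number=1,Type=String,Description=\"Original variant ID\">"
    print
    next
}
{
    oid = $3                                # remember the old ID
    $3 = "chr" $1 ":" $2 ":" $4 ":" $5      # new ID in chr:pos:ref:alt form
    # Append OID=<old ID> to INFO, replacing a lone "." placeholder
    $8 = ($8 == "." ? "OID=" oid : $8 ";OID=" oid)
    print
}' toy.vcf > toy.new.vcf

# Show the new ID and INFO columns of the data lines
grep -v '^#' toy.new.vcf | cut -f3,8
```

Like the bcftools format string, the sketch prepends a literal "chr". If the CHROM column already carries that prefix (for example "chr1"), both will produce IDs such as "chrchr1:100:A:AC", so the literal "chr" should be dropped in that case. Multiallelic records would also get a comma-separated ALT in the ID unless they are split first, for example with bcftools norm -m -.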

bernett.net