A quick and dirty wrapper for ANNOVAR. Annotates variants from a VCF file using the chosen databases and outputs CSV.

Explanations from the ANNOVAR documentation for some gene-based annotations:

exonic: variant overlaps a coding exon
splicing: variant is within 2-bp of a splicing junction
ncRNA: variant overlaps a transcript without coding annotation in the gene definition
UTR5: variant overlaps a 5' untranslated region
UTR3: variant overlaps a 3' untranslated region
intronic: variant overlaps an intron
upstream: variant overlaps 1-kb region upstream of transcription start site
downstream: variant overlaps 1-kb region downstream of transcription end site
intergenic: variant is in intergenic region

This component will add four additional columns: alt_samples, ref_samples, alt_alleles and called_samples. These indicate the number of samples presenting a non-reference allele, number of samples homozygous for the reference allele, total number of alternative alleles and number of samples for which a call was present for this variant, respectively.
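The four counts can be illustrated with a minimal sketch. It assumes each sample's genotype is available as a VCF GT string such as "0/1", "1|1" or "./."; the function name count_genotypes is hypothetical, and the handling of partial calls is an assumption rather than the component's confirmed behaviour.

```python
import re


def count_genotypes(genotypes):
    """Summarise per-sample GT strings for a single variant.

    Hypothetical helper: mirrors the column definitions above, not the
    component's actual implementation.
    """
    counts = {
        "alt_samples": 0,     # samples carrying at least one non-reference allele
        "ref_samples": 0,     # samples homozygous for the reference allele
        "alt_alleles": 0,     # total number of non-reference alleles
        "called_samples": 0,  # samples with a genotype call
    }
    for gt in genotypes:
        # Accept both unphased ("/") and phased ("|") separators; drop missing alleles.
        alleles = [a for a in re.split(r"[/|]", gt) if a != "."]
        if not alleles:
            continue  # no call for this sample
        counts["called_samples"] += 1
        n_alt = sum(1 for a in alleles if a != "0")
        counts["alt_alleles"] += n_alt
        if n_alt > 0:
            counts["alt_samples"] += 1
        else:
            counts["ref_samples"] += 1
    return counts


summary = count_genotypes(["0/0", "0/1", "1|1", "./."])
```

For the four genotypes above, one sample is homozygous reference, two carry an alternative allele (three alternative alleles in total), and one has no call.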

Version: 1
Bundle: sequencing
Authors: Miko Valori (miko.valori@helsinki.fi)
Source files: component.xml, vcf2annotatedcsv.sh


Inputs

Name | Type | Mandatory | Description
vcf | VCF | Mandatory | VCF file to be annotated.


Outputs

Name | Type | Description
annotated | CSV | CSV file containing the annotated variants.


Parameters

Name | Type | Default | Description
annovar_bin | string | "/opt/annovar" | Path to the ANNOVAR home directory.
annovar_db | string | "/opt/annovar/humandb" | Path to the ANNOVAR database directory.
buildver | string | "hg18" | Genome build version, either hg19 or hg18.
operation | string | "g" | Comma-separated list of annotation operations, one for each database listed in the protocol parameter and in the same order. For the protocol example, the value would be "g,f,f,r". Use "g" for gene-based, "f" for filter-based and "r" for region-based annotations. Gene-based annotations are for gene definition files, filter-based for files containing information on specific variants, and region-based for files that contain genomic regions. ANNOVAR does not automatically know what to do with the databases defined in the protocol parameter, so this parameter is needed to guide it.
protocol | string | "ensGene" | Comma-separated list of databases to use, e.g. "ensGene,1000g2012apr_all,snp137,cytoBand" for annotating the variants with Ensembl gene definitions, 1000 Genomes allele frequencies, dbSNP identifiers and cytogenetic band locations.
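
Because protocol and operation must line up entry by entry, it can help to check the pairing before running the component. The following is a minimal sketch of such a check; the function name pair_protocol_with_operation is hypothetical and the check is not known to be part of the wrapper itself.

```python
# Operation codes as described for the operation parameter.
OPERATION_KINDS = {"g": "gene-based", "f": "filter-based", "r": "region-based"}


def pair_protocol_with_operation(protocol, operation):
    """Pair each database in `protocol` with its annotation type from `operation`.

    Hypothetical helper: raises ValueError when the two comma-separated
    lists differ in length or an operation code is not g, f or r.
    """
    databases = [p.strip() for p in protocol.split(",")]
    operations = [o.strip() for o in operation.split(",")]
    if len(databases) != len(operations):
        raise ValueError(
            "protocol lists %d databases but operation lists %d codes"
            % (len(databases), len(operations))
        )
    pairs = []
    for database, code in zip(databases, operations):
        if code not in OPERATION_KINDS:
            raise ValueError("unknown operation code: %r" % code)
        pairs.append((database, OPERATION_KINDS[code]))
    return pairs


pairs = pair_protocol_with_operation(
    "ensGene,1000g2012apr_all,snp137,cytoBand", "g,f,f,r"
)
```

With the example values from the table, ensGene is paired with gene-based, 1000g2012apr_all and snp137 with filter-based, and cytoBand with region-based annotation.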

Generated 2018-12-12 07:42:06 by Anduril 2.0.0