
VariantCaller

Calls genomic sites of variation using the specified caller. The implemented callers are samtools, gatk and varscan (see the 'caller' parameter).

Specify the caller with the 'caller' parameter. Also give the path to the software with the 'gatk' or 'varscan' parameter, unless the environment variables GATK_HOME or VARSCAN_HOME point to the installation directories. The Samtools executables "samtools", "bcftools" and "vcfutils.pl" must be in PATH, and vcfutils.pl must be made executable.
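
The lookup order described above (explicit path parameter first, environment variable second) can be sketched as follows. This is a minimal illustration, not the component's actual code: the function names `resolve_jar` and `missing_executables` are hypothetical, and the jar file names are the ones mentioned in the 'gatk' and 'varscan' parameter descriptions.

```python
import os
import shutil


def resolve_jar(param_value, env_var, jar_name, env=None):
    """Return the jar path to use (hypothetical helper).

    A non-empty parameter value wins; otherwise the jar is expected
    inside the directory named by the environment variable.
    """
    env = os.environ if env is None else env
    if param_value:
        return param_value
    home = env.get(env_var)
    if not home:
        raise ValueError(f"no path given and {env_var} is not set")
    return os.path.join(home, jar_name)


def missing_executables(names, path=None):
    """Return the names that cannot be found as executables on PATH."""
    return [name for name in names if shutil.which(name, path=path) is None]
```

For example, `resolve_jar("", "GATK_HOME", "GenomeAnalysisTK.jar")` falls back to the GATK_HOME directory, and `missing_executables(["samtools", "bcftools", "vcfutils.pl"])` lists any Samtools tool that still needs to be installed or made executable.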

The 'options' parameter can be used to pass additional options in the software-specific format. For the samtools caller, the options are added only to the mpileup command. The following options are hard-coded:

Here is a list of caller specific options that should be considered depending on the data:

The output is always an array with one or more files. Here follows a description of the files specific for the caller:

Complete documentation:

Version 1.0
Bundle sequencing
Categories VariationAnalysis
Authors Rony Lindell (rony.lindell@helsinki.fi), Riku Louhimo (Riku.Louhimo@Helsinki.FI)
Source files component.xml function.scala

Inputs

Name Type Mandatory Description
reference FASTA Mandatory The reference genome together with possible auxiliary files.
bam1 BAM Optional Input bam file of normal or tumor sample. The BAM index file (.bai) should be located in the same directory when using GATK.
bam2 BAM Optional Tumor bam file to call from in VarScan comparison calling. The variants from 'bam1' and 'bam2' will be compared in order to separate germline and somatic variants. [varscan only]
bams BAMList Optional File containing newline separated paths to bam files. When using this input GATK usage will be forced and multiple sample variant calling using the UnifiedGenotyper will be executed. One multi-sample VCF file will be produced. [gatk only]

The extension of the file must be .list.
intervals BED Optional File with genomic intervals to operate on. Only calls hitting these areas will be output.

Note: This can be used to mask out introns in exome sequencing by providing a BED file containing all exonic regions.
dbsnp VCF Optional File with known SNPs, usually from latest dbSNP distribution. The file is used in GATK to improve calling and for annotation. [gatk only]

The file must be pointed to by the key 'vcf'.
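
The 'bams' input expects a plain text file of newline-separated BAM paths whose name ends in .list. A minimal sketch of producing such a file is shown below; the helper name `write_bam_list` and the example paths are hypothetical and only illustrate the format described in the table.

```python
from pathlib import Path


def write_bam_list(bam_paths, list_path):
    """Write BAM paths one per line to a file that must end in .list."""
    list_path = Path(list_path)
    if list_path.suffix != ".list":
        raise ValueError("the 'bams' file must have the .list extension")
    if not bam_paths:
        raise ValueError("at least one BAM path is required")
    # One path per line, terminated by a final newline.
    list_path.write_text("\n".join(str(p) for p in bam_paths) + "\n")
    return list_path
```

Calling `write_bam_list(["/data/normal.bam", "/data/tumor.bam"], "samples.list")` would create a two-line file suitable for the 'bams' input.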

Outputs

Name Type Description
snp VCF Output snp (and indel) calls. When indels are written into the same file (GATK, Samtools), this file will contain both snps and indels.
indel VCF Output indel calls. VarScan writes snps and indels into separate files.
metrics TextFile Calling metrics. GATK will produce a file with some information about the calling results.

Parameters

Name Type Default Description
callIndels boolean true If false, indel calling will be skipped and only snps are called.
caller string "gatk" Software/algorithm to use to call variants. Possible values: {samtools, gatk, varscan}.
exclude string "none" Genomic region from which to exclude reads (i.e., skip) in the same format as 'region'. [gatk only]
gatk string "" Path to GATK jar file, e.g. "/opt/gatk/GenomeAnalysisTK.jar". If empty string is given (default), GATK_HOME environment variable is assumed to point to the GATK directory where GenomeAnalysisTK.jar is located.
memory string "4g" The amount of Java heap memory allocated to GATK, given in the format "4g" for 4 gigabytes or "2560m" for 2560 megabytes (2.5 GB). For optimal performance this should be a multiple of the number of threads used (see 'threads'). [gatk only]
options string "" This string is appended to the command and can include any number of options in the software-specific format, e.g. "-q 1" to skip zero-quality alignments or "-stand_emit_conf 10.0" to emit calls with quality higher than 10. See the software-specific documentation for more information.
region string "all" Genomic region from which to select reads, e.g. "chr1" or "chr2:1-20000". The region can also be a comma separated list, e.g. "chr1,chr2:1-20000,chrX:1-4000". A more extensive list of regions can be defined as a file using the 'intervals' input. [gatk only]
samOptions4VS string "" This string is added to the samtools command when VarScan is used and can include any number of options in the software-specific format. If the string is empty, "-q 1" is applied to skip zero-quality alignments. See the software-specific documentation for more information.
threads int 1 Number of threads allocated. Preferably allocate memory in gigabytes equal to an integer multiple k of the thread count, e.g. 1*4=4 GB of memory for 4 threads, or 2*8=16 GB for 8 threads. [gatk only]
variantsOnly boolean true If true, only variant sites are called. If false, all confident sites are called, including those that match the reference allele. Setting this to false may be appropriate for normal samples in normal-tumor comparisons.
varscan string "" Path to the VarScan jar file (typically VarScan.v2.X.Y.jar varying with the version). If empty string is given (default), VARSCAN_HOME environment variable is assumed to point to the VarScan directory, where VarScan.jar is the program file or a link pointing to it.
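
The relationship between 'memory' and 'threads' can be checked before launching a run. The sketch below assumes the two formats named in the 'memory' description ("4g", "2560m") and treats one gigabyte as 1024 megabytes, as the JVM does; the function names are hypothetical and the component itself may not perform this check.

```python
import re


def parse_heap_megabytes(memory):
    """Convert a heap string such as '4g' or '2560m' to megabytes."""
    match = re.fullmatch(r"([0-9]+)([gm])", memory.strip().lower())
    if match is None:
        raise ValueError(f"unsupported memory format: {memory!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return amount * 1024 if unit == "g" else amount


def memory_matches_threads(memory, threads):
    """True if the heap is a whole multiple k of `threads` gigabytes."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return parse_heap_megabytes(memory) % (threads * 1024) == 0
```

With these helpers, "16g" with 8 threads satisfies the recommendation (k=2), while "2560m" with 4 threads does not.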

Test cases

Test case Parameters reference (IN) bam1 (IN) bam2 (IN) bams (IN) intervals (IN) dbsnp (IN) snp (OUT) indel (OUT) metrics (OUT)
case1_samtools properties reference bam1 (missing) (missing) (missing) (missing) (missing) (missing) (missing)

# Simple testcase for the samtools caller,
caller=samtools,
variantsOnly=false

case2_gatk properties reference bam1 (missing) (missing) (missing) dbsnp (missing) (missing) (missing)

# Simple testcase for the gatk caller,
caller=gatk,
callIndels=false,
variantsOnly=false,
memory=1g,
threads=1,
options=-stand_emit_conf 1 -stand_call_conf 1

case3_varscan properties reference bam1 (missing) (missing) (missing) (missing) (missing) (missing) (missing)

# Simple testcase for the varscan germline caller,
caller=varscan,
callIndels=false,
options=--min-coverage 1 --min-reads2 1 --min-avg-qual 1 --p-value 0.05

case4_varscan_paired properties reference bam1 bam2 (missing) (missing) (missing) (missing) (missing) (missing)

# Simple testcase for the varscan somatic caller,
caller=varscan,
options=--min-coverage 1 --p-value 0.95 --somatic-p-value 0.10

case5_varscan_paired properties reference bam1 bam2 (missing) (missing) (missing) (missing) (missing) (missing)

# Simple testcase for the varscan somatic caller,
caller=varscan,
options=--min-coverage 1 --p-value 0.95 --somatic-p-value 0.10,
samOptions4VS=-q 1 -Q 10


Generated 2018-12-17 07:42:41 by Anduril 2.0.0