BamStats

The function collects alignment and coverage statistics from a BAM file. It uses Picard CollectMultipleMetrics, bedtools genomecov/coverage, and in-house plotting and reporting scripts to produce the output. The function accepts sorted BAM files produced by either whole-genome or targeted (exome) sequencing experiments.

Version 1.0
Bundle sequencing
Categories Alignment
Authors Amjad Alkodsi (amjad.alkodsi@helsinki.fi)
Requires picard-tools; bedtools; ggplot2 (R package); hwriter (R package)
Source files component.xml function.scala

Inputs

Name Type Mandatory Description
bam BAM Mandatory Input BAM file; should be sorted.
refGenome FASTA Mandatory The reference genome used to produce the alignment.
targets BED Optional Target regions file. Should be specified if the experiment is targeted.
chrLength CSV Optional Chromosome lengths formatted as a headerless CSV with three columns: chr, start, end. Required only if the experiment is not targeted. Only chromosomes listed in this file will be analyzed.
markStats BinaryFile Optional The statistics file reported by Picard MarkDuplicates (Anduril function DuplicateMarker). If specified, these statistics are included in the report.
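
The chrLength layout described above (headerless, three columns: chr, start, end) can be sketched in Python as follows. This is a minimal illustration, not part of the component: the function name, the tab delimiter, the start coordinate of 0, and the chromosome lengths are all assumptions made for the example, so adjust them to match your own files.

```python
import csv
import io


def write_chr_length(lengths, handle, delimiter="\t", start=0):
    """Write one headerless row (chr, start, end) per chromosome.

    The delimiter and the start coordinate are assumptions; the
    component documentation only fixes the column order.
    """
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    for chrom, length in lengths.items():
        writer.writerow([chrom, start, length])


# Made-up lengths, for illustration only (not real chromosome sizes).
example_lengths = {"chr1": 1000, "chr2": 500}

buffer = io.StringIO()
write_chr_length(example_lengths, buffer)
chr_length_text = buffer.getvalue()
```

Because only the chromosomes listed in the file are analyzed, leaving a chromosome out of the mapping is a simple way to exclude it.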

Outputs

Name Type Description
report BinaryFolder Binary folder containing index.html and plotted images.
summary CSV Two-column CSV file with metric names in the first column and the measured values in the second column, which is named after the sampleName parameter. Summaries from many samples can therefore be combined easily when iterating over a large number of samples.
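
Combining per-sample summary files into one table could look roughly like the sketch below. It assumes that each summary has a header row whose second cell is the sample name and that the files are tab-delimited. The function names, the "NA" placeholder, and the metric names in the example are hypothetical and do not come from the component.

```python
import csv
import io


def read_summary(handle, delimiter="\t"):
    """Return (first-column header, sample name, {metric: value})."""
    rows = [r for r in csv.reader(handle, delimiter=delimiter) if len(r) >= 2]
    header, body = rows[0], rows[1:]
    return header[0], header[1], {row[0]: row[1] for row in body}


def combine_summaries(handles, delimiter="\t", missing="NA"):
    """Merge two-column summaries into one table, one column per sample."""
    key_name = None
    samples = []
    metrics = []  # metrics in first-seen order
    table = {}
    for handle in handles:
        key, sample, values = read_summary(handle, delimiter)
        if key_name is None:
            key_name = key
        samples.append(sample)
        for metric, value in values.items():
            if metric not in table:
                table[metric] = {}
                metrics.append(metric)
            table[metric][sample] = value
    combined = [[key_name] + samples]
    for metric in metrics:
        combined.append([metric] + [table[metric].get(s, missing) for s in samples])
    return combined


# Hypothetical summaries with made-up metric names and values.
summary_a = io.StringIO("Metric\tS1\nmetricA\t10\nmetricB\t0.5\n")
summary_b = io.StringIO("Metric\tS2\nmetricA\t12\nmetricC\t7\n")
combined_table = combine_summaries([summary_a, summary_b])
```

Keying the merge on the metric name rather than on row position keeps the result correct even when one sample reports a metric that another lacks.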

Parameters

Name Type Default Description
bedtools string "" Path to the bedtools binary directory. If an empty string is given (default), the BEDTOOLS_HOME environment variable is assumed to point to the bedtools directory.
maxCoverage int 150 Coverage values higher than this are truncated from the histogram. A zero or negative value disables truncation.
memory string "4g" Memory setting passed to Picard, e.g. "4g" or "8g".
paired boolean true Whether the BAM file contains paired-end (true) or single-end (false) reads.
picard string "../../lib/picard" Path to the Picard directory that contains the Picard-tools .jar files, e.g. "/mnt/csc-gc5/opt/picard-tools-1.113". If an empty string is given, the PICARD_HOME environment variable is assumed to point to the Picard directory. Note that some older versions of Picard have bugs in the CollectMultipleMetrics module.
sampleName string "Sample" Sample name or key to be used in the report and the output summary.
stopAfter int 0 Number of reads that Picard will use to compute the statistics. The default value 0 uses all reads in the input file.
targeted boolean false Whether the sequencing experiment is targeted. If true, the targets input should be specified; if false, the chrLength input should be specified.
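
The parameter rules above can be illustrated with a short Python sketch. It is not the component's implementation: all three function names are hypothetical, and dropping histogram bins above maxCoverage is only one possible reading of "truncated".

```python
import os


def resolve_tool_dir(param_value, env_var, environ=None):
    """Use the parameter if non-empty, otherwise fall back to env_var."""
    environ = os.environ if environ is None else environ
    if param_value != "":
        return param_value
    if env_var not in environ:
        raise ValueError("parameter is empty and %s is not set" % env_var)
    return environ[env_var]


def check_inputs(targeted, targets=None, chr_length=None):
    """Check that the input matching the targeted flag was provided."""
    if targeted and targets is None:
        raise ValueError("targeted=true expects the targets input")
    if not targeted and chr_length is None:
        raise ValueError("targeted=false expects the chrLength input")


def truncate_histogram(histogram, max_coverage=150):
    """Drop bins above max_coverage (assumed meaning of truncation).

    histogram maps coverage depth to a count; a zero or negative
    max_coverage leaves the histogram unchanged.
    """
    if max_coverage <= 0:
        return dict(histogram)
    return {d: n for d, n in histogram.items() if d <= max_coverage}


# Example with a made-up environment and made-up histogram counts.
bedtools_dir = resolve_tool_dir("", "BEDTOOLS_HOME", {"BEDTOOLS_HOME": "/opt/bedtools"})
check_inputs(targeted=False, chr_length="chrLength.csv")
kept = truncate_histogram({10: 5, 150: 2, 151: 1}, max_coverage=150)
```

Passing the environment as an argument keeps the fallback logic easy to test without modifying the real process environment.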

Test cases

Test case Parameters bam (IN) refGenome (IN) targets (IN) chrLength (IN) markStats (IN) report (OUT) summary (OUT)
case1 maxCoverage=100 bam refGenome (missing) chrLength markStats (missing) (missing)
case2 targeted=true bam refGenome targets (missing) (missing) (missing) (missing)


Generated 2018-12-12 07:42:06 by Anduril 2.0.0