MutSig

Determine significantly mutated genes in a set of genetic variations using MutSig. The algorithm takes genetic variations mapped to patients as input, and computes gene significance based on sample- and gene-specific background mutation rates (BMRs). Gene-specific BMRs are estimated using gene features (covariates) that may affect mutation rate, such as gene expression. BMR is further estimated separately for different mutation categories such as C->T, A->G, and indels. MutSig requires silent (and optionally noncoding) variants in addition to nonsilent variants, as they are used for estimating BMR.
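As rough intuition for how a background mutation rate turns into a significance value, the sketch below computes a plain binomial tail probability: the chance of seeing at least the observed number of mutations in a gene if every covered base mutated independently at rate bmr. This is a deliberately simplified stand-in, not MutSig's actual procedure, which additionally uses covariates and per-category rates. The function name and all numbers are hypothetical.

```python
from math import comb


def binomial_tail(n_bases: int, n_mutations: int, bmr: float) -> float:
    """Return P(X >= n_mutations) for X ~ Binomial(n_bases, bmr)."""
    # Sum the probabilities of seeing fewer mutations, then take the complement.
    below = sum(
        comb(n_bases, k) * bmr**k * (1.0 - bmr) ** (n_bases - k)
        for k in range(n_mutations)
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - below)


# Hypothetical gene: 1500 covered bases in each of 200 patients, 8 mutations seen.
n_bases = 1500 * 200
for bmr in (1e-6, 1e-5):
    print(bmr, binomial_tail(n_bases, 8, bmr))
```

The same mutation count is less surprising under the higher rate, which is why a gene-specific BMR estimate matters: a gene in a highly mutable region needs more mutations to reach significance.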

The output is a prioritized list of genes ranked by significance; its columns are described under Outputs.

This component may be called using a variety of inputs. Mandatory inputs include only patient identifiers, variant locations and variant alleles. In this case, other information is inferred using ANNOVAR and custom logic. It is also possible to provide all necessary information a priori.
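A minimal pre-flight check for the mandatory columns might look like the sketch below. It assumes the variants file is tab-delimited (the delimiter is not stated on this page) and uses the default column names from the Parameters table. The helper name missing_columns and the sample rows are made up for illustration.

```python
import csv
import io


def missing_columns(header, patient_column, chrom_column="CHROM",
                    position_column="POS", alt_allele_column="ALT"):
    """Return the mandatory column names that are absent from header."""
    required = [patient_column, chrom_column, position_column, alt_allele_column]
    return [name for name in required if name not in header]


# Made-up two-variant table; the tab delimiter is an assumption.
sample = "patient\tCHROM\tPOS\tALT\nP1\tchr1\t12345\tT\nP2\tchr2\t67890\tG\n"
reader = csv.DictReader(io.StringIO(sample), delimiter="\t")

# patientColumn has no default, so it must always be passed explicitly.
print(missing_columns(reader.fieldnames, patient_column="patient"))  # prints []
```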

Genes are normally specified using HUGO symbols, following the MutSig convention. MutSig requires approximately 10 GB of memory. See below for installation instructions.

Version 1.0
Bundle sequencing
Categories VariationAnalysis
Authors Kristian Ovaska (kristian.ovaska@helsinki.fi)
Requires MutSigCV ; MATLAB Compiler Runtime ; ANNOVAR ; Scala
Source files component.xml function.scala

Inputs

Name Type Mandatory Description
variants CSV Mandatory Raw variants. Must contain columns for patient identifiers, chromosome, chromosomal position and variant (alternative) allele. May also contain columns for gene identifiers (HUGO), predicted effect of the variation, and reference allele. Column names are configured using parameters.
coverage CSV Optional If given, contains genomic intervals for those parts of the genome that were selectively sequenced in the experiment. For example, an exome sequencing experiment would include the regions that were captured. The following columns must be present: Chromosome, Start (1-based), End (inclusive). Note: annotating coverage using ANNOVAR is relatively slow.
covariates CSV Optional Gene covariate table. The first column contains gene identifiers (HUGO) and the rest of the columns contain numeric attributes of the genes.
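Capture regions are often distributed as BED files, which use 0-based, half-open intervals, whereas the coverage input expects a 1-based Start and an inclusive End. The sketch below shows that coordinate shift. It assumes a tab-delimited output file, and the function names are hypothetical rather than part of this component.

```python
import csv
import io


def bed_to_coverage_rows(bed_lines):
    """Convert BED intervals (0-based, half-open) to (chrom, start, end) rows
    with a 1-based start and an inclusive end."""
    rows = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        # Only the start shifts by one; the exclusive BED end equals the inclusive end.
        rows.append((chrom, int(start) + 1, int(end)))
    return rows


def write_coverage(rows, handle):
    """Write rows with the column names required by the coverage input."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Chromosome", "Start", "End"])
    writer.writerows(rows)


# Made-up intervals for illustration.
bed = ["track name=capture", "chr1\t0\t100", "chr2\t999\t2000"]
out = io.StringIO()
write_coverage(bed_to_coverage_rows(bed), out)
print(out.getvalue())
```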

Outputs

Name Type Description
genes CSV Significance of mutated genes (for all defined genes). Contains the columns: Gene (HUGO); p (nominal p-value); pFDR (FDR-corrected p-value), as well as MutSig metrics and copies of covariates.
genesExcel Excel The genes output as a formatted Excel spreadsheet.
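Downstream, the genes output is typically filtered on the pFDR column. The sketch below assumes a tab-delimited file and a cutoff of 0.1; both are illustrative choices rather than values prescribed by this component, and the gene names and p-values are made up.

```python
import csv
import io


def significant_genes(handle, threshold=0.1, delimiter="\t"):
    """Return (gene, pFDR) pairs with pFDR below threshold, most significant first."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    hits = [
        (row["Gene"], float(row["pFDR"]))
        for row in reader
        if float(row["pFDR"]) < threshold
    ]
    return sorted(hits, key=lambda item: item[1])


# Made-up rows using the documented column names Gene, p and pFDR.
sample = (
    "Gene\tp\tpFDR\n"
    "GENE_A\t0.00001\t0.002\n"
    "GENE_B\t0.04\t0.35\n"
    "GENE_C\t0.0004\t0.03\n"
)
print(significant_genes(io.StringIO(sample)))
```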

Parameters

Name Type Default Description
altAlleleColumn string "ALT" In variants, column name for the variant allele. Must always be defined.
annovarBin string "" Path to ANNOVAR binary installation directory. This directory contains convert2annovar.pl, table_annovar.pl, etc. Only needed when geneColumn or effectColumn are empty or custom coverage is provided. If empty, the environment variable ANNOVAR_HOME is used instead (when needed).
annovarDB string "" Path to ANNOVAR database directory. This directory often contains hgNN_X.{fa,idx,txt} files. If empty, the environment variable ANNOVAR_DB is used instead (when needed).
builtinCovariates string "*" Comma-separated list of column names for builtin covariates located in gene.covariates.txt in the MutSig installation directory. These covariates are combined with custom covariates, if defined. The special value * selects all builtin covariates.
chromColumn string "CHROM" In variants, column name for the chromosome. Must always be defined.
effectColumn string "" In variants, column name for variant effect. The column may hold the native MutSig effect (noncoding/nonsilent/silent/null); the Variant_Classification column of the MAF format; ANNOVAR output; or the AminoChange column of RikuRator export files. Non-MutSig effect specifications are converted to the native format. If empty, the effect is inferred using ANNOVAR.
geneColumn string "" In variants, column name for gene identifiers (HUGO). If empty, this is inferred using ANNOVAR.
label string "MutSig" Label for the experiment that is used as sheet name in the Excel report.
matlab string "" Path to the MATLAB compiler runtime directory. This directory contains the subdirectories bin, mcr, resources, etc. If empty, the environment variable MCRROOT is used instead.
mutsig string "" Path to MutSig installation directory. This directory contains the main MutSig MCR binaries (run_MutSigCV.sh, etc.) and supplementary data files available on the MutSig web site. Always needed is mutation_type_dictionary_file.txt. Depending on optional inputs, chr_files_hg19 (reference genome directory), exome_full192.coverage.txt and gene.covariates.txt may also be needed. If empty, the environment variable MUTSIG_HOME is used instead.
patientColumn string (no default) In variants, column name for patient identifiers. Must always be defined.
positionColumn string "POS" In variants, column name for chromosomal position. Must always be defined.
refAlleleColumn string "REF" In variants, column name for the reference allele. If empty, this is inferred from mutsig/chr_files_hg19.
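Four path parameters share the same fallback rule: an empty value defers to an environment variable. The sketch below restates that rule using the parameter and variable names from the table above. The resolve_path helper and the example path are hypothetical and are not taken from the component's source.

```python
import os

# Parameter name -> environment variable used when the parameter is empty.
ENV_FALLBACKS = {
    "annovarBin": "ANNOVAR_HOME",
    "annovarDB": "ANNOVAR_DB",
    "matlab": "MCRROOT",
    "mutsig": "MUTSIG_HOME",
}


def resolve_path(parameter, value, environ=None):
    """Return value if it is non-empty, otherwise the fallback environment variable."""
    environ = os.environ if environ is None else environ
    if value:
        return value
    variable = ENV_FALLBACKS[parameter]
    fallback = environ.get(variable, "")
    if not fallback:
        raise ValueError(f"{parameter} is empty and {variable} is not set")
    return fallback


# Hypothetical environment passed explicitly so the example does not depend on the host.
print(resolve_path("annovarDB", "", {"ANNOVAR_DB": "/data/annovar/humandb"}))
```

Passing the environment as an argument keeps the rule easy to check without modifying the real process environment.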

Test cases

Test case Parameters IN variants IN coverage IN covariates OUT genes OUT genesExcel
case1 properties variants coverage (missing) (missing) (missing)

geneColumn=gene,
patientColumn=patient,
effectColumn=effect,
refAlleleColumn=

case2_minimal properties variants (missing) (missing) (missing) (missing)

patientColumn=patient,
refAlleleColumn=,
chromColumn=Chromosome,
positionColumn=Start,
altAlleleColumn=AltAllele

case3_effect properties variants coverage (missing) (missing) (missing)

geneColumn=gene,
patientColumn=patient,
effectColumn=effect,
refAlleleColumn=

case4_covariates properties variants coverage covariates (missing) (missing)

geneColumn=gene,
patientColumn=patient,
effectColumn=effect,
builtinCovariates=expr,reptime,
refAlleleColumn=


Generated 2018-12-18 07:42:34 by Anduril 2.0.0