
ExprTable

Generates an expression table from the expression files of individual samples.

It was created with the genes.fpkm_tracking and isoforms.fpkm_tracking files from Cufflinks in mind, but it can summarize in one table any group of expression files that share an id column and each contain a column of expression values.
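The merge described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the component's actual implementation (which lives in function.scala): the helper name `build_expression_table`, the in-memory tab-separated inputs, and the sample names are all hypothetical.

```python
import csv
import io


def build_expression_table(samples, ids_col="tracking_id", value_col="FPKM"):
    """Merge per-sample tab-separated tables into {id: {sample: value}}.

    `samples` maps a sample name to the text of its expression file.
    Hypothetical helper, written only to illustrate the idea.
    """
    table = {}
    for sample_name, text in samples.items():
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for row in reader:
            # One row per id, one column per sample.
            table.setdefault(row[ids_col], {})[sample_name] = float(row[value_col])
    return table


# Two tiny made-up samples in a Cufflinks-like layout.
sample_a = "tracking_id\tFPKM\ngeneA\t1.5\ngeneB\t0.0\n"
sample_b = "tracking_id\tFPKM\ngeneA\t2.5\ngeneC\t4.0\n"

table = build_expression_table({"sampleA": sample_a, "sampleB": sample_b})
for gene_id in sorted(table):
    print(gene_id, table[gene_id])
```

Ids that are absent from a sample simply have no entry for that sample in this sketch; how the real component represents such gaps is not stated here.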

Version 1.1
Bundle sequencing
Categories Expression
Authors Alejandra Cervera (alejandra.cervera@helsinki.fi)
Issue tracker View/Report issues
Source files component.xml function.scala
Usage Example with default values

Inputs

Name Type Mandatory Description
array Array<BinaryFile> Mandatory Expression files, one per sample. The key column is used for naming the samples. All expression files should have a matching id column with the gene or transcript id and a column with the expression values to include in the final expression table.
auxiliary CSV Optional If given, contains one column (see "matchColAux") whose values are matched to a column in the input files (see "matchCol").
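As a rough illustration of how the "auxiliary" input can subset the expression table, the sketch below keeps only the rows whose match column value also appears in the auxiliary table. The function name `subset_by_auxiliary` and the list-of-dicts row representation are assumptions made for this example; the fallback to the first column follows the "matchCol" and "matchColAux" parameter descriptions.

```python
def subset_by_auxiliary(rows, aux_rows, match_col="", match_col_aux=""):
    """Keep rows whose `match_col` value occurs in `aux_rows[match_col_aux]`.

    Rows are dicts keyed by column name (hypothetical representation).
    An empty column name falls back to the first column of that table.
    """
    if not rows or not aux_rows:
        return []
    # Dicts keep insertion order, so the first key is the first column.
    match_col = match_col or next(iter(rows[0]))
    match_col_aux = match_col_aux or next(iter(aux_rows[0]))
    wanted = {aux[match_col_aux] for aux in aux_rows}
    return [row for row in rows if row[match_col] in wanted]


expression_rows = [
    {"tracking_id": "geneA", "FPKM": "1.5"},
    {"tracking_id": "geneB", "FPKM": "0.0"},
    {"tracking_id": "geneC", "FPKM": "4.0"},
]
auxiliary_rows = [{"id": "geneA"}, {"id": "geneC"}]

subset = subset_by_auxiliary(expression_rows, auxiliary_rows)
print([row["tracking_id"] for row in subset])
```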

Outputs

Name Type Description
table CSV Expression table that has at least one column with the gene or transcript ids, and expression columns corresponding to several samples.
log2 CSV Expression table that has at least one column with the gene or transcript ids, and expression columns corresponding to several samples with the expression values in log2.
topHits CSV Table with the top most expressed genes/transcripts in each sample.
plots Latex Density, histogram and boxplot on the expression values.
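The "log2" and "topHits" outputs can be pictured with the following sketch. It is an illustration only: the pseudocount added before taking the logarithm is an assumption (this page does not say how zero values are handled), and the function names are hypothetical. The top hits are taken per sample and then combined, which is why the final list can be longer than the requested number.

```python
import math


def log2_table(table, pseudocount=1.0):
    """Return a copy of {id: {sample: value}} with log2(value + pseudocount).

    The pseudocount is an assumption of this sketch, used to avoid log2(0).
    """
    return {
        gene_id: {s: math.log2(v + pseudocount) for s, v in values.items()}
        for gene_id, values in table.items()
    }


def top_hits(table, n=10):
    """Union of the n most expressed ids in each sample, sorted by id."""
    samples = sorted({s for values in table.values() for s in values})
    hits = set()
    for sample in samples:
        ranked = sorted(
            (gene_id for gene_id in table if sample in table[gene_id]),
            key=lambda gene_id: table[gene_id][sample],
            reverse=True,
        )
        hits.update(ranked[:n])
    return sorted(hits)


expression = {
    "geneA": {"s1": 7.0, "s2": 0.0},
    "geneB": {"s1": 1.0, "s2": 3.0},
    "geneC": {"s1": 0.0, "s2": 1.0},
}

logged = log2_table(expression)
hits = top_hits(expression, n=1)
print(logged["geneA"], hits)
```

With n=1 the two samples have different top genes, so the combined list holds two ids.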

Parameters

Name Type Default Description
collapseNumeric string "consensus" If the ids are not unique, the values are collapsed. Options are "median" (take the median of non-NA values), "mean", "sum", "max", "min", "first" (take the first row), "majority" (take the value that is present on the largest number of rows), "consensus" (require that all rows have the same value) and "indicator".
extraCols string "gene_id,gene_short_name" Other columns that will be included in the expression table, for example the gene id and gene short name corresponding to the transcript or gene referred to in the idsCol.
filter string "FPKM_status=OK" Filters out expression values based on the value in a different column. For example, in Cufflinks expression files the FPKM status column can be used to decide whether the FPKM value is reliable. The default behavior is to keep only the values that have an "OK" status. If no filtering is desired, set it to "".
highBound string "" Equivalent to highBound in CSVFilter
idsCol string "tracking_id" The name of the column that has the gene, transcript or exon id. The default matches the expression files from Cufflinks.
log2transform boolean true Output an expression table with log2-transformed values. Log2 values are needed for generating the stats, so if set to false, the stats will not be produced either.
lowBound string "" Equivalent to lowBound in CSVFilter
matchCol string "" Column name in the input files that is matched to the "matchColAux" for subsetting the expression table. If empty, the first column of the input files is used.
matchColAux string "" Column name in "auxiliary" containing values that must match the "matchCol" column in input files. If empty, the first column of "auxiliary" is used.
numberTopHits int 10 Number of top hits to provide as a statistic. The top hits of each sample are included, so the final list can be much longer than the number provided.
valueCols string "FPKM" The name of the column that contains the expression values. The default is from Cufflinks' expression files.
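The "collapseNumeric" and "filter" parameters can be illustrated with a small sketch. Several points are assumptions rather than documented behavior: that every option ignores missing values (only "median" is described that way), that "consensus" yields a missing value when rows disagree, and that filtered-out values are masked as missing instead of the row being dropped. The "indicator" option is not described on this page, so it is deliberately left unimplemented. Function names are hypothetical.

```python
from collections import Counter
from statistics import mean, median


def collapse(values, method="consensus"):
    """Collapse the values of rows that share an id; None stands for NA."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    if method == "first":
        return values[0]  # first row, even if it is missing
    if method == "majority":
        return Counter(present).most_common(1)[0][0]
    if method == "consensus":
        # Assumption: disagreement yields a missing value.
        return present[0] if len(set(present)) == 1 else None
    simple = {"median": median, "mean": mean, "sum": sum, "max": max, "min": min}
    if method in simple:
        return simple[method](present)
    raise ValueError(f"option not covered by this sketch: {method}")


def apply_filter(rows, filter_spec="FPKM_status=OK", value_col="FPKM"):
    """Mask `value_col` as None unless the filter column has the wanted value."""
    if not filter_spec:
        return rows  # "" disables filtering
    column, wanted = filter_spec.split("=", 1)
    return [
        {**row, value_col: row[value_col] if row[column] == wanted else None}
        for row in rows
    ]


rows = [
    {"tracking_id": "geneA", "FPKM": 1.0, "FPKM_status": "OK"},
    {"tracking_id": "geneA", "FPKM": 3.0, "FPKM_status": "OK"},
    {"tracking_id": "geneA", "FPKM": 9.0, "FPKM_status": "FAIL"},
]
kept = [row["FPKM"] for row in apply_filter(rows)]
print(kept, collapse(kept, "median"), collapse(kept, "consensus"))
```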

Test cases

Test case Parameters IN array IN auxiliary OUT table OUT log2 OUT topHits OUT plots
case1 properties array (missing) (missing) (missing) (missing) (missing)


Generated 2018-12-11 07:42:07 by Anduril 2.0.0