
BackSPIN

Biclustering of gene expression data using the BackSPIN algorithm

Version 1.0
Bundle tools
Categories Clustering
Authors Antti Hakkinen (antti.e.hakkinen@helsinki.fi)
Requires python; installer (bash)
Source files component.xml backspin.py
Usage Example with default values

Inputs

Name Type Mandatory Description
in CSV Mandatory The expression file to be clustered. Rows are genes and columns are samples (cells).

Outputs

Name Type Description
rowClusts CSV Clustering of the input rows. Rows represent the genes (original rows), columns the clustering depths, and values the cluster labels.
colClusts CSV Clustering of the input columns. Rows represent the cells (original columns), columns the clustering depths, and values the cluster labels.
permutedInput CSV Permuted and filtered (if feature selection is used) copy of the input.
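
The layout of rowClusts and colClusts (one row per gene or cell, one column per clustering depth, cluster labels as values) can be read back as in the minimal sketch below. The tab delimiter, the column names "gene", "depth0" and "depth1", and the sample rows are assumptions made for illustration only; check the header of the actual output file before relying on them.

```python
import csv
import io
from collections import defaultdict


def groups_at_depth(csv_text, depth_column, delimiter="\t"):
    """Group row identifiers by their cluster label at one clustering depth.

    Assumes the first column holds the gene/cell identifier and that
    `depth_column` names one of the remaining header fields.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)
    col = header.index(depth_column)
    groups = defaultdict(list)
    for row in reader:
        groups[row[col]].append(row[0])
    return dict(groups)


# Hypothetical file content, not real component output.
example = "gene\tdepth0\tdepth1\ng1\t0\t0\ng2\t0\t1\ng3\t0\t1\n"
clusters = groups_at_depth(example, "depth1")
```

Labels are kept as strings here so that the sketch does not depend on how the component encodes them.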

Parameters

Name Type Default Description
feature_fit boolean true If true, feature selection is performed before BackSPIN. Selection is based on expected noise (a curve fit to the CV-vs-mean plot).
feature_genes int 2000 Number of genes selected as features.
first_run_iters int 10 Number of iterations of preparatory SPIN
first_run_step float 0.1 Controls the decrease rate of the width parameter used in the preparatory SPIN. Smaller values will increase the number of SPIN iterations and result in higher precision in the first step but longer execution time.
low_thrs float 0.2 If the difference between the average expression of a gene in the two groups is lower than this threshold, the algorithm uses highly correlated genes to assign the gene to one of the two groups.
normal_spin boolean false Run normal SPIN instead of BackSPIN. Normal SPIN respects the parameters "runs_iters" and "runs_step".
normal_spin_axis string "both" Axis to sort in normal SPIN: 0 (or "genes") sorts only genes (rows), 1 (or "cells") sorts only cells (columns), and "both" sorts both.
numLevels int 2 Depth/Number of levels: The number of nested splits that will be tried by the algorithm
preprocess boolean false Transform the input data using log2(x+1) and subtract the mean gene expression, as the BackSPIN script always does.
runs_iters int 8 Number of iterations used for each width parameter. Does not apply to the first run (use "first_run_iters" instead).
runs_step float 0.3 Controls the decrease rate of the width parameter. Smaller values increase the number of SPIN iterations and give higher precision at the cost of longer execution time. Does not apply to the first run (use "first_run_step" instead).
seed int 12345 Seed for the pseudorandom number generator
split_limit_c int 2 Minimal number of cells that a group must contain for splitting to be allowed.
split_limit_g int 2 Minimal number of genes that a group must contain for splitting to be allowed.
stop_const float 1.15 Minimum score that a breaking point has to reach to be suitable for splitting.
verbose boolean false Print extra details of what is happening to standard output.
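
As an illustration of the "preprocess" parameter, the sketch below applies log2(x+1) and then subtracts a mean from each row. It assumes that genes are rows and that the mean is taken per gene after the log transform; this is one reading of the parameter description, not a copy of the code in backspin.py. The function name and the toy matrix are hypothetical.

```python
import math


def preprocess(matrix):
    """Apply log2(x + 1) and subtract each row's mean (rows assumed to be genes)."""
    out = []
    for row in matrix:
        logged = [math.log2(x + 1.0) for x in row]
        mean = sum(logged) / len(logged)
        out.append([value - mean for value in logged])
    return out


# Hypothetical toy matrix: 2 genes (rows) x 3 cells (columns).
counts = [[0, 1, 3], [7, 7, 7]]
centered = preprocess(counts)
```

After centering, a gene with constant expression across cells becomes a row of zeros, so it carries no information for the sorting step.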

Test cases

Test case Parameters in (IN) rowClusts (OUT) colClusts (OUT) permutedInput (OUT)
case-example properties: seed=123, preprocess=true, feature_fit=true, feature_genes=500 (missing) rowClusts colClusts permutedInput
case-small (missing) in rowClusts colClusts permutedInput

Generated 2018-12-12 07:42:06 by Anduril 2.0.0