

Clusters data using the PhenoGraph clustering method.

PhenoGraph is a clustering method designed for high-dimensional single-cell data. It works by creating a graph ("network") representing phenotypic similarities between cells and then identifying communities in this graph.
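
The graph-building idea described above can be sketched in a few lines of plain Python. This is an illustration only, not PhenoGraph's implementation: the function names `knn_indices` and `jaccard_graph` and the sample points are hypothetical, the neighbor search is brute force, and the community-detection step is omitted.

```python
import math


def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def jaccard_graph(points, k):
    """Return {(i, j): weight} with one directed edge i -> j per neighbor j of i.

    The weight is the Jaccard similarity of the two k-neighborhoods:
    |N(i) & N(j)| / |N(i) | N(j)|.
    """
    nbrs = knn_indices(points, k)
    edges = {}
    for i, ni in enumerate(nbrs):
        for j in ni:
            edges[(i, j)] = len(ni & nbrs[j]) / len(ni | nbrs[j])
    return edges


# Two well-separated groups of three hypothetical "cells" each.
cells = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
graph = jaccard_graph(cells, k=2)
```

With these points every edge stays inside its own group, which is the structure a community-detection step would then pick up as two clusters.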


Version 1.0
Bundle tools
Categories Clustering
Authors Ville Rantanen (ville.rantanen@helsinki.fi)
Issue tracker View/Report issues
Requires git (DEB) ; python3-dev (DEB) ; python3-pip (DEB) ; setuptools (python3) ; installer (bash)
Source files component.xml phenoclus.py
Usage Example with default values


Inputs

Name Type Mandatory Description
in CSV Mandatory Data to cluster.


Outputs

Name Type Description
out CSV A CSV with two columns: the identifier column (named by the idColumn parameter) and "clusterId". The "clusterId" column contains the computed cluster index for each sample as an integer; a value of -1 marks the sample as an outlier.
graph CSV A CSV
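
As a rough sketch of how the "out" table might be consumed downstream, the snippet below counts samples per cluster and reports outliers separately. It assumes a tab-delimited file and the default "index" identifier column; the delimiter, the function name `cluster_sizes`, and the sample rows are assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def cluster_sizes(csv_text, delimiter="\t"):
    """Count rows per clusterId; rows with clusterId -1 are returned separately as outliers."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    counts = Counter(int(row["clusterId"]) for row in reader)
    outliers = counts.pop(-1, 0)
    return dict(counts), outliers


# Hypothetical output: four samples, two clusters, one outlier.
sample = "index\tclusterId\n1\t0\n2\t0\n3\t1\n4\t-1\n"
sizes, n_outliers = cluster_sizes(sample)
```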


Parameters

Name Type Default Description
columnsToRemove string "" Comma-separated list of names of columns not to be used in clustering. Useful if you want to ignore some attributes in the data while clustering.
directed boolean false Whether to use a symmetric (default) or asymmetric ("directed") graph. The graph construction process produces a directed graph, which is symmetrized by one of two methods (see prune).
idColumn string "" Name of a column that contains a unique identifier for each row in the input data. This column is not included in the clustering but is copied to the output. Use an empty string if the input does not have such a column. If this string is empty, the output will have a column called "index" that holds a running index number for each row of input (starting from 1).
jaccard boolean true If true, use the Jaccard metric between k-neighborhoods to build the graph. If false, use a Gaussian kernel.
k int 30 Number of nearest neighbors to use in first step of graph construction.
louvainTimeLimit int 2000 Maximum number of seconds to run modularity optimization. If exceeded, the best result so far is returned.
metric string "euclidean" Distance metric to define nearest neighbors. Options include: euclidean, manhattan, correlation, cosine.
minClusterSize int 10 Cells that end up in a cluster smaller than minClusterSize are considered outliers and are assigned a clusterId of -1.
nJobs int -1 Nearest neighbors and Jaccard coefficients are computed in parallel using nJobs jobs. If nJobs is -1, the number of jobs is determined automatically.
prune boolean false Whether to symmetrize by taking the average (prune=False) or product (prune=True) between the graph and its transpose
qTol float 0.001 Tolerance (i.e., precision) for monitoring modularity optimization
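
The parameters above appear to mirror the keyword arguments of `phenograph.cluster` in the PhenoGraph Python package. The sketch below shows one way the names could be translated. It is an assumption about how the component forwards its parameters, not the contents of phenoclus.py; `COMPONENT_DEFAULTS`, `PARAM_TO_KWARG`, and `to_phenograph_kwargs` are hypothetical names.

```python
# Defaults copied from the parameter table above.
COMPONENT_DEFAULTS = {
    "k": 30,
    "directed": False,
    "prune": False,
    "minClusterSize": 10,
    "jaccard": True,
    "metric": "euclidean",
    "nJobs": -1,
    "qTol": 0.001,
    "louvainTimeLimit": 2000,
}

# Component parameter name -> phenograph.cluster keyword (assumed mapping).
PARAM_TO_KWARG = {
    "k": "k",
    "directed": "directed",
    "prune": "prune",
    "minClusterSize": "min_cluster_size",
    "jaccard": "jaccard",
    "metric": "primary_metric",
    "nJobs": "n_jobs",
    "qTol": "q_tol",
    "louvainTimeLimit": "louvain_time_limit",
}

# Parameters that shape the input/output tables rather than the clustering itself.
NON_CLUSTERING = {"columnsToRemove", "idColumn"}


def to_phenograph_kwargs(params=None):
    """Merge user parameters over the defaults and rename them to keyword arguments."""
    merged = dict(COMPONENT_DEFAULTS)
    for name, value in (params or {}).items():
        if name in NON_CLUSTERING:
            continue
        if name not in PARAM_TO_KWARG:
            raise KeyError(f"unknown parameter: {name}")
        merged[name] = value
    return {PARAM_TO_KWARG[name]: value for name, value in merged.items()}


kwargs = to_phenograph_kwargs({"k": 15, "metric": "cosine", "idColumn": "cell"})
# With the phenograph package installed, the call would look like:
# communities, graph, Q = phenograph.cluster(data, **kwargs)
```

Keeping the translation in a single table makes it easy to check that each component default matches the library default it stands in for.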

Test cases

Test case Parameters IN
case1 properties in out graph


Generated 2018-12-11 07:42:07 by Anduril 2.0.0