The component takes the output of the MMClustering component,
in which each data sample has been clustered several times with different
numbers of clusters, and determines the optimal number of clusters for each
sample. The optimum is determined for each sample independently
of the other samples. The parameters `method` and `metric`
determine how the optimal number of clusters is chosen.
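
The per-sample selection described above can be sketched as follows for the 'min' and 'max' methods. This is an illustrative Python sketch, not the component's R implementation; the column names `sample` and `k` and the function `pick_optimal` are hypothetical, and the metric values are made up.

```python
from collections import defaultdict


def pick_optimal(rows, metric="SWR", method="min"):
    """Return {sample: row} holding the best row for each sample.

    rows: iterable of dicts, one per (sample, number of clusters) result.
    The keys "sample" and "k" are hypothetical column names.
    """
    if method not in ("min", "max"):
        raise ValueError("this sketch only covers 'min' and 'max'")
    choose = min if method == "min" else max

    # Group the clustering results by sample.
    by_sample = defaultdict(list)
    for row in rows:
        by_sample[row["sample"]].append(row)

    # Each sample is handled on its own, independently of the others.
    return {s: choose(rs, key=lambda r: r[metric]) for s, rs in by_sample.items()}


# Made-up metric values for two samples.
rows = [
    {"sample": "A", "k": 2, "SWR": 0.9},
    {"sample": "A", "k": 3, "SWR": 0.4},
    {"sample": "A", "k": 4, "SWR": 0.6},
    {"sample": "B", "k": 2, "SWR": 0.3},
    {"sample": "B", "k": 3, "SWR": 0.7},
]
best = pick_optimal(rows, metric="SWR", method="min")
print({s: r["k"] for s, r in best.items()})  # {'A': 3, 'B': 2}
```

With `method="max"` the same call would pick 2 clusters for sample A and 3 for sample B.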

Version | 1.0 |
---|---|

Bundle | flowand |

Categories | FlowCytometry |

Authors | Anna-Maria Lahesmaa-Korpinen (anna-maria.lahesmaa@helsinki.fi), Erkka Valo (erkka.valo@helsinki.fi) |

Requires | R ; fpc (R-package) |

Source files | component.xml OptimalClustering.r |

Usage | Example with default values |

Name | Type | Mandatory | Description |
---|---|---|---|

clusters | CSVList | Mandatory | A directory containing clustering results for one or more samples. Each sample should have several results, one for each number of clusters tried. One column in each CSV file should contain the cluster membership of each row. |

clustStat | CSV | Mandatory | One row corresponds to the clustering results for one sample with a specific number of clusters. There should be columns for the corresponding file name in `clusters`, the number of clusters used in the clustering, the original file name and the values of BIC, AIC, SWR and ICL. |

Name | Type | Description |
---|---|---|

clusters | CSVList | The optimal clustering results for each sample are copied to output. |

report | Latex | Report containing a plot for the optimal cluster number metrics as a function of the cluster number. |

Name | Type | Default | Description |
---|---|---|---|

clusterClustCol | string | "cluster" | The name of the column in the `clusters` files that holds the cluster membership of each row. |

method | string | "min" | Method used to choose the optimal clustering given the metric. Possible values are 'min', 'max' and 'changepoint'. 'min' and 'max' choose the clustering results with the minimum and maximum value of the metric respectively. 'changepoint' fits two linear models to the data to detect the changepoint. |

metric | string | "SWR" | Metric used for choosing the optimal number of clusters for each sample. Possible values are SWR (Scale-free Weighted Ratio), AID (Average Intercluster Distance), IIR (Average Intracluster Distance / Average Intercluster Distance), AIC (Akaike Information Criterion), BIC (Bayesian Information Criterion) or ICL (Integrated Completed Likelihood). |

nSample | int | 1000 | The number of data points to sample from each clustering result to calculate AID and IIR. If a clustering result has at most `nSample` data points, all of them are used. Large values of `nSample` can make the calculation very slow. |

seed | int | 123456 | Random seed. Used to make the sampling of the data reproducible. |

useAIDAndIIR | boolean | true | If true, calculate the AID and IIR metrics for the different clustering results. This can be time-consuming. |
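
The 'changepoint' method is described above only as fitting two linear models to the data. One common way to do this, shown here as a Python sketch under stated assumptions, is to try every interior cluster number as a breakpoint, fit one least-squares line to each side, and keep the breakpoint with the smallest total squared error. Whether the component shares the breakpoint between the two segments, and how it scores the candidates, is not documented here, so both choices below are assumptions; the function names and the metric values are hypothetical.

```python
def _sse_of_line(xs, ys):
    """Sum of squared residuals of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def changepoint(ks, values):
    """Return the cluster number where two fitted lines describe the data best.

    ks: cluster numbers in increasing order; values: metric value for each k.
    Assumption: both segments include the candidate breakpoint, so each
    segment always has at least two points.
    """
    if len(ks) < 3:
        raise ValueError("need at least three cluster numbers")
    best_k, best_sse = None, float("inf")
    for i in range(1, len(ks) - 1):
        left = _sse_of_line(ks[: i + 1], values[: i + 1])
        right = _sse_of_line(ks[i:], values[i:])
        if left + right < best_sse:
            best_k, best_sse = ks[i], left + right
    return best_k


# Made-up metric values: a steep drop followed by a plateau.
ks = [2, 3, 4, 5, 6, 7]
bic = [100.0, 70.0, 40.0, 38.0, 36.0, 34.0]
print(changepoint(ks, bic))  # 4
```

In this example the metric falls by 30 per added cluster up to 4 clusters and by 2 afterwards, so the two lines meet at 4.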

Generated 2018-12-18 07:42:15 by Anduril 2.0.0