
SampleGroupCreator

Creates sample group tables based on sample names read from data files. These tables define relationships between samples, e.g., which samples are biological or technical replicates. The list of sample names is read from data files (either from the column names or from the values of one column), and sample groups are defined using regular expressions. Constant ("verbatim") groups, whose set of member samples does not depend on the data, can also be created. A maximum of nine patterns can be defined; the type of each pattern is set in "patternTypes" as one of "re", "relist" or "verbatim".

Groups having the type "re" are defined by regular expressions. Here, patternN is a Java regular expression that matches sample names and definitionN has the format NAME,TYPE or NAME,TYPE,DESCRIPTION. NAME is the ID of the sample group and TYPE defines the relationship of samples within the group (mean, median, ratio or sample); see documentation on SampleGroupTable for details. Optionally, DESCRIPTION is a human-readable name. All samples that match the pattern are members of the group.
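
As an illustration of the definitionN format, the following Python sketch splits a definition string into its fields. It is not the component's code (the component itself is written in Java); the names parse_definition and GroupDefinition are hypothetical, and the choice to split only at the first two commas, so that DESCRIPTION may contain commas, is an assumption.

```python
from typing import NamedTuple, Optional

# Relationship types listed in the text above.
VALID_TYPES = {"mean", "median", "ratio", "sample"}


class GroupDefinition(NamedTuple):
    name: str
    group_type: str
    description: Optional[str]


def parse_definition(definition: str) -> GroupDefinition:
    """Split NAME,TYPE or NAME,TYPE,DESCRIPTION into its fields."""
    # Assumption: only the first two commas separate fields, so the
    # DESCRIPTION part may itself contain commas.
    parts = definition.split(",", 2)
    if len(parts) < 2:
        raise ValueError(f"expected NAME,TYPE[,DESCRIPTION], got {definition!r}")
    name, group_type = parts[0], parts[1]
    if group_type not in VALID_TYPES:
        raise ValueError(f"unknown group type {group_type!r}")
    description = parts[2] if len(parts) == 3 else None
    return GroupDefinition(name, group_type, description)


print(parse_definition("MyGroup_$1,median"))
print(parse_definition("Group1_$1,median,Description 1: $1"))
```

References such as $1 are left untouched at this stage; they would be expanded only once a sample name has matched the pattern.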

One pattern may spawn several sample groups if grouping operators "(" and ")" are used in the pattern. These capturing groups can be referred to as $1, $2, etc. in NAME and DESCRIPTION. For example, when pattern1 is "S([0-9]+)[a-z]" and definition1 is "MyGroup_$1,median", the following groups may be created: "MyGroup_1" containing "S1a,S1b" and "MyGroup_2" containing "S2a,S2b,S2c".
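
The example above can be reproduced with a short Python sketch. Python's re module stands in for Java regular expressions here, and the helper names expand and groups_from_re are hypothetical. Requiring the whole sample name to match the pattern is an assumption; the text does not say whether a partial match is enough.

```python
import re


def expand(template, match):
    """Replace $1, $2, ... in template with the match's capturing groups."""
    return re.sub(r"\$(\d+)", lambda ref: match.group(int(ref.group(1))), template)


def groups_from_re(pattern, name_template, samples):
    """Map each expanded group NAME to the samples whose names match."""
    regex = re.compile(pattern)
    groups = {}
    for sample in samples:
        # Assumption: the pattern must match the entire sample name.
        match = regex.fullmatch(sample)
        if match:
            groups.setdefault(expand(name_template, match), []).append(sample)
    return groups


samples = ["S1a", "S1b", "S2a", "S2b", "S2c", "control"]
result = groups_from_re(r"S([0-9]+)[a-z]", "MyGroup_$1", samples)
print(result)
# {'MyGroup_1': ['S1a', 'S1b'], 'MyGroup_2': ['S2a', 'S2b', 'S2c']}
```

The sample "control" matches no pattern and therefore ends up in no group.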

Groups having the type "relist" are a special case of two-sample groups defined by two regular expressions. Here, patternN has the format PATTERN1,PATTERN2. Both elements are regular expressions. All two-sample pairs that match the patterns are created as groups. Capturing groups may be present only in PATTERN1; PATTERN2 may refer to them as $1, $2, etc. The definitionN parameter is as before and it may also refer to capturing groups of PATTERN1. For example, if pattern1 is "S([0-9]+)_green,S$1_red" and definition1 is "ratio_S$1,ratio", the following groups may be created: "ratio_S1" containing "S1_green,S1_red" and "ratio_S2" containing "S2_green,S2_red".
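
The pairing logic of "relist" can be sketched in Python as follows. This is an illustration under stated assumptions, not the component's implementation: the helper names are hypothetical, patternN is split at its first comma, whole sample names must match, and text captured by PATTERN1 is escaped before it is substituted into PATTERN2.

```python
import re


def expand(template, match, escape=False):
    """Replace $1, $2, ... in template with groups captured by match."""
    def substitute(ref):
        value = match.group(int(ref.group(1)))
        # Escape captured text when it is inserted into another regex.
        return re.escape(value) if escape else value
    return re.sub(r"\$(\d+)", substitute, template)


def groups_from_relist(pattern, name_template, samples):
    """Return (NAME, [sample1, sample2]) for every matching pair."""
    # Assumption: the first comma separates PATTERN1 from PATTERN2.
    first, second = pattern.split(",", 1)
    first_regex = re.compile(first)
    pairs = []
    for sample1 in samples:
        match = first_regex.fullmatch(sample1)
        if not match:
            continue
        # PATTERN2 is specialized with the groups captured by PATTERN1.
        second_regex = re.compile(expand(second, match, escape=True))
        for sample2 in samples:
            if second_regex.fullmatch(sample2):
                pairs.append((expand(name_template, match), [sample1, sample2]))
    return pairs


samples = ["S1_green", "S1_red", "S2_green", "S2_red", "S3_green"]
pairs = groups_from_relist("S([0-9]+)_green,S$1_red", "ratio_S$1", samples)
print(pairs)
# [('ratio_S1', ['S1_green', 'S1_red']), ('ratio_S2', ['S2_green', 'S2_red'])]
```

"S3_green" has no "S3_red" partner in this made-up sample list, so no group is created for it.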

Groups having the type "verbatim" are constant groups. Here, patternN is a comma-separated list of member sample group names and definitionN is as before, except that references to capturing groups ($1, $2, etc.) cannot be used.
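
A "verbatim" group needs no matching against data, as this minimal Python sketch shows. The function name verbatim_group and the dictionary layout are hypothetical; rejecting "$N" references with an error is an assumption about how the restriction might be enforced.

```python
import re


def verbatim_group(pattern, definition):
    """Build a constant group from a comma-separated member list."""
    # Capturing-group references are not allowed for verbatim groups.
    if re.search(r"\$\d", definition):
        raise ValueError("verbatim definitions cannot refer to $1, $2, ...")
    parts = definition.split(",", 2)
    return {
        "name": parts[0],
        "type": parts[1],
        "description": parts[2] if len(parts) == 3 else None,
        "members": pattern.split(","),
    }


group = verbatim_group("Group1_1,Group2", "Ratio1,ratio,Description 4")
print(group["name"], group["members"])
# Ratio1 ['Group1_1', 'Group2']
```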

Version: 1.2.2
Bundle: tools
Categories: Preprocessing
Authors: Kristian Ovaska (kristian.ovaska@helsinki.fi)
Issue tracker: View/Report issues
Source files: component.xml, SampleGroupCreator.java
Usage: Example with default values

Inputs

Name Type Mandatory Description
data1 CSV Mandatory Data file 1 for reading sample names. Sample names are either the column names or the values in one column.
data2 CSV Optional Data file 2 for reading sample names. Sample names are either the column names or the values in one column.
data3 CSV Optional Data file 3 for reading sample names. Sample names are either the column names or the values in one column.

Outputs

Name Type Description
groups SampleGroupTable Result sample groups.

Parameters

Name Type Default Description
columns string "" Defines which columns in input files contain sample names, or whether column names should be used (default). This parameter is a comma-separated list of at most three values which name columns in inputs data1 to data3. If an entry is empty, the column names of the corresponding input file are used instead.
definition1 string "" Definition of group 1. If empty, the group is omitted. Format: NAME,TYPE or NAME,TYPE,DESCRIPTION.
definition2 string "" Definition of group 2. If empty, the group is omitted.
definition3 string "" Definition of group 3. If empty, the group is omitted.
definition4 string "" Definition of group 4. If empty, the group is omitted.
definition5 string "" Definition of group 5. If empty, the group is omitted.
definition6 string "" Definition of group 6. If empty, the group is omitted.
definition7 string "" Definition of group 7. If empty, the group is omitted.
definition8 string "" Definition of group 8. If empty, the group is omitted.
definition9 string "" Definition of group 9. If empty, the group is omitted.
pattern1 string "" Pattern for group 1. If empty, the group is omitted. Format depends on the pattern type.
pattern2 string "" Pattern for group 2. If empty, the group is omitted.
pattern3 string "" Pattern for group 3. If empty, the group is omitted.
pattern4 string "" Pattern for group 4. If empty, the group is omitted.
pattern5 string "" Pattern for group 5. If empty, the group is omitted.
pattern6 string "" Pattern for group 6. If empty, the group is omitted.
pattern7 string "" Pattern for group 7. If empty, the group is omitted.
pattern8 string "" Pattern for group 8. If empty, the group is omitted.
pattern9 string "" Pattern for group 9. If empty, the group is omitted.
patternTypes string "" Comma-separated list of pattern types for each pattern. Each item is one of "re" (default), "relist" or "verbatim". The types are explained above. An empty value is interpreted as "re". For example, ",re,relist,,verbatim" specifies that pattern3 has the type "relist" and pattern5 the type "verbatim"; all others (including pattern6 and above) have the type "re".
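
The defaulting rule of patternTypes can be sketched in Python as follows. The function name resolve_pattern_types is hypothetical, and raising an error for unknown types or for more than nine entries is an assumption; the description above does not say how the component reacts to such input.

```python
MAX_PATTERNS = 9
VALID_TYPES = {"re", "relist", "verbatim"}


def resolve_pattern_types(pattern_types):
    """Return the type of pattern1..pattern9, filling gaps with "re"."""
    items = pattern_types.split(",")
    if len(items) > MAX_PATTERNS:
        raise ValueError("at most nine pattern types can be given")
    resolved = []
    for index in range(MAX_PATTERNS):
        # Missing or empty entries fall back to the default type.
        item = items[index] if index < len(items) else ""
        item = item or "re"
        if item not in VALID_TYPES:
            raise ValueError(f"unknown pattern type {item!r}")
        resolved.append(item)
    return resolved


types = resolve_pattern_types(",re,relist,,verbatim")
print(types)
# ['re', 're', 'relist', 're', 'verbatim', 're', 're', 're', 're']
```

The list is zero-based, so types[2] belongs to pattern3 and types[4] to pattern5.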

Test cases

Test case Parameters data1 (IN) data2 (IN) data3 (IN) groups (OUT)
case1 properties data1 (missing) (missing) groups

pattern1=S([0-9]+).*
definition1=Group1_$1,median,Description 1: $1
pattern2=S.*?A
definition2=Group2,mean,Description 2
pattern3=S.*?([ABC])
definition3=Group3_$1,mean,Description 3
pattern4=Group1_1,Group2
definition4=Ratio1,ratio,Description 4
pattern5=Group1_2,Group2
definition5=Ratio2,ratio,Description 5
pattern6=Group1_3,Group2
definition6=Ratio3,ratio,Description 6
patternTypes=re,re,re,verbatim,verbatim,verbatim

case2_twochannel properties data1 data2 (missing) groups

pattern1=S([0-9]+)_green,S$1_red
definition1=ratio_S$1,ratio
patternTypes=relist
columns=Sample,SampleID
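
The "columns" setting of case2_twochannel names one column per input file. The Python sketch below shows how sample names could be collected under that setting. It is an illustration only: the file contents are invented, the inputs are assumed to be tab-delimited with a header row, the helper names are hypothetical, and in the default (empty) case every column name is returned, although the component may treat some columns differently.

```python
import csv
import io


def read_sample_names(csv_text, column=""):
    """Return sample names from one input; an empty column means the header."""
    # Assumption: tab-delimited text with a header row.
    reader = csv.reader(io.StringIO(csv_text), delimiter="\t")
    header = next(reader)
    if column == "":
        return header
    index = header.index(column)
    return [row[index] for row in reader]


def collect_sample_names(inputs, columns):
    """Apply the comma-separated columns setting to data1..data3 in order."""
    entries = columns.split(",")
    entries += [""] * (len(inputs) - len(entries))  # pad missing entries
    names = []
    for csv_text, column in zip(inputs, entries):
        names.extend(read_sample_names(csv_text, column))
    return names


# Hypothetical file contents for data1 and data2.
data1 = "Sample\tBatch\nS1_green\t1\nS2_green\t1\n"
data2 = "SampleID\tBatch\nS1_red\t2\nS2_red\t2\n"
names = collect_sample_names([data1, data2], "Sample,SampleID")
print(names)
# ['S1_green', 'S2_green', 'S1_red', 'S2_red']
```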


Generated 2018-12-18 07:42:32 by Anduril 2.0.0