
# SetTransformer

Transforms sets using union, intersection, difference and other functions. Transformations are defined with an expression syntax such as `union(intersection(S1, S2), S3)`, where `union` and `intersection` are set functions and S1 to S3 are set names (IDs) that must be present in the input files. Set references can be quoted or unquoted: `union(S1, S2)` is equivalent to `union('S1', "S2")`. Quotes are required if a set name contains special characters. Expressions can be nested to arbitrary depth, and function calls accept any number of arguments unless stated otherwise. The expression language is based on JEP 2. Transformations are evaluated in the order in which they appear, so later transformations can use the results of earlier ones.
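The in-order evaluation rule can be illustrated with a small Java sketch in which each parsed expression is stood in for by a lambda over the sets defined so far. `OrderedEvaluation`, `Transformation` and `evaluate` are hypothetical names chosen for illustration; the component itself evaluates JEP-based expressions, not lambdas.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

/**
 * Hypothetical sketch of in-order evaluation: each result is stored under its target
 * name before the next transformation runs, so later definitions can reference it.
 * Lambdas stand in for the parsed (JEP-based) expressions of the real component.
 */
public class OrderedEvaluation {
    static final class Transformation {
        final String target;
        final Function<Map<String, Set<String>>, Set<String>> definition;

        Transformation(String target, Function<Map<String, Set<String>>, Set<String>> definition) {
            this.target = target;
            this.definition = definition;
        }
    }

    static Map<String, Set<String>> evaluate(Map<String, Set<String>> inputs,
                                             List<Transformation> transformations) {
        Map<String, Set<String>> env = new LinkedHashMap<>(inputs); // input sets plus results so far
        Map<String, Set<String>> results = new LinkedHashMap<>();   // results in transformation order
        for (Transformation t : transformations) {
            Set<String> value = t.definition.apply(env);
            env.put(t.target, value);      // visible to all following transformations
            results.put(t.target, value);
        }
        return results;
    }

    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> out = new TreeSet<>(a);
        out.addAll(b);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> inputs = Map.of(
                "S1", Set.of("a", "b"), "S2", Set.of("b", "c"), "S3", Set.of("d"));
        List<Transformation> transformations = List.of(
                new Transformation("U", env -> union(env.get("S1"), env.get("S2"))),
                // "V" refers to "U", which exists only because it was evaluated first
                new Transformation("V", env -> union(env.get("U"), env.get("S3"))));
        System.out.println(evaluate(inputs, transformations)); // {U=[a, b, c], V=[a, b, c, d]}
    }
}
```

Swapping the two transformations would make `env.get("U")` return `null` in this sketch, which mirrors why a definition can only use results of transformations listed before it.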

Supported functions are described in the following table. In the Domain column, S denotes a set (corresponding to the Members column of SetList), C a character string argument, N an integer argument, Sn and Cn collections of sets and of character strings, respectively, and "x" the Cartesian product. Each set transformation must produce an object of type S, i.e., a single set.

| Definition | Domain | Description |
|---|---|---|
| `intersection(sets...)` | Sn -> S | Intersection of n sets. Example: `intersection(S1, S2, S3)`. |
| `union(sets...)` | Sn -> S | Union of n sets. Example: `union(S1, S2, S3)`. |
| `diff(sets...)` | Sn -> S | Non-symmetric difference: `diff(S1, ..., Sn) = S1 - S2 - ... - Sn = S1 - union(S2, ..., Sn)`, where Si - Sj denotes the elements present in Si but not in Sj. For example, `diff(S1, S2)` returns the elements present in S1 but not in S2. |
| `freq(low, high, sets...)` | N2 x Sn -> S | Return the elements whose frequency (the number of input sets containing them) lies within the given inclusive bounds, which are passed as two integer arguments. For example, `freq(1, 2, S1, S2, S3)` returns the elements present in 1 or 2 of the input sets. |
| `minfreq(low, sets...)` | N x Sn -> S | Like `freq`, but takes only the lower bound: `minfreq(low, sets...) = freq(low, infinity, sets...)`. |
| `set(strings...)` | Cn -> S | Construct a literal set. For example, `set("x", "y")` creates a set with two elements. Arguments must be strings. |
| `allnames()` | none -> S | Return the set of all defined set names, corresponding to the ID column of SetList. Example: `allnames()`. |
| `match(pattern, set)` | C x S -> S | Filter the elements of the given set with a Java regular expression. For example, `match('a.*', S1)` returns all elements of S1 that start with "a". |
| `setmatch(pattern)` | C -> Sn | Select sets by matching their names against a Java regular expression, and return them as a collection of sets. Note that `setmatch(pattern) = expand(match(pattern, allnames()))`. For example, `setmatch("S[0-9]+")` returns all sets whose names have the form S1, S2, etc. |
| `replace(search, replace, set)` | C2 x S -> S | Search and replace text in the elements of the given set, and return the set of modified elements. The search pattern is a Java regular expression, and the replacement may refer to captured groups using $1, $2, etc. For example, `replace('a(b\|B)c', '$1', S1)` replaces each occurrence of "abc" with "b" and of "aBc" with "B" in every element of S1. |
| `names(sets...)` | Sn -> S | Return the names (identifiers) of the argument sets, corresponding to the ID column of SetList. For example, `names(S1, S2)` returns the set {"S1", "S2"}. |
| `expand(set)` | S -> Sn | The inverse of `names`: expand a set of names (set identifiers) into a collection of sets. For example, `expand(set('S1', 'S2'))` returns the sets S1 and S2. |
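The documented semantics of the set-algebra and regular-expression functions can be sketched with plain `java.util` collections. This is an illustrative reimplementation, not the component's `Functions.java`; the class name `SetFunctionsSketch` is hypothetical and the method names merely mirror the table. It assumes that `match` requires the whole element to match the pattern (as `Matcher.matches()` does), which agrees with the `match('a.*', S1)` example, and it uses `TreeSet` so that members come out in alphabetical order, as in the `result` output.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Hypothetical sketch of the table's semantics; not the component's Functions.java. */
public class SetFunctionsSketch {

    @SafeVarargs
    static Set<String> union(Set<String>... sets) {
        Set<String> out = new TreeSet<>(); // TreeSet keeps members in alphabetical order
        for (Set<String> s : sets) out.addAll(s);
        return out;
    }

    @SafeVarargs
    static Set<String> intersection(Set<String>... sets) {
        if (sets.length == 0) return new TreeSet<>();
        Set<String> out = new TreeSet<>(sets[0]);
        for (int i = 1; i < sets.length; i++) out.retainAll(sets[i]);
        return out;
    }

    // diff(S1, ..., Sn) = S1 - union(S2, ..., Sn)
    @SafeVarargs
    static Set<String> diff(Set<String>... sets) {
        if (sets.length == 0) return new TreeSet<>();
        Set<String> out = new TreeSet<>(sets[0]);
        out.removeAll(union(Arrays.copyOfRange(sets, 1, sets.length)));
        return out;
    }

    // Elements contained in at least `low` and at most `high` of the argument sets (inclusive).
    @SafeVarargs
    static Set<String> freq(int low, int high, Set<String>... sets) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> s : sets)
            for (String e : s) counts.merge(e, 1, Integer::sum);
        Set<String> out = new TreeSet<>();
        counts.forEach((e, n) -> { if (n >= low && n <= high) out.add(e); });
        return out;
    }

    @SafeVarargs
    static Set<String> minfreq(int low, Set<String>... sets) {
        return freq(low, Integer.MAX_VALUE, sets); // Integer.MAX_VALUE plays the role of "infinity"
    }

    // Assumption: whole-element matching, as with Matcher.matches().
    static Set<String> match(String regex, Set<String> set) {
        Pattern p = Pattern.compile(regex);
        Set<String> out = new TreeSet<>();
        for (String e : set) if (p.matcher(e).matches()) out.add(e);
        return out;
    }

    // Regex search-and-replace in every element; $1, $2, ... refer to captured groups.
    static Set<String> replace(String search, String replacement, Set<String> set) {
        Set<String> out = new TreeSet<>();
        for (String e : set) out.add(e.replaceAll(search, replacement));
        return out;
    }

    public static void main(String[] args) {
        Set<String> s1 = Set.of("a", "b", "c");
        Set<String> s2 = Set.of("b", "c", "d");
        Set<String> s3 = Set.of("c", "e");
        System.out.println(intersection(s1, s2, s3)); // [c]
        System.out.println(union(s1, s2, s3));        // [a, b, c, d, e]
        System.out.println(diff(s1, s2, s3));         // [a]
        System.out.println(freq(1, 2, s1, s2, s3));   // [a, b, d, e]
        System.out.println(minfreq(2, s1, s2, s3));   // [b, c]
        System.out.println(match("a.*", Set.of("abc", "xa", "a")));           // [a, abc]
        System.out.println(replace("a(b|B)c", "$1", Set.of("xabcx", "aBc"))); // [B, xbx]
    }
}
```

Because `replace` returns a set, two elements that become identical after replacement collapse into one member in this sketch.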

### Iterated transformations

If the transformation target or definition contains `*`, the transformation is iterated over a set of names, and in each iteration every `*` is replaced with the current name. This lets a single transformation yield several result sets. Iterations that lead to invalid expressions (for example, references to sets that do not exist) are ignored, but the transformation as a whole must yield at least one result set. By default, iteration runs over all set names. This can be overridden with an IterationSet column in `transformation` containing an expression that evaluates to a set of names. If the column is absent or contains NA, the iteration set is `allnames()`.

Example: Target is `*_deg`, Definition is `union("*_up", "*_down")`, and IterationSet is `names(S1, S2)`. Assuming that the sets `S1_up`, `S1_down`, `S2_up` and `S2_down` exist, this creates the sets `S1_deg` = `union(S1_up, S1_down)` and `S2_deg` = `union(S2_up, S2_down)`. If IterationSet is omitted, the transformation is iterated over all set names.
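The iteration rule can be sketched as follows. To stay short, this hypothetical `IterationSketch` treats every definition as a union of its double-quoted set references, which is enough for the `*_deg` example but is not the component's JEP-based evaluator. It also assumes that an iteration is invalid exactly when it references a set that does not exist.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of iterated transformations: each '*' in the target and definition
 * is replaced with the current name, and iterations referencing missing sets are skipped.
 * Assumption: the definition is a union of double-quoted references such as "*_up".
 */
public class IterationSketch {
    // Double-quoted set references only; single-quoted and unquoted names are not handled here.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    static Map<String, Set<String>> iterate(String target, String definition,
                                            Set<String> iterationNames, Map<String, Set<String>> sets) {
        Map<String, Set<String>> results = new LinkedHashMap<>();
        for (String name : iterationNames) {
            String expr = definition.replace("*", name); // literal substitution of every '*'
            Set<String> value = new TreeSet<>();
            boolean valid = true;
            Matcher m = QUOTED.matcher(expr);
            while (m.find()) {
                Set<String> ref = sets.get(m.group(1));
                if (ref == null) { // reference to a missing set: treat this iteration as invalid
                    valid = false;
                    break;
                }
                value.addAll(ref); // union of all referenced sets
            }
            if (valid) {
                results.put(target.replace("*", name), value);
            }
        }
        if (results.isEmpty()) {
            throw new IllegalStateException("iterated transformation " + target + " yielded no result sets");
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> sets = new LinkedHashMap<>();
        sets.put("S1_up", Set.of("g1"));
        sets.put("S1_down", Set.of("g2"));
        sets.put("S2_up", Set.of("g3"));
        sets.put("S2_down", Set.of("g1"));
        // S3 has no S3_up or S3_down set, so that iteration is skipped
        Set<String> iterationSet = new TreeSet<>(Set.of("S1", "S2", "S3"));
        System.out.println(iterate("*_deg", "union(\"*_up\", \"*_down\")", iterationSet, sets));
        // {S1_deg=[g1, g2], S2_deg=[g1, g3]}
    }
}
```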

| Property | Value |
|---|---|
| Version | 0.6 |
| Bundle | tools |
| Category | Convert |
| Author | Kristian Ovaska (kristian.ovaska@helsinki.fi) |
| Requires | jep-2.4.1.jar (jar) |
| Source files | component.xml, SetTransformer.java, Functions.java |

## Inputs

| Name | Type | Mandatory | Description |
|---|---|---|---|
| transformation | CSV | Mandatory | Set transformations, one per CSV row. The columns Target (target set ID) and Definition (transformation expression) must be present. The column IterationSet may be present if iterated transformations are used; it should contain NA for non-iterated transformations. Any other columns are interpreted as annotation columns and are copied to the output. For iterated transformations, the wildcard `*` in annotations is replaced with the current set ID. |
| set1 | SetList | Mandatory | Input sets 1. NA values in Members are interpreted as empty sets. |
| set2 | SetList | Optional | Input sets 2. |
| set3 | SetList | Optional | Input sets 3. |
| set4 | SetList | Optional | Input sets 4. |
| set5 | SetList | Optional | Input sets 5. |

## Outputs

| Name | Type | Description |
|---|---|---|
| result | SetList | Result sets, in the order in which they appear in the transformations. Set members are in alphabetical order. |

## Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| includeAnnotation | string | `"*"` | Comma-separated list of column names in the `transformation` input that are used as annotation columns. The wildcard `*` includes all columns. The special columns Target, Definition and IterationSet are excluded automatically. |
| includeOriginal | boolean | false | If true, the original sets from the input files are also included in the output. If false, only sets defined by transformations are included. |
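How `includeAnnotation` might select annotation columns from the `transformation` header is sketched below. `AnnotationColumns` and `select` are hypothetical names, and the sketch assumes two details not stated above: names in the list are trimmed of surrounding whitespace, and selected columns keep their order from the CSV header.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: choose annotation columns from the transformation CSV header. */
public class AnnotationColumns {
    // Special columns are never treated as annotations.
    private static final Set<String> SPECIAL = Set.of("Target", "Definition", "IterationSet");

    static List<String> select(List<String> header, String includeAnnotation) {
        List<String> requested = new ArrayList<>();
        for (String part : includeAnnotation.split(",")) {
            String name = part.trim(); // assumption: surrounding whitespace is ignored
            if (!name.isEmpty()) requested.add(name);
        }
        boolean all = requested.contains("*");
        List<String> out = new ArrayList<>();
        for (String col : header) { // assumption: header order is preserved
            if (SPECIAL.contains(col)) continue;
            if (all || requested.contains(col)) out.add(col);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> header = List.of("Target", "Definition", "IterationSet", "Annotation1", "Annotation2", "Note");
        System.out.println(select(header, "*"));                       // [Annotation1, Annotation2, Note]
        System.out.println(select(header, "Annotation1,Annotation2")); // [Annotation1, Annotation2]
    }
}
```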

## Test cases

| Test case | Parameters | transformation (IN) | set1 (IN) | set2 (IN) | set3 (IN) | set4 (IN) | set5 (IN) | result (OUT) |
|---|---|---|---|---|---|---|---|---|
| case1 | (missing) | transformation | set1 | set2 | (missing) | (missing) | (missing) | result |
| case2_inclorig | `includeOriginal=true` | transformation | set1 | set2 | (missing) | (missing) | (missing) | result |
| case3_wildcard | (missing) | transformation | set1 | (missing) | (missing) | (missing) | (missing) | result |
| case4_annotation | `includeOriginal=true`, `includeAnnotation=Annotation1,Annotation2` | transformation | set1 | set2 | (missing) | (missing) | (missing) | result |

Generated 2019-02-08 07:42:19 by Anduril 2.0.0