RowJoin

Joins rows with duplicate IDs in a numeric matrix based on the frequency of a given value (the matchMe parameter). Among the duplicates, the row with the highest frequency is retained; if several rows share the highest frequency, the first one is chosen. A lower limit for the frequency can also be set with the freqLimit parameter. Rows for which no duplicate passes the frequency limit are output as NA.
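The selection rule can be sketched in Python as follows. This is a minimal illustration, not the component's implementation. It assumes that "frequency" means the fraction of data values in a row equal to matchMe, that duplicates are detected as consecutive rows, and that an ID whose best row falls below the limit is emitted with NA values (one possible reading of the description). The names `join_rows`, `match_frequency` and `NA` are hypothetical.

```python
from itertools import groupby

NA = None  # hypothetical stand-in for the NA marker of the CSV output


def match_frequency(values, match_me):
    """Fraction of values equal to match_me (assumed meaning of 'frequency')."""
    if not values:
        return 0.0
    return sum(1 for v in values if v == match_me) / len(values)


def join_rows(rows, match_me=1, freq_limit=-1.0):
    """Collapse consecutive rows that share an ID into a single row.

    rows: list of (row_id, [values, ...]) tuples, presorted by row_id.
    """
    joined = []
    for row_id, group in groupby(rows, key=lambda r: r[0]):
        candidates = [values for _, values in group]
        # max() returns the first maximal element, so ties go to the first row.
        best = max(candidates, key=lambda vals: match_frequency(vals, match_me))
        # A negative limit disables the check, as with freqLimit=-1.
        if freq_limit >= 0 and match_frequency(best, match_me) < freq_limit:
            # Assumption: no row passes the limit, so the ID gets NA values.
            best = [NA] * len(best)
        joined.append((row_id, best))
    return joined


rows = [
    ("g1", [1, 0, 1, 1]),  # frequency of 1 is 0.75
    ("g1", [1, 1, 1, 1]),  # frequency of 1 is 1.0, retained
    ("g2", [0, 0, 1, 0]),  # frequency 0.25, tie: first row retained
    ("g2", [0, 1, 0, 0]),  # frequency 0.25
]
print(join_rows(rows))
print(join_rows(rows, freq_limit=0.7))
```

With the limit set to 0.7, the g1 row is kept unchanged and the g2 row consists of NA values in this sketch.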

Alternatively, with impute=true, the component replaces rows that have identical IDs with a single row containing their column-wise mean values.
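A minimal Python sketch of the imputation mode, assuming consecutive duplicate IDs and using `None` as a stand-in for NA. As the impute parameter description states, NA values count as zeros when the mean is calculated. The function name `impute_rows` is hypothetical.

```python
from itertools import groupby


def impute_rows(rows):
    """Replace consecutive rows sharing an ID with their column-wise mean.

    rows: list of (row_id, [values, ...]) tuples, presorted by row_id.
    None marks an NA value and is counted as zero.
    """
    result = []
    for row_id, group in groupby(rows, key=lambda r: r[0]):
        members = [values for _, values in group]
        # zip(*members) walks through the group one column at a time.
        means = [
            sum(0 if v is None else v for v in column) / len(column)
            for column in zip(*members)
        ]
        result.append((row_id, means))
    return result


rows = [
    ("a", [2.0, None]),
    ("a", [4.0, 6.0]),
    ("b", [1.0, 1.0]),
]
print(impute_rows(rows))  # [('a', [3.0, 3.0]), ('b', [1.0, 1.0])]
```

Note that the NA in the second column of ID "a" pulls the mean down to 3.0, because the divisor is the number of rows and not the number of non-NA values.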

The input matrix must be presorted by row ID for duplicates to be collapsed. If the matrix is not presorted, only the freqLimit parameter has an effect.
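The presorting requirement is consistent with a component that compares only adjacent rows, although that is an inference and not something this page states. Under that assumption, the Python sketch below shows why unsorted input leaves every row in its own group, and how a stable sort by ID fixes it while keeping the original order within each ID. The helper names `id_runs` and `presort` are hypothetical.

```python
from itertools import groupby


def id_runs(ids):
    """Return (id, run length) for each run of consecutive identical IDs."""
    return [(key, len(list(run))) for key, run in groupby(ids)]


def presort(rows):
    """Sort (row_id, values) tuples by ID; sorted() is stable, so rows that
    share an ID keep their original relative order."""
    return sorted(rows, key=lambda r: r[0])


unsorted_ids = ["g2", "g1", "g2", "g1"]
print(id_runs(unsorted_ids))          # every run has length 1: nothing to collapse
print(id_runs(sorted(unsorted_ids)))  # [('g1', 2), ('g2', 2)]
```

Because the sort is stable, the "first row wins on ties" rule still refers to the first row in the original file.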

Version 0.5
Bundle tools
Categories Convert
Authors Riku Louhimo (Riku.Louhimo@Helsinki.FI)
Requires asser.jar
Source files component.xml

Inputs

| Name | Type | Mandatory | Description |
|---|---|---|---|
| matrix | CSV | Mandatory | Matrix whose duplicate rows are to be joined. |
| thresholds | CSV | Optional | Thresholds for indicator output. |

Outputs

| Name | Type | Description |
|---|---|---|
| table | CSV | Input matrix with duplicate rows removed and the most frequent duplicate retained. |

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| freqLimit | float | -1 | Exclude from the output rows where the frequency of 'matchMe' is below this limit. Negative values disable the limit. |
| idColumn | int | 1 | Index of the column that contains the IDs. Empty defaults to the first column of the matrix. |
| impute | boolean | false | Collapse row values for rows with identical IDs. The collapsed values are column-wise means. NA values are treated as zeros when calculating the means. |
| matchMe | string | "1" | Defines which value is used to calculate the frequency. Ignored if impute=true. |
| retainSkipped | boolean | false | Include the columns between idColumn and startColumn in the output. |
| startColumn | int | 2 | Index of the first data column, for inputs that contain more columns than the ID and the data. |
| upper | boolean | true | |
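The interplay of idColumn, startColumn and retainSkipped can be illustrated with a short Python sketch. It assumes 1-based column indices (the default idColumn=1 refers to the first column) and that idColumn comes before startColumn. The function `split_columns` is hypothetical and only shows which columns would end up where.

```python
def split_columns(row, id_column=1, start_column=2, retain_skipped=False):
    """Split one row into (id, skipped columns, data columns).

    id_column and start_column are 1-based, as assumed for the component.
    Assumes id_column < start_column.
    """
    row_id = row[id_column - 1]
    data = row[start_column - 1:]
    # Columns strictly between the ID column and the first data column.
    skipped = row[id_column:start_column - 1] if retain_skipped else []
    return row_id, skipped, data


row = ["g1", "chr1", "100", 1, 0, 1]
print(split_columns(row, id_column=1, start_column=4))
print(split_columns(row, id_column=1, start_column=4, retain_skipped=True))
```

In the second call, the annotation columns "chr1" and "100" are carried along; in the first they are dropped.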

Test cases

| Test case | Parameters | IN matrix | IN thresholds | OUT table |
|---|---|---|---|---|
| case1 | properties: idColumn=1, startColumn=2, freqLimit=0.7 | matrix | (missing) | table |
| case2 | properties: idColumn=1, startColumn=4 | matrix | (missing) | table |
| case3_impute | properties: idColumn=1, startColumn=3, impute=true | matrix | (missing) | table |
| case4_tholds | properties: idColumn=1, startColumn=3, impute=false, upper=true, retainSkipped=true | matrix | thresholds | table |

Generated 2018-12-17 07:42:38 by Anduril 2.0.0