Up: Component summary Component

CSVJoin

Joins rows from two or more CSV files from all the inputs, optionally using one column as a matching key.

If a key column is not used, the result contains all rows and all columns of the input files. Missing values (NA) may be introduced when a column is not present in all input files. Each column is present once and duplicate rows are removed.

If a key column is used, the rows in each input file are matched using values from the key column. The result file has one row for each key value. In the result, the first column is the key column; its name is obtained from the first CSV input (csv1). If the intersection parameter is true, a key is included in the result if the key value is present in all inputs. If intersection is false, a key is included if it is present in at least one input (key union). Union semantics may introduce NA values in the result. If several input files have the same column, the value is obtained from the first file. However, if the first file contains a missing value (NA) and the second file contains a non-missing value, the non-missing value is used instead.

For more complex join operations, see TableQuery. You may also use CSVListJoin to join multiple large files efficiently.

Version 1.1
Bundle tools
Categories Convert
Specialties generic
Authors Kristian Ovaska (kristian.ovaska@helsinki.fi)
Issue tracker View/Report issues
Requires R
Source files component.xml CSVJoin.r
Usage Example with default values

Type parameters (generics)

Inputs

Name Type Mandatory Description
in1 CSV Optional CSV file 1.
in2 CSV Optional CSV file 2.
in3 CSV Optional CSV file 3.
in4 CSV Optional CSV file 4.
inDir BinaryFolder Optional Directory containing CSV files.
in Array<CSV> Optional Array containing CSV files.

Outputs

Name Type Description
out T (generic) Result CSV file.

Parameters

Name Type Default Description
asConnection boolean false If true, assign file connections (class file) to the inputs to variables of the same name. This is useful e.g. if you want to read large data files row by row.
intersection boolean true Defines how keys are handled; only used when useKeys=true. If intersection is true, the result contains a key if the key is present in all input files. If false, the result contains a key if the key is present in at least one input file.
keyColumnNames string "" Comma-separated list of key column names; only used when useKeys=true. The first name refers to csv1, the second to csv2, etc. An empty value refers to the first column. Empty values may be omitted from the list, so all these are equivalent: "col1" ; "col1," ; "col1,," ; etc.
minRows int 0 Fail component if there are less than minRows rows of data (excluding the header).
useKeys boolean true If true, use one column from each CSV file as a matching key column and the columns are combined. If false, rows are combined without a join.

Test cases

Test case Parameters IN
in1
IN
in2
IN
in3
IN
in4
IN
inDir
IN
in
OUT
out
case1 (missing) in1 in2 (missing) (missing) (missing) (missing) out
case10_array_dir_files properties in1 in2 in3 in4 inDir in out

useKeys=false

case2_union properties in1 in2 (missing) (missing) (missing) (missing) out

intersection=false

case3_nokeys properties in1 in2 (missing) (missing) (missing) (missing) out

useKeys=false

case4_names properties in1 in2 (missing) (missing) (missing) (missing) out

keyColumnNames=KEY1,KEY2

case5_many properties in1 in2 in3 (missing) (missing) (missing) out

useKeys=false

case6_sepnames (missing) in1 in2 (missing) (missing) (missing) (missing) out
case7_sepnames_nokeys properties in1 in2 (missing) (missing) (missing) (missing) out

useKeys=false

case8_many properties in1 in2 in3 in4 (missing) (missing) out

useKeys=false

case9_dir properties (missing) (missing) (missing) (missing) inDir (missing) out

intersection=false


Generated 2018-12-17 07:42:31 by Anduril 2.0.0