
CSVCleaner

Cleans up CSV outputs: optionally removes file headers, quotation marks, and unused columns, and can reorder and rename columns.

Escape sequences that can be used in the missing value symbols and column delimiters:

character escape
empty string \e
space \s
carriage return \r
new line \n
tab \t
quotation mark \q
semicolon \c
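
The escape table above can be read as a simple lookup. The following Python sketch is not the component's own code; the function name `decode_escapes` is hypothetical, and it assumes that "quotation mark" means the double quote character and that unknown escapes are left unchanged.

```python
import re

# Escape letters from the table above mapped to the characters they stand for.
# Assumption: "quotation mark" means the double quote character.
ESCAPES = {
    "e": "",    # empty string
    "s": " ",   # space
    "r": "\r",  # carriage return
    "n": "\n",  # new line
    "t": "\t",  # tab
    "q": '"',   # quotation mark
    "c": ";",   # semicolon
}


def decode_escapes(text):
    """Replace each known backslash escape; leave anything else untouched."""
    return re.sub(r"\\([esrntqc])", lambda m: ESCAPES[m.group(1)], text)


print(repr(decode_escapes(r"-N\eA-")))  # '-NA-'
print(repr(decode_escapes(r"\c")))      # ';'
```

Under these assumptions, a missing value symbol written as `-N\eA-` is equivalent to `-NA-`, since `\e` contributes nothing to the result.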

Version 2.2
Bundle tools
Categories Convert
Authors Marko Laakso (Marko.Laakso@Helsinki.FI)
Issue tracker View/Report issues
Source files component.xml
Usage Example with default values

Inputs

Name Type Mandatory Description
in CSV Mandatory Input file to be cleaned

Outputs

Name Type Description
out CSV Simplified CSV

Parameters

Name Type Default Description
autoRename string "" Duplicate column names will be renamed uniquely if this delimiter string is not empty. The new column names are formed by appending the given delimiter and an integer counter to the column name. For example, a hyphen (-) converts {a, a, a-1, a-3, a} to {a, a-2, a-1, a-3, a-4}.
columnsIn string "" A comma separated list of column names for the input columns. An empty string means that the column names are defined on the first input row.
columnsOut string "*" Comma separated list of column selections for the output. An asterisk (*) may be used for all columns.
delimIn string "\t" Column delimiter for the input
delimSymbol string "\t" Column delimiter for the output
dropHeader boolean false This flag will eliminate column names from the output.
fillRows boolean false Accept rows with too few columns and complete them with missing values.
naIn string "NA" Missing value symbol for the input
naSymbol string "NA" Missing value symbol for the output
numberFormat string "" A line feed (\n) separated list of decimal formats for the columns. Each entry consists of the column name and a Java DecimalFormat pattern, separated by an equals sign. For example, rounding to three decimals can be done with myColumn=#0.000.
rename string "" Comma separated list of column renaming rules (oldname=newname)
replace string "" A line feed (\n) separated list of column-specific search-and-replace rules. Each entry consists of three lines: the column name, the regular expression to search for, and the substitution pattern. The syntax of the expressions and substitutions follows Java regular expressions.
rowSkip int 0 Skip this many lines from the input before reading the table.
skipQuotes string "" Comma separated list of output column names that should not have quotation marks. An asterisk (*) may be used for all columns.
trim boolean false Remove leading and trailing whitespace from the field values.
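
The autoRename example in the table above can be reproduced with a short Python sketch. The component's actual counting strategy is not documented here, so this is only one approach that is consistent with the example: the first occurrence of a name is kept, and each later duplicate receives the next counter value that does not collide with any existing column name. The function name `auto_rename` is hypothetical.

```python
def auto_rename(names, delim):
    """Make duplicate column names unique by appending delim and a counter.

    Assumption: counters start at 1 per name and skip any candidate that
    collides with an original or an already generated column name.
    """
    if delim == "":
        # An empty delimiter disables renaming.
        return list(names)
    taken = set(names)   # every name that must not be generated again
    seen = set()         # names already emitted to the result
    counters = {}        # last counter value used for each duplicated name
    result = []
    for name in names:
        if name not in seen:
            seen.add(name)
            result.append(name)
            continue
        counter = counters.get(name, 0)
        while True:
            counter += 1
            candidate = f"{name}{delim}{counter}"
            if candidate not in taken:
                break
        counters[name] = counter
        taken.add(candidate)
        seen.add(candidate)
        result.append(candidate)
    return result


print(auto_rename(["a", "a", "a-1", "a-3", "a"], "-"))
# ['a', 'a-2', 'a-1', 'a-3', 'a-4']
```

The second `a` skips `a-1` because a column with that name already exists, and the last `a` skips `a-3` for the same reason.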

Test cases

Test case Parameters IN: in OUT: out
case1 properties in out

columnsOut = Sample,value,name,
skipQuotes = *,
rename = value=Value,name=Name,

case2 properties in out

columnsOut = Sample,name,
skipQuotes = name,
rename = name=Name,
trim = true,

case3 properties in out

naSymbol = -N\\eA-,
naIn = MISSIN\\eG,
delimSymbol = \\c,
delimIn = ,,
rowSkip = 2,
metadata.timeout = 0

case4 properties in out

columnsOut = C,B,
dropHeader = true,
replace = B\nB\nb\nC\nC\nc

case5 properties in out

columnsIn = A,B,C,
columnsOut = C,B,
dropHeader = true

case6 properties in out

autoRename = -

case7 properties in out

fillRows = true

case8 properties in out

numberFormat = B=%\nA=#.##\nC=O.###E0

case9 properties in out

naIn = \\e,
columnsIn = col1,col2,col3,col4

case9b properties in out

naIn = \\e,
columnsIn = col1,col2,col3,col4
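
The replace parameter, exercised in case4 above, packs each rule into three consecutive lines: column name, search expression, and substitution. The Python sketch below shows how such a value could be split into rules and applied to a row. It is an illustration only: the function names are hypothetical, and it uses Python's `re` module, whose syntax differs from Java regular expressions in places (for example, group references in substitutions), so only simple patterns carry over unchanged.

```python
import re


def parse_replace_rules(spec):
    """Split a line feed separated spec into (column, pattern, substitution) triples."""
    if not spec:
        return []
    lines = spec.split("\n")
    if len(lines) % 3 != 0:
        raise ValueError("expected three lines per rule")
    return [tuple(lines[i:i + 3]) for i in range(0, len(lines), 3)]


def apply_replace_rules(row, rules):
    """Return a copy of row with each rule applied to its own column only."""
    cleaned = dict(row)
    for column, pattern, substitution in rules:
        if column in cleaned:
            cleaned[column] = re.sub(pattern, substitution, cleaned[column])
    return cleaned


# The value from case4, with \n standing for real line feeds.
rules = parse_replace_rules("B\nB\nb\nC\nC\nc")
print(rules)  # [('B', 'B', 'b'), ('C', 'C', 'c')]
print(apply_replace_rules({"A": "B", "B": "ABBA", "C": "CC"}, rules))
# {'A': 'B', 'B': 'AbbA', 'C': 'cc'}
```

Column A is left untouched even though its value contains a B, because each rule is bound to the column named on its first line.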


Generated 2018-12-18 07:42:28 by Anduril 2.0.0