
TextFileSplitter

Splits a text file (such as CSV) into an array of smaller text files. Splitting works in one of two modes: (1) fixed mode, where the output array has a fixed number of elements, and (2) non-fixed mode (the default), where the number of elements is not fixed. Fixed mode is enabled when numElements is non-negative; otherwise non-fixed mode is used.
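The mode selection can be sketched in a few lines of Python. This is an illustration only, not the component's Java implementation: the function name `select_mode` and the choice to raise `ValueError` are assumptions, while the rules themselves come from the parameter descriptions of numElements, maxRecords and splitRegexp.

```python
def select_mode(num_elements=-1, max_records=-1, split_regexp=""):
    """Return "fixed" or "non-fixed" for a given parameter combination.

    Hypothetical helper: how the real component reports invalid
    combinations is not documented, so ValueError is an assumption.
    """
    if num_elements >= 0:
        # Fixed mode: maxRecords and splitRegexp cannot be used.
        if max_records >= 0:
            raise ValueError("maxRecords must be negative in fixed mode")
        if split_regexp:
            raise ValueError("splitRegexp must be empty in fixed mode")
        return "fixed"
    return "non-fixed"


print(select_mode())                 # defaults
print(select_mode(num_elements=7))   # as in test case case4_csv_numelem
```

With the documented defaults (numElements=-1) the sketch selects non-fixed mode; any non-negative numElements selects fixed mode.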

The input file is divided into records, each a region of N consecutive rows, where N is given by rowsPerRecord. By default, each line is one record. Records are distributed to the elements of the output array according to the splitting criteria (numElements, maxRecords and splitRegexp).
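As a rough illustration of how rows are grouped into records, the following Python sketch separates the header rows and then cuts the remaining lines into chunks of rowsPerRecord rows. The function name `to_records` is hypothetical, and keeping a short trailing record when the row count is not a multiple of rowsPerRecord is an assumption; the documentation does not say how the component treats that case.

```python
def to_records(lines, header_rows=1, rows_per_record=1):
    """Split lines into (header, records).

    header  : the first header_rows lines (not counted as records)
    records : consecutive groups of rows_per_record lines
    Assumption: a trailing group with fewer rows is kept as a short record.
    """
    header = lines[:header_rows]
    body = lines[header_rows:]
    records = [
        body[i:i + rows_per_record]
        for i in range(0, len(body), rows_per_record)
    ]
    return header, records


lines = ["id,value", "a,1", "b,2", "c,3", "d,4", "e,5"]
header, records = to_records(lines, header_rows=1, rows_per_record=2)
print(header)   # ['id,value']
print(records)  # [['a,1', 'b,2'], ['c,3', 'd,4'], ['e,5']]
```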

Common use cases:

This component is the inverse operation to CSVListJoin.

Version 1.0
Bundle tools
Categories Internal
Specialties generic
Authors Kristian Ovaska (kristian.ovaska@helsinki.fi)
Source files component.xml TextFileSplitter.java

Type parameters (generics)

Inputs

Name Type Mandatory Description
in T1 (generic) Mandatory Input text file.

Outputs

Name Type Description
out Array<T1> (generic) Array of smaller text files whose contents are derived from the input file. Array elements and rows within individual elements are in the order of the original file.

Parameters

Name Type Default Description
headerRows int 1 Number of header rows in the beginning of the file. These rows are included in every output element and are not counted as actual records.
keyPattern string "%d" Defines the format of keys in the output array. The wildcard %d is replaced with the index of the current element, starting at 1. The default creates elements with keys 1, 2, etc.
maxRecords int -1 Maximum number of records in each element. If negative, there is no upper limit. Must be negative when fixed mode is enabled.
numElements int -1 Defines the fixed number of elements that the output array will have. If negative, the number of elements is not fixed and the non-fixed mode is enabled. When non-negative, the fixed mode is enabled and maxRecords and splitRegexp cannot be used.
rowsPerRecord int 1 Number of rows that each record spans.
splitRegexp string "" Java regular expression that indicates the start of a new array element. When the current line matches this regular expression, a new element is started and the current line is written into the beginning of the new element. If empty, the expression is not used. Must not be used when fixed mode is enabled.
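The interplay of headerRows, keyPattern, maxRecords, rowsPerRecord and splitRegexp in non-fixed mode can be sketched as follows. This is a minimal Python approximation, not the component's Java code, and it rests on several assumptions: the function name `split_non_fixed` is hypothetical; the regular expression is tested against the first row of each record and must match the whole line; %d is substituted by plain string replacement; and Python's `re` syntax stands in for Java regular expressions, which differ in some details.

```python
import re


def split_non_fixed(lines, header_rows=1, key_pattern="%d",
                    max_records=-1, rows_per_record=1, split_regexp=""):
    """Return a dict mapping output keys to lists of lines (non-fixed mode)."""
    header = lines[:header_rows]
    body = lines[header_rows:]
    records = [body[i:i + rows_per_record]
               for i in range(0, len(body), rows_per_record)]
    pattern = re.compile(split_regexp) if split_regexp else None

    elements = []  # each element is a list of records
    for record in records:
        starts_new = (
            not elements
            # Current element already holds maxRecords records.
            or (max_records >= 0 and len(elements[-1]) >= max_records)
            # Assumption: the first row of the record must match entirely.
            or (pattern is not None and pattern.fullmatch(record[0]) is not None)
        )
        if starts_new:
            elements.append([])
        elements[-1].append(record)

    out = {}
    for index, element in enumerate(elements, start=1):
        key = key_pattern.replace("%d", str(index))  # indices start at 1
        # Header rows are repeated at the top of every output element.
        out[key] = header + [row for record in element for row in record]
    return out


csv_lines = ["id,value", "a,1", "b,2", "c,3", "d,4", "e,5"]
print(split_non_fixed(csv_lines, max_records=2, key_pattern="part_%d"))

log_lines = ["h", ">START x", "1", ">START y", "2", "3"]
print(split_non_fixed(log_lines, split_regexp=">START .*"))
```

In the first call each element receives the header plus at most two records, under the keys part_1, part_2 and part_3. In the second call a new element begins at every line matching `>START .*`, and the matching line becomes the first record of that element.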

Test cases

Test case Parameters IN (in) OUT (out)
case1_defaults (missing) in out
case2_maxrec properties in out

headerRows=4,
numElements=3,
keyPattern=mykey_%d,
rowsPerRecord=2

case3_regexp properties in out

headerRows=2,
rowsPerRecord=2,
maxRecords=3,
splitRegexp=>START .*

case4_csv_numelem properties in out

numElements=7

case5_csv_maxrec properties in out

maxRecords=3


Generated 2018-12-16 07:42:17 by Anduril 2.0.0