
RandomSampler

Randomly selects rows and columns from a text or CSV file without replacement. The input is treated as a CSV file if any of the column parameters is set to a non-default value; otherwise, it is treated as a plain text file. The number of sampled rows and columns can be given either as a fraction or as an absolute count. The input may contain header rows and, for CSV files, header columns; these are copied verbatim to the output, and the random sample follows them.
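The row sampling described above can be sketched in Python as follows. This is a minimal illustration and not the component's Java implementation: the function name `sample_text_rows` and the `seed` argument are hypothetical, and two behaviours the description leaves open are assumed here, namely that a fractional count is rounded to the nearest integer and that sampled rows keep their input order.

```python
import random


def sample_text_rows(lines, num_rows, row_fraction=True, header_rows=1, seed=None):
    """Return the header lines followed by a random sample of the other lines."""
    rng = random.Random(seed)
    header, body = lines[:header_rows], lines[header_rows:]

    if row_fraction:
        if not 0 <= num_rows <= 1:
            raise ValueError("a fraction must be between 0 and 1")
        # Assumption: round to the nearest integer (Python rounds ties to even).
        count = round(num_rows * len(body))
    else:
        count = int(num_rows)
        if not 0 <= count <= len(body):
            raise ValueError("count exceeds the number of non-header rows")

    # Sampling indices (not values) guarantees selection without replacement,
    # even when the file contains duplicate lines.
    # Assumption: sorting the indices keeps the rows in input order.
    picked = sorted(rng.sample(range(len(body)), count))
    return header + [body[i] for i in picked]
```

Under these assumptions, a fraction of 1 returns the input unchanged, because every non-header row is selected and the order is preserved.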

Version 0.5
Bundle tools
Categories Analysis
Specialties generic
Authors Kristian Ovaska (kristian.ovaska@helsinki.fi)
Requires commons-math3-3.2.jar (jar)
Source files component.xml RandomSampler.java

Type parameters (generics)

Inputs

Name Type Mandatory Description
in T (generic) Mandatory Input file. May be either a CSV file or a general text file.

Outputs

Name Type Description
out T (generic) Random subset of the input file.

Parameters

Name Type Default Description
columnFraction boolean true If true, numColumns is a fraction. If false, numColumns is an absolute count. Only used for CSV files.
headerColumns int 0 For CSV files, number of header columns before the actual randomized content. These are the first columns in the CSV file. Header columns are copied verbatim to each output row.
headerRows int 1 Number of header rows before the actual randomized content. The header is copied to output verbatim. For CSV files, this must be 1.
numColumns float 1 For CSV files, number or fraction of columns to be randomly selected. If columnFraction is true, this is a fraction between 0 and 1; otherwise, this is an absolute number.
numRows float (no default) Number or fraction of rows to be randomly selected. If rowFraction is true, this is a fraction between 0 and 1; otherwise, this is an absolute number.
rowFraction boolean true If true, numRows is a fraction. If false, numRows is an absolute count.
shuffleColumns boolean false If true, column order is randomly shuffled. If false, columns appear in the output in the same order as they are in the input. Only used for CSV files.
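How the column parameters interact can be sketched in Python as follows. This is an illustration under stated assumptions, not the component's actual code: the helper names (`is_csv_mode`, `resolve_count`, `sample_csv_columns`) and the `seed` argument are hypothetical, "column parameters" is taken to mean the four parameters with "Column" in their name, fractions are assumed to round to the nearest integer, and the fraction or count is assumed to apply to the non-header columns only.

```python
import random

# Defaults of the column parameters, taken from the table above.
COLUMN_DEFAULTS = {
    "columnFraction": True,
    "headerColumns": 0,
    "numColumns": 1,
    "shuffleColumns": False,
}


def is_csv_mode(params):
    """True if any column parameter differs from its default value."""
    return any(params.get(k, d) != d for k, d in COLUMN_DEFAULTS.items())


def resolve_count(value, is_fraction, total):
    """Turn a fraction or an absolute number into an item count."""
    if is_fraction:
        if not 0 <= value <= 1:
            raise ValueError("a fraction must be between 0 and 1")
        return round(value * total)  # assumption: round to nearest
    count = int(value)
    if not 0 <= count <= total:
        raise ValueError("count out of range")
    return count


def sample_csv_columns(rows, num_columns=1, column_fraction=True,
                       header_columns=0, shuffle_columns=False, seed=None):
    """Keep the header columns and a random subset of the remaining columns."""
    rng = random.Random(seed)
    data_cols = list(range(header_columns, len(rows[0])))
    count = resolve_count(num_columns, column_fraction, len(data_cols))
    picked = rng.sample(data_cols, count)  # random order, no repeats
    if not shuffle_columns:
        picked.sort()  # restore input order
    keep = list(range(header_columns)) + picked
    return [[row[i] for i in keep] for row in rows]
```

For instance, `is_csv_mode({"numRows": 1, "headerColumns": 2})` evaluates to true in this sketch because `headerColumns` differs from its default of 0, whereas a parameter set containing only `headerRows` and `numRows` leaves the input in text mode.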

Test cases

Test case Parameters Input Output
case1_text_fraction headerRows=2, numRows=1 in out
case2_text_count headerRows=1, numRows=9, rowFraction=false in out
case3_text_part headerRows=0, numRows=3, rowFraction=false in out
case4_csv_fraction numRows=1, headerColumns=2 in out
case5_csv_count numRows=1, numColumns=4, columnFraction=false, headerColumns=1 in out
case6_csv_part1 numRows=1, rowFraction=false, numColumns=1, columnFraction=false, shuffleColumns=true in out
case7_csv_part2 numRows=3, rowFraction=false, numColumns=2, columnFraction=false in out


Generated 2018-12-17 07:42:38 by Anduril 2.0.0