
RegionConvert

Converts files containing genomic or other sequence-related regions to other formats while retaining all applicable information. Supported input and output types depend on the conversion method. The default method is GROK's command line interface, which supports all major sequencing file formats and various options. Type "grok" on the command line for an overview.

Note that GROK does not currently support BAM extra fields.

GROK's native CSV format is DNARegion2.

As a special case, DNARegion is explicitly supported as the output type "DNARegion" because some other components depend on it. That conversion allows specifying an ID column for the regions, so that region identity may be preserved.
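
The ID handling for the "DNARegion" output can be pictured with a short Python sketch. This is not the component's code: the function name `assign_ids`, the dictionary representation of a region and the column name `name` are assumptions made purely for illustration. Only the rule itself comes from this page: the ID is read from the given column when one is specified, and is otherwise generated starting from 1 (see the id_column parameter).

```python
def assign_ids(regions, id_column=""):
    """Return (id, region) pairs for a list of region records.

    regions   -- list of dicts, one per region (hypothetical representation)
    id_column -- key to read the ID from; an empty string means "generate IDs"
    """
    if id_column:
        # Preserve region identity by reading the ID from the chosen column.
        return [(str(region[id_column]), region) for region in regions]
    # No ID column given: number the regions 1, 2, 3, ...
    return [(str(index), region) for index, region in enumerate(regions, start=1)]


# Two made-up regions; the field names are illustrative only.
regions = [
    {"chromosome": "1", "start": 100, "end": 200, "name": "geneA"},
    {"chromosome": "2", "start": 50, "end": 75, "name": "geneB"},
]
generated = assign_ids(regions)
from_column = assign_ids(regions, id_column="name")
```

With an ID column, downstream components see the same identifiers as the input file; without one, the identifiers depend only on the order of the regions.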

Version 0.1
Bundle sequencing
Authors Lauri Lyly (lauri.lyly@helsinki.fi)
Requires python; installer (bash)
Source files component.xml convert_file.py

Inputs

Name Type Mandatory Description
file BinaryFile Optional Single file to convert.
array Array<BinaryFile> Optional Array to convert.
folder BinaryFolder Optional Folder of files to convert. Either all convertible files or only those matching the input type parameter ("from") are converted.
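
How files might be picked from the folder input can be sketched in Python as follows. The helper `select_inputs` is hypothetical, and matching by file suffix is an assumption: this page only states that either all convertible files or those matching the input type are converted, not how the match is made. Whether a file is actually convertible is left to the conversion method and is not checked here.

```python
import tempfile
from pathlib import Path


def select_inputs(folder, from_type=""):
    """List the files in a folder that would be passed on for conversion.

    from_type -- input type to match against the file suffix (an assumption);
                 an empty string selects every regular file.
    """
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    if not from_type:
        return files
    wanted = "." + from_type.lower()
    return [p for p in files if p.suffix.lower() == wanted]


# Demonstration on a throwaway folder with arbitrary file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.vcf", "c.vcf", "notes.txt"):
        (Path(tmp) / name).touch()
    only_vcf = [p.name for p in select_inputs(tmp, "vcf")]
    everything = [p.name for p in select_inputs(tmp)]
```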

Outputs

Name Type Description
file BinaryFile File converted from the file input.
array Array<BinaryFile> Files converted from the array input.
folder BinaryFolder Files converted from the folder input.

Parameters

Name Type Default Description
from string "" Input file type for all input files. By default, deduced from the file suffix. Accepted types are FIXME
id_column string "" Used only when the output type is "DNARegion". Specifies the column or annotation field from which to read the ID for each region. If left empty, an ID is generated for each region, starting from 1 and increasing.
method string "GROK" The framework used for conversion. GROK is currently the only choice. This parameter is provided to make this component extensible without triggering re-execution.
options string "" Method-specific options, usually appended to a command line or interpreted in some other way. A notable choice for GROK is --gzip, which compresses the output.
threads int 1 Maximum number of threads to use at once. Set to 0 to detect the number of CPUs. The default of 1 is chosen for safety.
to string "csv" The output type for all output files. The default type "csv" means DNARegion2, a CSV format. A suffix will be appended to the file names for arrays and folders. Accepted types are FIXME
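
The defaults described in the table above can be illustrated with a small Python sketch. The three helper functions are hypothetical and do not come from convert_file.py; they only restate the documented rules: the input type falls back to the file suffix, a thread count of 0 means the detected number of CPUs, and the options string is split into command line arguments. Lower-casing the suffix and shell-style splitting of the options are assumptions.

```python
import os
import shlex
from pathlib import Path


def deduce_type(file_name, from_type=""):
    """An explicit type wins; otherwise use the file suffix without the dot."""
    if from_type:
        return from_type
    return Path(file_name).suffix.lstrip(".").lower()


def resolve_threads(threads=1):
    """0 means 'use the detected CPU count'; other values are used as given."""
    if threads == 0:
        # os.cpu_count() may return None, so fall back to a single thread.
        return os.cpu_count() or 1
    return threads


def split_options(options=""):
    """Split the options string into arguments the way a shell would."""
    return shlex.split(options)


deduced = deduce_type("sample.vcf")            # suffix decides the type
forced = deduce_type("sample.dat", "vcf")      # explicit type overrides it
extra_args = split_options("--gzip")           # option named in the table above
```

How the resulting arguments are combined into the actual GROK command line is not documented here, so the sketch stops at producing the argument list.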

Test cases

Test case | Parameters | IN file | IN array | IN folder | OUT file | OUT array | OUT folder
case1_vcf2csv | from=vcf | file | (missing) | (missing) | file | array | (missing)


Generated 2018-12-18 07:42:26 by Anduril 2.0.0