
EnsemblDNA

Fetches DNA sequences from the Ensembl database. The sequences to fetch are determined by giving their genomic locations.

The Ensembl Perl API must be installed and the environment variable PERL5LIB must include the installation directories of the modules ensembl, ensembl-variation, ensembl-compara, ensembl-functgenomics and BioPerl. See the installation instructions on the Ensembl homepage.
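The requirement above can be checked before running the component. The following is a minimal sketch, not part of the component itself: the function name missing_modules is hypothetical, and matching module names against directory names in PERL5LIB is an assumption about how the modules were installed.

```python
import os

# Module names taken from the requirement above; matching them against
# directory names is an assumption about how the modules are laid out.
REQUIRED_MODULES = ("ensembl", "ensembl-variation", "ensembl-compara",
                    "ensembl-functgenomics", "bioperl")


def missing_modules(perl5lib, required=REQUIRED_MODULES):
    """Return the required names that match no path component in perl5lib."""
    components = set()
    for entry in perl5lib.split(os.pathsep):
        normalized = entry.replace("\\", "/").lower()
        components.update(part for part in normalized.split("/") if part)
    missing = []
    for name in required:
        if name == "bioperl":
            # BioPerl directories may carry a suffix, so accept any prefix match.
            present = any(part.startswith("bioperl") for part in components)
        else:
            present = name in components
        if not present:
            missing.append(name)
    return missing
```

Calling missing_modules(os.environ.get("PERL5LIB", "")) returns an empty list when every module name was found. A non-empty result is only a hint, since the directories may be named differently on a given system.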

Version: 1.0
Bundle: microarray
Categories: Annotation
Authors: Ping Chen (ping.chen@helsinki.fi), Marko Laakso (Marko.Laakso@Helsinki.FI), Erkka Valo (erkka.valo@helsinki.fi)
Issue tracker: View/Report issues
Requires: Ensembl Perl API
Source files: component.xml, EnsemblAPI.pm, EnsemblDNA.pl
Usage: Example with default values

Inputs

Name Type Mandatory Description
regions DNARegion Mandatory Genomic locations of the target sequences. The input file should contain the ID, strand, chromosome, start and end of each sequence. The chromosomal start location should be lower than the end location. Sequences can be fetched from either the 1 or the -1 strand. If the target sequence lies on the -1 strand, its start site is the given end location and its end site is the given start location.
connection Properties Mandatory Connection parameters for the Ensembl database, including host, database, port, user and driver information.
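The strand rule described for the regions input can be sketched as follows. This illustrates the documented behaviour and is not the component's Perl code: the Region type, its field names and the function target_sites are hypothetical.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """One row of the regions input (field names are illustrative)."""
    id: str
    chromosome: str
    start: int   # lower chromosomal coordinate
    end: int     # higher chromosomal coordinate
    strand: int  # 1 or -1


def target_sites(region):
    """Return (start_site, end_site) of the target sequence on its own strand."""
    if region.strand not in (1, -1):
        raise ValueError(f"strand must be 1 or -1, got {region.strand}")
    if region.start >= region.end:
        raise ValueError("start should be lower than end")
    if region.strand == 1:
        return region.start, region.end
    # On the -1 strand the sequence runs from the given end to the given start.
    return region.end, region.start
```

For a region with start 100 and end 200, target_sites returns (100, 200) on strand 1 and (200, 100) on strand -1.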

Outputs

Name Type Description
sequences FASTA Target sequences. Sequences are returned in 5' to 3' direction in the strand of the target sequence.
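Returning a sequence in 5' to 3' direction on the -1 strand amounts to reverse-complementing the forward-strand bases. The sketch below assumes the input string is the forward-strand sequence of the region; the function name strand_sequence is hypothetical and the component itself may obtain the sequence differently through the Ensembl API.

```python
# Complement table for the four bases plus N; lowercase letters are kept
# lowercase in case the input uses them.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def strand_sequence(forward_seq, strand):
    """Return the sequence read 5' to 3' on the requested strand."""
    if strand == 1:
        return forward_seq
    if strand == -1:
        # Complement every base, then reverse the order.
        return forward_seq.translate(_COMPLEMENT)[::-1]
    raise ValueError(f"strand must be 1 or -1, got {strand}")
```

With the forward-strand bases AAGC, strand_sequence returns AAGC for strand 1 and GCTT for strand -1.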

Parameters

Name Type Default Description
csvOutput boolean false Write the output in CSV format instead of FASTA.
length int 0 If not 0, fetch a sequence of this length downstream from the start site in the strand of the target sequence. Note that the start site is first moved according to the value of the offset parameter.
mask boolean false A flag that can be used to activate repeat masking.
offset int 0 Number of base pairs to offset the start site. Negative for upstream, positive for downstream. The offset is applied in the strand of the target sequence.
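The combined effect of offset and length can be sketched as coordinate arithmetic. This is one reading of the parameter descriptions, not the component's implementation: the function fetch_interval is hypothetical, coordinates are assumed to be 1-based and inclusive, and rejecting a negative length is an assumption, since the documentation does not define that case.

```python
def fetch_interval(start, end, strand, offset=0, length=0):
    """Return the chromosomal (low, high) interval that would be fetched.

    start/end are the given region coordinates with start lower than end.
    Coordinates are assumed to be 1-based and inclusive.
    """
    if strand not in (1, -1):
        raise ValueError(f"strand must be 1 or -1, got {strand}")
    if length < 0:
        raise ValueError("length must not be negative")
    if strand == 1:
        # Downstream means increasing coordinates on the 1 strand.
        start_site = start + offset
        end_site = start_site + length - 1 if length else end
        return start_site, end_site
    # On the -1 strand the start site is the given end, and downstream
    # means decreasing coordinates.
    start_site = end - offset
    end_site = start_site - length + 1 if length else start
    return end_site, start_site
```

With the values used in test case case2 (offset=-5, length=566) and a region from 1000 to 2000, this gives (995, 1560) on strand 1 and (1440, 2005) on strand -1. Both intervals span 566 bases.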

Test cases

Test case Parameters IN regions IN connection OUT sequences
case1 (missing) regions connection sequences
case2 properties (offset=-5, length=566) regions connection sequences
case3 properties (csvOutput=true) regions connection sequences

Generated 2018-12-12 07:42:05 by Anduril 2.0.0