Function


Creates reference files for gene features and known ncRNAs, and adds columns to an expression matrix of putative novel miRNAs. The added columns describe each candidate's relative genomic location (e.g. intragenic/intergenic, host gene and transcript) and any neighbouring or overlapping ncRNAs.
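The intragenic/intergenic distinction can be illustrated with a simple interval-overlap test. The following is a minimal sketch, not the component's actual implementation: the `Interval` class, the `classify` helper, the example gene names, and the choice to ignore strand when assigning a host gene are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Interval:
    """A genomic interval with 1-based, inclusive coordinates (assumed convention)."""
    chrom: str
    start: int
    end: int
    strand: str
    name: str = ""


def overlaps(a: Interval, b: Interval) -> bool:
    """True when both intervals lie on the same chromosome and share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def classify(candidate: Interval, genes: List[Interval]) -> Tuple[str, Optional[str]]:
    """Return ("intragenic", host gene name) or ("intergenic", None).

    Strand is ignored here; whether the component requires a same-strand
    host gene is not stated in the documentation.
    """
    for gene in genes:
        if overlaps(candidate, gene):
            return "intragenic", gene.name
    return "intergenic", None


# Hypothetical gene models and candidates, for illustration only.
genes = [
    Interval("1", 1000, 5000, "+", "GENE_A"),
    Interval("1", 9000, 12000, "-", "GENE_B"),
]
inside = classify(Interval("1", 1200, 1222, "+"), genes)
outside = classify(Interval("1", 6000, 6022, "+"), genes)
```

A linear scan is enough to show the idea; for a full GTF, an interval tree or sorted-coordinate lookup would scale better.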

Version 1.0
Bundle sequencing
Categories Novel smallRNA
Authors Katherine Icay (katherine.icay@helsinki.fi)
Requires biomaRt (R-package)
Source files component.xml function.scala


Inputs

Name | Type | Mandatory | Description
expression | CSV | Mandatory | Expression matrix of putative novel miRNAs with the first four columns containing the following information: chromosome, start position, end position, and strand.
gtf | BinaryFile | Mandatory | Ensembl genes GTF file, unformatted. Contains transcript and exon locations.
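To make the role of the gtf input concrete, the sketch below extracts transcript and exon records from GTF lines. It relies only on the standard nine-column, tab-separated GTF layout. The `parse_gtf` function, the record keys, and the sample lines are hypothetical and do not reflect how the component itself reads the file. Only quoted attribute values are captured.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Matches attribute pairs such as: gene_id "ENSG..."  (quoted values only)
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf(lines: Iterable[str],
              wanted: Tuple[str, ...] = ("transcript", "exon")) -> Iterator[Dict[str, object]]:
    """Yield one dict per GTF line whose feature type is listed in `wanted`."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GTF record
        chrom, _source, feature, start, end, _score, strand, _frame, attrs = fields
        if feature not in wanted:
            continue
        record: Dict[str, object] = {
            "chr": chrom,
            "feature": feature,
            "start": int(start),
            "end": int(end),
            "strand": strand,
        }
        record.update(dict(ATTR_RE.findall(attrs)))
        yield record


# Made-up sample lines in GTF layout, for illustration only.
sample = [
    "#!genome-build GRCh37\n",
    '1\tprotein_coding\tgene\t1000\t5000\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n',
    '1\tprotein_coding\ttranscript\t1000\t5000\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    '1\tprotein_coding\texon\t1000\t1200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_number "1";\n',
]
records = list(parse_gtf(sample))
```

Because `parse_gtf` is a generator, a large GTF file can be streamed line by line from an open file handle rather than loaded into memory at once.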


Outputs

Name | Type | Description
annotated | CSV | Expression matrix of putative novel miRNAs with additional information on relative genomic location, host gene, and neighbouring (or overlapping) known ncRNAs.
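The neighbouring-or-overlapping annotation can be pictured as a nearest-neighbour search over known ncRNA coordinates. This is a sketch under assumptions: the `distance` and `nearest` helpers, the tuple layout, and the example features are invented here, and the component's own distance definition and tie-breaking are not documented. An overlap is reported as distance 0.

```python
from typing import List, Optional, Tuple

# (chromosome, start, end, name); 1-based inclusive coordinates assumed
Feature = Tuple[str, int, int, str]


def distance(start_a: int, end_a: int, start_b: int, end_b: int) -> int:
    """Gap in bases between two intervals; 0 when they overlap."""
    if start_a <= end_b and start_b <= end_a:
        return 0
    return start_b - end_a if start_b > end_a else start_a - end_b


def nearest(chrom: str, start: int, end: int,
            known: List[Feature]) -> Optional[Tuple[str, int]]:
    """Return (name, distance) of the closest known feature on the same chromosome."""
    best: Optional[Tuple[str, int]] = None
    for k_chrom, k_start, k_end, k_name in known:
        if k_chrom != chrom:
            continue
        d = distance(start, end, k_start, k_end)
        if best is None or d < best[1]:
            best = (k_name, d)
    return best


# Hypothetical known ncRNAs, for illustration only.
known = [
    ("1", 100, 180, "snoRNA_X"),
    ("1", 900, 980, "miR_Y"),
    ("2", 10, 90, "lncRNA_Z"),
]
neighbour = nearest("1", 700, 722, known)
overlap = nearest("1", 150, 172, known)
none_found = nearest("3", 1, 22, known)
```

When two features are equally close, this sketch keeps the first one encountered.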


Parameters

Name | Type | Default | Description
ensembl_dataset | string | "hsapiens_gene_ensembl" | biomaRt dataset (i.e. species) to use.
ensembl_host | string | "feb2014.archive.ensembl.org" | Host URL of the Ensembl version to use (see Ensembl Archives). For reliable transcript identification, use the same genome build and the same genome version as reference_hairpin.
knownFeatures | string | "" | Path to a previously formatted tab-delimited file describing known ncRNAs to include in the nearest-neighbour analysis. The file must contain the columns chr, start, end, strand, biotype, geneID, and Name. This option exists only to speed up the component: when it is not provided, the component must process the gtf file itself, which takes much longer.
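Before passing a file to knownFeatures, it may help to check that it carries the required columns. The sketch below is one way to do that. The `read_known_features` function and the example rows are hypothetical, and the sketch assumes the file starts with a header row naming the columns, which the documentation implies but does not state outright.

```python
import csv
import io
from typing import Dict, List, TextIO

# Column names required by the knownFeatures parameter description.
REQUIRED = ["chr", "start", "end", "strand", "biotype", "geneID", "Name"]


def read_known_features(handle: TextIO) -> List[Dict[str, object]]:
    """Read a tab-delimited ncRNA table and fail early if a required column is missing."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED if column not in header]
    if missing:
        raise ValueError("knownFeatures file is missing columns: " + ", ".join(missing))
    rows: List[Dict[str, object]] = []
    for row in reader:
        converted: Dict[str, object] = dict(row)
        converted["start"] = int(row["start"])  # coordinates as integers
        converted["end"] = int(row["end"])
        rows.append(converted)
    return rows


# Made-up file content, for illustration only.
example = io.StringIO(
    "chr\tstart\tend\tstrand\tbiotype\tgeneID\tName\n"
    "1\t100\t180\t+\tsnoRNA\tG100\tsnoRNA_X\n"
    "1\t900\t980\t-\tmiRNA\tG200\tmiR_Y\n"
)
features = read_known_features(example)
```

With a real file, replace the `io.StringIO` object with `open(path, newline="")`.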

Generated 2018-12-16 07:42:17 by Anduril 2.0.0