Component BiomartAnnotator

Fetches attributes with given filters using BioMart. This component uses the R function getBM to fetch the given attributes for a list of filter values. See the documentation of the R package biomaRt.

There are two modes of operation when the filter input is given: one filter row at a time (batchSize=1) and all filter values in blocks (batchSize>1). In batchSize=1 mode, the result file always contains exactly one row for each filter row. In batchSize>1 mode, one filter row may produce several result rows, which can make the results difficult to interpret. This happens, for instance, when querying the genes associated with Gene Ontology terms. However, when annotating, e.g., genes or microarray probes, there is typically exactly one row per filter value.

Batch mode is significantly faster than non-batch mode. In batch mode, the filter must also be included as an attribute so that results can be mapped back to the input query values. This attribute is set with idAttribute, which defaults to the filter type.
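A minimal Python sketch of the batch-mode bookkeeping described above, assuming results arrive as a list of dicts keyed by attribute name. The helper names split_into_batches and map_results_to_ids are hypothetical and not part of the component, which is written in R and calls getBM. The refsnp_id column mirrors the idAttribute used in case03_batch, and the IDs and rows are made up.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, str]

def split_into_batches(values: List[str], batch_size: int) -> List[List[str]]:
    """Split filter values into consecutive blocks of at most batch_size values."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [values[i:i + batch_size] for i in range(0, len(values), batch_size)]

def map_results_to_ids(rows: List[Row], id_attribute: str) -> Dict[str, List[Row]]:
    """Group batch-mode result rows by the value of the ID attribute.

    The ID attribute is the only link between a result row and the filter
    value that produced it. One ID may map to several rows, and filter
    values without hits are simply absent from the result.
    """
    grouped: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        grouped[row[id_attribute]].append(row)
    return dict(grouped)

# Made-up IDs and rows, for illustration only.
ids = ["rs1", "rs2", "rs3", "rs4", "rs5"]
print(split_into_batches(ids, 2))  # [['rs1', 'rs2'], ['rs3', 'rs4'], ['rs5']]

rows = [
    {"refsnp_id": "rs1", "chr_name": "1"},
    {"refsnp_id": "rs1", "chr_name": "2"},  # second hit for the same ID
    {"refsnp_id": "rs3", "chr_name": "7"},
]
by_id = map_results_to_ids(rows, "refsnp_id")
print({k: len(v) for k, v in by_id.items()})  # {'rs1': 2, 'rs3': 1}
```

The second print shows why batch results can be hard to interpret. Because rs1 maps to two rows, downstream code has to decide how to combine them, and rs2 has no entry at all.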

The query to the BioMart database can also be made by defining the filters with the constantFilters parameter. This must be done in non-batch mode (batchSize=1).
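A constantFilters value such as chromosome_name=Y,biotype=snoRNA (taken from case11_constants_only) is a comma-separated list of filterType=filterValue pairs. The following is a small sketch of turning such a string into a filter dictionary. It assumes that neither filter names nor values contain commas. The function name parse_constant_filters is hypothetical, and the component's own R parsing may differ.

```python
from typing import Dict

def parse_constant_filters(spec: str) -> Dict[str, str]:
    """Parse 'type1=value1,type2=value2' into {'type1': 'value1', ...}.

    Assumes names and values contain no commas. An empty string (the
    parameter default) yields no filters.
    """
    filters: Dict[str, str] = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty entries such as a trailing comma
        name, sep, value = pair.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"expected filterType=filterValue, got {pair!r}")
        filters[name.strip()] = value.strip()
    return filters

print(parse_constant_filters("chromosome_name=Y,biotype=snoRNA"))
# {'chromosome_name': 'Y', 'biotype': 'snoRNA'}
```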

There can be more than one filter, except in batch mode (batchSize>1), which is currently limited to one filter.
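In non-batch mode with several filter types, each input row supplies one value per filter type, as in case04_multifilter (chromosome_name,start,end). The following sketch shows how one row's values might be combined with constant filters, together with a guard for the one-filter limit in batch mode. Both function names are hypothetical. Letting the row value win when it shares a name with a constant filter is an assumption of this sketch.

```python
from typing import Dict, Sequence

def row_filters(filter_types: Sequence[str], row_values: Sequence[str],
                constant_filters: Dict[str, str]) -> Dict[str, str]:
    """Combine one input row's filter values with the constant filters.

    On a name clash the row value wins here; that precedence is an
    assumption of this sketch, not documented component behavior.
    """
    if len(filter_types) != len(row_values):
        raise ValueError("need exactly one value per filter type")
    filters = dict(constant_filters)
    filters.update(zip(filter_types, row_values))
    return filters

def check_batch_filters(filter_types: Sequence[str], batch_size: int) -> None:
    """Reject more than one filter type when batch mode (batch_size > 1) is on."""
    if batch_size > 1 and len(filter_types) > 1:
        raise ValueError("batchSize>1 currently supports only one filter type")

# Mirrors case07_constants: filterTypes=start,end and constantFilters=chromosome_name=1.
# The start/end values are made up.
print(row_filters(["start", "end"], ["1000000", "2000000"], {"chromosome_name": "1"}))
# {'chromosome_name': '1', 'start': '1000000', 'end': '2000000'}

check_batch_filters(["chromosome_name", "start", "end"], batch_size=1)  # fine: non-batch
```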

If one filter value produces multiple values for one attribute, those values are collapsed into a comma-separated list (when listLayout=true). NA and duplicate values are removed from the list of filter values before the BioMart database is queried.
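The removal of NA and duplicate filter values can be pictured as an order-preserving de-duplication. The following Python sketch shows it. It assumes missing values appear either as None or as the literal string NA. The name clean_filter_values is hypothetical.

```python
from typing import Iterable, List, Optional

NA_TOKENS = {"NA"}  # assumption: missing values arrive as None or the string "NA"

def clean_filter_values(values: Iterable[Optional[str]]) -> List[str]:
    """Drop NA entries and duplicates, keeping the order of first occurrence."""
    seen = set()
    cleaned: List[str] = []
    for value in values:
        if value is None or value in NA_TOKENS:
            continue  # missing value: nothing to query
        if value in seen:
            continue  # duplicate: already queued for querying
        seen.add(value)
        cleaned.append(value)
    return cleaned

print(clean_filter_values(["rs1", "NA", "rs2", "rs1", None]))  # ['rs1', 'rs2']
```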

Available databases, filters and attributes can be browsed on the BioMart web site. You may select the mart database and use the query tool to choose the settings of interest. The exact keywords can then be seen in the XML query that the tool generates from those selections.
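The following is a sketch of reading those keywords out of an exported XML query and mapping them onto this component's parameters. The example query is abridged. Its header attributes follow the usual BioMart export but may vary between BioMart versions. The function name query_to_parameters is hypothetical. The sketch maps every Filter element to constantFilters. A filter whose values should come from the filter input would go to filterTypes instead. Filter values containing commas are not handled. The mart, martHost and martPath parameters are not derived here.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Abridged example of an exported BioMart XML query (declaration and DOCTYPE omitted).
EXAMPLE_QUERY = """
<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="0">
  <Dataset name="hsapiens_gene_ensembl" interface="default">
    <Filter name="chromosome_name" value="Y"/>
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="ensembl_transcript_id"/>
  </Dataset>
</Query>
"""

def query_to_parameters(xml_text: str) -> Dict[str, str]:
    """Map the keywords of a BioMart XML query to BiomartAnnotator parameter strings."""
    root = ET.fromstring(xml_text.strip())
    dataset = root.find("Dataset")
    if dataset is None:
        raise ValueError("no <Dataset> element in query")
    attributes = [a.get("name", "") for a in dataset.findall("Attribute")]
    filters = [f"{f.get('name')}={f.get('value')}" for f in dataset.findall("Filter")]
    return {
        "dataset": dataset.get("name", ""),
        "attributes": ",".join(attributes),
        "constantFilters": ",".join(filters),
    }

print(query_to_parameters(EXAMPLE_QUERY))
# {'dataset': 'hsapiens_gene_ensembl', 'attributes': 'ensembl_gene_id,ensembl_transcript_id', 'constantFilters': 'chromosome_name=Y'}
```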

For convenience, here are some mart lists current as of 2013-10: all marts, datasets in ensembl, attributes in hsapiens_gene_ensembl, and filters in hsapiens_gene_ensembl.

Version: 1.3
Bundle: microarray
Categories: Annotation
Authors: Erkka Valo (erkka.valo@helsinki.fi), Viljami Aittomaki (viljami.aittomaki@helsinki.fi), Kristian Ovaska (kristian.ovaska@helsinki.fi), Marko Laakso (Marko.Laakso@Helsinki.FI)
Requires: libssl-dev (DEB); biomaRt (R-bioconductor); RCurl (R-package)
Source files: component.xml, BiomartAnnotator.r

Inputs

Name Type Mandatory Description
filter CSV Optional A list of filter values

Outputs

Name Type Description
annotations AnnotationTable Attributes returned from the database with given filters
databases Properties A properties file which lists the database version and the dataset used for fetching annotations.

Parameters

Name Type Default Description
attributes string (no default) A comma-separated list of attributes to fetch. See the biomaRt documentation on how to list the available attributes for a given mart and dataset in R.
batchSize int 1 If greater than one, enables batch mode, in which filter values are fetched in blocks of up to batchSize values per query. This is significantly faster than non-batch mode (batchSize=1), but in some cases there may be several result rows for one filter value. If 1, filter values are fetched individually.
constantFilters string "" A comma-separated list of filterType=filterValue pairs that apply to all input rows when the filter input is given. They can also be used as the only filters, without the filter input.
dataset string "hsapiens_gene_ensembl" Dataset to get annotations from. Different BioMart databases (marts) have their own datasets. See the biomaRt documentation on how to list the available datasets for a mart in R.
filterColumns string "" Comma-separated names of the filter columns in the filter input, or an empty string to use the first column(s).
filterTypes string "" Types of the filter values in the filter input, as a comma-separated list. See the biomaRt documentation on how to list the available filters for a given mart and dataset in R.
idAttribute string "" In batchSize>1 mode, the name of the ID attribute whose values correspond to the filter IDs. If empty, the value of filterTypes is used. Often the filter and the corresponding attribute have identical names, in which case the default (empty) value can be used. This parameter is not used in batchSize=1 mode.
listLayout boolean true Result format. If true, multiple hits are collapsed into a comma-separated list in each column; if false, the result is a standard CSV file without collapsed columns.
mart string "ensembl" BioMart database to use. See the biomaRt documentation on how to list the available BioMart databases (marts).
martHost string "www.ensembl.org" Host name of the mart server.
martPath string "/biomart/martview" Path of the mart web service on the server.
uniq boolean true If true, removes duplicate values within individual result cells. Different filter values may still produce the same attribute values.
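The following Python sketch shows how listLayout and uniq could shape the output for one filter value. The function name format_hits and the output column name filter are hypothetical, and the hits are made up. In this sketch uniq only affects collapsed cells. That is an assumption, because the text above does not say how uniq interacts with listLayout=false.

```python
from typing import Dict, List

Row = Dict[str, str]

def format_hits(filter_value: str, hits: List[Row], attributes: List[str],
                list_layout: bool = True, uniq: bool = True) -> List[Row]:
    """Turn all hits for one filter value into output rows.

    The output column "filter" is a hypothetical name for the query value.
    """
    if not list_layout:
        # Standard CSV layout: one output row per hit, nothing collapsed.
        return [{"filter": filter_value, **{a: h.get(a, "") for a in attributes}}
                for h in hits]
    row: Row = {"filter": filter_value}
    for attr in attributes:
        values = [h[attr] for h in hits if h.get(attr)]
        if uniq:
            values = list(dict.fromkeys(values))  # drop repeats, keep order
        row[attr] = ",".join(values)
    return [row]

# Made-up hits for one SNP ID.
hits = [{"chr_name": "1", "allele": "A/G"}, {"chr_name": "1", "allele": "A/T"}]
print(format_hits("rs1", hits, ["chr_name", "allele"]))
# [{'filter': 'rs1', 'chr_name': '1', 'allele': 'A/G,A/T'}]
print(format_hits("rs1", hits, ["chr_name", "allele"], uniq=False))
# [{'filter': 'rs1', 'chr_name': '1,1', 'allele': 'A/G,A/T'}]
print(len(format_hits("rs1", hits, ["chr_name", "allele"], list_layout=False)))  # 2
```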

Test cases

Test case Parameters IN filter OUT annotations OUT databases
case01 properties filter annotations databases

attributes=allele,chr_name,chrom_start,chrom_strand,ensembl_gene_stable_id,validated,
filterTypes=snp_filter,
dataset=hsapiens_snp,
mart=snp

case02_filtercolumn properties filter annotations databases

attributes=allele,chr_name,chrom_start,chrom_strand,ensembl_gene_stable_id,validated,
filterColumns=rsID,
filterTypes=snp_filter,
dataset=hsapiens_snp,
mart=snp

case03_batch properties filter annotations databases

attributes=allele,chr_name,chrom_start,chrom_strand,ensembl_gene_stable_id,validated,
filterTypes=snp_filter,
idAttribute=refsnp_id,
dataset=hsapiens_snp,
mart=snp,
batchSize=1000

case04_multifilter properties filter annotations databases

attributes=ensembl_gene_id,external_gene_id,
filterTypes=chromosome_name,start,end,
dataset=hsapiens_gene_ensembl,
mart=ensembl

case05_noresults properties filter annotations databases

attributes = seq_region_start_1057,feature_type_name_1057,
dataset = hsapiens_feature_set,
filterColumns = chromosome,start,end,
filterTypes = reg_chromosome_name,ann_start,ann_end,
mart = functional_genomics

case06_COSMIC properties filter annotations databases

attributes = accession_number,
filterTypes = gene_name,
dataset = COSMIC67,
mart = CosmicMart,
martHost = cancer.sanger.ac.uk,
martPath = /biomart/martservice

case07_constants properties filter annotations databases

attributes = ensembl_gene_id,external_gene_id,
filterTypes = start,end,
dataset = hsapiens_gene_ensembl,
mart = ensembl,
constantFilters = chromosome_name=1,
uniq = false

case08_emptyinput properties filter annotations databases

attributes=allele,chr_name,chrom_start,chrom_strand,ensembl_gene_stable_id,validated,
filterTypes=refsnp,
dataset=hsapiens_snp,
mart=snp

case09_layout properties filter annotations databases

attributes=allele,chr_name,chrom_start,chrom_strand,ensembl_gene_stable_id,validated,
filterTypes=snp_filter,
idAttribute=refsnp_id,
dataset=hsapiens_snp,
mart=snp,
batchSize=5,
listLayout=false

case10_batch_collapse properties filter annotations databases

attributes = id_mutation,
filterTypes = uniprot_swissprot,
dataset = COSMIC67,
mart = CosmicMart,
martHost = cancer.sanger.ac.uk,
martPath = /biomart/martservice,
batchSize = 2,
uniq = false,
constantFilters = tumour_source=recurrent

case11_constants_only properties (missing) annotations databases

attributes = ensembl_gene_id,ensembl_transcript_id,chromosome_name,gene_biotype,
constantFilters = chromosome_name=Y,biotype=snoRNA

case12_list_layout_multifilter properties filter annotations databases

attributes=refsnp_id,allele,chr_name,chrom_start,validated,
dataset=hsapiens_snp,
filterTypes=chr_name,chrom_start,chrom_end,
listLayout=false,
mart=snp,
uniq=true


Generated 2018-12-11 07:42:06 by Anduril 2.0.0