AgilentReader

Imports data from microarray text files such as Agilent CSV files. The input is a directory of CSV files, each of which represents a two-channel or one-channel microarray. Three types of data can be extracted: numeric matrices containing channel or log-ratio values, probe annotations and sample-specific annotations. This component does not perform normalization such as background correction. Binary files cannot be processed by this component.

The outputs green, green2, red, and red2 are matrices, each containing values from one column in each CSV file. The columns for these matrices are named with the parameter channelColumns. In output matrices, each column represents one microarray (i.e., one input CSV file) and each row represents a probe. Interpretation of these matrices depends on the columns selected. The default settings, geared towards Agilent two-channel arrays, extract normalized values into "green" and "red" and raw values into "green2" and "red2". However, by modifying channelColumns, it is also possible to extract raw foreground values into "green" and "red" and background values into "green2" and "red2", for example. For one-channel arrays, "red" and "red2" are not used.
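The mapping from channelColumns to the four output matrices can be illustrated with a minimal Python sketch. This is not the component's implementation (which is written in Java); the function name parse_channel_columns is hypothetical, and the sketch only mirrors the documented rule that omitted values are treated as empty.

```python
CHANNEL_OUTPUTS = ("green", "green2", "red", "red2")

def parse_channel_columns(spec):
    """Map each output matrix name to its source column, or None if left empty."""
    parts = [p.strip() for p in spec.split(",")]
    # Pad omitted trailing values so that "col1" behaves like "col1,,,".
    parts += [""] * (len(CHANNEL_OUTPUTS) - len(parts))
    # zip() ignores any surplus trailing items, e.g. in ",,rProcessedSignal,,".
    return {name: (col or None) for name, col in zip(CHANNEL_OUTPUTS, parts)}

# Default two-channel configuration.
default = parse_channel_columns(
    "gProcessedSignal,gMedianSignal,rProcessedSignal,rMedianSignal")

# One-channel configuration: "red" and "red2" have no source column.
one_channel = parse_channel_columns("gProcessedSignal,gMedianSignal")
```

In this sketch an output whose source column is None would simply not be populated, which corresponds to "red" and "red2" being unused for one-channel arrays.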

If combineProbes=true, probes that have multiple copies on the microarray are combined using the median so that the output contains a unique value for each probe. Probe IDs are taken from the column named by idColumn. If combineProbes=false, probe IDs are renamed to internal unique IDs that can be mapped to original probe IDs using the first two columns of probeAnnotation.
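The two combineProbes modes can be sketched in Python as follows. This is an illustration under stated assumptions, not the component's code: the function names, the probe IDs and the "internal_N" naming scheme are hypothetical, and the handling of missing values and of an even number of copies (here, Python's statistics.median averages the two middle values) is not specified by this page.

```python
from collections import defaultdict
from statistics import median

def combine_probes(rows):
    """combineProbes=true: return one median value per probe ID.

    rows is an iterable of (probe_id, value) pairs for a single array.
    """
    grouped = defaultdict(list)
    for probe_id, value in rows:
        grouped[probe_id].append(value)
    return {pid: median(vals) for pid, vals in grouped.items()}

def keep_duplicates(rows):
    """combineProbes=false: keep every feature under a unique internal ID.

    Returns the values keyed by internal ID and an (internal ID, original ID)
    table, analogous to the first two columns of probeAnnotation.
    """
    values = {}
    id_map = []
    for i, (probe_id, value) in enumerate(rows, start=1):
        internal = f"internal_{i}"  # hypothetical naming scheme
        values[internal] = value
        id_map.append((internal, probe_id))
    return values, id_map

# Made-up features: probeA is spotted three times, probeB once.
features = [("probeA", 10.0), ("probeB", 7.0), ("probeA", 14.0), ("probeA", 11.0)]
combined = combine_probes(features)
values, id_map = keep_duplicates(features)
```

With the made-up data above, the three copies of probeA collapse to their median in the first mode, while the second mode keeps four rows that can be traced back to two original probe IDs.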

Input rows can be filtered using the filter parameter. This makes it possible to remove control probes and low-quality probes from the output. The default values are conservative in order to remove possible false positives; if the results are independently validated or the experiment setup includes dye-swap, less strict filtering may be applied.
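The exclusion semantics of filter (rows matching the expression are removed, all other rows are kept) can be sketched in Python. The sketch does not parse the expression language used by the component; instead, the example expression from the filter parameter description, "ControlType!=0 || (gIsSaturated==1 && rIsSaturated==1)", is written by hand as a Python predicate. The function name is_excluded and the row values are made up for illustration.

```python
def is_excluded(row):
    """Python equivalent of: ControlType!=0 || (gIsSaturated==1 && rIsSaturated==1)"""
    return row["ControlType"] != 0 or (
        row["gIsSaturated"] == 1 and row["rIsSaturated"] == 1)

# Made-up rows; the column names are those used in the filter parameter.
rows = [
    {"ProbeName": "probeA", "ControlType": 0, "gIsSaturated": 0, "rIsSaturated": 0},
    {"ProbeName": "control1", "ControlType": 1, "gIsSaturated": 0, "rIsSaturated": 0},
    {"ProbeName": "probeB", "ControlType": 0, "gIsSaturated": 1, "rIsSaturated": 0},
    {"ProbeName": "probeC", "ControlType": 0, "gIsSaturated": 1, "rIsSaturated": 1},
]

# Rows for which the expression is true are dropped from the result.
kept = [r["ProbeName"] for r in rows if not is_excluded(r)]
```

Here probeB survives because it is saturated on only one channel, whereas the control probe and the doubly saturated probeC are removed. An empty filter, as used in several of the test cases, would keep every row.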

Probe annotations are columns from input CSV files that do not depend on the sample being processed. Common annotations include gene name, textual description and nucleotide sequence. Sample annotations are columns from input CSV files that may depend on the sample and the channel. Sample annotations are defined using three parameters: sampleAnnotation gives output columns, sampleAnnotationChannel1 gives input column names for channel 1 and sampleAnnotationChannel2 for channel 2 (if present).
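How the three sample annotation parameters line up can be shown with a short Python sketch. It is illustrative only: the function names are hypothetical, and ignoring empty list items (so that a trailing comma, as in the test case properties, is harmless) is an assumption about the component's behaviour rather than a documented rule.

```python
def split_list(spec):
    """Split a comma-separated parameter value, ignoring empty items (assumption)."""
    return [c.strip() for c in spec.split(",") if c.strip()]

def annotation_mapping(sample_annotation, channel1="", channel2=""):
    """Map each output column to its (channel 1, channel 2) input columns.

    An empty channel list means that the channel is not processed; its input
    column is then reported as None.
    """
    out_cols = split_list(sample_annotation)
    per_channel = []
    for spec in (channel1, channel2):
        cols = split_list(spec)
        if cols and len(cols) != len(out_cols):
            raise ValueError("channel list must have the same length as sampleAnnotation")
        per_channel.append(cols or [None] * len(out_cols))
    return {out: (c1, c2) for out, c1, c2 in zip(out_cols, *per_channel)}

# Two-channel configuration using column names from the test cases.
two_channel = annotation_mapping(
    "Saturated,Row,Col,", "gIsSaturated,Row,Col,", "rIsSaturated,Row,Col,")

# One-channel configuration: sampleAnnotationChannel2 is left empty.
one_channel = annotation_mapping("Saturated", "gIsSaturated")
```

In the two-channel case the output column Saturated is filled from gIsSaturated for channel 1 and from rIsSaturated for channel 2, while Row and Col read the same input column for both channels.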

Version 1.0.1
Bundle microarray
Categories Data Import Agilent
Authors Kristian Ovaska (kristian.ovaska@helsinki.fi)
Issue tracker View/Report issues
Requires commons-math3-3.2.jar (jar) ; commons-primitives-1.0.jar (jar) ; jep-2.4.1.jar (jar)
Source files component.xml AgilentReader.java
Usage Example with default values

Inputs

Name Type Mandatory Description
agilent AgilentDirectory Mandatory Agilent source file directory.
sampleNames CSV Mandatory Sample definitions. The table contains the columns GreenSampleID (sample ID for the sample on green channel), GreenDescription (human-readable description for the sample), RedSampleID, RedDescription, Filename (key; relative to the Agilent source directory).

Outputs

Name Type Description
green LogMatrix Green channel, primary values. Source column is the first element of channelColumns.
green2 LogMatrix Green channel, secondary values. Source column is the second element of channelColumns. Depending on the column selected, these may be raw (unprocessed) values or background values.
red LogMatrix Red channel, primary values. Source column is the third element of channelColumns.
red2 LogMatrix Red channel, secondary values. Source column is the fourth element of channelColumns. Depending on the column selected, these may be raw (unprocessed) values or background values.
probeAnnotation AnnotationTable Probe annotations that are not sample-dependent. If combineProbes=false, the first two columns are InternalProbe and the original probe ID (column name given with idColumn); all probes having the same value in the second column represent duplicates of the same probe. If combineProbes=true, the first column is the unique probe ID column. The rest of the columns are specified using probeAnnotation.
sampleAnnotation CSV Sample-dependent annotations. The first three columns ("SampleID", probe-id, "Index") uniquely identify the row. The rest of the columns are specified by the parameter sampleAnnotation.
groups SampleGroupTable Sample group table that is generated based on sampleNames. All groups have the type sample.

Parameters

Name Type Default Description
channelColumns string "gProcessedSignal,gMedianSignal,rProcessedSignal,rMedianSignal" Column names for matrix extraction, in the order green, green2, red, red2. Empty values may be omitted, so "col1" is the same as "col1,,,". The default values, for Agilent two-channel arrays, extract preprocessed values into "green" and "red" and raw values into "green2" and "red2".
combineProbes boolean true If true, duplicate probes (having the same sequence) are combined into one using median. If false, duplicate probes are present in the output, with unique internal names.
filter string "ControlType!=0 || gIsSaturated==1 || rIsSaturated==1 || gIsWellAboveBG==0 || rIsWellAboveBG==0 || gIsFeatPopnOL==1 || rIsFeatPopnOL==1 || gIsBGPopnOL==1 || rIsBGPopnOL==1" Rows in source files matching this Boolean expression are excluded from the result. The expression can refer to any cell value of the current row using column names. Boolean and arithmetic operators and parentheses as defined in Java are available. For example, "ControlType!=0 || (gIsSaturated==1 && rIsSaturated==1)" removes probes that are either control probes or are saturated on both green and red channels; ControlType, gIsSaturated and rIsSaturated must be valid columns in input files.
idColumn string "ProbeName" Column name in input CSV files that gives the probe ID. Features having the same probe ID are assumed to be copies of the same probe.
probeAnnotation string "GeneName,Description,Row,Col" Comma-separated list of column names in input CSV files that contain probe annotation. These are extracted to the probeAnnotation output.
sampleAnnotation string "" Comma-separated list of sample annotation columns for output. These column names appear in the sampleAnnotation output. These columns are not queried in input CSV files; rather, sampleAnnotationChannel1 and sampleAnnotationChannel2 define the column names in input files and this parameter gives the corresponding output column names.
sampleAnnotationChannel1 string "" Comma-separated list of sample annotation columns for channel 1 in the input files. The value for channel 1 is extracted from these columns. The list must have equal length to the sampleAnnotation list.
sampleAnnotationChannel2 string "" Comma-separated list of sample annotation columns for channel 2 in the input files. The value for channel 2 is extracted from these columns. The list must have equal length to the sampleAnnotation list. If empty, it is assumed that the array has one channel and annotations for channel 2 are not processed.
startPattern string "\"?FEATURES\"?\t" Regular expression that identifies the start of content in input CSV files. This makes it possible to skip content at the beginning of files. The pattern is matched against the start of each line. The matching line must be a header that contains column names.
useColumnNameMatch boolean false If true, the start of the expression data is located not by matching startPattern but by selecting the line that contains all the channel column names (from channelColumns).
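The two ways of locating the header line (startPattern and useColumnNameMatch) can be sketched in Python. This is a simplified illustration, not the component's code: the function name find_header is hypothetical, the file content is made up, and the sketch assumes tab-delimited lines, as suggested by the trailing \t of the default pattern.

```python
import re

def find_header(lines, start_pattern=r'"?FEATURES"?\t', channel_columns=None):
    """Return the index of the header line, or None if no line qualifies.

    If channel_columns is given, the sketch mimics useColumnNameMatch=true and
    looks for a line containing all those column names. Otherwise the regular
    expression is matched against the start of each line.
    """
    if channel_columns:
        wanted = set(channel_columns)
        for i, line in enumerate(lines):
            names = {c.strip('"') for c in line.rstrip("\n").split("\t")}
            if wanted <= names:
                return i
        return None
    regex = re.compile(start_pattern)
    for i, line in enumerate(lines):
        if regex.match(line):  # re.match anchors at the start of the line
            return i
    return None

# Made-up file content: some leading lines, then the header and one data row.
lines = [
    "FEPARAMS\tProtocol_Name\n",
    "DATA\texample\n",
    "*\n",
    "FEATURES\tProbeName\tgProcessedSignal\trProcessedSignal\n",
    "DATA\tprobeA\t10.0\t12.0\n",
]

by_pattern = find_header(lines)
by_columns = find_header(lines, channel_columns=["gProcessedSignal", "rProcessedSignal"])
```

Both strategies select the same line in this made-up file. Matching by column names may be useful when the header line does not begin with the text expected by startPattern.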

Test cases

Test case Parameters IN agilent IN sampleNames OUT green OUT green2 OUT red OUT red2 OUT probeAnnotation OUT sampleAnnotation OUT groups
case1 properties agilent sampleNames green green2 red red2 probeAnnotation sampleAnnotation groups

probeAnnotation=GeneName,Description,Row,Col,ControlType,
sampleAnnotation=Saturated,
sampleAnnotationChannel1=gIsSaturated,
sampleAnnotationChannel2=rIsSaturated,
filter=

case2 properties agilent sampleNames green green2 red red2 probeAnnotation sampleAnnotation groups

probeAnnotation=GeneName,Description,Row,Col,ControlType,
sampleAnnotation=Saturated,Row,Col,
sampleAnnotationChannel1=gIsSaturated,Row,Col,
sampleAnnotationChannel2=rIsSaturated,Row,Col,
filter=

case3_filter properties agilent sampleNames green green2 red red2 probeAnnotation sampleAnnotation groups

probeAnnotation=GeneName,Description,Row,Col,ControlType,
sampleAnnotation=Saturated,Row,Col,
sampleAnnotationChannel1=gIsSaturated,Row,Col,
sampleAnnotationChannel2=rIsSaturated,Row,Col,
filter=gIsSaturated==1 && rIsSaturated==1

case4_onematrix properties agilent sampleNames green green2 red red2 probeAnnotation sampleAnnotation groups

channelColumns=,,rProcessedSignal,,
probeAnnotation=GeneName,Description,Row,Col,ControlType,
sampleAnnotation=Saturated,Row,Col,
sampleAnnotationChannel2=rIsSaturated,Row,Col,
filter=

case5_nocombine properties agilent sampleNames green green2 red red2 probeAnnotation sampleAnnotation groups

probeAnnotation=GeneName,Description,Row,Col,ControlType,
sampleAnnotation=Saturated,Row,Col,
sampleAnnotationChannel1=gIsSaturated,Row,Col,
sampleAnnotationChannel2=rIsSaturated,Row,Col,
filter=gIsSaturated==1 && rIsSaturated==1,
combineProbes=false


Generated 2018-12-17 07:42:18 by Anduril 2.0.0