
HTMLReport

Visualizes CSV files and their relationships using statically generated HTML files. The primary input is a set of tables that may represent, e.g., genes, proteins, SNPs or samples. Each table has a number of attributes (CSV columns) and a number of records (CSV rows). Tables may refer to each other, e.g., an attribute for a protein may name the gene that encodes the protein. References between tables are specified with the "mapping" input. Effectively, the set of tables and the mappings between them (foreign keys) constitute a relational database. Records are identified by a single column (primary key) that can be named in the "mapping" input.
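The relational model described above can be illustrated with a minimal Python sketch. It is not the component's Java implementation; the table contents, the column names GeneID, ProteinID and Gene, and the helper related_records are hypothetical. Only the "Table", "KeyColumn" and "ForeignKeys" column names come from the "mapping" input documented below. The sketch assumes comma-separated data for brevity.

```python
import csv
import io

# Hypothetical example data: table1 holds genes, table2 holds proteins.
TABLES = {
    "table1": "GeneID,Symbol\nG1,GeneA\nG2,GeneB\n",
    "table2": "ProteinID,Gene\nP1,G1\nP2,G1\nP3,G2\n",
}

# One row per table, using column names documented for the "mapping" input.
MAPPING = [
    {"Table": "table1", "KeyColumn": "GeneID", "ForeignKeys": ""},
    {"Table": "table2", "KeyColumn": "ProteinID", "ForeignKeys": "Gene=table1"},
]


def read_table(text):
    """Parse CSV text into a list of records (dicts keyed by column name)."""
    return list(csv.DictReader(io.StringIO(text)))


def related_records(tables, mapping, target_table, key):
    """Return (table ID, record) pairs whose foreign key refers to `key` in `target_table`."""
    hits = []
    for row in mapping:
        # ForeignKeys has the form "MyColumn1=tableM,MyColumn2=tableN".
        for spec in filter(None, row["ForeignKeys"].split(",")):
            column, foreign_table = spec.split("=")
            if foreign_table != target_table:
                continue
            for record in tables[row["Table"]]:
                if record[column] == key:
                    hits.append((row["Table"], record))
    return hits


tables = {name: read_table(text) for name, text in TABLES.items()}
proteins_of_g1 = related_records(tables, MAPPING, "table1", "G1")
print([record["ProteinID"] for _, record in proteins_of_g1])  # ['P1', 'P2']
```

A detail page for gene G1 would list the records found this way, either as links or as inline sub-records.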

For each table, a summary HTML page is generated that shows all records of the table. For each record, a detail page is generated that shows all attributes of the record as well as mappings to other tables. For example, the detail page of a gene may show all proteins encoded by the gene. Mapping to other tables can either be in the form of simple links or as inline records that show a sub-record for each related record. The HTML pages have links to each other when applicable; also, external links can be configured using the "refs" input. Numeric fields can be colorized using the ColorRange column of the "mapping" input and color* parameters. Column names with format PREFIX:NAME are grouped together in the summary page; all neighboring columns with the same prefix belong to the same group. The prefix is displayed as a column group label.
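The PREFIX:NAME grouping rule can be sketched as follows. This is an illustrative Python reading of the rule stated above, not the component's code; the function name column_groups and the sample column names are hypothetical. It assumes that a column without a colon forms no group, and that two runs of the same prefix separated by another column are treated as separate groups, since only neighboring columns are grouped.

```python
from itertools import groupby


def column_groups(columns):
    """Group neighboring columns that share the PREFIX part of PREFIX:NAME.

    Returns a list of (prefix, [names]) pairs in column order.
    Columns without a prefix are returned one by one with prefix None.
    """
    def prefix_of(column):
        prefix, separator, _ = column.partition(":")
        return prefix if separator else None

    groups = []
    # groupby only merges consecutive items, which matches "neighboring columns".
    for prefix, members in groupby(columns, key=prefix_of):
        members = list(members)
        if prefix is None:
            groups.extend((None, [name]) for name in members)
        else:
            groups.append((prefix, [m.split(":", 1)[1] for m in members]))
    return groups


columns = ["GeneID", "Expr:Tumor", "Expr:Normal", "Symbol", "Expr:Ratio"]
for prefix, names in column_groups(columns):
    print(prefix, names)
```

In this example, Expr:Tumor and Expr:Normal share one group label, while Expr:Ratio starts a new group because Symbol separates it from the others.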

Images (e.g., survival effect of a gene, the 3D structure of a protein) can be attached to records using the "images" and "imageMapping" inputs.

Numeric matrices (e.g., expression values) can be attached to records using the "matrixN" and "matrixMapping" inputs. Column and/or row names are matched to key values of a table and the corresponding row/column values are printed on the details page of the record. For example, if the column names of matrix1 refer to key values of table1, the column matching a given key is printed on the details page of the record with that key. If the matrix dimensions are over a specified threshold (matrixThreshold), only distribution statistics are printed.
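The matrixThreshold behavior can be sketched in Python as below. This is a rough illustration under stated assumptions: the function name matrix_values_summary is hypothetical, missing values are not handled, and the choice of statistics (count, minimum, median, mean, maximum) is an assumption, since the page does not say which distribution statistics the component prints.

```python
import statistics


def matrix_values_summary(values, threshold=10):
    """Decide what to show for the matrix values bound to one record.

    Up to `threshold` values are shown as they are; beyond that, only
    summary statistics are returned (the chosen statistics are an assumption).
    """
    values = list(values)
    if len(values) <= threshold:
        return {"values": values}
    return {
        "n": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }


print(matrix_values_summary([1.0, 2.0, 3.0], threshold=4))
print(matrix_values_summary([1.0, 2.0, 3.0, 4.0, 5.0], threshold=4))
```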

Version 1.0
Bundle tools
Categories HTML
Authors Kristian Ovaska (kristian.ovaska@helsinki.fi)
Requires commons-primitives-1.0.jar (jar)
Source files component.xml HTMLReport.java Matrix.java Table.java

Inputs

Name Type Mandatory Description
table1 CSV Mandatory Input table 1
table2 CSV Optional Input table 2
table3 CSV Optional Input table 3
table4 CSV Optional Input table 4
table5 CSV Optional Input table 5
mapping CSV Optional Specifies various properties of tables (tableN), such as key columns, foreign key columns and colorized columns. The following columns may be present; any of them may also be missing.
"Table" gives the table ID in question (tableN); if missing, table1 is used.
"KeyColumn" gives the column name in the table that contains key values; if missing, the first column is used.
"ViewColumn" gives the column name that contains displayable values, such as human-readable gene names; if missing, the key column is used.
"ForeignKeys" specifies connections to other tables. It has the format "MyColumn1=tableM,MyColumn2=tableN", where MyColumn* are column names in the current table and table* are foreign table IDs. For example, values in MyColumn1 match the key column values of tableM.
"SummaryColumns" is a comma-separated list of column names that are included in the summary page; the special value * (default) includes all columns.
"IgnoreColumns" is a comma-separated list of column names that are ignored completely (default: none).
"SortColumns" is a comma-separated list of column names that can be used to sort the summary page (default: none).
"InlineTables" is a comma-separated list of tables (tableN) whose associated records should be printed inline on the same page as the columns of the primary table.
"InlineColumns" is a list of column names that are printed when the record is an inline part of another record page.
"AliasColumns" is a list of column names that contain aliases for the record. Aliases can be used in the search box on the index page. KeyColumn and ViewColumn are automatically included.
"ColorRange" specifies numeric columns that are colorized. It has the format "MyColumn1=5 12,MyColumn2=-2 10.5", meaning that the range of colors is 5 to 12 for MyColumn1 and -2 to 10.5 for MyColumn2. If the keyword "log" is included in the range specification (e.g., "5 12 log"), base-2 logarithm is taken before color computation; however, the printed value is the original.
matrix1 Matrix Optional Numeric matrix 1
matrix2 Matrix Optional Numeric matrix 2
matrix3 Matrix Optional Numeric matrix 3
matrixMapping CSV Optional Specifies mappings between matrix row/column names and tables. The following columns are defined.
"Matrix" names the matrix input; if omitted, it defaults to "matrix1".
"RowTable" names a table (tableN) whose key values match the row names of the matrix.
"ColumnTable" names a table for the column names. If a matrix is not bound to any table using either row or column names, the matrix is not used in the report.
images BinaryFolder Optional Directory containing images and other downloadable files (e.g., PDFs) that are shown on the details page of a record.
imageMapping CSV Optional Specifies how images in the "images" input are mapped to records. The following columns may be present.
"Table" gives the table ID in question (default: table1).
"ImageFile" gives the file name in the "images" folder.
"Target" is an ID that matches the primary key of the table; together with "ImageFile", this binds an image to a specific record.
"Label" gives a human readable label for the image; if missing, an empty label is used.
refs CSV Optional Formatting rules for external URL references. Contains three columns: "Table" (table ID), "Column" (name of the column in the given table) and "URL" (URL pattern containing an $ID$ tag that is replaced with cell contents). Cells of the specified column in the specified table are rendered as HTML links to the external resource.
labels CSV Optional Contains human-readable labels for table columns, table query boxes and other places. The following columns must be present. "Table" gives the table ID (tableN), "Column" gives the column name and "Label" gives the label for the column. When the column is the special value "_QUERY", the entry gives the query box label on the main page. When the table is "_GLOBAL" and the column is "_DESCRIPTION", the entry gives a global description of the site that is shown on the main page. Labels for color slides can be configured using columns "LowLabel", "MiddleLabel" and "HighLabel".
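The string formats used by the "ForeignKeys" and "ColorRange" columns of "mapping" and by the "URL" column of "refs" can be read with a few lines of Python. The sketch below only illustrates the documented syntax; the function names are hypothetical, the URL is a placeholder, and details such as whitespace handling and URL escaping are assumptions rather than documented behavior.

```python
def parse_foreign_keys(spec):
    """Parse "MyColumn1=tableM,MyColumn2=tableN" into {column: table ID}."""
    result = {}
    for item in filter(None, (part.strip() for part in spec.split(","))):
        column, table = item.split("=", 1)
        result[column.strip()] = table.strip()
    return result


def parse_color_range(spec):
    """Parse "MyColumn1=5 12,MyColumn2=-2 10.5 log" into {column: (low, high, use_log)}."""
    result = {}
    for item in filter(None, (part.strip() for part in spec.split(","))):
        column, range_text = item.split("=", 1)
        tokens = range_text.split()
        use_log = "log" in tokens  # optional keyword from the ColorRange description
        low, high = (float(token) for token in tokens if token != "log")
        result[column.strip()] = (low, high, use_log)
    return result


def expand_ref(url_pattern, cell):
    """Replace the $ID$ tag of a "refs" URL pattern with the cell contents."""
    # Assumption: the cell value is inserted as is, without URL escaping.
    return url_pattern.replace("$ID$", cell)


print(parse_foreign_keys("MyColumn1=table2,MyColumn2=table3"))
print(parse_color_range("MyColumn1=5 12,MyColumn2=-2 10.5"))
print(expand_ref("https://example.org/gene/$ID$", "G1"))
```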

Outputs

Name Type Description
report HTML Set of HTML pages containing table summaries, detail pages of records and associated files such as images.

Parameters

Name Type Default Description
colorEnd string "#ff0000" For color slides, this is the ending color (high limit).
colorMiddle string "#ffffff" For color slides, this is the middle color.
colorStart string "#00ff00" For color slides, this is the starting color (low limit).
digits int 2 Number of significant digits for printed numeric values.
includeSummaries string "*" Comma-separated list of table IDs whose summaries are printed. The special value * prints summaries of all tables.
matrixLabels string "" Comma-separated list of human-readable labels for matrices. If empty values are present, default labels (matrixN) are used. Even if custom labels are given, matrices are always referred to using matrixN in configuration files.
matrixThreshold int 10 The maximum number of matrix values that are printed on the details page of a record. If the matrix dimension is larger than this, statistics on the values are printed.
missingValue string "NA" Gives the string that is used in HTML pages for missing values (NA).
omitMissing boolean true If true, omit missing (NA) attributes in the record details page; the attribute is still printed in the summary page. If false, always print missing values.
recordsPerPage string "100" For the summary page, this gives the maximum number of records (rows) per page. If more records are present, they are split into multiple pages.
tableLabels string "" Comma-separated list of human-readable labels for tables. For example, "Gene,Protein,SNP" would indicate that table1 represents genes, table2 proteins etc. If empty values are present, default labels (tableN) are used. Even if custom labels are given, tables are always referred to using tableN in "mapping" and other configuration files.
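How a numeric cell might be turned into a color from colorStart, colorMiddle and colorEnd can be sketched as follows. The interpolation scheme is an assumption: the page does not specify it, so the sketch uses linear blending in RGB space with the middle color at the midpoint of the range and clamps values outside the range. It also assumes that, with the "log" keyword, the range limits are compared against the log-transformed value. The function names are hypothetical.

```python
import math


def hex_to_rgb(color):
    """Convert "#rrggbb" to an (r, g, b) tuple of integers."""
    return tuple(int(color[i:i + 2], 16) for i in (1, 3, 5))


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of numbers to "#rrggbb"."""
    return "#" + "".join(f"{round(channel):02x}" for channel in rgb)


def blend(a, b, t):
    """Linear blend between colors a and b, with t in [0, 1]."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def cell_color(value, low, high, use_log=False,
               start="#00ff00", middle="#ffffff", end="#ff0000"):
    """Map a value in the range [low, high] to a color (assumed scheme)."""
    if use_log:
        # Base-2 logarithm is taken before color computation.
        value = math.log2(value)
    # Position within the range, clamped to [0, 1] (clamping is an assumption).
    t = min(1.0, max(0.0, (value - low) / (high - low)))
    s, m, e = (hex_to_rgb(c) for c in (start, middle, end))
    if t <= 0.5:
        rgb = blend(s, m, t * 2)
    else:
        rgb = blend(m, e, (t - 0.5) * 2)
    return rgb_to_hex(rgb)


print(cell_color(5, 5, 12))    # #00ff00
print(cell_color(8.5, 5, 12))  # #ffffff
print(cell_color(12, 5, 12))   # #ff0000
```

The defaults in the sketch mirror the default parameter values listed above; the printed cell value itself stays unchanged even when "log" is used.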

Test cases

Test case Parameters IN table1 IN table2 IN table3 IN table4 IN table5 IN mapping IN matrix1 IN matrix2 IN matrix3 IN matrixMapping IN images IN imageMapping IN refs IN labels OUT report
case1 properties table1 table2 table3 table4 (missing) mapping matrix1 matrix2 (missing) matrixMapping images imageMapping refs labels report

tableLabels=Gene,Transcript,,Pathway,
missingValue=-,
includeSummaries=table1,table3,table4,
recordsPerPage=10,
matrixLabels=GeneExpression,TranscriptExpression,
matrixThreshold=4,
metadata.timeout=0

case2_onetable (missing) table1 (missing) (missing) (missing) (missing) (missing) (missing) (missing) (missing) (missing) (missing) (missing) (missing) (missing) (missing)
case3 properties table1 (missing) (missing) (missing) (missing) mapping (missing) (missing) (missing) (missing) (missing) (missing) (missing) (missing) (missing)

metadata.timeout=0


Generated 2018-12-12 07:42:06 by Anduril 2.0.0