
AlignStats

Evaluates the quality of sequencing data, RNA-seq data in particular, using RSeQC.

For overall sequence quality statistics, it outputs:
- A CSV table of statistics from samtools flagstat and RSeQC (read_distribution.py).
- A LaTeX report with more detailed information from the selected modules. bam_stat.py (calculates read mapping statistics) is included by default.

The modules parameter selects which modules to run. Specify each module by its number in the following list:

1. read_quality.py - Quality based on Phred score, output as a boxplot or heatmap
2. read_duplication.py - Reads with exactly the same sequence content or mapped to the same genomic location
3. read_GC.py - GC content of reads
4. geneBody_coverage2.py - Read coverage over gene body
5. inner_distance.py - Calculate the inner distance (or insert size) between two paired RNA reads
6. junction_annotation.py - Annotated and novel junctions
7. junction_saturation.py - Check whether the current sequencing depth is deep enough to perform alternative splicing analyses
8. infer_experiment.py - Check strand specificity
9. RNA_fragment_size.py - Calculate fragment size for each gene/transcript
10. tin.py - Evaluate RNA integrity at transcript level
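
As a rough illustration of how a modules value maps onto the list above, here is a minimal Python sketch. It is not the component's own code: the `RSEQC_MODULES` table and the `parse_modules` helper are hypothetical names, and the comma-separated format is inferred from the parameter's default value.

```python
# Hypothetical lookup table mirroring the numbered module list above.
RSEQC_MODULES = {
    1: "read_quality.py",
    2: "read_duplication.py",
    3: "read_GC.py",
    4: "geneBody_coverage2.py",
    5: "inner_distance.py",
    6: "junction_annotation.py",
    7: "junction_saturation.py",
    8: "infer_experiment.py",
    9: "RNA_fragment_size.py",
    10: "tin.py",
}


def parse_modules(spec):
    """Turn a string such as "1,4,10" into an ordered list of script names."""
    selected = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing commas and stray whitespace
        number = int(token)  # raises ValueError on non-numeric input
        if number not in RSEQC_MODULES:
            raise ValueError(f"unknown module number: {number}")
        name = RSEQC_MODULES[number]
        if name not in selected:  # ignore repeated numbers
            selected.append(name)
    return selected


if __name__ == "__main__":
    print(parse_modules("1,4,10"))
```

Under these assumptions, a value of "4,10" would restrict the detailed report to gene body coverage and transcript integrity, in addition to the bam_stat.py statistics that are included by default.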

Version 1.2
Bundle sequencing
Categories
Authors Alejandra Cervera (alejandra.cervera@helsinki.fi), Ping Chen (ping.chen@helsinki.fi)
Issue tracker View/Report issues
Source files component.xml function.scala
Usage Example with default values

Inputs

Name Type Mandatory Description
alignment BAM Mandatory The aligned RNA-seq reads in SAM or BAM format.
reference FASTA Mandatory Reference genome in FASTA format. The folder containing the reference file should also contain the matching *.fai and *.dict files.
annotation GTF Mandatory GTF file defining transcripts. Make sure the contig names ("1", "2", etc. or "chr1", "chr2", etc.) are the same as those in the BAM file.
refgene BED Optional Reference gene model in BED format. To speed up processing, a model containing only housekeeping genes can be provided. If this input is not given, the supplied annotation file is converted to BED format.
chromsize TextFile Optional Chromosome size file. Tab- or space-separated text file with 2 columns: the first column is the chromosome name, the second column is the size of the chromosome.
log BinaryFolder Optional The stats from the TopHat or STAR aligner to be included in the report.
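
The two-column chromosome size format and the contig naming requirement described above can be checked before running the component. The following Python sketch is only an illustration under stated assumptions: `parse_chrom_sizes` and `naming_style` are hypothetical helpers, not part of the component or of RSeQC, and the naming check looks only at the "chr" prefix.

```python
def parse_chrom_sizes(text):
    """Parse a chromosome size file given as text into {name: size}."""
    sizes = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # splits on tabs and spaces alike
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 columns, found {len(fields)}"
            )
        name, size = fields
        sizes[name] = int(size)
    return sizes


def naming_style(names):
    """Classify contig names as 'chr' (all prefixed), 'plain' (none) or 'mixed'."""
    flags = [name.startswith("chr") for name in names]
    if flags and all(flags):
        return "chr"
    if not any(flags):
        return "plain"
    return "mixed"


if __name__ == "__main__":
    sizes = parse_chrom_sizes("chr1\t248956422\nchr2 242193529\n")
    print(sizes, naming_style(sizes))
```

Comparing `naming_style` for the names in the chromosome size file with the names used in the GTF and BAM files gives a quick hint of a "1" versus "chr1" mismatch. It does not catch finer differences such as alternative names for the mitochondrial contig.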

Outputs

Name Type Description
report HTMLFile Figures
stats CSV Sequence quality statistics.

Parameters

Name Type Default Description
memory int 35000 Memory passed to geneBodyCoverage module
modules string "1,2,3,4,5,6,7,8,9,10" Comma-separated numbers of the modules to include in the report, as numbered in the module list above.
sample string "SampleID" Identifier for the sample; useful when joining statistics tables from different samples
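
To show why a per-sample identifier helps when joining statistics tables, here is a small Python sketch that stacks several CSV tables into one. The column names (`sample`, `total_reads`, `mapped`) are invented for the example and are an assumption, since the actual layout of the stats output is not specified here. `join_stats` and `to_csv` are hypothetical helpers.

```python
import csv
import io


def join_stats(tables):
    """Stack several CSV tables (given as text) into one header and row list."""
    fieldnames = []
    rows = []
    for text in tables:
        reader = csv.DictReader(io.StringIO(text))
        # Keep the union of all column names in first-seen order.
        for name in reader.fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)
        rows.extend(reader)
    return fieldnames, rows


def to_csv(fieldnames, rows):
    """Write the joined rows back to CSV text, filling missing cells with NA."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, restval="NA", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Invented example tables; real stats files may look different.
    first = "sample,total_reads\nS1,100\n"
    second = "sample,total_reads,mapped\nS2,200,180\n"
    print(to_csv(*join_stats([first, second])))
```

With a distinct sample value per run, each row in the joined table stays attributable to its source sample.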

Test cases

Test case Parameters IN:alignment IN:reference IN:annotation IN:refgene IN:chromsize IN:log OUT:report OUT:stats
case1 (missing) alignment reference annotation (missing) chromsize (missing) (missing) (missing)

Generated 2018-12-11 07:42:07 by Anduril 2.0.0