Function


Runs the STAR aligner in two-pass mode for an array of samples together. In two-pass mode, the samples are first aligned to the provided reference genome and STAR creates a list of the splice junctions identified in each sample. In STAR's built-in two-pass mode, those splice junctions are used to improve the reference genome and each sample is realigned to this enhanced genome. This function instead pools the splice junctions of all samples to make one single enhanced genome, to which all samples are then aligned. If you do not want to pool the splice junctions of different samples, you can run two-pass mode by using the STAR component or the Align function and adding the two-pass mode option (see the STAR manual) as a parameter.

You can control which splice junctions are included with the lowBound and highBound parameters, which work exactly as defined in the CSVFilter component and are applied to each individual splice junctions file. For example, lowBound="UniqueMapping=5" discards all splice junctions that do not have at least 5 uniquely mapped reads overlapping the junction. The columns available in the splice junctions file are "Chromosome,Start,End,Strand,IntronMotif,Annotated,UniqueMapping,MultiMapping,MaxOverhang". Only splice junctions from canonical chromosomes (1-22,X,Y) are kept.

It is recommended to supply your own initial genome because, if you need to rerun, any parameters related to genome generation are applied to both the first and the second pass genome.
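The pooling and filtering described above can be sketched roughly as follows. This is an illustration only, not the component's actual implementation (which lives in function.scala). The function names, the comma-separated "Column=value" syntax for several bounds, and the de-duplication of junctions shared between samples are assumptions made for this sketch. Input is assumed to be headerless, tab-separated STAR splice junction files.

```python
import csv
import io

COLUMNS = ["Chromosome", "Start", "End", "Strand", "IntronMotif",
           "Annotated", "UniqueMapping", "MultiMapping", "MaxOverhang"]
# Canonical chromosomes kept by the function: 1-22, X and Y.
CANONICAL = {str(i) for i in range(1, 23)} | {"X", "Y"}


def parse_bounds(spec):
    """Parse e.g. "UniqueMapping=5" into {"UniqueMapping": 5.0} (assumed syntax)."""
    bounds = {}
    for item in filter(None, (s.strip() for s in spec.split(","))):
        column, value = item.split("=", 1)
        bounds[column.strip()] = float(value)
    return bounds


def pool_junctions(sample_files, low_bound="", high_bound=""):
    """Filter each sample's junctions, then pool them into one sorted list."""
    low, high = parse_bounds(low_bound), parse_bounds(high_bound)
    pooled = set()  # assumption: a junction seen in several samples is kept once
    for handle in sample_files:
        for row in csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t"):
            if row["Chromosome"] not in CANONICAL:
                continue
            if any(float(row[col]) < limit for col, limit in low.items()):
                continue  # below a lower bound
            if any(float(row[col]) > limit for col, limit in high.items()):
                continue  # above an upper bound
            pooled.add((row["Chromosome"], int(row["Start"]),
                        int(row["End"]), int(row["Strand"])))
    return sorted(pooled)


# Two tiny made-up samples, purely for demonstration.
sample1 = io.StringIO("1\t100\t200\t1\t1\t0\t7\t0\t30\n"
                      "MT\t5\t50\t1\t1\t0\t90\t0\t30\n"
                      "2\t300\t400\t2\t2\t1\t3\t1\t20\n")
sample2 = io.StringIO("1\t100\t200\t1\t1\t0\t6\t0\t25\n"
                      "X\t10\t90\t2\t1\t1\t5\t0\t40\n")
junctions = pool_junctions([sample1, sample2], low_bound="UniqueMapping=5")
print(junctions)
```

In this sketch the bounds are applied per sample file before pooling, mirroring the statement that lowBound and highBound act on each individual splice junctions file.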

Version 1.0
Bundle sequencing
Categories Alignment
Authors Alejandra Cervera (alejandra.cervera@helsinki.fi)
Issue tracker View/Report issues
Requires STAR
Source files component.xml function.scala
Usage Example with default values


Inputs

Name Type Mandatory Description
reference BinaryFile Mandatory Reference genome.
reads Array<BinaryFile> Mandatory FASTA or FASTQ files containing the reads to align.
mates Array<BinaryFile> Optional FASTA or FASTQ files containing the mates. Required for paired-end data.
genome BinaryFolder Optional A STAR genome for the first pass.
annotation BinaryFile Optional Genome annotation. A GTF file will work by default. For a GFF3 file, add to genomeParameters: "--sjdbGTFtagExonParentTranscript Parent"
parameterFile TextFile Optional This file overrides the default STAR parameters, but is itself overridden by the command line. Use parametersDefault from the STAR source as a template. It is needed unless you have specified every single parameter (even default ones) as a parameter string.
custom TextFile Optional Custom, i.e. sample-specific, parameters such as read groups. Provide them exactly as they should be added to the STAR call (second pass only). The file must have two columns: Key and Custom. The keys should match the input keys.


Outputs

Name Type Description
folder Array<BinaryFolder> All files created by STAR in the output folder.
alignments Array<BAM> (Sorted) alignments. A coordinate-sorted file will be indexed, i.e. accompanied by a .bai file.
spliceJunctions CSV Splice junctions. This CSV file is created by adding a header to the STAR output ("Chromosome\tStart\tEnd\tStrand\tIntronMotif\tAnnotated\tUniqueMapping\tMultiMapping\tMaxOverhang"). The columns are:
  1. Column 1: chromosome
  2. Column 2: first base of the intron (1-based)
  3. Column 3: last base of the intron (1-based)
  4. Column 4: strand
  5. Column 5: intron motif: 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT
  6. Column 6: 0: unannotated, 1: annotated (only if splice junctions database is used)
  7. Column 7: number of uniquely mapping reads crossing the junction
  8. Column 8: number of multi-mapping reads crossing the junction
  9. Column 9: maximum spliced alignment overhang
Of these, the following awk expression is relevant for 2-pass STAR: "{if($5>0){print $1,$2,$3,strChar[$4]}}". In other words, a junction is passed on when its chromosome and intron boundaries are known and its intron motif is classified (column 5 is non-zero). The rest of the columns are useful for filtering junctions of interest in 2-pass STAR.
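For readers less familiar with awk, the expression above can be approximated in Python as below. This is a sketch under assumptions: the function name is hypothetical, and the strand lookup (0 to ".", 1 to "+", 2 to "-") is taken to match the strChar array defined in the STAR manual's 2-pass recipe, which the quoted awk fragment does not show. The header line added to the spliceJunctions output is skipped because its fifth field is not numeric.

```python
# Assumed equivalent of awk's strChar array: strand code -> strand symbol.
STRAND_CHAR = {"0": ".", "1": "+", "2": "-"}


def to_sjdb_lines(sj_lines):
    """Keep junctions with a classified intron motif (column 5 > 0).

    Returns tab-separated lines: chromosome, intron start, intron end, strand.
    """
    out = []
    for line in sj_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or not fields[4].isdigit():
            continue  # header or malformed line
        if int(fields[4]) > 0:
            out.append("\t".join([fields[0], fields[1], fields[2],
                                  STRAND_CHAR[fields[3]]]))
    return out


# Made-up rows in the spliceJunctions column order, including the header.
example = [
    "Chromosome\tStart\tEnd\tStrand\tIntronMotif\tAnnotated\t"
    "UniqueMapping\tMultiMapping\tMaxOverhang",
    "1\t100\t200\t1\t1\t0\t7\t0\t30",
    "1\t500\t900\t0\t0\t0\t2\t0\t12",   # non-canonical motif, dropped
    "X\t10\t90\t2\t3\t1\t5\t0\t40",
]
sjdb = to_sjdb_lines(example)
print("\n".join(sjdb))
```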


Parameters

Name Type Default Description
alignParameters string "" Parameters passed on to STAR in the second alignment step.
execute string "changed" Change to "once" if you do not want the first pass and genome generation steps to be re-executed when you change a parameter (such as threads).
genomeLoad string "LoadAndRemove" The options are LoadAndKeep, LoadAndRemove, Remove, LoadAndExit and NoSharedMemory. LoadAndRemove works for parallel STAR instances and, if everything goes fine, should free the memory after the last STAR instance exits.
genomeParameters string "" Parameters passed on to STAR in either of the two possible genome generation steps.
highBound string "" Same as lowBound but for maximum values (instead of minimum).
lowBound string "" Subsets the splice junctions files used for generating the second pass genome. Define a column name and threshold, e.g. UniqueMapping=5.
mainAlignmentType string "" Depending on the parameters, more than one alignment may be produced (e.g. sortedByCoord or toTranscriptome). The string defined here selects which alignment is linked to the alignments output of this component. Alignments not selected remain available in the folder output.
memory int 10000 Memory passed to the STAR call.
readFilesCommand string "" Used when the input reads are compressed, e.g. zcat or acat.
threads int 1 Number of threads passed to STAR.
useEncodeParams boolean true Sets the parameters used for the ENCODE project as specified in the STAR manual (maximum and minimum intron size and maximum number of multiple alignments).
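As a rough illustration of what useEncodeParams and alignParameters amount to on the STAR command line, the sketch below assembles a parameter string. The helper name is hypothetical, and the exact set of options the component adds is an assumption based on the description above; the three flags and their values are those listed among the ENCODE options in the STAR manual.

```python
# ENCODE-style options from the STAR manual (assumed to be what
# useEncodeParams=true adds; the component may set more or fewer).
ENCODE_PARAMS = {
    "--alignIntronMin": 20,          # minimum intron size
    "--alignIntronMax": 1000000,     # maximum intron size
    "--outFilterMultimapNmax": 20,   # maximum number of multiple alignments
}


def build_align_parameters(extra="", use_encode_params=True):
    """Join the ENCODE options (if enabled) with user-supplied parameters."""
    parts = []
    if use_encode_params:
        parts += [f"{flag} {value}" for flag, value in ENCODE_PARAMS.items()]
    if extra:
        parts.append(extra)
    return " ".join(parts)


params = build_align_parameters("--outSAMtype BAM SortedByCoordinate")
print(params)
```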

Test cases

Test case Parameters IN
case1 properties reference reads mates (missing) (missing) (missing) (missing) (missing) (missing) (missing)


case2 properties reference reads mates (missing) (missing) (missing) custom (missing) (missing) (missing)


Generated 2018-12-17 07:42:41 by Anduril 2.0.0