Analysis Tools



  The Isaac aligner is an ultrafast DNA sequence aligner designed to align next-generation sequencing data (single or paired-end) with low error rates. It is four to five times faster than BWA + GATK on equivalent hardware, with comparable accuracy. The Isaac aligner was developed by Illumina, Inc.

  Please refer to the paper below for more information.

Raczy C, Petrovski R, Saunders CT, Chorny I, Kruglyak S, Margulies EH, Chuang HY, Kallberg M, Kumar SA, Liao A, Little KM, Stromberg MP, Tanner SW. Isaac: ultra-fast whole-genome secondary analysis on Illumina sequencing platforms. Bioinformatics 2013, 29(16), 2041-2043

  Strelka is an analysis method that accurately detects germline variation in small cohorts and somatic variation in tumor-normal pairs. The germline caller uses a tiered haplotype model to improve accuracy and provide read-backed phasing. It also estimates indel error and noise rates from the input sequencing data, which reduces the impact of indel noise, inconsistent alignment, and incorrect read mapping.

  More information can be found here:

https://support.illumina.com/content/dam/illumina-support/help/BaseSpace_App_WGS_v7_OLH_15050955_04/Content/Source/Informatics/Apps/Strelka_appWGS.htm

  SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of variants on genes (such as amino acid changes). Using this tool, we follow the annotation cascade shown below.

  SnpEff can generate the following results:

  (1) Gene annotation based on hg38 coordinates

  (2) dbSNP138 ID mapping

  (3) dbSNP151 ID mapping

  (4) 1000 Genomes phase 3 mapping

  (5) ESP6500 data mapping

  (6) ClinVar data mapping

  More information can be found here:

http://snpeff.sourceforge.net/SnpEff.html
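
  As an illustration of how the predicted effects can be read back from an annotated VCF, the sketch below splits the ANN entry of an INFO string into named fields. The field order follows the ANN layout documented for SnpEff 4.x; the function name parse_ann and the sample INFO string (gene GENE1, transcript TX1) are made up for this example and are not taken from the report.

```python
# Field names in the order used by the SnpEff ANN annotation entry.
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank",
    "HGVS.c", "HGVS.p", "cDNA.pos/length", "CDS.pos/length",
    "AA.pos/length", "Distance", "ERRORS/WARNINGS/INFO",
]


def parse_ann(info):
    """Return one dict per annotation found in a VCF INFO string."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            records = entry[len("ANN="):].split(",")
            return [dict(zip(ANN_FIELDS, rec.split("|"))) for rec in records]
    return []  # no ANN entry present


# Hypothetical INFO string, for illustration only.
example_info = (
    "DP=42;ANN=A|missense_variant|MODERATE|GENE1|GENE1|transcript|TX1|"
    "protein_coding|2/5|c.100G>A|p.Val34Ile|150/1200|100/900|34/299||"
)

annotations = parse_ann(example_info)
for ann in annotations:
    # prints: GENE1 missense_variant p.Val34Ile
    print(ann["Gene_Name"], ann["Annotation"], ann["HGVS.p"])
```

  A variant with several affected transcripts carries several comma-separated annotations, so the function returns a list rather than a single record.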

  Control-FREEC is a tool that automatically calculates copy number and allelic content profiles and uses them to predict regions of genomic alteration such as gains and losses. It can call genotype status even when no control experiment is available. It also corrects for GC-content and mappability biases and supports polyploid genomes.

  More information can be found here:

  Boeva, V.; Popova, T.; Bleakley, K.; Chiche, P.; Cappo, J.; Schleiermacher, G.; Janoueix-Lerosey, I.; Delattre, O.; Barillot, E. Control-freec: A tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics 2012, 28, 423-425.

  PennCNV is a tool that identifies overlapping or neighboring genes for copy number variation (CNV) annotation. The scan_region.pl program, part of the PennCNV package, efficiently matches CNV calls against the UCSC known gene annotation. The output file contains two additional columns giving the gene symbols and the distance between the CNV and the gene.

  More information can be found here:

http://penncnv.openbioinformatics.org/en/latest/user-guide/annotation/
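
  The two extra columns described above (gene symbol and distance) can be pictured with a small interval comparison. The sketch below is only an illustration of the overlap-or-nearest-gene idea and is not the scan_region.pl implementation; the function name annotate_cnv, the coordinates, the gene names, and the NOT_FOUND placeholder are assumptions made for this example.

```python
def annotate_cnv(cnv, genes):
    """Return (gene_symbols, distance) for one CNV.

    cnv   : (chrom, start, end)
    genes : list of (chrom, start, end, symbol)
    Overlapping genes are reported with distance 0. If nothing overlaps,
    the nearest gene on the same chromosome is reported together with the
    gap in base pairs.
    """
    chrom, start, end = cnv
    same_chrom = [g for g in genes if g[0] == chrom]
    overlapping = [g[3] for g in same_chrom if g[1] <= end and g[2] >= start]
    if overlapping:
        return ",".join(sorted(set(overlapping))), 0
    if not same_chrom:
        return "NOT_FOUND", None

    def gap(gene):
        # The gene lies entirely after the CNV or entirely before it.
        return gene[1] - end if gene[1] > end else start - gene[2]

    nearest = min(same_chrom, key=gap)
    return nearest[3], gap(nearest)


# Hypothetical gene intervals, for illustration only.
genes = [("chr1", 1000, 2000, "GENE_A"), ("chr1", 5000, 6000, "GENE_B")]

print(annotate_cnv(("chr1", 1500, 2500), genes))  # ('GENE_A', 0)
print(annotate_cnv(("chr1", 3000, 4500), genes))  # ('GENE_B', 500)
print(annotate_cnv(("chr2", 1, 10), genes))       # ('NOT_FOUND', None)
```

  A linear scan is enough for a handful of CNV calls; a sorted index or interval tree would be the usual choice for genome-wide gene tables.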

  Manta is a tool that calls structural variants (SVs) and indels from short paired-end sequencing reads. It combines paired-end and split-read evidence during SV discovery and scoring to improve performance.

  However, it does not require split reads or successful breakpoint assemblies to report a variant in cases where there is strong evidence of an imprecise variant. It provides genotypes and quality scores for variants in single diploid samples, and will also call somatic variants when a matched tumor sample is specified. Manta can detect all classes of structural variants which can be identified in the absence of copy number analysis and large-scale assembly.

  This tool was developed specifically to work with Isaac alignment and its performance was verified in the recent ICGC-TCGA DREAM Mutation Calling Challenge.

https://www.synapse.org/#!Synapse:syn312572

  More information can be found here:

https://github.com/StructuralVariants/manta

  Circos is a tool for visualizing data in a circular layout. Tracks can represent the number and types of variants or the relationships between chromosomes.

  More information can be found here:

  Krzywinski, M.; Schein, J.; Birol, I.; Connors, J.; Gascoyne, R.; Horsman, D.; Jones, S.; Marra, M. Circos: An information aesthetic for comparative genomics. Genome Research 2009, 19:1639-1645



◦ Tool Version

Software       Version
Isaac Aligner  04.18.11.09
Strelka        2.9.10
SnpEff         4.3t
Control-FREEC  11.5
PennCNV        1.0.5
Manta          1.5.0
Circos         0.69-6


◦ Tool Parameters

Software       Parameter                    Value                           Remark
Isaac Aligner  --base-quality-cutoff        15                              3' end quality soft-clipping cutoff
               --keep-duplicates            1                               Does not remove duplicated reads
               --default-adapters           AGATCGGAAGAGC*, *GCTCTTCCGATCT
SnpEff         Source                       hg38
                                            dbSNP138, dbSNP151
                                            1000 Genomes Phase 3
                                            ESP6500
                                            ClinVar
Control-FREEC  forceGCcontentNormalization  1                               Corrects the Read Count (RC) for GC-content bias
               ploidy                       2                               Genome ploidy
               sex                          XY                              Sample sex
               window                       10000                           Calculation window size
               mateOrientation              FR                              FR: Illumina paired-ends
PennCNV        Source                       refGene, refLink

• Software not listed in the table uses all default settings
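
  For readers who want to see how the Control-FREEC values in the table map onto a configuration file, the sketch below writes them into the INI-style layout that Control-FREEC reads. The placement of options in the [general] and [sample] sections follows the Control-FREEC manual as we understand it. The function name build_freec_config, the file names hg38.len and sample.bam, and the chrLenFile, mateFile, and inputFormat entries are placeholders that do not come from this report.

```python
import configparser
import io


def build_freec_config(chr_len_file, sample_bam):
    """Return Control-FREEC style config text using the table's values."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep camelCase option names exactly as written
    cfg["general"] = {
        "chrLenFile": chr_len_file,          # placeholder path (assumption)
        "ploidy": "2",                       # genome ploidy
        "sex": "XY",                         # sample sex
        "window": "10000",                   # calculation window size
        "forceGCcontentNormalization": "1",  # correct read count for GC bias
    }
    cfg["sample"] = {
        "mateFile": sample_bam,              # placeholder path (assumption)
        "inputFormat": "BAM",                # assumed input format
        "mateOrientation": "FR",             # Illumina paired-ends
    }
    buffer = io.StringIO()
    cfg.write(buffer)
    return buffer.getvalue()


config_text = build_freec_config("hg38.len", "sample.bam")
print(config_text)
```

  Options not set here would fall back to the tool defaults, in line with the note above about unlisted settings.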


◦ Database Version

Software Version