Sequencing: General Information

Investigators have a wealth of study design options available through next-generation sequencing technologies. We currently offer production-scale whole genome, whole exome, and custom targeted sequencing services that use LIMS tracking, robotic automation, strict QC standards, and automated primary, secondary, and tertiary analyses.

Depending on access pathway, we include at no additional cost:

Sample pretesting with the Illumina QC Array, which allows us to:
- Determine SNP genotypes for sample tracking and sequencing variant call QC purposes
- Identify file and/or aliquoting errors (primarily sex and/or Mendel discrepancies)
- Identify unexpected relationships among subjects and confirm expected relationships
- Bacterial contamination testing (whole genome)
- FFPE samples undergo a quality control step prior to processing
- Identify samples that perform poorly
- Identify unexpected duplicate samples and confirm expected duplicates
- Provide PLINK file of pretesting data
- Option to resolve problems found during pretesting phase
Inclusion of study duplicates and positive controls.
Plate Map Design
Ability to use either Build 37 or Build 38 for analysis.
SNVs and indels called and annotated.
Data quality evaluated using a robust alignment and variant calling workflow.
Customizable Data Release and one year project archive.

Sequencing Release Formats

QC Metrics

The QC report is a per-sample report that contains more than 100 QC metrics, including sequencing completeness and coverage/depth information (such as mean coverage and % of targeted bases covered at 20X), quality measures against a high-density SNP array (heterozygote sensitivity, and homozygote and heterozygote concordance with the SNP array), variant call quality measures (% in dbSNP and Ti/Tv ratio), and a number of laboratory-oriented measures (% duplicates, library size, insert size, etc.). Cross-sample contamination is estimated using VerifyBamID. The QC report is used in-house to monitor QC metrics in real time and to quickly evaluate data.
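As a rough illustration of two of the metrics named above, the sketch below computes a Ti/Tv ratio from a list of SNVs and a genotype concordance between array and sequencing calls. The function names, the `"0/1"`-style genotype encoding, and the exact concordance definition (matching calls divided by sites called on both platforms) are assumptions made for this example and are not taken from the QC report itself.

```python
from typing import Iterable, Mapping, Optional, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True when a substitution stays within purines or within pyrimidines."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return False  # not a substitution at all
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def titv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Ratio of transitions to transversions over (ref, alt) pairs."""
    ti = tv = 0
    for ref, alt in snvs:
        if ref.upper() == alt.upper():
            continue  # skip non-variant records
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; Ti/Tv is undefined")
    return ti / tv


def genotype_concordance(
    array_calls: Mapping[str, str],
    seq_calls: Mapping[str, str],
    genotype_class: Optional[str] = None,
) -> float:
    """Fraction of array genotypes reproduced by sequencing.

    Only sites called on both platforms are compared. When genotype_class
    is given (for example "0/1" for heterozygotes), the comparison is
    restricted to sites where the array reports that genotype.
    """
    sites = [s for s in array_calls if s in seq_calls]
    if genotype_class is not None:
        sites = [s for s in sites if array_calls[s] == genotype_class]
    if not sites:
        raise ValueError("no comparable sites")
    matches = sum(1 for s in sites if array_calls[s] == seq_calls[s])
    return matches / len(sites)
```

Restricting `genotype_class` to `"0/1"` gives a heterozygote concordance in the sense assumed here; the production report may define its metrics differently.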

Annotation

ANNOVAR is used to annotate variants from VCF files. A merged per-sample ANNOVAR report includes information from databases such as RefGene, UCSC, Ensembl, dbSNP (builds 131 to 138), OMIM, and the NHGRI GWAS catalogue. Global and population-specific non-reference allele frequencies are provided from gnomAD, the 1000 Genomes Project, and the NHLBI Exome Sequencing Project (ESP), along with functional prediction scores from sources such as CADD, REVEL, and dbNSFP.

Variant Calls

CIDR can provide single-sample and/or multi-sample VCF files. SNVs and small indels are called using the GATK 3+ HaplotypeCaller joint-calling gVCF workflow across all samples sequenced for a project. Depending on the size of the project, variant filtering is performed using either GATK's Variant Quality Score Recalibration (VQSR) protocol or "hard" cut-offs.
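To make the idea of "hard" cut-offs concrete, here is a minimal sketch that flags an SNV record when any annotation crosses a fixed threshold. The thresholds are the generic starting points commonly cited in GATK's hard-filtering guidance for SNVs; they are an assumption for illustration and are not necessarily the cut-offs applied to released data. The function name and the dictionary layout are likewise hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed, illustrative SNV thresholds; a record fails a filter when the
# annotation is below ("<") or above (">") the listed value.
SNV_HARD_FILTERS: Dict[str, Tuple[str, float]] = {
    "QD": ("<", 2.0),               # quality by depth
    "FS": (">", 60.0),              # Fisher strand bias
    "MQ": ("<", 40.0),              # RMS mapping quality
    "SOR": (">", 3.0),              # strand odds ratio
    "MQRankSum": ("<", -12.5),      # mapping quality rank sum
    "ReadPosRankSum": ("<", -8.0),  # read position rank sum
}


def failed_filters(info: Dict[str, float]) -> List[str]:
    """Return the names of the hard filters that a variant record fails.

    `info` maps INFO annotation names to numeric values. Annotations that
    are absent from the record are skipped rather than treated as failures.
    """
    failed = []
    for key, (op, threshold) in SNV_HARD_FILTERS.items():
        value = info.get(key)
        if value is None:
            continue
        if (op == "<" and value < threshold) or (op == ">" and value > threshold):
            failed.append(key)
    return failed
```

A record with an empty result passes; any non-empty result would correspond to a non-PASS entry in the FILTER column. Unlike VQSR, this approach needs no training set, which is one reason fixed cut-offs are typically the fallback for smaller projects.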

Raw Data

The rawest form of data released is the CRAM file. We prefer to release lossless CRAM files generated using SAMtools 1.7.
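For readers who want to produce or check a comparable file, the sketch below wraps the standard `samtools view` BAM-to-CRAM conversion. The exact command used in the production pipeline is not stated here, so the option choices and all file names are assumptions; only `-C` (CRAM output), `-T` (reference FASTA), and `-o` (output path) are used.

```python
import subprocess
from typing import List


def build_cram_command(bam_path: str, reference_fasta: str, cram_path: str) -> List[str]:
    """Build the argument list for converting a BAM file to CRAM."""
    return [
        "samtools", "view",
        "-C",                   # write CRAM instead of BAM/SAM
        "-T", reference_fasta,  # reference the CRAM is encoded against
        "-o", cram_path,
        bam_path,
    ]


def bam_to_cram(bam_path: str, reference_fasta: str, cram_path: str) -> None:
    """Run the conversion; requires samtools on PATH and raises on failure."""
    subprocess.run(build_cram_command(bam_path, reference_fasta, cram_path), check=True)
```

Decoding a CRAM file generally requires access to the same reference it was encoded against, so the reference build used for the project (Build 37 or Build 38) matters when reading the released files.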

In addition, we release the genotypes obtained from a high-density SNP array.

Although our exploration of new analysis software and new algorithms is ongoing, we use an established pipeline for processing sequence data. The pipeline is updated frequently to enhance the quality of alignment and variant calling and details are included in the release.

Please inquire if you have specific questions related to sequencing release formats.