Sequencing Technology
There are now three main sequencing technologies that are available and commonly used: Illumina, PacBio and Oxford Nanopore. Understanding the assumptions and limitations of each of these technologies can aid in planning the experimental design.
Illumina
Illumina raw reads are short (100-300 bp) and of high quality when shorter than 200 bp. Bases on reads of 250-300 bp usually have significantly lower quality scores, because read quality diminishes as read length increases. This quality trend does not change with the length of the run.
Number of fragments to expect to pass the filter
The NovaSeq 6000 is Illumina’s latest machine and has significantly higher output than previous generations of sequencing machines. The amount of output is tied to the flow cell type. The numbers below are given as fragments (single or paired). For paired-end runs, multiply by 2 to get the number of reads.
NovaSeq 6000 System (M = millions of fragments)
Flow Cell Type | SP | S1 | S2 | S4 |
---|---|---|---|---|
Number of fragments | 650–800 M | 1300–1600 M | 3300–4100 M | 8000–10000 M |
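The conversion from fragments to reads and to total bases is simple arithmetic. Below is a minimal sketch in Python, assuming every fragment that passes the filter yields full-length reads. The function names are illustrative and do not come from any Illumina tool.

```python
def reads_from_fragments(fragments_millions, paired=True):
    """Return the number of reads (in millions) for a given fragment count."""
    reads_per_fragment = 2 if paired else 1
    return fragments_millions * reads_per_fragment


def yield_gb(fragments_millions, read_length_bp, paired=True):
    """Approximate yield in gigabases, assuming full-length reads (an assumption)."""
    reads_millions = reads_from_fragments(fragments_millions, paired)
    # millions of reads * bp per read = megabases; divide by 1000 for gigabases
    return reads_millions * read_length_bp / 1000


# S4 flow cell, low end of the range (8000 M fragments), paired 150 bp reads
print(reads_from_fragments(8000))  # 16000 (million reads)
print(yield_gb(8000, 150))         # 2400.0 (Gb)
```

Real runs lose some bases to adapter trimming and quality filtering, so treat the result as an upper estimate.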
Approximate number of samples you could run with each type of flow cell by application
This assumes 60-80X coverage per genome run.
NovaSeq 6000 System
Flow Cell Type | SP | S1 | S2 | S4 |
---|---|---|---|---|
3 Gb Genomes per Run | 4 | 8 | 20 | 48 |
1 Gb Genomes per Run | 12 | 24 | 60 | 144 |
Exomes per Run | 40 | 80 | 200 | 500 |
Transcriptomes per Run | 32 | 64 | 164 | 400 |
Read lengths and output at that read length
Flow Cell Type | SP | S1 | S2 | S4 |
---|---|---|---|---|
1 × 35 bp | No | No | No | 280–350 Gb |
2 × 50 bp | 65–80 Gb | 134–167 Gb | 333–417 Gb | No |
2 × 100 bp | 134–167 Gb | 266–333 Gb | 667–833 Gb | 1600–2000 Gb |
2 × 150 bp | 200–250 Gb | 400–500 Gb | 1000–1250 Gb | 2400–3000 Gb |
2 × 250 bp | 325–400 Gb | No | No | No |
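When planning a run, the output figures above can be turned into an expected depth of coverage per sample. The sketch below assumes the whole yield is usable and is split evenly across samples. The function names are hypothetical, and the counts it produces depend on the read length and coverage target chosen, so they need not reproduce the per-run sample counts tabulated earlier.

```python
import math


def coverage_per_sample(yield_gb, genome_size_gb, n_samples):
    """Average depth of coverage when the yield is split evenly across samples."""
    return yield_gb / (genome_size_gb * n_samples)


def max_samples(yield_gb, genome_size_gb, target_coverage):
    """Largest number of samples that still reach the target coverage."""
    return math.floor(yield_gb / (genome_size_gb * target_coverage))


# S4 flow cell at 2 x 150 bp, low end of the range (2400 Gb), 3 Gb genome
print(coverage_per_sample(2400, 3, 20))  # 40.0 (X coverage with 20 samples)
print(max_samples(2400, 3, 30))          # 26 (samples at 30X)
```

Planning against the low end of the output range gives a conservative estimate.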
Some useful links to Illumina related information
Video explanation
PacBio
PacBio raw data are long (~13,000-20,000bp) with max read lengths around 300,000 bp.
- HiFi = High Fidelity reads. Libraries have shorter insert sizes and the movies are typically longer, so each molecule is read in more passes.
- CLR = Continuous Long Reads. Each molecule is read once, but the reads can be much longer.
System | Gb | Millions of Reads | Read Type |
---|---|---|---|
Sequel II | ~100 | ~400 | HiFi |
Sequel II | ~50 | ~40 | CLR |
Sequel I | ~15 | ~0.5 | HiFi |
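The approximate yields in the table can be used to estimate how many SMRT Cells a project needs. This is a rough sketch that assumes a fixed yield per cell, taken from the table. Actual yield varies with library quality, and the function name is illustrative only.

```python
import math


def cells_needed(genome_size_gb, target_coverage, yield_per_cell_gb):
    """Number of cells required to reach the target coverage, rounded up."""
    required_gb = genome_size_gb * target_coverage
    return math.ceil(required_gb / yield_per_cell_gb)


# 3 Gb genome at 30X with ~100 Gb per cell (Sequel II HiFi row, approximate)
print(cells_needed(3, 30, 100))  # 1
# 3 Gb genome at 60X with ~50 Gb per cell (Sequel II CLR row, approximate)
print(cells_needed(3, 60, 50))   # 4
```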
Some useful links to PacBio related information
- Table of Application-Options-and-Sequencing-Recommendations
- Preparing samples
- DNA extraction for PacBio
- Intro to PacBio from UCDavis
- Sequencing rates at a service provider
- Reference for estimated output
Video explanation
Notes
- Multiplex up to 48 microbial samples per SMRT Cell 8M
Nanopore
Nanopore raw reads are long (10,000-30,000 bp), with the longest confirmed read at 2.3 million bases. Nanopore is the fastest evolving of the three sequencing technologies, so the figures given here become outdated quickly. In December 2020, a large jump in base-calling quality was announced, with the mean accuracy above Q20 (99.13%) using the base caller Bonito.
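Quality values such as Q20 are Phred scores, defined as Q = -10 × log10(P), where P is the probability that a base call is wrong. The short sketch below converts between Phred scores and accuracy using that standard formula. The function names are illustrative.

```python
import math


def phred_to_accuracy(q):
    """Convert a Phred quality score to base-call accuracy (0 to 1)."""
    return 1 - 10 ** (-q / 10)


def accuracy_to_phred(accuracy):
    """Convert base-call accuracy (0 to 1, below 1) to a Phred quality score."""
    return -10 * math.log10(1 - accuracy)


print(round(phred_to_accuracy(20), 4))      # 0.99, i.e. 1 error in 100 bases
print(round(accuracy_to_phred(0.9913), 1))  # 20.6
```

An accuracy of 99.13% therefore corresponds to a score slightly above Q20.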
System | Gb | Millions of Reads |
---|---|---|
MinION | ~40 | ~2.5 |
PromethION | ~180 | ~11.5 |
Some useful links to Nanopore related information
- Paper providing an overview of MinION sequencing technologies and their uses
- Opportunities and challenges in long-read sequencing data analysis
Video Explanation
Sequencing rates at a service provider
Funding and Cost
Most research projects have a strict budget for how much sequencing and bioinformatics can be performed to answer the biological question of interest. Understanding the following terminology can help determine the type and amount of sequencing best suited to your biological purpose.
- Read length: Short reads (50 bp) are difficult to align to unique locations in a genome, so very short reads are uncommon unless the experiment targets smRNA.
- Paired-end: Both ends of the DNA fragment are sequenced. This type of sequencing is useful for obtaining more unique alignments to a genome. For RNA-Seq experiments with a known genome, at least 100 bp paired-end Illumina data is recommended. For RNA-Seq experiments without a genome, or with a genome of questionable quality, 150 bp paired-end Illumina data is recommended.
- Single-end: Only one end of the DNA fragment is sequenced. Used when the experiment has DNA fragments shorter than the length of the read. For example, smRNA experiments are typically done with 50 bp single-end data.
- Biological replicates: It is extremely important to have at least 3, and preferably 5 to 10, replicates for RNA-Seq experiments to determine differential expression.
Examples
In the next sections we will work through several experimental design problems drawn from real-world examples.