Sequencing Technology

Three sequencing technologies are currently in common use: Illumina, PacBio and Oxford Nanopore. Understanding the assumptions and limitations of each one helps when planning an experimental design.

Illumina

Illumina raw reads are short (100–300 bp) and are of high quality for reads shorter than 200 bp. Reads of 250–300 bp usually have significantly lower base quality scores; read quality diminishes as read length increases. This quality trend does not change with the length of the run.
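One way to see this decline in your own data is to average the Phred quality score at each read position. The sketch below assumes plain four-line FASTQ records with Phred+33 encoded quality strings (the encoding used by current Illumina software); the function name and example reads are hypothetical. In practice, a QC tool such as FastQC reports the same summary as its per-base sequence quality plot.

```python
from collections import defaultdict


def mean_quality_by_position(fastq_text: str, offset: int = 33) -> list[float]:
    """Mean Phred quality score at each read position (0-based) across all reads.

    Assumes plain four-line FASTQ records (no line-wrapped sequences) with
    Phred+33 quality strings.
    """
    lines = fastq_text.strip().splitlines()
    totals: dict[int, int] = defaultdict(int)
    counts: dict[int, int] = defaultdict(int)
    # The quality string is the 4th line of every record (indices 3, 7, 11, ...).
    for qual in lines[3::4]:
        for pos, char in enumerate(qual):
            totals[pos] += ord(char) - offset  # decode ASCII character to a Phred score
            counts[pos] += 1
    return [totals[pos] / counts[pos] for pos in sorted(counts)]


# Hypothetical two-read example: 'I' encodes Q40 and '5' encodes Q20.
example = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\nII55\n"
print(mean_quality_by_position(example))  # [40.0, 40.0, 30.0, 30.0]
```

On real Illumina data, plotting these means against position typically shows the drop toward the 3' end described above.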

Number of fragments to expect to pass the filter

At the time of writing, the NovaSeq 6000 is Illumina’s latest machine, with significantly higher output than previous generations of sequencing machines. Its output depends on the flow cell type. The table below gives output as fragments (single-end reads or read pairs); for paired-end runs, multiply by 2 to get the number of reads.

NovaSeq 6000 System (M = millions of fragments)
Flow Cell Type	SP	S1	S2	S4
Number of fragments	650–800 M	1300–1600 M	3300–4100 M	8000–10000 M

Approximate number of samples you could run with each type of flow cell by application

These estimates assume 60–80X coverage per genome.

NovaSeq 6000 System
Flow Cell Type	SP	S1	S2	S4
3Gb Genomes per Run	4	8	20	48
1Gb Genomes per Run	12	24	60	144
Exomes per Run	40	80	200	500
Transcriptomes per Run	32	64	164	400
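Estimates like these come down to dividing a run's output by the bases each sample needs (genome size × target coverage). A minimal sketch of that arithmetic follows; the function name and the example inputs are hypothetical. It ignores reads lost to duplicates, adapter trimming and uneven pooling, so treat the result as an upper bound.

```python
import math


def max_samples(output_gb: float, genome_gb: float, coverage: float) -> int:
    """Upper bound on whole samples per run at a target fold coverage.

    Hypothetical helper: ignores duplicates, adapter trimming and uneven
    pooling, all of which reduce usable output in practice.
    """
    if output_gb <= 0 or genome_gb <= 0 or coverage <= 0:
        raise ValueError("all inputs must be positive")
    bases_per_sample_gb = genome_gb * coverage  # Gb needed to cover one genome
    return math.floor(output_gb / bases_per_sample_gb)


# Hypothetical inputs: 1,000 Gb of output, a 1 Gb genome, 30X target coverage.
print(max_samples(1000, 1, 30))  # 33
```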

Read lengths and output at that read length

Flow Cell Type	SP	S1	S2	S4
1 × 35 bp	No	No	No	280–350 Gb
2 × 50 bp	65–80 Gb	134–167 Gb	333–417 Gb	No
2 × 100 bp	134–167 Gb	266–333 Gb	667–833 Gb	1600–2000 Gb
2 × 150 bp	200–250 Gb	400–500 Gb	1000–1250 Gb	2400–3000 Gb
2 × 250 bp	325–400 Gb	No	No	No
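The output column follows roughly from the fragment counts in the first table: each fragment contributes one read (single-end) or two reads (paired-end), and each read contributes its length in bases. A sketch of that arithmetic, with a hypothetical function name:

```python
def run_output_gb(fragments_millions: float, read_length_bp: int, paired: bool = True) -> float:
    """Approximate bases produced by a run, in Gb.

    Each fragment yields two reads when paired-end and one when single-end.
    """
    reads_millions = fragments_millions * (2 if paired else 1)
    # millions of reads x bp per read = millions of bp; divide by 1,000 for Gb
    return reads_millions * read_length_bp / 1000


print(run_output_gb(8000, 35, paired=False))  # 280.0
print(run_output_gb(10000, 150))  # 3000.0
```

These two calls reproduce the lower bound of the S4 1 × 35 bp entry and the upper bound of the S4 2 × 150 bp entry; for the other flow cells the result is close to, but not exactly, the listed range.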

Sequencing rates at a service provider


PacBio

PacBio raw reads are long (~13,000–20,000 bp), with maximum read lengths around 300,000 bp.

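HiFi (High Fidelity): reads come from shorter library inserts and the movies are typically longer, so each molecule is read in multiple passes that are combined into a highly accurate consensus read.
CLR (Continuous Long Reads): each molecule is read only once, so reads are less accurate but can be much longer.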
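Why more passes help can be illustrated with a deliberately simplified model: treat each pass over a base as an independent call that is wrong with some fixed probability, and take a majority vote. This is not PacBio's actual consensus algorithm; the 10% per-pass error rate and the function name are hypothetical.

```python
import math


def majority_error(per_pass_error: float, passes: int) -> float:
    """Probability that a majority vote over independent passes calls a base wrongly.

    Simplified model (not PacBio's consensus algorithm): every pass is an
    independent right/wrong call. All wrong calls are treated as if they agreed
    on the same wrong base, and ties count as errors, so the estimate is pessimistic.
    """
    if passes < 1:
        raise ValueError("passes must be at least 1")
    threshold = (passes + 1) // 2  # ceil(passes / 2) wrong calls make the consensus wrong
    return sum(
        math.comb(passes, k)
        * per_pass_error**k
        * (1 - per_pass_error) ** (passes - k)
        for k in range(threshold, passes + 1)
    )


# Hypothetical 10% per-pass error rate: the consensus error shrinks with more passes.
for n in (1, 3, 5, 9):
    print(n, majority_error(0.10, n))
```

With these inputs the estimate falls from 0.1 at one pass to about 0.028 at three passes and about 0.0086 at five.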

System	Gb	Millions of Reads	Read Type
Sequel II	~100	~400	HiFi
Sequel II	~50	~40	CLR
Sequel I	~15	~0.5	HiFi


Notes

Multiplex up to 48 microbial samples per SMRT Cell 8M
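When samples are pooled on one SMRT Cell, the expected coverage per sample is roughly the cell's usable yield split across the barcodes, divided by genome size. A minimal sketch, assuming an even split across samples; the 30 Gb yield, 5 Mb genome size and function name are hypothetical values chosen for illustration.

```python
def coverage_per_sample(cell_yield_gb: float, n_samples: int, genome_mb: float) -> float:
    """Mean fold coverage per sample, assuming the yield is split evenly across barcodes."""
    if n_samples < 1 or genome_mb <= 0:
        raise ValueError("need at least one sample and a positive genome size")
    per_sample_mb = cell_yield_gb * 1000 / n_samples  # Gb -> Mb, then split evenly
    return per_sample_mb / genome_mb


# Hypothetical: 30 Gb usable yield, 48 microbial samples, 5 Mb genomes.
print(coverage_per_sample(30, 48, 5))  # 125.0
```

Barcode representation is often uneven in practice, so some samples will fall below this average.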

Nanopore

Nanopore raw reads are long (10,000–30,000 bp), and the longest confirmed read is 2.3 million bases. Nanopore is the fastest-evolving of the three sequencing technologies, so these figures quickly become outdated. In December 2020, a large jump in basecalling quality was announced: mean accuracy above Q20 (99.13%) using the Bonito basecaller.
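Quality figures such as Q20 are on the Phred scale, Q = -10 log10(p), where p is the per-base error probability (1 minus accuracy). The sketch below converts between accuracy and Q; 99.13% accuracy works out to Q ≈ 20.6, which is why it counts as above Q20. The function names are illustrative.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Phred quality score: Q = -10 * log10(error probability)."""
    if not 0 < accuracy < 1:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10 * math.log10(1 - accuracy)


def phred_to_accuracy(q: float) -> float:
    """Inverse conversion: accuracy = 1 - 10 ** (-Q / 10)."""
    return 1 - 10 ** (-q / 10)


print(round(accuracy_to_phred(0.9913), 1))  # 20.6
print(round(phred_to_accuracy(20), 4))  # 0.99
```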

System	Gb	Millions of Reads
MinION	~40	~2.5
PromethION	~180	~11.5
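The two columns can be cross-checked: dividing yield by read count gives the implied mean read length. For both rows this lands around 16 kb, consistent with the 10,000–30,000 bp range quoted above. A sketch with a hypothetical helper name:

```python
def mean_read_length_kb(yield_gb: float, reads_millions: float) -> float:
    """Mean read length implied by total yield and read count.

    Gb divided by millions of reads gives thousands of bases (kb) per read.
    """
    if reads_millions <= 0:
        raise ValueError("read count must be positive")
    return yield_gb / reads_millions


print(mean_read_length_kb(40, 2.5))  # MinION row: 16.0
print(round(mean_read_length_kb(180, 11.5), 1))  # PromethION row: 15.7
```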

Nanopore Information

This paper provides a nice overview of MinION sequencing technologies and their uses: "Opportunities and challenges in long-read sequencing data analysis".


Sequencing rates at a service provider

Funding and Cost

Most research projects have a fixed budget that limits how much sequencing and bioinformatics can be done to answer the biological question of interest. Understanding the following terms can help you decide the type and amount of sequencing best suited to your biological purpose.

Read length: Short reads (50 bp) are difficult to align to unique locations in a genome, so very short reads are uncommon except in small RNA (smRNA) experiments.
Paired-end: Both ends of the DNA fragment are sequenced, which yields more unique alignments to a genome. For RNA-Seq experiments with a known genome, at least 100 bp paired-end Illumina data are recommended. For RNA-Seq experiments without a genome, or with a genome of questionable quality, 150 bp paired-end Illumina data are recommended.
Single-end: Only one end of each fragment is sequenced. This is used when the DNA fragments are shorter than the read length; for example, smRNA experiments typically use 50 bp single-end data.
Biological replicates: To determine differential expression in RNA-Seq experiments, it is extremely important to have at least 3, and preferably 5 to 10, biological replicates per condition.
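Why short reads map ambiguously can be shown with a toy example: slide a read-sized window along a genome and count how many positions have a sequence that occurs nowhere else. Longer reads span repeats, so more positions become unique. This sketch ignores the reverse strand, sequencing errors and everything a real aligner does; the sequence and function name are hypothetical.

```python
from collections import Counter


def unique_fraction(genome: str, read_len: int) -> float:
    """Fraction of read start positions whose exact sequence occurs only once.

    Toy model: exact matches on one strand only, no sequencing errors.
    """
    if not 0 < read_len <= len(genome):
        raise ValueError("read_len must be between 1 and len(genome)")
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)  # how often each read-sized sequence occurs
    return sum(1 for r in reads if counts[r] == 1) / len(reads)


toy_genome = "ACGTTACGTC"  # hypothetical sequence containing a short repeat (ACGT)
for length in (2, 4, 5):
    print(length, round(unique_fraction(toy_genome, length), 2))
```

For this toy sequence the unique fraction rises from 0.33 at length 2 to 0.71 at length 4 and 1.0 at length 5, once a read is longer than the repeat.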

Examples

In the next sections, we will work through several experimental design problems drawn from real-world examples.

Andrew Severin
