Upon completion of this section, the learner will know the definitions of the following terms.
What is a base?
There are four common bases in a DNA sequence: adenine (A), thymine (T), cytosine (C), and guanine (G).
Uracil (U) is found in RNA in place of thymine.
Image taken from Wikipedia, where more information about nucleotides can also be found.
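The substitution of uracil for thymine can be illustrated with a few lines of Python. This is only a minimal sketch: the names `DNA_BASES` and `dna_to_rna` are made up for this example, and the function simply respells a DNA string in the RNA alphabet rather than modelling transcription.

```python
# One-letter codes for the four common DNA bases.
DNA_BASES = {"A": "adenine", "C": "cytosine", "G": "guanine", "T": "thymine"}


def dna_to_rna(dna: str) -> str:
    """Respell a DNA string in the RNA alphabet by writing U wherever T appears."""
    return dna.upper().replace("T", "U")


print(dna_to_rna("GATTACA"))  # GAUUACA
```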
What is a read?
A read is a string of bases represented by their one-letter codes. Here is an example of a read that is 50 bases long.
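As a rough illustration, a read can be handled as an ordinary string. The sketch below builds a hypothetical 50-base read (invented for this example, not real sequencing data) and checks that it uses only the four one-letter codes. The helper name `is_valid_read` is an assumption of this sketch.

```python
VALID_BASES = set("ACGT")


def is_valid_read(read: str) -> bool:
    """Return True if the read is non-empty and contains only A, C, G, or T."""
    return len(read) > 0 and set(read.upper()) <= VALID_BASES


# Hypothetical read made up for illustration; not real sequencing data.
read = "ACGTTGCAAC" * 5

print(len(read))                # 50
print(is_valid_read(read))      # True
print(is_valid_read("ACGTNX"))  # False
```

Reads from a real sequencer may also contain N for a base that could not be called, so a stricter or looser alphabet may be appropriate depending on the data.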
What is a contig?
A contig is the consensus sequence generated by aligning overlapping reads to one another.
The last line is the consensus of the aligned reads. We call this consensus sequence a contig.
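The idea of taking a consensus can be sketched as a column-by-column majority vote. The example below assumes the reads have already been aligned and padded with `-` where a read does not cover a position. The reads are invented for illustration, and real assemblers use far more sophisticated methods than this simple vote.

```python
from collections import Counter


def consensus(aligned_reads: list[str], pad: str = "-") -> str:
    """Majority-vote consensus over pre-aligned, padded reads.

    Positions covered by no read are reported as N (unknown base).
    Ties are broken in favour of the base seen first in that column.
    """
    width = max(len(r) for r in aligned_reads)
    columns = []
    for i in range(width):
        counts = Counter(
            r[i] for r in aligned_reads if i < len(r) and r[i] != pad
        )
        columns.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(columns)


# Hypothetical aligned reads; '-' marks positions a read does not cover.
aligned = [
    "ATGCGT-----",
    "--GCGTAC---",
    "----GTACGGA",
]
print(consensus(aligned))  # ATGCGTACGGA
```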
What is a scaffold?
A scaffold is a set of contigs that have been ordered and oriented based on mate-pair or other long-range information.
In the line above:
- contig is a string of bases (A, T, C, or G)
- N is an unknown base
- gitnoc is the word contig written backwards to represent the reverse complement of a contig
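The same layout can be sketched in code: contigs are placed in order, flipped to their reverse complement where needed, and separated by runs of N. The function names and the fixed `gap_length` are assumptions of this sketch. In practice gap sizes are usually estimated from the mate-pair or long-range data.

```python
# Translation table mapping each base to its complement; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def build_scaffold(contigs, gap_length=10):
    """Join contigs into one scaffold string.

    contigs: list of (sequence, is_reversed) pairs, already in scaffold order.
    gap_length: number of N characters placed between neighbouring contigs
                (a fixed value here purely for illustration).
    """
    oriented = [reverse_complement(s) if rev else s for s, rev in contigs]
    return ("N" * gap_length).join(oriented)


print(reverse_complement("ATGC"))  # GCAT
# Hypothetical contigs: the second one is placed in reverse orientation.
print(build_scaffold([("ATGC", False), ("GGTA", True)], gap_length=3))
# ATGCNNNTACC
```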
What is a chromosome?
Chromosomes are the largest DNA molecules in a cell. Scaffolds can be ordered and oriented into linkage groups or chromosomes using a genetic map or Hi-C data. The ultimate goal of a genome assembly project is to assemble reads into phased chromosomes that represent an actual individual. Most chromosomal assemblies produced today are not phased, or may represent multiple individuals. The closest approach to this ideal to date is trio binning, which uses sequence data from both parents to sort an offspring's reads by parental haplotype before assembly.