Genome Assembly is the process of using DNA sequencing data to reconstruct the sequence of bases in an organism's chromosomes, or in its full genome, in the proper order and orientation.
- Introduction to Canu
- Introduction to SPAdes
- Introduction to MaSuRCA
Genome Assembly Examples
- Bacillus thuringiensis data set
- Arabidopsis thaliana data set
Tools for assessing the quality of a Genome Assembly
- GenomeScope to Estimate Genome Size
- Checking a genome for contamination from vectors using UniVec
- Check a genome for PhiX contamination
Tools for Scaffolding assemblies
Genetic Map Construction
Genome Annotation has two separate but related definitions, and the term is often used to mean both:
- The process of identifying the location of genes by predicting the coding regions in a genome and generating gene models that represent the structure of a gene (start, stop, intron-exon boundaries, regulatory sequences, repeats).
- The process of assigning a function to the gene models (gene names, protein products, domain structure).
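To make the first definition concrete, here is a minimal sketch of the simplest form of coding-region prediction: scanning both strands for open reading frames (an ATG start codon followed in the same frame by a stop codon). The names `find_orfs`, `reverse_complement`, and `min_codons` are hypothetical and chosen for illustration only. Real annotation pipelines rely on statistical gene models and supporting evidence, and a plain ORF scan like this cannot handle introns, so treat it as a toy for intron-free sequence rather than a description of any particular tool.

```python
# Toy ORF scan: an illustration of "predicting coding regions",
# not the method used by any real gene predictor.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, start, end, orf) tuples.

    Coordinates are 0-based and end-exclusive, measured on the strand
    that was scanned. min_codons counts the start and stop codons too.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START:
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOPS:
                    end = i + 3  # include the stop codon
                    if (end - start) // 3 >= min_codons:
                        orfs.append((strand, start, end, s[start:end]))
                    start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTTAGGG"):
        print(orf)
```

Each tuple is a very crude "gene model": a strand, a start, and a stop. The second definition, functional annotation, would then attach a name or protein product to each model, typically by comparing it against databases of known sequences.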