A
Alignment [Sequence Alignment] -
in bioinformatics, means finding matching fragments in compared sequences of biopolymers (DNA, RNA, proteins, peptides). Each sequence is represented as a string of letters corresponding to the monomer types (a 4-letter alphabet for nucleic acids and a 20-letter alphabet for polypeptides), following their order in the molecule. All aligned sequences must use the same 1-letter notation (either nucleotide or amino acid). The sequences are written horizontally in rows, and empty positions (gaps) are inserted between the letters so that each column contains identical or similar characters. The resulting pattern of matching positions is called a sequence alignment.
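The gap-insertion idea described above can be illustrated with a minimal sketch of global pairwise alignment (the Needleman-Wunsch dynamic programming scheme). The function name `global_align` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration; practical tools use substitution matrices and more elaborate gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch sketch: returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align two letters
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Two short nucleotide strings; "-" marks an inserted gap.
row_a, row_b, total = global_align("GATTACA", "GATACA")
```

Both returned rows have the same length, so they can be printed one above the other and read column by column. When several alignments share the best score, this sketch returns only one of them.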
Variants of sequence alignment support many bioinformatics tasks, such as:
• [in proteomics] identification of functional, structural, or evolutionary relationships between proteins based on the similarity of the sequences [see Multiple Sequence Alignment]
• [in genetics] investigating single-nucleotide polymorphisms at specific positions in a genome or across a population [see MSA for SNP]
• [in transcriptomics] investigating gene expression based on the sequenced RNA [see Transcriptome Assembly]
• [in transcriptomics] identification of gene locations in the reference genome based on the sequenced RNA [see Genome-guided Assembly]
• [in genomics] reconstructing the genome based on the sequenced DNA fragments [see Genome Assembly]
#RNA-Seq |
#transcriptome assembly |
#genome assembly |
#MSA |
Wikipedia ⤴ |
Alternative Splicing [Differential RNA splicing] -
is the process (occurring naturally in vivo in eukaryotes) of assembling various messenger RNAs (mRNA) from a pool of exons encoded by a single gene. It allows significant compression of genetic material because a single gene codes for multiple products, such as protein variants (isoforms) and functional non-coding RNA. The most common mechanism for creating truncated mRNA alternatives is the skipping of selected exons.
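As a rough illustration of exon skipping, the sketch below enumerates mRNA variants obtained by leaving out internal exons. The function name `exon_skipping_isoforms`, the toy exon strings, and the simplification that the first and last exon are always retained are assumptions made for this example, not a model of real splicing regulation.

```python
from itertools import combinations


def exon_skipping_isoforms(exons, max_skipped=1):
    """List mRNA variants with up to `max_skipped` internal exons left out.

    Assumption for this sketch: the first and last exon are always kept.
    """
    internal = range(1, len(exons) - 1)
    isoforms = []
    for k in range(max_skipped + 1):
        for skipped in combinations(internal, k):
            kept = [e for idx, e in enumerate(exons) if idx not in skipped]
            isoforms.append("".join(kept))
    return isoforms


# Hypothetical gene with four exons (toy sequences, not real data).
exons = ["AUGGC", "CUUA", "GGAC", "UAA"]
variants = exon_skipping_isoforms(exons)
# variants[0] is the full-length transcript; the others each skip one internal exon.
```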
#gene expression |
#transcriptome |
#messenger RNA |
Wikipedia ⤴ |
Annotation -
definition
#hashtag1 |
#hashtag2 |
#hashtag3 |
Assembly -
definition
#genome assembly |
#transcriptome assembly |
B
Bioinformatics -
definition
#RNA-Seq |
#sequencing technologies |
#Big Data |
C
Chromosome -
definition
#DNA |
#gene |
#genome |
Contig -
definition
#read |
#scaffold |
#RNA-Seq |
D
Differential Expression -
definition
#gene expression |
#transcriptome |
#RNA-Seq |
Wikipedia ⤴ |
DNA -
definition
#gene |
#gene expression |
#sequencing technologies |
E
Exon -
definition
#gene expression |
#alternative splicing |
Wikipedia ⤴ |
F
G
Gene -
definition
#DNA |
#gene expression |
#chromosome |
Gene Expression -
definition
#transcriptome |
#RNA-Seq |
#alternative splicing |
Genome Assembly -
definition
#DNA |
#sequencing technologies |
#alignment |
Genome Index -
is a compressed, full-text data structure built from the file containing the reference genome (e.g., a .fna file). The index makes searching for a substring (e.g., matching reads) in a large text efficient. Programs such as HISAT2 ⤴ (hisat2-build indexer) build the reference genome index using the FM-index ⤴ approach, in which the data is both compressed and indexed so that it fits within a computer’s memory.
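The substring search enabled by an FM-index can be sketched in a few lines. This is a simplified, uncompressed illustration (full suffix array, full occurrence table), not how HISAT2 stores its index; the function names `build_fm_index` and `find` are assumptions for the example.

```python
def build_fm_index(text):
    """Build a toy FM-index: suffix array, first-column offsets, occurrence table."""
    text += "$"  # sentinel; sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # first[c] = number of characters in the text that sort before c
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, first, occ


def find(pattern, index):
    """Backward search: return sorted start positions of `pattern` in the text."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # current half-open range of matching suffixes
    for c in reversed(pattern):
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


index = build_fm_index("ACGTACGTGACG")
hits = find("ACG", index)  # start positions 0, 4 and 9
```

The search cost depends on the pattern length rather than the genome length, which is why an index is built once and reused for millions of reads. Real implementations additionally sample the suffix array and compress the occurrence table to save memory.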
#RNA-Seq |
#transcriptome assembly |
#mapping |
#alignment |
#sequencing |
Genomic Reference [Reference Genome] -
definition
#RNA-Seq |
#genome-guided alignment |
#transcriptome assembly |
Wikipedia ⤴ |
H
I
Intron -
definition
#exon |
#gene |
#gene expression |
Wikipedia ⤴ |
Isoform [transcript] -
definition
#alternative splicing |
#gene expression |
#messenger RNA |
Wikipedia ⤴ |
J
K
L
Locus -
is a known, fixed position on a chromosome where a specific gene or genetic marker is located.
#chromosome |
#gene |
#DNA |
M
Mapping [sequencing reads to reference genome] -
in RNA-Seq analysis, means detecting the presence and location (position) of individual reads (fragments of a sequence obtained from a sequencing experiment) within the sequence space of a reference genome. In particular, sequenced RNA can be mapped to the reference genome to identify genes and obtain information about gene expression.
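A minimal sketch of read mapping, assuming a simple seed-and-verify strategy: the first k letters of a read are looked up in a k-mer table of the reference, and each candidate position is then checked letter by letter. The names `build_kmer_index` and `map_read`, the toy reference, and the mismatch threshold are illustrative assumptions; production mappers also handle gaps, reverse strands, and spliced reads.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-letter substring of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Return (position, mismatches) for each location where the read fits."""
    hits = []
    for pos in index.get(read[:k], []):       # seed: exact match of first k letters
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):           # read would run off the reference end
            continue
        mismatches = sum(1 for x, y in zip(read, window) if x != y)
        if mismatches <= max_mismatches:      # verify: tolerate a few differences
            hits.append((pos, mismatches))
    return hits


reference = "TTGACCGTAAGCTTGACCA"   # toy reference, not real data
kmers = build_kmer_index(reference, k=4)
exact = map_read("GACCGTAA", reference, kmers, k=4)    # perfect match at position 2
inexact = map_read("CGTAAGGT", reference, kmers, k=4)  # one mismatch at position 5
```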
#RNA-Seq |
#reads alignment |
#gene expression |
Multiple Sequence Alignment [MSA] -
definition
#sequence alignment |
#evolutionary insights |
#SNP (polymorphism) |
- MSA for Evolutionary Insights [MSA of Proteins] -
definition
#MSA |
#sequence alignment |
#sequence similarity |
- MSA for SNP [MSA for Single-Nucleotide Polymorphism] -
definition
#MSA |
#sequence alignment |
#nucleotide variant |
Multiplexing [Multiplexed Sequencing] -
definition
#RNA-Seq |
#sequencing technologies |
#optimization |
BW tutorial ⤴ |
N
Nucleotide -
definition
#DNA |
#RNA |
#monomer |
O
P
Q
Quality Control [of RNA-Seq data] -
definition
#RNA-Seq |
#read |
#FastQC |
R
Read -
is a text string, written in 1-letter-per-nucleotide notation, for a single fragment of a sequence [DNA/RNA] obtained from a sequencing experiment. Due to the huge size of genetic material (e.g., ~250 million nucleotides in the largest human chromosome), a typical sequencing experiment requires fragmentation. The collection of selected/filtered fragments of unknown sequence forms the library. During sequencing, the library fragments are converted to text, yielding a bulk set of reads (i.e., text strings specifying the order of nucleotides in the fragments).
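The relation between a long sequence and its reads can be sketched as follows. This toy example cuts fixed-length, regularly spaced fragments; the function name `fragment` and its parameters are assumptions for illustration, and real library preparation produces fragments at random positions with variable lengths.

```python
def fragment(sequence, read_length, step):
    """Cut a sequence into overlapping fixed-length fragments (toy 'reads')."""
    return [sequence[i:i + read_length]
            for i in range(0, len(sequence) - read_length + 1, step)]


# A 10-letter toy sequence split into reads of length 4, starting every 3 letters.
reads = fragment("ACGTACGTAC", read_length=4, step=3)
# reads == ["ACGT", "TACG", "GTAC"]
```

Because the step is smaller than the read length, neighbouring reads overlap, and this overlap is what later allows the fragments to be assembled back together.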
#RNA-Seq |
#sequencing technologies |
#quality control |
#assembly |
BW tutorial ⤴ |
Wikipedia ⤴ |
RNA -
definition
#DNA |
#transcriptome |
#RNA-Seq |
RNA-Seq [RNA Sequencing] -
definition
#gene expression |
#transcriptome assembly |
#NGS (sequencing) |
BW tutorial ⤴ |
Wikipedia ⤴ |
- RNA-Seq Library -
definition
#RNA-Seq |
#read |
#sequencing technologies |
S
Scaffold -
definition
#contig |
#read |
#RNA-Seq |
Sequence -
definition
#biopolymers |
#sequence alignment |
#RNA - DNA - proteins |
Sequencing Technologies -
definition
#NGS |
#RNA-Seq |
#assembly |
Soft-Clip [of reads] -
means ignoring the terminal fragments (ends) of reads that do not match the reference genome in the alignment. This procedure enables higher mapping efficiency and facilitates the detection of structural variants. However, it also carries the risk of trimming reads incorrectly, leading to the misassignment of reads to repetitive regions. [Learn more ⤴]
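In SAM/BAM alignment files, soft-clipped ends are recorded with the `S` operation in the CIGAR string (for example, `5S90M5S` means 5 clipped bases, 90 aligned bases, 5 clipped bases). The sketch below extracts the clipped lengths and the aligned portion of a read; the helper names `soft_clip_lengths` and `aligned_part` are assumptions for this example, and the read and CIGAR strings are made up.

```python
import re


def soft_clip_lengths(cigar):
    """Return (left, right) soft-clipped lengths encoded in a CIGAR string."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    # Hard clips (H) may sit outside the soft clips; drop them first.
    core = [(int(n), op) for n, op in ops if op != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


def aligned_part(read, cigar):
    """Return the read with its soft-clipped ends removed."""
    left, right = soft_clip_lengths(cigar)
    return read[left:len(read) - right]


clips = soft_clip_lengths("5S90M5S")               # (5, 5)
core = aligned_part("AAAAACGTACGTTT", "5S6M3S")    # "CGTACG"
```

Soft-clipped bases stay in the stored read sequence; they are only excluded from the alignment itself, which distinguishes soft clipping from trimming the read before mapping.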
#RNA-Seq |
#reads mapping |
#reads trimming |
T
Transcriptome -
definition
#RNA |
#gene expression |
#alternative splicing |
Transcriptome Assembly -
definition
#RNA-Seq |
#gene expression |
- de novo Assembly [de novo Transcriptome Assembly] -
definition
#transcriptome |
#no reference genome |
#gene expression |
- Genome-Guided Assembly [Genome-Guided Alignment/Transcriptome Assembly] -
in RNA-Seq analysis, is the alignment of sequencing reads to a reference genome, i.e., detecting the presence and location (position) of individual reads (fragments of a sequence obtained from a sequencing experiment) within the sequence space of the reference genome. In particular, sequenced RNA can be aligned to the reference genome to identify genes and obtain information about gene expression.
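Once reads have been placed on the reference genome, a first estimate of gene expression is the number of reads overlapping each gene. The sketch below assumes that read start positions are already known and that gene coordinates are given as half-open intervals; the function name `count_reads_per_gene`, the gene names, and all coordinates are hypothetical.

```python
def count_reads_per_gene(read_starts, read_length, genes):
    """Count reads overlapping each gene; intervals are half-open [start, end)."""
    counts = {name: 0 for name in genes}
    for start in read_starts:
        end = start + read_length
        for name, (g_start, g_end) in genes.items():
            if start < g_end and end > g_start:   # the two intervals overlap
                counts[name] += 1
    return counts


# Hypothetical annotation and aligned read positions (not real data).
genes = {"geneA": (0, 100), "geneB": (150, 300)}
counts = count_reads_per_gene([10, 95, 120, 160, 290], read_length=20, genes=genes)
# counts == {"geneA": 2, "geneB": 2}; the read starting at 120 falls between genes
```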
#RNA-Seq |
#reads mapping |
#gene expression |
#reference genome |
BW tutorial ⤴ |
Wikipedia ⤴ |