Trinotate can be used to annotate the transcripts. The files used in this example are as follows:
trinity.fasta
longest_orfs.pep
First, download the SwissProt and Pfam databases and index them:
wget https://data.broadinstitute.org/Trinity/Trinotate_v3_RESOURCES/uniprot_sprot.pep.gz
wget https://data.broadinstitute.org/Trinity/Trinotate_v3_RESOURCES/Pfam-A.hmm.gz
gunzip uniprot_sprot.pep.gz
gunzip Pfam-A.hmm.gz
makeblastdb -in uniprot_sprot.pep -dbtype prot
hmmpress Pfam-A.hmm
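Before starting the searches, it may help to confirm that both indexing steps finished. The sketch below defines a hypothetical helper, `check_indexes` (not part of Trinotate, BLAST+ or HMMER), which looks for the index files that `makeblastdb` and `hmmpress` normally write next to their inputs. The file list assumes a single-volume protein database and may need adjusting for other BLAST+ versions.

```shell
# Hypothetical helper: report index files that are missing or empty.
# $1: directory holding the databases (defaults to the current directory).
# Returns 0 when every expected file is present, 1 otherwise.
check_indexes() {
    ci_dir="${1:-.}"
    ci_missing=0
    # .phr/.pin/.psq come from makeblastdb (protein); .h3* come from hmmpress
    for ci_file in uniprot_sprot.pep.phr uniprot_sprot.pep.pin uniprot_sprot.pep.psq \
                   Pfam-A.hmm.h3f Pfam-A.hmm.h3i Pfam-A.hmm.h3m Pfam-A.hmm.h3p; do
        if [ ! -s "$ci_dir/$ci_file" ]; then
            echo "missing or empty: $ci_dir/$ci_file" >&2
            ci_missing=1
        fi
    done
    return "$ci_missing"
}

# Usage:
#   check_indexes . && echo "indexes look complete"
```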
Next, database searches and predictions were carried out:
If you have not yet run TransDecoder on your trinity.fasta, you can run it as follows:
module load transdecoder
TransDecoder.LongOrfs -m 10 -t trinity.fasta
# longest_orfs.pep (written to trinity.fasta.transdecoder_dir/) is needed for the next steps
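A quick sanity check on the TransDecoder output is to count the predicted ORFs. The sketch below uses a hypothetical helper, `count_fasta_records`, which counts FASTA header lines. It assumes every record starts with a single `>` line, as in standard FASTA.

```shell
# Hypothetical helper: count the records in a FASTA file.
# $1: path to the FASTA file.
count_fasta_records() {
    # grep -c prints the number of header lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function from failing.
    grep -c '^>' "$1" || true
}

# Usage:
#   count_fasta_records longest_orfs.pep
```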
blastx -query trinity.fasta \
-db uniprot_sprot.pep \
-num_threads 8 \
-max_target_seqs 1 \
-outfmt 6 > blastx.outfmt6
blastp -query longest_orfs.pep \
-db uniprot_sprot.pep \
-num_threads 8 \
-max_target_seqs 1 \
-outfmt 6 > blastp.outfmt6
hmmscan --cpu 8 \
--domtblout TrinotatePFAM.out \
Pfam-A.hmm longest_orfs.pep > pfam.log
signalp -f short \
-n signalp.out longest_orfs.pep
tmhmm --short < longest_orfs.pep > tmhmm.out
RnammerTranscriptome.pl --transcriptome trinity.fasta \
--path_to_rnammer /usr/bin/software/rnammer_v1.2/rnammer
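Once the BLAST searches finish, it can be useful to see how many queries received a hit before loading anything. The sketch below relies on the standard `-outfmt 6` layout, where column 1 is the query ID and column 11 is the e-value. The helper name `count_queries_with_hits` and the default cutoff of 1e-5 are illustrative assumptions, not Trinotate settings.

```shell
# Hypothetical helper: count distinct queries with at least one hit
# at or below an e-value cutoff in a BLAST tabular (-outfmt 6) file.
# $1: BLAST tabular file; $2: maximum e-value (default 1e-5, illustrative).
count_queries_with_hits() {
    awk -F'\t' -v max="${2:-1e-5}" '
        # Adding 0 forces a numeric comparison of the e-value column
        $11 + 0 <= max + 0 { seen[$1] = 1 }
        END { n = 0; for (q in seen) n++; print n }
    ' "$1"
}

# Usage:
#   count_queries_with_hits blastx.outfmt6
#   count_queries_with_hits blastp.outfmt6 1e-10
```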
The Trinotate SQLite database was then updated with the new predictions:
wget "https://data.broadinstitute.org/Trinity/Trinotate_v3_RESOURCES/Trinotate_v3.sqlite.gz" -O Trinotate.sqlite.gz
gunzip Trinotate.sqlite.gz
get_Trinity_gene_to_trans_map.pl trinity.fasta > Trinity.fasta.gene_trans_map
Trinotate Trinotate.sqlite init \
--gene_trans_map Trinity.fasta.gene_trans_map \
--transcript_fasta trinity.fasta \
--transdecoder_pep longest_orfs.pep
Trinotate Trinotate.sqlite LOAD_swissprot_blastp blastp.outfmt6
Trinotate Trinotate.sqlite LOAD_swissprot_blastx blastx.outfmt6
Trinotate Trinotate.sqlite LOAD_pfam TrinotatePFAM.out
Trinotate Trinotate.sqlite LOAD_tmhmm tmhmm.out
Trinotate Trinotate.sqlite LOAD_signalp signalp.out
Finally, the report was generated as follows:
Trinotate Trinotate.sqlite report > trinotate_annotation_report.xls
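Despite the `.xls` extension, the report is a tab-delimited text file, so it can be summarised with standard tools. The sketch below counts rows that carry a value in a given column. It assumes that empty fields are written as `.` and that the header contains a column such as `sprot_Top_BLASTX_hit`; both are assumptions about the report layout, so check them against the first line of your own file. The helper name `count_annotated` is hypothetical.

```shell
# Hypothetical helper: count report rows with a value in a named column.
# $1: tab-delimited report; $2: column header to inspect.
# Assumes missing values are written as "." (check your own report).
count_annotated() {
    awk -F'\t' -v name="$2" '
        NR == 1 {
            # Locate the requested column by its header text
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            if (!col) { print "column not found: " name > "/dev/stderr"; exit 1 }
            next
        }
        $col != "." { n++ }
        END { if (col) print n + 0 }
    ' "$1"
}

# Usage:
#   head -n 1 trinotate_annotation_report.xls | tr '\t' '\n'   # list the column names
#   count_annotated trinotate_annotation_report.xls sprot_Top_BLASTX_hit
```

The report can contain more than one row per transcript, so the result is a row count rather than a transcript count.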
Using the above report, you can extract GO assignments for your sequences as follows:
${TRINOTATE_HOME}/util/extract_GO_assignments_from_Trinotate_xls.pl \
--Trinotate_xls trinotate_annotation_report.xls \
-G --include_ancestral_terms \
> go_annotations.txt
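To get a first impression of the GO assignments, the most frequent terms can be tallied. The sketch below assumes each line of `go_annotations.txt` holds an identifier, a tab, and a comma-separated list of GO IDs; verify that layout in your own output before relying on it. The helper name `top_go_terms` is hypothetical.

```shell
# Hypothetical helper: list the most frequent GO terms in an assignment file.
# $1: GO assignment file (assumed format: id<TAB>GO:...,GO:...)
# $2: number of terms to print (default 10).
top_go_terms() {
    cut -f2 "$1" |
        tr ',' '\n' |               # one GO ID per line
        grep '^GO:' |               # drop anything that is not a GO ID
        sort | uniq -c |            # count occurrences of each term
        sort -k1,1nr -k2,2 |        # most frequent first, ties by GO ID
        head -n "${2:-10}"
}

# Usage:
#   top_go_terms go_annotations.txt 20
```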
Trinotate does not yet have its own paper, but citing the following is recommended: