Running Trinotate to annotate transcripts:

Trinotate can be used to annotate the transcripts. The files used in this example are as follows:

  1. Input FASTA file: trinity.fasta
  2. Databases:
    • uniprot_sprot.pep.gz
    • Pfam-A.hmm.gz
  3. TransDecoder output: longest_orfs.pep

Database downloads

wget https://data.broadinstitute.org/Trinity/Trinotate_v3_RESOURCES/uniprot_sprot.pep.gz
wget https://data.broadinstitute.org/Trinity/Trinotate_v3_RESOURCES/Pfam-A.hmm.gz
gunzip uniprot_sprot.pep.gz
gunzip Pfam-A.hmm.gz
makeblastdb -in uniprot_sprot.pep -dbtype prot
hmmpress Pfam-A.hmm
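
Before moving on, it can help to confirm that both indexing steps actually wrote their files. The sketch below assumes the usual extensions for a protein BLAST database (.phr, .pin, .psq) and the four binary files that hmmpress writes (.h3f, .h3i, .h3m, .h3p). The helper name check_index_files is hypothetical and not part of any of these tools.

```shell
# Hypothetical helper: report any expected index file that is missing or empty.
# Usage: check_index_files <prefix> <ext> [<ext> ...]
check_index_files() {
  local prefix=$1
  shift
  local missing=0
  local ext
  for ext in "$@"; do
    if [ ! -s "${prefix}.${ext}" ]; then
      echo "missing or empty: ${prefix}.${ext}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Only run the checks when the databases are in the working directory.
if [ -f uniprot_sprot.pep ]; then
  check_index_files uniprot_sprot.pep phr pin psq && echo "BLAST database looks complete"
fi
if [ -f Pfam-A.hmm ]; then
  check_index_files Pfam-A.hmm h3f h3i h3m h3p && echo "Pfam HMM database looks complete"
fi
```

If a file is reported missing, rerunning makeblastdb or hmmpress is usually quicker than debugging a failed search later.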

Next, run the database searches and predictions. If you have not yet run TransDecoder on your trinity.fasta, run it first as follows:

TransDecoder

module load transdecoder
TransDecoder.LongOrfs -m 10 -t trinity.fasta
# you will need `longest_orfs.pep` for next steps
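
TransDecoder normally writes its results into a directory named after the input file, so the peptide file may need to be copied into the working directory before the searches. The sketch below assumes that directory is trinity.fasta.transdecoder_dir (this can differ between TransDecoder versions) and uses a hypothetical helper, count_fasta_records, to report how many ORFs were predicted.

```shell
# Hypothetical helper: count FASTA records by counting header lines.
count_fasta_records() {
  grep -c '^>' "$1"
}

# Assumed output directory name; adjust if your TransDecoder version differs.
orf_dir=trinity.fasta.transdecoder_dir

if [ -f "${orf_dir}/longest_orfs.pep" ]; then
  cp "${orf_dir}/longest_orfs.pep" .
  echo "ORFs predicted: $(count_fasta_records longest_orfs.pep)"
fi
```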

Searches

blastx -query trinity.fasta \
  -db uniprot_sprot.pep \
  -num_threads 8 \
  -max_target_seqs 1 \
  -outfmt 6 > blastx.outfmt6
  
blastp -query longest_orfs.pep \
  -db uniprot_sprot.pep \
  -num_threads 8 \
  -max_target_seqs 1 \
  -outfmt 6 > blastp.outfmt6
  
hmmscan --cpu 8 \
  --domtblout TrinotatePFAM.out \
  Pfam-A.hmm longest_orfs.pep > pfam.log
  
signalp -f short \
  -n signalp.out longest_orfs.pep
  
tmhmm --short < longest_orfs.pep > tmhmm.out

RnammerTranscriptome.pl --transcriptome trinity.fasta \
  --path_to_rnammer /usr/bin/software/rnammer_v1.2/rnammer
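
A quick way to check the BLAST results before loading them is to count how many queries received a hit. With -outfmt 6 the default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore, so the query ID is column 1 and the e-value is column 11. The sketch below counts distinct queries at or below an e-value cutoff; the helper name and the default cutoff of 1e-5 are assumptions for illustration, not values taken from this workflow.

```shell
# Hypothetical helper: count distinct query IDs (column 1) with at least one
# hit whose e-value (column 11) is at or below the cutoff (default 1e-5).
count_queries_with_hits() {
  local file=$1
  local max_evalue=${2:-1e-5}
  awk -F'\t' -v max="$max_evalue" '
    ($11 + 0) <= (max + 0) { seen[$1] = 1 }
    END { n = 0; for (q in seen) n++; print n }
  ' "$file"
}

# Summarise whichever result files are present.
for f in blastx.outfmt6 blastp.outfmt6; do
  if [ -f "$f" ]; then
    echo "$f: $(count_queries_with_hits "$f") queries with a hit at e-value <= 1e-5"
  fi
done
```

Counting distinct query IDs rather than lines matters because a single query can still produce several lines (one per alignment) even with -max_target_seqs 1.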

Loading results

The Trinotate SQLite database is then initialized and populated with the search results and predictions:

wget "https://data.broadinstitute.org/Trinity/Trinotate_v3_RESOURCES/Trinotate_v3.sqlite.gz" -O Trinotate.sqlite.gz

gunzip Trinotate.sqlite.gz

get_Trinity_gene_to_trans_map.pl trinity.fasta >  Trinity.fasta.gene_trans_map

Trinotate Trinotate.sqlite init \
  --gene_trans_map Trinity.fasta.gene_trans_map \
  --transcript_fasta trinity.fasta \
  --transdecoder_pep longest_orfs.pep
  
Trinotate Trinotate.sqlite LOAD_swissprot_blastp blastp.outfmt6
Trinotate Trinotate.sqlite LOAD_swissprot_blastx blastx.outfmt6
Trinotate Trinotate.sqlite LOAD_pfam TrinotatePFAM.out
Trinotate Trinotate.sqlite LOAD_tmhmm tmhmm.out
Trinotate Trinotate.sqlite LOAD_signalp signalp.out

Finally, generate the annotation report as follows:

Report:

Trinotate Trinotate.sqlite report > trinotate_annotation_report.xls
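
Despite the .xls extension, the report is a tab-delimited text file, so it can be inspected with standard command-line tools. The sketch below counts how many rows carry an annotation in a given column, locating the column by its header name rather than its position. It assumes that the header line starts with "#gene_id", that empty results are written as ".", and that a column named sprot_Top_BLASTX_hit exists; check the header of your own report, since column names can vary between Trinotate versions. The helper name count_annotated is hypothetical.

```shell
# Hypothetical helper: count rows whose named column holds a real value,
# treating "." and the empty string as "no result".
count_annotated() {
  local report=$1
  local column=$2
  awk -F'\t' -v col="$column" '
    NR == 1 {
      sub(/^#/, "", $1)                      # header is assumed to start with "#gene_id"
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 2 }
      next
    }
    $idx != "." && $idx != "" { n++ }
    END { if (idx) print n + 0 }
  ' "$report"
}

if [ -f trinotate_annotation_report.xls ]; then
  echo "rows with a BLASTX hit: $(count_annotated trinotate_annotation_report.xls sprot_Top_BLASTX_hit)"
fi
```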

GO Annotation (optional)

Using the above report, you can assign GO terms to your sequences as follows:

${TRINOTATE_HOME}/util/extract_GO_assignments_from_Trinotate_xls.pl  \
  --Trinotate_xls trinotate_annotation_report.xls \
  -G --include_ancestral_terms \
  > go_annotations.txt
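
To get a quick sense of the result, the number of distinct GO terms can be counted. This sketch assumes go_annotations.txt has two tab-separated columns, an identifier followed by a comma-separated list of GO IDs; verify this against your own file before relying on it. The helper name count_distinct_go_terms is hypothetical.

```shell
# Hypothetical helper: count distinct GO IDs in column 2, where the IDs are
# assumed to be comma-separated (e.g. "GO:0000001,GO:0000002").
count_distinct_go_terms() {
  cut -f2 "$1" | tr ',' '\n' | grep '^GO:' | sort -u | wc -l | tr -d ' '
}

if [ -f go_annotations.txt ]; then
  echo "annotated identifiers: $(wc -l < go_annotations.txt | tr -d ' ')"
  echo "distinct GO terms: $(count_distinct_go_terms go_annotations.txt)"
fi
```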

More information

Trinotate does not yet have its own paper, but it is recommended that you cite the following: