Predict miRNA and mRNA interactions with miranda software
Install
1
2
3
4
5
| /work/gif/remkv6/USDA/06_miranda
ml miniconda3
conda create -n miranda
source activate miranda
conda install -c bioconda miranda
|
Download resources – miRNAs and targets
Downloaded all ascaris suum miRNAs from here
1
2
| https://www.mirbase.org/summary.shtml?org=asu
vi AscarissuumMirnasMature.fasta
|
Since miRNAs primarily affect expression through binding of the 3’ UTR of transcripts, we need to isolate those sequences
Downloaded ascaris suum gff3 and genome from here
1
2
3
4
| wget ftp://ftp.ebi.ac.uk/pub/databases/wormbase/parasite/releases/WBPS16/species/ascaris_suum/PRJNA62057/ascaris_suum.PRJNA62057.WBPS16.genomic.fa.gz
wget ftp://ftp.ebi.ac.uk/pub/databases/wormbase/parasite/releases/WBPS16/species/ascaris_suum/PRJNA62057/ascaris_suum.PRJNA
62057.WBPS16.annotations.gff3.gz
|
1
2
3
4
5
| awk '$3=="three_prime_UTR"' ascaris_suum.PRJNA62057.WBPS16.annotations.gff3 >UTRs.gff3
#bedtools doesn't use the Parent= name, so make it a tab output, paste with gff3 names, and convert it to fasta. Make sure you set -s to get the strand information correct. I also cut UTRs that were 10bp or shorter.
ml bedtools2
bedtools getfasta -fi ascaris_suum.PRJNA62057.WBPS16.genomic.fa -bed UTRs.gff3 -s -tab |paste <(cut -f 9 UTRs.gff3|sed 's/Parent=transcript://g' ) - |awk 'length($3)>10' |cut -f 1,3 |sed 's/\t/#/1' |sed 's/^/>/g' |tr "#" "\n " >NamedUTRs.fasta```
|
Run miranda in a loop, otherwise run times are much longer
1
2
3
| fasta-splitter.pl --n-parts 36 AscarissuumMirnasMature.fasta
for f in *part*; do echo "miranda "$f" NamedUTRs.fasta >"${f%.*}".out &";done >runMiranda.sh
sh runMiranda.sh &
|
Filter Miranda Output
With my ~200 miRNAs and my 75k UTRs, I got ~1.5 million putative interactions. I filter based on the highest scores (sort), and arbitrarily cut off at highest scoring 50k interactions.
1
| cat *out |grep ">>" |sort -k5,5nr |awk 'NR<50000' |sed 's/>>//g' |cat <(echo "Seq1,Seq2,Tot Score,Tot Energy,Max Score,Max Energy,Strand,Len1,Len2,Positions" |tr "," "\t") - >MirandaOutput.tab
|
Interpretation of Miranda output
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
|
One or more miRNA sequences from file1 are scanned against
all sequences in file2 and potential target sites are
reported. Potential target sites are identified using a
two-step strategy. First a dynamic programming local
alignment is carried out between the query miRNA sequence
and the reference sequence. This alignment procedure
scores based on sequence complementarity and not on
sequence identity. In other words we look for A:U and G:C
matches instead of A:A, G:G, etc. The G:U wobble bair is
also permitted, but generally scores less than the more
optimal matches. Here is an example alignment:
Query: 3' GTCGAAAGTTTTACTAGAGTG 5' (eg. miRNA)
:||:||||| ||||||||||
Ref: 5' TAGTTTTCACAATGATCTCGG 3' (eg. 3'UTR)
The second phase of the algorithm takes high-scoring
alignments (Those above a score threshold, defined by -sc)
detected from phase 1 and estimates the thermodynamic sta�
bility of RNA duplexes based on these alignments. This
second phase of the method utilizes folding routines from
the RNAlib library, which is part of the ViennaRNA package
written by Ivo Hofacker. At this stage we generate a con�
strained fictional single-stranded RNA composed of the
query sequence, a linker and the reference sequence
(reversed). This structure then folded using RNAlib and
Optionally some rudimentary statistics about each target
site can be generated by performing a number of alignments
using shuffled reference sequences (see -shuffled option).
A distribution is built from these data and statistical
parameters from this distribution are used to produce a Z-
Score for a detected target site.
|
Final output of Miranda Ascaris suum miRNAS vs Ascaris suum 3’ UTR sequences
Download Miranda Output (tab file)
Back to the Assembly and Annotation Index page