Predict miRNA and mRNA interactions with miranda software

Install

1
2
3
4
5
/work/gif/remkv6/USDA/06_miranda
ml miniconda3
conda create -n miranda
source activate miranda
conda install -c bioconda miranda

Download resources – miRNAs and targets

Downloaded all ascaris suum miRNAs from here

1
2
https://www.mirbase.org/summary.shtml?org=asu
vi AscarissuumMirnasMature.fasta

Since miRNAs primarily affect expression through binding of the 3’ UTR of transcripts, we need to isolate those sequences Downloaded ascaris suum gff3 and genome from here

1
2
3
4
wget ftp://ftp.ebi.ac.uk/pub/databases/wormbase/parasite/releases/WBPS16/species/ascaris_suum/PRJNA62057/ascaris_suum.PRJNA62057.WBPS16.genomic.fa.gz

wget ftp://ftp.ebi.ac.uk/pub/databases/wormbase/parasite/releases/WBPS16/species/ascaris_suum/PRJNA62057/ascaris_suum.PRJNA
62057.WBPS16.annotations.gff3.gz

Extract the UTR sequences

1
2
3
4
5
awk '$3=="three_prime_UTR"' ascaris_suum.PRJNA62057.WBPS16.annotations.gff3 >UTRs.gff3

#bedtools doesn't use the Parent= name, so make it a tab output, paste with gff3 names, and convert it to fasta.  Make sure you set -s to get the strand information correct.  I also cut UTRs that were 10bp or shorter.  
ml bedtools2
bedtools getfasta -fi ascaris_suum.PRJNA62057.WBPS16.genomic.fa -bed UTRs.gff3 -s -tab |paste <(cut -f 9 UTRs.gff3|sed 's/Parent=transcript://g' ) - |awk 'length($3)>10' |cut -f 1,3 |sed 's/\t/#/1' |sed 's/^/>/g' |tr "#" "\n " >NamedUTRs.fasta```

Run miranda in a loop, otherwise run times are much longer

1
2
3
fasta-splitter.pl --n-parts 36 AscarissuumMirnasMature.fasta
for f in *part*; do echo "miranda "$f" NamedUTRs.fasta >"${f%.*}".out &";done >runMiranda.sh
sh runMiranda.sh &

Filter Miranda Output

With my ~200 miRNAs and my 75k UTRs, I got ~1.5 million putative interactions. I filter based on the highest scores (sort), and arbitrarily cut off at highest scoring 50k interactions.

1
cat *out |grep ">>" |sort -k5,5nr |awk 'NR<50000' |sed 's/>>//g' |cat <(echo "Seq1,Seq2,Tot Score,Tot Energy,Max Score,Max Energy,Strand,Len1,Len2,Positions" |tr "," "\t") - >MirandaOutput.tab

Interpretation of Miranda output

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
       One or more miRNA sequences from file1 are scanned against
       all sequences in file2  and  potential  target  sites  are
       reported.  Potential  target  sites are identified using a
       two-step strategy.   First  a  dynamic  programming  local
       alignment  is carried out between the query miRNA sequence
       and  the  reference  sequence.  This  alignment  procedure
       scores  based  on  sequence  complementarity  and  not  on
       sequence identity.  In other words we look for A:U and G:C
       matches  instead of A:A, G:G, etc.  The G:U wobble bair is
       also permitted, but generally scores less  than  the  more
       optimal matches. Here is an example alignment:

           Query:  3' GTCGAAAGTTTTACTAGAGTG 5' (eg. miRNA)
                      :||:||||| ||||||||||
           Ref:    5' TAGTTTTCACAATGATCTCGG 3' (eg. 3'UTR)

       The  second  phase  of  the  algorithm  takes high-scoring
       alignments (Those above a score threshold, defined by -sc)
       detected from phase 1 and estimates the thermodynamic sta�
       bility of RNA duplexes based on  these  alignments.   This
       second  phase of the method utilizes folding routines from
       the RNAlib library, which is part of the ViennaRNA package
       written  by Ivo Hofacker. At this stage we generate a con�
       strained fictional single-stranded  RNA  composed  of  the
       query  sequence,  a  linker  and  the  reference  sequence
       (reversed). This structure then folded  using  RNAlib  and
       Optionally  some  rudimentary statistics about each target
       site can be generated by performing a number of alignments
       using shuffled reference sequences (see -shuffled option).
       A distribution is built from these  data  and  statistical
       parameters from this distribution are used to produce a Z-
       Score for a detected target site.

Final output of Miranda Ascaris suum miRNAS vs Ascaris suum 3’ UTR sequences

Download Miranda Output (tab file)

Back to the Assembly and Annotation Index page