The SRA Toolkit has been configured to connect to NCBI SRA and download via FTP. To fetch an SRA run and split it into forward and reverse reads, use:

module load sratoolkit
fastq-dump --split-files --origfmt --gzip SRR1234567

You will see two files, SRR1234567_1.fastq.gz and SRR1234567_2.fastq.gz (gzip-compressed because of --gzip), downloaded directly from NCBI. If the download is larger than 1 GB, submit the command within a PBS script.

If this doesn't work for you (or is too slow because of FTP), you can try Aspera, which is much faster (very useful if you have a large number of files to download):

# Arguments after the options: the remote path of the file, then the save location (here, the current directory).
# Comments cannot follow a trailing backslash, so they are kept up here.
ascp -i "$KEY/asperaweb_id_dsa.openssh" -k 1 -QT -l 200m \
anonftp@ftp-trace.ncbi.nlm.nih.gov:/sra/sra-instant/reads/ByRun/sra/SRR/SRR390/SRR390728/SRR390728.sra \
./
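
The remote path in the ascp command appears to follow a fixed pattern: the first three characters of the accession, then the first six, then the full accession. Below is a minimal sketch that builds such a path, assuming this layout (inferred only from the single example here) also holds for other runs. `sra_remote_path` is a hypothetical helper name, not part of Aspera or the SRA Toolkit.

```shell
# Hypothetical helper: build the remote .sra path for a run accession.
# Assumes the ByRun layout seen in the example: <3 chars>/<6 chars>/<accession>/<accession>.sra
sra_remote_path() {
    acc="$1"
    prefix3=$(printf '%.3s' "$acc")   # e.g. SRR
    prefix6=$(printf '%.6s' "$acc")   # e.g. SRR390
    printf '/sra/sra-instant/reads/ByRun/sra/%s/%s/%s/%s.sra\n' \
        "$prefix3" "$prefix6" "$acc" "$acc"
}
```

For example, `sra_remote_path SRR390728` prints the same path that is used in the ascp command.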

If you have a large number of files to download (usually organized as a project with an SRP project number), you can save all the run IDs in a file, one per line, and loop through the lines.
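
A loop over such an ID file might look like the sketch below. The function name `fetch_runs` and the `FASTQ_DUMP` override variable are hypothetical; the fastq-dump flags are the ones used at the top of this page. Blank lines and lines starting with `#` in the ID file are skipped.

```shell
# Hypothetical override so a different fastq-dump binary (or a dry-run
# command such as "echo") can be substituted.
FASTQ_DUMP="${FASTQ_DUMP:-fastq-dump}"

# fetch_runs FILE: download every run accession listed in FILE (one per line).
fetch_runs() {
    # Drop blank lines and comments, and strip Windows carriage returns.
    grep -v -e '^[[:space:]]*$' -e '^#' "$1" | tr -d '\r' |
    while IFS= read -r id; do
        echo "Fetching $id" >&2
        # </dev/null stops the command from consuming the ID list on stdin.
        # A failed accession is reported, and the loop continues.
        "$FASTQ_DUMP" --split-files --origfmt --gzip "$id" < /dev/null \
            || echo "FAILED: $id" >&2
    done
}
```

Call it as `fetch_runs ids.txt`, where `ids.txt` is whatever file holds your accessions. Setting `FASTQ_DUMP=echo` before the call prints the commands instead of running them.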

Finally, if you want to download using wget (which will be very slow), you can use this template:

# Quote the URL; otherwise the shell treats each "&" as a background operator.
wget "http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=dload&run_list=SRRXXXXXX&format=fastq"

If you have a FASTQ file that was previously downloaded without the --origfmt option, the first field of each header (before the instrument name) is the SRA ID. Although most programs won't mind, some of them might throw an error, for example [Khmer](http://khmer.readthedocs.org/en/v1.0).

head -n 12 SRR447882_1.fastq
@SRR447882.1.1 HWI-EAS313_0001:7:1:6:844 length=84
ATTGATCATCGACCAGAGNCTCATACACCTCACCCCACATATGTTTCCTTGCCATAGATCACATTCTTGNNNNNNNGGTGGANA
+SRR447882.1.1 HWI-EAS313_0001:7:1:6:844 length=84
BBBBBB;BB?;>7;?<?B#AA3@CBAA?@BAA@)=6ABBBBB?ACA;0A=257?A7+;;&########################
@SRR447882.2.1 HWI-EAS313_0001:7:1:6:730 length=84
AGTTGATTGTGATATAGGNGTCTATCGACATTGATGCATAGGTCCTCTATTAAACTTGTTTTGTGATGTNNNNNNNTTTTTTNA
+SRR447882.2.1 HWI-EAS313_0001:7:1:6:730 length=84
A?@B:@CA:=?BCBC:2C#7>BACB??@4@B@<=>;'>@>3:86>=6@=B@B<;)@@###########################
@SRR447882.3.1 HWI-EAS313_0001:7:1:6:1343 length=84
CATCAATGCAAGGATTGTNCCATTGGTAACAATTCCACTCCTAACTTGTCAATTGATTTTCATATAACTNNNNNNNCCAAAANT
+SRR447882.3.1 HWI-EAS313_0001:7:1:6:1343 length=84
BCB@BBC+5BCA>BABBA#@4BCCA>?CBBB4CB(*ABB?ABBAACCB8ABBB?(<<B?:########################

To remove the SRA ID from the headers, use the script below:

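# Assumes standard 4-line FASTQ records. Working by line position avoids
# touching quality lines that happen to start with "@" or "+".
for f in SRR447882_[12]_paired.fq; do
  awk 'NR % 4 == 1 { sub(/^@[^ ]+ /, "@") }   # header: drop the leading SRA ID field
       NR % 4 == 3 { $0 = "+" }               # separator: keep a bare "+"
       { print }' "$f" > "$f.cleaned"
done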

Now the sequences should look like this:

@HWI-EAS313_0001:7:1:6:844 length=84
ATTGATCATCGACCAGAGNCTCATACACCTCACCCCACATATGTTTCCTTGCCATAGATCACATTCTTGNNNNNNNGGTGGANA
+
BBBBBB;BB?;>7;?<?B#AA3@CBAA?@BAA@)=6ABBBBB?ACA;0A=257?A7+;;&########################
@HWI-EAS313_0001:7:1:6:730 length=84
AGTTGATTGTGATATAGGNGTCTATCGACATTGATGCATAGGTCCTCTATTAAACTTGTTTTGTGATGTNNNNNNNTTTTTTNA
+
A?@B:@CA:=?BCBC:2C#7>BACB??@4@B@<=>;'>@>3:86>=6@=B@B<;)@@###########################
@HWI-EAS313_0001:7:1:6:1343 length=84
CATCAATGCAAGGATTGTNCCATTGGTAACAATTCCACTCCTAACTTGTCAATTGATTTTCATATAACTNNNNNNNCCAAAANT
+
BCB@BBC+5BCA>BABBA#@4BCCA>?CBBB4CB(*ABB?ABBAACCB8ABBB?(<<B?:########################

Instead of cleaning the headers, you can also re-download the run using the --origfmt option, if that saves time.
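
To decide whether a file needs the header cleanup at all, one option is to count the header lines that still begin with an SRA run ID. This is a minimal sketch, assuming standard 4-line FASTQ records and run accessions of the form SRR, ERR or DRR followed by digits. `count_sra_headers` is a hypothetical helper name.

```shell
# Hypothetical helper: count FASTQ headers that start with an SRA run ID
# such as "@SRR447882.1.1". Only line 1 of each 4-line record is inspected.
count_sra_headers() {
    awk 'NR % 4 == 1 && /^@[SED]RR[0-9]+\./ { n++ } END { print n + 0 }' "$1"
}
```

A result of 0 for a cleaned file (or for one downloaded with --origfmt) suggests that no SRA-prefixed headers remain.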

Download all SRR files related to a project

If you have a large number of SRR files to download, check whether they belong to a specific project. E.g., project [[http://www.ncbi.nlm.nih.gov/Traces/sra/?study=SRP011907 SRP011907]] has 283 SRR files. You can use Aspera to download all 283 files at once.

Here are the steps:

  • Get the FTP link: go to [[http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=download_reads SRA download reads]] and look for the project ID (e.g., SRP011907 is located under reads --> ByStudy --> sra --> srp --> SRP011 --> SRP011907). Once you reach the project directory, clicking the “save icon” (next to the total size) will give the FTP link.
  • Now, download it using Aspera as explained before:
module load aspera/3.3.3.81344
ascp -i $KEY/asperaweb_id_dsa.openssh -k 1 -QT -l 200m \
  anonftp@ftp-trace.ncbi.nlm.nih.gov:/sra/sra-instant/reads/ByStudy/sra/SRP/SRP011/SRP011907 \
  ./Destination_dir
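
Once the project directory has been downloaded, the .sra files still need to be converted to FASTQ. The sketch below walks the destination directory and runs fastq-dump on each local .sra file, writing the output next to it. The function name `convert_sra_dir` and the `FASTQ_DUMP` override variable are hypothetical, and the sketch assumes file paths without embedded newlines.

```shell
# Hypothetical override so a different fastq-dump binary (or a dry-run
# command such as "echo") can be substituted.
FASTQ_DUMP="${FASTQ_DUMP:-fastq-dump}"

# convert_sra_dir DIR: convert every .sra file found under DIR to gzipped FASTQ.
convert_sra_dir() {
    find "$1" -type f -name '*.sra' | sort |
    while IFS= read -r sra; do
        echo "Converting $sra" >&2
        # Write the FASTQ files into the same directory as the .sra file.
        # </dev/null stops the command from consuming the file list on stdin.
        "$FASTQ_DUMP" --split-files --origfmt --gzip \
            --outdir "$(dirname "$sra")" "$sra" < /dev/null \
            || echo "FAILED: $sra" >&2
    done
}
```

For the download above, this would be called as `convert_sra_dir ./Destination_dir`. With 283 runs, it is probably best submitted within a PBS script, as suggested earlier for large downloads.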