The SRA toolkit has been configured to connect to NCBI SRA and download via FTP. To fetch an SRA file and split it into forward and reverse reads, use this command:
module load sratoolkit
fastq-dump --split-files --origfmt --gzip SRR1234567
For a paired-end run you will see two files, SRR1234567_1.fastq.gz and SRR1234567_2.fastq.gz.
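When several runs are needed, the same `fastq-dump` command can be generated once per accession. The sketch below is a dry run: it only prints the commands, using the options shown above. The function name `print_fastq_dump_cmds` and the second accession are hypothetical, made up for illustration.

```shell
# Hypothetical helper: read run accessions from stdin (one per line) and
# print one fastq-dump command per accession. Nothing is downloaded here.
print_fastq_dump_cmds() {
    while IFS= read -r acc || [ -n "$acc" ]; do
        # Skip blank lines in the accession list.
        [ -n "$acc" ] || continue
        printf 'fastq-dump --split-files --origfmt --gzip %s\n' "$acc"
    done
}

# Example with two placeholder accessions.
printf '%s\n' SRR1234567 SRR1234568 | print_fastq_dump_cmds
```

Once the printed commands look right, piping the output to `sh` would run them one after another.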
If this doesn’t work for you (or is too slow because of FTP), you can try Aspera:
# Enter this as one command; each trailing backslash continues the line.
# The long path is the remote location of the file; ./ is the save location.
ascp -i $KEY/asperaweb_id_dsa.openssh -k 1 -QT -l 200m \
    anonftp@ftp-trace.ncbi.nlm.nih.gov:/sra/sra-instant/reads/ByRun/sra/SRR/SRR390/SRR390728/SRR390728.sra \
    ./
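The remote path in the `ascp` command follows a visible pattern: the three-letter prefix of the accession, then its first six characters, then the full accession. The sketch below builds that path for another run, assuming the same directory layout applies. `sra_run_path` is a hypothetical helper name, not part of Aspera or the SRA toolkit, and it accepts only SRR accessions because that is the only case shown here.

```shell
# Hypothetical helper: print the ByRun remote path for an SRR accession,
# assuming the directory layout seen in the ascp example.
sra_run_path() {
    acc=$1
    digits=${acc#SRR}
    # Reject anything that is not "SRR" followed by digits only.
    case "$acc" in
        SRR[0-9]*) ;;
        *) echo "not an SRR accession: $acc" >&2; return 1 ;;
    esac
    case "$digits" in
        *[!0-9]*) echo "not an SRR accession: $acc" >&2; return 1 ;;
    esac
    prefix=$(printf '%s' "$acc" | cut -c1-3)   # e.g. SRR
    bucket=$(printf '%s' "$acc" | cut -c1-6)   # e.g. SRR390
    printf '/sra/sra-instant/reads/ByRun/sra/%s/%s/%s/%s.sra\n' \
        "$prefix" "$bucket" "$acc" "$acc"
}

sra_run_path SRR390728
```

The printed path can be appended to `anonftp@ftp-trace.ncbi.nlm.nih.gov:` to form the source argument for `ascp`.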
If you have a large number of files to download (usually organized as
Finally, if you want to download using
# Quote the URL so the shell does not treat each & as a background operator.
wget "http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=dload&run_list=SRRXXXXXX&format=fastq"
If you have a previously downloaded FASTQ file, without using the
head -n 12 SRR447882_1.fastq
@SRR447882.1.1 HWI-EAS313_0001:7:1:6:844 length=84
ATTGATCATCGACCAGAGNCTCATACACCTCACCCCACATATGTTTCCTTGCCATAGATCACATTCTTGNNNNNNNGGTGGANA
+SRR447882.1.1 HWI-EAS313_0001:7:1:6:844 length=84
BBBBBB;BB?;>7;?<?B#AA3@CBAA?@BAA@)=6ABBBBB?ACA;0A=257?A7+;;&########################
@SRR447882.2.1 HWI-EAS313_0001:7:1:6:730 length=84
AGTTGATTGTGATATAGGNGTCTATCGACATTGATGCATAGGTCCTCTATTAAACTTGTTTTGTGATGTNNNNNNNTTTTTTNA
+SRR447882.2.1 HWI-EAS313_0001:7:1:6:730 length=84
A?@B:@CA:=?BCBC:2C#7>BACB??@4@B@<=>;'>@>3:86>=6@=B@B<;)@@###########################
@SRR447882.3.1 HWI-EAS313_0001:7:1:6:1343 length=84
CATCAATGCAAGGATTGTNCCATTGGTAACAATTCCACTCCTAACTTGTCAATTGATTTTCATATAACTNNNNNNNCCAAAANT
+SRR447882.3.1 HWI-EAS313_0001:7:1:6:1343 length=84
BCB@BBC+5BCA>BABBA#@4BCCA>?CBBB4CB(*ABB?ABBAACCB8ABBB?(<<B?:########################
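To remove the accession prefix from the header lines, use the script below:
for f in SRR447882_[12]_paired.fq; do
    # Replace the first field of each header line with a bare "@",
    # rejoin it with the rest of the header, then reduce the "+" line to "+".
    awk '$1 ~ /^@SRR447882/ {$1="@"} {print}' "$f" |
        sed 's/^@ /@/' |
        sed 's/^+SRR447882.\+/+/' > "$f.cleaned"
done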
Now the reads should look like this:
@HWI-EAS313_0001:7:1:6:844 length=84
ATTGATCATCGACCAGAGNCTCATACACCTCACCCCACATATGTTTCCTTGCCATAGATCACATTCTTGNNNNNNNGGTGGANA
+
BBBBBB;BB?;>7;?<?B#AA3@CBAA?@BAA@)=6ABBBBB?ACA;0A=257?A7+;;&########################
@HWI-EAS313_0001:7:1:6:730 length=84
AGTTGATTGTGATATAGGNGTCTATCGACATTGATGCATAGGTCCTCTATTAAACTTGTTTTGTGATGTNNNNNNNTTTTTTNA
+
A?@B:@CA:=?BCBC:2C#7>BACB??@4@B@<=>;'>@>3:86>=6@=B@B<;)@@###########################
@HWI-EAS313_0001:7:1:6:1343 length=84
CATCAATGCAAGGATTGTNCCATTGGTAACAATTCCACTCCTAACTTGTCAATTGATTTTCATATAACTNNNNNNNCCAAAANT
+
BCB@BBC+5BCA>BABBA#@4BCCA>?CBBB4CB(*ABB?ABBAACCB8ABBB?(<<B?:########################
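The script above matches on the accession text, so it has to be edited for each run, and a pattern match could in principle also hit a quality line that begins with `@` or `+`. A sketch of an alternative, assuming every record spans exactly four lines (no wrapped sequences), rewrites lines by their position in the record instead. `strip_sra_prefix` is a hypothetical name, and the short sequence and quality strings in the example are made up.

```shell
# Hypothetical helper: read FASTQ on stdin, write cleaned FASTQ to stdout.
# Assumes strict four-line records.
strip_sra_prefix() {
    awk '
        # Line 1 of each record: drop a leading SRR/ERR/DRR token.
        NR % 4 == 1 { sub(/^@[SED]RR[0-9]+[^ ]* /, "@") }
        # Line 3 of each record: keep only the "+" separator.
        NR % 4 == 3 { $0 = "+" }
        { print }
    '
}

# Example record whose quality line starts with "@"; it is left untouched.
printf '%s\n' \
    '@SRR447882.1.1 HWI-EAS313_0001:7:1:6:844 length=84' \
    'ACGT' \
    '+SRR447882.1.1 HWI-EAS313_0001:7:1:6:844 length=84' \
    '@BB#' | strip_sra_prefix
```

Because headers that no longer carry an accession token are left alone, running the function on an already cleaned file should not change it.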
Instead of this, you can also re-download the original SRA file using
Download all SRR files related to a project
If you have a large number of SRR files to download, check whether they belong to a specific project. E.g., project [[http://www.ncbi.nlm.nih.gov/Traces/sra/?study=SRP011907 | SRP011907 ]] has 283 SRR files. You can use Aspera to download all 283 files at once.
Here are the steps:
- Get the FTP link: go to [[http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=download_reads | SRA download reads]] and look for the project ID (e.g., SRP011907 is located under reads --> ByStudy --> sra --> srp --> SRP011 --> SRP011907). Once you reach the directory, clicking the “save icon” (next to the total size) gives the FTP link.
- Now download it using Aspera as explained before.
module load aspera/3.3.3.81344
ascp -i $KEY/asperaweb_id_dsa.openssh -k 1 -QT -l 200m \
    anonftp@ftp-trace.ncbi.nlm.nih.gov:/sra/sra-instant/reads/ByStudy/sra/SRP/SRP011/SRP011907 \
    ./Destination_dir