Downloading files using wget

Wget is short for World Wide Web get. It is a command-line tool for downloading files from a website or web server.

Learning Objectives

Upon completion of this section the learner will be able to:

  • Use wget to download a file
  • Download multiple files using patterns and regular expressions
  • Download an entire website

Here is a generic example of how to use wget to download a file.

wget http://link.edu/filename
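
Two common variations on the generic command are saving the download under a name you choose and resuming an interrupted download. The sketch below wraps each in a small shell function. The options -O and -c are standard wget options; the function names fetch_as and fetch_resume, and the local name local_copy, are made up for illustration.

```shell
# fetch_as URL NAME: save URL under NAME instead of the remote filename.
fetch_as() {
    wget -O "$2" "$1"
}

# fetch_resume URL: continue a partially downloaded file instead of
# starting again from the beginning.
fetch_resume() {
    wget -c "$1"
}

# Example calls (placeholder URL from the text above):
#   fetch_as http://link.edu/filename local_copy
#   fetch_resume http://link.edu/filename
```

Resuming only helps if the server supports partial transfers; otherwise wget has to start the download over.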

Here are a couple of specific examples:

  • Photo of a kitten in Rizal Park
  • Photo of Arabidopsis
wget https://upload.wikimedia.org/wikipedia/commons/0/06/Kitten_in_Rizal_Park%2C_Manila.jpg
wget https://upload.wikimedia.org/wikipedia/commons/6/6f/Arabidopsis_thaliana.jpg

Sometimes you need to download an entire directory of files, and doing so with wget is not straightforward.

wget for multiple files and directories

There are two options. You can either specify a pattern for the file names (a shell-style wildcard pattern passed to -A) or put a regular expression on the URL itself (--accept-regex). The first option is useful when a directory contains a large number of files but you only want files of a specific format (e.g., fasta):

wget -r --no-parent -A 'bar.*.tar.gz' http://url/dir/
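
The accept list given to -A can hold several comma-separated suffixes or wildcard patterns, which helps when the same kind of file appears under more than one extension. The following is a minimal sketch under that assumption. The function name fetch_matching and the patterns in the example call are illustrative; http://url/dir/ is the placeholder from the command above.

```shell
# fetch_matching URL LIST: recursively download only the files whose names
# match LIST, a comma-separated accept list such as '*.fasta,*.fa'.
fetch_matching() {
    # -r           recurse into links found under URL
    # -l 1         follow links only one level deep
    # --no-parent  never ascend above the starting directory
    # -A LIST      keep only files matching the accept list
    wget -r -l 1 --no-parent -A "$2" "$1"
}

# Example call:
#   fetch_matching http://url/dir/ '*.fasta,*.fa'
```

Quoting the accept list keeps the shell from expanding the wildcards against local files before wget sees them.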

The second option is useful if you have numerous files that share the same name but sit in different directories:

wget -r --no-parent --accept-regex='/pub/current_fasta/.*/dna/.*dna\.toplevel\.fa\.gz' ftp://ftp.ensembl.org

The files won’t overwrite one another (even though they all have the same name); instead, they are saved as-is, preserving the directory structure.
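
Because --accept-regex is matched against the complete URL, it can help to try a pattern on a few sample URLs before starting a long recursive download. The sketch below does that with grep -E. It assumes POSIX extended regular-expression semantics and assumes the intent is "any directory under current_fasta, then dna/, then a file ending in dna.toplevel.fa.gz". The function name matches_pattern and the sample URLs in the comments are hypothetical.

```shell
# Regular expression to try out: any path under current_fasta, then dna/,
# then a file name ending in dna.toplevel.fa.gz.
pattern='/pub/current_fasta/.*/dna/.*dna\.toplevel\.fa\.gz'

# matches_pattern URL: succeed (exit status 0) if URL matches $pattern.
matches_pattern() {
    printf '%s\n' "$1" | grep -Eq "$pattern"
}

# Example calls with hypothetical sample URLs:
#   matches_pattern ftp://ftp.ensembl.org/pub/current_fasta/species_a/dna/A.dna.toplevel.fa.gz
#   matches_pattern ftp://ftp.ensembl.org/pub/current_fasta/species_a/cdna/A.cdna.all.fa.gz
# The first call succeeds; the second fails because the path has no /dna/ part.
```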

Sometimes, if you have a series of numbered files to download, you can use brace expansion, which bash and similar UNIX shells provide:

wget http://localhost/file_{1..5}.txt
# this will download 
# |_ file_1.txt
# |_ file_2.txt
# |_ file_3.txt
# |_ file_4.txt
# |_ file_5.txt
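
Brace expansion is done by the shell, not by wget, and some shells (plain POSIX sh, for example) do not support it. As a hedged alternative, the sketch below builds the same list of URLs with a loop and hands it to wget on standard input using the standard -i - option. The function name numbered_urls is made up for illustration, and http://localhost/ is the placeholder host from the example above.

```shell
# numbered_urls PREFIX FIRST LAST SUFFIX: print one URL per line for each
# number from FIRST to LAST, e.g. PREFIX1SUFFIX, PREFIX2SUFFIX, ...
numbered_urls() {
    i=$2
    while [ "$i" -le "$3" ]; do
        printf '%s%s%s\n' "$1" "$i" "$4"
        i=$((i + 1))
    done
}

# Example call: download file_1.txt through file_5.txt.
#   numbered_urls http://localhost/file_ 1 5 .txt | wget -i -
```

Running the function without the pipe prints the URLs, which is a quick way to check the list before downloading anything.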

To archive the entire website (yes, every single file of that domain), you can use the mirror option.

wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL
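
Mirroring can generate a very large number of requests, so it is often combined with a pause between requests and a bandwidth cap. The following is a minimal sketch of such a combination. The function name polite_mirror and the particular values (1 second, 200k) are assumptions chosen for illustration, not recommendations from the original text.

```shell
# polite_mirror URL DIR: mirror URL into DIR while throttling requests.
polite_mirror() {
    # --mirror           recursive download with timestamping, unlimited depth
    # -p                 also fetch images, CSS, etc. needed to display pages
    # --convert-links    rewrite links so the local copy can be browsed offline
    # --wait=1           pause one second between requests
    # --limit-rate=200k  cap the download speed
    # -P DIR             save everything under DIR
    wget --mirror -p --convert-links --wait=1 --limit-rate=200k -P "$2" "$1"
}

# Example call (same placeholders as the command above):
#   polite_mirror WEBSITE-URL ./LOCAL-DIR
```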

Other options to consider

Option | What it does | Use case
--limit-rate=20k | Limits speed to 20 KiB/s | Limit the data rate to avoid impacting other users accessing the server.
--spider | Checks whether a file exists | Use when you don’t want to save a file but just want to know whether it still exists.
-w | Waits between requests | Follow this flag with the number of seconds to wait between requests, again to avoid overloading a server.
--user= | Sets the username | wget will attempt to log in using the username provided.
--password= | Sets the password | wget will use this password with your username to authenticate.
--ftp-user= or --ftp-password= | FTP credentials | Like the previous settings, these let wget log in to an FTP server to retrieve files.
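
As one example of putting these options to work, the spider mode can be wrapped in a small check that reports whether a URL is still reachable without saving anything. This is a sketch only. It relies on wget returning a non-zero exit status when the request fails, and the function names url_exists and report_url are made up for illustration.

```shell
# url_exists URL: succeed if wget can reach URL; nothing is saved to disk.
url_exists() {
    # -q suppresses wget's normal output; --spider checks without downloading.
    wget -q --spider "$1"
}

# report_url URL: print a one-line status for URL.
report_url() {
    if url_exists "$1"; then
        echo "OK $1"
    else
        echo "MISSING $1"
    fi
}

# Example call (placeholder URL used earlier in this section):
#   report_url http://link.edu/filename
```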
