
Parallel Downloading with wget

1. Overview

In this tutorial, we'll use wget, a simple command-line tool, to download multiple files in parallel.

The commands used in this article were tested in bash but should work in other POSIX-compliant shells as well.

2. Downloading Files with wget

Downloading files with wget is fairly straightforward:

wget https://my.website.com/archive.zip

Unfortunately, this downloads only one file at a time.

We have to resort to shell scripting to download multiple files in a single command:

#!/bin/bash
# Read one URL per line and download each file sequentially
while read -r file; do
    wget "${file}"
done < files.txt

Here, files.txt lists the URLs of all the files to be downloaded, one per line:

https://my.website.com/archive-1.zip
https://my.website.com/archive-2.zip
https://my.website.com/archive-3.zip

The problem with this approach, however, is that the files are downloaded sequentially. We can speed things up by downloading them in parallel.

3. Parallelizing Downloads with wget

There are different ways in which we can make wget download files in parallel.

3.1. The Bash Approach

A simple and somewhat naive approach is to send each wget process to the background using the & operator:

#!/bin/bash
# Send each wget call to the background so the downloads run in parallel
while read -r file; do
    wget "${file}" &
done < files.txt

Each call to wget is forked to the background and runs asynchronously in its own subshell.

Although we now download the files in parallel, this approach is not without its drawbacks. For example, we get no feedback about completed or failed downloads. Also, we can't control how many processes run at once.
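
That said, if we want to stay in plain bash, a minimal sketch like the one below can at least recover per-download feedback: we record the PID of every background wget and wait on each one individually. The array names used here (pids, urls) are just illustrative choices, not part of any standard tooling:

#!/bin/bash
# A minimal sketch: track every background wget and report its result
pids=()
urls=()
while read -r file; do
    wget -q "${file}" &
    pids+=("$!")
    urls+=("${file}")
done < files.txt

# wait <pid> returns that process's exit code, so we can
# distinguish successful downloads from failed ones
for i in "${!pids[@]}"; do
    if wait "${pids[$i]}"; then
        echo "OK:     ${urls[$i]}"
    else
        echo "FAILED: ${urls[$i]}"
    fi
done

This still doesn't limit the number of concurrent processes, though, which is where xargs will come in shortly.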

3.2. Let wget Fork Itself

We can do a little better and let wget fork itself to the background by passing -b as a parameter:

#!/bin/bash
# The -b flag tells wget itself to detach into the background
while read -r file; do
    wget -b "${file}"
done < files.txt

Just as with the & operator, each call is forked to the background and runs asynchronously. The difference is that the -b parameter additionally creates a log file for each download. We can grep these log files to check that no errors occurred.
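
For instance, assuming wget's default log file naming (wget-log, wget-log.1, and so on) and that each successful download writes a "saved" confirmation line to its log, a quick check for failures might look like this:

# Print the names of log files without a "saved" confirmation line,
# i.e. downloads that likely did not complete successfully
grep -L 'saved' wget-log*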

3.3. Using xargs

The cleanest and most sophisticated solution to our problem is xargs. The xargs command reads a list of arguments and passes them to a utility of our choice, optionally running multiple processes in parallel.

Above all, it gives us control over the maximum number of processes that run at any given time.

For example, we can call wget for each line in files.txt with a maximum of two processes in parallel:

#!/bin/bash
# -n 1: pass one URL per wget invocation
# -P 2: run at most two wget processes at any given time
cat files.txt | xargs -n 1 -P 2 wget -q

We also set wget to be quiet (-q). Without that flag, xargs would redirect the output of all processes to stdout, cluttering our terminal in no time. Instead, we can rely on xargs's exit code: it exits with 0 if no error occurred and with a non-zero value if any invocation of wget failed.
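
As a quick sketch of how a script might act on that: since xargs is the last command in the pipeline, its exit code is what the if-statement sees:

#!/bin/bash
# xargs is the last command in the pipeline, so the if-statement
# sees its exit code: 0 on success, non-zero if any wget failed
if cat files.txt | xargs -n 1 -P 2 wget -q; then
    echo "All downloads completed"
else
    echo "At least one download failed" >&2
fi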

4. Conclusion

As we have seen, there are different ways in which we can download multiple files in parallel using wget. The xargs command provides the cleanest solution to this problem. It's very useful in scripts because it offers the right amount of control and signals failures through its exit code.

