The Linux Page

Parallelize a process in bash

Run processes in parallel in a bash script.

The other day, I worked on creating a video, which meant extracting over 4,000 frames and then processing them. One of the processing steps overlays one image over another (using convert from ImageMagick). This step is very slow because each time, the convert tool reloads both the background image and the movie frame to overlay... so I thought I should run these commands in parallel. After all, I have 64 CPUs, so let's use them!

In bash (version 4.3 and newer), the wait command has a special option: -n. This option means: wait for any one of the currently running background jobs to finish. This is quite practical since we can start the next job as soon as a previous one is done. At first, the script starts 30 jobs in a row. After that, each time one job ends, the next one starts, and this repeats until all the frames are processed.
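To see wait -n in isolation, here is a minimal sketch, assuming bash 4.3 or newer. The three sleep commands are arbitrary stand-ins for real work; wait -n returns as soon as the shortest one exits, and the plain wait then collects the other two.

```bash
#!/usr/bin/env bash
# Minimal demo of "wait -n" (needs bash 4.3 or newer).
# The sleep durations are arbitrary stand-ins for real work.

start=$SECONDS

sleep 3 &   # slowest job
sleep 1 &   # fastest job
sleep 2 &

wait -n                       # returns when the first job (sleep 1) exits
first=$((SECONDS - start))
echo "one job finished after about ${first}s"

wait                          # now wait for the two remaining jobs
total=$((SECONDS - start))
echo "all jobs finished after about ${total}s"
```

SECONDS only counts whole seconds, so the reported times are approximate.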

The basic loop to parallelize the work looked like this:

f=1
while test "$f" -le "$count"
do
    if test "$f" -gt 30
    then
        wait -n
    fi
    convert ... $f ... &
    f=$((f + 1))    # move on to the next frame
done
wait

I'm not showing you the convert command because that's not the point here. But the loop is very close to what I have in my own script.

First I initialize a counter variable named f.

As long as f is less than or equal to count, run convert.

If we have already started convert 30 times, wait for one of the running processes to finish before starting another one.

Start convert in the background.

Once the loop exits, make sure to wait for the last few convert commands to finish before moving on to the next step.
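Here is a runnable sketch of the same loop, assuming bash 4.3 or newer. Since the real convert command is not shown, process_frame is a hypothetical stand-in that sleeps briefly and then touches a marker file; count and max_jobs (which plays the role of the 30) are arbitrary values chosen for the demo.

```bash
#!/usr/bin/env bash
# Runnable sketch of the throttled loop, assuming bash 4.3+ (for "wait -n").
# process_frame is a hypothetical stand-in for the real convert command.

process_frame() {
    sleep 0.2                          # pretend to do some slow work
    touch "$outdir/frame-$1.done"      # record that frame $1 was processed
}

outdir=$(mktemp -d)
count=50       # number of frames (arbitrary for this demo)
max_jobs=8     # plays the role of the 30 in the original loop

f=1
while test "$f" -le "$count"
do
    if test "$f" -gt "$max_jobs"
    then
        wait -n                        # block until any one job finishes
    fi
    process_frame "$f" &               # start this frame in the background
    f=$((f + 1))
done
wait                                   # let the last few jobs finish

shopt -s nullglob
done_files=("$outdir"/*.done)
done_count=${#done_files[@]}
echo "processed $done_count of $count frames"
rm -r "$outdir"
```

With 8 jobs at a time, the 50 frames take roughly a seventh of the time a sequential loop would need.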

Two very important bits:

  1. The convert command is started with an '&' at the end of the line, meaning it runs in the background; if your command is more than one simple line, use a bash function.
  2. The wait after the loop is very important: it makes sure that all the processing is done before you move on to the next steps in your script (although in your case you may not need it, or maybe you can use wait -n again in your next loop).

In regard to using a function, it would look like this:

commands() {
    # $1 is the frame number passed in by the caller
    your
    multiple
    commands
    go
    here
}

...
    # instead of convert use:
    commands "$f" &
...
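As a small check of how this behaves, here is a sketch assuming bash 4.3 or newer. overlay_frame is a hypothetical function with two steps, each logged to a temporary file. Because the whole function call is one background job, wait -n returns only after both steps have run, not after the first one.

```bash
#!/usr/bin/env bash
# Sketch: a multi-command function started with "&" is a single job,
# so "wait -n" returns only after the whole function is done.
# overlay_frame and its two steps are hypothetical stand-ins for real work.

log=$(mktemp)

overlay_frame() {
    sleep 1
    echo "frame $1: step 1" >> "$log"      # e.g. prepare the frame
    sleep 1
    echo "frame $1: step 2" >> "$log"      # e.g. overlay it on the background
}

overlay_frame 1 &
wait -n                                    # returns after BOTH steps, not after the first one

mapfile -t log_lines < "$log"
lines=${#log_lines[@]}
echo "log has $lines lines"
printf '%s\n' "${log_lines[@]}"
rm "$log"
```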

What happens in this case?

When you call a function normally, bash runs its commands in the current shell; no new process is created. When you add the & at the end of the call, bash forks a sub-shell and runs the whole function in that sub-shell, in the background. The & therefore applies to that one sub-shell, and wait -n waits on the sub-shell as a whole, not on any specific command within the function. Other shells may implement the details differently.
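One way to observe this is the following sketch, assuming bash 4 or newer (for $BASHPID). bump and counter are hypothetical names used only for the demo. The plain call changes counter in the current shell, while the call with & runs in a sub-shell, so its change is lost and it reports a different process ID.

```bash
#!/usr/bin/env bash
# Sketch: a plain function call runs in the current shell, while the
# same call followed by "&" runs in a background sub-shell.
# bump and counter are hypothetical names used only for this demo.

counter=0
bump() {
    counter=$((counter + 1))
    echo "bump ran in process $BASHPID"
}

echo "main shell is process $BASHPID"

bump             # current shell: counter becomes 1, same PID as the main shell
bump &           # sub-shell: only its own copy of counter changes, different PID
wait

echo "counter = $counter"    # prints "counter = 1"
```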