Ryan's Blog

Running commands in parallel with xargs

Posted in programming by ryanlayer on February 18, 2016

Because GNU Parallel is too complex for me, I use xargs.  The -P N option runs up to N commands at a time, each as its own process, and since each command is likely to be some chain of pipes and file writes, I wrap it in sh -c 'foo | bar > baz'.  You will want to change the -d " " option (a GNU xargs option) to whatever delimiter separates your inputs.  Here I have spaces, but if your input is coming from a file you may need "\t" or "\n".
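To make the three pieces concrete before the real example, here is a minimal sketch on toy input. The directory name xargs_demo and the items a, b, c are made up for illustration, and it assumes GNU xargs (for -d):

```shell
# Scratch directory for the output files (hypothetical name).
mkdir -p xargs_demo

# printf adds no trailing newline, so the last item stays "c".
# -d ' '  : split the input on spaces
# -I{}    : substitute each item for {} in the command
# -P 2    : run up to 2 commands at once
printf 'a b c' \
| xargs -d ' ' -I{} -P 2 \
sh -c 'echo "item {}" | tr a-z A-Z > xargs_demo/{}.txt'

cat xargs_demo/a.txt xargs_demo/b.txt xargs_demo/c.txt
```

Each item gets its own sh process, so the pipe and the redirect inside the quotes belong to that one command rather than to xargs itself.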

For example, if you have a file f.bed:

#chr start end
10   100   200
2    200   300
1    300   400
X    400   500
Y    500   600
15   600   700
7    100   200
1    200   300
3    300   400
5    400   500

Say you want to split the file by chromosome, sort each piece by start, keep the header in every piece, and run 10 jobs at a time. Then you could:

echo -n $(seq 1 22) X Y \
| xargs -d ' ' -I{} -P 10 \
sh -c '(head -n1 f.bed; grep -w "^{}" f.bed | sort -n -k 2) > {}.f.bed'

This sends a list of chromosomes (echo -n $(seq 1 22) X Y) to xargs. The -n stops echo from appending a newline, which would otherwise end up attached to the last element (Y). That list is then split on spaces (-d ' '). Each element is used to create a new command where every "{}" is replaced by the element value (-I{}). xargs then manages the execution of these commands, in this case running up to 10 of them at a time (-P 10).
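If the chromosome names come from a file with one name per line, the same command should work with a newline delimiter, as noted at the top. Here is a small sketch of that variant; the bed_demo directory, the chroms.txt file, and the cut-down copy of f.bed are assumptions made for illustration, and -d again assumes GNU xargs:

```shell
# Scratch directory so nothing in the current directory is overwritten.
mkdir -p bed_demo

# A cut-down copy of the example data (a few rows of f.bed).
printf '%s\n' '#chr start end' '10 100 200' '1 300 400' 'X 400 500' '1 200 300' > bed_demo/f.bed

# One chromosome name per line, as it might appear in a file.
printf '%s\n' 1 10 X > bed_demo/chroms.txt

# -d '\n' splits the input on newlines instead of spaces.
xargs -d '\n' -I{} -P 3 \
sh -c '(head -n1 bed_demo/f.bed; grep -w "^{}" bed_demo/f.bed | sort -n -k 2) > bed_demo/{}.f.bed' \
< bed_demo/chroms.txt

# Header first, then the chromosome 1 rows sorted by start.
cat bed_demo/1.f.bed
```

Because of grep -w, the pattern "^1" matches the rows for chromosome 1 but not the row for chromosome 10, so each output file only holds its own chromosome.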
