Ryan's Blog

Running commands in parallel with xargs

Posted in programming by ryanlayer on February 18, 2016

Because GNU Parallel is too complex for me, I use xargs.  The  -P N option will farm off each command to a pool of N threads, and since each command is likely to be some chain of pipes and file writes I also use the sh -c 'foo | bar > baz' option.  You will want to modify the -d " " option to contain whatever delimiter you are using.  Here I have spaces, but if your input is coming from a file you may need "\t" or "\n".

For example, if you have a file f.bed:


#chr start end
10   100   200
2    200   300
1    300   400
X    400   500
Y    500   600
15   600   700
7    100   200
1    200   300
3    300   400
5    400   500

And you want to split the file out by chromosome, sort by start, keep the header, and use 10 threads. Then you could:


echo -n $(seq 1 22) X Y \
| xargs -d ' ' -I{} -P 10 \
sh -c '(head -n1 f.bed; grep -w "^{}" f.bed | sort -n -k 2) > {}.f.bed'

This sends a list of chromosomes (echo -n $(seq 1 22) X Y xargs) to xargs. That list is then split by space (-d ' '). Each element in the split list is used to create a new command where the “{}” values are replaced by the element value. xargs will then manage the execution of these commands, in this case running 10 of these commands at a time (-P 10).

Tagged with: ,

BASH: Compare the dates of two files

Posted in programming by ryanlayer on May 27, 2014


if [ "$file1" -ot "$file2" ]
then
  cp -f "$file2" "$file1"
fi

via stackoverflow

Tagged with: