Running commands in parallel with xargs
Because GNU Parallel is too complex for me, I use xargs. The -P N option farms each command out to a pool of N parallel processes, and since each command is likely to be some chain of pipes and file writes, I also use sh -c 'foo | bar > baz'. You will want to set the -d option to whatever delimiter your input uses. Here I have spaces (-d ' '), but if your input is coming from a file you may need '\t' or '\n' (a newline-delimited variant is shown after the example below).
For example, if you have a file f.bed:
#chr start end
10 100 200
2 200 300
1 300 400
X 400 500
Y 500 600
15 600 700
7 100 200
1 200 300
3 300 400
5 400 500
Suppose you want to split the file out by chromosome, sort each piece by start position, keep the header in every output file, and use 10 parallel jobs. Then you could:
echo -n $(seq 1 22) X Y \
| xargs -d ' ' -I{} -P 10 \
sh -c '(head -n1 f.bed; grep -w "^{}" f.bed | sort -n -k 2) > {}.f.bed'
This sends a list of chromosomes (echo -n $(seq 1 22) X Y) to xargs. That list is split on spaces (-d ' '). Each element in the split list is used to create a new command in which every "{}" is replaced by the element value (-I{}). xargs then manages the execution of these commands, in this case running 10 of them at a time (-P 10).
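If the list of chromosomes is coming from a file instead (say, a hypothetical chroms.txt with one name per line), the same pattern works with a newline delimiter:
xargs -d '\n' -I{} -P 10 \
sh -c '(head -n1 f.bed; grep -w "^{}" f.bed | sort -n -k 2) > {}.f.bed' \
< chroms.txt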
Structural Variation Graph (sv graph) Thoughts
The data structure consists of:
– a set of chromosomes
– each chromosome is represented by a name, and an ordered (doubly linked) list of nodes
– each node represents one tag of a pair
– each node has a two-way link to the next node in chromosome order (the node with the next-largest offset), and a list of the nodes with which it forms pairs. The links to pair mates are one-way.
The chromosome name is not part of the node struct. Each node does have a pointer back to its chromosome structure, and that chromosome struct contains the name. This prevents a potentially long chromosome name from being stored many times over (once in each node). When we are reading the nodes from a file, we must pass in a char pointer so that the name of the chromosome can be set. We will use this char pointer to put the node into the proper chromosome structure.
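As a minimal sketch, the structs might look something like this in C (field and type names here are my assumptions, not taken from the actual implementation):
#include <stddef.h> /* size_t */

struct chrom; /* forward declaration */

struct node {
    long offset;              /* position of this tag on its chromosome */
    struct node *prev, *next; /* two-way links in chromosome order */
    struct node **pairs;      /* one-way links to pair mates */
    size_t n_pairs;
    struct chrom *chrom;      /* back-pointer; the name lives here, not in each node */
};

struct chrom {
    char *name;               /* stored once per chromosome, not per node */
    struct node *head, *tail; /* ordered, doubly linked list of nodes */
};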
Simple CUDA Program
Getting this code to work may require some environment variable changes:
- export LD_LIBRARY_PATH=/usr/local/cuda/lib/:$LD_LIBRARY_PATH
- export PATH=/usr/local/cuda/bin/:$PATH
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <sys/time.h>
#include <math.h>
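
// device kernel: each thread doubles one element, B[i] = 2*A[i]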
__global__ void vecMult_d(int *A, int *B, int N)
{
int i = blockIdx.x * blockDim.x + threadIdx.x;
if(i<N) { B[i] = A[i]*2; }
}
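
// host (CPU) reference implementation, used for timing comparison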
void vecMult_h(int *A, int *B, int N)
{
for(int i=0;i<N;i++) { B[i] = A[i]*2; }
}
int main() {
int *a_h, *b_h; // pointers to host memory; a.k.a. CPU
int *a_d, *b_d; // pointers to device memory; a.k.a. GPU
int blocksize=512, n=1000000;
struct timeval t1_start,t1_end,t2_start,t2_end;
double time_d, time_h;
// allocate arrays on host
a_h = (int *)malloc(sizeof(int)*n);
b_h = (int *)malloc(sizeof(int)*n);
// allocate arrays on device
cudaMalloc((void **)&a_d,n*sizeof(int));
cudaMalloc((void **)&b_d,n*sizeof(int));
dim3 dimBlock(blocksize);
dim3 dimGrid(ceil(float(n)/float(dimBlock.x))); // enough blocks to cover all n elements
for(int j=0;j<n;j++) a_h[j]=j;
// GPU
cudaMemcpy(a_d,a_h,n*sizeof(int),cudaMemcpyHostToDevice);
gettimeofday(&t1_start,0);
vecMult_d<<<dimGrid,dimBlock>>>(a_d,b_d,n);
cudaDeviceSynchronize(); // wait for the kernel to finish before stopping the timer (cudaThreadSynchronize is deprecated)
gettimeofday(&t1_end,0);
cudaMemcpy(b_h,b_d,n*sizeof(int),cudaMemcpyDeviceToHost);
// CPU
gettimeofday(&t2_start,0);
vecMult_h(a_h,b_h,n);
gettimeofday(&t2_end,0);
time_d = (t1_end.tv_sec-t1_start.tv_sec)*1000000 + t1_end.tv_usec - t1_start.tv_usec;
time_h = (t2_end.tv_sec-t2_start.tv_sec)*1000000 + t2_end.tv_usec - t2_start.tv_usec;
printf("%d %lf %lf\n",n,time_d,time_h);
free(a_h);
free(b_h);
cudaFree(a_d);
cudaFree(b_d);
return(0);
}
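To build and run (the file name vecmult.cu is my choice; any .cu name works):
nvcc vecmult.cu -o vecmult
./vecmult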
SOURCE: https://visualization.hpc.mil/wiki/Simple_CUDA_Program