Quick Tip: throttling parallel batch processes in Terminal

Quick tips are random posts regarding something I discovered on my way to something bigger. They usually get longer than “quick” would imply, for which I refuse to apologize.

It starts out with mdfind and all of the creative scripting you can do with it. You start finding batches of files with something in common and you do things with them or to them. It sounds genocidal; it’s not. It’s very productive after the initial script setup.

Take, for example, a little script I run to add thumbnails to weblocs I have lying around. It looks something like this:

mdfind -onlyin ~/Dropbox/Sync/Bookmark/ \
'(! ( ((kMDItemOMUserTags == "*donotthumbnail*"cd) \
|| (kOMUserTags == "*donotthumbnail*"cd) ) ) \
&& (kMDItemFSHasCustomIcon = "0") \
&& (kMDItemContentType == "*webloc*"cd))' | while read file; \
do /usr/local/bin/setWeblocThumb "$file" ; done

It’s a one-liner, so you’d want to reassemble it to run it (remove the backslashes at the line ends and join the lines together). It uses mdfind to search my shared bookmarks folder for recent items which don’t already have a custom thumbnail, passes them to setWeblocThumb and processes them… one at a time. I know my machine and my bandwidth can handle more than that, but if I launch every job at once and the list is 50+ long, that’s a lot of processes doing some relatively intensive labor. It would grind my machine to a halt. Yes, I tried it just to be sure.

So I needed a way to throttle the number of simultaneous processes, and I knew that someone out there must have long since beaten me to the solution. There it was: parallel. It’s a script you can download and make executable in your path, then run with a few parameters and a batch of files or arguments. It will keep your defined number of processes going until the job is done, but won’t let things get out of hand. You can add nice (man page) to each process if you need more control over its CPU priority.

My new command looks like:

mdfind -onlyin /Users/ttscoff/Dropbox/Sync/Bookmark/ \
'(! ( ((kMDItemOMUserTags == "*donotthumbnail*"cd) \
|| (kOMUserTags == "*donotthumbnail*"cd) ) ) \
&& (kMDItemFSHasCustomIcon = "0") \
&& (kMDItemContentType == "*webloc*"cd))' | parallel -j 8 -r "/usr/local/bin/setWeblocThumb"

Seriously, if you’re doing anything in batch you should check out Parallel, or show me an even better one. Parallel made my morning, and by keeping the CPU from maxing out I actually got through some batches even faster. I’m sure there are other elegant ways of handling this. Let ‘em rip.
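
To get the ball rolling, here is one of those other ways: a rough sketch of the same throttling idea using xargs with -P. This is not the parallel script described above. It assumes an xargs that supports -0 and -P (the GNU and BSD/macOS versions both do), and it uses a made-up temp folder, a fake file list and a sleep-and-log job as stand-ins for the mdfind results and setWeblocThumb, so it can be run anywhere without touching real bookmarks.

```shell
#!/bin/sh
# Sketch only: the file list and the job below are hypothetical stand-ins
# for the mdfind output and /usr/local/bin/setWeblocThumb.

workdir=$(mktemp -d)
log="$workdir/done.log"

# Eight fake paths (with spaces, like real bookmark names), one per line.
for i in 1 2 3 4 5 6 7 8; do
  printf '%s\n' "$workdir/bookmark $i.webloc"
done > "$workdir/list.txt"

# tr turns newlines into NULs so that -0 keeps paths with spaces intact.
# -n 1 gives each job a single path; -P 4 allows at most four jobs at once.
# nice -n 10 lowers the CPU priority of every job.
# Inside sh -c, $1 is the log file and $2 is the path appended by xargs.
tr '\n' '\0' < "$workdir/list.txt" |
  xargs -0 -n 1 -P 4 nice -n 10 \
    sh -c 'sleep 1; printf "%s\n" "$2" >> "$1"' _ "$log"

# xargs returns only after every job has finished, so the log is complete.
processed=$(wc -l < "$log" | tr -d ' ')
echo "processed $processed files"
```

With the real thing, the idea would be to feed mdfind’s output through the same tr and xargs stages and put the thumbnail command where the sh -c stand-in is. Raise or lower the number after -P to taste.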