16 May 2007

This is something I wrote over at DocForge, but I’m keeping it here for posterity. You never know what those crazy wiki-ers might do with it over there ;-)

I’ve got a directory full of eBooks in the godawful Microsoft .lit format. I had used Finder labels to mark the ones I’d already read, and wanted to convert the ones I hadn’t read yet into a readable format.

Using the Spotlight UNIX tools and liberal amounts of command-line trickery, I ended up with a rather beautiful pipeline, if I do say so myself. First, I used the mdfind command to find all the items that did not have the red label, which is the one I’d used for books I had read. The query tests the kMDItemFSLabel attribute; the red label has a value of 6 (I found this out by running mdls on a file with that label).
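
For reference, the mdls check looks roughly like this. It is only a sketch: the filename is a hypothetical stand-in for any file that already carries the label, and the guard is there because mdls only ships with Mac OS X.

```shell
# Hypothetical file name; substitute any file that already has the label.
sample="/Users/phil/Desktop/books/already-read.lit"

# mdls is Mac OS X only, so check for it before calling it.
if command -v mdls >/dev/null 2>&1; then
    # -name limits the output to a single metadata attribute.
    mdls -name kMDItemFSLabel "$sample"
else
    echo "mdls not found: this needs Mac OS X with Spotlight" >&2
fi
```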

Since I only want to search a particular directory, I use the -onlyin switch to limit the query:

mdfind -onlyin /Users/phil/Desktop/books/ "kMDItemFSLabel != 6"
/Users/phil/Desktop/books/one.lit
/Users/phil/Desktop/books/two.lit
/Users/phil/Desktop/books/other.rtf
/Users/phil/Desktop/books/three.lit
/Users/phil/Desktop/books/something.html
...
...

Some of those aren’t .lit files, so I’ll just use grep:

mdfind -onlyin /Users/phil/Desktop/books/ "kMDItemFSLabel != 6" | grep '\.lit$'
/Users/phil/Desktop/books/one.lit
/Users/phil/Desktop/books/two.lit
/Users/phil/Desktop/books/three.lit
...
...
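
One thing worth knowing about patterns like this: in grep, an unescaped dot matches any character, and without a $ anchor the match can land anywhere in the line. Here’s a small sketch comparing a loose pattern with an escaped, anchored one. The filenames are made up for the demonstration, not taken from my directory.

```shell
# Made-up paths: only the first one is really a .lit file.
paths='/Users/phil/Desktop/books/one.lit
/Users/phil/Desktop/books/split-infinitives.rtf
/Users/phil/Desktop/books/alit-review.html'

# Loose pattern: "." matches any character and the match can occur anywhere.
loose=$(printf '%s\n' "$paths" | grep ".lit")

# Strict pattern: a literal dot, anchored to the end of the line.
strict=$(printf '%s\n' "$paths" | grep '\.lit$')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

The loose pattern lets all three paths through (“split” and “alit” both contain some character followed by “lit”), while the strict one keeps only one.lit.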

I could have limited the Spotlight query further, but what fun would that be?

Ultimately I’m going to feed this output to xargs, but because of limitations imposed by the .lit conversion app, I need just the base name of each file. For this, sed will do the trick:

mdfind -onlyin /Users/phil/Desktop/books/ "kMDItemFSLabel != 6" | grep '\.lit$' | \
    sed 's|^/Users/phil/Desktop/books/||'
one.lit
two.lit
three.lit
...
...
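
As an aside, here’s a sketch of two ways to do the stripping, run against a couple of the sample paths so it doesn’t need Spotlight. Picking | as the sed delimiter avoids having to escape every slash. The basename loop is an alternative I didn’t use in the pipeline.

```shell
# Sample paths copied from the listing; no files need to exist.
paths='/Users/phil/Desktop/books/one.lit
/Users/phil/Desktop/books/two.lit'

# Option 1: sed with "|" as the delimiter, so the slashes need no escaping.
with_sed=$(printf '%s\n' "$paths" | sed 's|^/Users/phil/Desktop/books/||')

# Option 2: basename, which drops whatever directory precedes the file name.
with_basename=$(printf '%s\n' "$paths" | while IFS= read -r path; do
    basename "$path"
done)

printf '%s\n' "$with_sed"
printf '%s\n' "$with_basename"
```

The two only differ for files in subdirectories: sed would leave sub/four.lit, where basename would give plain four.lit.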

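Finally, I pass this on to xargs, which runs the unfortunately named ConvertLIT tool (clit) once per file. Because the names are now relative, the pipeline has to be run from inside the books directory: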

mdfind -onlyin /Users/phil/Desktop/books/ "kMDItemFSLabel != 6" | grep '\.lit$' | \
    sed 's|^/Users/phil/Desktop/books/||' | xargs -I {} clit {} oebps/{}/
$ clit one.lit oebps/one.lit/
$ clit two.lit oebps/two.lit/
$ clit three.lit oebps/three.lit/
$ ...
$ ...

I’m replicating the syntax of find’s -exec switch with the -I switch of xargs, which replaces every occurrence of {} in the command with a filename read from standard input.
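
If you want to check the substitution before letting clit loose on anything, putting echo in front of the command turns it into a dry run. This is just a sketch: the three names stand in for the output of the earlier stages, and nothing is converted.

```shell
# Stand-in for the output of the mdfind | grep | sed stages.
names='one.lit
two.lit
three.lit'

# echo makes this a dry run: each command line is printed rather than executed.
dry_run=$(printf '%s\n' "$names" | xargs -I {} echo clit {} oebps/{}/)
printf '%s\n' "$dry_run"
```

Each input line produces one command line, with both occurrences of {} replaced by that filename.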