Personal Adrastos on 15 Jun 2007 08:46 pm
I know Regex.
So far while working on the Azureus codebase, this is the most fun thing I’ve done: learning a bit of regular expression syntax in order to use sed. I was trying to get a diff of the classes between BitTyrant and the Azureus 2.5.0.0 source on which it is based. Here’s what I did:
1. Extract the source .tar.gz files into a directory.
2. The BitTyrant codebase includes a lot of “CVS” folders which will show up in the diff, and I don’t want or need them. ‘find -delete’ only removes empty directories, and rm can’t search by name. So here’s what I did:
adrian@kashra:~/Projects/BitTyrant/diff$ find ./ -name "CVS" > cvsfolders.txt
adrian@kashra:~/Projects/BitTyrant/diff$ sed -i 's/^/rm -rf /' cvsfolders.txt
adrian@kashra:~/Projects/BitTyrant/diff$ bash ./cvsfolders.txt
By the way, sed’s ‘-i’ option tells it to edit the file in place. ‘s/^/rm -rf /’ tells it to insert “rm -rf ” at the very beginning of each line. One caveat: running the list as a script like this assumes none of the paths contain spaces.
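For what it’s worth, the same cleanup can probably be done in one step, without the intermediate file. Here is a minimal sketch, assuming a find that supports ‘-prune’ and ‘-exec … {} +’ (GNU and BSD find both do). The function name remove_cvs_dirs and the scratch directory are made up for illustration and are not part of what I actually ran:

```shell
#!/bin/sh
# Hypothetical helper: delete every directory named CVS below the given root.
remove_cvs_dirs() {
    # -prune keeps find from descending into a CVS directory it is about to delete;
    # -exec ... {} + passes the matches straight to rm, so paths with spaces survive.
    find "$1" -type d -name CVS -prune -exec rm -rf {} +
}

# Try it on a throwaway tree rather than on real source.
demo=$(mktemp -d)
mkdir -p "$demo/src/com/CVS" "$demo/src/org/my pkg/CVS"
touch "$demo/src/com/CVS/Entries" "$demo/src/com/Keep.java"

remove_cvs_dirs "$demo"

# List whatever CVS entries are left (nothing should be printed).
remaining=$(find "$demo" -name CVS)
printf 'leftover CVS entries: [%s]\n' "$remaining"
```

Because the matches never pass through a text file, this version does not care about spaces in directory names.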
3. Then we run our diff:
diff -iEbwBqrN Azureus BitTyrant > difffiles.txt
Quick explanation of the options:
i: Ignore case differences.
EbwB: All for ignoring whitespace in different contexts (tab expansion, changes in the amount of whitespace, all whitespace, and blank lines). I’m pretty sure not all of these are necessary, but I used all of them.
q: Output filenames only.
r: Recurse on directories.
N: Treat absent files as empty (meaning display that difference).
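To see how these options behave together, here is a small sketch on two throwaway directories. The file names and contents are invented for the demo, and it assumes GNU diff (other diff implementations may not accept every flag or may word their output differently):

```shell
#!/bin/sh
# Build two tiny source trees that mimic the Azureus/BitTyrant layout.
work=$(mktemp -d)
mkdir -p "$work/Azureus/pkg" "$work/BitTyrant/pkg"

# A real change: should be reported.
printf 'class A {}\n'          > "$work/Azureus/pkg/A.java"
printf 'class A { int x; }\n'  > "$work/BitTyrant/pkg/A.java"

# A whitespace-only change: should be ignored thanks to -b/-w.
printf 'class Same {}\n'       > "$work/Azureus/pkg/Same.java"
printf 'class  Same {}\n'      > "$work/BitTyrant/pkg/Same.java"

# A file that exists only in BitTyrant: -N compares it against an empty file.
printf 'class New {}\n'        > "$work/BitTyrant/pkg/New.java"

# diff exits with status 1 when it finds differences, hence the "|| true".
report=$(cd "$work" && diff -iEbwBqrN Azureus BitTyrant || true)
printf '%s\n' "$report"
```

With GNU diff this should list A.java and New.java as differing and say nothing about Same.java.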
4. Now each line in the file difffiles.txt looks like this:
Files Azureus/com/aelitis/azureus/core/dht/db/impl/DHTDBImpl.java and BitTyrant/com/aelitis/azureus/core/dht/db/impl/DHTDBImpl.java differ
so for our eyeballs’ sake, we run the following:
adrian@kashra:~/Projects/BitTyrant/diff$ sed -i 's/Files\(.*\) and BitTyrant\///' difffiles.txt
adrian@kashra:~/Projects/BitTyrant/diff$ sed -i 's/ differ//' difffiles.txt
adrian@kashra:~/Projects/BitTyrant/diff$ sed -i 's/\//./g' difffiles.txt
adrian@kashra:~/Projects/BitTyrant/diff$ sed -i '/\.java/!d' difffiles.txt
adrian@kashra:~/Projects/BitTyrant/diff$ sed -i 's/\.java$//' difffiles.txt
These are, in order:
- Remove everything up to the first subdirectory under BitTyrant (in this case “com” or “org”)
- Remove that last word ” differ”
- Replace all “/” with “.” so that it’s similar to eclipse names
- Selectively delete any line not containing “.java”
- Remove .java from the remaining lines.
Now each line will look like:
com.aelitis.azureus.core.dht.db.impl.DHTDBImpl
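The five sed passes can also be folded into a single command that reads the diff output from a pipe instead of editing the file in place. This is only a sketch: the function name to_class_names is made up, the second sample line (ChangeLog.txt) is invented to show a non-Java file being dropped, and I use ‘|’ as the s-command delimiter so the slashes don’t need escaping:

```shell
#!/bin/sh
# Hypothetical helper: turn `diff -q` lines on stdin into dotted class names.
to_class_names() {
    # 1. drop every line that is not about a .java file
    # 2. strip everything up to and including "and BitTyrant/"
    # 3. strip the trailing ".java differ"
    # 4. turn the remaining slashes into dots
    sed -e '/\.java differ$/!d' \
        -e 's|^Files .* and BitTyrant/||' \
        -e 's|\.java differ$||' \
        -e 's|/|.|g'
}

# Two sample lines: the real one from above, plus an invented non-Java file.
sample='Files Azureus/com/aelitis/azureus/core/dht/db/impl/DHTDBImpl.java and BitTyrant/com/aelitis/azureus/core/dht/db/impl/DHTDBImpl.java differ
Files Azureus/ChangeLog.txt and BitTyrant/ChangeLog.txt differ'

names=$(printf '%s\n' "$sample" | to_class_names)
printf '%s\n' "$names"
```

Escaping the dot and anchoring with ‘$’ means a package that merely contains “java” somewhere in its name is left alone.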
Yay! And it’s only… 46 files! Remove the UI business and I now only have to look at 31 files!
I realize this is not the most complicated use of Regex, but given that I haven’t looked at its syntax since Algorithms (and I barely looked at it then), I feel pretty good about myself.
