There are some unix utilities which give it a bad name - prime culprits are sed(1) - just read the man(1) page and you'll understand why. I think sort(1) is pretty abstruse too - I've been using it to manipulate log files which monitor market data info being pushed in and out of wombat.
The trouble with log files is that they usually are full of everything - which is fine if you have the time or patience to extract the information you require, however, now that we're shoving hundreds of trades through the algo system, this generates hundreds of thousands of log messages, I can no longer use vi(1) - the unix editor, to view them, as it runs out of space for the temp file. This mans resorting to all sorts of sed/awk/grep nonsense in order to extract the info we need. The criteria for this embryonic scriptette was to order entries according to a suffix alphabetically, then order numerically ascending within that suffix. Here's a script which does the job. The input data looks like this:
10:23:34.323 : 5 3,
10:23:34.541 : 6 3,
All 598344 lines of it. The first line sorts on the field "LT" and "SS" above and gives us a list of subsets that we need to process:
FIELDS="`sort -u -t '.' -k 5,5.2 MarketDataServer0.log | sed 's/.*\.\(.*\) {.*/\1/'`"
Now we create a file callled out
> out
for CODE in $FIELDS
do
sed -n '/.*\.'"$CODE"' {.*/p' MarketDataServer0.log | sort -n -t '>' -k 2,2 >> out
done
Then we cut out the entries for each "code" then pass them to our sort command which uses the > as a field delimiter and sorts numerically on the second field - ugly but necessary. No error handling or parameter passing yet - but this saves a whole lot of pain. Looks painful? sure but it's the sort of thing you just can't do on windows (well without Cygwin anyway)