The Inferno



The Inferno :: It is a fallacy to state that something exists just because it can’t be proven that it doesn’t
Archive for the 'awk' Category
2/01/10
9:46 pm
Just how long again?

If you’ve ever wondered how long it would take to play a particular artist’s entire discography, here is a bash one-liner that will tell you.

find . -type f -name '*.mp*' -exec exiftool '{}' + | grep Duration | awk '{x += $3; print x;}'

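The venerable find command needs no introduction. Suffice it to say that -type f restricts it to regular files, and the '*.mp*' pattern restricts the matches to names containing .mp, which covers MP3s and MP2s (and, be warned, anything else that fits, such as .mp4 or .mpg files).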

ExifTool is a nifty command-line tool for reading metadata tags of all kinds, not just the EXIF ones in its name. In this case, we want just the ‘Duration’ field of each song, which grep picks out. We pipe those lines to awk, which adds up the third whitespace-separated field (the value after the colon) and prints a running total; the last number printed is the length of the entire discography.

Now I know that I have 5668.14 minutes of Zappa goodness, or a mere 208 minutes of godly Death. You need to run this in the folder that has all the albums by the artist, or of course, you can adapt it to a script and pass in parameters and so on.
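If the running total is more noise than you want, a variant along these lines prints a single figure at the end. It is only a sketch: it assumes exiftool's -n option (raw numeric values, which for Duration should mean seconds) and its -T option (bare tab-separated values), and total_minutes is a name made up for this example.

```shell
# Sum durations given in seconds (one per line on stdin)
# and print the grand total once, converted to minutes.
total_minutes() {
    awk '{ total += $1 } END { printf "%.2f\n", total / 60 }'
}

# Assumed usage (needs exiftool installed; -n and -T as described above):
# find . -type f -name '*.mp3' -exec exiftool -n -T -Duration '{}' + | total_minutes
```

Moving the print into an END block is the only real change to the awk part; everything before the pipe is the same find command as before.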

8/14/08
11:02 pm
She sells Shell Scripts on the Sea Shore

Recently, I had to parse data in several text files and calculate averages. From these averages, I had to create a chart. So I could either write a program using a real language like C or Perl or something, or even worse, copy and paste each value into a spreadsheet and then go from there. You should be shuddering by now.

The data was in 5 different folders, with each folder containing 25 files, with the contents of each file being:

real 70.67
user 70.66
sys 0.00
real 70.82
user 70.81
sys 0.01
real 70.89
user 70.88
sys 0.00

What I needed was the average of the three lines with the word “real” in them. So, first we grep through all the files for the word real to get something like:

real 70.67
real 70.82
real 70.89
real 41.27
real 41.16
real 41.39
real 125.75
real 125.42

Now, we need to sum up every 3 lines and divide by 3 to get the average. Enter awk:

awk '{x+=$2;if(!(NR%3)){printf("%2.3f\n",x/3);x=0}}'

What this does is add the second column ($2) to a variable called x. Awk treats an uninitialized variable as zero, so we don’t need to worry about bogus data. The NR variable holds the number of lines read so far, so every time NR is a multiple of three, we print the current sum divided by 3 and then reset the subtotal to zero. Perhaps making the code a bit tidier might help, even though everyone loves those cryptic one-liners:

awk '{
    x+=$2;
    if(!(NR%3)){
        printf("%2.3f\n",x/3);
        x=0
    }
}'

Why, it’s almost C, I can hear you say.
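To see it in action, here is a quick sketch that feeds the first six sample lines from above through the same awk program. The wrapper function avg3 is just a name invented for this example.

```shell
# Average every group of three values found in column 2.
avg3() {
    awk '{ x += $2; if (!(NR % 3)) { printf("%2.3f\n", x / 3); x = 0 } }'
}

printf 'real 70.67\nreal 70.82\nreal 70.89\nreal 41.27\nreal 41.16\nreal 41.39\n' | avg3
# prints:
# 70.793
# 41.273
```

Note that any leftover lines at the end (one or two that do not complete a group of three) are silently dropped, which is fine here because every file has exactly three "real" lines.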

Now, we stitch them together into one glorious command:

cat 2.30GHz/* | grep real | awk '{x+=$2;if(!(NR%3)){printf("%2.3f\n",x/3);x=0}}'

We need to replace the 2.30GHz with a variable, so we can iterate through the folders. And we need to append the output to a file, to be imported into your favorite spreadsheet later. Here’s the final script:

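#!/bin/bash

freqs=( 2.30GHz 2.00GHz 1.70GHz 1.40GHz 1.15GHz )

for l in "${freqs[@]}"
do
    # Average every three "real" timings across all files in this folder
    data=$(cat "$l"/* | grep real | awk '{x+=$2;if(!(NR%3)){printf("%2.3f\n",x/3);x=0}}')
    # Left unquoted on purpose: word splitting joins the averages onto one line
    echo $data >> file.csv
done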

And file.csv of course looks like:

198.967 265.083 543.800 139.247 51.973 70.793 41.273 125.640 214.127 220.863 91.303 15.230 46.397 256.093 176.000 178.037 213.133 183.947 31.743 181.223 220.143 192.857 47.360 82.017 177.790
224.363 295.043 601.253 150.720 58.730 81.213 47.243 132.723 244.263 244.343 103.090 17.160 53.183 273.963 188.223 201.857 238.343 209.247 36.773 207.530 235.840 223.670 53.480 91.600 196.297
256.523 316.210 676.660 165.870 67.860 95.867 55.793 138.647 279.900 273.450 118.363 19.590 62.377 293.080 199.717 236.997 271.997 242.750 42.667 244.317 253.797 259.940 61.300 102.497 220.560

…..

There you have it. Averages of all the required numbers from every file, all in one file. Import it as a space-delimited file into Calc or Excel and Robert’s your mother’s brother.
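If you would rather have file.csv live up to its name, one possible tweak is to join the averages with commas instead of spaces. This is a sketch only: to_csv_row is a made-up helper, and it relies on the standard paste utility in serial mode.

```shell
# Join the lines arriving on stdin into one comma-separated row.
to_csv_row() {
    paste -s -d, -
}

# Assumed usage inside the loop, replacing the data=/echo pair:
# cat "$l"/* | grep real | awk '{x+=$2;if(!(NR%3)){printf("%2.3f\n",x/3);x=0}}' | to_csv_row >> file.csv
```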