Calculates the average of the fields specified in f= for each key specified in k=, using a hash function to group the records.
This command is faster than mavg because the key fields do not require prior sorting. However, variation in key lengths (strings of different lengths in the key field) slows down processing.
mhashavg f= [hs=] [k=] [-n] [i=] [o=] [-nfn] [-nfno] [-x] [precision=] [--help] [--version]
f= Name of the field(s) to average (multiple fields can be specified). To rename an output field, specify the new field name after a colon ":". Example: f=Quantity:AverageQuantity.
k= Key field(s) by which the data series are grouped for averaging (multiple keys can be specified). This command does not use key-break aggregation, so prior sorting is not required.
hs= Hash size (default: 199999). Refer to mhashsum for related information.
-n Return NULL in the output if there are NULL values in f=.
Calculate the average Quantity and average Amount for each Customer.
$ more dat1.csv
Customer,Quantity,Amount
A,1,
B,,15
A,2,20
B,3,10
B,1,20
$ mhashavg k=Customer f=Quantity,Amount i=dat1.csv o=rsl1.csv
#END# kghashavg f=Quantity,Amount i=dat1.csv k=Customer o=rsl1.csv
$ more rsl1.csv
Customer,Quantity,Amount
A,1.5,20
B,2,15
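The result above can be reproduced with a minimal Python sketch of hash-based averaging. It only illustrates the semantics and is not the command's actual implementation: the function name `hash_avg` is hypothetical, and a Python dict stands in for the hash table. Rows are consumed in input order with no sorting, and empty (NULL) values are skipped.

```python
import csv
import io

def hash_avg(text, key, fields):
    """Average `fields` per `key`, skipping empty (NULL) values.

    Hypothetical helper; a dict plays the role of the hash table.
    """
    acc = {}  # key value -> field -> [running sum, count]
    for row in csv.DictReader(io.StringIO(text)):
        slot = acc.setdefault(row[key], {f: [0.0, 0] for f in fields})
        for f in fields:
            if row[f] != "":  # an empty cell is treated as NULL and ignored
                slot[f][0] += float(row[f])
                slot[f][1] += 1
    # A field with no non-NULL values at all averages to None.
    return {
        k: {f: (s / n if n else None) for f, (s, n) in v.items()}
        for k, v in acc.items()
    }

# Same contents as dat1.csv; note the rows are not sorted by Customer.
dat1 = "Customer,Quantity,Amount\nA,1,\nB,,15\nA,2,20\nB,3,10\nB,1,20\n"
result = hash_avg(dat1, "Customer", ["Quantity", "Amount"])
print(result)
```

Because each row is routed to its accumulator by key lookup, records for the same Customer do not need to be adjacent in the input.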
If a NULL value is present in Quantity or Amount, the -n option makes the average of that field NULL for the affected Customer.
$ mhashavg k=Customer f=Quantity,Amount -n i=dat1.csv o=rsl2.csv
#END# kghashavg -n f=Quantity,Amount i=dat1.csv k=Customer o=rsl2.csv
$ more rsl2.csv
Customer,Quantity,Amount
A,1.5,
B,,15
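The effect of -n can be sketched in Python as follows. This is an assumption-laden illustration rather than the command's code: `avg_with_n` is a hypothetical helper, and None stands for NULL. Any NULL in a group makes that group's average NULL.

```python
def avg_with_n(values):
    """Mean of `values`, or None if the group is empty or contains a None.

    Hypothetical helper mimicking the -n behaviour shown above.
    """
    if not values or any(v is None for v in values):
        return None
    return sum(values) / len(values)

# Per-customer columns taken from dat1.csv, with None in place of NULL.
quantity = {"A": [1, 2], "B": [None, 3, 1]}
amount = {"A": [None, 20], "B": [15, 10, 20]}

# (average Quantity, average Amount) for each customer
rsl2 = {k: (avg_with_n(quantity[k]), avg_with_n(amount[k])) for k in quantity}
print(rsl2)
```

Customer A has a NULL Amount and Customer B has a NULL Quantity, so those two averages come out as None, matching the empty cells in rsl2.csv.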
Refer to the benchmark at mhashsum to find out more on processing speed.
mavg : Compute average
mhashsum : Compute hash total value