Replicate original records and shift specified fields. A window of fixed, constant width is used when calculating moving averages of time series data. The first element of the moving average is obtained by averaging the initial fixed-size subset of the series. The subset is then shifted forward by one position so that it drops its first number and includes the next number following it in the series, and the average is taken again. This method is known as sliding window calculation.
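The calculation described above can be sketched in a few lines of Python. This is only an illustration of the sliding window idea, not the implementation used by the command; the function name moving_average and its parameters are hypothetical.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], width: int) -> List[float]:
    """Average every run of `width` consecutive values, shifting by one each time."""
    if width <= 0:
        raise ValueError("width must be positive")
    # A series shorter than `width` yields no complete window, hence an empty list.
    return [
        sum(values[i:i + width]) / width
        for i in range(len(values) - width + 1)
    ]


print(moving_average([1, 2, 3, 4], 2))  # [1.5, 2.5, 3.5]
```

A series of four values with a window width of 2 produces three averages, one per window position.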
An example is shown in Tables 3.47 to 3.49.
Table 3.47: input data

date  val
4/6   1
4/7   2
4/8   3
4/9   4

Table 3.48: wk=date:win t=2

Table 3.49: wk=date:win t=2 -r
Table 3.47 shows the input data, which contains daily totals for four consecutive days. The figures could represent, for example, supermarket sales or stock prices.
This example calculates the moving average from 4/6 to 4/9 with a window size of 2. Three windows, [(4/6,1),(4/7,2)], [(4/7,2),(4/8,3)] and [(4/8,3),(4/9,4)], are generated, where [ ] indicates a window and ( ) indicates a row.
Each window is identified by a unique key (referred to as the "window key"). By default, the window key is the maximum value in the window of the field specified at the wk= parameter, that is, the value in the last row of the window, and it is output under the field name given after the colon (Table 3.48). When the -r option is specified, the minimum value (the first row) of each window is used as the window key instead (Table 3.49). The moving averages of the data series can then be calculated by applying mavg to the output (Table 3.48).
The mmvavg command is equivalent to processing the data with mwindow followed by mavg as described above. However, in an experiment on a 200MB data set of 10 million records with a window size of 10, mmvavg was 3.5 times faster.
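The way rows are replicated and tagged with a window key can be illustrated with the following Python sketch. It is a simplified model written for this explanation, not the actual mwindow code; the function expand_windows and the use_first flag (standing in for -r) are hypothetical names.

```python
from typing import Dict, List

Row = Dict[str, str]


def expand_windows(rows: List[Row], wk: str, name: str, size: int,
                   use_first: bool = False) -> List[Row]:
    """Emit one copy of each row per window it belongs to, tagged with the window key."""
    ordered = sorted(rows, key=lambda r: r[wk])  # plain string sort on the wk field
    out: List[Row] = []
    for start in range(len(ordered) - size + 1):
        window = ordered[start:start + size]
        # The last row names the window by default; the first row when use_first is set.
        key_row = window[0] if use_first else window[-1]
        out.extend({name: key_row[wk], **row} for row in window)
    return out


data = [{"date": d, "val": v} for d, v in
        [("20130406", "1"), ("20130407", "2"), ("20130408", "3"), ("20130409", "4")]]

for row in expand_windows(data, wk="date", name="win", size=2):
    print(row["win"], row["date"], row["val"])
```

Every row except the first and the last appears twice in the output, once for each window that contains it, so averaging val per window key gives the moving average.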
mwindow wk= t= [k=key] [-r] [-n] [i=] [o=] [-nfn] [-nfno] [-x] [-q] [--help] [--version]
wk=   Specify the field in the input data whose values uniquely identify each window. The sliding windows are created after the data is sorted by this field. Append %r to sort in descending order, %n to sort numerically, or %nr to sort in descending numeric order. The field name for the window key must be given after a colon (for example, wk=date:win). Multiple fields can be specified.
t=    Specify the window size (number of rows).
k=    Specify the key field that defines the unit in which windows are generated.
-r    Use the first row of each window as the window key. By default, the last row is used.
-n    Also output windows that contain fewer rows than the size specified at t=.
i=    Input file name.
-nfn  The input data has no field header in the first row.
$ more dat1.csv
date,val
20130406,1
20130407,2
20130408,3
20130409,4
$ mwindow wk=date:win t=2 i=dat1.csv o=rsl1.csv
#END# kgwindow i=dat1.csv o=rsl1.csv t=2 wk=date:win
$ more rsl1.csv
win%0,date,val
20130407,20130406,1
20130407,20130407,2
20130408,20130407,2
20130408,20130408,3
20130409,20130408,3
20130409,20130409,4
$ mwindow wk=date:win t=3 -r i=dat1.csv o=rsl2.csv
#END# kgwindow -r i=dat1.csv o=rsl2.csv t=3 wk=date:win
$ more rsl2.csv
win%0,date,val
20130406,20130406,1
20130406,20130407,2
20130406,20130408,3
20130407,20130407,2
20130407,20130408,3
20130407,20130409,4
$ mwindow wk=date:win t=3 -r -n i=dat1.csv o=rsl3.csv
#END# kgwindow -n -r i=dat1.csv o=rsl3.csv t=3 wk=date:win
$ more rsl3.csv
win%0,date,val
20130406,20130406,1
20130406,20130407,2
20130406,20130408,3
20130407,20130407,2
20130407,20130408,3
20130407,20130409,4
20130408,20130408,3
20130408,20130409,4
20130409,20130409,4
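The difference between the outputs with and without -n when -r is specified can be modeled as follows. This Python sketch only reproduces the window boundaries seen in the two results above; the function windows_from_first_row and its keep_short flag are hypothetical names, and the sketch makes no claim about how short windows are placed when -r is not specified.

```python
from typing import List, Sequence


def windows_from_first_row(values: Sequence[str], size: int,
                           keep_short: bool = False) -> List[List[str]]:
    """Start one window at each row; optionally keep windows cut short by the end of the data."""
    # Without keep_short, only starting points that leave room for `size` rows are used.
    last_start = len(values) if keep_short else len(values) - size + 1
    return [list(values[s:s + size]) for s in range(max(last_start, 0))]


dates = ["20130406", "20130407", "20130408", "20130409"]
print(windows_from_first_row(dates, 3))                   # two complete windows
print(windows_from_first_row(dates, 3, keep_short=True))  # plus two shorter windows
```

With keep_short set, the windows starting at 20130408 and 20130409 contain only two rows and one row respectively.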
$ more dat2.csv
store,date,val
a,20130406,1
a,20130407,2
a,20130408,3
a,20130409,4
b,20130406,11
b,20130407,12
b,20130408,13
b,20130409,14
$ mwindow k=store wk=date:win t=2 i=dat2.csv o=rsl4.csv
#END# kgwindow i=dat2.csv k=store o=rsl4.csv t=2 wk=date:win
$ more rsl4.csv
win%1,store%0,date,val
20130407,a,20130406,1
20130407,a,20130407,2
20130408,a,20130407,2
20130408,a,20130408,3
20130409,a,20130408,3
20130409,a,20130409,4
20130407,b,20130406,11
20130407,b,20130407,12
20130408,b,20130407,12
20130408,b,20130408,13
20130409,b,20130408,13
20130409,b,20130409,14
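The effect of k= in the result above, where windows never span two stores, can be sketched in Python as follows. This is an illustrative model rather than the command's implementation; the function windows_per_store and the tuple layout are assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter
from typing import List, Tuple

Record = Tuple[str, str, int]  # (store, date, val)


def windows_per_store(records: List[Record], size: int) -> List[Tuple[str, str, str, int]]:
    """Build sliding windows separately inside each store, keyed by the last date in the window."""
    ordered = sorted(records, key=itemgetter(0, 1))  # by store, then by date
    out = []
    for _, group in groupby(ordered, key=itemgetter(0)):
        members = list(group)
        for start in range(len(members) - size + 1):
            window = members[start:start + size]
            win = window[-1][1]  # date of the last row in the window
            out.extend((win, store, date, val) for store, date, val in window)
    return out


sales = [(s, d, base + i)
         for s, base in (("a", 1), ("b", 11))
         for i, d in enumerate(["20130406", "20130407", "20130408", "20130409"])]

for line in windows_per_store(sales, 2):
    print(*line, sep=",")
```

Because the windows are generated per group, the same window key (for example 20130407) appears once for store a and once for store b.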
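In the above examples, the window key is the last day of the window, so the moving average for a given day includes that day itself. To calculate a moving average over the days before the key date only (for example, the two days up to the previous day), first use mslide to attach the date of the following record to each record, and then specify that field as the window key. The example is as follows: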
$ mslide f=date:date2 -q i=dat1.csv o=rsl5.csv
#END# kgslide -q f=date:date2 i=dat1.csv o=rsl5.csv
$ more rsl5.csv
date,val,date2
20130406,1,20130407
20130407,2,20130408
20130408,3,20130409
$ mwindow wk=date2:win t=2 i=rsl5.csv o=rsl6.csv
#END# kgwindow i=rsl5.csv o=rsl6.csv t=2 wk=date2:win
$ more rsl6.csv
win%0,date,val,date2
20130408,20130406,1,20130407
20130408,20130407,2,20130408
20130409,20130407,2,20130408
20130409,20130408,3,20130409
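The two-step result above can be mirrored in a short Python sketch, given here only to make the shift explicit. The functions attach_next_date and average_by_shifted_key are hypothetical and model what the outputs show, not the internals of mslide or mwindow.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # (date, val)


def attach_next_date(records: List[Record]) -> List[Tuple[str, int, str]]:
    """Pair each record with the following record's date; the last record has none and is dropped."""
    return [(d, v, nxt[0]) for (d, v), nxt in zip(records, records[1:])]


def average_by_shifted_key(records: List[Record], size: int) -> Dict[str, float]:
    """Average `size` consecutive values, keyed by the date that follows the window."""
    shifted = attach_next_date(records)
    averages: Dict[str, float] = {}
    for start in range(len(shifted) - size + 1):
        window = shifted[start:start + size]
        key = window[-1][2]  # shifted date of the last row in the window
        averages[key] = sum(v for _, v, _ in window) / size
    return averages


series = [("20130406", 1), ("20130407", 2), ("20130408", 3), ("20130409", 4)]
print(average_by_shifted_key(series, 2))  # {'20130408': 1.5, '20130409': 2.5}
```

The value 1.5 keyed by 20130408 is the average of 20130406 and 20130407, so the key date itself is not part of its window.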
mmvavg : Command specialized in computing the average of sliding windows.
mmvstats : Computes various statistics of sliding windows.