3.76 mwindow - Generate Sliding Window

Replicates the original records and shifts the specified field to generate sliding windows. Moving averages over time-series data are computed with a fixed window of constant width. The first moving-average value is the average of the first fixed-size subset of the series. The window is then shifted forward by one element, dropping the oldest value and including the next value in the series, and the average is taken again. This method is known as a sliding window calculation.
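As an illustration only, the sliding-window moving average described above can be sketched in Python. This is not part of MCMD, and the function name moving_average is hypothetical.

```python
def moving_average(values, t):
    """Return the average of each window of t consecutive values."""
    # Each window starts at position s and covers values[s:s + t];
    # stepping s by one slides the window forward by one element.
    return [sum(values[s:s + t]) / t for s in range(len(values) - t + 1)]


print(moving_average([1, 2, 3, 4], 2))  # [1.5, 2.5, 3.5]
```

With four values and a window of 2 rows, three windows are produced, matching the three windows in the example that follows.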

Tables 3.47 to 3.49 show an example.

Table 3.47: input data

date  val
4/6   1
4/7   2
4/8   3
4/9   4

Table 3.48: wk=date:win t=2

win  date  val
4/7  4/6   1
4/7  4/7   2
4/8  4/7   2
4/8  4/8   3
4/9  4/8   3
4/9  4/9   4

Table 3.49: wk=date:win t=2 -r

win  date  val
4/6  4/6   1
4/6  4/7   2
4/7  4/7   2
4/7  4/8   3
4/8  4/8   3
4/8  4/9   4

Table 3.47 shows the input data, which contains daily totals for four consecutive days. Such figures could represent, for example, daily supermarket sales or stock prices.

This example prepares moving averages over the data from 4/6 to 4/9 with a window size of 2 rows. Three windows are generated: [(4/6,1),(4/7,2)], [(4/7,2),(4/8,3)], and [(4/8,3),(4/9,4)], where [ ] denotes a window and ( ) denotes a row.

Each window is identified by a unique key, called the window key. By default, the window key is the value of the wk= field in the last row of the window (the largest value, since the field is sorted in ascending order), and it is output in a new field whose name is given after the colon in wk= (win in Table 3.48). With the -r option, the value in the first row of the window (the smallest value) is used instead (Table 3.49). To obtain the moving averages, the output in Table 3.48 is then processed with mavg, which averages val for each window key.
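The following Python sketch mimics, for the dat1.csv data used in Example 1, what the mwindow + mavg pipeline is described as doing: each row is copied into every window it belongs to, each copy is tagged with the window key, and val is then averaged per key. It models the described behavior under the assumption that dates sort correctly as fixed-width strings; it is not MCMD's implementation, and window_rows and average_by_key are hypothetical names.

```python
from collections import defaultdict


def window_rows(rows, t, use_first=False):
    """Replicate (date, val) rows into sliding windows of t rows."""
    # Windows are built after sorting on the window field.
    rows = sorted(rows, key=lambda r: r[0])
    out = []
    for s in range(len(rows) - t + 1):
        window = rows[s:s + t]
        # Window key: last row's date by default, first row's date if use_first (like -r).
        key = window[0][0] if use_first else window[-1][0]
        out.extend((key, date, val) for date, val in window)
    return out


def average_by_key(tagged):
    """Average val per window key (the role mavg plays in the text)."""
    groups = defaultdict(list)
    for key, _date, val in tagged:
        groups[key].append(val)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


data = [("20130406", 1), ("20130407", 2), ("20130408", 3), ("20130409", 4)]
tagged = window_rows(data, 2)
for row in tagged:
    print(",".join(map(str, row)))  # same data rows as rsl1.csv in Example 1
print(average_by_key(tagged))  # {'20130407': 1.5, '20130408': 2.5, '20130409': 3.5}
```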

The mmvavg command produces the same result as the mwindow + mavg pipeline described above, but runs faster: in a test on 10 million records (200 MB) with a window size of 10 rows, mmvavg was about 3.5 times faster.
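One plausible reason a dedicated command can be faster is that it does not need to replicate each row into t windows. The sketch below shows a common single-pass technique, a running sum that adds the value entering the window and subtracts the value leaving it. This is an assumption about the general approach, not a description of how mmvavg is actually implemented; running_moving_average is a hypothetical name.

```python
def running_moving_average(values, t):
    """Moving average in one pass, without materializing each window."""
    result = []
    total = 0
    for i, v in enumerate(values):
        total += v  # value entering the window
        if i >= t:
            total -= values[i - t]  # value leaving the window
        if i >= t - 1:  # the first full window ends at index t - 1
            result.append(total / t)
    return result


print(running_moving_average([1, 2, 3, 4], 2))  # [1.5, 2.5, 3.5]
```

Each input value is touched a constant number of times, whereas the mwindow + mavg pipeline writes every row t times before averaging.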

Format

mwindow wk= t= [k=key] [-r] [-n] [i=] [o=] [-nfn] [-nfno] [-x] [-q] [--help] [--version]

wk=

Specify the field in the input data whose values identify each window. The sliding windows are created after the data are sorted on this field. Add %r after the field name for descending order, %n for numeric order, or %nr for numeric descending order.

The name of the output window key field must be given after a colon (for example, wk=date:win). Multiple fields can be specified.

t=

Specify the window size (number of rows).

k=

Specify the key field(s). Windows are generated separately for each key value (see Example 4).

-r

Use the value in the first row of each window as the window key. By default, the value in the last row is used.

-n

Output all windows, including those with fewer rows than the window size specified by t=.

i=

Input file name

-nfn

The input data has no field-name header in the first row.
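The effect of the %r, %n, and %nr suffixes accepted by wk= can be illustrated in Python: without %n, values are compared as strings, so "10" sorts before "9". This sketch reflects our reading of the suffix descriptions, not MCMD's implementation; sort_values and its suffix argument are hypothetical.

```python
def sort_values(values, suffix=""):
    """Order field values as the wk= suffixes describe.

    '' = ascending string order, 'r' = descending string order,
    'n' = ascending numeric order, 'nr' = descending numeric order.
    """
    key = float if "n" in suffix else str
    return sorted(values, key=key, reverse="r" in suffix)


vals = ["9", "10", "2"]
print(sort_values(vals))        # ['10', '2', '9']
print(sort_values(vals, "r"))   # ['9', '2', '10']
print(sort_values(vals, "n"))   # ['2', '9', '10']
print(sort_values(vals, "nr"))  # ['10', '9', '2']
```

Because windows are formed from consecutive rows after sorting, a numeric field without %n can group rows into unexpected windows.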

Example

Example 1: Basic Example

$ more dat1.csv
date,val
20130406,1
20130407,2
20130408,3
20130409,4
$ mwindow wk=date:win t=2 i=dat1.csv o=rsl1.csv
#END# kgwindow i=dat1.csv o=rsl1.csv t=2 wk=date:win
$ more rsl1.csv
win%0,date,val
20130407,20130406,1
20130407,20130407,2
20130408,20130407,2
20130408,20130408,3
20130409,20130408,3
20130409,20130409,4

Example 2: Use the first row of each window as the window key

$ mwindow wk=date:win t=3 -r i=dat1.csv o=rsl2.csv
#END# kgwindow -r i=dat1.csv o=rsl2.csv t=3 wk=date:win
$ more rsl2.csv
win%0,date,val
20130406,20130406,1
20130406,20130407,2
20130406,20130408,3
20130407,20130407,2
20130407,20130408,3
20130407,20130409,4

Example 3: Output all windows, including those with fewer rows than t=

$ mwindow wk=date:win t=3 -r -n i=dat1.csv o=rsl3.csv
#END# kgwindow -n -r i=dat1.csv o=rsl3.csv t=3 wk=date:win
$ more rsl3.csv
win%0,date,val
20130406,20130406,1
20130406,20130407,2
20130406,20130408,3
20130407,20130407,2
20130407,20130408,3
20130407,20130409,4
20130408,20130408,3
20130408,20130409,4
20130409,20130409,4
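The difference between Examples 2 and 3 can be modeled in Python. The sketch covers only the -r case shown here; how -n behaves without -r is not shown in this section, so it is not modeled. first_row_windows and include_short are hypothetical names.

```python
def first_row_windows(dates, t, include_short=False):
    """Sliding windows keyed by their first date (the -r case).

    With include_short=True (mirroring -n), windows that run past the
    end of the data and hold fewer than t rows are kept as well.
    """
    dates = sorted(dates)
    starts = len(dates) if include_short else len(dates) - t + 1
    return [(dates[s], dates[s:s + t]) for s in range(starts)]


days = ["20130406", "20130407", "20130408", "20130409"]
for key, window in first_row_windows(days, 3, include_short=True):
    print(key, window)
```

With include_short=False the same call yields only the two full windows of Example 2; with include_short=True the keys 20130408 and 20130409 appear as in Example 3.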

Example 4: Specifying a key field

$ more dat2.csv
store,date,val
a,20130406,1
a,20130407,2
a,20130408,3
a,20130409,4
b,20130406,11
b,20130407,12
b,20130408,13
b,20130409,14
$ mwindow k=store wk=date:win t=2 i=dat2.csv o=rsl4.csv
#END# kgwindow i=dat2.csv k=store o=rsl4.csv t=2 wk=date:win
$ more rsl4.csv
win%1,store%0,date,val
20130407,a,20130406,1
20130407,a,20130407,2
20130408,a,20130407,2
20130408,a,20130408,3
20130409,a,20130408,3
20130409,a,20130409,4
20130407,b,20130406,11
20130407,b,20130407,12
20130408,b,20130407,12
20130408,b,20130408,13
20130409,b,20130408,13
20130409,b,20130409,14
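The role of k= in Example 4 is that windows never span two stores. The Python sketch below reproduces rsl4.csv for this data by grouping on store before sliding the window; it is a model of the output shown, not MCMD code, and windows_per_key is a hypothetical name.

```python
from itertools import groupby


def windows_per_key(rows, t):
    """Sliding windows of t rows built separately for each store.

    rows are (store, date, val); output rows are (win, store, date, val),
    with win set to the last date in each window.
    """
    rows = sorted(rows)  # by store, then date
    out = []
    for _store, group in groupby(rows, key=lambda r: r[0]):
        group = list(group)
        for s in range(len(group) - t + 1):
            window = group[s:s + t]
            win = window[-1][1]
            out.extend((win,) + row for row in window)
    return out


dat2 = [("a", "20130406", 1), ("a", "20130407", 2), ("a", "20130408", 3), ("a", "20130409", 4),
        ("b", "20130406", 11), ("b", "20130407", 12), ("b", "20130408", 13), ("b", "20130409", 14)]
for row in windows_per_key(dat2, 2):
    print(",".join(map(str, row)))
```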

Example 5: Shift each date to the next day with mslide

In the examples above, each window is keyed by the last day it contains, so the moving average for a date includes that date's own value. To compute moving averages over the days before each date instead, first use mslide to add a field date2 that holds the date of the next row, as follows:

$ mslide f=date:date2 -q i=dat1.csv o=rsl5.csv
#END# kgslide -q f=date:date2 i=dat1.csv o=rsl5.csv
$ more rsl5.csv
date,val,date2
20130406,1,20130407
20130407,2,20130408
20130408,3,20130409
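For this data, the effect of mslide shown in rsl5.csv can be mimicked in Python by pairing each row with the date of the row that follows it. The sketch models only the output shown above, not mslide's other options; slide_next is a hypothetical name.

```python
def slide_next(rows):
    """Append the next row's date to each (date, val) row.

    The last row has no successor, so it is dropped, as in rsl5.csv.
    """
    return [(date, val, nxt[0]) for (date, val), nxt in zip(rows, rows[1:])]


rows = [("20130406", 1), ("20130407", 2), ("20130408", 3), ("20130409", 4)]
for row in slide_next(rows):
    print(",".join(map(str, row)))
```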

Example 6: Find the moving averages up to the previous day

$ mwindow wk=date2:win t=2 i=rsl5.csv o=rsl6.csv
#END# kgwindow i=rsl5.csv o=rsl6.csv t=2 wk=date2:win
$ more rsl6.csv
win%0,date,val,date2
20130408,20130406,1,20130407
20130408,20130407,2,20130408
20130409,20130407,2,20130408
20130409,20130408,3,20130409
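The averaging step that follows Example 6 is not shown above. As a sketch under the assumption that val is averaged per win (the role mavg plays earlier in this section), the Python code below windows the rows of rsl5.csv on date2 and averages each window, so each win date receives the mean of the two days before it. It is an illustration, not MCMD code.

```python
from collections import defaultdict

# Rows of rsl5.csv: (date, val, date2), where date2 is the next row's date.
shifted = [("20130406", 1, "20130407"),
           ("20130407", 2, "20130408"),
           ("20130408", 3, "20130409")]
t = 2

groups = defaultdict(list)
for s in range(len(shifted) - t + 1):
    window = shifted[s:s + t]
    win = window[-1][2]  # window key: last date2 in the window (wk=date2:win)
    groups[win].extend(val for _date, val, _date2 in window)

# Averaging each window gives the mean of the t days before win.
prev_avg = {win: sum(vals) / len(vals) for win, vals in groups.items()}
print(prev_avg)  # {'20130408': 1.5, '20130409': 2.5}
```

For example, 20130408 receives (1 + 2) / 2 = 1.5, the average of 4/6 and 4/7, and excludes its own value of 3.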

Related commands

mmvavg : Command specialized for computing averages over sliding windows.

mmvstats : Compute various statistics of sliding windows.