3.57 mstats - Calculate Statistics of 1 Variable

Calculates the statistic specified by c= for the numeric field(s) specified by f=. Use k= to specify the key field(s) that define the aggregation unit. NULL values in the field(s) specified by f= are ignored. However, if all of the values are NULL, NULL is output.
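The NULL rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the mstats implementation; the function name `null_aware_sum` is hypothetical, and NULL is represented here by `None`.

```python
from typing import Optional, Sequence


def null_aware_sum(values: Sequence[Optional[float]]) -> Optional[float]:
    """Sum the non-NULL values; return None (NULL) when every value is NULL."""
    present = [v for v in values if v is not None]  # NULL values are ignored
    if not present:
        return None  # all values are NULL, so the result is NULL
    return sum(present)


print(null_aware_sum([1, None, 3]))  # NULL is skipped: prints 4
print(null_aware_sum([None, None]))  # every value is NULL: prints None
```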

Format

mstats c= f= [k=] [i=] [o=] [-nfn] [-nfno] [-x] [-q] [precision=] [--help] [--version]

Parameters

k=

Key field(s) that define the aggregation unit; statistics are computed for each distinct key value (multiple fields can be specified).

f=

Fields for which statistics are computed (multiple fields can be specified).

c=

Statistic to compute (select one from the list below)

sum|mean|count|ucount|devsq|var|uvar|sd|usd|USD|cv|min|qtile1|median|qtile3|max|range|qrange|mode|skew|uskew|kurt|ukurt

List of statistics

| Value of c= | Description | Equation | Remarks |
|---|---|---|---|
| count | Count (excluding NULL values) | $n$: number of non-NULL records | Cannot be applied to character string fields. |
| ucount | Unique count | $un$: number of distinct values (duplicates removed) | Cannot be applied to character string fields. |
| sum | Total | $sum=\sum_{i=1}^{n} x_i$ | |
| mean | Arithmetic mean | $m=\frac{1}{n}\sum_{i=1}^{n} x_i$ | |
| devsq | Sum of squared deviations | $S=\sum_{i=1}^{n}(x_i-m)^2$ | |
| var | Variance | $s^2=\frac{1}{n}S$ | |
| uvar | Variance (unbiased estimate) | $u^2=\frac{1}{n-1}S$ | |
| sd | Standard deviation | $s=\sqrt{s^2}$ | |
| usd | Standard deviation (square root of the unbiased variance) | $u=\sqrt{u^2}$ | Commonly used standard deviation |
| USD | Unbiased standard deviation | omitted | Exact unbiased estimate |
| cv | Coefficient of variation | $cv=\frac{s}{m}\times 100\%$ | |
| mode | Mode | $mode$: most frequent value | If several values share the highest frequency, the smallest of them is output. NULL is output if all values are different. |
| min | Minimum value | $min=\min_i x_i$ | |
| max | Maximum value | $max=\max_i x_i$ | |
| range | Range | $r=max-min$ | |
| median | Median | $Q2$: second quartile when sorted in ascending order | |
| qtile1 | First quartile | $Q1$: first quartile when sorted in ascending order | |
| qtile3 | Third quartile | $Q3$: third quartile when sorted in ascending order | |
| qrange | Interquartile range | $rq=Q3-Q1$ | |
| skew | Skewness | $\frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-m)^3}{s^3}$ | |
| uskew | Skewness (unbiased estimate) | omitted | |
| kurt | Kurtosis | $\frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-m)^4}{s^4}-3$ | |
| ukurt | Kurtosis (unbiased estimate) | omitted | |
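The equations listed above can be checked by hand with a short sketch. The following Python is illustrative only and is not the mstats implementation; the function name `moments` and the sample data are hypothetical, and it assumes at least two values, a non-zero mean, and a non-zero standard deviation. The statistics whose equations are omitted (USD, uskew, ukurt) and the quartiles are left out.

```python
import math


def moments(xs):
    """Compute devsq, var, uvar, sd, usd, cv, skew and kurt as defined in the list."""
    n = len(xs)
    m = sum(xs) / n                      # mean
    S = sum((x - m) ** 2 for x in xs)    # devsq: sum of squared deviations
    var = S / n                          # var:  s^2 = S / n
    uvar = S / (n - 1)                   # uvar: u^2 = S / (n - 1)
    sd = math.sqrt(var)                  # sd:   s
    usd = math.sqrt(uvar)                # usd:  u
    cv = sd / m * 100                    # cv, in percent
    skew = (sum((x - m) ** 3 for x in xs) / n) / sd ** 3
    kurt = (sum((x - m) ** 4 for x in xs) / n) / sd ** 4 - 3.0
    return {"mean": m, "devsq": S, "var": var, "uvar": uvar,
            "sd": sd, "usd": usd, "cv": cv, "skew": skew, "kurt": kurt}


# Hypothetical sample: mean 5, devsq 32, var 4, sd 2
stats = moments([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in stats.items():
    print(name, value)
```

Note that `var` divides by $n$ while `uvar` divides by $n-1$, so `usd` is always slightly larger than `sd` for the same data.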

 

Examples

Example 1: Basic

Calculate the sum of the "quantity" and "amount" fields for each "customer".

$ more dat1.csv
customer,quantity,amount
A,1,10
B,5,20
B,2,10
C,1,15
C,3,10
C,1,21
$ mstats k=customer f=quantity,amount c=sum i=dat1.csv o=rsl1.csv
#END# kgstats c=sum f=quantity,amount i=dat1.csv k=customer o=rsl1.csv
$ more rsl1.csv
customer%0,quantity,amount
A,1,10
B,7,30
C,5,46
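For readers who want to verify the result without the command, the same per-key aggregation can be sketched in Python. This is a rough equivalent under stated assumptions, not how mstats works internally: the function name `aggregate` is hypothetical, the input is the dat1.csv content shown above, every value is assumed to be a non-NULL integer, and the output is ordered by key.

```python
import csv
import io
from collections import defaultdict

DAT1 = """customer,quantity,amount
A,1,10
B,5,20
B,2,10
C,1,15
C,3,10
C,1,21
"""


def aggregate(text, key, fields, func):
    """Group rows by `key` and apply `func` (e.g. sum or max) to each field."""
    groups = defaultdict(lambda: {f: [] for f in fields})
    for row in csv.DictReader(io.StringIO(text)):
        for f in fields:
            groups[row[key]][f].append(int(row[f]))
    # One output row per key: (key, stat of field 1, stat of field 2, ...)
    return [(k, *(func(cols[f]) for f in fields))
            for k, cols in sorted(groups.items())]


for row in aggregate(DAT1, "customer", ["quantity", "amount"], sum):
    print(",".join(str(v) for v in row))
# Prints:
# A,1,10
# B,7,30
# C,5,46
```

Passing `max` instead of `sum` gives the per-customer maximum of each field.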

Example 2: Maximum value

Calculate the maximum of the "quantity" and "amount" fields for each "customer".

$ mstats k=customer f=quantity,amount c=max i=dat1.csv o=rsl2.csv
#END# kgstats c=max f=quantity,amount i=dat1.csv k=customer o=rsl2.csv
$ more rsl2.csv
customer%0,quantity,amount
A,1,10
B,5,20
C,3,21

Related Commands

msim : Calculates bivariate statistics.

mavg : Command specific to c=avg.

msum : Command specific to c=sum.

mcount : Unlike c=count, this counts the number of rows for each aggregate key.
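The difference between counting non-NULL values and counting rows can be illustrated as follows. This Python sketch only mirrors the distinction described above and does not call either command; the function name `counts_by_key` and the sample rows are hypothetical, and an empty string is assumed to stand for NULL.

```python
from collections import defaultdict

# Hypothetical (key, value) rows; "" is assumed to represent NULL
ROWS = [("A", "1"), ("A", ""), ("B", "5"), ("B", "2"), ("B", "")]


def counts_by_key(rows):
    """Return (non-NULL value count per key, row count per key)."""
    non_null = defaultdict(int)
    total = defaultdict(int)
    for key, value in rows:
        total[key] += 1                 # every row counts
        non_null[key] += value != ""    # only non-NULL values count
    return dict(non_null), dict(total)


non_null, total = counts_by_key(ROWS)
print(non_null)  # {'A': 1, 'B': 2}
print(total)     # {'A': 2, 'B': 3}
```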