3.59 msummary Calculate Statistics for 1 Variable

Calculate the type of statistics specified at c= parameter for fields specified at f= parameter.

Format

msummary c= f= [a=] [k=] [i=] [o=] [-nfn] [-nfno] [-x] [-q] [precision=] [--help] [--version]

Parameters

k=

Compute statistics based on the key field(s) specified (multiple fields can be specified).

f=

Field lists for computation of statistical summary (multiple fields can be specified).

 

When -x,-nfn option is specified, specify the field number (0 ).

c=

Statistics (multiple fields can be specified)

 

Specify list of statistics delimited by comma.

 

Statistics list:

 

sum/mean/count/ucount/devsq/var/uvar/sd/usd/cv/min/qtile1/median/qtile3/max/

 

range/qrange/mode/skew/uskew/kurt/ukurt

-a

New column name.

 

Results from calculation on column(s) specified at f= parameter (default is fld).

List of Statistics

The list of statistics specified at c= parameter is shown in Table 3.33.

Table 3.33: List of Statistics

Value of c=

Description

Equation

Remarks

count

Count (Except NULL value)

$n$: Number of non-NULL records

It can not be applied to character string field.

ucount

Unique count

$un$: Number of duplicate values removed

It can not be applied to character string field.

sum

Total

$sum=\sum _{i=1}^ n x_ i$

 

mean

Arithmetic mean

$m=\frac{1}{n}\sum _{i=1}^ n x_ i$

 

devsq

Sum of squared deviation

$S=\sum _{i=1}^ n(x_ i-m)^2$

 

var

Variance

$s^2=\frac{1}{n}S$

 

uvar

Variance (unbiased estimate)

$u^2=\frac{1}{n-1}S$

 

sd

Standard deviation

$s=\sqrt {s^2}$

 

usd

Standard deviation (sort of unbiased variance)

$u=\sqrt {u^2}$

commonly used standard deviation

cv

Coefficient of variation

$cv=s/m100\% $

 

mode

Mode

$mode$: Most frequent value

Print the value of the smaller value

     

if the frequency is same.

min

Minimum value

$min=\min _ i x_ i$

 

max

Maximum value

$max=\max _ i x_ i$

 

range

Range

$r=max-min$

 

median

Median

$Q2=Second quartile when sorted in ascending order$

 

qtile1

First quartile

$Q1=First quartile when sorted in ascending order$

 

qtile3

Third quartile

$Q3=Third quartile when sorted in ascending order$

 

qrange

Interquartile range

$rq=Q3-Q1$

 

skew

Skewness

$\frac{\frac{1}{n}\sum _{i=1}^ n (x_ i-m)^3}{s^3}$

 

uskew

Skewness (unbiased estimate)

omitted

 

kurt

Kurtosis

$\frac{\frac{1}{n}\sum _{i=1}^ n (x_ i-m)^4}{s^4}-3.0$

 

ukurt

Kurtosis (unbiased estimate)

omitted

 

Examples

Example 1: Basic Example

Find out the median and average "quantity" and "amount" by each customer. Save the output in a new column named "type".

$ more dat1.csv
customer,quantity,amount
A,1,10
A,2,20
B,1,15
B,3,10
B,1,20
$ msummary k=customer f=quantity,amount c=median:medianval,mean:meanval a=type i=dat1.csv o=rsl1.csv
#END# kgsummary a=type c=median:medianval,mean:meanval f=quantity,amount i=dat1.csv k=customer o=rsl1.csv
$ more rsl1.csv
customer%0,type,medianval,meanval
A,quantity,1.5,1.5
A,amount,15,15
B,quantity,1,1.666666667
B,amount,15,15

Related Commands

mstats : Compute one type of statistics.