3.48 mselrand - Random Sampling

Randomly selects records, using either the number of rows given by c= or the percentage given by p= (random sampling without replacement). When k= is specified, the given number of records is selected at random from each group of records with the same key. When -B is specified as well, selection is made in units of keys rather than rows: key values are chosen at random, and all records of each chosen key are output.

This command uses the Mersenne Twister (developed in 1997) as its pseudorandom number generator (see the author's web page and the Boost library).

Format

mselrand c=|p= [k=] [S=] [u=] [-B] [i=] [o=] [-nfn] [-nfno] [-x] [-q] [--help] [--version]

Parameters

c=

Number of rows to select. When k= is specified, this number of rows is selected for each key value.

This parameter must be specified when p= is not specified.

p=

Percentage of records to select. When k= is specified, the percentage is applied to the records of each key value.

This parameter must be specified when c= is not specified.

k=

Select the specified number of rows at random from each group of records with the same key (multiple fields can be specified).

S=

Random seed. The same seed produces the same row selection.

If no seed is specified, the current time is used as the seed.

The seed must be in the range -2147483648 to 2147483647.

u=

Write the records that were not selected to this output file.

-B

Select in units of keys rather than rows: key values are chosen at random, and all records of each chosen key are output.
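
The effect of S= can be illustrated with a short Python sketch. It is only an analogy, not mselrand's implementation: the helper pick_rows is hypothetical, and although Python's random module is also based on the Mersenne Twister, the rows it picks for a given seed should not be expected to match those picked by mselrand.

```python
import random

def pick_rows(n_rows, count, seed):
    """Return `count` distinct row indices out of `n_rows` (hypothetical helper)."""
    rng = random.Random(seed)          # generator with a fixed seed
    return sorted(rng.sample(range(n_rows), count))

# Same seed, same input size: the selection is reproduced exactly.
first = pick_rows(5, 2, seed=1)
second = pick_rows(5, 2, seed=1)
print(first == second)  # True
```

Fixing the seed in this way is what makes a sampling run repeatable; omitting it (as when S= is left out) gives a different selection on each run.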

Examples

Example 1: Basic Example

Randomly select 1 transaction for each customer.

$ more dat1.csv
Customer,Date,Amount
A,20081201,10
A,20081207,20
A,20081213,30
B,20081002,40
B,20081209,50
$ mselrand k=Customer c=1 S=1 i=dat1.csv o=rsl1.csv
#END# kgselrand S=1 c=1 i=dat1.csv k=Customer o=rsl1.csv
$ more rsl1.csv
Customer%0,Date,Amount
A,20081201,10
B,20081002,40
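
The per-key selection in this example can be sketched in Python as follows. This is a minimal illustration of sampling without replacement within each key group, not the command's actual code: the function sample_per_key is hypothetical, keeping a group whole when it has fewer rows than requested is an assumption, and the rows chosen for a given seed will generally differ from mselrand's.

```python
import random
from collections import defaultdict

def sample_per_key(rows, key, count, seed=None):
    """Pick up to `count` rows per key value, without replacement."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        groups[row[key]].append(idx)   # row positions for each key value
    chosen = set()
    for indices in groups.values():
        # Assumption: a group smaller than `count` is kept whole.
        chosen.update(rng.sample(indices, min(count, len(indices))))
    # Keep the input order in both outputs.
    selected = [r for i, r in enumerate(rows) if i in chosen]
    unmatched = [r for i, r in enumerate(rows) if i not in chosen]
    return selected, unmatched

# Same data as dat1.csv
rows = [
    {"Customer": "A", "Date": "20081201", "Amount": "10"},
    {"Customer": "A", "Date": "20081207", "Amount": "20"},
    {"Customer": "A", "Date": "20081213", "Amount": "30"},
    {"Customer": "B", "Date": "20081002", "Amount": "40"},
    {"Customer": "B", "Date": "20081209", "Amount": "50"},
]
selected, unmatched = sample_per_key(rows, "Customer", 1, seed=1)
for row in selected:
    print(row)
```

The second return value plays the role of the u= output: every input row ends up in exactly one of the two lists.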

Example 2: Randomly select a percentage of records

Select 50% of each customer's records at random. Save the remaining records to a separate file, oth2.csv.

$ mselrand k=Customer p=50 S=1 u=oth2.csv i=dat1.csv o=rsl2.csv
#END# kgselrand S=1 i=dat1.csv k=Customer o=rsl2.csv p=50 u=oth2.csv
$ more rsl2.csv
Customer%0,Date,Amount
A,20081201,10
B,20081002,40
$ more oth2.csv
Customer%0,Date,Amount
A,20081207,20
A,20081213,30
B,20081209,50

Example 3: Select records in units of keys

Select two of the four customers A, B, C, and D at random. Customers C and D are selected, and all of their records are written to the output.

$ more dat2.csv
Customer,Date,Amount
A,20081201,10
A,20081207,20
A,20081213,30
B,20081002,40
B,20081209,50
C,20081210,60
D,20081201,70
D,20081205,80
D,20081209,90
$ mselrand k=Customer c=2 S=1 -B i=dat2.csv o=rsl3.csv
#END# kgselrand -B S=1 c=2 i=dat2.csv k=Customer o=rsl3.csv
$ more rsl3.csv
Customer%0,Date,Amount
C,20081210,60
D,20081201,70
D,20081205,80
D,20081209,90
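
Key-unit selection as in this example can be sketched in Python as follows. It is an illustration under assumptions rather than the command's implementation: the function sample_keys is hypothetical, and which customers are chosen for a given seed will generally differ from mselrand's result.

```python
import random

def sample_keys(rows, key, count, seed=None):
    """Choose `count` key values at random and return all of their rows."""
    rng = random.Random(seed)
    # Distinct key values in first-seen order.
    keys = list(dict.fromkeys(row[key] for row in rows))
    chosen = set(rng.sample(keys, min(count, len(keys))))
    # Every row of a chosen key is kept, in input order.
    return [row for row in rows if row[key] in chosen]

# Same data as dat2.csv (Customer and Amount only, for brevity)
rows = [
    {"Customer": "A", "Amount": "10"},
    {"Customer": "A", "Amount": "20"},
    {"Customer": "A", "Amount": "30"},
    {"Customer": "B", "Amount": "40"},
    {"Customer": "B", "Amount": "50"},
    {"Customer": "C", "Amount": "60"},
    {"Customer": "D", "Amount": "70"},
    {"Customer": "D", "Amount": "80"},
    {"Customer": "D", "Amount": "90"},
]
picked = sample_keys(rows, "Customer", 2, seed=1)
for row in picked:
    print(row)
```

The difference from per-key sampling is that the random draw is made over the list of distinct key values, so the number of output rows depends on how many records the chosen keys happen to have.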

Related Commands

msel : Use normally distributed random numbers.

mrand : Add random numbers as a new column.