3.17 mdata - Generate Datasets

Generates a variety of data sets, including Iris, the best-known data set in the field of pattern recognition, and the data sets used in the MCMD tutorial. Note that the way parameters are specified in this command is completely different from other commands.

Format

mdata dataset/parameter1/parameter2/...

Parameters

Specify the name of the data set and its parameters, separated by "/". The available data sets are summarized in Table 3.1. The use of the parameters differs from data set to data set; the details for each data set are given under Parameters in the table.

Table 3.1: Description of data sets

Name of data set: iris
Description: Records the characteristics and size of the sepals and petals of a variety of Iris. Well suited to building classification models.
Parameters: N/A

Name of data set: man0
Description: Data used in Figure 2.1 of this manual
Parameters: N/A

Name of data set: man1
Description: Data used in Figure 2.3 of this manual
Parameters: N/A

Name of data set: tutorial_jp
Description: Artificial supermarket purchasing data. Comprised of multiple files, including the customer master and the product master.
Parameters: If a data name is specified, mdata writes the specified data to the standard output. If not specified, all files are generated under the directory tutorial_jp. The data names are as follows.
  dat: Purchasing data
  syo: Product master
  cust: Customer master
  jicfs1, jicfs2, jicfs4, jicfs6: Product category master

Name of data set: tutorial_en
Description: English version of the tutorial_jp dataset
Parameters: Same as tutorial_jp
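
The slash-separated form described above can be illustrated with a small shell sketch. The function split_mdata_arg is a hypothetical helper, not part of MCMD; it only shows how an argument such as tutorial_en/syo divides into a data set name and an optional data name, and makes no claim about how mdata itself parses its argument.

```shell
# Hypothetical helper (not part of MCMD): split an mdata-style argument
# such as "tutorial_en/syo" into its data set name and optional data name.
split_mdata_arg() {
  arg=$1
  dataset=${arg%%/*}          # text before the first "/"
  case $arg in
    */*) name=${arg#*/} ;;    # text after the first "/"
    *)   name= ;;             # no "/" means no data name was given
  esac
  printf 'dataset=%s name=%s\n' "$dataset" "$name"
}

split_mdata_arg iris
split_mdata_arg tutorial_en/syo
```

An empty name corresponds to the case in which no data name is specified.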

Examples

Example 1 Generate iris dataset

Write the Iris dataset to standard output.

$ mdata iris
SepalLength,SepalWidth,PetalLength,PetalWidth,Species
5.1,3.5,1.4,0.2,setosa
4.9,3,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
         :
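
Because the data set is written to standard output as CSV, it can be piped into other tools. The sketch below assumes only standard POSIX utilities; csv_columns is a hypothetical helper name, and the sample rows are copied from the output above so that the snippet runs without mdata installed.

```shell
# Hypothetical helper: list the column names of a CSV stream, one per line.
csv_columns() {
  head -n 1 | tr ',' '\n'
}

# Sample rows copied from the output above; with mdata installed,
# `mdata iris | csv_columns` is assumed to behave the same way.
printf '%s\n' \
  'SepalLength,SepalWidth,PetalLength,PetalWidth,Species' \
  '5.1,3.5,1.4,0.2,setosa' \
  '4.9,3,1.4,0.2,setosa' | csv_columns
```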

Example 2 Create Tutorial dataset

Create all files for the tutorial dataset.

$ mdata tutorial_en
#END# mdata tutorial_en

$ ls -l tutorial_en
total 4704
-rw-r--r--  1 hamuro  staff    20673  8 22 08:14 cust.csv
-rw-r--r--  1 hamuro  staff  2281312  8 22 08:14 dat.csv
-rw-r--r--  1 hamuro  staff      128  8 22 08:14 jicfs1.csv
-rw-r--r--  1 hamuro  staff      529  8 22 08:14 jicfs2.csv
-rw-r--r--  1 hamuro  staff     6630  8 22 08:14 jicfs4.csv
-rw-r--r--  1 hamuro  staff    36400  8 22 08:14 jicfs6.csv
-rw-r--r--  1 hamuro  staff    46466  8 22 08:14 syo.csv

$ more tutorial_en/cust.csv
customer,dob,gender
00000A,19711107,female
00000B,19461025,female
00000C,19660307,female
         :
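
After generating the whole tutorial data set, it may be useful to confirm that every file is present. The following is a minimal sketch, assuming the seven file names shown in the listing above; missing_tutorial_files is a hypothetical helper, not an MCMD command.

```shell
# Hypothetical helper: print the name of each expected tutorial file
# that is missing from the given directory (prints nothing if all exist).
# The file names are taken from the listing above.
missing_tutorial_files() {
  dir=$1
  for name in cust dat jicfs1 jicfs2 jicfs4 jicfs6 syo; do
    [ -f "$dir/$name.csv" ] || printf '%s\n' "$name.csv"
  done
}

# Check the directory created in this example.
missing_tutorial_files tutorial_en
```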

Example 3 Create individual tutorial dataset

Write the product master dataset to standard output.

$ mdata tutorial_en/syo
product,productName,Code1Desc,Code2Desc,Code4Desc,Code6Desc,manufacturer,brand,unitCost
0000000,Processed_Seafood_2,1,11,1197,119705,0495,049502,310
0000001,Other_Yogurt_Drink_2,1,14,1404,140497,1658,165801,215
0000002,Carbonic_Flavor_3,1,14,1403,140305,1911,191100,406
             :