Merge all records in the files specified in the i= parameter, in the order the files are listed. If a wild card is used to specify file names, the files are merged in alphabetical order of file name.
mcat [f=] [-skip_fnf] [-nostop|-skip|-force] [i=] [o=] [-nfn] [-nfno] [-x] [--help] [--version]
i= : Specify the list of input file names. Multiple CSV files are separated by commas. Wild card characters can be used in file names.
f= : Specify the field name(s) to concatenate. If f= is not specified, the field names default to those of the first file defined in the i= parameter.
-skip_fnf : If a file specified in the i= parameter does not exist, the program bypasses the error. However, the program returns an error if none of the files can be found.
-nostop : -nostop, -skip, and -force control the behavior when a field name is not found in an input file. The -nostop flag outputs null for a field that is not found. When the -nfn flag is used with the stop flag, the program terminates if the number of items in the data differs from the number defined.
-skip : A file is not concatenated if a field name is not found in it. When the -nfn flag is used with the -skip flag, a file is not concatenated if its number of data items is different.
-force : Force concatenation of files by field position when a field name is not found. Null is output if the item at that position is not available.
-stdin : Merge from standard input.
-add_fname : Add the file name as the last column. Standard input is named /dev/stdin. The field name for this option is fixed as "fileName"; an error is returned if the input data already contains a field with the same name.
Wild card characters ("?" and "*") can be used to specify multiple directory and file names.
The symbol ~/ can be used to indicate the home directory.
The files are concatenated in the order specified in the i= parameter. If a wild card is used, the matching files are merged in alphabetical order. Standard input takes precedence when merging files: its records are written before those of the files in i=.
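The ordering rules above can be illustrated outside of mcat. The following Python sketch is only an approximation under stated assumptions: the helper name resolve_inputs is hypothetical, wild cards are expanded with the standard glob module, and this is not how mcat itself is implemented.

```python
import glob
import os
import tempfile

def resolve_inputs(spec, use_stdin=False):
    """Expand an i=-style, comma-separated list into an ordered file list.

    Hypothetical helper, not part of mcat: patterns are processed in the
    order given, and the matches of a wild card are sorted alphabetically.
    """
    ordered = ["/dev/stdin"] if use_stdin else []  # standard input goes first
    for pattern in spec.split(","):
        pattern = os.path.expanduser(pattern)  # "~/" means the home directory
        if any(ch in pattern for ch in "*?"):
            ordered.extend(sorted(glob.glob(pattern)))
        else:
            ordered.append(pattern)
    return ordered

# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("dat3.csv", "dat1.csv", "dat2.csv"):
        open(os.path.join(tmp, name), "w").close()
    files = resolve_inputs(os.path.join(tmp, "dat*.csv"), use_stdin=True)
    names = [f if f == "/dev/stdin" else os.path.basename(f) for f in files]
    print(names)
```

The demonstration prints /dev/stdin first, followed by dat1.csv, dat2.csv, and dat3.csv, regardless of the order in which the files were created.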
$ more dat1.csv
customer,date,amount
A,20081201,10
B,20081002,40
$ more dat2.csv
customer,date,amount
A,20081207,20
A,20081213,30
B,20081209,50
$ mcat i=dat1.csv,dat2.csv o=rsl1.csv
#END# kgcat i=dat1.csv,dat2.csv o=rsl1.csv
$ more rsl1.csv
customer,date,amount
A,20081201,10
B,20081002,40
A,20081207,20
A,20081213,30
B,20081209,50
The first file defined in i=, dat1.csv, contains the fields "customer,date,amount". Because "amount" is not present in dat3.csv, the command returns an error. The contents of the first file, dat1.csv, are still merged and saved in the output.
$ more dat3.csv
customer,date,quantity
A,20081201,3
B,20081002,1
$ mcat i=dat1.csv,dat3.csv o=rsl2.csv
#ERROR# field name [amount] not found on file [dat3.csv] (kgcat)
$ more rsl2.csv
customer,date,amount
A,20081201,10
B,20081002,40
When the -nostop option is added to the previous example, the command continues processing and outputs a NULL value for the field that is not found. The -skip and -force options provide other ways to handle a field name that is not found. For details, refer to the description of parameters.
$ more dat3.csv
customer,date,quantity
A,20081201,3
B,20081002,1
$ mcat -nostop i=dat1.csv,dat3.csv o=rsl3.csv
#END# kgcat -nostop i=dat1.csv,dat3.csv o=rsl3.csv
$ more rsl3.csv
customer,date,amount
A,20081201,10
B,20081002,40
A,20081201,
B,20081002,
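The difference between the default behavior and -nostop can be sketched in Python. This is an illustration under stated assumptions, not mcat's implementation: the function name concat_by_name and its nostop argument are hypothetical, and unlike the rsl2.csv example the sketch keeps no partial output when it raises an error.

```python
import csv
import io

def concat_by_name(sources, nostop=False):
    """Concatenate CSV texts using the header of the first source.

    Illustrative only: a missing field raises an error by default, or
    yields an empty (null) value when nostop=True.
    """
    fields = None
    rows = []
    for name, text in sources:
        reader = csv.DictReader(io.StringIO(text))
        if fields is None:
            fields = reader.fieldnames  # the header of the first file wins
        for field in fields:
            if field not in reader.fieldnames and not nostop:
                raise KeyError(f"field name [{field}] not found on file [{name}]")
        for record in reader:
            # Fields missing from this file become empty strings.
            rows.append([record.get(field, "") for field in fields])
    return [fields] + rows

dat1 = "customer,date,amount\nA,20081201,10\nB,20081002,40\n"
dat3 = "customer,date,quantity\nA,20081201,3\nB,20081002,1\n"

merged = concat_by_name([("dat1.csv", dat1), ("dat3.csv", dat3)], nostop=True)
for row in merged:
    print(",".join(row))
```

With nostop=True the printed lines match the contents of rsl3.csv shown above; with the default nostop=False the same call raises an error naming the missing field.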
Merge only the fields specified in f=.
$ mcat f=customer,date i=dat2.csv,dat3.csv o=rsl4.csv
#END# kgcat f=customer,date i=dat2.csv,dat3.csv o=rsl4.csv
$ more rsl4.csv
customer,date
A,20081207
A,20081213
B,20081209
A,20081201
B,20081002
Read dat2.csv from standard input by specifying the -stdin option. The records from standard input are written before those of the files in i=.
$ mcat -stdin i=dat1.csv o=rsl5.csv <dat2.csv
#END# kgcat -stdin i=dat1.csv o=rsl5.csv
$ more rsl5.csv
customer,date,amount
A,20081207,20
A,20081213,30
B,20081209,50
A,20081201,10
B,20081002,40
When -add_fname is specified, the name of the file each record came from is added as a new last column named fileName. The file name of standard input is /dev/stdin.
$ mcat -add_fname -stdin i=dat1.csv o=rsl6.csv <dat2.csv
#END# kgcat -add_fname -stdin i=dat1.csv o=rsl6.csv
$ more rsl6.csv
customer,date,amount,fileName
A,20081207,20,/dev/stdin
A,20081213,30,/dev/stdin
B,20081209,50,/dev/stdin
A,20081201,10,dat1.csv
B,20081002,40,dat1.csv
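The effect of -add_fname can be approximated in a few lines of Python. This is a minimal sketch, assuming every input has the same header as the first one; the function name concat_with_fname is hypothetical and the code is not taken from mcat.

```python
import csv
import io

def concat_with_fname(sources, column="fileName"):
    """Concatenate CSV texts and append each row's source name.

    Assumption for this sketch: all sources share the first header.
    """
    out = []
    header = None
    for name, text in sources:
        rows = list(csv.reader(io.StringIO(text)))
        if header is None:
            header = rows[0]
            if column in header:
                # Mirrors the documented rule that "fileName" must be unused.
                raise ValueError(f"input already has a field named {column}")
            out.append(header + [column])
        out.extend(row + [name] for row in rows[1:])
    return out

dat1 = "customer,date,amount\nA,20081201,10\nB,20081002,40\n"
dat2 = "customer,date,amount\nA,20081207,20\nA,20081213,30\nB,20081209,50\n"

# Standard input is listed first, as in the example above.
tagged = concat_with_fname([("/dev/stdin", dat2), ("dat1.csv", dat1)])
for row in tagged:
    print(",".join(row))
```

The printed lines correspond to rsl6.csv: the three dat2.csv records tagged /dev/stdin, then the two dat1.csv records tagged dat1.csv.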
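Specify the wild card dat*.csv to concatenate the three CSV files dat1.csv, dat2.csv, and dat3.csv in the current directory. Because -force is given, dat3.csv is concatenated by field position even though it has no "amount" field.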
$ more dat1.csv
customer,date,amount
A,20081201,10
B,20081002,40
$ more dat2.csv
customer,date,amount
A,20081207,20
A,20081213,30
B,20081209,50
$ more dat3.csv
customer,date,quantity
A,20081201,3
B,20081002,1
$ mcat -force i=dat*.csv o=rsl7.csv
#END# kgcat -force i=dat*.csv o=rsl7.csv
$ more rsl7.csv
customer,date,amount
A,20081201,10
B,20081002,40
A,20081207,20
A,20081213,30
B,20081209,50
A,20081201,3
B,20081002,1
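Positional concatenation, as -force performs in the example above, can be sketched as follows. The function name concat_by_position is hypothetical. Padding short rows with empty values follows the parameter description, while dropping extra trailing items is an assumption of this sketch and not documented behavior of mcat.

```python
import csv
import io

def concat_by_position(texts):
    """Concatenate CSV texts by column position, ignoring later headers."""
    out = []
    width = None
    for text in texts:
        rows = list(csv.reader(io.StringIO(text)))
        if width is None:
            out.append(rows[0])   # only the first header is kept
            width = len(rows[0])
        for row in rows[1:]:
            # Pad missing positions with empty values; extra items are
            # dropped (an assumption made for this sketch).
            out.append((row + [""] * width)[:width])
    return out

dat1 = "customer,date,amount\nA,20081201,10\nB,20081002,40\n"
dat2 = "customer,date,amount\nA,20081207,20\nA,20081213,30\nB,20081209,50\n"
dat3 = "customer,date,quantity\nA,20081201,3\nB,20081002,1\n"

forced = concat_by_position([dat1, dat2, dat3])
for row in forced:
    print(",".join(row))
```

The quantity values of dat3.csv end up in the amount column, as in rsl7.csv. This is why positional concatenation should be used only when the columns are known to be compatible.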
The same file can be specified more than once.
$ mcat i=dat1.csv,dat1.csv,dat1.csv o=rsl8.csv
#END# kgcat i=dat1.csv,dat1.csv,dat1.csv o=rsl8.csv
$ more rsl8.csv
customer,date,amount
A,20081201,10
B,20081002,40
A,20081201,10
B,20081002,40
A,20081201,10
B,20081002,40
msep : Reverse the operation mentioned above and separate data files.