Joins each element of a vector with the corresponding taxonomy elements from a reference file, matching on a key. Table 3.41 shows a vector field: the items column holds multiple elements separated by a space delimiter.
Tables 3.41 and 3.42 show an example.
Table 3.41: Input data in.csv

| no | items |
|----|-------|
| 1 | a b c |
| 2 | a d |
| 3 | b f e f |
| 4 | f c d |

Reference file ref.csv

| item | taxo |
|------|------|
| a | X |
| b | Y |
| c | Z |
| e | X |
| f | Z |
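Table 3.42: Basic example vf=items m=ref.csv K=item f=taxo

| no | items |
|----|-------|
| 1 | a b c X Y Z |
| 2 | a d X |
| 3 | b f e f Y Z X Z |
| 4 | f c d Z Z |

Rows 2 and 4 contain the element d, which has no key in the reference file. The second table shows these rows with a trailing `*`:

| no | items |
|----|-------|
| 1 | a b c X Y Z |
| 2 | a d X * |
| 3 | b f e f Y Z X Z |
| 4 | f c d Z Z * |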
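The join in Table 3.42 can be sketched in a few lines of Python. This is only an illustration of the behavior the tables show, not the mvjoin implementation; the helper name `join_vector` and the dictionary form of the reference file are assumptions made for the example.

```python
# Hypothetical helper that emulates the join for one space-delimited vector.
def join_vector(vector, reference):
    """Append the taxonomy of every matched element to the vector."""
    elements = vector.split()
    joined = []
    for element in elements:
        if element in reference:
            # A reference value may itself be a vector, so split it too.
            joined.extend(reference[element].split())
        # Unmatched elements (such as d) contribute nothing here.
    return " ".join(elements + joined)


# Reference data from Table 3.41 (item -> taxo).
reference = {"a": "X", "b": "Y", "c": "Z", "e": "X", "f": "Z"}

# Input vectors from Table 3.41.
rows = ["a b c", "a d", "b f e f", "f c d"]
results = [join_vector(row, reference) for row in rows]

for row in results:
    print(row)
```

The taxonomy elements are appended in the order of the vector elements, so a repeated element such as f in row 3 contributes its taxonomy once per occurrence.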
Note that the mvjoin command reads the entire reference file into memory at once, so a very large reference file may consume a large amount of memory.
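The memory behavior described above can be pictured with the following Python sketch, which is an assumption about the general pattern rather than a description of mvjoin internals. The hypothetical `load_reference` builds a complete dictionary before any input row is processed, so its size grows with the reference file, while the hypothetical `stream_join` handles the input one row at a time.

```python
import csv
import io


def load_reference(ref_file, key_field, value_field):
    """Read every reference row into a dict; memory grows with the file."""
    return {row[key_field]: row[value_field] for row in csv.DictReader(ref_file)}


def stream_join(in_file, reference, vector_field):
    """Yield joined rows one at a time, so the input is never held in full."""
    for row in csv.DictReader(in_file):
        elements = row[vector_field].split()
        taxo = [t for e in elements if e in reference for t in reference[e].split()]
        row[vector_field] = " ".join(elements + taxo)
        yield row


# In-memory stand-ins for ref.csv and in.csv (data from Table 3.41).
ref_csv = io.StringIO("item,taxo\na,X\nb,Y\nc,Z\ne,X\nf,Z\n")
in_csv = io.StringIO("no,items\n1,a b c\n2,a d\n3,b f e f\n4,f c d\n")

reference = load_reference(ref_csv, "item", "taxo")  # fully loaded up front
joined = list(stream_join(in_csv, reference, "items"))
```

Under this pattern, only the reference side needs to fit in memory, which is why keeping the reference file small matters more than the size of the input file.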
mvjoin vf= K= f= [n=] m=| i= [o=] [delim=] [-nfn] [-nfno] [-x] [--help] [--version]
vf= Name of the vector field in the input file (i=) to join. Multiple fields can be specified. The vectors do not need to be sorted.
m= Reference file.
K= Key field in the reference file (m=); the taxonomy elements of the matching key are joined to the vector. Key values should be unique; sorting is not required. The output may differ if the key values are not unique.
f= Name of the field in the reference file (m=) whose vector (elements) is joined.
n= Replacement character to output when an element in vf= has no matching key in K=. If this option is not specified, nothing is joined for unmatched elements.
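The effect of n= and of the uniqueness requirement on K= can be sketched in Python as follows. This is a hedged illustration: the helpers `join_vector` and `duplicate_keys` are hypothetical, and the sketch assumes the replacement character is emitted at the position where the missing taxonomy element would otherwise appear.

```python
from collections import Counter


def join_vector(vector, reference, n=None):
    """Join one vector; emit n for unmatched elements when n is given."""
    elements = vector.split()
    joined = []
    for element in elements:
        if element in reference:
            joined.extend(reference[element].split())
        elif n is not None:
            # Assumed behavior: the replacement stands in for the missing taxonomy.
            joined.append(n)
    return " ".join(elements + joined)


def duplicate_keys(keys):
    """Return key values that occur more than once (K= should have none)."""
    return sorted(key for key, count in Counter(keys).items() if count > 1)


# Reference data from Table 3.41 (item -> taxo); d is not a key.
reference = {"a": "X", "b": "Y", "c": "Z", "e": "X", "f": "Z"}

without_n = join_vector("f c d", reference)       # unmatched d is skipped
with_n = join_vector("f c d", reference, n="*")   # unmatched d yields "*"
repeated = duplicate_keys(["a", "b", "a", "c"])   # key "a" appears twice
```

Checking for repeated keys before the join is one way to avoid the ambiguous output mentioned for K=, since a dictionary would silently keep only one of the duplicated entries.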
    $ more dat1.csv
    items
    b a c
    c c
    e a a
    $ more ref1.csv
    item,taxo
    a,X Y
    b,X
    c,Z Z
    $ mvjoin vf=items K=item m=ref1.csv f=taxo i=dat1.csv o=rsl1.csv
    #END# kgVjoin K=item f=taxo i=dat1.csv m=ref1.csv o=rsl1.csv vf=items
    $ more rsl1.csv
    items
    b a c X X Y Z Z
    c c Z Z Z Z
    e a a X Y X Y
    $ more dat2.csv
    items1,items2
    b a c,b b
    c c,a d
    e a a,a a
    $ more ref2.csv
    item,taxo
    a,X
    b,X
    c,Y
    d,Y
    $ mvjoin vf=items1,items2 K=item m=ref2.csv f=taxo i=dat2.csv o=rsl2.csv
    #END# kgVjoin K=item f=taxo i=dat2.csv m=ref2.csv o=rsl2.csv vf=items1,items2
    $ more rsl2.csv
    items1,items2
    b a c X X Y,b b X X
    c c Y Y,a d X Y
    e a a X X,a a X X
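Joining several vector fields at once, as in the vf=items1,items2 example, amounts to applying the same lookup to each named field independently. The Python sketch below reproduces the rsl2.csv rows under that assumption; `join_fields` is a hypothetical helper, and the rows are written as dictionaries rather than read from the CSV files.

```python
def join_fields(row, reference, vector_fields):
    """Return a copy of row with each named vector field joined."""
    out = dict(row)
    for field in vector_fields:
        elements = out[field].split()
        # Collect the taxonomy of each matched element, in vector order.
        taxo = [t for e in elements if e in reference for t in reference[e].split()]
        out[field] = " ".join(elements + taxo)
    return out


# Contents of ref2.csv (item -> taxo).
reference = {"a": "X", "b": "X", "c": "Y", "d": "Y"}

# Contents of dat2.csv.
rows = [
    {"items1": "b a c", "items2": "b b"},
    {"items1": "c c", "items2": "a d"},
    {"items1": "e a a", "items2": "a a"},
]

result = [join_fields(row, reference, ["items1", "items2"]) for row in rows]

for row in result:
    print(row["items1"] + "," + row["items2"])
```

Each field is joined against the same reference, so the element e in the third row, which has no key in ref2.csv, adds nothing to items1 while both occurrences of a are joined.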
mvcommon : Use this command to select common elements of vector.