3.71 mvjoin - Join Reference Vector Elements

Join vector elements with the corresponding taxonomy elements from a reference file that have the same key. A vector field is shown in Table 3.41, where the items column holds multiple elements separated by a space delimiter.

Tables 3.41 and 3.42 show some examples.

Table 3.41: Input data (in.csv)

no  items
1   a b c
2   a d
3   b f e f
4   f c d

Reference file (ref.csv)

item  taxo
a     X
b     Y
c     Z
e     X
f     Z

Table 3.42: Basic example (vf=items m=ref.csv K=item f=taxo)

no  items
1   a b c X Y Z
2   a d X
3   b f e f Y Z X Z
4   f c d Z Z

An example defining unmatched taxonomy elements (vf=items m=ref.csv K=item f=taxo n=*)

no  items
1   a b c X Y Z
2   a d X *
3   b f e f Y Z X Z
4   f c d Z Z *

Note that mvjoin reads the whole reference file into memory at once, so a very large reference file may consume a large amount of memory.
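The joining behavior in the tables above can be sketched in Python. This is only an illustration of the semantics described in this section, not the actual implementation of mvjoin; the function name `join_vector` and its parameters are hypothetical.

```python
def join_vector(vector, ref, null=None, delim=" "):
    """Append the taxonomy element of each vector element to the vector.

    vector: delimited string such as "a b c" (the vf= field)
    ref:    dict mapping key (K=) to taxonomy element (f=)
    null:   replacement for unmatched elements (n=); None joins nothing
    """
    elements = vector.split(delim)
    joined = []
    for element in elements:
        if element in ref:
            joined.append(ref[element])
        elif null is not None:
            joined.append(null)
    return delim.join(elements + joined)


# Contents of ref.csv, held entirely in memory as a dict (assumption:
# this mirrors the "whole reference file in memory" note only loosely).
ref = {"a": "X", "b": "Y", "c": "Z", "e": "X", "f": "Z"}

for items in ["a b c", "a d", "b f e f", "f c d"]:
    print(join_vector(items, ref), "|", join_vector(items, ref, null="*"))
```

The left-hand output corresponds to Table 3.42 and the right-hand output to the n=* example.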

Format

mvjoin vf= K= f= [n=] m=| i= [o=] [delim=] [-nfn] [-nfno] [-x] [--help] [--version]

vf=

Name of the vector field (in the i= input file) to be joined.

Multiple fields can be specified. The vectors do not need to be sorted.

m=

Reference file.

K=

Specify the key field in the reference file (m=). The taxonomy element of a record is joined to the vector when this key matches a vector element.

The values of the key field should be unique; sorting is not required.

The output may vary if the key values are not unique.

f=

Name of the field in the reference file (m=) whose elements are joined to the vector.

n=

Specify the replacement string to output when a vector element in vf= has no matching key in K=.

When this option is not specified, nothing is joined for unmatched vector elements.
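Since K= expects unique key values, it can be useful to check the reference file before running mvjoin. The following is a minimal sketch in Python, assuming a comma-separated reference file with a header row; the helper name `duplicate_keys` is hypothetical and is not part of mvjoin.

```python
import csv
import io
from collections import Counter


def duplicate_keys(csv_file, key_field):
    """Return the sorted key values that occur more than once in key_field."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[key_field] for row in reader)
    return sorted(key for key, count in counts.items() if count > 1)


# Inline sample data for illustration; with a real file, pass
# open("ref.csv", newline="") instead of the StringIO object.
sample = io.StringIO("item,taxo\na,X\nb,Y\na,Z\n")
print(duplicate_keys(sample, "item"))  # ['a']
```

An empty list means every key is unique, so the join result does not depend on which duplicate record is chosen.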

Example

Example 1: Combine vector with elements from reference file

$ more dat1.csv
items
b a c
c c
e a a
$ more ref1.csv
item,taxo
a,X Y
b,X
c,Z Z
$ mvjoin vf=items K=item m=ref1.csv f=taxo i=dat1.csv o=rsl1.csv
#END# kgVjoin K=item f=taxo i=dat1.csv m=ref1.csv o=rsl1.csv vf=items
$ more rsl1.csv
items
b a c X X Y Z Z
c c Z Z Z Z
e a a X Y X Y

Example 2: Join elements to multiple fields

$ more dat2.csv
items1,items2
b a c,b b
c c,a d
e a a,a a
$ more ref2.csv
item,taxo
a,X
b,X
c,Y
d,Y
$ mvjoin vf=items1,items2 K=item m=ref2.csv f=taxo i=dat2.csv o=rsl2.csv
#END# kgVjoin K=item f=taxo i=dat2.csv m=ref2.csv o=rsl2.csv vf=items1,items2
$ more rsl2.csv
items1,items2
b a c X X Y,b b X X
c c Y Y,a d X Y
e a a X X,a a X X
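
For comparison, the join over multiple vector fields can be sketched in Python on the same CSV data. This is an illustration under the assumptions of this section (space delimiter, unique keys), not the implementation of mvjoin; the function name `mvjoin_sketch` and its parameters are hypothetical.

```python
import csv
import io


def mvjoin_sketch(data_csv, ref_csv, vf, key, field, null=None, delim=" "):
    """Join ref_csv[field] onto each vector field listed in vf; return CSV text."""
    # Load the whole reference file into memory, keyed by the K= field.
    ref = {row[key]: row[field] for row in csv.DictReader(io.StringIO(ref_csv))}

    reader = csv.DictReader(io.StringIO(data_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for name in vf:
            elements = row[name].split(delim)
            # Unmatched elements yield `null`; they are skipped when null is None.
            joined = [ref.get(element, null) for element in elements]
            row[name] = delim.join(elements + [j for j in joined if j is not None])
        writer.writerow(row)
    return out.getvalue()


dat2 = "items1,items2\nb a c,b b\nc c,a d\ne a a,a a\n"
ref2 = "item,taxo\na,X\nb,X\nc,Y\nd,Y\n"
print(mvjoin_sketch(dat2, ref2, vf=["items1", "items2"], key="item", field="taxo"))
```

With the data of Example 2, the printed text matches the contents of rsl2.csv shown above.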

Related command

mvcommon : Use this command to select the common elements of a vector.