3.37 mnrjoin - Natural Join within Multiple Ranges with Reference File

Join columns according to the range of values in the column from reference file. The field specified at r= parameter is matched with the the range of values defined as two arguments at the R= parameter in the reference file defined at the m= parameter. the field(s) specified atf= parameter are joined for records with the same value.

If there are more than one match for each record, natural join returns output for all rows. The range of values is compared as character strings by default. Attach %n after the field name at the r= parameter to process as numerical values.

Format

mnrjoin R= r= [k=] [K=] [f=] [-n] [-N] m=| i= [o=] [-nfn] [-nfno] [-x] [-q] [--help] [--version]

Parameters

f=

The field name(s) (multiple fields can be specified) to join from the reference file.

 

When this is not defined, the all fields except the key specified at K= will be joined.

m=

Reference file name.

 

Read from standard input if this parameter is not set (when i= is specified).

R=

Field names of the range (limit to 2 fields).

 

Field names (start,end) of the range in reference file.

 

If the first field is NULL, the range is any number less than the ending value of the range.

 

If the second field is NULL, the range is any number greater than the starting value of the range.

r=

Compare the values in this field [%n] against the range.

 

Field name in the input file.

 

Add %n after the field name in the r= parameter to process as numerical values.

k=

Key field name(s) (multiple fields can be specified) from the input data for comparison

 

Join records with same key fields in the input data k= and reference data K=.

K=

Key field name(s) (multiple fields can be specified) from the reference data for comparison

 

This key field(s) from reference data is compared with the key field(s) from the input data

 

specified at k= parameter, fields where records with common key are joined.

 

This parameter is not required if the field name name is the same as the one defined at the k= parameter.

-n

Output NULL values when reference data does not consist of input data.

-N

Output NULL values when input data does not consist of reference data.

For example, given the parameters k=key K=Key r=val%n R=range i=dat.csv m=ref.csv, if the sort criteria for the input data dat.csv is carried out by msortf f=key,val%n, the sort criteria for ref.csv should follow accordingly as msortf f=Key,range%n.

Examples

Example 1: Basic Example

For records where the value of date field is 20080203, select those records in the input data where amount field is more than 5 but less than 15 and join field where avg=150. For records where amount field is more than 40 but less than 50, join field avg=200.

$ more dat1.csv
date,price
20080123,10
20080123,20
20080203,10
20080203,35
200804l0,50
$ more ref1.csv
date,priceF,priceT,avg
20080203,5,15,150
20080203,40,50,200
$ mnrjoin k=date f=avg m=ref1.csv R=priceF,priceT r=price%n i=dat1.csv o=rsl1.csv
#END# kgnrjoin R=priceF,priceT f=avg i=dat1.csv k=date m=ref1.csv o=rsl1.csv r=price%n
$ more rsl1.csv
date%0,price,avg
20080203,10,150

Example 2: Output unmatched data

Use -n to return all records in the input data even if they do not match with those in the reference file (row where avg= Null), and use -N to return records in the reference file even if they do not match with those in the input file (rows where price= null). This is known as outer-join.

$ mnrjoin k=date f=avg m=ref1.csv R=priceF,priceT r=price%n -n -N i=dat1.csv o=rsl2.csv
#END# kgnrjoin -N -n R=priceF,priceT f=avg i=dat1.csv k=date m=ref1.csv o=rsl2.csv r=price%n
$ more rsl2.csv
date%0,price,avg
20080123,10,
20080123,20,
20080203,10,150
20080203,35,
20080203,,200
200804l0,50,

Related Command

mrjoin : Use mrjoin if there are repeated values in join key (K= field) from the reference data.