3.45 msed - Replace String Matching Regular Expression

In the fields specified in the f= parameter, replace the substring that matches the regular expression specified in the c= parameter with the string specified in the v= parameter.

Format

msed c= f= v= [-A] [-g] [-W] [i=] [o=] [-nfn] [-nfno] [-x] [--help] [--version]

Parameters

`f=`	Specify the name(s) of the target field(s) for substitution (multiple fields can be specified).
`c=`	Specify the regular expression to match.
	See "Using regular expressions" below.
`v=`	Specify the string that replaces the substring matched by the regular expression specified in the `c=` parameter.
	The following symbols in the replacement string refer to the match result:
	`$&` : The matched string.
	$` : The part of the target string before the match (prefix).
	`$'` : The part of the target string after the match (suffix).
	`$N` : The substring matched by the N-th parenthesized group (`N>=1`).
`-A`	Instead of replacing the specified field, add the result as a new field.
`-g`	Replace all matches of the regular expression. Without this option, only the first match is replaced.
`-W`	Replace wide character matches of the regular expression.
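
The match-result symbols accepted by `v=` can be illustrated outside of msed. The following is a minimal sketch in Python, assuming the symbols behave as described above; the helper names `expand` and `substitute` are hypothetical, and Python's `re` module is only a stand-in for the regular expression engine that msed uses.

```python
import re

# Symbols assumed from the v= description: $&, $`, $' and $1..$9.
TOKEN = re.compile(r"\$([&`']|[1-9])")

def expand(template: str, m: re.Match) -> str:
    """Expand the match-result symbols in template for one match m."""
    def one(tok: re.Match) -> str:
        kind = tok.group(1)
        if kind == "&":
            return m.group(0)              # the matched string
        if kind == "`":
            return m.string[:m.start()]    # text before the match
        if kind == "'":
            return m.string[m.end():]      # text after the match
        return m.group(int(kind)) or ""    # N-th parenthesized group
    return TOKEN.sub(one, template)

def substitute(pattern: str, template: str, text: str,
               replace_all: bool = False) -> str:
    """Replace the first match, or every match when replace_all is True (like -g)."""
    count = 0 if replace_all else 1
    return re.sub(pattern, lambda m: expand(template, m), text, count=count)

print(substitute("b+", "#$&#", "abbc"))                    # a#bb#c
print(substitute("b", "#$&#", "abbc", replace_all=True))   # a#b##b#c
print(substitute("b", "#$`#", "abbc"))                     # a#a#bc
print(substitute("b", "#$'#", "abbc"))                     # a#bc#bc
```

The four calls use the same input and replacement strings as Examples 5 to 8 in this section.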

Using regular expressions

The regular expressions that can be specified in the c= parameter are listed in Tables 3.12 to 3.15.

Table 3.12: Regular expressions that match a single character

Regular expression	Description	Input string	Example `c=`, `v=`	Result
`.`	Any character	`abbbcc`	`c=. v=X -g`	`XXXXXX`
`[abc]`	Any one of the characters `a`, `b`, or `c`	`abbbcc`	`c=[ac] v=X -g`	`XbbbXX`
`[^abc]`	Any character other than `a`, `b`, or `c`	`abbbcc`	`c=[^ac] v=X -g`	`aXXXcc`
`[a-z]`	Any character in the range `a` to `z`	`abbbcc`	`c=[a-b] v=X -g`	`XXXXcc`
`[^a-z]`	Any character outside the range `a` to `z`	`abbbcc`	`c=[^a-b] v=X -g`	`abbbXX`
`\t`	Tab character
`\w`	Word character (`[0-9a-zA-Z_]`)	`ab#cd&ef`	`c=\w v=X -g`	`XX#XX&XX`
`\W`	Any character other than a word character	`ab#cd&ef`	`c=\W v=X -g`	`abXcdXef`
`\s`	Whitespace character (`[ \t]`)	`ab cd ef`	`c=\s v=X -g`	`abXcdXef`
`\S`	Non-whitespace character	`ab cd ef`	`c=\S v=X -g`	`XX XX XX`
`\d`	Digit (`[0-9]`)	`ab12c0`	`c=\d v=X -g`	`abXXcX`
`\D`	Non-digit character	`ab12c0`	`c=\D v=X -g`	`XX12X0`
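
The single-character classes can be tried interactively before writing a `c=` expression. The sketch below uses Python's `re` module as a stand-in, on the assumption that these classes behave the same way in msed for ASCII input; `replace_every` is a hypothetical helper that mimics `v=X -g`.

```python
import re

def replace_every(pattern: str, text: str, value: str = "X") -> str:
    """Replace every match of pattern in text with value (like v=X -g)."""
    # re.ASCII restricts \w, \d and \s to ASCII characters.
    return re.sub(pattern, lambda m: value, text, flags=re.ASCII)

print(replace_every(r"[^ac]", "abbbcc"))   # aXXXcc
print(replace_every(r"\w", "ab#cd&ef"))    # XX#XX&XX
print(replace_every(r"\W", "ab#cd&ef"))    # abXcdXef
print(replace_every(r"\S", "ab cd ef"))    # XX XX XX
print(replace_every(r"\D", "ab12c0"))      # XX12X0
```

Each uppercase class (`\W`, `\S`, `\D`) matches exactly the characters that its lowercase counterpart does not.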

Table 3.13: Repetition of regular expressions

Regular expression	Description	Input string	Example `c=`, `v=`	Result
`a*`	Zero or more repetitions of `a`	`abbbcc`	`c=ab* v=X`	`Xcc`
`a+`	One or more repetitions of `a`	`abbbcc`	`c=ab+ v=X`	`Xcc`
`a?`	Zero or one occurrence of `a`	`abbbcc`	`c=ab? v=X`	`Xbbcc`
`a{M,N}`	At least M and at most N repetitions of `a`	`abbbbbcc`	`c=ab{3,4} v=X`	`Xbcc`
`a{M}`	Exactly M repetitions of `a`	`abbbbbcc`	`c=ab{3} v=X`	`Xbbcc`
`a|b`	`a` or `b`	`abbbc`	`c=(ab)|(bc) v=X -g`	`XbX`
`?`	Shortest (non-greedy) match when placed after a repetition operator	`abbbc`	`c=ab*? v=X`	`Xbbbc`
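
The difference between greedy and shortest matching, and between replacing the first match and replacing all matches, is easy to confirm with a short script. This is a sketch that uses Python's `re` module as a stand-in and assumes that msed follows the same Perl-style repetition rules; `replace_first` and `replace_every` are hypothetical helpers.

```python
import re

def replace_first(pattern: str, text: str, value: str = "X") -> str:
    """Replace only the first match (msed without -g)."""
    return re.sub(pattern, lambda m: value, text, count=1)

def replace_every(pattern: str, text: str, value: str = "X") -> str:
    """Replace every match (msed with -g)."""
    return re.sub(pattern, lambda m: value, text)

print(replace_first(r"ab*", "abbbc"))         # Xc     (greedy: takes all b)
print(replace_first(r"ab*?", "abbbc"))        # Xbbbc  (shortest: takes no b)
print(replace_first(r"ab{3,4}", "abbbbbcc"))  # Xbcc   (takes at most 4 b)
print(replace_first(r"(ab)|(bc)", "abbbc"))   # Xbbc   (first match only)
print(replace_every(r"(ab)|(bc)", "abbbc"))   # XbX    (both alternatives)
```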

Table 3.14: Position of regular expression

Regular expression	Description	Input string	Example `c=`, `v=`	Result
`^`	Match at the beginning of the string	`abac`	`c=^a v=X -g`	`Xbac`
`$`	Match at the end of the string	`acac`	`c=c$ v=X -g`	`acaX`
`\b`	Match at a word boundary	`aac ba ac bac`	`c=\ba v=X -g`	`Xac ba Xc bac`
`\B`	Match at a position that is not a word boundary	`aac ba ac bac`	`c=\Ba v=X -g`	`aXc bX ac bXc`
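
The position anchors match no characters themselves, which makes `\b` and `\B` easy to misread. The following sketch shows which `a` each anchor selects, using Python's `re` module as a stand-in and assuming msed applies the usual Perl-style definition of a word boundary; `replace_every` is a hypothetical helper that mimics `v=X -g`.

```python
import re

def replace_every(pattern: str, text: str, value: str = "X") -> str:
    """Replace every match of pattern in text with value (like v=X -g)."""
    return re.sub(pattern, lambda m: value, text)

text = "aac ba ac bac"
print(replace_every(r"\ba", text))    # Xac ba Xc bac  (a that starts a word)
print(replace_every(r"\Ba", text))    # aXc bX ac bXc  (a inside or ending a word)
print(replace_every(r"^a", "abac"))   # Xbac           (only the leading a)
print(replace_every(r"c$", "acac"))   # acaX           (only the trailing c)
```

Every `a` in the input is replaced by exactly one of `\ba` and `\Ba`, because each position either is a word boundary or is not.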

Table 3.15: Others

Regular expression	Description	Input string	Example `c=`, `v=`	Result
`(expr)`	Grouping
`\1,..,\9`	Back reference to the string matched by a group	`abbcababc`	`c=(ab)(bc)\1 v=X`	`Xabc`
`(?=expr)`	Position that is followed by a match of `expr` (lookahead)
`(?!expr)`	Position that is not followed by a match of `expr` (negative lookahead)
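
The lookahead forms have no example in the table, so the sketch below supplies one. It uses Python's `re` module as a stand-in and assumes that msed handles back references and lookahead in the same Perl-style way; the input string `abac` and the helper `replace_every` are illustrative choices, not taken from msed.

```python
import re

def replace_every(pattern: str, text: str, value: str = "X") -> str:
    """Replace every match of pattern in text with value (like v=X -g)."""
    return re.sub(pattern, lambda m: value, text)

# Back reference: \1 must repeat the text captured by the first group (ab).
print(replace_every(r"(ab)(bc)\1", "abbcababc"))  # Xabc

# Lookahead: the b that follows is checked but is not part of the match.
print(replace_every(r"a(?=b)", "abac"))           # Xbac

# Negative lookahead: match a only when it is not followed by b.
print(replace_every(r"a(?!b)", "abac"))           # abXc
```

Because a lookahead consumes no characters, the `b` after the first `a` stays in the output in both lookahead cases.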

Examples

Example 1: Basic Example

Replace the 4-character substring that starts with 00 in the zipCode field with ####.

$ more dat1.csv
customer,zipCode
A,6230041
B,6240053
C,6330032
D,6230087
E,6530095
$ msed f=zipCode c=00.. v=#### i=dat1.csv o=rsl1.csv
#END# kgsed c=00.. f=zipCode i=dat1.csv o=rsl1.csv v=####
$ more rsl1.csv
customer,zipCode
A,623####
B,624####
C,633####
D,623####
E,653####

Example 2: Specify field name

Replace the 4-digit substring that starts with 00 in the zipCode field with ####, and output the result under the field name zipCode4.

$ msed f=zipCode:zipCode4 c='00\d\d' v=#### i=dat1.csv o=rsl2.csv
#END# kgsed c=00\d\d f=zipCode:zipCode4 i=dat1.csv o=rsl2.csv v=####
$ more rsl2.csv
customer,zipCode4
A,623####
B,624####
C,633####
D,623####
E,653####

Example 3: Global replacement

Specify the -g option to replace every 0 in the zipCode field with -.

$ msed f=zipCode c=0 v=- -g i=dat1.csv o=rsl3.csv
#END# kgsed -g c=0 f=zipCode i=dat1.csv o=rsl3.csv v=-
$ more rsl3.csv
customer,zipCode
A,623--41
B,624--53
C,633--32
D,623--87
E,653--95
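
For comparison, the effect of `-g` on a CSV field can be sketched in Python. This is only an illustration of the first-match versus all-matches behaviour, not a reimplementation of msed; the function `sub_field` and its `replace_all` argument are hypothetical names, and the data is the content of dat1.csv shown above.

```python
import csv
import io
import re

DATA = """customer,zipCode
A,6230041
B,6240053
C,6330032
D,6230087
E,6530095
"""

def sub_field(csv_text: str, field: str, pattern: str, value: str,
              replace_all: bool = False) -> str:
    """Apply a regex substitution to one named field of a CSV with a header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    col = header.index(field)
    count = 0 if replace_all else 1   # count=0 means "replace every match"
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in body:
        row[col] = re.sub(pattern, lambda m: value, row[col], count=count)
        writer.writerow(row)
    return out.getvalue()

print(sub_field(DATA, "zipCode", "0", "-"))                    # first 0 only
print(sub_field(DATA, "zipCode", "0", "-", replace_all=True))  # every 0
```

With `replace_all=False` the first data row becomes `A,623-041`; with `replace_all=True` it becomes `A,623--41`, as in rsl3.csv.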

Example 4: Replace substring

Delete fruit from the beginning of the string in the item field. Because the beginning-of-string anchor (^) is specified, the fruit inside the word grapefruit in the last row is retained.

$ more dat2.csv
item,price
fruit:apple,100
fruit:peach,250
fruit:pineapple,300
fruit:orange,450
fruit:grapefruit,500
$ msed f=item c='^fruit' v= -g i=dat2.csv o=rsl4.csv
#END# kgsed -g c=^fruit f=item i=dat2.csv o=rsl4.csv v=
$ more rsl4.csv
item,price
:apple,100
:peach,250
:pineapple,300
:orange,450
:grapefruit,500

Example 5: Substitution using match results

Enclose each run of one or more consecutive b characters in # by referring to the matched string with $& in the v= parameter.

$ more dat3.csv
str1
abc
abbc
ac
$ msed f=str1 c='b+' v='#$&#' i=dat3.csv o=rsl5.csv
#END# kgsed c=b+ f=str1 i=dat3.csv o=rsl5.csv v=#$&#
$ more rsl5.csv
str1
a#b#c
a#bb#c
ac

Example 6: Combination with global match

When the -g option is specified, the string defined in v= is evaluated separately for each match.

$ msed f=str1 c=b v='#$&#' -g i=dat3.csv o=rsl6.csv
#END# kgsed -g c=b f=str1 i=dat3.csv o=rsl6.csv v=#$&#
$ more rsl6.csv
str1
a#b#c
a#b##b#c
ac

Example 7: Prefix substitution

Replace the first b with the part of the string that precedes the match (the prefix), enclosed in #, by using $`.

$ msed f=str1 c=b v='#$`#' i=dat3.csv o=rsl7.csv
#END# kgsed c=b f=str1 i=dat3.csv o=rsl7.csv v=#$`#
$ more rsl7.csv
str1
a#a#c
a#a#bc
ac

Example 8: Suffix substitution

Replace the first b with the part of the string that follows the match (the suffix), enclosed in #, by using $'.

$ msed f=str1 c=b v="#$'#" i=dat3.csv o=rsl8.csv
#END# kgsed c=b f=str1 i=dat3.csv o=rsl8.csv v=#$'#
$ more rsl8.csv
str1
a#c#c
a#bc#bc
ac

Related Commands

mchgstr : Replaces strings by simple string matching, without regular expressions.

mcal : Includes several functions that handle regular expressions.