2.3 mcaseframe.rb Extract Case Frame

This command extracts case frames from the output of a KNP dependency analysis.

A case frame is a pairing of a predicate (a verb, adjective, or other inflectable word) with a phrase that depends on it and is marked by a case particle, for example 「リンゴ(が)」+「好き」 and 「望遠鏡(で)」+「見る」. This command reads the parse results produced by the mknp.rb command (in XML format), extracts the case frames, and writes them to a CSV file.

2.3.1 Format

mcaseframe.rb I= o= [-key] [-mcmdenv] [--help]

I=

: Path of the XML parse results produced by mknp.rb (in the examples below, a directory containing the XML files).

o=

: Name of the output CSV file for the case frames.

-key

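: Output in key format: each predicate is written on its own line, followed by one line for each case-marked phrase that depends on it.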

-mcmdenv

: Display MCMD messages according to the MCMD environment variables. By default, warning and error messages are shown (KG_VerboseLevel=2).

--help

: Display help.
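The parameters above can also be driven from a script. The following is a minimal Python sketch that assembles the argument list from the documented parameters (I=, o=, -key, -mcmdenv) and runs it with subprocess. The helper names build_mcaseframe_args and run_mcaseframe are hypothetical, and the sketch assumes that mcaseframe.rb is on PATH and reports failure through a non-zero exit status.

```python
import subprocess
from typing import List


def build_mcaseframe_args(xml_path: str, out_csv: str,
                          key: bool = False, mcmdenv: bool = False) -> List[str]:
    """Assemble an mcaseframe.rb command line from the documented parameters."""
    args = ["mcaseframe.rb"]
    if key:
        args.append("-key")       # key format output
    if mcmdenv:
        args.append("-mcmdenv")   # show MCMD messages
    args.append(f"I={xml_path}")  # XML parse results from mknp.rb
    args.append(f"o={out_csv}")   # case frame CSV to write
    return args


def run_mcaseframe(xml_path: str, out_csv: str, **options: bool) -> None:
    """Run mcaseframe.rb (assumed to be on PATH); raises CalledProcessError
    if the command exits with a non-zero status (assumed failure signal)."""
    subprocess.run(build_mcaseframe_args(xml_path, out_csv, **options), check=True)


if __name__ == "__main__":
    # Argument list equivalent to the command line used in Example 2.
    print(build_mcaseframe_args("xml", "caseframe2.csv", key=True))
```

Calling run_mcaseframe("xml", "caseframe2.csv", key=True) would correspond to the command line in Example 2.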

Extracting case frames

An excerpt of the XML output of the mknp.rb command is shown below.

 <sentence id='0' text='子どもはリンゴがすきです。'>
  <chunk id='0' link='2' phraseType='格助詞句' caseType='ガ2格' phrase='子供' phraseTok='子
   <token id='0' class1='名詞' class2='普通名詞' word='子ども' orgWord='子ども' daiWord='子供
   <token id='1' class1='助詞' class2='副助詞' word='は' orgWord='は'/>
  </chunk>
  <chunk id='1' link='2' phraseType='格助詞句' caseType='ガ格' phrase='林檎' phraseTok='リン
   <token id='2' class1='名詞' class2='普通名詞' word='リンゴ' orgWord='リンゴ' daiWord='林檎
   <token id='3' class1='助詞' class2='格助詞' word='が' orgWord='が'/>
  </chunk>
  <chunk id='2' link='-1' phraseType='用言句' phraseTok='すきだ' rawPhrase='すきです。' phrase
   <token id='4' class1='形容詞' class3='ナ形容詞' class4='デス列基本形' word='すきだ' orgWord
   <token id='5' class1='特殊' class2='句点' word='。' orgWord='。'/>
  </chunk>
 </sentence>

In this example, chunk id='0' 「子どもは」 has link='2', and chunk id='1' 「リンゴが」 also has link='2'. This means that both chunks depend on chunk id='2' 「すきです」. The dependency relationships are illustrated below.

子どもは──┐ 
リンゴが──┤ 
      すきです。
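The dependency structure can be read directly from the id and link attributes of the chunk elements. Below is a minimal Python sketch using the standard xml.etree.ElementTree module on a simplified fragment of the excerpt that keeps only those two attributes; the helper name dependency_map is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Simplified from the excerpt above: only the id and link attributes are kept.
SAMPLE = """<sentence id='0' text='子どもはリンゴがすきです。'>
  <chunk id='0' link='2'/>
  <chunk id='1' link='2'/>
  <chunk id='2' link='-1'/>
</sentence>"""


def dependency_map(sentence: ET.Element) -> Tuple[Dict[int, List[int]], Optional[int]]:
    """Return {head chunk id: ids of chunks linking to it} and the root chunk id."""
    deps: Dict[int, List[int]] = defaultdict(list)
    root: Optional[int] = None
    for chunk in sentence.iter("chunk"):
        cid, link = int(chunk.get("id")), int(chunk.get("link"))
        if link < 0:
            root = cid  # link='-1': the chunk depends on nothing (sentence root)
        else:
            deps[link].append(cid)
    return dict(deps), root


if __name__ == "__main__":
    print(dependency_map(ET.fromstring(SAMPLE)))
```

For the example sentence this yields {2: [0, 1]} with root chunk 2, which is the structure shown in the figure.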

This command extracts these dependency relationships and saves them in CSV format as follows.

aid,sid,cid,contrastConj,denial,declinableWord,lid,caseWord,case
test.txt,0,2,,,すきだ,0,子ども,ガ2
test.txt,0,2,,,すきだ,1,リンゴ,ガ
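The extraction rule illustrated by this example can be sketched in Python as follows. This is an approximation written for illustration, not the implementation of mcaseframe.rb. It assumes that a case frame is emitted for every chunk with phraseType='格助詞句' whose link points directly at a chunk with phraseType='用言句'; it takes declinableWord from the predicate chunk's phraseTok attribute, builds caseWord from the tokens whose class1 is not 助詞, and strips the trailing 格 from caseType. contrastConj and denial are left empty because the sketch does not model them. The input is a simplified fragment of the excerpt above, and the helper names case_frames and to_csv are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Iterator, List

# Simplified from the mknp.rb excerpt: only the attributes read below are kept.
SAMPLE = """<article id='test.txt'>
 <sentence id='0' text='子どもはリンゴがすきです。'>
  <chunk id='0' link='2' phraseType='格助詞句' caseType='ガ2格'>
   <token id='0' class1='名詞' word='子ども'/>
   <token id='1' class1='助詞' word='は'/>
  </chunk>
  <chunk id='1' link='2' phraseType='格助詞句' caseType='ガ格'>
   <token id='2' class1='名詞' word='リンゴ'/>
   <token id='3' class1='助詞' word='が'/>
  </chunk>
  <chunk id='2' link='-1' phraseType='用言句' phraseTok='すきだ'>
   <token id='4' class1='形容詞' word='すきだ'/>
   <token id='5' class1='特殊' word='。'/>
  </chunk>
 </sentence>
</article>"""

HEADER = ["aid", "sid", "cid", "contrastConj", "denial",
          "declinableWord", "lid", "caseWord", "case"]


def case_frames(article: ET.Element) -> Iterator[List[str]]:
    """Yield one row per case-marked chunk that links directly to a predicate chunk."""
    aid = article.get("id")
    for sentence in article.iter("sentence"):
        chunks = {c.get("id"): c for c in sentence.iter("chunk")}
        for chunk in chunks.values():
            if chunk.get("phraseType") != "格助詞句":
                continue
            head = chunks.get(chunk.get("link"))  # None when link='-1'
            if head is None or head.get("phraseType") != "用言句":
                continue
            # Phrase without its particles, e.g. 子ども + は -> 子ども
            case_word = "".join(t.get("word") for t in chunk.iter("token")
                                if t.get("class1") != "助詞")
            case_type = chunk.get("caseType", "")
            case = case_type[:-1] if case_type.endswith("格") else case_type
            # contrastConj and denial are not modelled here, so they stay empty.
            yield [aid, sentence.get("id"), head.get("id"), "", "",
                   head.get("phraseTok"), chunk.get("id"), case_word, case]


def to_csv(article: ET.Element) -> str:
    """Render the header and all case frame rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(case_frames(article))
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(ET.fromstring(SAMPLE)), end="")
```

On this fragment the sketch reproduces the three CSV lines shown above.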

Each CSV field is described below.

aid

: Name of the input file (the article id in the XML)

sid

: Line number (sentence ID)

cid

: Chunk ID of the predicate

contrastConj

: Adversative (contrastive) conjunction

denial

: Set to 1 when the chunk contains a negation

declinableWord

: Predicate (an inflectable word such as a verb or adjective) in its base form, for example すきだ for すきです

lid

: Chunk ID of the case-marked phrase

caseWord

: Case-marked phrase without its particle, for example 子ども for 子どもは

case

: Case type of the case-marked phrase (the chunk's caseType without the trailing 格, for example ガ2 for ガ2格)
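Because each row pairs one predicate with one case-marked phrase, the rows for a single predicate occurrence share the same aid, sid and cid. A minimal Python sketch (the helper group_by_predicate is hypothetical) that collects them back together with the standard csv module, applied to the output shown above:

```python
import csv
import io
from typing import Dict, List, Tuple

# Default-format output from the example above.
CSV_TEXT = """aid,sid,cid,contrastConj,denial,declinableWord,lid,caseWord,case
test.txt,0,2,,,すきだ,0,子ども,ガ2
test.txt,0,2,,,すきだ,1,リンゴ,ガ
"""

Key = Tuple[str, str, str, str]  # (aid, sid, cid, declinableWord)


def group_by_predicate(text: str) -> Dict[Key, List[Tuple[str, str]]]:
    """Collect the (case, caseWord) pairs of each predicate occurrence."""
    frames: Dict[Key, List[Tuple[str, str]]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["aid"], row["sid"], row["cid"], row["declinableWord"])
        frames.setdefault(key, []).append((row["case"], row["caseWord"]))
    return frames


if __name__ == "__main__":
    for (_aid, _sid, _cid, predicate), args in group_by_predicate(CSV_TEXT).items():
        print(predicate + ": " + " ".join(f"{word}({case})" for case, word in args))
```

For the sample this prints すきだ: 子ども(ガ2) リンゴ(ガ).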

2.3.2 Examples

Example 1: Basic example

This example uses the input from the previous section with a second sentence added. Each line of the output represents one case frame.

$ more xml/test.txt
<?xml version='1.0' encoding='UTF-8'?>
<article id='test.txt'>
  <sentence id='0' text='子どもはリンゴがすきです。'>
    <chunk id='0' link='2' phraseType='格助詞句' caseType='ガ2格' phrase='子供' phraseTok='
      <token id='0' class1='名詞' class2='普通名詞' word='子ども' orgWord='子ども' daiWord='
      <token id='1' class1='助詞' class2='副助詞' word='は' orgWord='は'/>
    </chunk>
    <chunk id='1' link='2' phraseType='格助詞句' caseType='ガ格' phrase='林檎' phraseTok='リ
      <token id='2' class1='名詞' class2='普通名詞' word='リンゴ' orgWord='リンゴ' daiWord='
      <token id='3' class1='助詞' class2='格助詞' word='が' orgWord='が'/>
    </chunk>
    <chunk id='2' link='-1' phraseType='用言句' phraseTok='すきだ' rawPhrase='すきです。' ph
      <token id='4' class1='形容詞' class3='ナ形容詞' class4='デス列基本形' word='すきだ' or
      <token id='5' class1='特殊' class2='句点' word='。' orgWord='。'/>
    </chunk>
  </sentence>
  <sentence id='1' text='望遠鏡で泳ぐ少女を見た。'>
    <chunk id='0' link='3' phraseType='格助詞句' caseType='デ格' phrase='望遠鏡' phraseTok='
      <token id='0' class1='名詞' class2='普通名詞' word='望遠' orgWord='望遠' daiWord='望遠
      <token id='1' class1='名詞' class2='普通名詞' word='鏡' orgWord='鏡' daiWord='鏡' cate
      <token id='2' class1='助詞' class2='格助詞' word='で' orgWord='で'/>
    </chunk>
    <chunk id='1' link='2' phraseType='用言句' phrase='泳ぐ' phraseTok='泳ぐ' rawPhrase='泳
      <token id='3' class1='動詞' class3='子音動詞ガ行' class4='基本形' word='泳ぐ' orgWord=
    </chunk>
    <chunk id='2' link='3' phraseType='格助詞句' caseType='ヲ格' phrase='少女' phraseTok='少
      <token id='4' class1='名詞' class2='普通名詞' word='少女' orgWord='少女' daiWord='少女
      <token id='5' class1='助詞' class2='格助詞' word='を' orgWord='を'/>
    </chunk>
    <chunk id='3' link='-1' phraseType='用言句' phraseTok='見る' rawPhrase='見た。' phrase='
      <token id='6' class1='動詞' class3='母音動詞' class4='タ形' word='見る' orgWord='見た'
      <token id='7' class1='特殊' class2='句点' word='。' orgWord='。'/>
    </chunk>
  </sentence>
</article>
$ mcaseframe.rb I=xml o=caseframe.csv
#END# /Users/maegawa/.rvm/rubies/ruby-2.0.0-p247/bin/mcaseframe.rb I=xml o=caseframe.csv
$ more caseframe.csv
aid,sid,cid,contrastConj,denial,declinableWord,lid,caseWord,case
test.txt,0,2,,,すきだ,0,子ども,ガ2
test.txt,0,2,,,すきだ,1,リンゴ,ガ
test.txt,1,3,,,見る,0,望遠鏡,デ
test.txt,1,3,,,見る,2,少女,ヲ
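Over a larger corpus, the same output can be aggregated into a per-predicate profile of the cases each predicate takes. A minimal Python sketch (the helper case_profile is hypothetical) using the Example 1 output as input; it only counts occurrences and relies on mcaseframe.rb having already reduced each predicate to its base form.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import Dict

# Output of Example 1.
CSV_TEXT = """aid,sid,cid,contrastConj,denial,declinableWord,lid,caseWord,case
test.txt,0,2,,,すきだ,0,子ども,ガ2
test.txt,0,2,,,すきだ,1,リンゴ,ガ
test.txt,1,3,,,見る,0,望遠鏡,デ
test.txt,1,3,,,見る,2,少女,ヲ
"""


def case_profile(text: str) -> Dict[str, Counter]:
    """Count how often each case type occurs with each predicate."""
    profile: Dict[str, Counter] = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(text)):
        profile[row["declinableWord"]][row["case"]] += 1
    return dict(profile)


if __name__ == "__main__":
    for predicate, cases in case_profile(CSV_TEXT).items():
        print(predicate, dict(cases))
```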

Example 2: Output in key format

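When the -key option is specified, each predicate (inflectable word) is written on its own line with type 用言, followed by one line for each case-marked phrase that depends on it, with the case in the type field. Unlike the default format, predicates with no case-marked dependents, such as 泳ぐ in sentence 1, are also output.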

$ mcaseframe.rb -key I=xml o=caseframe2.csv
#END# /Users/maegawa/.rvm/rubies/ruby-2.0.0-p247/bin/mcaseframe.rb -key I=xml o=caseframe2.c
$ more caseframe2.csv
aid,sid,cid,contrastConj,denial,lid,word,type
test.txt,0,2,,,2,すきだ,用言
test.txt,0,2,,,0,子ども,ガ2
test.txt,0,2,,,1,リンゴ,ガ
test.txt,1,1,,,1,泳ぐ,用言
test.txt,1,3,,,3,見る,用言
test.txt,1,3,,,0,望遠鏡,デ
test.txt,1,3,,,2,少女,ヲ
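The key format carries the same case frame information as the default format, with the predicate moved onto its own line. A minimal Python sketch (the helper key_to_default is hypothetical) that turns the key-format rows above back into default-format rows. It assumes that predicate lines are exactly those whose type is 用言 and that dependent lines share the predicate's aid, sid and cid, as in this example. Predicate lines without dependents produce no output, so the result matches the Example 1 output.

```python
import csv
import io
import sys
from typing import List

# Key-format output of Example 2.
KEY_CSV = """aid,sid,cid,contrastConj,denial,lid,word,type
test.txt,0,2,,,2,すきだ,用言
test.txt,0,2,,,0,子ども,ガ2
test.txt,0,2,,,1,リンゴ,ガ
test.txt,1,1,,,1,泳ぐ,用言
test.txt,1,3,,,3,見る,用言
test.txt,1,3,,,0,望遠鏡,デ
test.txt,1,3,,,2,少女,ヲ
"""

DEFAULT_HEADER = ["aid", "sid", "cid", "contrastConj", "denial",
                  "declinableWord", "lid", "caseWord", "case"]


def key_to_default(text: str) -> List[List[str]]:
    """Rebuild default-format rows: one row per case-marked phrase, with the
    predicate word looked up from the 用言 row sharing its aid, sid and cid."""
    records = list(csv.DictReader(io.StringIO(text)))
    predicates = {(r["aid"], r["sid"], r["cid"]): r["word"]
                  for r in records if r["type"] == "用言"}
    # contrastConj and denial are copied from the dependent row itself
    # (assumed to be the same on every row of a group).
    return [[r["aid"], r["sid"], r["cid"], r["contrastConj"], r["denial"],
             predicates[(r["aid"], r["sid"], r["cid"])],
             r["lid"], r["word"], r["type"]]
            for r in records if r["type"] != "用言"]


if __name__ == "__main__":
    writer = csv.writer(sys.stdout, lineterminator="\n")
    writer.writerow(DEFAULT_HEADER)
    writer.writerows(key_to_default(KEY_CSV))
```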