This command extracts case frames from KNP parsing results.
A case frame here refers to the combination of a predicate and a phrase marked by a case particle in Japanese, such as 「リンゴ(が)」+「好き」 or 「望遠鏡(で)」+「見る」. The command reads the parsing results (in XML format) produced by the mknp.rb command, extracts the case frames, and saves them in CSV format.
mcaseframe.rb I= o= [-key] [-mcmdenv] [--help]
I= : Path name of the XML parsing results produced by mknp.rb.
o= : File name for the case frame output.
-key : Output in key format.
-mcmdenv : Display MCMD messages as configured by the environment variables. By default, warnings and error messages are displayed (KG_VerboseLevel=2).
--help : Display help.
The XML output of mknp.rb command is shown below (excerpt).
<sentence id='0' text='子どもはリンゴがすきです。'>
  <chunk id='0' link='2' phraseType='格助詞句' caseType='ガ2格' phrase='子供' phraseTok='子
    <token id='0' class1='名詞' class2='普通名詞' word='子ども' orgWord='子ども' daiWord='子供
    <token id='1' class1='助詞' class2='副助詞' word='は' orgWord='は'/>
  </chunk>
  <chunk id='1' link='2' phraseType='格助詞句' caseType='ガ格' phrase='林檎' phraseTok='リン
    <token id='2' class1='名詞' class2='普通名詞' word='リンゴ' orgWord='リンゴ' daiWord='林檎
    <token id='3' class1='助詞' class2='格助詞' word='が' orgWord='が'/>
  </chunk>
  <chunk id='2' link='-1' phraseType='用言句' phraseTok='すきだ' rawPhrase='すきです。' phrase
    <token id='4' class1='形容詞' class3='ナ形容詞' class4='デス列基本形' word='すきだ' orgWord
    <token id='5' class1='特殊' class2='句点' word='。' orgWord='。'/>
  </chunk>
</sentence>
In this example, chunk id='0' (「子どもは」) has link='2', and chunk id='1' (「リンゴが」) also has link='2', so both chunks depend on chunk id='2' (「すきです」). The dependency relationships are illustrated in the figure below.
子どもは──┐
リンゴが──┤
　　　　　すきです。
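As a rough illustration of how the link attribute encodes these dependencies, the following Python sketch pairs each case particle chunk with the predicate chunk it links to. It is not the actual implementation of mcaseframe.rb. The XML is a simplified version of the excerpt (only attributes that appear there in full are kept), and the function name dependency_pairs and the rule of joining the noun tokens of a chunk are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified sentence: an assumption for illustration. Real mknp.rb output
# carries more attributes per chunk and token than are kept here.
XML = """
<sentence id='0' text='子どもはリンゴがすきです。'>
  <chunk id='0' link='2' phraseType='格助詞句' caseType='ガ2格'>
    <token id='0' class1='名詞' word='子ども'/>
    <token id='1' class1='助詞' word='は'/>
  </chunk>
  <chunk id='1' link='2' phraseType='格助詞句' caseType='ガ格'>
    <token id='2' class1='名詞' word='リンゴ'/>
    <token id='3' class1='助詞' word='が'/>
  </chunk>
  <chunk id='2' link='-1' phraseType='用言句' phraseTok='すきだ'>
    <token id='4' class1='形容詞' word='すきだ'/>
    <token id='5' class1='特殊' word='。'/>
  </chunk>
</sentence>
"""


def dependency_pairs(sentence_xml):
    """Return (predicate chunk id, predicate, case chunk id, noun, case) tuples."""
    sentence = ET.fromstring(sentence_xml.strip())
    chunks = {c.get("id"): c for c in sentence.findall("chunk")}
    pairs = []
    for chunk in chunks.values():
        # link='-1' has no matching chunk id, so the root chunk is skipped.
        head = chunks.get(chunk.get("link"))
        if head is None:
            continue
        # Keep only case particle phrases that depend on a predicate phrase.
        if chunk.get("phraseType") != "格助詞句":
            continue
        if head.get("phraseType") != "用言句":
            continue
        # Hypothetical rule: join the noun tokens to form the case word.
        noun = "".join(
            t.get("word") for t in chunk.findall("token")
            if t.get("class1") == "名詞"
        )
        case = chunk.get("caseType")
        if case.endswith("格"):
            case = case[:-1]  # 'ガ2格' -> 'ガ2'
        pairs.append(
            (head.get("id"), head.get("phraseTok"), chunk.get("id"), noun, case)
        )
    return pairs


for pair in dependency_pairs(XML):
    print(pair)
```

Each returned tuple holds the same information as the cid, declinableWord, lid, caseWord and case fields of the CSV output.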
When this command is run, these dependency relationships are extracted and saved as CSV, as shown below.
aid,sid,cid,contrastConj,denial,declinableWord,lid,caseWord,case
test.txt,0,2,,,すきだ,0,子ども,ガ2
test.txt,0,2,,,すきだ,1,リンゴ,ガ
The meaning of each CSV field is as follows.
aid : Name of the input file
sid : Sentence ID (line number)
cid : Chunk ID of the predicate phrase
contrastConj : Adversative (contrastive) conjunction
denial : Set to 1 when the chunk contains a negation
declinableWord : Predicate (declinable word) phrase
lid : Chunk ID of the case particle phrase
caseWord : Case particle phrase
case : Type of the case particle phrase
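Because each CSV line carries one predicate and one case particle phrase, a downstream script typically regroups the lines by (aid, sid, cid). The following Python sketch shows one way to do this with the standard csv module. The sample text is the output of the example in this section, and the function name group_case_frames and the dictionary layout are assumptions made for illustration, not part of mcaseframe.rb.

```python
import csv
import io

# Sample taken from the example output of mcaseframe.rb in this section.
CSV_TEXT = """aid,sid,cid,contrastConj,denial,declinableWord,lid,caseWord,case
test.txt,0,2,,,すきだ,0,子ども,ガ2
test.txt,0,2,,,すきだ,1,リンゴ,ガ
test.txt,1,3,,,見る,0,望遠鏡,デ
test.txt,1,3,,,見る,2,少女,ヲ
"""


def group_case_frames(text):
    """Group CSV rows into one entry per predicate, keyed by (aid, sid, cid)."""
    frames = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["aid"], row["sid"], row["cid"])
        frame = frames.setdefault(
            key,
            {
                "predicate": row["declinableWord"],
                "negated": row["denial"] == "1",
                "cases": [],
            },
        )
        # Store each dependent phrase as (case type, case word).
        frame["cases"].append((row["case"], row["caseWord"]))
    return frames


for key, frame in group_case_frames(CSV_TEXT).items():
    print(key, frame)
```

To read a real output file instead of the embedded sample, the file contents can be passed to the function as a UTF-8 string.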
This example uses the input from the previous section. Each line of the output represents one case frame.
$ more xml/test.txt
<?xml version='1.0' encoding='UTF-8'?>
<article id='test.txt'>
  <sentence id='0' text='子どもはリンゴがすきです。'>
    <chunk id='0' link='2' phraseType='格助詞句' caseType='ガ2格' phrase='子供' phraseTok='
      <token id='0' class1='名詞' class2='普通名詞' word='子ども' orgWord='子ども' daiWord='
      <token id='1' class1='助詞' class2='副助詞' word='は' orgWord='は'/>
    </chunk>
    <chunk id='1' link='2' phraseType='格助詞句' caseType='ガ格' phrase='林檎' phraseTok='リ
      <token id='2' class1='名詞' class2='普通名詞' word='リンゴ' orgWord='リンゴ' daiWord='
      <token id='3' class1='助詞' class2='格助詞' word='が' orgWord='が'/>
    </chunk>
    <chunk id='2' link='-1' phraseType='用言句' phraseTok='すきだ' rawPhrase='すきです。' ph
      <token id='4' class1='形容詞' class3='ナ形容詞' class4='デス列基本形' word='すきだ' or
      <token id='5' class1='特殊' class2='句点' word='。' orgWord='。'/>
    </chunk>
  </sentence>
  <sentence id='1' text='望遠鏡で泳ぐ少女を見た。'>
    <chunk id='0' link='3' phraseType='格助詞句' caseType='デ格' phrase='望遠鏡' phraseTok='
      <token id='0' class1='名詞' class2='普通名詞' word='望遠' orgWord='望遠' daiWord='望遠
      <token id='1' class1='名詞' class2='普通名詞' word='鏡' orgWord='鏡' daiWord='鏡' cate
      <token id='2' class1='助詞' class2='格助詞' word='で' orgWord='で'/>
    </chunk>
    <chunk id='1' link='2' phraseType='用言句' phrase='泳ぐ' phraseTok='泳ぐ' rawPhrase='泳
      <token id='3' class1='動詞' class3='子音動詞ガ行' class4='基本形' word='泳ぐ' orgWord=
    </chunk>
    <chunk id='2' link='3' phraseType='格助詞句' caseType='ヲ格' phrase='少女' phraseTok='少
      <token id='4' class1='名詞' class2='普通名詞' word='少女' orgWord='少女' daiWord='少女
      <token id='5' class1='助詞' class2='格助詞' word='を' orgWord='を'/>
    </chunk>
    <chunk id='3' link='-1' phraseType='用言句' phraseTok='見る' rawPhrase='見た。' phrase='
      <token id='6' class1='動詞' class3='母音動詞' class4='タ形' word='見る' orgWord='見た'
      <token id='7' class1='特殊' class2='句点' word='。' orgWord='。'/>
    </chunk>
  </sentence>
</article>
$ mcaseframe.rb I=xml o=caseframe.csv
#END# /Users/maegawa/.rvm/rubies/ruby-2.0.0-p247/bin/mcaseframe.rb I=xml o=caseframe.csv
$ more caseframe.csv
aid,sid,cid,contrastConj,denial,declinableWord,lid,caseWord,case
test.txt,0,2,,,すきだ,0,子ども,ガ2
test.txt,0,2,,,すきだ,1,リンゴ,ガ
test.txt,1,3,,,見る,0,望遠鏡,デ
test.txt,1,3,,,見る,2,少女,ヲ
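When the -key option is specified, the output is written in key format: each predicate (type 用言) and each case particle phrase that depends on it is written on its own line, and the lines belonging to the same predicate share the same cid.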
$ mcaseframe.rb -key I=xml o=caseframe2.csv
#END# /Users/maegawa/.rvm/rubies/ruby-2.0.0-p247/bin/mcaseframe.rb -key I=xml o=caseframe2.c
$ more caseframe2.csv
aid,sid,cid,contrastConj,denial,lid,word,type
test.txt,0,2,,,2,すきだ,用言
test.txt,0,2,,,0,子ども,ガ2
test.txt,0,2,,,1,リンゴ,ガ
test.txt,1,1,,,1,泳ぐ,用言
test.txt,1,3,,,3,見る,用言
test.txt,1,3,,,0,望遠鏡,デ
test.txt,1,3,,,2,少女,ヲ
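The key format can be regrouped in much the same way as the standard output. The following Python sketch treats a row whose type is 用言 as the predicate and every other row with the same (aid, sid, cid) as a dependent case particle phrase. The sample text is the -key output of this example, and the function name frames_from_key_format and the result layout are assumptions made for illustration.

```python
import csv
import io

# Sample taken from the -key output of mcaseframe.rb shown in this example.
KEY_CSV = """aid,sid,cid,contrastConj,denial,lid,word,type
test.txt,0,2,,,2,すきだ,用言
test.txt,0,2,,,0,子ども,ガ2
test.txt,0,2,,,1,リンゴ,ガ
test.txt,1,1,,,1,泳ぐ,用言
test.txt,1,3,,,3,見る,用言
test.txt,1,3,,,0,望遠鏡,デ
test.txt,1,3,,,2,少女,ヲ
"""


def frames_from_key_format(text):
    """Rebuild one entry per predicate from key-format rows."""
    frames = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["aid"], row["sid"], row["cid"])
        frame = frames.setdefault(key, {"predicate": None, "cases": []})
        if row["type"] == "用言":
            # The predicate row itself.
            frame["predicate"] = row["word"]
        else:
            # A case particle phrase: (case type, case word).
            frame["cases"].append((row["type"], row["word"]))
    return frames


for key, frame in frames_from_key_format(KEY_CSV).items():
    print(key, frame)
```

Unlike the standard output, this format also lists predicates that have no dependent case particle phrase, such as 泳ぐ in sentence 1, which ends up with an empty list of cases.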