Title: Histogram Equalization on Statistical Approaches for Chinese Unknown Word Extraction
Authors: Bor-Shen Lin; Yi-Cong Chen
Region of publication: Taiwan
City of publication: Taipei
Subject: Electrical Engineering and Computer Science
Keywords: Unknown Word Extraction; Word Identification; Machine Learning; Multilayer Perceptrons; Histogram Equalization
Journal: International Journal of Computational Linguistics & Chinese Language Processing
Volume/Issue: Vol. 16, Nos. 3-4 (combined issue), December 2011
Pages: 41-60
Language: English
Abstract:

As human life evolves and information spreads, new things emerge quickly and new terms are coined every day. It is therefore important for natural language processing systems to keep extracting new words over time. Because applications span broad areas, however, the statistical characteristics of the training domain and the testing domain may be mismatched, which inevitably degrades word extraction performance. This paper proposes a word extraction scheme that uses histogram equalization for feature normalization. With this scheme, mismatches in feature distributions caused by different corpus sizes or changes of domain can be compensated for appropriately, so that unknown word extraction becomes more reliable and applicable to new domains. The scheme was initially evaluated on the corpora released in SIGHAN2. Using four combined features with equalization, it achieved F-measures of 68.43% and 71.40% for word identification on the CKIP and CUHK test sets, respectively, corresponding to IV/OOV recall rates of 66.72%/32.94% and 75.99%/58.39%. When applied to unknown word extraction in a new domain, the scheme can identify proper nouns such as "海角七號" (Cape No. 7, the name of a film), "蠟筆小新" (Crayon Shinchan, the name of a cartoon figure), and "金融海嘯" (Financial Tsunami), which cannot be extracted reliably with rule-based approaches. It is less effective at identifying names of people, places, or organizations, for which the semantic structure is prominent. The scheme is complementary to the outputs of two word segmentation systems and is promising if other rule-based approaches can be further integrated.
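
The abstract does not spell out the equalization procedure, so the following is only a minimal sketch of histogram equalization in its generic quantile-mapping form: each feature value from one domain is replaced by the value sitting at the same cumulative rank in a reference domain. The function name `equalize` and the sample numbers are hypothetical and do not come from the paper.

```python
from bisect import bisect_right


def equalize(values, reference):
    """Map each value onto the reference distribution by matching empirical CDFs.

    Assumption: this is generic quantile mapping, not necessarily the exact
    procedure used in the paper. Both inputs are non-empty lists of numbers.
    """
    if not values or not reference:
        raise ValueError("both samples must be non-empty")
    src = sorted(values)
    ref = sorted(reference)
    n, m = len(src), len(ref)
    equalized = []
    for x in values:
        # Number of source samples <= x, so rank / n is the empirical CDF at x.
        rank = bisect_right(src, x)
        # Index of the smallest reference sample whose CDF reaches rank / n,
        # computed as ceil(rank * m / n) - 1 with integer arithmetic.
        index = (rank * m + n - 1) // n - 1
        equalized.append(ref[index])
    return equalized


# Hypothetical feature values from a new domain and from the training domain.
new_domain = [10.0, 30.0, 20.0]
training_domain = [0.1, 0.2, 0.3]
print(equalize(new_domain, training_domain))
```

After the mapping, the new-domain values follow the value range and shape of the training-domain distribution while keeping their original rank order, which is the kind of mismatch compensation the abstract describes.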


    

Contents of this issue
International Journal of Computational Linguistics & Chinese Language Processing, Vol. 16, Nos. 3-4 (combined issue), December 2011
English Article Errors in Taiwanese College Students' EFL Writing / Neil Edward Barrett; Li-Mei Chen
基於辭典詞彙釋義之多階層釋義關聯程度計量-以「目」字部為例 [Measuring Multi-Level Relatedness of Word Senses Based on Dictionary Definitions: The "目" Radical as an Example] / 趙逢毅; 鍾曉芳
Histogram Equalization on Statistical Approaches for Chinese Unknown Word Extraction / Bor-Shen Lin; Yi-Cong Chen
Intent Shift Detection Using Search Query Logs / Chieh-Jen Wang; Hsin-Hsi Chen
Characteristics of Independent Claim: A Corpus-Linguistic Approach to Contemporary English Patents / Darren Hsin-Hung Lin; Shelley Ching-Yu Hsieh
 
   
 
   
