Is Character Glyph Useless? Improving Neural Chinese Word Segmentation with Character Glyph Embedding

semanticscholar(2018)

Abstract
Chinese characters are composed of many small, picture-like components, and their glyphs carry rich information. However, few works recognize the importance of overall glyph information, and some even draw negative conclusions about it. To exploit overall glyph information in the Chinese word segmentation (CWS) task, we propose a model that places an autoencoder before a BiLSTM with CRF; the autoencoder is trained on our synthetic Chinese Character Image Datasets to generate character glyph embeddings. Our experimental results show that the model performs quite well on several standard datasets without any external dictionaries, word features, or other resources. The datasets cover both Simplified Chinese and Traditional Chinese, the latter having more regular glyphs with fewer evolutionary simplifications. These results verify the feasibility of using Chinese character glyphs for Chinese word segmentation, in particular their strong support in handling out-of-vocabulary (OOV) words and their considerable help for Traditional Chinese word segmentation.
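
The glyph-embedding step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the autoencoder architecture, image size, or embedding dimension, so everything below is an assumption. Random binary bitmaps stand in for rendered character images, the network is a single-hidden-layer autoencoder trained with plain gradient descent, and the names `make_synthetic_glyphs`, `train_glyph_autoencoder`, and `glyph_embeddings` are hypothetical. In the setup the abstract describes, the resulting embeddings would then be fed to a BiLSTM with CRF tagger, which is not shown here.

```python
import numpy as np


def make_synthetic_glyphs(num_chars=20, size=16, seed=0):
    """Stand-in for rendered character images: random binary bitmaps.

    Assumption: a real pipeline would render each character with a font;
    here each row is one flattened size x size image.
    """
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=(num_chars, size * size)).astype(float)


def train_glyph_autoencoder(images, embed_dim=8, epochs=300, lr=1.0, seed=0):
    """Train a tanh encoder / linear decoder on mean squared error.

    Returns the learned parameters and the per-epoch reconstruction loss.
    """
    rng = np.random.default_rng(seed)
    _, d = images.shape
    W_enc = rng.normal(0.0, 0.1, size=(d, embed_dim))
    b_enc = np.zeros(embed_dim)
    W_dec = rng.normal(0.0, 0.1, size=(embed_dim, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        h = np.tanh(images @ W_enc + b_enc)   # encoder: image -> embedding
        recon = h @ W_dec + b_dec             # decoder: embedding -> image
        err = recon - images
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error through both layers.
        g_recon = 2.0 * err / err.size
        g_W_dec = h.T @ g_recon
        g_b_dec = g_recon.sum(axis=0)
        g_pre = (g_recon @ W_dec.T) * (1.0 - h ** 2)
        g_W_enc = images.T @ g_pre
        g_b_enc = g_pre.sum(axis=0)
        W_enc -= lr * g_W_enc
        b_enc -= lr * g_b_enc
        W_dec -= lr * g_W_dec
        b_dec -= lr * g_b_dec
    params = {"W_enc": W_enc, "b_enc": b_enc, "W_dec": W_dec, "b_dec": b_dec}
    return params, losses


def glyph_embeddings(images, params):
    """Encode images into glyph embeddings using the trained encoder."""
    return np.tanh(images @ params["W_enc"] + params["b_enc"])


if __name__ == "__main__":
    glyphs = make_synthetic_glyphs()
    trained, history = train_glyph_autoencoder(glyphs)
    vectors = glyph_embeddings(glyphs, trained)
    print(vectors.shape, history[0], history[-1])
```

Each row of the array returned by `glyph_embeddings` is one character's glyph vector; in a full segmenter these vectors would serve as the per-character input to the sequence tagger, either on their own or alongside ordinary character embeddings.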