Aligning words in French-English non-parallel medical texts: effect of term frequency distributions.

Studies in Health Technology and Informatics (2004)

Abstract
In this paper, we present a method for aligning words based on a statistical model of word distribution similarity. The method rests on the assumption that patterns of word co-occurrence are correlated across texts in different languages. Using pages automatically downloaded from different medical web sites as corpora, and a combined bilingual lexicon of general and medical terms as a seed resource, we assign a similarity score to each candidate translation pair based on the distributional contexts of its two words. We vary several parameters of the method. Experimental results confirm a positive effect of word frequency, show that medical words are handled better than less specialized words, and reveal no clear influence of context window size. Future directions for improvement include working with very large, part-of-speech tagged corpora.
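The general idea of scoring translation candidates by distributional context can be illustrated with a minimal sketch. It assumes raw co-occurrence counts, a symmetric context window, and cosine similarity; the abstract does not specify the weighting scheme or similarity measure actually used, so these are stand-ins. The function names (`context_vectors`, `translate_vector`, `cosine`, `rank_candidates`) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def context_vectors(tokens, window=2):
    """Map each word to counts of the words seen within +/- `window` positions."""
    vectors = {}
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        ctx = vectors.setdefault(word, Counter())
        for j in range(lo, hi):
            if j != i:
                ctx[tokens[j]] += 1
    return vectors


def translate_vector(vec, lexicon):
    """Project a source-language context vector into the target language
    through a seed bilingual lexicon; words with no entry are dropped."""
    out = Counter()
    for word, count in vec.items():
        for translation in lexicon.get(word, ()):
            out[translation] += count
    return out


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(src_word, src_vecs, tgt_vecs, lexicon):
    """Score every target word against the projected context of `src_word`,
    returning (score, word) pairs sorted from best to worst."""
    projected = translate_vector(src_vecs[src_word], lexicon)
    scores = [(cosine(projected, vec), word) for word, vec in tgt_vecs.items()]
    return sorted(scores, reverse=True)
```

In such a setup, `context_vectors` would be run once per language on the non-parallel corpora, and the `window` argument corresponds to the context window size that the experiments vary. Rare words yield sparse, noisy vectors, which is consistent with the reported positive effect of frequency.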
Keywords
natural language processing,controlled vocabulary,multilingualism,translation,algorithms