Learning to Label Sequences in One Pass


Abstract
The sequence labelling task consists in predicting a sequence of labels given an observed sequence of tokens. This task is an example of a structured output learning problem and appears in practical applications in computational linguistics and signal processing. Two informal assumptions are crucial for this task. The first states that a label depends only on the surrounding labels and tokens. The second states that this dependency is invariant with the time index. These assumptions are expressed through the parametric formulation of the models and, in the case of probabilistic models, through conditional independence assumptions (Markov models). Part of the model specification is then the inference procedure that recovers the predicted labels for any input sequence. Batch sequence learning algorithms determine the model parameters by optimizing a global objective function that depends on all the training sequences. This approach is compatible with a variety of inference procedures. However, the computational cost of learning usually grows faster than the total number of tokens in the training set. Online sequence learning algorithms are less costly because they iteratively update the model parameters by separately processing each training sequence, or each training token. Although algorithms of the latter kind are restricted to models based on greedy inference, they have been shown to be extremely competitive in practice. Following (2), we cast both exact and greedy inference as two quadratic programming problems whose kernel matrices define the same feature space, and then derive two online sequence learning algorithms using a slightly simplified (improved) variant of the LaRank algorithm (1). Both algorithms empirically perform as well as the equivalent batch algorithm with exact inference after only one epoch over the training data. Their training times scale linearly with the number of training tokens. Since both algorithms derive from the same setup, we can also discuss the observed differences in training time and sparsity that tend to favor the greedy online algorithm.
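
To make the greedy inference and one-pass online update concrete, below is a minimal sketch in Python of a perceptron-style greedy sequence labeller. It is an illustration under stated assumptions, not the LaRank-based method of the paper: the feature function, the update rule, and all names here are invented for exposition.

from collections import defaultdict

def features(tokens, i, prev_label):
    # A label depends only on the surrounding tokens and the previous
    # label: the Markov assumption described in the abstract.
    return [("tok", tokens[i]), ("prev", prev_label)]

def predict(weights, tokens, labels):
    # Greedy inference: commit to the best label at each position,
    # left to right, conditioning on the labels already predicted.
    out, prev = [], "<s>"
    for i in range(len(tokens)):
        scores = {y: sum(weights[(f, y)] for f in features(tokens, i, prev))
                  for y in labels}
        prev = max(scores, key=scores.get)
        out.append(prev)
    return out

def train_one_pass(data, labels):
    # Online learning: a single epoch, one cheap update per token, so the
    # total cost is linear in the number of training tokens.
    weights = defaultdict(float)
    for tokens, gold in data:
        prev = "<s>"
        for i, y_true in enumerate(gold):
            feats = features(tokens, i, prev)
            scores = {y: sum(weights[(f, y)] for f in feats) for y in labels}
            y_hat = max(scores, key=scores.get)
            if y_hat != y_true:  # perceptron update on mistakes only
                for f in feats:
                    weights[(f, y_true)] += 1.0
                    weights[(f, y_hat)] -= 1.0
            prev = y_true  # condition on the gold history during training
    return weights

# Toy usage with hypothetical part-of-speech-like tags.
data = [(["the", "cat", "sat"], ["DET", "NOUN", "VERB"]),
        (["a", "dog", "ran"], ["DET", "NOUN", "VERB"])]
w = train_one_pass(data, labels=["DET", "NOUN", "VERB"])
print(predict(w, ["the", "dog", "sat"], ["DET", "NOUN", "VERB"]))

With exact inference one would instead search over all label sequences, e.g. with the Viterbi algorithm; the greedy variant replaces that global search with a constant-time decision per token, which is what allows one pass over the data to run in time linear in the total number of training tokens.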