A general approach to progressive learning

Semantic Scholar (2020) · arXiv:2004.12908v1 [cs.AI], 27 Apr 2020

Abstract
In biological learning, data is used to improve performance on the task at hand, while simultaneously improving performance on both previously encountered tasks and as yet unconsidered future tasks. In contrast, classical machine learning starts from a blank slate, or tabula rasa, using data only for the single task at hand. While typical transfer learning algorithms can improve performance on future tasks, their performance degrades upon learning new tasks. Many recent approaches have attempted to mitigate this issue, called catastrophic forgetting, to maintain performance given new tasks. But striving to avoid forgetting sets the goal unnecessarily low: the goal of progressive learning, whether biological or artificial, is to improve performance on all tasks (including past and future) with any new data. We propose a general approach to progressive learning that ensembles representations, rather than learners. We show that ensembling representations—including representations learned by decision forests or neural networks—enables both forward and backward transfer on a variety of simulated and real data tasks, including vision, language, and adversarial tasks. This work suggests that further improvements in progressive learning may follow from a deeper understanding of how biological learning achieves such high degrees of efficiency.

Learning from data is the process by which an intelligent system improves performance on a given task by leveraging data [24]. In biological learning, learning is progressive, continually building on past knowledge and experiences and improving on many tasks given data associated with any task. For example, learning a second language often improves performance in an individual's native language [40]. In classical machine learning, the system often starts with essentially zero knowledge, a "tabula rasa", and is optimized for a single task [37, 36]. While it is relatively easy to simultaneously optimize for multiple tasks (multi-task learning) [8], it has proven much more difficult to sequentially optimize for multiple tasks [34, 35]. Specifically, classical machine learning systems, and natural extensions thereof, exhibit "catastrophic forgetting" when trained sequentially, meaning their performance on prior tasks drops precipitously upon training on new tasks [22, 21]. This is the opposite of many biological learning settings, such as the second-language setting mentioned above.

In the past 30 years, a number of sequential learning methods have attempted to overcome catastrophic forgetting. These approaches naturally fall into one of two camps. In the first, the system adds (or builds) resources as new data arrive [30, 19]. Biologically, this corresponds to development, where brains grow by adding neurons and/or synapses. In the second, the system has fixed resources, and so must reallocate resources (essentially compressing representations) in order to incorporate new knowledge [16, 20, 39, 32]. Biologically, this corresponds to adulthood, where brains have a nearly fixed or decreasing number of neurons and synapses. Approaches from both camps demonstrate continual [27], or lifelong, learning. In particular, they can sometimes learn new tasks while not catastrophically forgetting old tasks. However, as we will show, none of the previous algorithms is able to meaningfully transfer knowledge across tasks—either forward or backward—both of which are key capabilities in progressive learning.
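To make "meaningfully transfer knowledge" precise, a standard formalization (a sketch consistent with the setting above; the paper's exact definition may differ in detail) compares the risk a learner achieves on a task using only that task's data against the risk it achieves when data from other tasks are also available:

```latex
% Transfer efficiency on task t: the ratio of the generalization error
% R_t of a learner trained on task t's data D_t alone to that of a
% learner that has also seen the other tasks' data.
\[
  \mathrm{TE}_t
    \;=\;
  \frac{R_t\bigl(\hat{f} \mid \mathcal{D}_t\bigr)}
       {R_t\bigl(\hat{f} \mid \mathcal{D}_1, \dots, \mathcal{D}_T\bigr)}
\]
% TE_t > 1 means the other tasks' data helped: forward transfer when
% earlier tasks improve a later one, backward transfer when later tasks
% improve an earlier one. TE_t < 1 on a past task is catastrophic
% forgetting; merely "not forgetting" corresponds to TE_t near 1,
% whereas progressive learning asks for TE_t > 1 on all tasks.
```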
We present a general approach to progressive learning that we call "ensembling representations". Ensembling-representations algorithms sequentially learn a representation for each task, and ensemble both old and new representations for all future decisions. The task-specific representations can be learned using any desirable mechanism. We implement two complementary ensembling-representations algorithms, one based on ensembling decision forests (Lifelong Forests), and another based on ensembling deep neural networks (Lifelong Networks). Simulations illustrate the limitations and capabilities of these approaches, including performance properties in the presence of adversarial tasks. We then demonstrate the capabilities of ensembling-representations approaches on multiple real datasets, including both vision and language applications. Although ensembling-representations algorithms fall into the resource-building camp mentioned above, we illustrate that they can continue to develop, converting from the juvenile resource-building state to the adult resource-recruiting state, while maintaining their progressive learning capabilities.
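To make the "ensembling representations" recipe concrete, below is a minimal, hypothetical sketch in the spirit of Lifelong Forests, using scikit-learn's RandomForestClassifier as the per-task representation learner. The class and method names (EnsemblingRepresentations, add_task, predict) are illustrative assumptions, not the authors' reference implementation, and for brevity the leaf posteriors are estimated in-sample rather than on held-out data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


class EnsemblingRepresentations:
    """Hypothetical sketch of ensembling representations with decision
    forests. One forest (its leaf partition) is learned per task;
    decisions for task t average the task-t class posteriors estimated
    under *every* stored forest, old and new alike."""

    def __init__(self, n_trees=100):
        self.n_trees = n_trees
        self.forests = []    # one representation per task, in order
        self.voters = {}     # (repr_index, task_id) -> (classes, leaf posteriors)
        self.task_data = {}  # task_id -> (X, y)

    def add_task(self, task_id, X, y):
        # Learn a new representation from the new task's data.
        forest = RandomForestClassifier(n_estimators=self.n_trees).fit(X, y)
        self.forests.append(forest)
        self.task_data[task_id] = (X, y)
        # Refit a voter for every (representation, task) pair, so the
        # new representation also updates decisions on old tasks.
        for r, f in enumerate(self.forests):
            for tid, (Xt, yt) in self.task_data.items():
                self.voters[(r, tid)] = self._fit_voter(f, Xt, yt)

    @staticmethod
    def _fit_voter(forest, X, y):
        # Estimate class posteriors in each tree's leaves (in-sample
        # here for brevity; held-out estimates would be preferable).
        y = np.asarray(y)
        classes = np.unique(y)
        leaves = forest.apply(X)  # shape: (n_samples, n_trees)
        posteriors = []
        for j in range(leaves.shape[1]):
            table = {}
            for leaf in np.unique(leaves[:, j]):
                y_leaf = y[leaves[:, j] == leaf]
                table[leaf] = np.array([(y_leaf == c).mean() for c in classes])
            posteriors.append(table)
        return classes, posteriors

    def predict(self, task_id, X):
        # Ensemble the task's posteriors across all representations.
        votes = None
        for r, forest in enumerate(self.forests):
            classes, posteriors = self.voters[(r, task_id)]
            leaves = forest.apply(X)
            uniform = np.full(len(classes), 1.0 / len(classes))
            p = np.zeros((X.shape[0], len(classes)))
            for j, table in enumerate(posteriors):
                p += np.stack([table.get(l, uniform) for l in leaves[:, j]])
            p /= leaves.shape[1]
            votes = p if votes is None else votes + p
        return classes[np.argmax(votes, axis=1)]
```

The key design choice in this sketch is that add_task refits a voter for every (representation, task) pair: a newly learned representation immediately contributes to decisions on old tasks (backward transfer), and old representations contribute to any new task (forward transfer), which is what distinguishes ensembling representations from ensembling independently trained learners.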