Abstract knowledge is deeply grounded in many computer-based applications. An important research area of Artificial Intelligence (AI) deals with the automatic derivation of knowledge from data. Machine learning offers the corresponding algorithms. One area of research focuses on the development of biologically inspired learning algorithms. These machine learning methods are based on neurological concepts so that they can systematically derive knowledge from data and store it. One type of machine learning algorithm that can be categorized as a "deep learning" model is the Deep Neural Network (DNN). DNNs consist of multiple artificial neurons arranged in layers and are trained with the backpropagation algorithm. These deep learning methods exhibit remarkable capabilities for inferring and storing complex knowledge from high-dimensional data.
However, DNNs are affected by a problem that prevents new knowledge from being added to an existing knowledge base. The ability to continuously accumulate knowledge is an important factor that contributed to evolution and is therefore a prerequisite for the development of strong AIs. The so-called "catastrophic forgetting" (CF) effect causes DNNs to lose previously derived knowledge after only a few training iterations on a new data distribution. Only an energetically expensive retraining on the joint data distribution of past and new data enables the abstraction of the entire new set of knowledge. To counteract this effect, various techniques have been and are still being developed with the goal of mitigating or even solving the CF problem. Published CF avoidance studies usually suggest that their approaches are effective for various continual learning tasks.
This dissertation is set in the context of continual machine learning with deep learning methods. The first part deals with the development of an application-oriented real-world evaluation protocol that can be used to investigate different machine learning models with regard to the suppression of the CF effect. In the second part, a comprehensive study indicates that, under the application-oriented requirements, none of the investigated models exhibits satisfactory continual learning results. In the third part, a novel deep learning model is presented, which is referred to as Deep Convolutional Gaussian Mixture Models (DCGMMs). DCGMMs build upon the unsupervised approach of Gaussian Mixture Models (GMMs). GMMs cannot be considered a deep learning method, and they have to be initialized in a data-driven manner before training. These aspects limit the use of GMMs in continual learning scenarios.
The training procedure proposed in this work enables the training of GMMs with Stochastic Gradient Descent (SGD), as applied to DNNs. The integrated annealing scheme removes the need for a data-driven initialization, which has so far been a prerequisite for GMM training. Experiments show that the novel training method achieves results equivalent to those of conventional methods without inheriting their disadvantages. Another innovation is the arrangement of GMMs in the form of layers, similar to DNNs. The transformation of GMMs into layers enables their combination with existing layer types and thus the construction of deep architectures, which can derive more complex knowledge with fewer resources.
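To make the idea of gradient-based GMM training concrete, the following is a minimal sketch and not the procedure developed in this work. It assumes a one-dimensional mixture with equal component weights and a fixed, shared variance, and it updates only the component means by stochastic gradient ascent on the per-sample log-likelihood. The annealing scheme is not reproduced, since its details are not given here; all function names, the toy data, and the hyperparameters are hypothetical.

```python
import math
import random


def responsibilities(x, means, var):
    """Posterior probability of each component for sample x.

    Assumes equal mixture weights and a shared variance, so the
    normalization constants cancel and only the exponents matter.
    """
    log_p = [-0.5 * (x - m) ** 2 / var for m in means]
    top = max(log_p)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lp - top) for lp in log_p]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def sgd_fit_means(data, means, var=1.0, lr=0.05, epochs=5, seed=0):
    """Fit the component means with one gradient step per sample.

    The gradient of log p(x) with respect to mean k is
    r_k * (x - mean_k) / var, where r_k is the responsibility.
    """
    rng = random.Random(seed)
    means = list(means)
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x in samples:
            r = responsibilities(x, means, var)
            for k in range(len(means)):
                means[k] += lr * r[k] * (x - means[k]) / var
    return means


def make_toy_data(n=400, seed=1):
    """Hypothetical toy data: two clusters centered at -2 and +2."""
    rng = random.Random(seed)
    return [rng.gauss(-2.0, 0.5) if i % 2 == 0 else rng.gauss(2.0, 0.5)
            for i in range(n)]


if __name__ == "__main__":
    fitted = sgd_fit_means(make_toy_data(), means=[-0.1, 0.1])
    print(sorted(fitted))
```

In this toy setting the means start at fixed values near zero rather than at a data-driven estimate, and each sample triggers a single update, in contrast to batch procedures that process the whole data set per iteration.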
In the final part of this work, the DCGMM model is examined with regard to its continual learning capabilities. In this context, a replay approach referred to as Gaussian Mixture Replay (GMR) is introduced. GMR describes the generation and replay of data samples by utilizing the DCGMM functionalities. Comparisons with existing CF avoidance models show that GMR achieves similar continual learning results under application-oriented conditions. All in all, the presented work indicates that the identified application-oriented requirements remain an open issue for "applied" continual learning research. In addition, the novel deep learning model provides an interesting starting point for many other research areas.
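The general principle of generative replay can be sketched as follows. This is an illustration under strong simplifying assumptions and not the GMR implementation: a plain one-dimensional GMM with given parameters stands in for a trained DCGMM, and the function names and the `replay_ratio` parameter are hypothetical.

```python
import random


def sample_gmm(weights, means, stds, n, rng):
    """Draw n samples: pick a component by weight, then sample its Gaussian."""
    components = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(means[k], stds[k]) for k in components]


def build_replay_set(new_data, weights, means, stds, replay_ratio=1.0, seed=0):
    """Mix new-task samples with samples generated by the old-task model.

    replay_ratio is the number of generated samples per new sample.
    """
    rng = random.Random(seed)
    n_replay = int(round(replay_ratio * len(new_data)))
    generated = sample_gmm(weights, means, stds, n_replay, rng)
    mixed = list(new_data) + generated
    rng.shuffle(mixed)  # interleave old and new knowledge
    return mixed
```

Training on the mixed set approximates joint training on past and new data without storing the past samples themselves, because the previously learned distribution is regenerated from the model.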