Learning High-Dimensional Evolving Data Streams With Limited Labels

IEEE Trans Cybern. 2022 Nov;52(11):11373-11384. doi: 10.1109/TCYB.2021.3070420. Epub 2022 Oct 17.

Abstract

In the context of streaming data, learning algorithms often need to confront several unique challenges, such as concept drift, label scarcity, and high dimensionality. Several concept drift-aware data stream learning algorithms have been proposed to tackle these issues over the past decades. However, most existing algorithms utilize a supervised learning framework and require all true class labels to update their models. Unfortunately, in the streaming environment, requiring all labels is unfeasible and not realistic in many real-world applications. Therefore, learning data streams with minimal labels is a more practical scenario. Considering the problem of the curse of dimensionality and label scarcity, in this article, we present a new semisupervised learning technique for streaming data. To cure the curse of dimensionality, we employ a denoising autoencoder to transform the high-dimensional feature space into a reduced, compact, and more informative feature representation. Furthermore, we use a cluster-and-label technique to reduce the dependency on true class labels. We employ a synchronization-based dynamic clustering technique to summarize the streaming data into a set of dynamic microclusters that are further used for classification. In addition, we employ a disagreement-based learning method to cope with concept drift. Extensive experiments performed on many real-world datasets demonstrate the superior performance of the proposed method compared to several state-of-the-art methods.

MeSH terms

  • Algorithms*
  • Cluster Analysis