ConcVAE: Conceptual Representation Learning

IEEE Trans Neural Netw Learn Syst. 2024 Jul 3:PP. doi: 10.1109/TNNLS.2024.3404496. Online ahead of print.

Abstract

Disentangled representation learning aims to obtain an independent latent representation without supervisory signals. However, in unsupervised settings, the independence of a representation does not guarantee interpretability that matches human intuition. In this article, we introduce conceptual representation learning, an unsupervised strategy to learn a representation together with its concepts. An antonym pair forms a concept, which determines a semantically meaningful axis in the latent space. Since the connection between signifying words and signified notions is arbitrary in natural languages, verbalizing data features makes the representation meaningful to humans. We thus construct Conceptual VAE (ConcVAE), a variational autoencoder (VAE)-based generative model with an explicit process in which the semantic representation of data is generated via trainable concepts. For visual data, ConcVAE uses the arbitrariness of natural language as an inductive bias for unsupervised learning through vision-language pretraining, which can tell an unsupervised model what makes sense to humans. Qualitative and quantitative evaluations show that the conceptual inductive bias in ConcVAE effectively disentangles the latent representation in a sense-making manner without supervision. Code is available at https://github.com/ganmodokix/concvae.
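
The idea that an antonym pair determines an axis can be illustrated with a toy sketch. This is not the ConcVAE implementation (the abstract gives no such detail, and in ConcVAE the concepts are trainable): it only assumes that each word of a pair has an embedding vector, takes the normalized difference of the two embeddings as the axis, and scores a data embedding by its signed position along that axis. The function names and the hand-made three-dimensional vectors are hypothetical stand-ins for embeddings that a vision-language model might supply.

```python
import math


def concept_axis(pos_embedding, neg_embedding):
    """Unit vector pointing from the negative pole to the positive pole."""
    diff = [p - n for p, n in zip(pos_embedding, neg_embedding)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("antonym embeddings must differ")
    return [x / norm for x in diff]


def concept_score(data_embedding, pos_embedding, neg_embedding):
    """Signed coordinate of the data along the concept axis,
    measured from the midpoint between the two poles."""
    axis = concept_axis(pos_embedding, neg_embedding)
    midpoint = [(p + n) / 2 for p, n in zip(pos_embedding, neg_embedding)]
    return sum((d - m) * a for d, m, a in zip(data_embedding, midpoint, axis))


# Hypothetical embeddings for an antonym pair such as "bright" / "dark".
bright = [1.0, 0.0, 0.0]
dark = [-1.0, 0.0, 0.0]

# A data embedding that leans toward the "bright" pole; its second
# component is orthogonal to the axis and does not affect the score.
sample = [0.5, 2.0, 0.0]

score = concept_score(sample, bright, dark)
```

In this sketch a positive score means the sample lies on the side of the first word of the pair and a negative score means it lies on the side of its antonym, which is one simple way to read a latent coordinate in terms of words.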