Random access with a distributed Bitmap Join Index for Star Joins

Heliyon. 2020 Feb 17;6(2):e03342. doi: 10.1016/j.heliyon.2020.e03342. eCollection 2020 Feb.

Abstract

Indices improve the performance of relational databases, especially for queries that return a small portion of the data (i.e., low-selectivity queries). Star Joins are particularly expensive operations that commonly rely on indices to perform well at scale. However, the development and support of index-based solutions for Star Joins are still at a very early stage. To address this gap, we propose a distributed Bitmap Join Index (dBJI) and a framework-agnostic strategy to solve join predicates in linear time. For empirical analysis, we used common Hadoop technologies (e.g., HBase and Spark) to show that dBJI outperforms full-scan approaches by 59% to 88% on low-selectivity queries from the Star Schema Benchmark (SSB). Thus, distributed indices may significantly enhance low-selectivity query performance even in very large databases.
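
To make the idea of a bitmap join index concrete, the following is a minimal single-machine sketch in Python. It is not the dBJI implementation described in the paper. The toy tables (`customers`, `dates`, `lineorder`), their columns, and the helper functions are hypothetical names chosen for illustration. Each dimension attribute value maps to a bitmap over fact-table row positions, a conjunction of predicates becomes a bitwise AND, and only the matching fact rows are then read by position (random access) instead of scanning the whole table.

```python
from collections import defaultdict


def build_bitmap_join_index(fact_rows, fk_column, dimension, attribute):
    """Map each dimension attribute value to a bitmap over fact row positions.

    Bit i of a bitmap is set when fact row i joins to a dimension row
    whose `attribute` equals that value. Python ints serve as bitsets.
    """
    index = defaultdict(int)
    for position, row in enumerate(fact_rows):
        value = dimension[row[fk_column]][attribute]
        index[value] |= 1 << position
    return dict(index)


def matching_positions(bitmap):
    """Return the positions of the set bits, in ascending order."""
    positions = []
    position = 0
    while bitmap:
        if bitmap & 1:
            positions.append(position)
        bitmap >>= 1
        position += 1
    return positions


# Hypothetical toy star schema: one fact table and two dimension tables.
customers = {1: {"region": "ASIA"}, 2: {"region": "EUROPE"}}
dates = {10: {"year": 1997}, 11: {"year": 1998}}
lineorder = [
    {"custkey": 1, "datekey": 10, "revenue": 100},
    {"custkey": 2, "datekey": 10, "revenue": 200},
    {"custkey": 1, "datekey": 11, "revenue": 300},
    {"custkey": 1, "datekey": 10, "revenue": 400},
]

region_index = build_bitmap_join_index(lineorder, "custkey", customers, "region")
year_index = build_bitmap_join_index(lineorder, "datekey", dates, "year")


def star_join_revenue(region, year):
    """Sum revenue for fact rows matching both dimension predicates."""
    # Resolve the join predicates with a bitwise AND, without touching the fact table.
    bitmap = region_index.get(region, 0) & year_index.get(year, 0)
    # Random access: read only the fact rows whose bits are set.
    return sum(lineorder[p]["revenue"] for p in matching_positions(bitmap))


print(star_join_revenue("ASIA", 1997))  # rows 0 and 3 match: 100 + 400 = 500
```

In a distributed setting the bitmaps and the fact rows would be partitioned across nodes; how dBJI does this is not specified in the abstract, so the sketch leaves it out.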

Keywords: Computer science; Distributed Bitmap Index; Hadoop ecosystem; Low-selectivity queries; Random access; Star Join.