Transactional data from point-of-sale systems may not capture customer behavior before a purchasing decision is finalized. A smart shelf system can provide this additional data for retail analytics. Previous work has conventionally assumed that customers stand directly in front of the products on a shelf. Data from instances where customers deviated from this convention, referred to as "cross-location", were typically omitted. However, recognizing instances of cross-location is crucial when contextualizing multi-person and multi-product tracking for real-world scenarios. The monitoring of product association with customer keypoints through RANSAC modeling and particle filtering (PACK-RMPF) is a system that addresses cross-location; it consists of twelve load cell pairs for product tracking and a single camera for customer tracking. In this study, the time-series vision data were further processed with R-CNN and StrongSORT. An NTP server synchronized timestamps between the weight and vision subsystems. Multiple particle filters predicted the trajectory of each customer's centroid and wrist keypoints relative to the location of each product. RANSAC modeling was then applied to the particles to associate a customer with each event. When the system-generated customer-product interaction history was compared with the shopping lists given to each participant, the system achieved an average recall of 76.33% overall and 79% for cross-location instances over five runs.
Keywords: computer vision; retail analytics; sensor fusion; smart shelves; visual analytics.
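
As an illustration of the evaluation described in the abstract, the following minimal sketch shows one way recall could be computed by comparing a system-generated interaction history against the participants' shopping lists, then averaged over runs. The function names (`recall`, `mean_recall`), the representation of an interaction as a `(customer, product)` pair, and the sample data are assumptions made for this example; they are not taken from the PACK-RMPF implementation.

```python
from collections import Counter


def recall(expected, detected):
    """Fraction of expected (customer, product) interactions found by the system.

    Both arguments are lists of (customer, product) pairs. Repeated pairs are
    counted, so picking the same product twice must be detected twice.
    """
    expected_counts = Counter(expected)
    detected_counts = Counter(detected)
    total = sum(expected_counts.values())
    if total == 0:
        raise ValueError("expected interactions must not be empty")
    # A true positive is an expected pair that also appears in the detections.
    true_positives = sum(
        min(count, detected_counts[pair])
        for pair, count in expected_counts.items()
    )
    return true_positives / total


def mean_recall(runs):
    """Average recall over several runs, each given as (expected, detected)."""
    if not runs:
        raise ValueError("at least one run is required")
    return sum(recall(exp, det) for exp, det in runs) / len(runs)


# Hypothetical data: two customers, one interaction attributed to the wrong person.
shopping_lists = [("A", "milk"), ("A", "tea"), ("B", "milk")]
system_history = [("A", "milk"), ("B", "milk"), ("B", "tea")]
```

In this hypothetical run, `recall(shopping_lists, system_history)` counts two of the three expected interactions as recovered, because the tea pick-up was attributed to the wrong customer. A cross-location recall could be obtained the same way by restricting both lists to interactions labeled as cross-location.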