Predicting the location of moving objects in noisy environments is essential to everyday behavior, such as navigating traffic. Although many objects provide multisensory information, it remains unknown how humans use multisensory information to localize moving objects, and how this depends on expected sensory interference (e.g., occlusion). In four experiments, we systematically investigated localization for auditory, visual, and audiovisual (AV) targets. Performance for audiovisual targets was compared to performance predicted by maximum likelihood estimation (MLE). In Experiment 1A, moving targets were occluded by an audiovisual occluder, and their final locations had to be inferred from target speed and occlusion duration. Participants relied exclusively on the visual component of the audiovisual target, even though the auditory component demonstrably provided useful location information when presented in isolation. In contrast, when a visual-only occluder was used in Experiment 1B, participants relied exclusively on the auditory component of the audiovisual target, even though the visual component demonstrably provided useful location information when presented in isolation. In Experiment 2, although localization estimates were in line with MLE predictions, no multisensory precision benefits were found when participants localized moving audiovisual targets. In Experiment 3, a substantial multisensory benefit was found when participants localized static audiovisual targets, showing near-MLE integration. In sum, observers use both hearing and vision when localizing static objects, but use only unisensory input when localizing moving objects and predicting motion under occlusion. Moreover, observers can flexibly prioritize one sense over the other, in anticipation of modality-specific interference. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
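
The MLE benchmark mentioned above can be illustrated with a minimal sketch of the standard reliability-weighted combination of two independent Gaussian cues. This is the textbook formulation, not necessarily the exact analysis used in the study; the function name `mle_prediction` and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def mle_prediction(mu_a, sigma_a, mu_v, sigma_v):
    """Textbook MLE cue combination for two independent Gaussian cues.

    mu_a, mu_v: unisensory (auditory, visual) location estimates.
    sigma_a, sigma_v: unisensory standard deviations (must be > 0).
    Returns (predicted AV estimate, predicted AV sd, auditory weight, visual weight).
    """
    # Reliability is the inverse variance of each cue.
    rel_a = 1.0 / sigma_a ** 2
    rel_v = 1.0 / sigma_v ** 2

    # Each cue is weighted by its share of the total reliability.
    w_a = rel_a / (rel_a + rel_v)
    w_v = 1.0 - w_a

    # Predicted audiovisual estimate and its standard deviation.
    mu_av = w_a * mu_a + w_v * mu_v
    sigma_av = math.sqrt(1.0 / (rel_a + rel_v))
    return mu_av, sigma_av, w_a, w_v


# Hypothetical values (arbitrary units): a noisier auditory cue, a more precise visual cue.
mu_av, sigma_av, w_a, w_v = mle_prediction(mu_a=0.0, sigma_a=2.0, mu_v=1.0, sigma_v=1.0)
```

Under this model the predicted audiovisual standard deviation is never larger than the smaller unisensory one, so a "multisensory precision benefit" corresponds to observed audiovisual variability falling below the best unisensory variability. Exclusive reliance on one modality would instead appear as an empirical weight near 0 or 1, regardless of the unisensory reliabilities.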