Ground-level and aerial perspectives in virtual space provide simplified conditions for investigating differences between exploratory navigation and map reading in large-scale environmental learning. General similarities and differences between ground-level and aerial encoding have been identified, but little is known about the specific characteristics that differentiate them. One such characteristic is the need to process orientation: ground-level encoding (and navigation) typically requires dynamic orientations, whereas aerial encoding (and map reading) is typically conducted in a fixed orientation. The present study investigated how this factor affected spatial processing by comparing ground-level and aerial encoding to a hybrid condition: aerial-with-turns. Experiment 1 demonstrated that scene recognition was sensitive to both perspective (ground-level or aerial) and orientation (dynamic or fixed). Experiment 2 investigated brain activation during encoding, revealing regions that were preferentially activated as a function of perspective, as in previous studies (Shelton and Gabrieli in J Neurosci 22:2711-2717, 2002), but also identifying regions that were preferentially activated as a function of the presence or absence of turns. Together, these results differentiated the behavioral and brain consequences attributable to changes in orientation from those attributable to other characteristics of ground-level and aerial perspectives, providing leverage on how orientation information is processed in everyday spatial learning.