Simulation of the planetary boundary layer (PBL) is key to forecasting air quality and estimating greenhouse gas (GHG) emissions in cities. Here we conducted the first long-term, continuous study of PBL heights (PBLHs) in Boston, MA, using a compact lidar instrument. We developed an image recognition algorithm to estimate PBLHs from the lidar measurements and used these estimates to evaluate PBL simulations from seven numerical weather prediction (NWP) model versions. The model versions differed in their systematic errors and in the variability of their simulated PBLHs, with discrepancies ranging from -2.5 to 4.0 km. The NWP model with the best overall agreement for the fully developed PBL had R² = 0.72 and a bias of only 0.128 km. However, this model predicted a notable number of anomalously high carbon dioxide concentrations at ground stations, because it occasionally underestimated the PBLH by a large margin. We also developed a novel method that combines lidar data with footprints from a Lagrangian particle dispersion model to identify long-range transport of air pollution in the nocturnal residual layer. Our framework proved effective for evaluating the performance of models used to estimate air pollution and GHG emissions in cities, which is critical for tracking progress on emission reduction targets and guiding effective policies.
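As a rough illustration of the agreement statistics quoted above, the following minimal Python sketch computes a mean bias and a coefficient of determination for paired model and lidar PBLHs. It assumes that the bias is the mean of model minus observation and that the coefficient of determination is the squared Pearson correlation; the abstract does not state either definition. The function names and the sample values are hypothetical and are not data from the study.

```python
def mean_bias(model, obs):
    """Mean of (model - observation), in the units of the inputs (assumed definition)."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)


def r_squared(model, obs):
    """Squared Pearson correlation between model and observations (assumed definition)."""
    n = len(obs)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov * cov / (var_m * var_o)


# Hypothetical PBLH values in km, for illustration only (not data from the study).
lidar_pblh = [1.0, 1.5, 2.0, 2.5]
model_pblh = [1.2, 1.6, 2.1, 2.7]

print("bias (km):", mean_bias(model_pblh, lidar_pblh))
print("R^2:", r_squared(model_pblh, lidar_pblh))
```

A positive bias under this convention means the model overestimates the PBLH on average. A small mean bias can still hide occasional large underestimates, which is the situation described for the best-performing model.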