An information-theoretic data mining method is employed to analyze categorical spatiotemporal Geographic Information System (GIS) land use data. Reconstructability Analysis (RA) is a maximum-entropy-based data modeling methodology that works exclusively with discrete data such as those in the National Land Cover Database (NLCD). The NLCD is organized as a spatial (raster) grid, and its data are available in a consistent format at five-year intervals from 2001 to 2021. An NLCD tool reports how much change occurred in each land use category; in the study area examined, the most dynamic class is Evergreen Forest (EFO), so the presence or absence of EFO in 2021 was chosen as the dependent variable that our data modeling attempts to predict. RA predicts this outcome with approximately 80% accuracy using a sparse set of neighboring, time-lagged cells drawn from a space-time data cube. When the predicting cells are all Shrubs and Grasses, the 2021 state is EFO with high probability; when the predicting cells are all EFO, the 2021 state is, with high probability, not EFO. These findings are interpreted as evidence of forest clear-cut cycles in the data, which would explain why this class is so dynamic. This study introduces a new approach to analyzing categorical GIS data and expands the range of applications that this entropy-based methodology can successfully model.
Keywords: GIS; categorical data; forest prediction; reconstructability analysis; space–time data cube; spatiotemporal.
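The prediction setup summarized above can be illustrated with a minimal sketch. It is not the RA model search itself: it only tabulates how often the final-year cell is EFO for each combination of states in a few neighboring, time-lagged cells. The grid is synthetic toy data with a planted clear-cut pattern, the choice of predictor offsets is hypothetical, and the function names are illustrative. The class codes follow the NLCD legend (42 Evergreen Forest, 52 Shrub/Scrub).

```python
from collections import Counter, defaultdict

EFO, SHRUB = 42, 52  # NLCD legend codes: Evergreen Forest, Shrub/Scrub


def tabulate(cube, offsets):
    """Count final-year EFO presence/absence for each predictor state.

    cube    : nested list indexed as cube[t][row][col]; the last layer is the target year.
    offsets : hypothetical predictor cells as (dt, drow, dcol), with dt < 0 (earlier layers).
    """
    last = len(cube) - 1
    height, width = len(cube[0]), len(cube[0][0])
    # Skip a border wide enough that every neighbor index stays inside the grid.
    pad = max(max(abs(dr), abs(dc)) for _, dr, dc in offsets)
    counts = defaultdict(Counter)
    for r in range(pad, height - pad):
        for c in range(pad, width - pad):
            state = tuple(cube[last + dt][r + dr][c + dc] for dt, dr, dc in offsets)
            counts[state][cube[last][r][c] == EFO] += 1
    return counts


def efo_probability(counts):
    """Conditional relative frequency of EFO in the target year, per predictor state."""
    return {s: n[True] / (n[True] + n[False]) for s, n in counts.items()}


# Synthetic 6x6 toy grid (an assumption for illustration, not study data):
# the left half is shrub in the earlier layer and forest in the final layer (regrowth),
# the right half is forest in the earlier layer and shrub in the final layer (harvest).
size = 6
earlier = [[SHRUB if c < size // 2 else EFO for c in range(size)] for _ in range(size)]
final = [[EFO if c < size // 2 else SHRUB for c in range(size)] for _ in range(size)]
cube = [earlier, earlier, final]

# Predictors: the cell itself and its east neighbor, one time step before the target.
offsets = [(-1, 0, 0), (-1, 0, 1)]
counts = tabulate(cube, offsets)
probs = efo_probability(counts)

for state, p in sorted(probs.items()):
    print(state, p)
```

In this toy grid, an all-shrub predictor state is always followed by EFO and an all-EFO state never is, which mirrors the qualitative pattern described in the abstract. A real analysis would select the predictor cells through model search and evaluate accuracy on held-out data.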