Mutual information-based image registration, shown to be effective in registering a range of medical images, is a computationally expensive process, with a typical execution time on the order of minutes on a modern single-processor computer. Accelerated execution of this process promises to enhance efficiency and therefore promote routine use of image registration clinically. This paper presents details of a hardware architecture for real-time three-dimensional (3-D) image registration. Real-time performance can be achieved by setting up a network of processing units, each with three independent memory buses: one each for the two image memories and one for the mutual histogram memory. Memory access parallelization and pipelining, by design, allow each processing unit to be 25 times faster than a processor with the same bus speed, when calculating mutual information using partial volume interpolation. Our architecture provides superior per-processor performance at a lower cost compared to a parallel supercomputer.