Cell segmentation and classification are critical tasks in spatial omics data analysis. We introduce CelloType, an end-to-end model designed for cell segmentation and classification of biomedical microscopy images. Unlike the traditional two-stage approach of segmentation followed by classification, CelloType adopts a multi-task learning approach that connects the segmentation and classification tasks and simultaneously boosts the performance of both. CelloType leverages Transformer-based deep learning techniques to improve the accuracy of object detection, segmentation, and classification. It outperforms existing segmentation methods when evaluated against ground-truth annotations from public databases. For classification, CelloType outperforms a baseline model composed of state-of-the-art methods for the individual tasks. Using multiplexed tissue images, we further demonstrate the utility of CelloType for multi-scale segmentation and classification of both cellular and non-cellular elements in a tissue. The enhanced accuracy and multi-task learning ability of CelloType facilitate automated annotation of rapidly growing spatial omics data.