Because of their close relationship with humans, non-human apes (chimpanzees, bonobos, gorillas, orangutans, and gibbons, including siamangs) are of great scientific interest. The goal of understanding their complex behavior would be greatly advanced by the ability to perform video-based pose tracking. Tracking, however, requires high-quality annotated datasets of ape photographs. Here we present OpenApePose, a new public dataset of 71,868 photographs of six ape species in naturalistic contexts, each annotated with 16 body landmarks. We show that a standard deep network (HRNet-W48) trained on these ape photographs reliably tracks out-of-sample ape photographs better than networks trained on monkeys (specifically, the OpenMonkeyPose dataset) or on humans (COCO). The ape-trained network tracks apes almost as well as the monkey and human networks track their respective taxa, and networks trained with one of the six ape species held out track that species better than the monkey and human networks do. These results highlight the importance of large, specialized databases for animal tracking systems and confirm the utility of our new ape database.
Keywords: apes; behavior tracking; dataset; deep learning; neuroscience; pose estimation.
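The abstract describes each photograph as annotated with 16 body landmarks. As a minimal illustration of how such annotations might be consumed, the sketch below assumes a COCO-style keypoint JSON; the field names ("images", "annotations", "keypoints") and the file name follow the COCO convention and are assumptions, since OpenApePose's actual file layout is not specified here.

```python
import json

NUM_LANDMARKS = 16  # one (x, y, visibility) triple per landmark


def load_annotations(path):
    """Yield (image_file, landmarks) pairs from a COCO-style JSON file."""
    with open(path) as f:
        data = json.load(f)
    images = {img["id"]: img["file_name"] for img in data["images"]}
    for ann in data["annotations"]:
        # COCO packs keypoints as a flat [x1, y1, v1, x2, y2, v2, ...] list
        kps = ann["keypoints"]
        assert len(kps) == NUM_LANDMARKS * 3
        yield images[ann["image_id"]], [
            (kps[i], kps[i + 1], kps[i + 2]) for i in range(0, len(kps), 3)
        ]


# Example: count visible landmarks per image (file name is hypothetical)
for image_file, landmarks in load_annotations("openapepose_train.json"):
    visible = sum(v > 0 for _, _, v in landmarks)
    print(f"{image_file}: {visible}/{NUM_LANDMARKS} landmarks visible")
```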
All animals carry out a wide range of behaviors in everyday life, such as feeding and communicating with one another. Understanding the complex behavior of non-human apes, such as chimpanzees, bonobos, gorillas, orangutans, and various gibbons, is of great interest to scientists because of their close relationship with humans. Each behavior is made up of a sequence of poses that an animal makes with its body. To analyze poses reliably and consistently, scientists have developed automated pose estimation methods that determine the positions of body parts from photographs and videos.

While these systems require minimal external input once they are running, they must first be trained on a large dataset of high-quality annotated images of the target animals so that the system learns what to look for. So far, scientists have relied on systems trained on monkey and human images to analyze ape data. However, apes are particularly challenging to track because their body textures are uniform and they adopt a wide variety of poses. Therefore, accurate tracking of ape behavior requires a dedicated training dataset of annotated ape images.

Desai et al. filled this gap by creating the “OpenApePose” dataset, which contains 71,868 photographs of apes from six species, each annotated with 16 body landmarks. To test the dataset, the researchers trained an artificial intelligence network separately on monkey, human, and ape datasets. The findings showed that the network is better at tracking apes when trained on ape images rather than on images of monkeys or humans. It is also nearly as good at tracking apes as the monkey and human networks are at tracking their own species. This contradicts the optimistic expectation that monkey and human models would generalize to apes. Training the network without images of one of the six ape species showed that it can still track the excluded species better than monkey and human models can. These experiments highlight the importance of species- and family-specific datasets.

OpenApePose is a valuable resource for researchers from various fields. It can aid tracking of animal behavior in the wild using large quantities of footage recorded by camera traps and drones. Artificial intelligence models trained on the OpenApePose dataset could also help scientists, such as neuroscientists, link movement with other types of data, including brain activity measurements, to gain deeper insights into behavior.
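The held-out-species experiment described above amounts to a leave-one-species-out data split combined with a keypoint accuracy metric. The sketch below is illustrative only: the record fields, the species labels, and the use of PCK (percentage of correct keypoints, a common pose-estimation metric) are assumptions for this example, not necessarily the paper's exact protocol.

```python
import numpy as np

SPECIES = ["chimpanzee", "bonobo", "gorilla", "orangutan", "gibbon", "siamang"]


def leave_one_species_out(records, held_out):
    """Split annotation records into train (five species) and test (one)."""
    train = [r for r in records if r["species"] != held_out]
    test = [r for r in records if r["species"] == held_out]
    return train, test


def pck(pred, gt, bbox_size, alpha=0.2):
    """Fraction of predicted landmarks within alpha * bbox_size of truth.

    pred, gt: arrays of shape (n_images, n_landmarks, 2)
    bbox_size: array of shape (n_images,) giving a per-image scale
    """
    dists = np.linalg.norm(pred - gt, axis=-1)  # (n_images, n_landmarks)
    return float((dists < alpha * bbox_size[:, None]).mean())
```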
© 2023, Desai et al.