1. Detection and tracking of animals are important first steps for automated behavioral studies using videos. Animal tracking currently relies mostly on keypoint-based deep learning frameworks, which show remarkable results in laboratory settings with fixed cameras, backgrounds, and lighting. Multi-animal tracking in the wild, however, presents additional challenges: high variability in background and lighting conditions, complex motion, and occlusion.

2. We propose PriMAT, a multi-animal tracking model for nonhuman primates in the wild. The model learns to detect and track primates and other objects of interest from labeled videos or single images using bounding boxes rather than keypoints, which greatly simplifies data annotation and improves robustness. Our one-stage model is conceptually simple yet highly flexible, and an added classification branch allows it to be trained for individual identification.

3. To evaluate the performance of our model, we applied it in two case studies with Assamese macaques (Macaca assamensis) and redfronted lemurs (Eulemur rufifrons) in the wild. We show that with only a few hundred frames labeled with bounding boxes, we achieve robust tracking results. Combining these tracking results with the classification branch on the lemur videos, our model predicts lemur identities with 84% accuracy.

4. Our approach offers a promising solution for accurately tracking and identifying animals in the wild, giving researchers a tool to study animal behavior in their natural habitats. Our code, models, training images, and evaluation video sequences are publicly available, facilitating their use in animal behavior analyses and future research in this field.