Background: Distractibility and attentiveness are cognitive states that are expressed through observable behavior. How behavior observed in video can be used effectively to diagnose periods of distractibility and attentiveness is still not well understood. Video-based tools for classifying cognitive states from behavior have high potential to serve as versatile diagnostic indicators of maladaptive cognition.

New method: We describe an analysis pipeline that classifies cognitive states using a two-camera setup for video-based estimation of attentiveness and screen engagement in nonhuman primates performing cognitive tasks. The procedure reconstructs 3D poses from 2D-labeled DeepLabCut videos, estimates head yaw orientation relative to a task screen and arm/hand/wrist engagement with task objects, and segments behavior into time-resolved attentiveness and screen-engagement scores.

Results: Performance of different cognitive tasks was robustly classified from video within a few frames, reaching >90% decoding accuracy with ≤3 min time segments. The analysis procedure allows setting subject-specific thresholds for segmenting movements, yielding a time-resolved scoring of attentiveness and screen engagement.

Comparison with existing methods: Current methods also extract poses and segment action units; however, they have not been combined into a framework that enables subject-adjusted thresholding for specific task contexts. This integration is needed for inferring cognitive state variables and differentiating performance across various tasks.

Conclusion: The proposed method integrates video segmentation, scoring of attentiveness and screen engagement, and classification of task performance at high temporal resolution. This integrated framework provides a tool for assessing attention functions from video.
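To make the geometry of the pipeline concrete, the following is a minimal sketch, not the authors' code: it triangulates a 3D keypoint from paired 2D detections of two calibrated cameras and thresholds head yaw relative to the screen into a frame-wise attentiveness flag. The projection matrices, keypoint names, screen normal, and the 30° yaw cutoff are all illustrative assumptions, not values from the paper.

```python
# Sketch only: DLT triangulation of one keypoint from two calibrated views,
# followed by a head-yaw attentiveness score against a per-subject threshold.
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen in two calibrated views.

    P1, P2 : 3x4 camera projection matrices (assumed known from calibration)
    x1, x2 : 2D pixel coordinates of the same keypoint in each view
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # homogeneous -> Euclidean 3D point

def head_yaw_deg(nose_3d, head_center_3d, screen_normal):
    """Angle between the head's facing vector and the screen's unit normal."""
    facing = nose_3d - head_center_3d
    facing = facing / np.linalg.norm(facing)
    cos_a = np.clip(facing @ screen_normal, -1.0, 1.0)
    return np.degrees(np.arccos(cos_a))

# Frame-wise attentiveness: head oriented within a subject-specific yaw
# threshold of the screen counts as "attentive" (30 degrees is hypothetical).
YAW_THRESHOLD_DEG = 30.0

def attentive(yaw_deg, threshold=YAW_THRESHOLD_DEG):
    return yaw_deg <= threshold
```

In this sketch, the subject-specific thresholding described in the abstract reduces to tuning the yaw cutoff per animal; the same pattern would apply to thresholds on arm/hand/wrist keypoints for the engagement score.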
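The decoding result (>90% accuracy with ≤3 min segments) implies a windowed classification step. Below is an illustrative sketch of that step under stated assumptions: the feature set, window length, synthetic data, and the random-forest classifier are choices made here for demonstration, not the paper's reported pipeline.

```python
# Sketch: classify which task was performed from pose-derived features
# aggregated over 3-minute windows. All signals here are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def window_features(yaw_deg, engagement, fps=30, window_s=180):
    """Summarize frame-wise signals into one feature row per time window."""
    n = int(fps * window_s)
    feats = []
    for start in range(0, len(yaw_deg) - n + 1, n):
        y = yaw_deg[start:start + n]
        e = engagement[start:start + n]
        # Window summaries: mean/variability of head yaw, fraction of
        # "attentive" frames (hypothetical 30-degree cutoff), mean engagement.
        feats.append([y.mean(), y.std(), (y <= 30.0).mean(), e.mean()])
    return np.asarray(feats)

# Toy demo: synthetic yaw/engagement traces standing in for two tasks,
# five 3-min windows per task at 30 fps.
rng = np.random.default_rng(0)
n_frames = 30 * 180 * 5
yaw_a = rng.normal(20, 5, n_frames)    # task A: head mostly toward screen
yaw_b = rng.normal(45, 10, n_frames)   # task B: head more often off-screen
eng_a = rng.uniform(0.6, 1.0, n_frames)
eng_b = rng.uniform(0.0, 0.5, n_frames)

X = np.vstack([window_features(yaw_a, eng_a), window_features(yaw_b, eng_b)])
y = np.array([0] * 5 + [1] * 5)        # task label per window

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=2).mean())
```

Shorter windows than 3 min would follow the same pattern with a smaller `window_s`; the trade-off is fewer frames per feature row against finer temporal resolution.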