Accurate classification of coffee bean roast levels is crucial for delivering high-quality coffee drinks tailored to consumer preferences. Roast level strongly shapes the flavor profile of coffee, and different roast levels suit specific brewing methods and taste expectations. Despite the importance of precise classification, many coffee shops still rely on manual inspection, which is time-consuming and prone to human error. This study presents an automated approach that uses computer vision and deep learning to classify coffee beans into eight distinct roast levels: Extremely Light, Very Light, Light, Medium Light, Medium, Moderately Dark, Dark, and Very Dark. A Vision Transformer (ViT) model was employed because of its state-of-the-art performance in image classification tasks. The model was trained on a custom dataset of 3,600 images, evenly distributed across the eight classes (450 images per class). To ensure robust performance, preprocessing techniques, including histogram matching and normalization, were applied. On the test set, the ViT achieved an accuracy of 0.9778, precision of 0.9791, recall of 0.9778, and F1-score of 0.9777. These results indicate that the ViT can distinguish subtle visual differences between roast levels. The approach offers a scalable and cost-effective way to automate coffee bean classification, improving efficiency and reducing operational errors in the coffee industry.
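As an illustration of the two preprocessing steps named above, the following minimal sketch shows one common way to implement histogram matching and normalization with NumPy. It is not the pipeline used in this study: the function names (`match_histogram`, `match_histogram_rgb`, `normalize`), the per-channel CDF-mapping formulation, and the mean and standard deviation of 0.5 are all assumptions chosen for illustration.

```python
import numpy as np


def match_histogram(source, reference):
    """Map the intensity distribution of a single-channel `source` image
    onto that of `reference` by aligning their empirical CDFs."""
    src_values, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)

    # Empirical cumulative distribution of each image's intensities.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size

    # For each source intensity, look up the reference intensity that sits
    # at the same cumulative probability (linear interpolation in between).
    matched_values = np.interp(src_cdf, ref_cdf, ref_values)
    return matched_values[src_idx].reshape(source.shape)


def match_histogram_rgb(image, reference):
    """Apply histogram matching independently to each color channel."""
    channels = [
        match_histogram(image[..., c], reference[..., c])
        for c in range(image.shape[-1])
    ]
    return np.stack(channels, axis=-1)


def normalize(image, mean=0.5, std=0.5):
    """Scale 8-bit pixel values to [0, 1], then standardize.
    The default mean/std of 0.5 are assumed values, not taken from the study."""
    scaled = image.astype(np.float32) / 255.0
    return (scaled - mean) / std


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for a bean photo and a reference photo.
    bean = rng.integers(0, 128, size=(32, 32, 3), dtype=np.uint8)
    reference = rng.integers(64, 256, size=(32, 32, 3), dtype=np.uint8)

    matched = match_histogram_rgb(bean, reference)
    model_input = normalize(matched)
    print(model_input.shape, model_input.dtype)
```

Matching every image to a single reference photograph is one way to reduce lighting and exposure differences between captures before the images reach the classifier; the choice of reference image is left open here.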