Breast cancer remains a significant public health challenge, prompting extensive research into accurate and efficient diagnostic methods. In recent years, the application of artificial intelligence and deep learning techniques has garnered considerable attention for its potential to enhance medical image analysis. In particular, convolutional neural networks (CNNs) have shown promising results in various diagnostic tasks, including breast cancer classification. However, the emergence of transformer-based architectures, exemplified by the Vision Transformer (ViT) and the Swin Transformer, introduces a novel approach to improving the accuracy and effectiveness of breast cancer classification, particularly for histopathological and cytological images.
This thesis explores the ViT and Swin models for breast cancer classification using cytological images, contributing to the broader objective of enhancing early detection and diagnosis. Vision transformers, originally designed for natural image classification, can capture complex spatial relationships and contextual information, making them well suited to the analysis of histopathological and cytological images. The primary aim of this study is to evaluate the performance of vision transformers against traditional CNN models, shedding light on their potential for accurate and data-efficient breast cancer classification.
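To make the classification setup concrete, the following minimal sketch (not drawn from the thesis itself) instantiates pretrained ViT and Swin backbones with a fresh two-class head for a benign/malignant task; the specific model variants, the use of the timm library, and the binary label set are illustrative assumptions.

```python
import timm
import torch

# Assumed setup: ImageNet-pretrained backbones with a new two-class
# head (benign vs. malignant); the model variants are illustrative.
vit = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=2)
swin = timm.create_model("swin_base_patch4_window7_224", pretrained=True, num_classes=2)

# A 224x224 RGB image batch passes through either model unchanged.
x = torch.randn(1, 3, 224, 224)
logits = vit(x)                  # shape: (1, 2)
probs = logits.softmax(dim=-1)   # class probabilities
```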
The research methodology involves collecting a dataset of histopathological and cytological breast cancer images, preprocessing the data to ensure quality, and fine-tuning the vision transformers through transfer learning. Performance assessment employs a range of metrics, including accuracy, precision, recall, and the area under the receiver operating characteristic curve (AUC-ROC), offering a comprehensive evaluation of the models' classification ability. Comparative analyses are conducted against well-established CNN architectures, enabling a thorough evaluation of the vision transformers' efficacy.
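For illustration, the reported metrics could be computed as in the sketch below using scikit-learn; the toy arrays, variable names, and 0.5 decision threshold are assumptions for demonstration, not the thesis's actual evaluation code.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

# Hypothetical data: true labels and the model's predicted probability
# of the malignant class for each test image.
y_true = np.array([0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.9, 0.7, 0.4, 0.8])
y_pred = (y_prob >= 0.5).astype(int)  # threshold probabilities at 0.5

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("AUC-ROC  :", roc_auc_score(y_true, y_prob))  # uses probabilities, not hard labels
```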
The results of this study demonstrate the competence of vision transformers in breast cancer classification tasks, particularly on cytological images. The models prove adept at extracting long-range features and patterns from the samples, which the CNN baselines could not capture as effectively. This research highlights the potential of transformer-based models in medical image analysis, signaling their ability to surpass traditional architectures.
The proposed method demonstrates a 3.07% improvement in classification accuracy over current state-of-the-art image-level classification studies, which rely on traditional machine learning and deep learning techniques. Our approach also surpasses previous patch-level classification studies, showing a 10.47% increase in test accuracy and ultimately achieving 95.02% on the test set and 100% on the validation set. Experimental results suggest that our method, despite using a very limited number of training images, achieves performance comparable to that of experienced pathologists and holds promise for clinical application.
In conclusion, this thesis underscores the value of cutting-edge deep learning architectures such as ViT and Swin Transformer in improving the accuracy and efficiency of breast cancer classification. The demonstrated success of vision transformers on medical images motivates further exploration of transformer-based models in medical image analysis, reshaping the landscape of computer-aided diagnosis and contributing to advances in breast cancer detection.