Thesis defences

PhD Oral Exam - Mahsa Raeiszadeh, Information and Systems Engineering

Machine Learning for Anomalies Detection in Real-time Cloud

Date & time

Monday, March 24, 2025
10:30 a.m. – 1:30 p.m.

Cost

This event is free

Organization

School of Graduate Studies

Contact

Dolly Grewal

Where

Engineering, Computer Science and Visual Arts Integrated Complex
1515 St. Catherine W.
Room 1.162

Accessible location

Yes

When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the student’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.

Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.

Abstract

Cloud computing enables on-demand access to shared resources hosted in data centers and managed by cloud service providers. However, as cloud environments scale in size and complexity, they become increasingly prone to anomalies—deviations from expected behavior—that can disrupt reliability and availability. In real-time clouds, where operations must be completed within strict time constraints, anomalies pose a greater risk, potentially causing cascading failures, degraded performance, and increased maintenance costs. To address these issues, efficient methods for anomaly detection in real-time cloud environments are essential for maintaining service quality and operational efficiency.

Machine Learning (ML) has emerged as a promising approach for detecting anomalies in real-time clouds. By analyzing high-dimensional data such as system logs, traces, and performance metrics, ML models can identify complex patterns and deviations in dynamic, large-scale, and heterogeneous environments. However, employing ML in real-time clouds introduces several challenges. In sequential performance metrics, evolving system behaviors cause concept drift, which degrades model accuracy and requires rapid adaptation to maintain low-latency anomaly detection. In distributed traces, inter-service dependencies and dynamic workloads introduce latency-sensitive bottlenecks that make timely anomaly detection difficult. In contextual logs, log instability, class imbalance, and dependence on labeled data hinder model learning, further complicating real-time anomaly response under strict time constraints.

This thesis addresses these challenges with three key contributions for real-time cloud environments, where low-latency, adaptive, and scalable anomaly detection is critical. First, we propose a concept drift adaptation algorithm that integrates prediction-driven anomaly detection and adaptive window-based methods. This approach handles concept drift by dynamically adjusting to changes in the data distribution, enhancing detection accuracy over time. Second, we introduce a graph-based learning approach that captures inter-service dependencies while leveraging collaborative learning to reduce computational overhead and enable real-time updates. Third, we present a self-supervised log anomaly detection method that adapts to evolving log structures without requiring labeled data, improving detection efficiency in dynamic cloud environments.
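As a rough illustration of the first contribution's general idea, the sketch below pairs a prediction-driven check with an adaptive window. It is not the algorithm from the thesis, which the abstract does not specify. The class name `AdaptiveWindowDetector`, the moving-average predictor, the 3-sigma threshold, and the rule that three consecutive flagged values count as drift are all assumptions chosen for brevity.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveWindowDetector:
    """Hypothetical sketch: moving-average prediction plus a window reset on drift."""

    def __init__(self, max_window=50, min_window=5, threshold=3.0, drift_run=3):
        self.min_window = min_window      # values needed before predicting
        self.threshold = threshold        # allowed deviation, in standard deviations
        self.drift_run = drift_run        # consecutive flags treated as drift (assumed rule)
        self.window = deque(maxlen=max_window)
        self.pending = []                 # consecutive flagged values

    def update(self, value):
        """Consume one metric value; return True if it is flagged as anomalous."""
        if len(self.window) < self.min_window:
            # Warm-up: not enough history to form a prediction yet.
            self.window.append(value)
            return False

        predicted = mean(self.window)            # prediction-driven step
        spread = pstdev(self.window) or 1e-9     # avoid a zero-width band
        if abs(value - predicted) <= self.threshold * spread:
            self.window.append(value)
            self.pending.clear()
            return False

        self.pending.append(value)
        if len(self.pending) >= self.drift_run:
            # Sustained deviation: assume the distribution has shifted and
            # shrink the window to the recent values only.
            self.window.clear()
            self.window.extend(self.pending)
            self.pending.clear()
        return True


if __name__ == "__main__":
    detector = AdaptiveWindowDetector()
    # Synthetic metric: stable level, one spike, then a lasting level shift.
    stream = [10.0, 10.2, 9.8, 10.1, 9.9] * 4 + [50.0, 10.0]
    stream += [30.0, 30.2, 29.8, 30.1, 29.9, 30.0]
    for step, value in enumerate(stream):
        if detector.update(value):
            print(f"step {step}: value {value} flagged")
```

In this sketch an isolated spike is flagged and discarded, while a sustained shift is flagged only until the window has been rebuilt around the new level, after which the detector treats that level as normal.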
