When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the student’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.
Once the thesis is accepted, the candidate presents it orally. This oral exam is open to the public.
Abstract
The evolution of streaming data over long periods of time presents significant challenges for maintaining the accuracy and efficiency of predictive models because of concept drift, where changes in the data distribution can degrade model performance. In this research, we study the problems of concept drift detection (CDD) and concept drift adaptation (CDA). Unlike traditional approaches that treat CDD and CDA in isolation, often under non-streaming, static conditions, we propose a novel methodology based on multivariate vector error-correction analysis of feature importance measures (FIMs). The FIMs provide a solid foundation that allows us to reformulate concept drift detection and adaptation in streaming data.
We additionally introduce, formalize, and develop the notion of concept drift resolution (CDR) as an innovative model preference technique. This solution further enhances overall performance by making effective use of multiple models undergoing concept drift, including the main learner and the proposed CDA model. The results of our experiments and analyses indicate that the proposed CDD method significantly reduces computation time, particularly in applications experiencing abrupt drifts, while our CDA model delivers notable improvements in prediction accuracy and F1 score on both gradual-drift and abrupt-drift datasets, outperforming existing methods across varying drift rates and drift characteristics.
Using FIMs as a common basis, we develop a unified framework that integrates the CDD, CDA, and CDR tasks, thus bridging the gap between detection and adaptation. Extensive experiments validate the effectiveness of our proposed methods and demonstrate their applicability to various real-world and synthetic benchmark datasets. This work not only advances the understanding of concept drift in streaming data but also provides a general solution framework that balances performance with interpretability, paving the way for the development of more reliable and explainable data-driven applications and systems.
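As a rough illustration of what a model preference step between a main learner and an adapted model might look like, the following is a minimal sketch only. It is not the thesis's CDR method, which the abstract does not specify: the selection rule (accuracy over a sliding window), the class `WindowedModelSelector`, the two toy models, and the synthetic stream with an abrupt drift at step 100 are all assumptions made for this example.

```python
from collections import deque


class WindowedModelSelector:
    """Hypothetical helper: prefer the model most accurate on recent labelled samples."""

    def __init__(self, models, window=20):
        # `models` maps a name to any callable x -> predicted label.
        self.models = dict(models)
        # One fixed-size window of 0/1 correctness flags per model.
        self.hits = {name: deque(maxlen=window) for name in self.models}

    def recent_accuracy(self, name):
        flags = self.hits[name]
        return sum(flags) / len(flags) if flags else 0.0

    def preferred(self):
        # On a tie, max() keeps the model that was listed first.
        return max(self.models, key=self.recent_accuracy)

    def predict(self, x):
        return self.models[self.preferred()](x)

    def update(self, x, y_true):
        # Score every candidate on the newly labelled sample.
        for name, model in self.models.items():
            self.hits[name].append(int(model(x) == y_true))


def main_learner(x):
    # Toy stand-in for a model fitted to the pre-drift concept.
    return int(x > 0.5)


def adapted_model(x):
    # Toy stand-in for a model adapted to the post-drift concept.
    return int(x <= 0.5)


def run_demo(n_steps=200, drift_at=100, window=20):
    """Test-then-train loop; returns the model name used at each step."""
    selector = WindowedModelSelector(
        {"main": main_learner, "adapted": adapted_model}, window=window
    )
    choices = []
    for t in range(n_steps):
        x = (t % 10) / 10
        # Abrupt drift: the labelling rule flips at `drift_at`.
        y = int(x > 0.5) if t < drift_at else int(x <= 0.5)
        choices.append(selector.preferred())
        selector.predict(x)
        selector.update(x, y)
    return choices


if __name__ == "__main__":
    choices = run_demo()
    print("first step using the adapted model:", choices.index("adapted"))
```

In this toy setting the selector keeps the main learner until the adapted model has been more accurate over the recent window, so the switch lags the drift by a little more than half the window length. A shorter window reacts faster but is more sensitive to noise.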