Skip to main content
Thesis defences

PhD Oral Exam - Alaa Alsaig, Software Engineering

A Tight Coupling Context-Based Framework for Dataset Discovery


Date & time
Monday, May 15, 2023
2 p.m. – 4 p.m.
Cost

This event is free

Organization

School of Graduate Studies

Contact

Daniela Ferrer

Where

Online

When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the student’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.

Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.

Abstract

Discovering datasets of relevance to meet research goals is at the core of different analysis tasks in order to prove proposed hypothesis and theories. In particular, researchers in Artificial Intelligence (AI) and Machine Learning (ML) research domains where relevant datasets are essential for precise predictions have identified how the absence of methods to discover quality datasets are leading to delay and in many cases failure, of ML projects. Many research reports have brought out the absence of dataset discovery methods that fills the gap between analysis requirements and available datasets, and have given statistics to show how it hinders the process of analysis, with completion rate less than 2%. To the best of our knowledge, removing the above inadequacies remains “an open problem of great importance”. It is in this context that the thesis is making a contribution on context-based tightly coupled framework that will tightly couple dataset providers and data analytic teams. Through this framework, dataset providers publish the metadata descriptions of their datasets and analysts formulate and submit rich queries with goal specifications and quality requirements. The dataset search engine component tightly couples the query specification with metadata specifications datasets through a formal contextualized semantic matching and quality-based ranking and discover all datasets that are relevant to analyst requirements. The thesis gives a proof of concept prototype implementation and reports on its performance and efficiency through a case study.

Back to top

© Concordia University