Master Thesis Defense: Soudabeh Barghi
Speaker: Soudabeh Barghi
Supervisor: Dr. T. Glatard
Examining Committee: Drs. G. Butler, A. Krzyzak, J. Yang (Chair)
Title: Predicting Computational Reproducibility of Data Analysis Pipelines in Large Population Studies Using Collaborative Filtering
Date: Monday, November 19, 2018
Time: 12:30PM
Place: EV 11.119
ABSTRACT
Evaluating the computational reproducibility of data analysis pipelines has become a critical issue. It is, however, a cumbersome process for analyses that involve data from large populations of subjects, due to their computational and storage requirements. We present a method to predict the computational reproducibility of data analysis pipelines in large population studies. We formulate the problem as a collaborative filtering process, with constraints on the construction of the training set. We propose 6 different strategies to build the training set, which we evaluate on 2 datasets: a synthetic one modeling a population with a growing number of subject types, and a real one obtained with neuroinformatics pipelines. Results show that one sampling method, “Random File Numbers (Uniform)”, is able to predict computational reproducibility with good accuracy. We also analyse the relevance of including file and subject biases in the collaborative filtering model. We conclude that the proposed method can substantially speed up reproducibility evaluations, with limited accuracy loss.
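The abstract does not give the model's details. As a minimal sketch, assuming a standard biased matrix-factorization form of collaborative filtering, the prediction for subject s and file f is r̂(s, f) = μ + b_s + b_f + p_s · q_f, where μ is the global mean, b_s and b_f are the subject and file biases mentioned above, and p_s, q_f are latent factors fitted by stochastic gradient descent on the observed entries only. The function name `train_biased_mf`, the toy 6 × 4 matrix (1 = file differs across execution conditions, 0 = identical), the held-out entries and all hyperparameters are hypothetical. The thesis's six training-set sampling strategies are not reproduced here.

```python
import random


def train_biased_mf(observed, n_subjects, n_files, k=2, lr=0.05, reg=0.02,
                    epochs=200, seed=0):
    """Fit a biased matrix-factorization model by SGD (hypothetical sketch).

    observed: list of (subject, file, value) triples, value in {0, 1}.
    Returns a function predicting a binary label for any (subject, file).
    """
    rng = random.Random(seed)
    mu = sum(v for _, _, v in observed) / len(observed)  # global mean
    b_s = [0.0] * n_subjects                             # subject biases
    b_f = [0.0] * n_files                                # file biases
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_subjects)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_files)]

    def raw(s, f):
        return mu + b_s[s] + b_f[f] + sum(P[s][i] * Q[f][i] for i in range(k))

    data = list(observed)
    for _ in range(epochs):
        rng.shuffle(data)
        for s, f, v in data:
            err = v - raw(s, f)
            # L2-regularized gradient steps on biases and latent factors.
            b_s[s] += lr * (err - reg * b_s[s])
            b_f[f] += lr * (err - reg * b_f[f])
            for i in range(k):
                p, q = P[s][i], Q[f][i]
                P[s][i] += lr * (err * q - reg * p)
                Q[f][i] += lr * (err * p - reg * q)

    def predict(s, f):
        # Threshold the real-valued estimate to a binary label.
        return 1 if raw(s, f) >= 0.5 else 0

    return predict


# Hypothetical toy data: 6 subjects x 4 files; files 2 and 3 always differ.
N_SUBJECTS, N_FILES = 6, 4
truth = {(s, f): int(f >= 2) for s in range(N_SUBJECTS) for f in range(N_FILES)}
held_out = [(0, 2), (1, 0), (5, 3), (4, 1)]  # entries not computed
observed = [(s, f, v) for (s, f), v in truth.items() if (s, f) not in held_out]

predict = train_biased_mf(observed, N_SUBJECTS, N_FILES)
for s, f in held_out:
    print((s, f), "predicted:", predict(s, f), "actual:", truth[(s, f)])
```

In this toy case the file biases alone separate the two groups of files, so the held-out entries are recovered without running the pipeline on them. Real data would call for a proper train/test split and accuracy metrics.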