notice
Doctoral Seminar: Bahareh Goodarzi
Speaker: Bahareh Goodarzi
Supervisor: Dr. D. Goswami
Supervisory Committee:
Drs. A. Agarwal, H. Harutyunyan, B. Jaumard
Title: Pattern-based Scheduling of Parallel Applications on Heterogeneous Accelerator-based Systems
Date: Thursday, March 24, 2016
Time: 10:15AM
Place: EV 3.309
ABSTRACT
Heterogeneous clusters and multi-core environments are gradually surpassing homogeneous systems due to their high performance and flexibility. Task scheduling in these systems is an extensively studied subject. However, in a heterogeneous cluster consisting of a set of multi-core CPUs and different accelerators, e.g., GPGPUs (General-Purpose Graphics Processing Units) or FPGAs, task mapping becomes much more complex due to differences in architectures and programming models among the processors. Consequently, designing a universal scheduler that facilitates an efficient distribution of load by taking full advantage of the processing power of all the available devices in the heterogeneous system is non-trivial.
In this research, we address the scheduling problem in accelerator-based heterogeneous environments from the perspective of software patterns. We believe that by identifying the algorithmic patterns of different parallel applications and considering the characteristics of the underlying heterogeneous architecture, we can design pattern-specific scheduling techniques for a group of applications that belong to the same pattern. The main goal is to maximize the use of the computational power of all the processing units in the heterogeneous environment and to minimize the makespan, with minimal user involvement. Along similar lines to the use of patterns in software development, we believe that the use of patterns in scheduling can facilitate separation of concerns, whereby a scheduler (for a pattern) can be pre-implemented generically without any application-specific details, thus relieving the user of low-level scheduling burdens.
Previously, we designed and implemented an adaptive dynamic scheduler for the independent task-parallel pattern (e.g., a task farm), which achieves performance similar to or better than that of some well-known scheduling algorithms for CPU-GPGPU systems. As the next step, in the current work we consider the dependent task-parallel pattern (e.g., a task interaction graph) and design and implement a parallel multilevel graph partitioner for irregular graphs on a CPU-GPU system. The partitioner is an integral part of the scheduler. It aims to overcome some of the challenges arising from memory constraints on GPUs and to maximize the utilization of GPU threads through suitable load-balancing schemes. We present a lock-free shared-memory scheme, since fine-grained synchronization among thousands of threads imposes too high a performance overhead. The partitioner, implemented in CUDA, outperforms the best serial graph partitioner, Metis, and the parallel MPI-based partitioner ParMetis. It performs similarly to the shared-memory CPU-based parallel graph partitioner mt-metis.
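For readers unfamiliar with the task-farm pattern and the makespan objective mentioned in the abstract, the following is a minimal illustrative sketch, not the scheduler developed in this research. It simulates the simplest form of dynamic scheduling: each independent task is handed to whichever device becomes idle first. The function name `simulate_task_farm`, the task costs, and the device speeds are all hypothetical and chosen only for illustration.

```python
import heapq


def simulate_task_farm(task_costs, device_speeds):
    """Simulate a task farm in which idle devices pull the next task.

    task_costs    -- work units per independent task (hypothetical values)
    device_speeds -- work units per second for each device (hypothetical values)

    Returns (makespan, assignment), where assignment[i] is the index of the
    device that ran task i.
    """
    # Min-heap of (time at which the device becomes free, device index).
    free_at = [(0.0, d) for d in range(len(device_speeds))]
    heapq.heapify(free_at)

    assignment = []
    for cost in task_costs:
        # The device that is idle earliest takes the next task.
        start, device = heapq.heappop(free_at)
        finish = start + cost / device_speeds[device]
        assignment.append(device)
        heapq.heappush(free_at, (finish, device))

    # The makespan is the time at which the last device finishes.
    makespan = max(t for t, _ in free_at)
    return makespan, assignment


if __name__ == "__main__":
    # One slow device (say, a CPU core) and one device four times faster.
    makespan, assignment = simulate_task_farm([4, 4, 4, 4], [1.0, 4.0])
    print(makespan, assignment)
```

In this toy run the faster device ends up with three of the four equal tasks, simply because it becomes idle sooner. An adaptive scheduler such as the one described above would additionally have to account for factors this sketch ignores, such as data-transfer costs and task granularity.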