
Seminar by Dr. Faraz Hach (Simon Fraser University)

February 12, 2016


Speaker: Dr. Faraz Hach, Simon Fraser University

Title: Scalable mapping and compression of high throughput sequencing data

Date: Friday, February 12th, 2016

Time: 10:30 AM - 12:00 PM

Place: EV1.162

ABSTRACT

Modern high-throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for processing, downstream analysis, and computational infrastructure. HTS has become an invaluable technology for many applications, e.g. the detection of single-nucleotide polymorphisms and structural variations. In most of these applications, mapping sequenced "reads" to their potential genomic origin is the first fundamental step for subsequent analyses. Many tools have been developed to address this problem. Because of the large volume of available HTS data, much emphasis has been placed on speed and memory. In fact, as HTS data grow in size, data management and storage are becoming major logistical obstacles to adopting HTS platforms. One way to reduce the storage requirements of HTS data is compression. Currently, most HTS data are compressed with general-purpose algorithms such as gzip. These algorithms are not specifically designed for compressing data generated by HTS platforms. Recently, a number of fast and efficient compression algorithms have been designed specifically for HTS data to address some of the issues in data management, storage, and communication. In this talk, I will address both of these computational problems: sequence mapping and sequence compression. I will describe two novel methods, mrsFAST and mrsFAST-Ultra, for mapping HTS short reads to the reference genome. These methods are cache-oblivious and guarantee perfect sensitivity. Both are specifically designed to address the bottleneck of multi-mapping for the purpose of structural variation detection. Moreover, I will show how these tools were employed in a recent study to analyze TCGA and CPTAC consortium data. Finally, I will address the storage and communication problems in HTS data by introducing SCALCE, a "boosting" scheme based on the Locally Consistent Parsing technique. SCALCE re-orders the data to increase the locality of reference and thereby improve the performance of well-known compression methods in terms of speed and space.

BIO

Faraz Hach is a University Research Associate in the School of Computing Science at Simon Fraser University and a Visiting Research Scientist at the Vancouver Prostate Center. In addition, he is a member of the Standards Council of Canada (SCC) in the MPEG-Genome Compression Group. He received his PhD from Simon Fraser University in 2013. Faraz’s core research focuses on designing novel, high-performance algorithms for analyzing next-generation sequencing data. His work has been internationally recognized with several awards and honors. These include the Ian Lawson Van Toch Memorial Award for outstanding student paper at the 20th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2012) and the Governor General’s Gold Medal for the best doctoral thesis at Simon Fraser University in 2014.





© Concordia University