
Seminar by Dr. Anthony Kusalik (University of Saskatchewan)

August 17, 2015


Speaker: Dr. Anthony Kusalik, University of Saskatchewan

Title: A Better Sequence-read Simulator Program for Metagenomics

Date: Monday, August 17th, 2015

Time: 2:00PM

Place: EV3.309

ABSTRACT

Background: There are many programs available for generating simulated whole-genome shotgun sequence reads. The data generated by many of these programs follow predefined models, which limits their realism. For example, many models assume that read lengths follow a uniform or normal distribution. Other programs generate models from actual sequencing data, but are limited to reads from single-genome studies. There are few programs that allow a user to generate simulated data following non-parametric read-length distributions and quality profiles based on empirically-derived information from metagenomics sequencing data.

Results: We present BEAR (Better Emulation for Artificial Reads), a program that uses a machine-learning approach to generate reads with lengths and quality values that closely match empirically-derived distributions. BEAR requires minimal user input, as it automatically determines appropriate parameter settings from user-supplied data. BEAR also uses a unique method for deriving run-specific error rates, and can extract useful statistics from the metagenomic data itself, such as quality-error models. Many existing simulators are specific to a particular sequencing technology. BEAR is not restricted in this way, and can emulate reads from any sequencing platform. Because of its flexibility, BEAR is particularly useful for emulating the behaviour of technologies like Ion Torrent, for which few dedicated sequencing simulators are currently available. BEAR is also one of the first metagenomic sequencing simulator programs to automate the process of generating abundances, which can be an arduous task.

Conclusions: BEAR is useful for evaluating data processing tools in genomics. It has many advantages over existing comparable software, such as generating more realistic reads and being independent of sequencing technology, and has features particularly useful for metagenomics work.

BIO

Dr. Anthony (Tony) Kusalik received his B.Sc. in Mathematics from the University of Lethbridge in Lethbridge, Alberta in 1978. He subsequently completed M.Sc. and Ph.D. degrees at the University of British Columbia in 1982 and 1988, respectively. Dr. Kusalik began a faculty position at the University of Saskatchewan in 1985. He is now a full professor in the Department of Computer Science at the University of Saskatchewan, a member of the university's Division of Biomedical Engineering and School of Public Health, and the Director of the undergraduate bioinformatics program there. Dr. Kusalik has served on NIH/NIAID grant review panels and on the program committees of numerous bioinformatics conferences. His current research interests span many areas of bioinformatics, including genomics, functional genomics, epigenetics, metagenomics, and proteomics. Many of the problems he tackles come from health-related fields such as immunology, vaccinology, and oncology.





© Concordia University