
Master Thesis Defense: Mohammadreza Ebrahimi

April 8, 2016


Speaker: Mohammadreza Ebrahimi

Supervisors: Drs. O. Ormandjieva, C. Y. Suen

Examining Committee:
Drs. L. Kosseim, R. Witte, C. Constantinides (Chair)

Title: Automatic Identification of Online Predators in Chat Logs by Anomaly Detection and Deep Learning

Date: Friday, April 8, 2016

Time: 10:00 a.m.

Place: EV 2.260

ABSTRACT

Providing a safe environment for juveniles and children in online social networks is considered a major factor in improving public safety. Given the prevalence of online conversations, mitigating the harmful effects of juvenile abuse in cyberspace has become essential. Combating this kind of crime automatically is challenging and demands efficient, scalable data mining techniques. The problem can be cast as a combination of textual preprocessing (data/text mining) and binary classification (machine learning). This thesis proposes two machine learning approaches to address the following two issues in online predator identification: 1) The first issue is the unavailability of positive samples, which cannot easily be obtained due to the nature of the problem. This is addressed by applying an existing semi-supervised anomaly detection method that allows training on only one class label. The method was tested on two datasets (PAN and SQ); 2) The second issue is improving the performance of current binary classification methods in terms of classification accuracy and F1-score. To this end, we customized a deep learning approach, the Convolutional Neural Network (CNN), for use in this domain. Using this approach, we show that classification performance (F1-score) improves by almost 1.7% compared to the currently common binary classification method (Support Vector Machine, SVM). Two datasets were used in the empirical experiments: PAN-2012 and SQ. The former is a large public dataset that has been used extensively in the literature; the latter is a small dataset collected from the Sûreté du Québec police department.
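The abstract does not name the semi-supervised anomaly detection method it applies, so the following is only an illustrative sketch of the general idea of training on a single class label. It assumes a deliberately simple swapped-in technique (distance to the centroid of the normal class over bag-of-words counts) rather than the thesis's method; the function names `vectorize`, `fit_one_class` and `is_anomalous`, the threshold rule, and the toy chat lines are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical helper: bag-of-words counts over a fixed vocabulary, plus one
# extra dimension counting out-of-vocabulary (OOV) tokens.
def vectorize(text, vocab):
    counts = Counter(text.lower().split())
    vocab_set = set(vocab)
    known = [counts[w] for w in vocab]
    oov = sum(c for w, c in counts.items() if w not in vocab_set)
    return known + [oov]

def fit_one_class(normal_texts):
    """Fit on the normal (non-predator) class only; no positive samples are used."""
    vocab = sorted({w for t in normal_texts for w in t.lower().split()})
    vecs = [vectorize(t, vocab) for t in normal_texts]
    # Centroid of the normal class in feature space.
    centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
    # Assumed threshold rule: the largest distance seen during training.
    threshold = max(math.dist(v, centroid) for v in vecs)
    return vocab, centroid, threshold

def is_anomalous(text, model):
    """Flag a text whose distance to the normal-class centroid exceeds the threshold."""
    vocab, centroid, threshold = model
    return math.dist(vectorize(text, vocab), centroid) > threshold

# Toy usage with made-up chat lines.
model = fit_one_class([
    "how was school today",
    "did you finish your homework",
    "see you at practice today",
])
print(is_anomalous("do not tell your parents about our secret meeting", model))  # True
print(is_anomalous("how was practice today", model))  # False
```

Using the maximum training distance as the threshold is the simplest possible choice; a percentile of the training distances would be more tolerant of outliers within the normal class.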





© Concordia University