
Doctoral Seminar: Majid Laali

April 7, 2016


Speaker: Majid Laali

Supervisor: Dr. L. Kosseim

Supervisory Committee:
Drs. S. Bergler, F. Khendek, A. Krzyzak

Title:  Discourse Annotation Projection

Date: Thursday, April 7, 2016

Time: 16:00

Place: EV 3.309

ABSTRACT

An important aspect of natural language understanding and generation is the recognition and processing of discourse relations. Applications such as text summarization, question answering and natural language generation require language technology that operates beyond the level of the sentence. To address this need, large-scale discourse-annotated corpora such as the Penn Discourse Treebank (PDTB; Prasad et al., 2008) have been developed.

Manually constructing discourse-annotated corpora is expensive in terms of both time and expertise. As a consequence, such corpora are available for only a few languages. In this report, we propose an approach that automatically creates a PDTB-style discourse-annotated corpus from parallel texts. Our approach is based on annotation projection, in which linguistic annotations are projected from a source language to a target language in parallel texts.
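
As a rough illustration of the general idea of annotation projection (a minimal sketch, not the speaker's implementation), the Python snippet below copies a discourse relation label from an annotated English connective to the French tokens aligned with it. The function name `project_connectives`, the toy sentence pair, the hand-written word alignment and the label are all assumptions made for this example; a real pipeline would obtain alignments from a statistical word aligner and annotations from a discourse parser.

```python
from typing import Dict, Set, Tuple

Span = Tuple[int, ...]  # token indices of one connective


def project_connectives(
    source_spans: Dict[Span, str],
    alignment: Set[Tuple[int, int]],
) -> Dict[Span, str]:
    """Project relation labels from source spans to aligned target spans.

    `alignment` holds (source_index, target_index) pairs. A source span
    with no aligned target token is dropped rather than guessed.
    """
    projected: Dict[Span, str] = {}
    for src_indices, relation in source_spans.items():
        # Collect every target token aligned to any token of the source span.
        tgt = sorted({t for s, t in alignment if s in src_indices})
        if tgt:
            projected[tuple(tgt)] = relation
    return projected


# Hypothetical sentence pair with a hand-written word alignment.
english = ["However", ",", "we", "agree", "."]
french = ["Cependant", ",", "nous", "sommes", "d'accord", "."]
alignment = {(0, 0), (1, 1), (2, 2), (3, 3), (3, 4), (4, 5)}

# English-side annotation: token 0 is a connective signalling Comparison.
source_spans = {(0,): "Comparison"}

projected = project_connectives(source_spans, alignment)
labels = {" ".join(french[i] for i in idx): rel for idx, rel in projected.items()}
print(labels)
```

In this toy case the label on "However" ends up on "Cependant". Dropping unaligned spans is one possible design choice; it trades coverage for fewer spurious annotations on the target side.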

Using English-French parallel texts from the Europarl corpus, we have automatically created a French corpus annotated with discourse connectives and discourse relations. Our experiments show that this discourse-annotated corpus can be useful for a range of applications. In particular, the corpus has been used to improve LEXCONN (Roze et al., 2012), the largest lexicon of French discourse connectives. We also show how a reliable classifier for the disambiguation of French discourse connectives can be learned from the extracted dataset. Our preliminary results show that the classifier we developed can disambiguate the French discourse connective ‘aussi’ with an accuracy of 74%. We believe the induced corpus can be helpful for cross-linguistic discourse studies and can provide insight into how explicit discourse relations are affected by the translation process.





© Concordia University