This thesis explores the relationship between discourse information and CEFR (Common European Framework of Reference for Languages) level in argumentative essays written by learners of English. The study draws on two prominent discourse frameworks, Rhetorical Structure Theory (RST) and the Penn Discourse TreeBank (PDTB), to analyze essays from the International Corpus Network of Asian Learners of English (ICNALE) and the Corpus and Repository of Writing (CROW). It investigates how the use of different discourse relations and connectives varies with the writers' proficiency level, and whether discourse information can serve as additional features for automated CEFR-level determination.
The analysis of the collected essays shows that English learners at different proficiency levels use discourse relations differently. Notably, the RST relations EXPLANATION and BACKGROUND are used significantly more often by writers whose CEFR level is below fluency. In addition, the use of the PDTB relation CONTINGENCY decreases as the CEFR level increases. These results provide empirical evidence of a relationship between discourse relations and language proficiency, with distinct usage patterns among learners at different CEFR levels.
To validate these findings computationally, discourse relations and connectives are used as supplementary features for machine learning models. The experimental results indicate that adding discourse information to automated CEFR-level determination yields a modest increase in performance over relying solely on lexical and grammatical features. However, the proposed approach does not outperform pretrained language models such as RoBERTa, which have shown strong performance across a range of natural language processing tasks.
Nevertheless, this study offers insight into how discourse relations are used in argumentative essays by learners of English. The findings indicate that discourse relation usage is associated with language proficiency and suggest avenues for further research and development in language assessment methodology.