Chatbots are envisioned to bring about a significant shift in Software Engineering (SE), enabling practitioners to engage in conversations and interact with various services using natural language. At the heart of every chatbot is a Natural Language Understanding (NLU) component that enables the chatbot to comprehend users' queries. However, the NLU requires extensive, high-quality training data (examples) to accurately interpret user queries. Prior work shows that creating and augmenting SE datasets is resource-intensive and time-consuming. To address this challenge, we explore the potential of using ChatGPT to augment SE chatbot training datasets. Specifically, we evaluate the impact of retraining the NLU on ChatGPT-augmented datasets, measuring the NLU's performance on four widely used SE datasets. Moreover, we assess the syntactic and semantic characteristics of the generated examples compared to human-written ones. Additionally, we conduct an ablation study to investigate the impact of each prompt component on the NLU's performance and the diversity of the generated examples. The results show that ChatGPT significantly
improves the NLU's performance, with F1-score gains ranging from 3.9% to 11.6%. Moreover, we find that ChatGPT-generated examples exhibit syntactic diversity while maintaining consistent semantics (2.2% on average) across all datasets. Additionally, the results indicate that including a few human-written examples and a description of the intent's objective in the prompt affects the quality of the generated examples. Finally, we provide implications for practitioners.