Cross-Lingual and Cross-Modal Limitations of Large Language Models
Abstract
Large Language Models (LLMs), including Vision Large Language Models (VLLMs), mark the beginning of a new research era in machine learning and computational linguistics. Although most LLMs are trained predominantly on English data, many studies have confirmed their proficiency in other languages. Nonetheless, critical questions remain about whether their performance is consistent across languages. A similar concern applies to VLLMs, whose performance may differ across modalities. Moreover, although LLMs are widely recognized as highly capable at solving downstream tasks, they still perform unsatisfactorily on several tasks, and further experimentation is needed to understand why. In this thesis, we investigate cross-lingual generalization in LLMs using a novel prompt back-translation method. We examine the interactions between the text and image modalities by introducing a new concept, cross-modal consistency, and we propose a quantitative evaluation framework based on it. Additionally, we evaluate an LLM on two specific linguistic tasks, Lexicalization Generation and Lexical Gap Detection, and we develop a novel rule-based algorithm against which to compare it. Our findings show that LLMs struggle to produce accurate results on translation-variant tasks, reveal a significant inconsistency between the vision and language modalities in GPT, and show that ChatGPT underperforms on both evaluated downstream tasks, where it is significantly outperformed by our rule-based method.
