Research Authors & Affiliations
Hasin Almas Sifat¹, Koushik Biswas Arko¹, Abida Afrin¹, K. M. Tahsin Kabir¹, Abdullah Rakib Akand², and Md. Mortuza Ahmmed³
¹ Department of Computer Science, American International University-Bangladesh, Dhaka, Bangladesh
² Department of Computer Science and Engineering, Asian University of Bangladesh, Dhaka, Bangladesh
³ Department of Mathematics, American International University-Bangladesh, Dhaka, Bangladesh
Conference Information
Published in: 2025 IEEE WIECON-ECE
Publisher: IEEE | Location: Dhaka, Bangladesh
Conference Date: 21-22 December 2025
Added to IEEE Xplore: 27 May 2026
DOI: 10.1109/WIECON-ECE69386.2025.11525909
Abstract
Code-mixed languages, especially Banglish (a mix of Bengali and English), are particularly difficult for natural language processing (NLP) because of syntactic irregularities, lexical borrowing, and regional variation. Existing methods generally address isolated tasks such as translation or region classification, and frequently ignore the features of regional speech that affect model performance. In this work, we present a Region-Aware Multi-Task Transformer that jointly performs region classification and translation quality prediction on Banglish-English parallel data. The model combines separate Banglish and English BERT encoders, attention-based pooling, and cross-attention fusion to represent both intra-lingual and cross-lingual contextual dependencies. Our model achieves 83% accuracy and a macro F1 score of 0.84 for region classification, while the translation quality prediction task achieves a Pearson correlation of 0.78. On both tasks, the model significantly outperforms traditional machine learning baselines (TF-IDF + Logistic Regression) and neural sequence models (BiLSTM). The results indicate that region-aware multi-task learning improves representation learning and enhances generalization across regional dialectal variation in Banglish. This work takes a step towards building contextualized and robust NLP systems for low-resource, code-mixed languages.
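The abstract reports a macro F1 score for region classification and a Pearson correlation for translation quality prediction. As a minimal sketch of what these two metrics measure, the snippet below computes them from scratch using their standard definitions. It is an illustration only, not the authors' evaluation code: the function names, the region labels, and the score values are all hypothetical, and the abstract does not state how the reported figures were computed.

```python
import math
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


# Hypothetical region labels (illustrative only, not taken from the paper).
true_regions = ["dhaka", "dhaka", "sylhet", "sylhet", "chittagong", "chittagong"]
pred_regions = ["dhaka", "sylhet", "sylhet", "sylhet", "chittagong", "dhaka"]

# Hypothetical gold and predicted translation quality scores.
gold_quality = [0.9, 0.4, 0.7, 0.2]
pred_quality = [0.8, 0.5, 0.6, 0.3]

print("macro F1:", round(macro_f1(true_regions, pred_regions), 3))
print("Pearson r:", round(pearson_r(gold_quality, pred_quality), 3))
```

Macro averaging weights every region equally regardless of how many samples it has, so it can differ from plain accuracy when performance varies across regions. Pearson correlation, by contrast, measures only the linear agreement between predicted and reference quality scores, not their absolute error.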
IEEE XPLORE | NATURAL LANGUAGE PROCESSING (NLP) | 2026