Sentiment Sensitive Debiasing: A Learning-Based Approach to Remove Ethnic Stereotypes in Word Embeddings
Audhav N Durai, Aditya Vasantharao, Sauman Das
Pages - 26 - 35     |    Revised - 31-08-2022     |    Published - 01-10-2022
Volume - 13   Issue - 3    |    Publication Date - October 2022
KEYWORDS
Natural Language Processing, Bias Mitigation, Deep Learning, Word2Vec, Sentiment Analysis.
ABSTRACT
Word vectorization models represent vocabulary in a vector space in a manner that captures semantic relationships between words. However, state-of-the-art word vectorization models have been shown to contain biases in their word embeddings due to ethnic prejudices and underrepresentation in the corpora they are trained on. This paper proposes a novel sentiment-sensitive, learning-based debiasing algorithm for multiclass bias mitigation. In this study, the algorithm is applied to ethnic debiasing in CBOW Word2Vec models. Unlike other debiasing algorithms, this methodology accounts for the fact that not all ethnic correlations are biased, and that proper debiasing should preserve unbiased ethnic information such as cultural knowledge. Furthermore, it does not require a pre-defined, finite set of correlations to perform debiasing. Rather, models are penalized for making ethnic correlations towards non-neutral words and are allowed to make ethnic correlations towards neutral words, which yields a thorough debiasing without losing ethnic knowledge. This study also proposes a new bias evaluation metric, SMAC (Sentiment-Aware Mean Average Cosine Similarity), which accounts for sentiment in bias measurement. We train both the baseline and debiased CBOW models on the WikiCorpus. The debiased model achieved a reduction in bias of 39.48% on the SMAC metric in comparison to the baseline model.
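The abstract does not give the SMAC formula, so the following is only a minimal sketch of the general idea: average the cosine similarity between ethnicity terms and other words, weighting each word by how far its sentiment is from neutral, so that correlations with neutral words do not count as bias. The function name `sentiment_weighted_mac`, the weighting scheme, the toy vectors, and the sentiment scores are all assumptions made for illustration and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def sentiment_weighted_mac(ethnic_vecs, word_vecs, sentiment):
    """Hypothetical sentiment-weighted mean average cosine similarity.

    ethnic_vecs: list of embedding vectors for ethnicity terms
    word_vecs:   dict mapping word -> embedding vector
    sentiment:   dict mapping word -> score in [-1, 1], 0 meaning neutral
    Assumption: each word is weighted by |sentiment|, so neutral words
    contribute nothing to the score. This is not the paper's definition.
    """
    per_term = []
    for e in ethnic_vecs:
        weighted = [
            abs(sentiment[w]) * abs(cosine(e, v))
            for w, v in word_vecs.items()
        ]
        per_term.append(sum(weighted) / len(weighted))
    return sum(per_term) / len(per_term)

# Toy 2-D embeddings, invented purely for illustration.
ethnic = [[1.0, 0.0]]
sentiment = {"criminal": -0.8, "cuisine": 0.0}

baseline = {"criminal": [1.0, 0.0], "cuisine": [1.0, 0.0]}
debiased = {"criminal": [0.0, 1.0], "cuisine": [1.0, 0.0]}

baseline_score = sentiment_weighted_mac(ethnic, baseline, sentiment)
debiased_score = sentiment_weighted_mac(ethnic, debiased, sentiment)
print(baseline_score, debiased_score)
```

In this toy setup the neutral word "cuisine" stays fully aligned with the ethnicity term in both models without raising the score, while moving the negative word "criminal" away from it lowers the score. That mirrors the stated goal of penalizing only non-neutral correlations.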
Mr. Audhav N Durai
Thomas Jefferson High School for Science and Technology, Alexandria, 22312 - United States of America
2023adurai@tjhsst.edu
Mr. Aditya Vasantharao
Thomas Jefferson High School for Science and Technology, Alexandria, 22312 - United States of America
Mr. Sauman Das
Thomas Jefferson High School for Science and Technology, Alexandria, 22312 - United States of America