Sentiment Sensitive Debiasing: A Learning-Based Approach to Remove Ethnic Stereotypes in Word Embeddings

Audhav N Durai, Aditya Vasantharao, Sauman Das

Pages - 26 - 35 | Revised - 31-08-2022 | Published - 01-10-2022

Published in International Journal of Computational Linguistics (IJCL)

Volume - 13 Issue - 3 | Publication Date - October 2022

KEYWORDS

Natural Language Processing, Bias Mitigation, Deep Learning, Word2Vec, Sentiment Analysis.

ABSTRACT

Word vectorization models represent vocabulary in a vector space in a manner that captures semantic relationships between words. However, state-of-the-art word vectorization models have been shown to contain biases in their word embeddings due to ethnic prejudices and underrepresentation in the corpora they are trained on. This paper proposes a novel sentiment-sensitive, learning-based debiasing algorithm for multiclass bias mitigation. In this study, the algorithm is used for ethnic debiasing in CBOW Word2Vec models. Unlike other debiasing algorithms, this methodology accounts for the fact that not all ethnic correlations are biased, and that proper debiasing should also preserve unbiased ethnic information, such as cultural knowledge. Furthermore, it does not require a pre-defined, finite set of correlations to perform debiasing. Rather, models are penalized for making ethnic correlations towards non-neutral words and are allowed to make ethnic correlations towards neutral words, which yields a thorough debiasing without losing ethnic knowledge. This study also proposes a new metric to evaluate bias, called S-MAC (Sentiment-Aware Mean Average Cosine Similarity), which accounts for sentiment in bias measurement. We train both the baseline and debiased CBOW models on the WikiCorpus. The debiased model achieved a 39.48% reduction in bias, as measured by the S-MAC metric, compared with the baseline model.
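
The abstract names the S-MAC metric but does not give its formula. As a rough illustration of the general idea only, a sentiment-weighted average of cosine similarities between ethnic-group vectors and target words might look like the sketch below. Everything in it is an assumption made for illustration: the function names, the weighting by absolute sentiment, and the toy vectors and sentiment scores are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sentiment_weighted_score(group_vectors, word_vectors, sentiment):
    """Hypothetical sentiment-weighted bias score (NOT the paper's S-MAC).

    For each target word, average the absolute cosine similarity to every
    group vector, then weight that average by the word's absolute sentiment.
    Neutral words (sentiment 0) therefore contribute nothing, which mirrors
    the abstract's idea that correlations with neutral words are allowed.
    """
    total = 0.0
    for word, vec in word_vectors.items():
        avg_sim = sum(abs(cosine(vec, g)) for g in group_vectors) / len(group_vectors)
        total += abs(sentiment[word]) * avg_sim
    return total / len(word_vectors)


# Toy 2-D data, invented purely for illustration.
groups = [[1.0, 0.0], [0.0, 1.0]]
words = {"cuisine": [1.0, 0.0], "lazy": [1.0, 0.0]}
sentiment = {"cuisine": 0.0, "lazy": -0.5}  # assumed scores in [-1, 1]

score = sentiment_weighted_score(groups, words, sentiment)
print(score)
```

In this toy setup both words sit equally close to the first group vector, but only the non-neutral word raises the score. A real evaluation would draw vectors from the trained CBOW models and sentiment scores from a sentiment lexicon, following the definitions in the full paper.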

REFERENCES

Alhazmi, S., Black, W., & McNaught, J. (2013). Arabic SentiWordNet in relation to SentiWordNet 3.0. International Journal of Computational Linguistics (IJCL), 4(1), 1-11.

Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? debiasing word embeddings. Advances in neural information processing systems, 29.

Bordia, S., & Bowman, S.R. (2019). Identifying and Reducing Gender Bias in Word-Level Language Models. NAACL.

Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.

Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Garg, N., Schiebinger, L., Jurafsky, D., & Zou, J. (2018). Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16), E3635-E3644.

Garrido-Muñoz, I., Montejo-Ráez, A., Martínez-Santiago, F., & Ureña-López, L. A. (2021). A survey on bias in deep NLP. Applied Sciences, 11(7), 3184.

Gonen, H., & Goldberg, Y. (2019). Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. NAACL.

Hube, C., Idahl, M., & Fetahu, B. (2020, January). Debiasing word embeddings from sentiment associations in names. In Proceedings of the 13th International Conference on Web Search and Data Mining (pp. 259-267).

Hutto, C., & Gilbert, E. (2014, May). Vader: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the international AAAI conference on web and social media (Vol. 8, No. 1, pp. 216-225).

Jentzsch, S., Schramowski, P., Rothkopf, C., & Kersting, K. (2019). Semantics derived automatically from language corpora contain human-like moral choices. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (pp. 37-44).

Kumar, V., Bhotia, T. S., & Chakraborty, T. (2020). Nurse is closer to woman than surgeon? mitigating gender-biased proximities in word embeddings. Transactions of the Association for Computational Linguistics, 8, 486-503.

Liapakis, A., Tsiligiridis, T., Yialouris, C., & Maliappis, M. (2020). A Corpus Driven, Aspect-based Sentiment Analysis to Evaluate in Almost Real-time, a Large Volume of Online Food & Beverage Reviews. International Journal of Computational Linguistics (IJCL), 11(2), 49-60.

Lu, K., Mardziel, P., Wu, F., Amancharla, P., & Datta, A. (2020). Gender bias in neural natural language processing. In Logic, Language, and Security (pp. 189-202). Springer, Cham.

Manzini, T., Lim, Y. C., Tsvetkov, Y., & Black, A. W. (2019). Black is to criminal as caucasian is to police: Detecting and removing multiclass bias in word embeddings. arXiv preprint arXiv:1904.04047.

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Pennington, J., Socher, R., & Manning, C. D. (2014). Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532-1543).

Popović, R., Lemmerich, F., & Strohmaier, M. (2020, September). Joint multiclass debiasing of word embeddings. In International Symposium on Methodologies for Intelligent Systems (pp. 79-89). Springer, Cham.

Ruggles, S., Flood, S., Goeken, R., Grover, J., Meyer, E., Pacas, J., & Sobek, M. (2019). IPUMS USA: Version 9.0 [dataset]. IPUMS.

Siddiqui, M. A., Dahab, M. Y., & Batarfi, O. A. (2015). Building a sentiment analysis corpus with multifaceted hierarchical annotation. International Journal of Computational Linguistics (IJCL), 6(2), 11-25.

Zhao, J., Wang, T., Yatskar, M., Ordonez, V., & Chang, K. W. (2017). Men also like shopping: Reducing gender bias amplification using corpus-level constraints. arXiv preprint arXiv:1707.09457.

Zhao, J., Zhou, Y., Li, Z., Wang, W., & Chang, K. W. (2018). Learning gender-neutral word embeddings. arXiv preprint arXiv:1809.01496.

MANUSCRIPT AUTHORS

Mr. Audhav N Durai

Thomas Jefferson High School for Science and Technology, Alexandria, 22312 - United States of America

2023adurai@tjhsst.edu

Mr. Aditya Vasantharao

Thomas Jefferson High School for Science and Technology, Alexandria, 22312 - United States of America

Mr. Sauman Das

Thomas Jefferson High School for Science and Technology, Alexandria, 22312 - United States of America
