This is an Open Access publication published under CSC-OpenAccess Policy.
Building A Sentiment Analysis Corpus With Multifaceted Hierarchical Annotation
Muazzam Ahmed Siddiqui, Mohamed Yehia Dahab, Omar Abdullah Batarfi
Pages - 11 - 25 | Revised - 31-05-2015 | Published - 30-06-2015
Volume - 6, Issue - 2 | Publication Date - May / June 2015
KEYWORDS
Multifaceted Text Categorization, Hierarchical Text Categorization, Sentiment Analysis, Corpus Linguistics, Arabic Natural Language Processing, Text Mining.
ABSTRACT
A corpus is a collection of documents. An annotated corpus consists of documents or entities annotated with task-related labels, such as part-of-speech tags or sentiment. While it is customary to annotate a document for a specific task, it is also possible to annotate it for multiple tasks, resulting in a multifaceted annotation scheme. These annotations can be organized hierarchically if such a structure occurs naturally in the data, which turns the task into a hierarchical text categorization problem. We developed a multifaceted, multilingual corpus for hierarchical sentiment analysis. The facets include hierarchical nominal sentiment labels, a numerical sentiment score, the language, and the dialect. Our corpus consists of 191K reviews of hotels in Saudi Arabia. The reviews are divided into eleven categories, and within each category they are further divided into positive and negative subcategories. The corpus contains 1.8 million tokens. Reviews are mostly written in Arabic and English, but there are instances of other languages too.
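To make the annotation scheme in the abstract concrete, the following is a minimal Python sketch of how one review with several facets might be represented. It is an illustration under stated assumptions, not the authors' data format: the class name `AnnotatedReview`, the field names, the category names ("cleanliness", "staff"), the score values, the language codes, and the dialect tag are all hypothetical, since the abstract names only the facets themselves.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical polarity values; the abstract only says "positive and negative".
POLARITIES = {"positive", "negative"}


@dataclass(frozen=True)
class AnnotatedReview:
    """One review carrying several annotation facets (illustrative only)."""
    text: str
    category: str                  # top level of the label hierarchy (assumed name)
    polarity: str                  # second level: "positive" or "negative"
    score: float                   # numerical sentiment score (scale is assumed)
    language: str                  # language facet, e.g. "ar" or "en" (assumed codes)
    dialect: Optional[str] = None  # dialect facet; None when not applicable

    def __post_init__(self) -> None:
        # Reject labels outside the two polarity classes.
        if self.polarity not in POLARITIES:
            raise ValueError(f"unknown polarity: {self.polarity!r}")

    def label_path(self) -> Tuple[str, str]:
        """Return the hierarchical label as a (category, polarity) path."""
        return (self.category, self.polarity)


def count_by_path(reviews):
    """Count reviews per hierarchical label path."""
    return Counter(r.label_path() for r in reviews)


# Invented sample records, used only to exercise the structure.
reviews = [
    AnnotatedReview("The room was spotless.", "cleanliness", "positive", 9.0, "en"),
    AnnotatedReview("الغرفة نظيفة", "cleanliness", "positive", 8.5, "ar", "MSA"),
    AnnotatedReview("Slow check-in.", "staff", "negative", 4.0, "en"),
]
counts = count_by_path(reviews)
for path, n in counts.items():
    print(path, n)
```

Keeping the hierarchical label as a two-element path, with score, language, and dialect as independent fields, lets each facet be used on its own (for example, filtering by language) or combined with the others for hierarchical classification.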
Dr. Muazzam Ahmed Siddiqui
King Abdulaziz University - Saudi Arabia
maasiddiqui@kau.edu.sa
Dr. Mohamed Yehia Dahab
Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University - Saudi Arabia
Dr. Omar Abdullah Batarfi
Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University - Saudi Arabia