An Empirical Study On The Holy Quran Based On A Large Classical Arabic Corpus

Maha Alrabiah, Nawal Alhelewh, AbdulMalik Al-Salman, Eric Atwell

Pages - 1 - 13 | Revised - 31-03-2014 | Published - 30-04-2014

Published in International Journal of Computational Linguistics (IJCL)

Volume - 5 Issue - 1 | Publication Date - April 2014

KEYWORDS

Distributional Lexical Semantics, Quran, Classical Arabic Corpus, Collocation Extraction, Association Measures.

ABSTRACT

Distributional semantics is an empirical approach to natural language processing and acquisition that models word meaning using word distribution statistics gathered from very large corpora. Many distributional semantic models are available in the literature, but none has so far been applied to the Quran or to Classical Arabic in general. This paper reports the construction of a very large corpus of Classical Arabic that will serve as a basis for studying the distributional lexical semantics of the Quran and Classical Arabic. It also reports the results of two empirical studies: the first applies a number of probabilistic distributional semantic models to automatically identify lexical collocations in the Quran, and the second applies the same models to the Classical Arabic corpus to test their ability to capture lexical collocations and co-occurrences for a number of the corpus words. Results show that the MI.log_freq association measure achieved the best results in extracting significant co-occurrences and collocations from both small and large Classical Arabic corpora, while the mutual information association measure achieved the worst results.
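To make the contrast between the two association measures named in the abstract more concrete, the following minimal sketch compares plain mutual information with an MI.log_freq-style score. It assumes the formulation commonly documented for corpus tools, MI multiplied by log(f_xy + 1); the abstract does not state the exact definition the authors used, so this form is an assumption. The function names and all frequency counts are hypothetical and chosen only for illustration.

```python
import math


def mutual_information(f_xy, f_x, f_y, n):
    """Pointwise mutual information (base 2) for a word pair.

    f_xy: co-occurrence frequency of the pair
    f_x, f_y: individual word frequencies
    n: corpus size in tokens
    """
    return math.log2((f_xy * n) / (f_x * f_y))


def mi_log_freq(f_xy, f_x, f_y, n):
    """MI weighted by the log of the pair frequency.

    Assumed form: MI * ln(f_xy + 1). The weighting damps the
    tendency of plain MI to overrate very rare pairs.
    """
    return mutual_information(f_xy, f_x, f_y, n) * math.log(f_xy + 1)


# Hypothetical counts in a corpus of one million tokens.
N = 1_000_000
rare_pair = (1, 1, 1)               # two words seen once each, together
frequent_pair = (500, 2000, 3000)   # a recurring, plausible collocation

mi_rare = mutual_information(*rare_pair, N)
mi_frequent = mutual_information(*frequent_pair, N)
mlf_rare = mi_log_freq(*rare_pair, N)
mlf_frequent = mi_log_freq(*frequent_pair, N)

print(f"MI:          rare={mi_rare:.2f}  frequent={mi_frequent:.2f}")
print(f"MI.log_freq: rare={mlf_rare:.2f}  frequent={mlf_frequent:.2f}")
```

With these toy counts, plain mutual information ranks the pair seen only once above the recurring pair, whereas the frequency-weighted score reverses the order. This is one plausible reading of why a frequency-weighted variant might outperform plain mutual information, not a reproduction of the paper's experiments.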

CITED BY (5)

1	Dzulkifli, M. A., bin Abdul Rahman, A. W., Badi, J. A. B., & Solihu, A. K. H. (2016). Routes to Remembering: Lessons from al Huffaz. Mediterranean Journal of Social Sciences, 7(3 S1), 121.

2	Siddiqui, M. A., Dahab, M. Y., & Batarfi, O. A. (2015). Building A Sentiment Analysis Corpus With Multifaceted Hierarchical Annotation.

3	Alrabiah, M., Al-Salman, A., & Atwell, E. (2014, October). The refined MI: A significant improvement to mutual information. In Asian Language Processing (IALP), 2014 International Conference on (pp. 132-135). IEEE.

4	Atwell, E., & Alfaifi, A. Arabic corpus linguistics research at the University of Leeds.

5	Alrabiah, M., Al-Salman, A., & Atwell, E. A New Distributional Semantic Model for Classical Arabic.

ABSTRACTING & INDEXING

1	Google Scholar

2	CiteSeerX

3	refSeek

4	Scribd

5	SlideShare

6	PdfSR

MANUSCRIPT AUTHORS

Mr. Maha Alrabiah

King Saud University - Saudi Arabia

msrabiah@gmail.com

Associate Professor Nawal Alhelewh

Princess Nora bint Abdul Rahman University - Saudi Arabia

Professor AbdulMalik Al-Salman

King Saud University - Saudi Arabia

Associate Professor Eric Atwell

Leeds University - United Kingdom
