A Comparative Evaluation of POS Tagging and N-Gram Measures in Arabic Corpus Resources and Tools
Sultan Almujaiwel
Pages - 1-17 | Revised - 31-01-2020 | Published - 29-02-2020
Volume - 11 | Issue - 1 | Publication Date - February 2020
Arabic Corpus Resources, Arabic Corpus Analysis Tools, Corpus Linguistics, Confusion Matrices, Association Algorithms.
The purpose of this evaluation is twofold: to give an overview of the extent to which the large-scale Arabic corpus resources examined meet the criteria of part-of-speech tagging in the corpus design of linguistic data, and to evaluate Arabic corpus analysis tools in terms of natural language processing statistics. The confusion matrix method shows that some Arabic monitor corpora need further development, whereas the International Corpus of Arabic scores highly on the confusion matrices. Nine Arabic corpus analysis tools are evaluated, and the reliable outcomes attested are reported in terms of the statistical algorithms used for association measures. The evaluation relies on one million empirically designated clean Arabic data to compare the association measures across the nine tools. The results presented at the end of this article indicate that the limitations could be tackled by evaluating the Arabic monitor corpus resources rather than trusting them, and by implementing new forms of programming rather than depending on the already-built Arabic natural language resources and tools.
Associate Professor Sultan Almujaiwel
College of Arts/Arabic Language Department, King Saud University, Riyadh, Saudi Arabia