Joint Alignment of Segmentation and Labelling for Arabic Morphosyntactic Taggers

Abdulrahman Alosaimy, Eric Atwell

Pages - 1 - 12 | Revised - 31-01-2018 | Published - 30-04-2018

Published in International Journal of Computational Linguistics (IJCL)

Volume - 9 Issue - 1 | Publication Date - April 2018

KEYWORDS

Arabic, POS-Tagging, Segmentation, Tokenisation, Morphological Alignment.

ABSTRACT

We present and compare three methods for aligning the morphemes produced by four different Arabic POS-taggers, along with a baseline method that uses only the provided labels. We combined four Arabic POS-taggers: MADAMIRA (MA), Stanford Tagger (ST), AMIRA (AM), and Farasa (FA); as the target output we used two Classical Arabic gold standards: the Quranic Arabic Corpus (QAC) and SALMA Standard Arabic Linguistics Morphological Analysis (SAL). We justify why we opt to align on labels rather than word forms. The problem is not trivial, as it involves six different tokenisation and labelling standards. Supervised learning with a unigram model achieved the best segment alignment accuracy, correctly aligning 97% of morpheme segments. We then evaluated the alignment methods extrinsically, in terms of their effect on the accuracy of ensemble POS-taggers that merge different combinations of the four Arabic POS-taggers. Using the best approach to align the input POS-taggers, the ensemble tagger correctly segmented and tagged 88.09% of morphemes. We show that increasing the number of input taggers raises accuracy, suggesting that the input taggers make different errors.
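To make the idea of aligning on labels rather than word forms concrete, the following is a minimal sketch, not the authors' implementation. It assumes a classic global sequence alignment (Needleman-Wunsch style) over two label sequences for the same word. The function name `align_labels`, the tag names, the similarity table `sim`, and the gap penalty are all hypothetical. In a supervised setting, the scores in `sim` might instead be estimated from labelled tag pairs.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def align_labels(src: List[str], tgt: List[str],
                 sim: Dict[Tuple[str, str], float],
                 gap: float = -1.0) -> List[Pair]:
    """Globally align two label sequences; None marks an unmatched morpheme."""
    n, m = len(src), len(tgt)

    def score(a: str, b: str) -> float:
        # Label pairs missing from the table are treated as mismatches (assumption).
        return sim.get((a, b), -1.0)

    # dp[i][j] holds the best score for aligning src[:i] with tgt[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(src[i - 1], tgt[j - 1]),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                dp[i][j] == dp[i - 1][j - 1] + score(src[i - 1], tgt[j - 1])):
            pairs.append((src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or dp[i][j] == dp[i - 1][j] + gap):
            pairs.append((src[i - 1], None))  # src morpheme has no counterpart
            i -= 1
        else:
            pairs.append((None, tgt[j - 1]))  # tgt morpheme has no counterpart
            j -= 1
    pairs.reverse()
    return pairs


# Hypothetical example: one tagger splits off the determiner, the other does not.
sim = {("CONJ", "CC"): 1.0, ("NOUN", "NN"): 1.0}
print(align_labels(["CONJ", "DET", "NOUN"], ["CC", "NN"], sim))
```

Aligning on labels in this way sidesteps differences in how each tagger normalises or spells the surface form of a segment, at the cost of needing a reasonable label-to-label similarity table.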

ABSTRACTING & INDEXING

1	Google Scholar

2	BibSonomy

3	ResearchGate

4	Doc Player

5	White Rose Research Online

6	Scribd

7	SlideShare

MANUSCRIPT AUTHORS

Mr. Abdulrahman Alosaimy

University of Leeds - United Kingdom

scama@leeds.ac.uk

Professor Eric Atwell

- United Kingdom
