Rule-based Information Extraction for Airplane Crashes Reports
Sarah H. Alkadi
Pages - 1 - 36 | Revised - 01-03-2017 | Published - 01-04-2017
Volume - 8 | Issue - 1 | Publication Date - April 2017
Information Extraction, Text Mining, NLP, Airplane Crashes, Rule-Based.
Over the last two decades, the internet has gained widespread use in many aspects of everyday life. The amount of data generated in both structured and unstructured forms has increased rapidly, posing a number of challenges. Unstructured data are hard to manage, assess, and analyse for decision making. Extracting information from these large volumes of data is time-consuming and requires complex analysis. Information extraction (IE) technology is part of a text-mining framework for extracting useful knowledge for further analysis.

Various competitions, conferences and research projects have accelerated the development of IE. This project presents in detail the main aspects of the information extraction field. It focuses on a specific domain: airplane crash reports. A set of reports from the 1001 Crash website was used to perform extraction tasks covering the crash site, crash date and time, departure, destination, and so on. The common structures and textual expressions found in these reports were considered in designing the extraction rules.
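As a rough illustration of what rule-based extraction over such reports can look like, the sketch below applies a few regular-expression rules to a single sentence. It is not the rule set or the tooling used in the paper: the rule names, the patterns, and the sample sentence are all hypothetical and were chosen only to mirror the fields listed above.

```python
import re

# Hypothetical extraction rules, one per target field.
# These are illustrative patterns, not the rules designed in the paper.
PLACE = r"([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)"  # one or more capitalised words
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

RULES = {
    "crash_site": re.compile(r"\bcrashed\s+near\s+" + PLACE),
    "crash_date": re.compile(r"\bon\s+(\d{1,2}\s+(?:" + MONTHS + r")\s+\d{4})"),
    "crash_time": re.compile(r"\bat\s+(\d{1,2}:\d{2})\b"),
    "departure": re.compile(r"\bfrom\s+" + PLACE),
    "destination": re.compile(r"\bto\s+" + PLACE),
}


def extract(text):
    """Return the first match of each rule as a field -> value mapping."""
    result = {}
    for field, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            result[field] = match.group(1)
    return result


# Hypothetical sentence written in the style of a crash report.
sample = (
    "The aircraft departed from Paris to Cairo and crashed near Alexandria "
    "on 12 March 2010 at 14:35."
)
print(extract(sample))
```

Each rule encodes one recurring textual expression, so a report that phrases a fact differently yields no value for that field. This is why such rules typically need repeated testing and revision against real texts.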

The evaluation framework used to examine the system's performance was applied to both working and test texts. It shows that the system extracts entities and relations more accurately than events. Generally, the good results reflect the quality and design of the extraction rules. It can be concluded that the rule-based approach proved effective at delivering reliable results. However, this approach requires intensive work and an iterative cycle of rule testing and modification.
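The abstract does not name the measures used in the evaluation. Assuming the precision, recall and F-measure scores that are common in IE evaluation, a minimal sketch of scoring extracted items against a hand-annotated gold standard might look as follows. The function name and the example data are hypothetical.

```python
def score(extracted, gold):
    """Compute precision, recall and F1 for extracted items vs. a gold set.

    Assumes exact-match scoring: an item counts as correct only if the
    same (field, value) pair appears in the gold standard.
    """
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical system output and gold annotations for one report.
system_output = {
    ("crash_site", "Alexandria"),
    ("departure", "Paris"),
    ("destination", "Rome"),  # wrong value, counts against precision
}
gold_standard = {
    ("crash_site", "Alexandria"),
    ("departure", "Paris"),
    ("destination", "Cairo"),
    ("crash_time", "14:35"),  # missed by the system, counts against recall
}
precision, recall, f1 = score(system_output, gold_standard)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Running the same scoring separately on working texts and on unseen test texts shows whether rules tuned on the working set generalise.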
Mrs. Sarah H. Alkadi
College of Science and Health Professions / Basic Science Department, King Saud bin Abdulaziz University for Health Sciences, Riyadh, 14611 - Saudi Arabia