This is an Open Access publication published under CSC-OpenAccess Policy.
Rule-based Information Extraction from Disease Outbreak Reports
Wafa N. Alshowaib
Pages - 37 - 58     |    Revised - 01-06-2014     |    Published - 01-07-2014
Volume - 5   Issue - 3    |    Publication Date - July 2014
KEYWORDS
Information Extraction, Disease Outbreak, Rule-based, NLP.
ABSTRACT
Information extraction (IE) systems serve as the front end and core stage of various natural language processing tasks. Because IE has proved effective in domain-specific tasks, this project focused on one domain: disease outbreak reports. Several reports from the World Health Organization were carefully examined to formulate the extraction tasks: named entities, such as disease name, date and location; the location of the reporting authority; and the outbreak incident. Extraction rules were then designed, based on a study of the textual expressions and elements that appeared before and after the target text.
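
To illustrate the idea of rules keyed to the text before and after a target, here is a minimal sketch in Python. It is not the system described in this paper: the disease list, the regular expressions, and the sample sentence are all hypothetical, chosen only to show how left and right context can anchor an extraction.

```python
import re

# Hypothetical gazetteer and patterns, invented for illustration only.
DISEASES = ["cholera", "avian influenza", "yellow fever"]
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Each rule anchors its target on the text found before and/or after it.
RULES = {
    # Left context: "outbreak of" or "case(s) of".
    "disease": re.compile(
        r"(?:outbreak|cases?) of (?P<value>" + "|".join(DISEASES) + r")\b",
        re.IGNORECASE),
    # Left context: "As of" or "On".
    "date": re.compile(
        r"\b(?:As of|On) (?P<value>\d{1,2} (?:" + MONTHS + r") \d{4})\b"),
    # Left context: "in"; right context: an administrative-unit word.
    "location": re.compile(
        r"\bin (?P<value>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"
        r"(?= (?:province|district|region)\b)"),
}


def extract(text):
    """Return the first match of each rule, or None when a rule does not fire."""
    result = {}
    for name, pattern in RULES.items():
        match = pattern.search(text)
        result[name] = match.group("value") if match else None
    return result


# Made-up sentence in the style of an outbreak report.
report = ("As of 12 March 2013, the Ministry of Health has reported "
          "an outbreak of cholera in Western Area district.")
print(extract(report))
```

Rules of this kind are easy to inspect and correct, but each one has to be written by hand after studying how the target phrases are actually worded in the reports.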

The experiment yielded very high performance scores across all tasks. The training and testing corpora were evaluated separately. The system was more accurate at entity and event extraction than at relationship extraction.
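
The abstract does not name the metrics behind these scores. Assuming the measures commonly used in IE evaluation (precision, recall, and F-measure), the following sketch shows how such scores could be computed by comparing extracted items with a hand-annotated gold standard. The function name and the sample data are hypothetical.

```python
def score(predicted, gold):
    """Compute precision, recall, and F1 for exact-match extractions."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)  # items the system got exactly right
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up annotations: (slot, value) pairs for one report.
gold = {("disease", "cholera"), ("date", "12 March 2013"),
        ("location", "Western Area"), ("location", "Freetown")}
predicted = {("disease", "cholera"), ("date", "12 March 2013"),
             ("location", "Sierra Leone")}

precision, recall, f1 = score(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here precision reflects how many extracted items are correct, and recall reflects how many of the annotated items were found.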

The results indicate that the rule-based approach can deliver reliable IE, with extremely high accuracy and coverage. However, this approach requires an extensive, time-consuming manual study of word classes and phrases.
Miss Wafa N. Alshowaib
KACST - Saudi Arabia
wafa.cs1@gmail.com