Similarity-Based Estimation for Document Summarization using Fuzzy Sets

Source: International Journal of Computer Science and Security (IJCSS)
Volume: 1, Issue: 4 (issue pages 1-47)
Publication Date: December 2007
ISSN (Online): 1985-1553
Pages: 1-12
Author(s): Masrah Azrifah Azmi Murad, Trevor Martin
Published Date: 30-12-2007
Publisher: CSC Journals, Kuala Lumpur, Malaysia

KEYWORDS: fuzzy sets, mass assignment, asymmetric word similarity, topic similarity, summarization

This manuscript is indexed in the following databases/websites:
1. Directory of Open Access Journals (DOAJ)
2. Docstoc
3. Scribd
4. PDFCAST
5. Google Scholar
6. WorldCat
7. ScientificCommons
8. Bielefeld Academic Search Engine (BASE)
9. ResearchGATE
10. iSEEK
11. Microsoft Academic Search
12. Academic Journals Database
13. Libsearch
14. slideshare

ABSTRACT: Information is increasing every day, and thousands of documents are
produced and made available on the Internet. The amount of information
available in documents exceeds our capacity to read them. We need access to the
right information without having to go through the whole document. Documents
therefore need to be compressed into an overview so that they can be used
effectively. We propose a similarity model with topic similarity, using fuzzy
sets and probability theories, to extract the most representative sentences.
Sentences with high weights are extracted to form a summary. On average, our
model (known as MySum) produces summaries that are 60% similar to the manually
created summaries, while the tf.isf algorithm produces summaries that are 30%
similar. Two human summarizers, named P1 and P2, produce summaries that are 70%
similar to each other using similar sets of documents obtained from TREC.
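
To make the tf.isf baseline mentioned in the abstract concrete, the following is a minimal sketch of sentence extraction by term frequency times inverse sentence frequency. It is one common formulation and not necessarily the exact variant evaluated in the paper. The function names (`tokenize`, `tf_isf_scores`, `extract_summary`), the regular-expression tokenizer, and the length normalization are all assumptions made for illustration. MySum itself, which relies on fuzzy sets and mass assignment, is not reproduced here.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and return its alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", sentence.lower())


def tf_isf_scores(sentences):
    """Return one tf.isf weight per sentence.

    For each distinct word w in a sentence, tf is its count in that sentence
    and isf is log(N / sf(w)), where N is the number of sentences and sf(w)
    is the number of sentences containing w. The sum of tf * isf is divided
    by the sentence length (an assumed normalization).
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Sentence frequency: in how many sentences does each word occur?
    sf = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        total = sum(count * math.log(n / sf[word]) for word, count in tf.items())
        scores.append(total / len(tokens))
    return scores


def extract_summary(sentences, k):
    """Pick the k highest-weighted sentences and return them in document order."""
    scores = tf_isf_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    doc = [
        "the cat sat",
        "the dog sat",
        "the fuzzy summarizer ranks sentences",
    ]
    print(extract_summary(doc, 1))
```

Words that occur in every sentence receive an isf of zero, so they contribute nothing to a sentence's weight. Sentences built from rarer words rank higher, which is the behaviour a purely frequency-based baseline of this kind is expected to show.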

Masrah Azrifah Azmi Murad : Colleagues
Trevor Martin : Colleagues