Language Combinatorics: A Sentence Pattern Extraction Architecture Based on Combinatorial Explosion
Michal Ptaszynski, Rafal Rzepka, Yoshio Momouchi
Pages: 24-36 | Revised: 01-07-2011 | Published: 05-08-2011
Volume 2, Issue 1 | Publication Date: July/August 2011
Computational Linguistics, Information Retrieval and Extraction, Corpus Linguistics
A "sentence pattern" in modern Natural Language Processing is often considered to be a contiguous string of words (an n-gram). However, in many branches of linguistics, such as Pragmatics and Corpus Linguistics, it has been observed that simple n-gram patterns are not sufficient to capture the full sophistication of grammatical patterns. We present a language-independent architecture for extracting patterns from sentences that are more sophisticated than n-grams. In this architecture, a "sentence pattern" is defined as an n-element ordered combination of sentence elements. Experiments showed that the method extracts significantly more frequent patterns than the usual n-gram approach.
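The contrast drawn in the abstract can be illustrated with a small sketch. For a sentence of k elements there are k(k+1)/2 contiguous n-grams but 2^k - 1 ordered combinations (every non-empty subset of positions, kept in sentence order), which is the combinatorial explosion named in the title. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes naive whitespace tokenization, marks skipped positions with a hypothetical `*` gap symbol, and treats a pattern as "frequent" if it occurs in at least two sentences. The function names `ngrams`, `ordered_combinations` and `frequent_patterns` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """Contiguous n-element word strings (classic n-grams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ordered_combinations(tokens, n):
    """All n-element ordered combinations of the tokens: word order is kept,
    but elements need not be adjacent. In this sketch (an assumption), a "*"
    is inserted wherever one or more tokens were skipped."""
    patterns = []
    for idx in combinations(range(len(tokens)), n):
        parts = [tokens[idx[0]]]
        for prev, cur in zip(idx, idx[1:]):
            if cur - prev > 1:
                parts.append("*")  # gap marker for skipped tokens
            parts.append(tokens[cur])
        patterns.append(tuple(parts))
    return patterns


def frequent_patterns(sentences, extractor, min_freq=2):
    """Extract patterns of every length with `extractor` and keep those
    that occur in at least `min_freq` sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()  # naive whitespace tokenization (assumption)
        for n in range(1, len(tokens) + 1):
            # set(): count each pattern at most once per sentence
            counts.update(set(extractor(tokens, n)))
    return {p: c for p, c in counts.items() if c >= min_freq}


if __name__ == "__main__":
    corpus = ["what a nice day", "what a lovely day"]
    print(len(frequent_patterns(corpus, ngrams)))
    print(len(frequent_patterns(corpus, ordered_combinations)))
```

On the two toy sentences, only 4 n-grams occur in both (`what`, `a`, `day`, `what a`), while 7 ordered combinations do, including `what * day` and `what a * day`, which no contiguous n-gram can represent. The same exponential growth that yields these extra patterns also means that, in practice, an approach like this is only tractable for short sentences or with a cap on n.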
CITED BY (15)  
1 Nakajima, Y., Ptaszynski, M., Honma, H., & Masui, F. (2016). An Extraction Method for Future Reference Expressions Using Morphological and Semantic Patterns.
2 Sakuta, H., & Adachi, E. How Differently Do We Talk? A Study of Sentence Patterns in Groups of Different Age, Gender and Social Status.
3 Nakajima, Y., Ptaszynski, M., Honma, H., & Masui, F. (2014). FAN-14-029 Extraction of Future Reference Expressions in Trend Information. Intelligent System Symposium Lecture Proceedings, 2014(24), 129-134.
4 Ptaszynski, M., Masui, F., Rzepka, R., & Araki, K. (2014). First Glance on Pattern-based Language Modeling. Language Acquisition and Understanding Research Group (LAU), Technical Reports, Summer.
5 Nakajima, Y., Ptaszynski, M., Honma, H., & Masui, F. (2014, March). Investigation of Future Reference Expressions in Trend Information. In Proceedings of the 2014 AAAI Spring Symposium Series (pp. 31-38).
6 Ptaszynski, M., Masui, F., Rzepka, R., & Araki, K. (2014). Detecting emotive sentences with pattern-based language modelling. Procedia Computer Science, 35, 484-493.
7 D'hondt, E. K. L. (2014). Cracking the patent: using phrasal representations to aid patent classification. [S.l.: s.n.].
8 Ptaszynski, M., Masui, F., Rzepka, R., & Araki, K. (2014). Automatic Extraction of Emotive and Non-emotive Sentence Patterns. In Proceedings of The Twentieth Annual Meeting of The Association for Natural Language Processing (NLP2014) (pp. 868-871).
9 Ptaszynski, M., Masui, F., Dybala, P., Rzepka, R., & Araki, K. Open Source Affect Analysis System with Extensions.
10 Nakajima, Y., Ptaszynski, M., Honma, H., & Masui, F. Extracting References to the Future from News using Morphosemantic Patterns.
11 Ptaszynski, M., Dokoshi, H., Oyama, S., Rzepka, R., Kurihara, M., Araki, K., & Momouchi, Y. (2013). Affect analysis in context of characters in narratives. Expert Systems with Applications, 40(1), 168-176.
12 Ptaszynski, M., Hasegawa, D., & Masui, F. Women Like Backchannel, But Men Finish Earlier: Pattern Based Language Modeling of Conversations Reveals Gender and Social Distance Differences.
13 D’hondt, E., Verberne, S., Weber, N., Koster, C., & Boves, L. (2012). Using skipgrams and pos-based feature selection for patent classification. Computational Linguistics in the Netherlands Journal, 2, 52-70.
14 Lempa, P., Ptaszynski, M., & Masui, F. Cyberbullying Blocker Application for Android.
15 Ptaszynski, M., Masui, F., Kimura, Y., Rzepka, R., & Araki, K. Extracting Patterns of Harmful Expressions for Cyberbullying Detection.
Dr. Michal Ptaszynski, Japan
Dr. Rafal Rzepka, Japan
Dr. Yoshio Momouchi, Japan
