Language Combinatorics: A Sentence Pattern Extraction Architecture Based on Combinatorial Explosion

Michal Ptaszynski; Rafal Rzepka; Yoshio Momouchi

Creative Commons Attribution NonCommercial 4.0 International License

Pages - 24 - 36 | Revised - 01-07-2011 | Published - 05-08-2011

Published in International Journal of Computational Linguistics (IJCL)

Volume - 2 Issue - 1 | Publication Date - July / August 2011

KEYWORDS

Computational Linguistics, Information Retrieval and Extraction, Corpus Linguistics

ABSTRACT

A "sentence pattern" in modern Natural Language Processing is often treated as a contiguous string of words (an n-gram). However, in many branches of linguistics, such as Pragmatics or Corpus Linguistics, it has been noted that simple n-gram patterns are not sufficient to reveal the full sophistication of grammar patterns. We present a language-independent architecture for extracting patterns more sophisticated than n-grams from sentences. In this architecture, a "sentence pattern" is defined as an n-element ordered combination of sentence elements. Experiments showed that the method extracts significantly more frequent patterns than the usual n-gram approach.
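The difference between the two notions of "pattern" in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the whitespace tokenization, and the example sentence are assumptions made for illustration, and the sketch only enumerates candidate patterns for one sentence without the frequency counting a full system would need.

```python
from itertools import combinations


def ngrams(tokens, n):
    """Return all contiguous n-element sequences of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ordered_combinations(tokens, n):
    """Return all n-element combinations of tokens, keeping sentence order.

    itertools.combinations emits elements in input order, so gaps between
    the chosen elements are allowed but their relative order is preserved.
    """
    return list(combinations(tokens, n))


def all_patterns(tokens, extractor):
    """Collect the extractor's patterns for every length 1..len(tokens)."""
    patterns = []
    for n in range(1, len(tokens) + 1):
        patterns.extend(extractor(tokens, n))
    return patterns


# Hypothetical example sentence; whitespace splitting is an assumption.
tokens = "what a nice day".split()

ngram_patterns = all_patterns(tokens, ngrams)
combination_patterns = all_patterns(tokens, ordered_combinations)

print(len(ngram_patterns), "n-gram patterns")
print(len(combination_patterns), "ordered-combination patterns")
```

For a sentence of k elements this enumeration yields 2^k - 1 ordered combinations against k(k+1)/2 n-grams, which is the combinatorial explosion named in the title. Every n-gram is also an ordered combination, while a pattern such as ("what", "nice", "day") appears only in the combination set.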

CITED BY (15)

1	Nakajima, Y., Ptaszynski, M., Honma, H., & Masui, F. (2016). An Extraction Method for Future Reference Expressions Using Morphological and Semantic Patterns.

2	Sakuta, H., & Adachi, E. How Differently Do We Talk? A Study of Sentence Patterns in Groups of Different Age, Gender and Social Status.

3	Nakajima, Y., Ptaszynski, M., Honma, H., & Masui, F. (2014). FAN-14-029 Extraction of Future Reference Expressions in Trend Information. Proceedings of the Intelligent Systems Symposium, 2014(24), 129-134.

4	Ptaszynski, M., Masui, F., Rzepka, R., & Araki, K. (2014). First Glance on Pattern-based Language Modeling. Language Acquisition and Understanding Research Group (LAU), Technical Reports, Summer.

5	Nakajima, Y., Ptaszynski, M., Honma, H., & Masui, F. (2014, March). Investigation of Future Reference Expressions in Trend Information. In Proceedings of the 2014 AAAI Spring Symposium Series (pp. 31-38).

6	Ptaszynski, M., Masui, F., Rzepka, R., & Araki, K. (2014). Detecting emotive sentences with pattern-based language modelling. Procedia Computer Science, 35, 484-493.

7	D'hondt, E. K. L. (2014). Cracking the patent: using phrasal representations to aid patent classification. [Sl: sn].

8	Ptaszynski, M., Masui, F., Rzepka, R., & Araki, K. (2014). Automatic Extraction of Emotive and Non-emotive Sentence Patterns. In Proceedings of The Twentieth Annual Meeting of The Association for Natural Language Processing (NLP2014) (pp. 868-871).

9	Ptaszynski, M., Masui, F., Dybala, P., Rzepka, R., & Araki, K. Open Source Affect Analysis System with Extensions.

10	Nakajima, Y., Ptaszynski, M., Honma, H., & Masui, F. Extracting References to the Future from News using Morphosemantic Patterns.

11	Ptaszynski, M., Dokoshi, H., Oyama, S., Rzepka, R., Kurihara, M., Araki, K., & Momouchi, Y. (2013). Affect analysis in context of characters in narratives. Expert Systems with Applications, 40(1), 168-176.

12	Ptaszynski, M., Hasegawa, D., & Masui, F. Women Like Backchannel, But Men Finish Earlier: Pattern Based Language Modeling of Conversations Reveals Gender and Social Distance Differences.

13	D’hondt, E., Verberne, S., Weber, N., Koster, C., & Boves, L. (2012). Using skipgrams and pos-based feature selection for patent classification. Computational Linguistics in the Netherlands Journal, 2, 52-70.

14	Lempa, P., Ptaszynski, M., & Masui, F. Cyberbullying Blocker Application for Android.

15	Ptaszynski, M., Masui, F., Kimura, Y., Rzepka, R., & Araki, K. Extracting Patterns of Harmful Expressions for Cyberbullying Detection.

ABSTRACTING & INDEXING

1	Google Scholar

2	CiteSeerX

3	refSeek

4	Scribd

5	SlideShare

6	PdfSR

MANUSCRIPT AUTHORS

Dr. Michal Ptaszynski

- Japan

ptaszynski@hgu.jp

Dr. Rafal Rzepka

- Japan

Dr. Yoshio Momouchi

- Japan
