On Tracking Behavior of Streaming Data: An Unsupervised Approach
Sattar Hashemi, Ali Hamzeh, Niloofar Mozafari
Pages - 16 - 26     |    Revised - 31-03-2011     |    Published - 04-04-2011
Volume - 2   Issue - 1    |    Publication Date - March / April 2011
Data Stream, Concept Change, Precision, Recall, F1 Measure, Cumulative Density Function
In recent years, data streams have attracted the attention of a large number of researchers across different domains. All of these researchers face the same difficulty when discovering unknown patterns in data streams: concept change. Concept change refers to the points at which the underlying distribution of the data changes over time. Various methods have been proposed to detect changes in data streams, but most of them rest on the unrealistic assumption that data labels are available to the learning algorithm. In real-world problems, however, labels for streaming data are rarely available, which is the main reason the data stream community has recently turned to the unsupervised setting. This study is motivated by the observation that unsupervised approaches for learning from data streams are not yet mature; in particular, they provide only mediocre performance, especially when applied to multi-dimensional data streams. In this paper, we propose a method for Tracking Changes in the behavior of instances using the Cumulative Density Function, abbreviated as TrackChCDF. Our method detects change points in an unlabeled data stream accurately and also determines the trend of the data, called closing or opening. The advantages of our approach are threefold. First, it detects change points accurately. Second, it works well on multi-dimensional data streams. Last but not least, it can determine the type of change, namely the closing or opening of instances over time, which has wide application in fields such as economics, the stock market, and medical diagnosis. We compare our algorithm with the state-of-the-art method for concept change detection in data streams, and the obtained results are very promising.
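To make the idea of distribution-based change detection on unlabeled data concrete, the following is a minimal sketch only. It is not the TrackChCDF algorithm, whose details are not given on this page. It assumes a one-dimensional stream and swaps in a plain two-window comparison of empirical cumulative distribution functions (a Kolmogorov-Smirnov style distance). The names `ecdf`, `ks_distance`, and `detect_changes`, and the `window` and `threshold` parameters, are hypothetical and chosen for illustration.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function x -> fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def ks_distance(reference, current):
    """Largest absolute gap between the two empirical CDFs."""
    f_ref, f_cur = ecdf(reference), ecdf(current)
    points = set(reference) | set(current)
    return max(abs(f_ref(x) - f_cur(x)) for x in points)


def detect_changes(stream, window=50, threshold=0.5):
    """Return indices at which a change is flagged (illustrative only).

    A fixed reference window is compared with a sliding current window.
    When the CDF distance exceeds the (assumed) threshold, the index of
    the latest observation is reported and a fresh reference window is
    started right after it.
    """
    values = list(stream)
    changes = []
    ref_start = 0
    i = 2 * window  # first position with a full reference and current window
    while i <= len(values):
        reference = values[ref_start:ref_start + window]
        current = values[i - window:i]
        if ks_distance(reference, current) > threshold:
            changes.append(i - 1)   # detection time, not the true change point
            ref_start = i           # rebuild the reference from new data
            i = ref_start + 2 * window
        else:
            i += 1
    return changes


if __name__ == "__main__":
    # Synthetic stream whose values jump by 5 at index 100.
    stream = [(i % 10) / 10 for i in range(100)]
    stream += [5 + (i % 10) / 10 for i in range(100)]
    print(detect_changes(stream, window=20, threshold=0.5))
```

Because no labels are used, the detector reacts only to shifts in the observed values, and it flags a change with some delay after the true change point since the current window must first fill with post-change data. A multi-dimensional stream would first need to be reduced to a one-dimensional quantity, and that reduction is a further assumption not covered by this sketch.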
Associate Professor Sattar Hashemi
Shiraz University - Iran
Associate Professor Ali Hamzeh
Shiraz University - Iran
Dr. Niloofar Mozafari
- Iran