Home   >   CSC-OpenAccess Library   >    Manuscript Information
Full Text Available

This is an Open Access publication published under CSC-OpenAccess Policy.
Publications from CSC-OpenAccess Library are being accessed from over 74 countries worldwide.
Concurrent Matrix Multiplication on Multi-core Processors
Muhammad Ali Ismail, S. H. Mirza, Talat Altaf
Pages - 208 - 220     |    Revised - 01-05-2011     |    Published - 31-05-2011
Volume - 5   Issue - 2    |    Publication Date - May / June 2011  Table of Contents
Multi-Core Processors , Concurrent Programming, Parallel Programming, Matrix Multiplication
With the advent of multi-cores every processor has built-in parallel computational power and that can only be fully utilized only if the program in execution is written accordingly. This study is a part of an on-going research for designing of a new parallel programming model for multi-core architectures. In this paper we have presented a simple, highly efficient and scalable implementation of a common matrix multiplication algorithm using a newly developed parallel programming model SPC3 PM for general purpose multi-core processors. From our study it is found that matrix multiplication done concurrently on multi-cores using SPC3 PM requires much less execution time than that required using the present standard parallel programming environments like OpenMP. Our approach also shows scalability, better and uniform speedup and better utilization of available cores than that the algorithm written using standard OpenMP or similar parallel programming tools. We have tested our approach for up to 24 cores with different matrices size varying from 100 x 100 to 10000 x 10000 elements. And for all these tests our proposed approach has shown much improved performance and scalability
CITED BY (11)  
1 Swamy, V., Sampath, S., Nanjesh, B. R., & Sagar, B. B. (2015). Development of Common Parallel Programming Platform for MPI and PVM. In Artificial Intelligence and Evolutionary Algorithms in Engineering Systems (pp. 95-103). Springer India.
2 Chickerur, S., Rayudu, D. M. K., Hiriyannaiah, S., & Shabalina, O. (2014). Performance Analysis of Alternative Open Source Parallel Computing Approach to OpenMP on Multicore Processors. In Knowledge-Based Software Engineering (pp. 466-476). Springer International Publishing.
3 Sampath, S., Nanjesh, B. R., Sagar, B. B., & Subbaraya, C. K. (2014, February). Performance optimization of PVM based parallel applications using optimal number of slaves. In Optimization, Reliabilty, and Information Technology (ICROIT), 2014 International Conference on (pp. 388-392). IEEE.
4 Vignesh, B. (2014). Pipelined Quadratic Equation Based Novel Multiplication Method for Cryptographic Applications. Indian Journal of Science and Technology, 7(4), 34-39.
5 Ismail, M. A. (2013, April). Multi-core processor based parallel implementation for finding distribution vectors in Markov processes. In Electronics, Communications and Photonics Conference (SIECPC), 2013 Saudi International (pp. 1-5). IEEE.
6 Sampath, S., Nanjesh, B. R., & Pramod, H. B. (2012). Evaluation of Parallel Application's Performance Dependency on RAM using Parallel Virtual Machine. International Journal of Computer Science & Communication Networks, 2(6), 641.
7 Ismail, M. A., Altaf, T., & Mirza, S. H. Parallel Matrix Multiplication on Multi-Core Processors using SPC 3 PM.
8 Nimako, G., Otoo, E. J., & Ohene-Kwofie, D. (2012, October). Cache-sensitive MapReduce DGEMM algorithms for shared memory architectures. In Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference (pp. 100-110). ACM.
9 Dash, T., & Nayak, T. (2012). Chain Multiplication of Dense Matrices: Proposing a Shared Memory based Parallel Algorithm. Matrix, 10, 16.
10 vardhan Dwivedi, H. (2011). Analysis and Implementation of Room Assignment Problem and Cannon's Algorithm on General Purpose Programmable Graphical Processing Units with CUDA. Analysis, 1, 1-2011.
11 Ismail, M. A., Mirza, S. H., & Altaf, T. (2011). A Parallel and Concurrent Implementation of Lin-Kernighan Heuristic (LKH-2) for Solving Traveling Salesman Problem for Multi-Core Processors using SPC 3 Programming Model. International Journal of Advanced Computer Science and Applications, USA, 2(6).
1 Google Scholar 
2 Academic Journals Database 
3 CiteSeerX 
4 refSeek 
5 iSEEK 
6 Libsearch 
7 Bielefeld Academic Search Engine (BASE) 
8 Scribd 
9 SlideShare 
10 PdfSR 
1 N. Vachharajani, Y. Zhang and T. Jablin, "Revisiting the sequential programming model for the multicore era", IEEE MICRO, Jan - Feb 2008.
2 M. D. McCool, "Scalable Programming Models for Massively Multicore Processors", Proceedings of the IEEE, vol. 96(5), 2008.
3 G. Goumas, "Performance evaluation of the sparse matrix-vector multiplication on modern architectures", Journal of Supercomputing, pp. 1-42, Nov. 2008.
4 R. Vuduc, H. Moon, "Fast sparse matrix-vector multiplication by exploiting variable block structure", Lecture notes in computer science, vol. 3726, pp. 807-816, 2005.
5 H. T. Kung, C. E. Leiserson, “Algorithms for VLSI processor arrays”; in “Introduction to VLSI Systems”, Addison-Wesley, 1979.
6 G. C. Fox, S. W. Otto and A. J. G. Hey, “Matrix algorithms on a hypercube I: Matrix multiplication”, Parallel Computing, vol. 4(1), pp.17-31, 1987.
7 R. A. van de Geijn, J. Watts,” SUMMA_ Scalable Universal Matrix Multiplication Algorithm”, TECHREPORT, 1997.
8 A. Ziad, M. Alqadi and M. M. El Emary, “Performance Analysis and Evaluation of Parallel Matrix Multiplication Algorithms”, World Applied Sciences Journal, vol. 5 (2), pp. 211- 214, 2008.
9 Z. Alqadi and A. Abu-Jazzar, ”Analysis of program methods used for optimizing matrix Multiplication”, Journal of Engineering, vol. 15(1), pp. 73-78, 2005.
10 J. Choi, “Fast Scalable Universal Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers” in Proceeding of 11th International Symposium on Parallel Processing IPPS '97 IEEE, 1997.
11 P. Alonso, R. Reddy, A. Lastovetsky, “Experimental Study of Six Different Implementations of Parallel Matrix Multiplication on Heterogeneous Computational Clusters of Multi-core Processors” in Proceedings of Parallel, Distributed and Network- Based Processing (PDP), Pisa, Feb. 2010.
12 R. C. Agarwal, S. M. Balle, F. G. Gustavson, M. Joshi, P. Palkar, “A three-dimensional approach to parallel matrix multiplication“, IBM Journal of Research and Development, vol. 39(5). pp. 575, 1995.
13 A. Buluc, J. R. Gilbert, “Challenges and Advances in Parallel Sparse Matrix-Matrix Multiplication”, in proceedings of 37th International Conference on Parallel Processing, ICPP '08, Portland, Sep 2008.
14 L . Buatois, Caumon, G. Lévy, “Concurrent number cruncher: An efficient sparse linear solver on the GPU”, in Proceedings of the High-Performance Computation Conference (HPCC), Springer LNCS, 2007.
15 S. Sengupta, M. Harris, Y. Zhang, J.D. Owens, “Scan primitives for GPU computing”. In Proceedings of Graphics Hardware”, Aug. 2007.
16 J. A. Stratton, S. S. Stone, Hwu, “M-CUDA: An efficient implementation of CUDA kernels on multicores”, IMPACT Technical Report 08-01, University of Illinois at Urbana-Champaign, 2008.
17 K. Fatahalian, J. Sugerman, P. Hanrahan, “Understanding the efficiency of GPU algorithms for matrix-matrix multiplication” in Proceeding of the conference on Graphics hardware ACM SIGGRAPH/EUROGRAPHICS HWWS '04, 2004.
18 J. Bolz, I. Farmer, E. Grinspun, P. Schröoder, “Sparse matrix solvers on the GPU: conjugate gradients and multigrid”, ACM Transactions on Graphics (TOG), vol. 22(3), 2003.
19 S. Ohshima, K. Kise, T. Katagiri and T. Yuba, “Parallel Processing of Matrix Multiplication in a CPU and GPU Heterogeneous Environment”, High Performance Computing for Computational Science – VECPAR, 2006.
Mr. Muhammad Ali Ismail
NED university of Engineering and technology - Pakistan
Dr. S. H. Mirza
Usman Institute of Technology - Pakistan
Dr. Talat Altaf
NED University of Engineering & Technology - Pakistan