Performance Improvement of 2D Discrete Wavelet Transform using Data-Level Parallelism Technique

Document Type : Original Article

Authors

Department of Computer Engineering, Faculty of Engineering, University of Guilan, Rasht, Iran

Abstract

The two-Dimensional Discrete Wavelet Transform (2D-DWT) is widely used in various applications for multimedia data processing, including image and video compression standards. However, this transform is computational intensive than conventional conversions, such as the discrete cosine transform. In this paper, in order to improve the performance of 2D-DWT,  we use Single Instruction, Multiple Data (SIMD) set instructions including Advanced Vector Extensions (AVX), Fused Multiply-Add (FMA), and AVX2 supported by most General-Purpose Processors (GPP). These technologies capable to process 256-bit data located in SIMD registers. The AVX technology can process eight 32-bit floating point numbers, while AVX2 processes sixteen 16-bit fixed-point numbers. In other words, it is possible to exploit 8- and 16-way data-level parallelism. In addition, two different way of parallelism, Row Column Wavelet Transform (RCWT) which processes rows and columns separately and Line-Based Wavelet Transform (LBWT) that processes both rows and columns in a single loop are used. Experimental results of different wavelet transform with different image sizes on a GPP show that the speedups of up to 28.8x yield. Furthermore, LBWT approach improves performance more than RCWT. This is because it uses memory hierarchy structure more efficiently than RCWT approach.

Keywords


[1]      M. Rabbani, and R. Joshi, “An Overview of the JPEG2000 Still Image Compression Standard,” Signal Processing: Image Communication, vol. 17, no. 1, pp. 3–48, 2002.
[2]      S. Battista, F. Casalino, and C. Lande, “MPEG-4: A Multimedia Standard for the Third Millenium, part 1,” IEEE Multimedia, vol. 6, no. 4, pp.74–83, 1999.
[3]      S. Battista, F. Casalino, and C. Lande, “MPEG-4: A Multimedia Standard for the Third Millenium, part 2,” IEEE Multimedia, vol. 6, no. 4, pp.76–84, 2000.
[4]      M. D.  Adams, and R.K. Ward, “JasPer: A Portable Flexible Open-Source Software Tool Kit for Image Coding Processing,” In Proc. IEEE, Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 5, pp. 241-244, 2004.
[5]      A. Shahbahrami, “Improving the Performance of 2D Discrete Wavelet Transform using Data-Level Parallelism”, In IEEE Int. Conf. on High Performance Computing and Simulation, pp. 362-368, 2011.
[6]      A. Shahbahrami, “Algorithms and Architectures for 2D Discrete Wavelet Transform,” The Journal of Supercomputing, vol. 62, no. 2, pp. 1045-1064, May 2012.
[7]      عبدالبصیر تیباش و اسدالله شاه بهرامی، «مقایسه کارایی پیاده‌سازی صحیح و اعشاری فیلترهای مختلف تبدیل موجک گسسته دوبعدی،» هشتمین کنفرانس فناوری اطلاعات و دانش، دانشگاه بوعلی سینا، 17و18 شهریور 1395.
[8]      فرشته صادقی، ابوالفضل جلیلوند و سیدهادی حسینی، «ارائه یک روش ترکیبی مبتنی بر تبدیل موجک گسسته برای پیشبینی بارالکتریکی با استفاده از یک مدل دوبعدی،» مجله مهندسی برق دانشگاه تبریز، جلد 45، شماره 3، صفحه 68-78، پاییز 94.
[9]      حسین شایقی و علی قاسمی، «پیشبینی قیمت روزانه برق با شبکه عصبی بهبودیافته مبتنی بر تبدیل موجک و روش آشوبناک جستجوی گرانشی،» مجله مهندسی برق دانشگاه تبریز، جلد 45، شماره 4، صفحه 103-113، پاییز 94.
[10]      Z. N. Li, M.S. Drew, and J. Liu, Fundamentals of Multimedia, NJ: Prentice Hall, 2014.
[11]      H. Liao, M.K. Mandal, and B.F. Cockburn, “Efficient Architectures for 1-D and 2-D Lifting-Based Wavelet Transforms,” IEEE Trans. on Signal Processing, vol. 52, no. 5, pp. 1315-1326, 2004.
[12]      T. M. Quan, and , W.K. Jeong, “A Fast Discrete Wavelet Transform using Hybrid Parallelism on GPUs,” IEEE Trans. on Parallel and Distributed Systems, vol. 27, no. 11, pp. 3088-3100, 2016.
[13]      D. Barina, and P. Zemcik, “Minimum Memory Vectorisation of Wavelet Lifting,” In Int. Conf. on Advanced Concepts for Intelligent Vision Systems, Springer, pp. 91-101, 2013.
[14]      D. Barina, and P. Zemcik, “Diagonal Vectorisation of 2-D Wavelet Lifting,” In IEEE Int. Conf. on Image Processing, pp. 2978-2982, 2014.
[15]      A. Shahbahrami, and B.H. Juurlink, SIMD Architectural Enhancements to improve the Performance of the 2D Discrete Wavelet Tranform, 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools, pp. 497-504, 2009.
[16]      D. Chaver, C. Tenllado, L. Piñuel, M. Prieto, and F. Tirado, “2-D Wavelet Transform Enhancement on General-Purpose Microprocessors: Memory Hierarchy and SIMD Parallelism Exploitation,” In Proc. Int. Conf. on the High Performance Computing, vol. 2552, pp. 9-21, 2002.
[17]      F. Yazdanpanah, “An Approach for Analyzing Auto-vectorization Potential of Emerging Workloads”, Microprocessors and Microsystems, vol. 49, pp. 139-149, 2017.
[18]      H. Saito, S. Preis, N. Panchenko, and X. Tian, Reducing the Functionality Gap between Auto-vectorization and Explicit Vectorization, Int. workshop on OpenMP, pp. 173-186, 2016.
[19]      T. Behrens, V. Rosenfeld, J. Traub, S. Breb, and V. Markl, “Efficient SIMD Vectorization for Hashing in OpenCl”, Int.Conf. On Extending Database Technology, vol. 3, no. 4, 2018.
[20]      O. Reiche, C. Kobylko, F. Hanniq, J. Teich, “Auto-vectorization for Image Processing DSLs”, Proc. in Int. Conf. on Languages, Compilers, and Tools for Embedded Systems, vol. 52, no. 5, pp. 21-30, 2017.
[21]      V. Arunachalam, A.N. J. Raj, N. Hampannavar, C. B. Bidul, “Efficient Dual-precision Floating-point Fused-multiply-add Architecture,” Microprocessors and Microsystems, vol. 57, no. 1, pp. 23-31, 2018 
[22]      D. G. Chaver, C. Tenllado, L. Pinuel, M. Prieto, and F. Tirado, Vectorization of the 2D Wavelet Lifting Transform using SIMD Extensions, IEEE Workshop on Parallel and Distributed Image Processing, Video Processing, and Multimedia, pp. 8-12, 2003.
[23]      R. Kutil, A Single-Loop Approach to SIMD Parallelization of 2D Wavelet Lifting, In IEEE Proc. of the 14th Euromicro Int. Conf. on Parallel, Distributed and Network-Based Processing,  pp. 413–420, 2006.
[24]      A. Shahbahrami, B. Juurlink, and S. Vassiliadis, “Implementing the 2-D Wavelet Transform on SIMD-Enhanced General-Purpose Processors,” IEEE Tran. on Multimedia, vol. 10, no. 1, pp. 43-51, 2008.
[25]      A. Shahbahrami, B. Juurlink, and S. Vassiliadis, Performance Comparison of SIMD Implementations of the Discrete Wavelet Transform, In 16th IEEE Int. Conf. on Application-Specific Systems, Architecture Processors, pp. 393-398, 2005.
[26]      A. Shahbahrami, and B. Juurlink, A Comparison of Two SIMD Implementations of the 2D Discrete Wavelet Transform, In Proc. 18th Annual Workshop on Circuits, Systems and Signal Processing, pp. 169-177, 2007.
[27]      D. Barina, and P. Zemcik, Vectorization and Parallelization of 2-D Wavelet Lifting, Journal of Real-Time Image Processing, pp. 1-13, 2015.
[28]      H. B. Prajapati, and S.K. Vij, Analytical Study of Parallel and Distributed Image Processing, In IEEE Int. Conf. on Image Information Processing, pp. 1-6, 2011.
[29]      P.P. Dang, and P.M. Chau, Integer Fast Wavelet Transform and its VLSI Implementation for Low Power Applications, In IEEE Workshop on Signal Processing Systems, pp. 93-98, 2002.
[30]      Intel Corporation, Intel® 64 and IA-32 Architectures. Software Developer’s Manual, System Programming Guide, vol. 3A, Part 1, 2010.
[31]      D.B. Stewart, Measuring Execution Time and Real-Time Performance, In Embedded Systems Conf., vol. 141, 2001.
[32]      Intel Corporation, Intel® Architecture Code Analyzer, User Guide, Document Number: 321356-001US, 2009.