An Efficient Method for the Parallel Implementation of the Hierarchical-Trie Packet Classification Algorithm on a Graphics Processing Unit

Authors

Bu-Ali Sina University, Hamedan

Abstract

Packet classification is a fundamental process in network processors. In this process, incoming packets are classified into specific flows by matching them against a set of filters. Although software implementations of classification algorithms cost less and are more extensible than hardware implementations, they are slower. In this paper, we exploit the parallel processing capability of graphics processing units to accelerate the hierarchical-trie packet classification algorithm and propose several scenarios based on the architecture of their global and shared memories. The implementation results for these scenarios, while confirming the computed time and memory complexities, show that the performance of the scenarios that partition the filter set into sub-trees no larger than the shared memory and copy them into it is lower than that of the scenario that keeps the entire data structure in global memory. The performance of these scenarios increases as the number of sub-trees and duplicated filters decreases. In addition, the scenario that can fit the hierarchical trie and its corresponding filter set into shared memory, without any partitioning, is the best scenario. The experimental results show that, on the same GPU, the throughput achieved by this scenario improves by up to 2.1 times compared with the existing methods.
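As a rough illustration of the hierarchical-trie (H-trie) lookup summarized above, the sketch below shows one possible node layout and a two-field (source/destination prefix) search. The TrieNode layout, the restriction to two 32-bit address fields, and the omission of rule priorities are simplifying assumptions made for this example; they are not taken from the paper's implementation.

#include <cstdint>

// Assumed node layout for a binary hierarchical trie (H-trie): a source-prefix
// trie in which every node that terminates a valid source prefix points to a
// destination-prefix sub-trie whose nodes carry filter identifiers.
struct TrieNode {
    int child[2];   // indices of the 0/1 children, -1 if absent
    int dstRoot;    // root of the destination sub-trie, -1 if no source prefix ends here
    int filterId;   // filter stored at this destination-trie node, -1 if none
};

// Longest-prefix search in one destination sub-trie.
__host__ __device__ int matchDstTrie(const TrieNode* nodes, int root, uint32_t dst)
{
    int best = -1;
    int cur  = root;
    for (int bit = 31; cur != -1; --bit) {
        if (nodes[cur].filterId != -1) best = nodes[cur].filterId;  // deepest match so far
        if (bit < 0) break;                                         // all 32 bits consumed
        cur = nodes[cur].child[(dst >> bit) & 1u];
    }
    return best;
}

// H-trie search: walk the source trie along the packet's source address and,
// at every node where a source prefix ends, search the attached destination
// sub-trie; the match from the most specific source prefix is kept
// (rule priorities are ignored in this sketch).
__host__ __device__ int classifyPacket(const TrieNode* nodes, int srcRoot,
                                        uint32_t src, uint32_t dst)
{
    int best = -1;
    int cur  = srcRoot;
    for (int bit = 31; cur != -1; --bit) {
        if (nodes[cur].dstRoot != -1) {
            int f = matchDstTrie(nodes, nodes[cur].dstRoot, dst);
            if (f != -1) best = f;
        }
        if (bit < 0) break;
        cur = nodes[cur].child[(src >> bit) & 1u];
    }
    return best;
}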

Keywords


Article Title [English]

An Efficient Method for Parallel Implementation of H-Trie Packet Classification Algorithm on GPU

Abstract [English]

Packet classification is a fundamental process in network processors. In this process, incoming packets are classified into distinct flows by matching them against a set of filters. Software implementations of packet classification algorithms, though lower in cost and more scalable than hardware implementations, are slower. In this paper, we use the parallel processing capability of graphics processors to accelerate the Hierarchical-Trie (H-trie) packet classification algorithm and propose different scenarios based on the architecture of their global and shared memories. The results of implementing these scenarios, while confirming the computed time and memory complexities, show that the performance of the scenarios that partition the filter set into sub-trees no larger than the shared memory and copy them into it is lower than that of the scenario that keeps the entire data structure in global memory. The performance of these scenarios increases as the number of sub-trees and duplicated filters decreases. Moreover, the scenario that can keep the hierarchical trie and the corresponding filters in shared memory, without any partitioning, is the best one. The experimental results show that, on the same GPU, this scenario achieves a throughput of approximately 2.1 times that of the existing methods.
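The CUDA sketch below contrasts the two memory scenarios compared in the abstract: one kernel traverses an H-trie kept entirely in device global memory, while the other first stages the (sufficiently small) trie in per-block shared memory. It reuses the assumed TrieNode layout from the earlier sketch, re-declared here so the snippet compiles on its own; MAX_NODES and the one-thread-per-packet mapping are illustrative assumptions, not the paper's implementation.

#include <cstdint>

struct TrieNode {                     // same assumed layout as the earlier sketch
    int child[2], dstRoot, filterId;
};

// Full H-trie search for one packet: source-trie walk plus a destination
// sub-trie search at every node where a source prefix ends.
__device__ int classifyPacket(const TrieNode* nodes, int srcRoot,
                              uint32_t src, uint32_t dst)
{
    int best = -1;
    int cur  = srcRoot;
    for (int bit = 31; cur != -1; --bit) {
        if (nodes[cur].dstRoot != -1) {
            int d = nodes[cur].dstRoot;
            for (int b = 31; d != -1; --b) {
                if (nodes[d].filterId != -1) best = nodes[d].filterId;
                if (b < 0) break;
                d = nodes[d].child[(dst >> b) & 1u];
            }
        }
        if (bit < 0) break;
        cur = nodes[cur].child[(src >> bit) & 1u];
    }
    return best;
}

// Global-memory scenario: the whole trie stays in device global memory and
// each thread classifies one packet.
__global__ void classifyGlobal(const TrieNode* nodes, int srcRoot,
                               const uint32_t* src, const uint32_t* dst,
                               int* out, int nPackets)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nPackets) out[i] = classifyPacket(nodes, srcRoot, src[i], dst[i]);
}

// Shared-memory scenario: each block cooperatively copies the trie into
// on-chip shared memory before classifying its packets. MAX_NODES is an
// assumed bound chosen so the copy (16 bytes per node) stays within the
// roughly 48 KB of shared memory available per block on typical GPUs.
#define MAX_NODES 2048

__global__ void classifyShared(const TrieNode* gNodes, int nNodes, int srcRoot,
                               const uint32_t* src, const uint32_t* dst,
                               int* out, int nPackets)
{
    __shared__ TrieNode sNodes[MAX_NODES];
    // Cooperative copy; assumes nNodes <= MAX_NODES.
    for (int n = threadIdx.x; n < nNodes; n += blockDim.x)
        sNodes[n] = gNodes[n];
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nPackets) out[i] = classifyPacket(sNodes, srcRoot, src[i], dst[i]);
}

When the trie does not fit in shared memory, it has to be partitioned into sub-trees, and filters whose prefixes span several sub-trees are duplicated; this duplication is the overhead the abstract attributes to the partitioned scenarios.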

Keywords [English]

  • Packet classification
  • H-trie algorithm
  • graphics processing unit
  • CUDA
  • memory hierarchy
  • complexity
  • performance