Searchable Encryption Token Extraction for Deep Packet Inspection over Compressed HTTPS

Document Type : Original Article

Authors

1 Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran

2 Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran,

Abstract

Deep packet inspection is an unavoidable task to prevent the web-based attacks. HTTPS traffic constitutes a significant percentage of web traffic. In order to perform the deep packet inspection on HTTPS traffic, searchable encryption is used. Using searchable encryption, there is no need to decrypt the packet contents; hence the privacy is preserved. Besides, a significant percentage of HTTPS traffic is compressed before reaching the SSL layer. Compression consists of two stages: LZ77 compression and Huffman encoding. For compressed traffic, required tokens for performing searchable encryption are not accessible and the full decompression should be applied to access the tokens. In this case, token extraction is done by parsing a nondeterministic finite automata (NFA) on the decompressed contents. The goal of this paper is to decrease the time complexity of NFA parsing process. In the proposed method, instead of full decompression of the compressed traffic, at first Huffman decoding is applied on the packets to access the LZ77 compressed content. Then, the LZ77 pointers can be used to recognize the repeated substrings and to skip the NFA parsing process. Hence, the token extraction process is accelerated. The evaluations show that the proposed method decreases 65% of time of compressed HTTPS token extraction by skipping the 44% of characters.

Keywords


[1]
Snort: The Open Source Network Intrusion Detection System, https://snort.org/ [Accessed  September 20, 2016].
[2]
HTTPS at Google, https://www.google.com/transparencyreport/ https/ [Accessed September 20, 2016].
[3]
Speed Web Delivery with HTTP Compression, https://www.ibm. com/developerworks/library/wa-httpcomp/ [Accessed January 6, 2018].
[4]
Usage of Site Elements for Websites, https://w3techs.com/ technologies/overview/siteelement/all [Accessed January 6, 2018].
[5]
J. Sherry, C. Lan, R. Ada Popa and S. Ratnasamy, “BlindBox: Deep Packet Inspection over Encrypted Traffic,” SIGCOMM '15 Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, New York, NY, USA, 2015.
[6]
P. Linz, An Introduction to Formal Languages and Automata, Jones & Bartlett Learning, 5th Edition, 2011.
[7]
F. Yu, Z. Chen, Y. Diao, T. V. Lakshman and R. H. Katz, “Fast and memory-efficient regular expression matching for deep packet inspection,” ACM/IEEE Symposium on Architecture for Networking and Communications systems, San Jose, CA, USA, 2006.
[8]
M. Becchi, A. Bremler-Barr, D. Hay, O. Kochba and Y. Koral, “Accelerating regular expression matching over compressed HTTP,” IEEE Conference on Computer Communications (INFOCOM), Hong Kong, 26 April - 1 May 2015.
[9]
P. Deutsch, “RFC 1952: GZIP file format specification version 4.3,” Aladdin Enterprises, Network Working Group, May 1996.
[10]
P. Deutsch, “RFC 1951: DEFLATE compressed data format specification version 1.3,” Aladdin Enterprises, Network Working Group, May 1996.
[11]
Usage Statistics of Compression for Websites,  https://w3techs.com/technologies/details/ce-compression/all/all [Accessed September 20, 2016].
[12]
D. E. Knuth, “Dynamic huffman coding,” Journal of Algorithms, vol. 6, no. 2, pp. 163-180, 1985.
[13]
Z. Jacob and A. Lempel, “A universal algorithm for sequential data compression,” IEEE Transactions on Information Theory, vol. 23, no. 3, pp. 337-343, 1977.
[14]
C. Lan, J. Sherry, R. Ada Popa, S. Ratnasamy and Z. Liu, “Embark: securely outsourcing middleboxes to the cloud,” 13th USENIX Symposium on Networked Systems Design and Implementation, Santa Clara, CA, March 16–18, 2016.
[15]
X. Yuan, X. Wang, J. Lin and C. Wang, “Privacy-preserving deep packet inspection in outsourced middleboxes,” IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, San Francisco, CA, 10-15 April 2016.
[16]
P. Nishad and S. Sankar, “Efficient random sampling statistical method to improve big data compression ratio and pattern matching techniques for compressed data,” International Journal of Computer Science and Information Security, vol. 14, no. 9, pp. 179-184, 2016.
[17]
A. Bremler-Barr and Y. Koral, “Accelerating multipattern matching on compressed HTTP traffic,” IEEE/ACM Transactions on Networking , vol. 20, no. 3, pp. 970 - 983, 2012.
[18]
A. Bremler-Barr, K. Yaron and Z. Victor, “Shift-based pattern matching for compressed web traffic,” 12th International Conference on High Performance Switching and Routing., Cartagena, Spain, 4-6 July 2011.
[19]
W. Sun and U. Manber, “A fast algorithm for multi-pattern searching,” Technical Report TR-94-17, Department of Computer Science, University of Arizona, 1996.
[20]
H. Peng, J. Li, B. Li and M. H. Arif, “Fast multi-pattern matching algorithm on compressed network traffic,” China Communic, vol. 13, no. 5, pp. 141-150, 2016.
[21]
Alexa Site, http://www.alexa.com/, [Accessed January 6, 2018].
[22]
Zlib Site, https://zlib.net/, [Accessed January 6, 2018].
[23]      عاتکه گشوارپور, عطااله عباسی و عاطفه گشوارپور, «بررسی تفاوت‌های پاسخ به تحریکات تصویری دارای بار احساسی در زنان و مردان با استفاده از آزمون آماری ویلکاکسون». مجله مهندسی برق دانشگاه تبریز، دوره 47، شماره 2، صفحات 695-687، 1396.
[24]      وحید رافع و سجاد اسفندیاری, «راهکاری نوین جهت تولید دنباله آزمون کمینه در فرآیند آزمون نرم افزار با ترکیب الگوریتم‌های جستجوی تپه نوردی و جستجوی خفاش»، مجله مهندسی برق دانشگاه تبریز، دوره 46، شماره 3، صفحات 35-25، 1395.