Lossless text compression using GPT-2 language model and Huffman coding

MA Rahman, M Hamada - SHS Web of Conferences, 2021 - shs-conferences.org
Modern daily life activities produce large amounts of information for telecommunication. Storing it on digital devices or transmitting it over the Internet is challenging, which creates the need for data compression. Research on data compression has therefore become a topic of great interest. Since compressed data is generally smaller than the original, compression saves storage and increases transmission speed. In this article, we propose a text compression technique using the GPT-2 language model and Huffman coding. In the proposed method, the Burrows-Wheeler transform and a list of keys are used to reduce the original text file's length. Finally, we apply the GPT-2 language model and then Huffman coding for encoding. The proposed method is compared with state-of-the-art text compression techniques, and we show that it achieves a gain in compression ratio over those methods.
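The abstract names two classical building blocks of the pipeline: the Burrows-Wheeler transform, which reorders the text so that equal characters cluster together, and Huffman coding, which assigns shorter bit codes to more frequent symbols. The GPT-2 modelling stage and the paper's key list are not sketched here. A minimal Python illustration of the two classical stages (the sentinel character and function names are illustrative assumptions, not taken from the paper):

```python
import heapq
from collections import Counter

def bwt(text):
    """Burrows-Wheeler transform: sort all rotations, keep the last column.
    A '\\0' sentinel (an assumption, not from the paper) marks the end."""
    s = text + "\0"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def huffman_codes(text):
    """Build a prefix-free code table: frequent symbols get shorter codes."""
    heap = [[freq, [sym, ""]] for sym, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]   # symbols under the left branch
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]   # symbols under the right branch
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return dict(heap[0][1:])

transformed = bwt("banana")                        # "annb\0aa": repeats clustered
codes = huffman_codes(transformed)                 # e.g. 'a' gets the shortest code
bits = "".join(codes[c] for c in transformed)      # the encoded bit string
```

The transform itself adds no compression, but by grouping repeated characters it makes the symbol statistics more skewed, which is what an entropy coder such as Huffman coding exploits.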