Rate-distortion optimization for cross modal compression

J Gao, C Jia, S Wang, S Ma… - 2023 Data Compression …, 2023 - ieeexplore.ieee.org
2023 Data Compression Conference (DCC), 2023. ieeexplore.ieee.org
Cross modal compression (CMC) was recently proposed to compress highly redundant visual data into a compact, common, human-comprehensible domain (such as text), preserving semantic fidelity for semantic-related applications. However, CMC only achieves a certain level of semantic fidelity at a constant rate, and the model optimizes the probability of the ground-truth text rather than semantic fidelity directly. To tackle these problems, we propose a novel scheme named rate-distortion optimized CMC (RDO-CMC). Specifically, we model the text generation process as a Markov decision process and propose a rate-distortion reward, which is used in reinforcement learning to optimize text generation. In the rate-distortion reward, the distortion measures both the semantic fidelity and the naturalness of the encoded text. The rate of the text is estimated as the sum of the information content of all its tokens, since the information content of each token is a lower bound on its coding bits. Experimentally, RDO-CMC effectively controls the rate in the CMC framework and achieves competitive performance on the MSCOCO dataset.
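
The rate estimate described in the abstract can be illustrated with a minimal Python sketch. It sums the self-information, -log2(p), of each token's probability to get a bit count. The abstract does not say how rate and distortion are combined into the reward, so the Lagrangian-style form in `rd_reward`, the weight `lam`, the function names, and the sample probabilities are all assumptions made for illustration, not the authors' implementation.

```python
import math


def estimate_rate_bits(token_probs):
    """Estimate the rate of a text as the sum of each token's
    self-information, -log2(p), in bits."""
    return sum(-math.log2(p) for p in token_probs)


def rd_reward(distortion, rate_bits, lam):
    """Hypothetical rate-distortion reward (assumed form, not from the paper):
    lower distortion and lower rate both increase the reward."""
    return -(distortion + lam * rate_bits)


# Made-up per-token probabilities from a language model, for illustration only.
probs = [0.5, 0.25, 0.125]
rate = estimate_rate_bits(probs)
reward = rd_reward(distortion=0.2, rate_bits=rate, lam=0.01)
print(rate, reward)
```

Under this assumed form, raising `lam` penalizes longer or less predictable text more strongly, which is one way a single weight could trade rate against distortion.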