2025, Issue 05, Vol. 51, pp. 1016-1024
A Multi-Task-Based Image Semantic Transmission Method
Foundation: Gansu Provincial Science and Technology Major Project (22ZD6GA041)
Abstract:

In recent years, Transformer-based vision models such as Swin Transformer have shown promise in visual tasks. However, these methods usually focus on reducing signal distortion between the original and reconstructed data while ignoring perceptual quality. Because the conventional Mean Square Error (MSE) loss does not adequately reflect the perceptual and semantic quality of images, we design a loss function that is a weighted combination of MSE and Learned Perceptual Image Patch Similarity (LPIPS). On this basis we construct a Swin Transformer-based semantic communication framework, called Swin Transformer with LPIPS-based Joint Source-Channel Coding (STL-JSCC), which significantly improves image reconstruction quality and semantic fidelity. For performance evaluation, two semantic-aware metrics are designed: Images Semantic Deviation (ISD) and Images Semantic Similarity (ISS). Together they form a joint perceptual-semantic evaluation system that goes beyond the limitations of traditional evaluation methods. Experimental results show that the proposed STL-JSCC outperforms the other models on all metrics, verifying the potential and advantages of the proposed method in improving image reconstruction quality and semantic extraction capability.
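The abstract describes the training loss only as a weighted combination of MSE and LPIPS and does not give the exact form. The sketch below illustrates one plausible reading, a convex combination controlled by a single weight. The names `weighted_loss`, `alpha`, and `gradient_distance` are hypothetical and do not come from the paper. Because LPIPS itself requires a pretrained network, the perceptual term here is a simple image-gradient distance used purely as a stand-in; in practice a real LPIPS implementation would be passed in as `perceptual_fn`.

```python
import numpy as np


def mse(x, x_hat):
    """Pixel-wise mean squared error between two images of equal shape."""
    return float(np.mean((x - x_hat) ** 2))


def gradient_distance(x, x_hat):
    """Hypothetical stand-in for LPIPS (NOT the learned metric).

    Compares horizontal and vertical finite differences of the two images,
    so it reacts to structural changes but ignores a constant brightness shift.
    """
    dh = np.diff(x, axis=-1) - np.diff(x_hat, axis=-1)
    dv = np.diff(x, axis=-2) - np.diff(x_hat, axis=-2)
    return float(np.mean(dh ** 2) + np.mean(dv ** 2))


def weighted_loss(x, x_hat, alpha=0.5, perceptual_fn=gradient_distance):
    """Assumed form: alpha * MSE + (1 - alpha) * perceptual distance."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * mse(x, x_hat) + (1.0 - alpha) * perceptual_fn(x, x_hat)


# Usage: a random 3-channel 8x8 "image" and a uniformly brightened copy.
rng = np.random.default_rng(0)
image = rng.random((3, 8, 8))
shifted = image + 0.1
print(weighted_loss(image, shifted, alpha=0.5))
```

Setting `alpha=1.0` recovers plain MSE training, while smaller values shift the emphasis toward the perceptual term. The weight used by the authors is not stated in the abstract.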

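Basic information:

Chinese Library Classification (CLC) number: TP391.41

Citation:

[1] WU Zhongdong (伍忠东), GAN Bingkun (甘炳坤), WANG Pengbo (王鹏波), et al. A Multi-Task-Based Image Semantic Transmission Method[J]. Radio Communications Technology (无线电通信技术), 2025, 51(05): 1016-1024.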
