
USTC-TD: A Test Dataset and Benchmark for Image and Video Coding in 2020s

University of Science and Technology of China (USTC),
MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition,
Intelligent Visual Data Coding Laboratory (iVC Lab)

Figure 1. Examples of USTC Video Test Dataset.


Abstract

Image/video coding has been an active research area in both academia and industry for many years. Test datasets, especially high-quality image/video datasets, are essential for the fair evaluation of coding-related research, practical applications, and standardization activities. We propose a test dataset named USTC-TD, which has been successfully adopted in the practical end-to-end image/video coding challenge of the IEEE International Conference on Visual Communications and Image Processing (VCIP) in 2022 and 2023. USTC-TD contains 40 images at 4K spatial resolution and 10 video sequences at 1080p spatial resolution, featuring diverse content arising from varied environmental factors (e.g., scene type, texture, motion, view) and deliberately designed imaging factors (e.g., illumination, lens, shadow). We quantitatively evaluate USTC-TD on different image/video features (spatial, temporal, color, lightness) and compare it with previous image/video test datasets, showing that it compensates for the shortcomings of existing datasets. We also evaluate both classic standardized and recent learned image/video coding schemes on USTC-TD using objective quality metrics (PSNR, MS-SSIM, VMAF) and a subjective quality metric (MOS), providing an extensive benchmark for these schemes. Based on the characteristics and specific design of the proposed test dataset, we analyze the benchmark performance and shed light on future research and development of image/video coding. All the data are released online on this website.

Image Test Dataset

Our proposed dataset aims to cover various scenarios by collecting and simulating data from real-world coding and transmission scenes, which brings the evaluation of image coding schemes closer to actual applications.

To vary the content elements, we combine different environmental conditions (scene type, texture, view, etc.) and capture conditions (resolution, illumination, lens, shadow, etc.) during collection.

Figure 2. Illustration of the image dataset in USTC-TD 2022.


Compared to USTC-TD 2022, USTC-TD 2023 includes more extreme elements of real-world scenes.

Figure 3. Illustration of the image dataset in USTC-TD 2023.


Video Test Dataset

Based on the characteristics of previous video datasets, our proposed dataset aims to cover more of the typical characteristics of video content. Unlike images, videos have temporal-domain properties, most notably the diverse motion types that arise under varied environmental conditions in natural videos. Video frames usually contain multiple moving objects of arbitrary shapes and various motion types, which lead to complex motion fields that challenge video coding schemes. Therefore, we simulate video data with various types of temporal correlation, including different object motion types and lens motion.

Figure 4. Illustration of the video dataset in USTC-TD 2022.


Dataset Details

Construction

Based on the characteristics of previous image/video datasets, our proposed dataset aims to cover various scenarios by collecting and simulating data from real-world coding and transmission scenes, which brings the evaluation of image/video coding schemes closer to actual applications.

TABLE 1. THE CONFIGURATION OF USTC-TD 2022 IMAGE DATASET.

TABLE 2. THE CONFIGURATION OF USTC-TD 2023 IMAGE DATASET.

TABLE 3. THE CONFIGURATION OF USTC-TD 2023 VIDEO DATASET.

Analysis

To verify that our proposed dataset covers a wide range of content elements and to quantitatively analyze its advantages, we evaluate USTC-TD on different image/video features and compare it with commonly used image/video test datasets (image datasets: Kodak, CLIC, Tecnick; video datasets: HEVC CTC, VVC CTC, MCL-JCV, UVG). For the feature analysis, we select spatial information (SI), colorfulness (CF), lightness information (LI), and temporal information (TI) to characterize each dataset along the dimensions of space, color, lightness, and temporal correlation; these features are commonly used to assess the content diversity of test datasets.
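SI and TI are commonly computed as defined in ITU-T Rec. P.910: SI is the maximum over frames of the standard deviation of the Sobel-filtered luma, and TI is the maximum over time of the standard deviation of successive frame differences. The minimal sketch below assumes each frame (or still image) is a 2-D NumPy array of luma values; the helper names are hypothetical, and implementation details used for USTC-TD (border handling, or averaging instead of taking the maximum) are not stated on this page.

```python
import numpy as np


def sobel_magnitude(luma: np.ndarray) -> np.ndarray:
    """Sobel gradient magnitude of a 2-D luma plane (interior pixels only)."""
    y = luma.astype(np.float64)
    # 3x3 Sobel responses via slicing; each output pixel maps to one interior input pixel.
    gx = (y[:-2, 2:] + 2 * y[1:-1, 2:] + y[2:, 2:]) - (y[:-2, :-2] + 2 * y[1:-1, :-2] + y[2:, :-2])
    gy = (y[2:, :-2] + 2 * y[2:, 1:-1] + y[2:, 2:]) - (y[:-2, :-2] + 2 * y[:-2, 1:-1] + y[:-2, 2:])
    return np.hypot(gx, gy)


def spatial_information(frames: list) -> float:
    """SI: maximum over frames of the std of the Sobel-filtered luma."""
    return max(float(np.std(sobel_magnitude(f))) for f in frames)


def temporal_information(frames: list) -> float:
    """TI: maximum over time of the std of successive luma differences (needs >= 2 frames)."""
    diffs = (np.asarray(cur, np.float64) - np.asarray(prev, np.float64)
             for prev, cur in zip(frames[:-1], frames[1:]))
    return max(float(np.std(d)) for d in diffs)
```

For a still image, pass a one-element list, e.g. `spatial_information([image])`.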


Figure 5. Visualization of the spatial information (SI) and colorfulness (CF) features on different image test datasets. The scatter plot shows SI versus CF, and the corresponding convex hulls indicate the coverage of each dataset. The histogram shows the number of images at different SI scores.
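A widely used colorfulness measure is the Hasler and Süsstrunk metric, built from the opponent channels rg = R - G and yb = (R + G)/2 - B. Assuming CF here follows that metric (the page does not state the exact formula), a minimal NumPy sketch for an H x W x 3 RGB array is:

```python
import numpy as np


def colorfulness(rgb: np.ndarray) -> float:
    """Hasler-Suesstrunk colorfulness of an H x W x 3 RGB array (assumed CF definition)."""
    r, g, b = (rgb[..., i].astype(np.float64) for i in range(3))
    rg = r - g                 # red-green opponent channel
    yb = 0.5 * (r + g) - b     # yellow-blue opponent channel
    sigma = np.hypot(rg.std(), yb.std())   # combined spread of the opponent channels
    mu = np.hypot(rg.mean(), yb.mean())    # combined mean of the opponent channels
    return float(sigma + 0.3 * mu)
```

A neutral gray image scores 0, and more saturated or more varied colors raise the score.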


Figure 6. Visualization of the lightness information (LI) and CF features on different image test datasets. The scatter plot shows LI versus CF, and the corresponding convex hulls indicate the coverage of each dataset. The histogram shows the number of images at different LI scores.



Figure 7. Visualization of the temporal information (TI) and SI features on different video test datasets. The scatter plot shows TI versus SI, and the corresponding convex hulls indicate the coverage of each dataset. The histogram shows the number of videos at different TI scores.
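The convex hulls in Figures 5 to 7 enclose each dataset's points in a 2-D feature plane. One way to summarize such coverage as a single number is the hull area; this scalar summary is our assumption, since the figures only plot the hulls. The sketch below uses Andrew's monotone chain algorithm and the shoelace formula, with points given as hashable (x, y) tuples such as (SI, CF) pairs; the function names are hypothetical.

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop each chain's last point, which repeats the other chain's first point.
    return lower[:-1] + upper[:-1]


def hull_area(points) -> float:
    """Shoelace area of the convex hull of 2-D feature points (0 if degenerate)."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    s = 0.0
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0
```

Because the features have different scales, areas are only comparable between datasets plotted on the same pair of axes.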


TABLE 4. QUANTITATIVE RESULTS OF THE USTC-TD 2022 AND 2023 IMAGE DATASETS. NOTE THAT THE HIGHER SCORES ARE REPRESENTED IN RED, AND THE LOWER SCORES ARE REPRESENTED IN BLUE.

TABLE 5. QUANTITATIVE RESULTS OF THE USTC-TD 2023 VIDEO DATASET. NOTE THAT THE HIGHER SCORES ARE REPRESENTED IN RED, AND THE LOWER SCORES ARE REPRESENTED IN BLUE.

Dataset Recommendation

Here we discuss how USTC-TD and existing image/video datasets can complement each other in future compression research, and propose different recommended presets that combine them with practical use in mind.


TABLE 6. THREE RECOMMENDED PRESETS OF THE EXISTING IMAGE/VIDEO DATASETS AND THE PROPOSED USTC-TD FOR THE PRACTICAL EVALUATION OF IMAGE/VIDEO COMPRESSION SCHEMES.

Dataset Evaluation

In this section, we establish baselines by evaluating recent state-of-the-art learned image/video compression schemes and classic standardized codecs with different metrics (PSNR, MS-SSIM, VMAF, and MOS), and comprehensively benchmark their performance on our proposed datasets.
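Each point on an image RD curve pairs a rate with a quality score. A minimal sketch of the two most common axes, PSNR and bits per pixel (bpp), is below; it assumes 8-bit, single-plane inputs, and whether USTC-TD computes PSNR on RGB or YUV planes (and how channels are weighted) is not stated here, so treat it as illustrative.

```python
import numpy as np


def psnr(ref: np.ndarray, dist: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR in dB between a reference and a decoded plane (assumes 8-bit range by default)."""
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs
    return float(10.0 * np.log10(max_val ** 2 / mse))


def bits_per_pixel(num_bytes: int, height: int, width: int) -> float:
    """Rate of a compressed image in bits per pixel."""
    return 8.0 * num_bytes / (height * width)
```

Sweeping a codec's quality parameter and plotting `(bits_per_pixel, psnr)` pairs yields one RD curve.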


Rate-distortion Curves

Figure 8. Overall rate-distortion (RD) curves of advanced image compression schemes on different metrics. From left to right, the results are evaluated by PSNR, MS-SSIM, VMAF (MSE model), and VMAF (MS-SSIM model) metrics on USTC-TD image dataset.

Figure 9. Overall rate-distortion (RD) curves of advanced video compression schemes on different metrics. From left to right, the results are evaluated by PSNR, MS-SSIM, VMAF (MSE model), and VMAF (MS-SSIM model) metrics on USTC-TD video dataset.

Figure 10. Overall rate-distortion (RD) curves of INR-based video compression schemes on different metrics. From left to right, the results are evaluated by PSNR and MS-SSIM metrics on USTC-TD video dataset.

BD-Rate for PSNR/MS-SSIM/VMAF Metrics

TABLE 7. BD-RATE (%) COMPARISON FOR PSNR. SHORT SETTING WITH INTRA PERIOD = 32. THE ANCHOR IS VTM.
TABLE 8. BD-RATE (%) COMPARISON FOR MS-SSIM. SHORT SETTING WITH INTRA PERIOD = 32. THE ANCHOR IS VTM.
TABLE 9. BD-RATE (%) COMPARISON FOR VMAF (MSE model), VMAF (MS-SSIM model). SHORT SETTING WITH INTRA PERIOD = 32. THE ANCHOR IS VTM.
TABLE 10. BD-RATE (%) COMPARISON FOR PSNR, MS-SSIM, VMAF (MSE model), VMAF (MS-SSIM model). SHORT SETTING WITH INTRA PERIOD = 32. THE ANCHOR IS VTM.
TABLE 11. BD-RATE (%) COMPARISON FOR PSNR, MS-SSIM, VMAF (MSE model), VMAF (MS-SSIM model). SHORT SETTING WITH INTRA PERIOD = -1. THE ANCHOR IS VTM.
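BD-rate reports the average bitrate difference of a test codec relative to an anchor (here VTM) at equal quality; negative values mean bitrate savings. The sketch below follows the classic Bjøntegaard formulation: fit log10(rate) as a cubic polynomial of the quality score, integrate both fits over the overlapping quality range, and convert the mean log difference to a percentage. The page does not state which BD-rate variant produced the tables (some tools use piecewise cubic interpolation instead), and any quality metric (PSNR, MS-SSIM, VMAF) can be passed as the quality axis.

```python
import numpy as np


def bd_rate(rate_anchor, q_anchor, rate_test, q_test) -> float:
    """Bjontegaard delta rate (%) of the test codec versus the anchor (cubic-fit variant)."""
    # Fit log10(rate) as a cubic function of quality for each codec (needs >= 4 RD points).
    p_a = np.polyfit(q_anchor, np.log10(rate_anchor), 3)
    p_t = np.polyfit(q_test, np.log10(rate_test), 3)
    # Integrate only over the quality interval covered by both curves.
    lo = max(min(q_anchor), min(q_test))
    hi = min(max(q_anchor), max(q_test))
    if hi <= lo:
        raise ValueError("RD curves do not overlap in quality")
    int_a, int_t = np.polyint(p_a), np.polyint(p_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)
    # Mean log10 rate difference -> percentage rate change at equal quality.
    return float((10 ** (avg_t - avg_a) - 1) * 100)
```

For example, a codec that needs 80% of the anchor's rate at every quality point yields a BD-rate of about -20%.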

MOS results for advanced image and video compression schemes

TABLE 12. THE OVERALL MOS RESULTS OF COMPRESSED IMAGES OF CLASSIC STANDARDIZED AND ADVANCED LEARNED IMAGE COMPRESSION SCHEMES, WHERE BLUE REPRESENTS THE LOWEST SCORE AND RED REPRESENTS THE HIGHEST SCORE.
TABLE 13. THE OVERALL MOS RESULTS OF COMPRESSED VIDEOS OF CLASSIC STANDARDIZED AND ADVANCED LEARNED VIDEO COMPRESSION SCHEMES, WHERE BLUE REPRESENTS THE LOWEST SCORE AND RED REPRESENTS THE HIGHEST SCORE.
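A MOS is the mean of the scores that human viewers give to one test stimulus, usually reported with a confidence interval. The sketch below uses the normal-approximation 95% interval from ITU-R BT.500 (half-width 1.96 S / sqrt(N)); the rating scale, subject screening, and any outlier rejection used for Tables 12 and 13 are not described on this page, so none are applied here.

```python
import math
import statistics


def mos_with_ci(scores, z: float = 1.96):
    """Mean opinion score and confidence-interval half-width (normal approximation)."""
    n = len(scores)
    mos = statistics.mean(scores)
    # Sample standard deviation needs at least two ratings.
    half_width = z * statistics.stdev(scores) / math.sqrt(n) if n > 1 else float("nan")
    return mos, half_width
```

Comparing two schemes by MOS is only meaningful when their confidence intervals are taken into account.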

BibTeX

@misc{USTC-TD,
    author       = {Zhuoyuan Li and Junqi Liao and Xihua Sheng and Haotian Zhang and Yuqi Li and Chuanbo Tang and Yifan Bian and Xinmin Feng and Yao Li and Changsheng Gao and Li Li and Dong Liu},
    title        = {USTC-TD: USTC Test Dataset for Image and Video Coding in 2020s},
    howpublished = {arXiv preprint},
    year         = {2024},
}

Contributors

Supervisors

Dong Liu

Li Li

Changsheng Gao



Students

Zhuoyuan Li
Ph.D.

Junqi Liao
Ph.D.

Chuanbo Tang
Ph.D.

Haotian Zhang
Ph.D.

Yuqi Li
M.S.

Yifan Bian
Ph.D.

Xihua Sheng
Ph.D.

Yao Li
Ph.D.

Xinmin Feng
M.S.

Acknowledgement

We appreciate the adoption and support of the organizations listed below, and we also thank the supervisors and the USTC volunteers who contributed to this dataset.


Actors: Cunhui Dong, Ziyi Zhuang, Feihong Mei, Qiaoxi Chen, Bojun Liu.


Testers: Jialin Li, Xiongzhuang Liang.


Organizations: IEEE International Conference on Visual Communications and Image Processing (VCIP) 2022 and 2023.


Thanks to IEEE DataPort, we have submitted the data to this open dataset platform for convenient access by researchers in the IEEE community.

Copyright

The released images and sequences were captured and processed by the University of Science and Technology of China (USTC). All intellectual property rights remain with USTC. If you use our datasets in your work, please cite the dataset paper or this website.


The following uses are allowed for the contributed dataset:
     1. Data (images and videos) may be published in research papers, technical reports, and development events.
     2. Data (images and videos) may be used in standardization activities (e.g., ITU, MPEG, AVS, VQEG, etc.).


The following uses are NOT allowed for the contributed dataset:
     1. Do not publish snapshots in product brochures.
     2. Do not use video for marketing purposes.
     3. Redistribution is not permitted.
     4. Do not use it in television shows, commercials, or movies.

Contact

If you have any questions or advice on these datasets, please contact us:
Zhuoyuan Li: Email: zhuoyuanli@mail.ustc.edu.cn, WeChat: ustc_lizhuoyuan
Junqi Liao: Email: liaojq@mail.ustc.edu.cn, WeChat: liaojq98


If you have any questions or advice on this website, please contact Yifan Bian.