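Preface

Once you fall into the video coding rabbit hole, there is no escaping the reference software of the most advanced traditional codecs. Unlike most people, my first attempt happened to start with Scalable Video Coding. This post documents how I installed and used SHM-12.4 (the HEVC Scalable Extension).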

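Installation

Download

Here is the SVN repository that hosts every version of SHM:

All SHM versions

I downloaded the latest version, SHM-12.4. Note that the SHM code is managed with SVN and cannot be downloaded directly; you need an SVN client to check out the repository. SVN clients are easy to find with a quick search.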

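Building

I built it on a machine running the Manjaro Linux distribution, where running make in the root folder is all it takes. One caveat: when building SHM or HM on Linux, first go to the linux folder and remove the -Werror option from makefile.base in its common folder, then run make from a terminal in the linux folder. Otherwise warnings are treated as errors and abort the build.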
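The -Werror removal described above can also be scripted. The following is a minimal sketch, not part of SHM: the helper names are made up for illustration, and the path in the usage note is an assumption about where makefile.base sits in your checkout.

```python
import re
from pathlib import Path


def strip_werror(makefile_text: str) -> str:
    """Return the makefile text with every standalone -Werror flag removed."""
    # Match -Werror only as a whole token, so flags such as -Werror=foo stay untouched.
    return re.sub(r"(?<!\S)-Werror(?!\S)[ \t]*", "", makefile_text)


def patch_makefile(path: str) -> None:
    """Rewrite the given makefile in place without -Werror (hypothetical helper)."""
    p = Path(path)
    p.write_text(strip_werror(p.read_text()))


# Small demonstration on an in-memory line rather than a real file.
print(strip_werror("CPPFLAGS += -Wall -Werror -O3\n"), end="")
```

To apply it to a checkout, something like patch_makefile("linux/common/makefile.base") should work once the path is adjusted to your tree.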

I also tried building on Windows: open the vc2017.sln solution in the build folder with Visual Studio 2019, then build it like any other C++ project.

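Usage

Adding the binaries to your environment variables, or defining an alias, makes the encoder and decoder more convenient to use.
This section focuses on the commands. Encoding with SHM differs slightly from encoding with HM.

Main configuration file

In HM, when comparing against end-to-end learned methods for example, we would use the low_delay_P reference configuration. SHM ships no low_delay_P configuration for YUV 4:4:4, so one has to be derived from the provided RExt random access configuration. In practice this only means changing the profile entries at the top of the file for the two layers. The low_delay_P configuration I use to encode YUV 4:4:4 is as follows: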

#======== File I/O =====================
BitstreamFile : str.bin
#ReconFile : rec.yuv

#======== Profile ================
NumProfileTierLevel : 3
Profile0 : main_444 # Profile for BL (NOTE01: this profile applies to whole layers but only BL is outputted)
# (NOTE02: this profile has no effect when NonHEVCBase is set to 1)
Profile1 : main_444 # Profile for BL (NOTE01: this profile applies to HEVC BL only)
# (NOTE02: When NonHEVCBase is set to 1, this profile & associated level should be updated appropriately)
Profile2 : scalable-main_444 # Scalable profile

#======== Unit definition ================
#MaxCUWidth : 64 # Maximum coding unit width in pixel
#MaxCUHeight : 64 # Maximum coding unit height in pixel
#MaxPartitionDepth : 4 # Maximum coding unit depth
#QuadtreeTULog2MaxSize : 5 # Log2 of maximum transform size for
# quadtree-based TU coding (2...6)
#QuadtreeTULog2MinSize : 2 # Log2 of minimum transform size for
# quadtree-based TU coding (2...6)
#QuadtreeTUMaxDepthInter : 3
#QuadtreeTUMaxDepthIntra : 3

#======== Coding Structure =============
#IntraPeriod : -1 # Period of I-Frame ( -1 = only first)
DecodingRefreshType : 0 # Random Accesss 0:none, 1:CRA, 2:IDR, 3:Recovery Point SEI
GOPSize : 4 # GOP Size (number of B slice = GOPSize-1)
# Type POC QPoffset CbQPoffset CrQPoffset QPfactor tcOffsetDiv2 betaOffsetDiv2 temporal_id #ref_pics_active #ref_pics reference pictures predict deltaRPS #ref_idcs reference idcs
Frame1: P 1 3 0 0 0.4624 0 0 0 4 4 -1 -5 -9 -13 0
Frame2: P 2 2 0 0 0.4624 0 0 0 4 4 -1 -2 -6 -10 1 -1 5 1 1 1 0 1
Frame3: P 3 3 0 0 0.4624 0 0 0 4 4 -1 -3 -7 -11 1 -1 5 0 1 1 1 1
Frame4: P 4 1 0 0 0.578 0 0 0 4 4 -1 -4 -8 -12 1 -1 5 0 1 1 1 1

#=========== Motion Search =============
FastSearch : 1 # 0:Full search 1:TZ search
SearchRange : 64 # (0: Search range is a Full frame)
BipredSearchRange : 4 # Search range for bi-prediction refinement
HadamardME : 1 # Use of hadamard measure for fractional ME
FEN : 1 # Fast encoder decision
FDM : 1 # Fast Decision for Merge RD cost

...

The complete files are provided later.
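The profile change described above can be sketched as a small text rewrite. This is only an illustration: set_profiles is a hypothetical helper, and the starting values ("main", "scalable-main") in the sample text are placeholders, not the actual contents of the shipped configuration file. The target values are the ones from the listing above.

```python
import re

# Matches lines such as "Profile0 : main   # comment" and captures the parts around the value.
PROFILE_LINE = re.compile(
    r"^(?P<key>Profile\d+)(?P<sep>[ \t]*:[ \t]*)\S+(?P<rest>.*)$", re.MULTILINE
)


def set_profiles(cfg_text, profiles):
    """Replace the value of each ProfileN entry listed in `profiles`, keeping comments."""
    def repl(match):
        key = match.group("key")
        if key not in profiles:
            return match.group(0)  # leave entries we were not asked to change
        return f"{key}{match.group('sep')}{profiles[key]}{match.group('rest')}"

    return PROFILE_LINE.sub(repl, cfg_text)


# Placeholder input; the real file has more entries and different comments.
sample = "NumProfileTierLevel : 3\nProfile0 : main # BL\nProfile2 : scalable-main\n"
patched = set_profiles(
    sample,
    {"Profile0": "main_444", "Profile1": "main_444", "Profile2": "scalable-main_444"},
)
print(patched, end="")
```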

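Per-sequence configuration file

I named this one test.cfg. It is the configuration file for one specific sequence and mainly supplies the input file paths, the resolution, the frame rate and other control information.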

FrameSkip                     : 0           # Number of frames to be skipped in input
FramesToBeEncoded : 36 # Number of frames to be coded

Level0 : 4 # Level of the whole bitstream
Level1 : 4 # Level of the base layer
Level2 : 4 # Level of the enhancement layer

#======== File I/O ===============
InputFile0 : E:\dataset\HEVC_yuv444_ds\ClassD\BasketballPass_208x120_50.yuv
FrameRate0 : 50 # Frame Rate per second
InputBitDepth0 : 8 # Input bitdepth for layer 0
InputChromaFormat : 444 # Ratio of luminance to chrominance samples
SourceWidth0 : 208 # Input frame width
SourceHeight0 : 120 # Input frame height
RepFormatIdx0 : 0 # Index of corresponding rep_format() in the VPS
IntraPeriod0 : 12 # Period of I-Frame ( -1 = only first)
ConformanceMode0 : 1 # conformance mode
QP0 : 22
LayerPTLIndex0 : 1

InputFile1 : E:\dataset\HEVC_yuv444\ClassD\BasketballPass_416x240_50.yuv
FrameRate1 : 50 # Frame Rate per second
InputBitDepth1 : 8 # Input bitdepth for layer 1
InputChromaFormat : 444 # Ratio of luminance to chrominance samples
SourceWidth1 : 416 # Input frame width
SourceHeight1 : 240 # Input frame height
RepFormatIdx1 : 1 # Index of corresponding rep_format() in the VPS
IntraPeriod1 : 12 # Period of I-Frame ( -1 = only first)
ConformanceMode1 : 1 # conformance mode
QP1 : 22
LayerPTLIndex1 : 2
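Because the resolution, chroma format and bit depth in this file have to match the raw input, a quick size check can catch mismatches before encoding. The sketch below assumes planar raw YUV with even dimensions and 2 bytes per sample above 8 bits; the function names are invented for this example.

```python
# Chroma samples per luma sample (both chroma planes together) for planar YUV.
CHROMA_FACTOR = {"400": 0.0, "420": 0.5, "422": 1.0, "444": 2.0}


def frame_bytes(width, height, chroma_format="444", bit_depth=8):
    """Size in bytes of one raw planar YUV frame (assumes even dimensions)."""
    bytes_per_sample = 1 if bit_depth <= 8 else 2
    samples = width * height * (1.0 + CHROMA_FACTOR[chroma_format])
    return int(samples) * bytes_per_sample


def frames_in_file(file_size, width, height, chroma_format="444", bit_depth=8):
    """Number of whole frames in a raw file; raise if the size does not divide evenly."""
    size = frame_bytes(width, height, chroma_format, bit_depth)
    if file_size % size != 0:
        raise ValueError("file size is not a whole number of frames; check the settings")
    return file_size // size


# The two layers from test.cfg: 208x120 and 416x240, both 8-bit 4:4:4.
print(frame_bytes(208, 120), frame_bytes(416, 240))
```

In practice file_size would come from os.path.getsize on the input file, and the resulting frame count should be at least FramesToBeEncoded plus FrameSkip.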

Layers configuration

This configuration file has no counterpart in HM; it mainly holds the scalable settings.

NumLayers                     : 2
NonHEVCBase : 0
ScalabilityMask1 : 0 # Multiview
ScalabilityMask2 : 1 # Scalable
ScalabilityMask3 : 0 # Auxiliary pictures
AdaptiveResolutionChange : 0 # Resolution change frame (0: disable)
SkipPictureAtArcSwitch : 0 # Code higher layer picture as skip at ARC switching (0: disable (default), 1: enable)
MaxTidRefPresentFlag : 1 # max_tid_ref_present_flag (0=not present, 1=present(default))
CrossLayerPictureTypeAlignFlag: 1 # Picture type alignment across layers
CrossLayerIrapAlignFlag : 1 # Align IRAP across layers
SEIDecodedPictureHash : 1

#============= LAYER 0 ==================
QP0 : 22
MaxTidIlRefPicsPlus10 : 2 # max_tid_il_ref_pics_plus1 for layer0
#============ Rate Control ==============
RateControl0 : 0 # Rate control: enable rate control for layer 0
TargetBitrate0 : 1000000 # Rate control: target bitrate for layer 0, in bps
KeepHierarchicalBit0 : 1 # Rate control: keep hierarchical bit allocation for layer 0 in rate control algorithm
LCULevelRateControl0 : 1 # Rate control: 1: LCU level RC for layer 0; 0: picture level RC for layer 0
RCLCUSeparateModel0 : 1 # Rate control: use LCU level separate R-lambda model for layer 0
InitialQP0 : 0 # Rate control: initial QP for layer 0
RCForceIntraQP0 : 0 # Rate control: force intra QP to be equal to initial QP for layer 0

#============ WaveFront ================
WaveFrontSynchro0 : 0 # 0: No WaveFront synchronisation (WaveFrontSubstreams must be 1 in this case).
# >0: WaveFront synchronises with the LCU above and to the right by this many LCUs.
#=========== Quantization Matrix =================
ScalingList0 : 0 # ScalingList 0 : off, 1 : default, 2 : file read
ScalingListFile0 : scaling_list0.txt # Scaling List file name. If file is not exist, use Default Matrix.

#============= LAYER 1 ==================
QP1 : 20
NumSamplePredRefLayers1 : 1 # number of sample pred reference layers
SamplePredRefLayerIds1 : 0 # reference layer id
NumMotionPredRefLayers1 : 1 # number of motion pred reference layers
MotionPredRefLayerIds1 : 0 # reference layer id
NumActiveRefLayers1 : 1 # number of active reference layers
PredLayerIds1 : 0 # inter-layer prediction layer index within available reference layers

#============ Rate Control ==============
RateControl1 : 0 # Rate control: enable rate control for layer 1
TargetBitrate1 : 1000000 # Rate control: target bitrate for layer 1, in bps
KeepHierarchicalBit1 : 1 # Rate control: keep hierarchical bit allocation for layer 1 in rate control algorithm
LCULevelRateControl1 : 1 # Rate control: 1: LCU level RC for layer 1; 0: picture level RC for layer 1
RCLCUSeparateModel1 : 1 # Rate control: use LCU level separate R-lambda model for layer 1
InitialQP1 : 0 # Rate control: initial QP for layer 1
RCForceIntraQP1 : 0 # Rate control: force intra QP to be equal to initial QP for layer 1

#============ WaveFront ================
WaveFrontSynchro1 : 0 # 0: No WaveFront synchronisation (WaveFrontSubstreams must be 1 in this case).
# >0: WaveFront synchronises with the LCU above and to the right by this many LCUs.
#=========== Quantization Matrix =================
ScalingList1 : 0 # ScalingList 0 : off, 1 : default, 2 : file read
ScalingListFile1 : scaling_list1.txt # Scaling List file name. If file is not exist, use Default Matrix.

NumLayerSets : 2 # Include default layer set, value of 0 not allowed
NumLayerInIdList1 : 2 # 0-th layer set is default, need not specify LayerSetLayerIdList0 or NumLayerInIdList0
LayerSetLayerIdList1 : 0 1

NumAddLayerSets : 0
NumOutputLayerSets : 2 # Include defualt OLS, value of 0 not allowed
DefaultTargetOutputLayerIdc : 1
NumOutputLayersInOutputLayerSet : 1 # The number of layers in the 0-th OLS should not be specified,
# ListOfOutputLayers0 need not be specified
ListOfOutputLayers1 : 1
ListOfProfileTierLevelOls1 : 1 2

Encoding command

A test encode can be run with a command like this:

SHMEnc -c my_cfg/test/low_delay_P_scalable.cfg -c my_cfg/test/test.cfg -c my_cfg/test/layers.cfg -q0 22 -q1 22 -b str/test.bin -o0 rec/test.yuv -o1 rec/test1.yuv

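Here SHMEnc is an alias I set up; in practice, replace it with the TAppEncoderStatic binary you built. The three files following -c are the configuration files, with later ones overriding earlier ones. -q0 and -q1 override the QP of the base layer and of the enhancement layer.
If you want to override configuration-file parameters from the command line, it is best to run TAppEncoderStatic -help and look up the corresponding short -x option. The long --X options did not seem to take effect for me, so prefer parameters that have a short -x form.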
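When sweeping several QPs or sequences, it can help to assemble the command programmatically. The sketch below only uses the options that appear in the command above (-c, -q0, -q1, -b, -o0, -o1); build_shm_command is a hypothetical helper, and the encoder name assumes TAppEncoderStatic is on your PATH.

```python
import shlex


def build_shm_command(encoder, cfg_files, qp_base, qp_enh, bitstream, rec_base, rec_enh):
    """Assemble the argument list for a two-layer SHM encode."""
    cmd = [encoder]
    # Each configuration file gets its own -c; later files override earlier ones.
    for cfg in cfg_files:
        cmd += ["-c", cfg]
    # -q0/-q1 override the per-layer QP, -b sets the bitstream, -o0/-o1 the reconstructions.
    cmd += ["-q0", str(qp_base), "-q1", str(qp_enh)]
    cmd += ["-b", bitstream, "-o0", rec_base, "-o1", rec_enh]
    return cmd


cmd = build_shm_command(
    "TAppEncoderStatic",
    ["my_cfg/test/low_delay_P_scalable.cfg", "my_cfg/test/test.cfg", "my_cfg/test/layers.cfg"],
    22, 22, "str/test.bin", "rec/test.yuv", "rec/test1.yuv",
)
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) to launch the encode; it is only printed here so the example has no side effects.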

Results

The command above produces the following output:


SHM software: Encoder Version [12.4 (HM-16.10)][Linux][GCC 12.1.1][64 bit]

Default OLS defined. Ignoring ListOfOutputLayers1
Warning: Level0 is set the same as Level1
Warning: Level0 is set the same as Level1

Total number of layers : 2
Multiview : 0
Scalable : 1
Base layer : HEVC
Auxiliary pictures : 0
Adaptive Resolution Change : 0
Skip picture at ARC switch : 0
Align picture type : 1
Cross layer IRAP alignment : 1
IDR only for IRAP : 1
InterLayerWeightedPred : 0

=== Layer 0 settings ===
Input File : /home/esakak/dataset/HEVC_yuv444_ds/ClassD/BasketballPass_208x120_50.yuv
Reconstruction File : /home/esakak/dataset/HEVC_yuv444_compress/ClassD/BasketballPass_416x240_50/22_BL.yuv
Real Format : 208x120 50Hz
Internal Format : 208x120 50Hz
PTL index : 1
Profile : main-RExt (main_444)
CU size / depth / total-depth : 64 / 4 / 4
RQT trans. size (min / max) : 4 / 32
Max RQT depth inter : 3
Max RQT depth intra : 3
Intra period : 12
QP : 22.00
Max dQP signaling depth : 0
Input bit depth : (Y:8, C:8)
MSB-extended bit depth : (Y:8, C:8)
Internal bit depth : (Y:8, C:8)
PCM sample bit depth : (Y:8, C:8)
Input ChromaFormatIDC : 4:4:4
Output (internal) ChromaFormatIDC : 4:4:4

RateControl : 0

=== Layer 1 settings ===
Input File : /home/esakak/dataset/HEVC_yuv444/ClassD/BasketballPass_416x240_50.yuv
Reconstruction File : /home/esakak/dataset/HEVC_yuv444_compress/ClassD/BasketballPass_416x240_50/22_EL.yuv
Real Format : 416x240 50Hz
Internal Format : 416x240 50Hz
PTL index : 2
Profile : scalable-RExt
CU size / depth / total-depth : 64 / 4 / 4
RQT trans. size (min / max) : 4 / 32
Max RQT depth inter : 3
Max RQT depth intra : 3
Intra period : 12
QP : 22.00
Max dQP signaling depth : 0
Input bit depth : (Y:8, C:8)
MSB-extended bit depth : (Y:8, C:8)
Internal bit depth : (Y:8, C:8)
PCM sample bit depth : (Y:8, C:8)
Input ChromaFormatIDC : 4:4:4
Output (internal) ChromaFormatIDC : 4:4:4

RateControl : 0

=== Common configuration settings ===
Bitstream File : /home/esakak/dataset/HEVC_yuv444_compress/ClassD/BasketballPass_416x240_50/22.bin
Sequence PSNR output : Linear average only
Sequence MSE output : Disabled
Frame MSE output : Disabled
Cabac-zero-word-padding : Enabled
Frame/Field : Frame based coding
Frame index : 0 - 35 (36 frames)
Min PCM size : 8
Motion search range : 64
Decoding refresh type : 0
Cb QP Offset : 0
Cr QP Offset : 0
QP adaptation : 0 (range=0)
GOP size : 4
Intra reference smoothing : Enabled
diff_cu_chroma_qp_offset_depth : -1
extended_precision_processing_flag : Disabled
implicit_rdpcm_enabled_flag : Disabled
explicit_rdpcm_enabled_flag : Disabled
transform_skip_rotation_enabled_flag : Disabled
transform_skip_context_enabled_flag : Disabled
cross_component_prediction_enabled_flag: Disabled
high_precision_offsets_enabled_flag : Disabled
persistent_rice_adaptation_enabled_flag: Disabled
cabac_bypass_alignment_enabled_flag : Disabled
log2_sao_offset_scale_luma : 0
log2_sao_offset_scale_chroma : 0
Cost function: : Lossy coding (default)
WPMethod : 0
Max Num Merge Candidates : 5

Layer0 TOOL CFG: IBD:0 HAD:1 RDQ:1 RDQTS:1 RDpenalty:0 SQP:0 ASR:0 MinSearchWindow:8 RestrictMESampling:0 FEN:1 ECU:0 FDM:1 CFM:0 ESD:0 RQT:1 TransformSkip:1 TransformSkipFast:1 TransformSkipLog2MaxSize:2 Slice: M=0 SliceSegment: M=0 CIP:0 SAO:1 PCM:0 TransQuantBypassEnabled:0 WPP:0 WPB:0 PME:2 WaveFrontSynchro:0 WaveFrontSubstreams:1 ScalingList:0 TMVPMode:1 AQpS:0 SignBitHidingFlag:1 RecalQP:0

Layer1 TOOL CFG: IBD:0 HAD:1 RDQ:1 RDQTS:1 RDpenalty:0 SQP:0 ASR:0 MinSearchWindow:8 RestrictMESampling:0 FEN:1 ECU:0 FDM:1 CFM:0 ESD:0 RQT:1 TransformSkip:1 TransformSkipFast:1 TransformSkipLog2MaxSize:2 Slice: M=0 SliceSegment: M=0 CIP:0 SAO:1 PCM:0 TransQuantBypassEnabled:0 WPP:0 WPB:0 PME:2 WaveFrontSynchro:0 WaveFrontSubstreams:1 ScalingList:0 TMVPMode:1 AQpS:0 SignBitHidingFlag:1 RecalQP:0

SHVC TOOL CFG: ElRapSliceType: P-slice REF_IDX_ME_ZEROMV: 1 ENCODER_FAST_MODE: 1 FIS:0 CGS: 0 CGSMaxOctantDepth: 1 CGSMaxYPartNumLog2: 2 CGSLUTBit:12 CGSAdaptC:1

Non-environment-variable-controlled macros set as follows:

RExt__DECODER_DEBUG_BIT_STATISTICS = 0
RExt__HIGH_BIT_DEPTH_SUPPORT = 0
RExt__HIGH_PRECISION_FORWARD_TRANSFORM = 0
O0043_BEST_EFFORT_DECODING = 0
ME_ENABLE_ROUNDING_OF_MVS = 1
U0040_MODIFIED_WEIGHTEDPREDICTION_WITH_BIPRED_AND_CLIPPING = 1

POC 0 LId: 0 TId: 0 ( I-SLICE IDR_W_RADL, nQP 22 QP 22 ) 45520 bits [Y 42.6642 dB U 45.4023 dB V 45.3845 dB] [ET 0 ] [L0 ] [L1 ] [MD5:fe712c2b61f71e4b5dc7376da4ef6f0a,7c1dbc6a4e43bd5e47846dbc8e8776a9,81d9c108343a39760ba39bd565157ee8]
POC 0 LId: 1 TId: 0 ( P-SLICE IDR_W_RADL, nQP 22 QP 22 ) 130704 bits [Y 42.6773 dB U 46.9517 dB V 47.1605 dB] [ET 1 ] [L0 0(0, {2.00, 2.00}x) ] [L1 ] [MD5:df1ab9ec67d7e69b9dab57f283e136f1,4f94b740807d08b8d2445754bb163b02,73fceea56d5db7d61001e811819c2a66]
...
POC 35 LId: 1 TId: 0 ( P-SLICE TRAIL_R, nQP 25 QP 25 ) 18768 bits [Y 40.5750 dB U 45.5604 dB V 45.1281 dB] [ET 1 ] [L0 34 32 28 24 35(0, {2.00, 2.00}x)c ] [L1 ] [MD5:397a66f8e9a9abafd5c43ae10d15a422,55dc4d7b3f844663396ee9ea69537ca1,07e2c71f64b0af6167fd121f2c37b9e3]


SUMMARY --------------------------------------------------------
Total Frames | Bitrate Y-PSNR U-PSNR V-PSNR YUV-PSNR
L0 36 a 423.0444 41.2625 44.8498 44.6139 43.2006
L1 36 a 1242.2556 41.3297 46.0205 45.9016 43.7850


I Slices--------------------------------------------------------
Total Frames | Bitrate Y-PSNR U-PSNR V-PSNR YUV-PSNR
L0 3 i 2148.9333 42.7306 45.5759 45.6468 44.4244
L1 0 i -nan -nan -nan -nan -nan


P Slices--------------------------------------------------------
Total Frames | Bitrate Y-PSNR U-PSNR V-PSNR YUV-PSNR
L0 33 p 266.1455 41.1290 44.7838 44.5200 43.1047
L1 36 p 1242.2556 41.3297 46.0205 45.9016 43.7850


B Slices--------------------------------------------------------
Total Frames | Bitrate Y-PSNR U-PSNR V-PSNR YUV-PSNR
L0 0 b -nan -nan -nan -nan -nan
L1 0 b -nan -nan -nan -nan -nan

RVM[L0]: 0.000
RVM[L1]: 0.000

Bytes written to file: 154292 (1714.356 kbps)

Total Time: 39.823 sec.

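There is actually a problem here: there is a sizeable gap between the L0 and L1 bitrates and the bitrate actually written to the file (1714.356 kbps), and I have not yet figured out why.
In addition, I measured the average RD performance on HEVC Class D with QP = [42, 37, 32, 27], and the results were implausibly good compared with HM. I do not yet know whether my PSNR and bpp calculations are wrong or whether the configuration files are.
These points still need to be investigated.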
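To put a number on the bitrate gap mentioned above, the file-level figure can be reproduced from the log values (154292 bytes, 36 frames, 50 fps) and compared with the sum of the two per-layer bitrates. This is just arithmetic on the numbers printed in the log; it does not explain where the difference comes from. One untested guess is that data written to the file but not attributed to either layer in the summary, such as parameter sets or SEI messages, accounts for part of it.

```python
def bitrate_kbps(num_bytes, num_frames, frame_rate):
    """Average bitrate in kbps of num_bytes spread over num_frames at frame_rate fps."""
    duration_s = num_frames / frame_rate
    return num_bytes * 8 / duration_s / 1000.0


# Values copied from the encoder log above.
file_kbps = bitrate_kbps(154292, 36, 50)
layer_sum_kbps = 423.0444 + 1242.2556  # L0 + L1 from the SUMMARY table
gap_kbps = file_kbps - layer_sum_kbps

print(f"file: {file_kbps:.3f} kbps, layers: {layer_sum_kbps:.3f} kbps, gap: {gap_kbps:.3f} kbps")
```

With these inputs the file bitrate matches the 1714.356 kbps in the log, and the gap comes out at roughly 49 kbps.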

SHM test on ClassD

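The figure here shows the comparison with DCVC. The gap is very large, and for a while I doubted my own results. But if they were real, would reaching this level with an end-to-end approach be too hard? I started to worry whether, after graduating, there would be ...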

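Update: the problem turned out to be in the PSNR calculation. The images must be converted to a floating-point type before computing the MSE and then the PSNR; otherwise integer rounding makes the PSNR come out too high.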
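A minimal sketch of the floating-point PSNR computation follows, in plain Python so it runs without extra packages. The second function emulates one way integer arithmetic can go wrong, namely doing the subtraction and squaring in unsigned 8-bit arithmetic so that large errors wrap around. Whether this is exactly what happened in my original evaluation is an assumption; the function names and sample values are made up for the illustration.

```python
import math


def psnr_float(ref, rec, max_val=255.0):
    """PSNR of two equal-length sequences of 8-bit samples, computed in floating point."""
    if len(ref) != len(rec):
        raise ValueError("inputs must have the same number of samples")
    mse = sum((float(a) - float(b)) ** 2 for a, b in zip(ref, rec)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)


def psnr_uint8_wraparound(ref, rec, max_val=255.0):
    """Same formula, but difference and square are truncated to 8 bits (emulated pitfall)."""
    mse = sum((((a - b) & 0xFF) ** 2) & 0xFF for a, b in zip(ref, rec)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)


# Toy samples: errors of 0, -1, -20 and +20 against a flat reference.
ref = bytes([100, 100, 100, 100])
rec = bytes([100, 101, 120, 80])
good = psnr_float(ref, rec)
bad = psnr_uint8_wraparound(ref, rec)
print(f"float: {good:.3f} dB, emulated 8-bit: {bad:.3f} dB")
```

On this toy input the emulated 8-bit version reports a higher PSNR than the floating-point one, because squared errors of 256 or more lose their upper bits.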

Appendix

low_delay_P_scalable.cfg
test.cfg
layers.cfg