SHM | EsakaK的微语世界

Preface

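Once you fall into the video-coding rabbit hole, there is no avoiding the test software of the current state-of-the-art conventional codecs. Unlike most people, my first attempt happened to start with Scalable Video Coding. This post records how I installed and used SHM-12.4 (HEVC Scalable Extension).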

Installation

Download

The SVN repository linked here hosts every released version of SHM.

All SHM versions

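I downloaded the latest version, SHM-12.4. Note that the SHM code is managed with svn and cannot be downloaded directly; you need an svn client to check out the repository. Any svn tool will do, and they are easy to find with a quick search.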

Compilation

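I compiled on a machine running the Manjaro Linux distribution, where running make from the root folder is all it takes. One thing to note: when building SHM or HM on Linux, you need to go into the linux folder and remove the -Werror option from makefile.base in its common subfolder, then run make from a terminal under linux. Otherwise compiler warnings are treated as errors and abort the build.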
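The -Werror removal described above can also be scripted. The following is a minimal sketch, not part of SHM: the helper names are made up for illustration, and the path `build/linux/common/makefile.base` is an assumption based on the folder description above, so adjust it to your checkout.

```python
import re
from pathlib import Path


def strip_werror(makefile_text: str) -> str:
    """Return the makefile text with every standalone -Werror flag removed."""
    # Match -Werror only as a whole token, so that flags such as
    # -Werror=format are left untouched.
    return re.sub(r"(?<!\S)-Werror(?!\S)[ \t]*", "", makefile_text)


def patch_makefile(path: Path) -> bool:
    """Rewrite the file in place and report whether anything changed."""
    original = path.read_text()
    patched = strip_werror(original)
    if patched != original:
        path.write_text(patched)
    return patched != original


if __name__ == "__main__":
    # Assumed location relative to the SHM root folder.
    target = Path("build/linux/common/makefile.base")
    if target.exists():
        print("patched" if patch_makefile(target) else "nothing to change")
```

Run it once from the SHM root before calling make. Editing the file by hand works just as well.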

I also tried building on Windows: open the vc2017.sln solution in the build folder with Visual Studio 2019, then build it like any other C++ project.

Usage

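Adding the binaries to your environment variables, or setting an alias, makes the encoder and decoder more convenient to call.
This section mainly covers the commands. Encoding with SHM differs slightly from encoding with HM.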

Main configuration file

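In HM, when comparing performance against end-to-end learned networks, for example, we use the low_delay_p reference configuration. SHM does not provide a low_delay_p configuration for yuv444, so it has to be derived from the provided RExt random access configuration file. In practice this comes down to changing the profile entries for the two layers at the top of the file. The low_delay_p configuration I use for encoding yuv444 is as follows: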

#======== File I/O =====================
BitstreamFile                 : str.bin
#ReconFile                     : rec.yuv

#======== Profile ================
NumProfileTierLevel           : 3
Profile0                      : main_444  	          # Profile for BL (NOTE01: this profile applies to whole layers but only BL is outputted)
                                                      #                (NOTE02: this profile has no effect when NonHEVCBase is set to 1)
Profile1                      : main_444              # Profile for BL (NOTE01: this profile applies to HEVC BL only)
                                                      #                (NOTE02: When NonHEVCBase is set to 1, this profile & associated level should be updated appropriately)
Profile2                      : scalable-main_444     # Scalable profile

#======== Unit definition ================
#MaxCUWidth                    : 64          # Maximum coding unit width in pixel
#MaxCUHeight                   : 64          # Maximum coding unit height in pixel
#MaxPartitionDepth             : 4           # Maximum coding unit depth
#QuadtreeTULog2MaxSize         : 5           # Log2 of maximum transform size for
                                            # quadtree-based TU coding (2...6)
#QuadtreeTULog2MinSize         : 2           # Log2 of minimum transform size for
                                            # quadtree-based TU coding (2...6)
#QuadtreeTUMaxDepthInter       : 3
#QuadtreeTUMaxDepthIntra       : 3

#======== Coding Structure =============
#IntraPeriod                   : -1          # Period of I-Frame ( -1 = only first)
DecodingRefreshType           : 0           # Random Access 0:none, 1:CRA, 2:IDR, 3:Recovery Point SEI
GOPSize                       : 4           # GOP Size (number of B slice = GOPSize-1)
#        Type POC QPoffset CbQPoffset CrQPoffset QPfactor tcOffsetDiv2 betaOffsetDiv2  temporal_id #ref_pics_active #ref_pics reference pictures predict deltaRPS #ref_idcs reference idcs
Frame1:  P    1   3        0          0          0.4624   0            0               0           4                4         -1 -5 -9 -13       0
Frame2:  P    2   2        0          0          0.4624   0            0               0           4                4         -1 -2 -6 -10       1      -1       5         1 1 1 0 1
Frame3:  P    3   3        0          0          0.4624   0            0               0           4                4         -1 -3 -7 -11       1      -1       5         0 1 1 1 1
Frame4:  P    4   1        0          0          0.578    0            0               0           4                4         -1 -4 -8 -12       1      -1       5         0 1 1 1 1

#=========== Motion Search =============
FastSearch                    : 1           # 0:Full search  1:TZ search
SearchRange                   : 64         # (0: Search range is a Full frame)
BipredSearchRange             : 4           # Search range for bi-prediction refinement
HadamardME                    : 1           # Use of hadamard measure for fractional ME
FEN                           : 1           # Fast encoder decision
FDM                           : 1           # Fast Decision for Merge RD cost

...

The complete file will be provided later.

Per-sequence configuration file

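This file, which I named test.cfg, holds the settings for one specific sequence. It mainly provides control information such as the input file paths, resolution and frame rate.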

FrameSkip                     : 0           # Number of frames to be skipped in input
FramesToBeEncoded             : 36           # Number of frames to be coded

Level0                        : 4           # Level of the whole bitstream
Level1                        : 4           # Level of the base layer
Level2                        : 4           # Level of the enhancement layer

#======== File I/O ===============
InputFile0                    : E:\dataset\HEVC_yuv444_ds\ClassD\BasketballPass_208x120_50.yuv
FrameRate0                    : 50          # Frame Rate per second
InputBitDepth0                : 8           # Input bitdepth for layer 0
InputChromaFormat			  : 444			# Ratio of luminance to chrominance samples
SourceWidth0                  : 208         # Input  frame width
SourceHeight0                 : 120         # Input  frame height
RepFormatIdx0                 : 0           # Index of corresponding rep_format() in the VPS
IntraPeriod0                  : 12          # Period of I-Frame ( -1 = only first)
ConformanceMode0              : 1           # conformance mode
QP0                           : 22
LayerPTLIndex0                : 1

InputFile1                    : E:\dataset\HEVC_yuv444\ClassD\BasketballPass_416x240_50.yuv
FrameRate1                    : 50          # Frame Rate per second
InputBitDepth1                : 8           # Input bitdepth for layer 1
InputChromaFormat			  : 444 		# Ratio of luminance to chrominance samples
SourceWidth1                  : 416        # Input  frame width
SourceHeight1                 : 240        # Input  frame height
RepFormatIdx1                 : 1           # Index of corresponding rep_format() in the VPS
IntraPeriod1                  : 12          # Period of I-Frame ( -1 = only first)
ConformanceMode1              : 1           # conformance mode
QP1                           : 22
LayerPTLIndex1                : 2
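Since the two layer blocks above differ only in the layer index and a few values, they can be generated rather than typed by hand. The sketch below is a hypothetical helper, not something shipped with SHM. It uses only the keys that appear in the listing above and assumes, as in that listing, that RepFormatIdx equals the layer index and LayerPTLIndex is the layer index plus one. InputChromaFormat carries no layer index in the listing, so it is left out and still has to be written by hand.

```python
def layer_block(layer, input_file, width, height, fps, qp,
                bit_depth=8, intra_period=12, ptl_index=None):
    """Render the per-layer section of a test.cfg-style file as text."""
    if ptl_index is None:
        # Assumption taken from the listing above: layer 0 -> 1, layer 1 -> 2.
        ptl_index = layer + 1
    entries = [
        (f"InputFile{layer}", input_file),
        (f"FrameRate{layer}", fps),
        (f"InputBitDepth{layer}", bit_depth),
        (f"SourceWidth{layer}", width),
        (f"SourceHeight{layer}", height),
        (f"RepFormatIdx{layer}", layer),
        (f"IntraPeriod{layer}", intra_period),
        (f"ConformanceMode{layer}", 1),
        (f"QP{layer}", qp),
        (f"LayerPTLIndex{layer}", ptl_index),
    ]
    # Pad each key to 30 characters so the colons line up as in the listing.
    return "\n".join(f"{key:<30}: {value}" for key, value in entries)


if __name__ == "__main__":
    # Hypothetical relative paths; substitute your own dataset locations.
    print(layer_block(0, "BasketballPass_208x120_50.yuv", 208, 120, 50, 22))
    print()
    print(layer_block(1, "BasketballPass_416x240_50.yuv", 416, 240, 50, 22))
```

Generating the blocks this way keeps the base-layer and enhancement-layer settings in sync when looping over many sequences.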

Layers configuration

This configuration file has no counterpart in HM. It mainly holds the scalability settings.

NumLayers                     : 2
NonHEVCBase                   : 0
ScalabilityMask1              : 0           # Multiview
ScalabilityMask2              : 1           # Scalable
ScalabilityMask3              : 0           # Auxiliary pictures
AdaptiveResolutionChange      : 0           # Resolution change frame (0: disable)
SkipPictureAtArcSwitch        : 0           # Code higher layer picture as skip at ARC switching (0: disable (default), 1: enable)
MaxTidRefPresentFlag          : 1           # max_tid_ref_present_flag (0=not present, 1=present(default))
CrossLayerPictureTypeAlignFlag: 1           # Picture type alignment across layers
CrossLayerIrapAlignFlag       : 1           # Align IRAP across layers
SEIDecodedPictureHash         : 1

#============= LAYER 0 ==================
QP0                           : 22
MaxTidIlRefPicsPlus10         : 2           # max_tid_il_ref_pics_plus1 for layer0
#============ Rate Control ==============
RateControl0                  : 0           # Rate control: enable rate control for layer 0
TargetBitrate0                : 1000000     # Rate control: target bitrate for layer 0, in bps
KeepHierarchicalBit0          : 1           # Rate control: keep hierarchical bit allocation for layer 0 in rate control algorithm
LCULevelRateControl0          : 1           # Rate control: 1: LCU level RC for layer 0; 0: picture level RC for layer 0
RCLCUSeparateModel0           : 1           # Rate control: use LCU level separate R-lambda model for layer 0
InitialQP0                    : 0           # Rate control: initial QP for layer 0
RCForceIntraQP0               : 0           # Rate control: force intra QP to be equal to initial QP for layer 0

#============ WaveFront ================
WaveFrontSynchro0             : 0           # 0:  No WaveFront synchronisation (WaveFrontSubstreams must be 1 in this case).
                                            # >0: WaveFront synchronises with the LCU above and to the right by this many LCUs.
#=========== Quantization Matrix =================
ScalingList0                  : 0                      # ScalingList 0 : off, 1 : default, 2 : file read
ScalingListFile0              : scaling_list0.txt      # Scaling List file name. If file is not exist, use Default Matrix.
                                            
#============= LAYER 1 ==================
QP1                           : 20
NumSamplePredRefLayers1       : 1           # number of sample pred reference layers
SamplePredRefLayerIds1        : 0           # reference layer id
NumMotionPredRefLayers1       : 1           # number of motion pred reference layers
MotionPredRefLayerIds1        : 0           # reference layer id
NumActiveRefLayers1           : 1           # number of active reference layers
PredLayerIds1                 : 0           # inter-layer prediction layer index within available reference layers

#============ Rate Control ==============
RateControl1                  : 0           # Rate control: enable rate control for layer 1
TargetBitrate1                : 1000000     # Rate control: target bitrate for layer 1, in bps
KeepHierarchicalBit1          : 1           # Rate control: keep hierarchical bit allocation for layer 1 in rate control algorithm
LCULevelRateControl1          : 1           # Rate control: 1: LCU level RC for layer 1; 0: picture level RC for layer 1
RCLCUSeparateModel1           : 1           # Rate control: use LCU level separate R-lambda model for layer 1
InitialQP1                    : 0           # Rate control: initial QP for layer 1
RCForceIntraQP1               : 0           # Rate control: force intra QP to be equal to initial QP for layer 1

#============ WaveFront ================
WaveFrontSynchro1             : 0           # 0:  No WaveFront synchronisation (WaveFrontSubstreams must be 1 in this case).
                                            # >0: WaveFront synchronises with the LCU above and to the right by this many LCUs.
#=========== Quantization Matrix =================
ScalingList1                  : 0                      # ScalingList 0 : off, 1 : default, 2 : file read
ScalingListFile1              : scaling_list1.txt      # Scaling List file name. If file is not exist, use Default Matrix.
                                            
NumLayerSets                  : 2           # Include default layer set, value of 0 not allowed
NumLayerInIdList1             : 2           # 0-th layer set is default, need not specify LayerSetLayerIdList0 or NumLayerInIdList0
LayerSetLayerIdList1          : 0 1

NumAddLayerSets                      : 0
NumOutputLayerSets                   : 2           # Include default OLS, value of 0 not allowed
DefaultTargetOutputLayerIdc          : 1
NumOutputLayersInOutputLayerSet      : 1           # The number of layers in the 0-th OLS should not be specified, 
# ListOfOutputLayers0 need not be specified
ListOfOutputLayers1	                 : 1
ListOfProfileTierLevelOls1           : 1 2

Encoding command

A single test encode can be run with a command like this:

SHMEnc -c my_cfg/test/low_delay_P_scalable.cfg -c my_cfg/test/test.cfg -c my_cfg/test/layers.cfg -q0 22 -q1 22 -b str/test.bin -o0 rec/test.yuv -o1 rec/test1.yuv

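Here SHMEnc is an alias I set up; in practice, replace it with the compiled TAppEncoderStatic binary. The three files following -c are the configuration files, and later files override earlier ones. -q0 and -q1 override the QP of the base layer and the enhancement layer, respectively.
If you want to override config-file parameters on the command line, it is best to check TAppEncoderStatic -help for the corresponding short -x option. The long --X options did not seem to take effect for me, so prefer parameters that have a short -x form.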
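When sweeping over several QPs or sequences, it can help to assemble the command programmatically. The sketch below builds the same argument list as the command above, using only the short options that appear there (-c, -q0/-q1, -b, -o0/-o1). The function name is hypothetical, and the file paths are the illustrative ones from this post.

```python
import shlex


def build_shm_command(encoder, cfg_files, qps, bitstream, recon_files):
    """Assemble an SHM encoder command line as a list of arguments."""
    cmd = [encoder]
    # Later -c files override earlier ones, so the order matters.
    for cfg in cfg_files:
        cmd += ["-c", cfg]
    # One -q<layer> override per layer (0 = base, 1 = enhancement).
    for layer, qp in enumerate(qps):
        cmd += [f"-q{layer}", str(qp)]
    cmd += ["-b", bitstream]
    # One -o<layer> reconstruction file per layer.
    for layer, recon in enumerate(recon_files):
        cmd += [f"-o{layer}", recon]
    return cmd


if __name__ == "__main__":
    cmd = build_shm_command(
        "TAppEncoderStatic",
        ["my_cfg/test/low_delay_P_scalable.cfg",
         "my_cfg/test/test.cfg",
         "my_cfg/test/layers.cfg"],
        qps=[22, 22],
        bitstream="str/test.bin",
        recon_files=["rec/test.yuv", "rec/test1.yuv"],
    )
    print(shlex.join(cmd))
    # To actually run the encoder: subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.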

Results

The encoder output of such a run looks like this:


SHM software: Encoder Version [12.4 (HM-16.10)][Linux][GCC 12.1.1][64 bit] 

Default OLS defined. Ignoring ListOfOutputLayers1
Warning: Level0 is set the same as Level1
Warning: Level0 is set the same as Level1

Total number of layers            : 2
Multiview                         : 0
Scalable                          : 1
Base layer                        : HEVC
Auxiliary pictures                : 0
Adaptive Resolution Change        : 0
Skip picture at ARC switch        : 0
Align picture type                : 1
Cross layer IRAP alignment        : 1
IDR only for IRAP                 : 1
InterLayerWeightedPred            : 0

=== Layer 0 settings ===
Input          File                    : /home/esakak/dataset/HEVC_yuv444_ds/ClassD/BasketballPass_208x120_50.yuv
Reconstruction File                    : /home/esakak/dataset/HEVC_yuv444_compress/ClassD/BasketballPass_416x240_50/22_BL.yuv
Real     Format                        : 208x120 50Hz
Internal Format                        : 208x120 50Hz
PTL index                              : 1
Profile                                : main-RExt (main_444)
CU size / depth / total-depth          : 64 / 4 / 4
RQT trans. size (min / max)            : 4 / 32
Max RQT depth inter                    : 3
Max RQT depth intra                    : 3
Intra period                           : 12
QP                                     : 22.00
Max dQP signaling depth                : 0
Input bit depth                        : (Y:8, C:8)
MSB-extended bit depth                 : (Y:8, C:8)
Internal bit depth                     : (Y:8, C:8)
PCM sample bit depth                   : (Y:8, C:8)
Input ChromaFormatIDC                  : 4:4:4
Output (internal) ChromaFormatIDC      : 4:4:4

RateControl                            : 0

=== Layer 1 settings ===
Input          File                    : /home/esakak/dataset/HEVC_yuv444/ClassD/BasketballPass_416x240_50.yuv
Reconstruction File                    : /home/esakak/dataset/HEVC_yuv444_compress/ClassD/BasketballPass_416x240_50/22_EL.yuv
Real     Format                        : 416x240 50Hz
Internal Format                        : 416x240 50Hz
PTL index                              : 2
Profile                                : scalable-RExt
CU size / depth / total-depth          : 64 / 4 / 4
RQT trans. size (min / max)            : 4 / 32
Max RQT depth inter                    : 3
Max RQT depth intra                    : 3
Intra period                           : 12
QP                                     : 22.00
Max dQP signaling depth                : 0
Input bit depth                        : (Y:8, C:8)
MSB-extended bit depth                 : (Y:8, C:8)
Internal bit depth                     : (Y:8, C:8)
PCM sample bit depth                   : (Y:8, C:8)
Input ChromaFormatIDC                  : 4:4:4
Output (internal) ChromaFormatIDC      : 4:4:4

RateControl                            : 0

=== Common configuration settings === 
Bitstream      File                    : /home/esakak/dataset/HEVC_yuv444_compress/ClassD/BasketballPass_416x240_50/22.bin
Sequence PSNR output                   : Linear average only
Sequence MSE output                    : Disabled
Frame MSE output                       : Disabled
Cabac-zero-word-padding                : Enabled
Frame/Field                            : Frame based coding
Frame index                            : 0 - 35 (36 frames)
Min PCM size                           : 8
Motion search range                    : 64
Decoding refresh type                  : 0
Cb QP Offset                           : 0
Cr QP Offset                           : 0
QP adaptation                          : 0 (range=0)
GOP size                               : 4
Intra reference smoothing              : Enabled
diff_cu_chroma_qp_offset_depth         : -1
extended_precision_processing_flag     : Disabled
implicit_rdpcm_enabled_flag            : Disabled
explicit_rdpcm_enabled_flag            : Disabled
transform_skip_rotation_enabled_flag   : Disabled
transform_skip_context_enabled_flag    : Disabled
cross_component_prediction_enabled_flag: Disabled
high_precision_offsets_enabled_flag    : Disabled
persistent_rice_adaptation_enabled_flag: Disabled
cabac_bypass_alignment_enabled_flag    : Disabled
log2_sao_offset_scale_luma             : 0
log2_sao_offset_scale_chroma           : 0
Cost function:                         : Lossy coding (default)
WPMethod                               : 0
Max Num Merge Candidates               : 5

Layer0 TOOL CFG: IBD:0 HAD:1 RDQ:1 RDQTS:1 RDpenalty:0 SQP:0 ASR:0 MinSearchWindow:8 RestrictMESampling:0 FEN:1 ECU:0 FDM:1 CFM:0 ESD:0 RQT:1 TransformSkip:1 TransformSkipFast:1 TransformSkipLog2MaxSize:2 Slice: M=0 SliceSegment: M=0 CIP:0 SAO:1 PCM:0 TransQuantBypassEnabled:0 WPP:0 WPB:0 PME:2  WaveFrontSynchro:0 WaveFrontSubstreams:1 ScalingList:0 TMVPMode:1 AQpS:0 SignBitHidingFlag:1 RecalQP:0

Layer1 TOOL CFG: IBD:0 HAD:1 RDQ:1 RDQTS:1 RDpenalty:0 SQP:0 ASR:0 MinSearchWindow:8 RestrictMESampling:0 FEN:1 ECU:0 FDM:1 CFM:0 ESD:0 RQT:1 TransformSkip:1 TransformSkipFast:1 TransformSkipLog2MaxSize:2 Slice: M=0 SliceSegment: M=0 CIP:0 SAO:1 PCM:0 TransQuantBypassEnabled:0 WPP:0 WPB:0 PME:2  WaveFrontSynchro:0 WaveFrontSubstreams:1 ScalingList:0 TMVPMode:1 AQpS:0 SignBitHidingFlag:1 RecalQP:0

SHVC TOOL CFG: ElRapSliceType: P-slice REF_IDX_ME_ZEROMV: 1 ENCODER_FAST_MODE: 1 FIS:0 CGS: 0 CGSMaxOctantDepth: 1 CGSMaxYPartNumLog2: 2 CGSLUTBit:12 CGSAdaptC:1 

Non-environment-variable-controlled macros set as follows: 

                                RExt__DECODER_DEBUG_BIT_STATISTICS =   0
                                      RExt__HIGH_BIT_DEPTH_SUPPORT =   0
                            RExt__HIGH_PRECISION_FORWARD_TRANSFORM =   0
                                        O0043_BEST_EFFORT_DECODING =   0
                                         ME_ENABLE_ROUNDING_OF_MVS =   1
        U0040_MODIFIED_WEIGHTEDPREDICTION_WITH_BIPRED_AND_CLIPPING =   1

POC    0 LId: 0 TId: 0 ( I-SLICE IDR_W_RADL, nQP 22 QP 22 )      45520 bits [Y 42.6642 dB    U 45.4023 dB    V 45.3845 dB] [ET     0 ] [L0 ] [L1 ] [MD5:fe712c2b61f71e4b5dc7376da4ef6f0a,7c1dbc6a4e43bd5e47846dbc8e8776a9,81d9c108343a39760ba39bd565157ee8]
POC    0 LId: 1 TId: 0 ( P-SLICE IDR_W_RADL, nQP 22 QP 22 )     130704 bits [Y 42.6773 dB    U 46.9517 dB    V 47.1605 dB] [ET     1 ] [L0 0(0, {2.00, 2.00}x) ] [L1 ] [MD5:df1ab9ec67d7e69b9dab57f283e136f1,4f94b740807d08b8d2445754bb163b02,73fceea56d5db7d61001e811819c2a66]
...
POC   35 LId: 1 TId: 0 ( P-SLICE    TRAIL_R, nQP 25 QP 25 )      18768 bits [Y 40.5750 dB    U 45.5604 dB    V 45.1281 dB] [ET     1 ] [L0 34 32 28 24 35(0, {2.00, 2.00}x)c ] [L1 ] [MD5:397a66f8e9a9abafd5c43ae10d15a422,55dc4d7b3f844663396ee9ea69537ca1,07e2c71f64b0af6167fd121f2c37b9e3]


SUMMARY --------------------------------------------------------
	Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR 
  L0 	       36    a     423.0444   41.2625   44.8498   44.6139   43.2006
  L1 	       36    a    1242.2556   41.3297   46.0205   45.9016   43.7850


I Slices--------------------------------------------------------
	Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR 
  L0 	        3    i    2148.9333   42.7306   45.5759   45.6468   44.4244
  L1 	        0    i         -nan      -nan      -nan      -nan      -nan


P Slices--------------------------------------------------------
	Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR 
  L0 	       33    p     266.1455   41.1290   44.7838   44.5200   43.1047
  L1 	       36    p    1242.2556   41.3297   46.0205   45.9016   43.7850


B Slices--------------------------------------------------------
	Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR 
  L0 	        0    b         -nan      -nan      -nan      -nan      -nan
  L1 	        0    b         -nan      -nan      -nan      -nan      -nan

RVM[L0]: 0.000
RVM[L1]: 0.000

Bytes written to file: 154292 (1714.356 kbps)

 Total Time:       39.823 sec.

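There is a problem here: the L0 and L1 bitrates in the summary do not match the bitrate actually written to file (1714.356 kbps), and the gap is not small. I have not yet figured out the cause.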
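To put a number on that gap, the file-level bitrate can be recomputed from the log and compared with the sum of the two per-layer figures. This is only a sanity check on the values printed above, with a hypothetical helper name. It does not explain the gap. One candidate worth checking is data that is not counted in either layer's picture statistics (for instance parameter sets or SEI messages, since SEIDecodedPictureHash is enabled), but that is an unverified guess.

```python
def bitrate_kbps(num_bytes, num_frames, fps):
    """Average bitrate in kbps for num_frames frames played at fps."""
    duration_s = num_frames / fps
    return num_bytes * 8 / duration_s / 1000


# Values copied from the encoder log above.
total = bitrate_kbps(num_bytes=154292, num_frames=36, fps=50)
layer_sum = 423.0444 + 1242.2556  # L0 + L1 from the SUMMARY table

print(f"file-level bitrate : {total:.3f} kbps")
print(f"L0 + L1            : {layer_sum:.3f} kbps")
print(f"unaccounted        : {total - layer_sum:.3f} kbps "
      f"({100 * (total - layer_sum) / total:.1f}% of the file)")
```

The recomputed file-level figure agrees with the 1714.356 kbps reported by the encoder, so that number covers both layers. The per-layer sum is about 1665.3 kbps, which leaves roughly 49 kbps unexplained.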
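In addition, I measured the average RD performance on HEVC ClassD with QP = [42, 37, 32, 27], and compared with HM the results were implausibly good. I do not yet know whether my PSNR and bpp calculation is wrong or whether the configuration files are.
These points still need further investigation.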
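For reference, here is a minimal sketch of one way to compute bpp for the run above. The function names are hypothetical. It assumes the quantity of interest is the whole bitstream (both layers) normalised by the enhancement-layer resolution. Whether this matches the calculation actually used for the ClassD comparison is unknown. If only the L1 bitrate were counted, the base-layer bits would be missing from the SHM rate, which is one thing worth ruling out.

```python
def bits_per_pixel(num_bytes, num_frames, width, height):
    """bpp from the bitstream size: total bits over total luma pixels."""
    return num_bytes * 8 / (num_frames * width * height)


def bpp_from_kbps(kbps, fps, width, height):
    """bpp from a bitrate in kbps at a given frame rate and resolution."""
    return kbps * 1000 / (fps * width * height)


# Values from the log above: 154292 bytes, 36 frames, 416x240 output at 50 Hz.
bpp_file = bits_per_pixel(154292, 36, 416, 240)
bpp_rate = bpp_from_kbps(1714.356, 50, 416, 240)
bpp_l1_only = bpp_from_kbps(1242.2556, 50, 416, 240)

print(f"bpp from file size      : {bpp_file:.4f}")
print(f"bpp from total bitrate  : {bpp_rate:.4f}")
print(f"bpp from L1 bitrate only: {bpp_l1_only:.4f}")
```

The first two values agree at about 0.343 bpp, while the L1-only figure is noticeably lower, so which bitrate feeds the RD curve changes the comparison.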