
[Translation] Why H.264 encoders vary so widely in quality | 银河星尘

Posted by: 企业资讯策划团队  Source: rwfb  Published: 2010-01-16

http://share.dmhy.org/topics/view/hash_id/7c3fae1b8e8b826c7c72b8c283ef5e6e5cef1620
http://x264dev.multimedia.cx/?p=164

Why so many H.264 encoders are bad?

When Oranges Turn Bitter (橘化为枳)

Exploring why H.264 encoders vary so widely in quality



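Author: Dark Shikari (one of the main developers of x264)

Translator: ssnake

Proofreader: 秋月

Keywords: H.264, fail, psychovisual optimizations, rate-distortion optimization

The original title is "Why so many H.264 encoders are bad"; for the original link, see the x264dev URL above.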



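If you spend enough time with a wide range of H.264 encoders, you will no doubt find that quite a few of them perform badly. That should come as no surprise: as Sturgeon's Law tells us, ninety percent of everything is crap (original: "90% of everything is crap"; Wikipedia gives "crud", which means much the same). And because H.264 is the most widely accepted video standard in years, countless pieces of software support it, and that sheer number guarantees a fair share of poor implementations.

But that alone does not explain the gulf between good and bad H.264 encoders. A good H.264 encoder such as x264 can, in many cases, beat a previous-generation encoder such as XviD at only half the bitrate, while a bad H.264 encoder can be so poor that it loses to MPEG-2! A gap this wide between implementations seems never to have occurred under earlier standards... and, of course, there are reasons for it.

H.264 offers richer compression features than any earlier standard. That also greatly multiplies the ways encoder developers can shoot themselves in the foot. Below, the author outlines a sampling of these pitfalls. Most of the problems stem from one simple fact: when mean squared error is used as the mode decision metric, blurriness looks good.

Because this article has been linked well beyond the technical community, the author briefly explains a few basic terms that its arguments rely on.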



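RD = λ × bits + distortion, a measure of how "good" a decision is. λ expresses how valuable bits are relative to quality (distortion). For example, a choice that costs very few bits may be acceptable even if it introduces more distortion. Distortion is measured by the mode decision metric, most commonly the sum of squared errors.

Visual energy is the total amount of visible detail in an image or video. A good encoder preserves energy so that the picture does not look blurry.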



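i16×16 macroblocks

Advantages: i16×16 is a very attractive mode. Thanks to its hierarchical DC transform (proofreader's note: the H.264 standard transforms intra macroblocks in two stages; one variant first [...] Hadamard transform), it is remarkably cheap in bits. In flatter areas of the frame a macroblock usually costs fewer than 12 bits, or even as few as 6. RD mode decision therefore favours this mode.

Drawbacks: It looks awful. i16×16 is very poor at retaining visual energy: when it is used there are almost never any nonzero AC coefficients, so about three quarters of what it predicts ends up with almost no energy coded at all, and the deblocking filter tends to blur whatever detail remains. Combined with a lack of adaptive quantization (AQ), this is the main cause of the ugly 16×16 blocks produced by low-quality H.264 encoders. The mode is not bad in itself, but the H.264 standard over-emphasizes it, and it digs a large pit for RD to fall into.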



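Bilinear qpel (bilinear quarter-pixel interpolation)

Advantages: Qpel (quarter-pixel) is clearly good for compression, and H.264's qpel in particular is designed with encoder performance in mind. The hpel (half-pixel) filter in H.264 is slow, being a 6-tap filter, but can be precalculated; qpel is simple (bilinear) and can be computed on the fly.

Drawbacks: Bilinear interpolation blurs, and so loses visual energy. Because RD mode decision favours blurriness, it happily picks bilinear qpel. In addition, the most naive motion search (fullpel, then one iteration of hpel, then one iteration of qpel) is biased towards qpel over hpel. Qpel is very useful, but when an encoder overuses it, it becomes yet another trap.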



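4×4 transform

Advantages: The 4×4 transform codes object edges efficiently and helps form an efficient i4×4 [...] transform, which allows a smaller variable-length coding (VLC) table.

Drawbacks: It is far too blurry! At the same quantizer, the 4×4 transform quantizes less precisely than the 8×8 transform, and because of decimation (proofreader's note: a quantization-like step with a round-off effect) a large number of uncoded blocks appear, which is yet another RD trap. The 4×4 transform is terrible for coding textured areas, especially when the texture detail is larger than the transform. It is also deblocked more than the 8×8 transform. (Proofreader's note: as for why 4×4 [...] the standard was indeed optimized for CIF (352×288) video.)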



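Biprediction

Advantages: Biprediction is at the core of every modern video format: B-frames greatly improve coding efficiency, especially in relatively static scenes. In H.264 encodes at sane bitrates, biprediction alone is what allows so many blocks in B-frames to be skipped. (Proofreader's note: during inter mode selection, if the macroblock is found to match a macroblock at some position in the reference frame exactly, no further processing (prediction, transform, quantization) is applied to it; the macroblock is merely flagged "copy a given macroblock from the reference frame on reconstruction".)

Drawbacks: Biprediction involves yet another bilinear interpolation, so it blurs, and once again becomes a sweet trap for RD. As a result biprediction is overused even in non-static areas of the picture, such as film grain, producing blurry grain in B-frames and sharp grain in P-frames, in alternation.

Note that B-frames and biprediction are not unique to H.264; this has been a known problem for many years, and it gets worse at lower bitrates.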



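Horizontal, vertical and DC intra prediction modes

Advantages: These modes are essential to the intra prediction system. DC prediction (the mean of the left and top edges) resembles old-style intra coding from before spatial intra prediction, and the other two are very useful for straight edges. Overall, these three tend to be the most common intra prediction modes.

Drawbacks: They retain energy poorly. The other intra prediction modes (planar, ddl (diagonal down-left), ddr (diagonal down-right), vr (vertical-right), hd (horizontal-down), vl (vertical-left), hu (horizontal-up)) effectively predict frequencies that are hard to code with a DCT, and so add visual energy to the reconstructed picture. Horizontal, vertical and DC prediction do not. Moreover, because of how the mode prediction system works, these three tend to be the cheapest modes to signal in terms of bits.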



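Of course, x264 uses all of these features effectively, without most of the problems above. Developers of other encoders: take note! (Translator: keep bragging, DS!)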



Coming next:

妇联评论, technical edition: "The Illusion of Hi-Vision" (《Hi-Vision的错觉》; guest author: ssnake)


10/04/2009 (4:43 am)

Filed under: H.264, fail, psychovisual optimizations, rate-distortion optimization

If one works long enough with a large number of H.264 encoders, one might notice that a large number of them are pretty much awful. This of course shouldn't be a surprise: Sturgeon's Law says that "90% of everything is crap". It's also exacerbated by the fact that H.264 is the most widely-accepted video standard in years and has spawned a huge amount of software that implements it, thus generating more mediocre implementations.

But even this doesn't really explain the massive gap between good and bad H.264 encoders. Good H.264 encoders, like x264, can beat previous-generation encoders like Xvid visually at half the bitrate in many cases. Yet bad H.264 encoders are often so terrible that they lose to MPEG-2! The disparity wasn't nearly this large with previous standards... and there's a good reason for this.

H.264 offers a great variety of compression features, more than any previous standard. This also greatly increases the number of ways that encoder developers can shoot themselves in the foot. In this post I'll go through a sampling of these. Most of the problems stem from the single fact that blurriness seems good when using mean squared error as a mode decision metric.

Since this post has gotten linked a good bit outside the technical community, I'll elaborate slightly on some basic terminology that underlies the concepts in this post.

RD = lambda * bits + distortion, a measure of how "good" a decision is. Lambda is how valuable bits are relative to quality (distortion). If something costs very few bits, for example, it might be able to get away with more distortion. Distortion is measured via a mode decision metric, the most common being sum of squared errors.
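To make the trade-off concrete, here is a minimal sketch of RD mode decision. The two candidate modes, their bit costs, and the pixel values are all invented for illustration; none of them come from a real encoder.

```python
def ssd(original, reconstructed):
    """Sum of squared errors between two equal-length pixel lists."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed))


def rd_cost(bits, distortion, lam):
    """RD = lambda * bits + distortion, as defined above."""
    return lam * bits + distortion


def pick_mode(original, candidates, lam):
    """Return the name of the candidate with the lowest RD cost.

    candidates maps a (hypothetical) mode name to (bits, reconstructed pixels).
    """
    return min(
        candidates,
        key=lambda name: rd_cost(candidates[name][0],
                                 ssd(original, candidates[name][1]), lam),
    )


# A fine texture, plus two made-up ways of coding it.
original = [100, 108, 100, 108, 100, 108, 100, 108]
candidates = {
    "flat": (6, [104] * 8),                                      # cheap, blurry
    "detailed": (40, [100, 107, 101, 108, 100, 107, 101, 108]),  # costly, sharp
}

print(pick_mode(original, candidates, lam=5))  # flat
print(pick_mode(original, candidates, lam=1))  # detailed
```

With lam=5 the flat reconstruction costs 5*6 + 128 = 158 against 5*40 + 4 = 204 for the detailed one, so the blurry choice wins even though it has erased the texture.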

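Visual energy is the amount of apparent detail in an image or video. Part of the job of a good encoder is to retain energy so that the image doesn't look blurry.

i16x16 macroblocks

The good: i16x16 is very appealing as a mode: it is phenomenally cheap bit-wise due to its hierarchical DC transform. In flatter areas of the frame, this usually makes it cost less than a dozen or even half a dozen bits per macroblock. As a result, RD mode decision loves this mode.

The bad: It looks like crap. i16x16 is terrible at retaining visual energy: when it is used there are almost never any nonzero AC coefficients, so about three quarters of what it predicts ends up with almost no energy coded at all, and the deblocker tends to blur whatever detail is left. Combined with a lack of adaptive quantization, this is the main cause of the ugly 16x16 blocks in low-quality H.264 encoders. The mode isn't bad in itself, but it's over-emphasized in the spec and makes a great trap for RD to fall into.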
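As a rough illustration of why the hierarchical DC transform is so cheap in flat areas, the sketch below applies a 4x4 Hadamard transform to the 16 DC coefficients of a macroblock. It assumes the usual (H * X * H) / 2 form and uses plain integer division in place of the rounding a real codec would apply; the input values are made up.

```python
# 4x4 Hadamard matrix (symmetric, so it equals its own transpose).
H = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def hadamard_dc(dc):
    """Second-stage transform of a 4x4 grid of DC coefficients."""
    y = matmul(matmul(H, dc), H)
    return [[v // 2 for v in row] for row in y]  # simplified rounding


def nonzero(coeffs):
    """Count the coefficients that would still need to be coded."""
    return sum(1 for row in coeffs for v in row if v != 0)


flat = [[8] * 4 for _ in range(4)]              # perfectly flat area
ramp = [[8, 10, 12, 14] for _ in range(4)]      # gentle horizontal gradient

print(nonzero(hadamard_dc(flat)))  # 1
print(nonzero(hadamard_dc(ramp)))  # 3
```

A flat area collapses into a single coefficient and a gentle gradient into three, out of sixteen, which is why this mode looks so attractive to a bit-counting cost function.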

Bilinear qpel

The good: Qpel is of course a good thing for compression, and H.264's qpel is particularly unusual in that it is designed for encoder performance. The hpel filter is slow (6-tap filter), but can be precalculated, while the qpel is simple and can be done on-the-fly (bilinear).

The bad: Bilinear interpolation is blurry, thus losing visual energy. But of course RD mode decision loves blurriness and so will pick it happily. Furthermore, the most naive motion search method (fullpel, one iteration of hpel, one iteration of qpel) tends to bias towards qpel instead of hpel. While qpel is still very useful, its overuse is yet another trap for encoders.
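For readers who have not seen the two filters side by side, here is a one-dimensional sketch. It assumes the standard luma half-sample taps (1, -5, 20, 20, -5, 1) with rounding and a shift by 5, and a rounded average for the quarter-sample; edge replication and the sample row are simplifications of mine.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # 6-tap half-sample filter, sums to 32


def clip8(value):
    """Clamp to the 8-bit pixel range."""
    return max(0, min(255, value))


def hpel_row(row):
    """Half-pel samples between row[i] and row[i + 1] (edges replicated)."""
    last = len(row) - 1

    def px(i):
        return row[max(0, min(last, i))]

    out = []
    for i in range(last):
        acc = sum(t * px(i - 2 + k) for k, t in enumerate(TAPS))
        out.append(clip8((acc + 16) >> 5))
    return out


def qpel_row(row):
    """Quarter-pel samples: rounded average of a full-pel and a half-pel sample."""
    half = hpel_row(row)
    return [(row[i] + half[i] + 1) >> 1 for i in range(len(half))]


step = [0, 0, 0, 0, 255, 255, 255, 255]  # a hard vertical edge
print(hpel_row(step)[3])  # 128
print(qpel_row(step)[3])  # 64
```

The half-pel value comes from six weighted neighbours, including negative taps that sharpen; the quarter-pel value is a plain two-sample average with no such taps, and that averaging is the bilinear blur described above.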

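4x4 transform

The good: The 4x4 transform codes edges well and helps form an efficient i4x4 [...] the 8x8 transform would, thus allowing smaller VLC tables.

The bad: It's blurry! At the same quantizer it quantizes less precisely than the 8x8 transform, and together with decimation this produces a lot of uncoded blocks, yet another RD trap. It is terrible for coding textured areas, especially when the texture detail is larger than the transform. It also gets deblocked more than the 8x8 transform. [...] (with 8x8 added later) is likely an artifact of the entire specification process being done while optimizing for CIF resolution videos.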
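For reference, this is roughly what the 4x4 transform does to a block. The sketch assumes the standard's integer core matrix and leaves out the scaling and quantization stages entirely; the two input blocks are made up.

```python
# Integer core matrix of the 4x4 forward transform (scaling omitted).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def forward_4x4(block):
    """Unscaled forward transform: CF * block * CF^T."""
    cf_t = [list(col) for col in zip(*CF)]
    return matmul(matmul(CF, block), cf_t)


flat = [[5] * 4 for _ in range(4)]              # no detail at all
edge = [[0, 0, 10, 10] for _ in range(4)]       # sharp vertical edge

print(forward_4x4(flat)[0])  # [80, 0, 0, 0]
print(forward_4x4(edge)[0])  # [80, -120, 0, 40]
```

The vertical edge lands in a single row of coefficients and every other row is zero, which is the sense in which a small transform handles edges compactly.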

Biprediction

The good: Biprediction is at the core of any modern video format: B-frames vastly improve compression efficiency, especially in lower-motion scenes. Biprediction singlehandedly makes possible the high number of skip blocks in B-frames in most sane-bitrate H.264 encodes.

The bad: It's blurry, which acts as a nice RD trap yet again. This makes biprediction get overused even in non-constant areas of the image, such as film grain, ensuring blurry grain in B-frames and clear grain in P-frames (nicely alternating as such).

One should note of course that B-frames and thus biprediction are not at all unique to H.264; this has been an ongoing problem for many years and tends to be exacerbated by lower bitrates.
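The grain effect is easy to reproduce in a toy setting. The sketch below assumes default (unweighted) biprediction, i.e. a rounded average of two references, and measures "energy" as the sum of squared deviations from the mean, which is only a crude stand-in for visual energy. The grain patterns are invented.

```python
def bipred(ref0, ref1):
    """Unweighted biprediction: rounded average of two reference blocks."""
    return [(a + b + 1) >> 1 for a, b in zip(ref0, ref1)]


def energy(pixels):
    """Sum of squared deviations from the mean (a crude detail measure)."""
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels)


# Flat grey (100) plus two different, made-up grain patterns.
ref0 = [106, 94, 102, 98, 94, 106, 98, 102]
ref1 = [98, 106, 94, 102, 102, 94, 106, 98]

pred = bipred(ref0, ref1)
print(energy(ref0), energy(ref1), energy(pred))  # 160.0 160.0 16.0
```

Each reference carries the same amount of grain, but their average keeps only a tenth of it in this example, since uncorrelated grain largely cancels. A squared-error metric sees that as a small error against a noisy source, hence the trap.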

h/v/dc intra prediction modes

The good: These modes are critical to the intra prediction system. DC is similar to the old-style intra coding before spatial intra prediction, and the latter two are very useful for straight edges. These three tend to be overall the most common intra prediction modes.

The bad: They retain energy terribly. The other intra prediction modes (planar and ddl/ddr/vr/hd/vl/hu) effectively predict frequencies that are difficult to code with a DCT, thus increasing visual energy in the resulting reconstructed image. But h/v/dc don't really do this. Furthermore, because of how the mode prediction system works, they tend to be the cheapest modes to signal (in terms of bits).
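A small sketch shows why these three modes carry so little energy of their own. It covers 4x4 blocks only and assumes both the top and left neighbours are available; the function name and the neighbour values are illustrative.

```python
def predict_4x4(top, left, mode):
    """Build a 4x4 intra prediction from 4 top and 4 left neighbour pixels."""
    if mode == "v":   # vertical: copy the row above downwards
        return [list(top) for _ in range(4)]
    if mode == "h":   # horizontal: copy the left column rightwards
        return [[left[y]] * 4 for y in range(4)]
    if mode == "dc":  # DC: rounded mean of all 8 neighbours
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


top = [10, 20, 30, 40]
left = [50, 60, 70, 80]

print(predict_4x4(top, left, "dc")[0])  # [45, 45, 45, 45]
print(predict_4x4(top, left, "h")[2])   # [70, 70, 70, 70]
print(predict_4x4(top, left, "v")[3])   # [10, 20, 30, 40]
```

Every one of these predictions is constant along at least one direction, so any texture inside the block has to come from the residual, which is exactly what a blur-friendly cost function is happy to drop.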

Of course, x264 effectively uses all of these features without most of the aforementioned problems. Developers of other encoders: take note.
