1 Mar 2007 02:57
Re: Scalability
Limin Wang <lance.lmwang <at> gmail.com>
2007-03-01 01:57:14 GMT
Hi,
* Christian Bienia <cbienia <at> CS.Princeton.EDU> [2007-02-28 13:43:10 -0500]:
> Hi,
>
> > For testing, change the #define X264_THREAD_MAX 16 (in common.h) to some
> > bigger number. Then I'll see about dynamic allocation.
>
> The performance doesn't improve significantly with more than 16 threads.
> Here's the results with a sufficiently high X264_THREAD_MAX value:

X264_THREAD_MAX is only one of the limits; see the code below:
h->param.i_threads = x264_clip3( h->param.i_threads, 1, X264_THREAD_MAX );
h->param.i_threads = X264_MIN( h->param.i_threads,
    1 + (h->param.i_height >> h->param.b_interlaced) / (X264_THREAD_HEIGHT + 16) ); // FIXME exact limit?
So in your case (1080 lines, X264_THREAD_HEIGHT = 24), the actual number of
threads will be at most:
interlaced: 1 + (1080 >> 1)/(24+16) = 1 + 13 = 14
not interlaced: 1 + 1080/(24+16) = 1 + 27 = 28
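
The arithmetic above can be checked with a small standalone sketch. It is not x264 code: effective_threads, clip3 and min_int are hypothetical stand-ins for the two quoted statements (x264_clip3 and X264_MIN), and THREAD_HEIGHT = 24 is simply the value used in the calculation above. With C integer division and the leading "1 +" included, 1080 lines give 14 threads interlaced and 28 progressive, provided X264_THREAD_MAX has been raised high enough not to clip first.

```c
#include <assert.h>

/* Assumption: 24 is the X264_THREAD_HEIGHT value used in the message above. */
#define THREAD_HEIGHT 24

/* Stand-in for x264_clip3: clamp v into [lo, hi]. */
static int clip3(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Stand-in for X264_MIN. */
static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* Hypothetical helper mirroring the two clamping statements quoted above.
 * requested  : thread count asked for by the user
 * thread_max : compile-time cap (X264_THREAD_MAX)
 * height     : frame height in pixels
 * interlaced : 0 or 1 */
static int effective_threads(int requested, int thread_max,
                             int height, int interlaced)
{
    assert(height > 0 && thread_max >= 1);
    assert(interlaced == 0 || interlaced == 1);

    /* First limit: the compile-time maximum. */
    int n = clip3(requested, 1, thread_max);

    /* Second limit: derived from the (field) height; integer division. */
    return min_int(n, 1 + (height >> interlaced) / (THREAD_HEIGHT + 16));
}
```

For example, effective_threads(64, 16, 1080, 0) is limited by the first clamp, while effective_threads(64, 128, 1080, 1) is limited by the height-based one.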
In my opinion, it would be better for x264 to return an error if the requested
thread count exceeds X264_THREAD_MAX instead of silently clipping it.
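
As a rough illustration of that suggestion, a check along the following lines could reject the value rather than clamp it. This is only a sketch: validate_threads is a hypothetical function, not an existing x264 API, and the message text and return convention (0 on success, -1 on error) are assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical validator, not part of x264.
 * Returns 0 if requested lies in [1, thread_max], otherwise prints a
 * diagnostic to stderr and returns -1 so the caller can abort setup. */
static int validate_threads(int requested, int thread_max)
{
    assert(thread_max >= 1);

    if (requested < 1 || requested > thread_max) {
        fprintf(stderr, "threads=%d is out of range [1, %d]\n",
                requested, thread_max);
        return -1;
    }
    return 0;
}
```

The point of the design is that the user learns the requested value was not honoured, instead of silently running with fewer threads.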
Thanks,
Limin
I didn't say fine-grained parallelization is hard; I said it doesn't work
as well. But yes, it would also require much more rearranging of x264
internals than the current threading did.
You can't divide a frame into large independent chunks without slices. And
even if you did use slices, that's completely incompatible with
frame-threading. The only temporal work division compatible with
slice-threading is non-referenced B-frames and GOP-threading.
The sub-frame work division XviD uses is to encode consecutive macroblock
rows in separate threads, making sure each row stays at least 2 MBs behind
the previous. Then run another thread behind them all to do the bitstream
writing. However, this reduces the temporal splitting possible by almost
as much as it increases the spatial splitting, because there's that much
more data in-progress that the next frame has to wait for. It also
prevents bit-exact CABAC RDO, though I haven't simulated how much that
would cost in compression quality.
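
To make the row-staggering constraint concrete, here is a toy timing model of it. It is a sketch under stated assumptions, not XviD's or x264's code: every macroblock is assumed to cost one time unit, there is one thread per row with unlimited cores, the bitstream-writing thread is ignored, and "at least 2 MBs behind" is interpreted as "row r may process column c only after row r-1 has finished column c + lag (or its last column)". The function name wavefront_finish_time is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns the time step at which the last macroblock of the frame finishes
 * under the toy model described above, or -1 on allocation failure.
 * mb_width, mb_height : frame size in macroblocks
 * lag                 : how many MBs each row must stay behind the row above */
static long wavefront_finish_time(int mb_width, int mb_height, int lag)
{
    assert(mb_width > 0 && mb_height > 0 && lag >= 0);

    long *prev = calloc((size_t)mb_width, sizeof *prev); /* row above */
    long *cur  = calloc((size_t)mb_width, sizeof *cur);  /* current row */
    if (!prev || !cur) {
        free(prev);
        free(cur);
        return -1;
    }

    long last = 0;
    for (int r = 0; r < mb_height; r++) {
        for (int c = 0; c < mb_width; c++) {
            /* A row is sequential: wait for the MB to the left. */
            long ready = c > 0 ? cur[c - 1] : 0;

            if (r > 0) {
                /* Wait for the row above to be `lag` MBs ahead. */
                int dep = c + lag < mb_width ? c + lag : mb_width - 1;
                if (prev[dep] > ready)
                    ready = prev[dep];
            }
            cur[c] = ready + 1; /* one time unit per macroblock */
        }
        last = cur[mb_width - 1];

        /* The finished row becomes the "row above" for the next one. */
        long *tmp = prev;
        prev = cur;
        cur = tmp;
    }

    free(prev);
    free(cur);
    return last;
}
```

Under this model a 120x68-macroblock frame (1920x1088) with lag 2 completes in 321 steps instead of the 8160 a single thread would need. The model says nothing about the cost described above, namely that the next frame has to wait on more in-progress data.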