Axel Zeuner | 2 May 2008 11:30

[PATCH] zigzag SSE2

Hello,
two patches against git HEAD are attached:
- x264-zigzag-sse2.diff contains SSE2 implementations of the zigzag functions. 
- x264-timeasm.diff contains timeasm, timing code to check the effects of 
the changes made. The program is a hack: it does no checks and was tested 
only on Linux x86/x86-64 using gcc.

I would like to see results on other processors in 32-bit and 64-bit mode 
before we start discussing the inclusion of these functions in git. 
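
For readers unfamiliar with the functions being timed, here is a minimal scalar sketch of what a 4x4 frame zigzag scan computes. It is not taken from the patch: the function name is hypothetical, and it assumes the coefficients are stored in plain raster order (row * 4 + column), which may differ from x264's internal layout. The table itself is the standard H.264 4x4 frame scan order.

```c
#include <assert.h>
#include <stdint.h>

/* Standard H.264 4x4 frame (progressive) zigzag order, given as raster
 * indices (row * 4 + column). Raster layout is an assumption here. */
static const uint8_t zigzag_4x4_frame[16] = {
     0,  1,  4,  8,
     5,  2,  3,  6,
     9, 12, 13, 10,
     7, 11, 14, 15
};

/* Hypothetical scalar reference: copy a raster-ordered 4x4 block of
 * coefficients into zigzag order. Not safe for in-place use. */
static void scan_4x4_frame_ref( int16_t level[16], const int16_t dct[16] )
{
    int i;
    assert( level != dct );
    for( i = 0; i < 16; i++ )
        level[i] = dct[zigzag_4x4_frame[i]];
}
```

A SIMD version performs the same permutation with shuffles and unpacks instead of sixteen scalar loads and stores, which is where the clock savings reported below would come from.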

Two results as printed by timeasm follow:

model name	: AMD Turion(tm) 64 X2 Mobile Technology TL-50
Architecture: x86-32
---------------------------------------------
zigzag frame
offset determination: 121 clocks
c - sub_4x4: 43 clocks
ref - sub_4x4: 43 clocks
new - sub_4x4: 37 clocks
offset determination: 121 clocks
c - scan_4x4: 22 clocks
ref - scan_4x4: 22 clocks
new - scan_4x4: 20 clocks
offset determination: 121 clocks
c - scan_8x8: 76 clocks
ref - scan_8x8: 76 clocks
new - scan_8x8: 62 clocks
---------------------------------------------
zigzag field

Guillaume Poirier | 2 May 2008 14:30

Re: [PATCH] zigzag SSE2

Hello,

Axel Zeuner wrote:
> Hello,
> two patches against git HEAD are attached:
> - x264-zigzag-sse2.diff contains SSE2 implementations of the zigzag functions. 
> - x264-timeasm.diff contains timeasm, a timing code to check the effects of 
> the changes made. The program is a hack, it does no checks and was tested 
> only on linux x86/x86-64 using gcc.
>
> I would like to see results on other processors in 32-bit and 64-bit mode 
> before one may start discuss about inclusion of these functions into git. 
>
> Two results as printed by timeasm follow:
>   
I tested your patch on 2 Core2 machines:

The first one has a Penryn core:

Architecture: x86-64
model name      : Intel(R) Core(TM)2 Duo CPU     E8200  @ 2.66GHz
x264: using random seed 1704265240
---------------------------------------------
zigzag frame
offset determination: 424 clocks
c - sub_4x4: 40 clocks
ref - sub_4x4: 40 clocks
new - sub_4x4: 16 clocks
offset determination: 424 clocks
c - scan_4x4: 16 clocks

Holger Lubitz | 2 May 2008 15:33

Re: [PATCH] zigzag SSE2

hi,

> two patches against git HEAD are attached:

hmm. it seems my patches haven't made it there yet.

i only did the scan_4x4_frame and 8x8_frame so far, but i think something 
similar can be done to the subs. 4x4 was somewhere below 10 cycles
(dark_shikari measured 8 on core2, i had 6.something on amd64, pengvado
saw 10 on opteron). the 8x8 was 40 cycles on my amd64, core2 differed
with alignment due to cache split, but i think the range was 38-57.
(4x4 was pure mmx, 8x8 used pshufw)
so i later did a sse2 version that was 60 cycles on athlon but got rid of
unaligned loads and could get rid of unaligned stores with some extra
i intend to do that once i get one myself, which should be before
the end of the month.

> The 64-bit result was obtained after reverting the patch to reduce the code 
> size (%define movdqa movaps ;%define movdqu movups) of the SSE2 code. Athlons 
> seem to keep some state information about the contents of their sse 
> registers.

they do have some fp state. but that shouldn't matter as long as you only
unpack and shuffle.

ah. might as well take the opportunity to introduce myself to the list:
i am one of the four google summer of code students accepted to work on
x264 this summer. i intend to be looking for optimization possibilities
in x264 wherever i find them. the zigzags were sort of a qualification
task for me.

Holger Lubitz | 2 May 2008 15:41

Re: [PATCH] zigzag SSE2

> - x264-timeasm.diff contains timeasm, a timing code to check the effects of 
> the changes made. The program is a hack, it does no checks and was tested 
> only on linux x86/x86-64 using gcc.

ah, one more comment. you do the rdtsc after synchronizing with cpuid.
you'll see more interesting results without, because the latency of the
final writes is likely to be hidden. and if you want it synchronized
you can have that without cpuid if you use rdtscp.

holger
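
The difference between the two ways of reading the timer can be sketched as follows. This is not the code from timeasm or bench.h. It is a minimal illustration that assumes GCC or Clang on x86 and uses their `__rdtsc()` intrinsic and `__cpuid` macro. The names `tsc_plain`, `tsc_cpuid`, `dummy_work` and `time_call` are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <cpuid.h>      /* __cpuid macro (GCC/Clang) */
#include <x86intrin.h>  /* __rdtsc (GCC/Clang) */

/* Plain read: earlier instructions may still be in flight, so the
 * latency of trailing stores can be hidden from the measurement. */
static inline uint64_t tsc_plain( void )
{
    return __rdtsc();
}

/* CPUID is a serializing instruction, so all earlier instructions
 * complete before the counter is read. */
static inline uint64_t tsc_cpuid( void )
{
    unsigned a, b, c, d;
    __cpuid( 0, a, b, c, d );
    (void)a; (void)b; (void)c; (void)d;
    return __rdtsc();
}

/* Stand-in workload (hypothetical): a few stores to a small buffer. */
static inline void dummy_work( void )
{
    static volatile int16_t sink[16];
    int i;
    for( i = 0; i < 16; i++ )
        sink[i] = (int16_t)i;
}

/* Hypothetical helper: clocks elapsed around one call of fn(),
 * measured with the timer read function now(). */
static inline uint64_t time_call( void (*fn)( void ), uint64_t (*now)( void ) )
{
    uint64_t t0;
    assert( fn && now );
    t0 = now();
    fn();
    return now() - t0;
}
```

Calling `time_call( dummy_work, tsc_plain )` and `time_call( dummy_work, tsc_cpuid )` in a loop and keeping the minimum of each would give the two figures discussed here. The serialized number also includes the cost of CPUID itself, so a real harness would subtract an empty-call baseline.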

_______________________________________________
x264-devel mailing list
x264-devel@videolan.org
http://mailman.videolan.org/listinfo/x264-devel

Joey Degges | 3 May 2008 21:09

GSoC: Improve Fast Inter Refinement and Adaptive Quantization

Hello,

I will be taking part in Google Summer of Code this summer. Below is an overview of my project.

The goal of this project is to improve heuristics and decision-making for inter refinement in order to improve efficiency given average encoding settings. This will involve various early termination heuristics along with methods of deciding which partition modes need to be searched while performing minimal actual searching on them. I also plan to experiment with different methods that can be used to improve psycho-visual optimizations for mode decisions and quantization. This will include improving variance adaptive quantization by experimenting with different methods which could be used to weight the variance in order to select a more optimized quantizer.

You can usually find me in the #x264-dev channel if you want to talk about the project. I promise to be more active in the coming weeks :]

- Joey Degges

Axel Zeuner | 4 May 2008 08:50

Re: [PATCH] zigzag SSE2

Hello,

On Friday 02 May 2008 14:30:02 Guillaume Poirier wrote:
> Hello,
>
> Axel Zeuner wrote:
> > Hello,
> > two patches against git HEAD are attached:
> > - x264-zigzag-sse2.diff contains SSE2 implementations of the zigzag
> > functions. - x264-timeasm.diff contains timeasm, a timing code to check
> > the effects of the changes made. The program is a hack, it does no checks
> > and was tested only on linux x86/x86-64 using gcc.
> >
> > I would like to see results on other processors in 32-bit and 64-bit mode
> > before one may start discuss about inclusion of these functions into git.
> >
> > Two results as printed by timeasm follow:
>
> I tested your patch on 2 Core2 machines:
Thank you very much. I am glad to see that, on Core2 as well, the latencies 
of all functions are less than or equal to those of their MMX/MMXEXT/C 
counterparts.

Regards,
Axel

Axel Zeuner | 4 May 2008 08:57

Re: [PATCH] zigzag SSE2

Hello,

On Friday 02 May 2008 15:33:45 Holger Lubitz wrote:
> hi,
>
> > two patches against git HEAD are attached:
>
> hmm. it seems my patches haven't made it there yet.
Sorry, I did not know about it.
>
> i only did the scan_4x4_frame and 8x8_frame so far, but i think something
> similar can be done to the subs. 4x4 was somewhere below 10 cycles
> (dark_shikari measured 8 on core2, i had 6.something on amd64, pengvado
> saw 10 on opteron). the 8x8 was 40 cycles on my amd64, core2 differed
> with alignment due to cache split, but i think the range was 38-57.
> (4x4 was pure mmx, 8x8 used pshufw)
How were these measurements done? oprofile? 

Regards,
Axel

Axel Zeuner | 4 May 2008 09:13

Re: [PATCH] zigzag SSE2

Hello,

On Friday 02 May 2008 15:41:50 Holger Lubitz wrote:
> > - x264-timeasm.diff contains timeasm, a timing code to check the effects
> > of the changes made. The program is a hack, it does no checks and was
> > tested only on linux x86/x86-64 using gcc.
>
> ah, one more comment. you do the rdtsc after synchronizing with cpuid.
> you'll see more interesting results without, because the latency of the
> final writes is likely to be hidden. and if you want it synchronized
> you can have that without cpuid if you use rdtscp.
I want to see the latencies: they show me when the results are ready. 
Otherwise only the decoding speed is measured, provided the code generates 
fewer uOps/mOps than there are entries in the reorder buffers/schedulers.
But you are right, one should measure both times.
rdtscp does not exist on all Athlons (missing before F-stepping? I know it is 
not available on E6 stepping).

Regards
Axel


Holger Lubitz | 4 May 2008 20:42

Re: [PATCH] zigzag SSE2

> I want to see the latencies - it shows me when the results are ready, 
> otherwise only the decoding speed is measured if the code generates less 
> uOps/mOps than entries in the reorder buffers/schedulers exist.

Granted. But it's still the more interesting figure, unless you really
use the results immediately after (and in that case one should try to
rewrite the code to avoid the writes as they are likely to be 
unnecessary). The cycle counts i quoted for my code were done with
unsynchronized rdtsc using pengvado's bench.h include.

> But you are right, one should measure both times.
> rdtscp does not exist on all athlons (before F-stepping?, I know it is not 
> available on E6 stepping).

Possibly. Mine is an x2 3800+ ee "family 15, model 67, stepping 2" and it 
supports rdtscp.

Holger
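
Whether a given CPU has rdtscp can be checked at run time instead of by stepping. The sketch below is an illustration, not code from either patch. It assumes GCC or Clang on x86 and relies on the documented CPUID flag for RDTSCP (leaf 0x80000001, EDX bit 27). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <cpuid.h>      /* __get_cpuid (GCC/Clang) */
#include <x86intrin.h>  /* __rdtscp (GCC/Clang) */

/* Returns 1 if the CPU reports RDTSCP support, 0 otherwise.
 * RDTSCP is flagged in CPUID leaf 0x80000001, EDX bit 27. */
static inline int cpu_has_rdtscp( void )
{
    unsigned a, b, c, d;
    if( !__get_cpuid( 0x80000001, &a, &b, &c, &d ) )
        return 0;
    return (d >> 27) & 1;
}

/* Timer read via RDTSCP, which waits for earlier instructions to
 * execute before reading the counter. */
static inline uint64_t tsc_rdtscp( void )
{
    unsigned aux;
    /* RDTSCP is an invalid opcode on CPUs that do not report it. */
    assert( cpu_has_rdtscp() );
    return __rdtscp( &aux );
}
```

A timing harness could call `cpu_has_rdtscp()` once at startup and fall back to cpuid followed by rdtsc when it returns 0.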


Gabriel Bouvigne | 5 May 2008 11:29

Re: [patch] updated 2 pass vbv handling

Per-line tracking of VBV within 2nd pass.
With this patch, the 2 pass VBV handling is now feature-complete.

--
Gabriel
diff --git a/encoder/encoder.c b/encoder/encoder.c
old mode 100644
new mode 100755
index a328fdf..05dcc55
--- a/encoder/encoder.c
+++ b/encoder/encoder.c
@@ -659,6 +659,7 @@ x264_t *x264_encoder_open   ( x264_param_t *param )
           || h->param.rc.i_rc_method == X264_RC_CRF
           || h->param.b_bframe_adaptive
           || h->param.b_pre_scenecut );
+    h->frames.b_have_lowres |= (h->param.rc.b_stat_read && h->param.rc.i_vbv_buffer_size > 0);

     h->frames.i_last_idr = - h->param.i_keyint_max;
     h->frames.i_input    = 0;
diff --git a/encoder/ratecontrol.c b/encoder/ratecontrol.c
old mode 100644
new mode 100755
index a0e0859..7e252fd
--- a/encoder/ratecontrol.c
+++ b/encoder/ratecontrol.c
@@ -43,6 +43,7 @@ typedef struct
     int p_tex_bits;
     int misc_bits;
     uint64_t expected_bits;
+    double expected_vbv;
     float new_qscale;
     int new_qp;
     int i_count;
@@ -331,7 +332,7 @@ int x264_ratecontrol_new( x264_t *h )
         rc->rate_tolerance = 0.01;
     }

-    h->mb.b_variable_qp = (rc->b_vbv && !rc->b_2pass) || h->param.rc.i_aq_mode;
+    h->mb.b_variable_qp = rc->b_vbv || h->param.rc.i_aq_mode;

     if( rc->b_abr )
     {
@@ -851,17 +852,25 @@ double predict_row_size( x264_t *h, int y, int qp )
     return (pred_s + pred_t) / 2;
 }

-double predict_row_size_sum( x264_t *h, int y, int qp )
+double row_bits_so_far( x264_t *h, int y )
 {
     int i;
     double bits = 0;
     for( i = 0; i <= y; i++ )
         bits += h->fdec->i_row_bits[i];
+    return bits;
+}
+
+double predict_row_size_sum( x264_t *h, int y, int qp )
+{
+    int i;
+    double bits = row_bits_so_far(h, y);
     for( i = y+1; i < h->sps->i_mb_height; i++ )
         bits += predict_row_size( h, i, qp );
     return bits;
 }

+
 void x264_ratecontrol_mb( x264_t *h, int bits )
 {
     x264_ratecontrol_t *rc = h->rc;
@@ -873,7 +882,7 @@ void x264_ratecontrol_mb( x264_t *h, int bits )
     rc->qpa_rc += rc->qpm;
     rc->qpa_aq += h->mb.i_qp;

-    if( h->mb.i_mb_x != h->sps->i_mb_width - 1 || !rc->b_vbv || rc->b_2pass )
+    if( h->mb.i_mb_x != h->sps->i_mb_width - 1 || !rc->b_vbv)
         return;

     h->fdec->i_row_qp[y] = rc->qpm;
@@ -883,9 +892,8 @@ void x264_ratecontrol_mb( x264_t *h, int bits )
         /* B-frames shouldn't use lower QP than their reference frames */
         if( y < h->sps->i_mb_height-1 )
         {
-            rc->qpm = X264_MAX( rc->qp,
-                      X264_MIN( h->fref0[0]->i_row_qp[y+1],
-                                h->fref1[0]->i_row_qp[y+1] ));
+            int avg_qp = (h->fref0[0]->i_row_qp[y+1]+h->fref1[0]->i_row_qp[y+1])*0.5+rc->pb_offset * ((h->fenc->i_type == X264_TYPE_BREF) ? 0.5 : 1);
+            rc->qpm = X264_MIN(X264_MAX( rc->qp, avg_qp), 51); //avg_qp could go higher than 51 due to pb_offset
         }
     }
     else
@@ -901,20 +909,42 @@ void x264_ratecontrol_mb( x264_t *h, int bits )
             int i_qp_max = X264_MIN( prev_row_qp + h->param.rc.i_qp_step, h->param.rc.i_qp_max );
             int i_qp_min = X264_MAX( prev_row_qp - h->param.rc.i_qp_step, h->param.rc.i_qp_min );
             float buffer_left_planned = rc->buffer_fill - rc->frame_size_planned;
+            float rc_tol = 1;
+            float headroom = 0;
+
+           /* Don't modify the row QPs until a sufficient amount of the bits of the frame have been processed, in case a flat */
+           /* area at the top of the frame was measured inaccurately. */
+            if(row_bits_so_far(h,y) < 0.05 * rc->frame_size_planned)
+            {
+                return;
+            }
+ 
+            headroom = buffer_left_planned/rc->buffer_size;
+            if(h->sh.i_type != SLICE_TYPE_I)
+                headroom /= 2;
+            rc_tol += headroom;

             if( !rc->b_vbv_min_rate )
                 i_qp_min = X264_MAX( i_qp_min, h->sh.i_qp );

             while( rc->qpm < i_qp_max
-                   && (b1 > rc->frame_size_planned * 1.15
+                   && (b1 > rc->frame_size_planned * rc_tol
                     || (rc->buffer_fill - b1 < buffer_left_planned * 0.5)))
             {
                 rc->qpm ++;
                 b1 = predict_row_size_sum( h, y, rc->qpm );
             }

+            /* avoid VBV underflow */
+            while( (rc->qpm < h->param.rc.i_qp_max)
+                   && (rc->buffer_fill - b1 < rc->buffer_size * 0.005))
+            {
+                rc->qpm ++;
+                b1 = predict_row_size_sum( h, y, rc->qpm );
+            }
+
             while( rc->qpm > i_qp_min
-                   && buffer_left_planned > rc->buffer_size * 0.4
+                   && ((buffer_left_planned > rc->buffer_size * 0.4) || rc->qpm > h->fdec->i_row_qp[0])
                    && ((b1 < rc->frame_size_planned * 0.8 && rc->qpm <= prev_row_qp)
                      || b1 < (rc->buffer_fill - rc->buffer_size + rc->buffer_rate) * 1.1) )
             {
@@ -1249,7 +1279,7 @@ static void update_vbv( x264_t *h, int bits )
         return;

     rct->buffer_fill_final += rct->buffer_rate - bits;
-    if( rct->buffer_fill_final < 0 && !rct->b_2pass )
+    if( rct->buffer_fill_final < 0 )
         x264_log( h, X264_LOG_WARNING, "VBV underflow (%.0f bits)\n", rct->buffer_fill_final );
     rct->buffer_fill_final = x264_clip3f( rct->buffer_fill_final, 0, rct->buffer_size );
 }
@@ -1425,6 +1455,23 @@ static float rate_estimate_qscale( x264_t *h )
                 double w = x264_clip3f( time*100, 0.0, 1.0 );
                 q *= pow( (double)total_bits / rcc->expected_bits_sum, w );
             }
+            if( rcc->b_vbv )
+            {
+                double expected_size = qscale2bits(&rce, q);
+                double expected_vbv = rcc->buffer_fill + rcc->buffer_rate - expected_size;
+                if( (expected_vbv < rcc->buffer_size*.4) && (expected_vbv < rce.expected_vbv) )
+                {
+                    double qmax = (expected_vbv < rcc->buffer_size*.15) ? lmax : q*1.5;
+                    double size_constraint = 1 + rce.expected_vbv/rcc->buffer_size;
+                    while( (expected_vbv < rce.expected_vbv/size_constraint) && (q < qmax) )
+                    {
+                        q *= 1.05;
+                        expected_size = qscale2bits(&rce, q);
+                        expected_vbv = rcc->buffer_fill + rcc->buffer_rate - expected_size;
+                    }
+                }
+                rcc->last_satd = x264_rc_analyse_slice( h );
+            }
             q = x264_clip3f( q, lmin, lmax );
         }
         else /* 1pass ABR */
@@ -1509,10 +1556,13 @@ static float rate_estimate_qscale( x264_t *h )
         rcc->last_qscale_for[pict_type] =
         rcc->last_qscale = q;

-        if( !rcc->b_2pass && h->fenc->i_frame == 0 )
+        if( !(rcc->b_2pass && !rcc->b_vbv) && h->fenc->i_frame == 0 )
             rcc->last_qscale_for[SLICE_TYPE_P] = q;

-        rcc->frame_size_planned = predict_size( &rcc->pred[h->sh.i_type], q, rcc->last_satd );
+        if( rcc->b_2pass && rcc->b_vbv)
+            rcc->frame_size_planned = qscale2bits(&rce, q);
+        else
+            rcc->frame_size_planned = predict_size( &rcc->pred[h->sh.i_type], q, rcc->last_satd );
         return q;
     }
 }
@@ -1555,6 +1605,133 @@ void x264_thread_sync_ratecontrol( x264_t *cur, x264_t *prev, x264_t *next )
     /* the rest of the variables are either constant or thread-local */
 }

+static int find_underflow( x264_t *h, double *fills, int *t0, int *t1, int over )
+{
+    /* find an interval ending on an overflow or underflow (depending on whether
+     * we're adding or removing bits), and starting on the earliest frame that
+     * can influence the buffer fill of that end frame. */
+    x264_ratecontrol_t *rcc = h->rc;
+    const double buffer_min = (over ? .1 : .1) * rcc->buffer_size;
+    const double buffer_max = .9 * rcc->buffer_size;
+    double fill = fills[*t0-1];
+    double parity = over ? 1. : -1.;
+    int i, start=-1, end=-1;
+    for(i=*t0; i<rcc->num_entries; i++)
+    {
+        fill += (rcc->buffer_rate - qscale2bits(&rcc->entry[i], rcc->entry[i].new_qscale)) * parity;
+        fill = x264_clip3f(fill, 0, rcc->buffer_size);
+        fills[i] = fill;
+        if(fill <= buffer_min || i == 0)
+        {
+            if(end >= 0)
+                break;
+            start = i;
+        }
+        else if(fill >= buffer_max && start >= 0)
+            end = i;
+    }
+    *t0 = start;
+    *t1 = end;
+    return start>=0 && end>=0;
+}
+
+static int fix_underflow( x264_t *h, int t0, int t1, double adjustment, double qscale_min, double qscale_max)
+{
+    x264_ratecontrol_t *rcc = h->rc;
+    double qscale_orig, qscale_new;
+    int i;
+    int adjusted = 0;
+    if(t0 > 0)
+        t0++;
+    for(i=t0; i<=t1; i++) {
+        qscale_orig = rcc->entry[i].new_qscale;
+        qscale_new =  qscale_orig * adjustment;
+        qscale_new = x264_clip3f(qscale_new, qscale_min, qscale_max);
+        rcc->entry[i].new_qscale  = qscale_new;
+        adjusted = adjusted || (qscale_new != qscale_orig);
+    }
+    return adjusted;
+}
+
+static double count_expected_bits( x264_t *h )
+{
+    x264_ratecontrol_t *rcc = h->rc;
+    double expected_bits = 0;
+    int i;
+    for(i=0; i<rcc->num_entries; i++)
+    {
+        ratecontrol_entry_t *rce = &rcc->entry[i];
+        rce->expected_bits = expected_bits;
+        expected_bits += qscale2bits(rce, rce->new_qscale);
+    }
+    return expected_bits;
+}
+
+static void vbv_pass2( x264_t *h )
+{
+    /* foreach interval of buffer_full .. underflow
+     *   uniformly increase the qp of all frames in the interval until either
+     *     buffer is full at some intermediate frame
+     *     or the last frame in the interval no longer underflows
+     * recompute intervals and repeat
+     * then do the converse to put bits back into overflow areas until target size is met */
+
+    x264_ratecontrol_t *rcc = h->rc;
+    double *fills = x264_malloc((rcc->num_entries+1)*sizeof(double));
+    double all_available_bits = h->param.rc.i_bitrate * 1000. * rcc->num_entries / rcc->fps;
+    double expected_bits = 0;
+    double adjustment;
+    double prev_bits = 0;
+    int i, t0, t1;
+    double qscale_min = qp2qscale(h->param.rc.i_qp_min);
+    double qscale_max = qp2qscale(h->param.rc.i_qp_max);
+    int iterations = 0;
+    int adj_min, adj_max;
+
+    fills++;
+
+    //adjust overall stream size
+    do {
+        iterations++;
+        prev_bits = expected_bits;
+
+        if (expected_bits != 0) { //not first iteration
+            adjustment = X264_MAX(X264_MIN(expected_bits / all_available_bits, 0.999), 0.9);
+            fills[-1] = rcc->buffer_size * h->param.rc.f_vbv_buffer_init;
+            t0 = 0;
+            //fix overflows
+            adj_min = 1;
+            while(adj_min && find_underflow(h, fills, &t0, &t1, 1))
+            {
+                adj_min = fix_underflow(h, t0, t1, adjustment, qscale_min, qscale_max);
+                t0 = t1;
+            }
+        }
+
+        fills[-1] = rcc->buffer_size * (1. - h->param.rc.f_vbv_buffer_init);
+        t0 = 0;
+        //fix underflows - should be done after overflow, as we'd better undersize target than underflowing VBV
+        adj_max = 1;
+        while(adj_max && find_underflow(h, fills, &t0, &t1, 0))
+        {
+            adj_max = fix_underflow(h, t0, t1, 1.001, qscale_min, qscale_max);
+        }
+
+        expected_bits = count_expected_bits(h);
+    } while(expected_bits < .995*all_available_bits && expected_bits > prev_bits);
+
+    if (!adj_max)
+        x264_log( h, X264_LOG_WARNING, "vbv-maxrate issue, qpmax or vbv-maxrate too low\n");
+
+    //store expected vbv filling values for tracking when encoding
+    for(i=0; i<rcc->num_entries; i++)
+        rcc->entry[i].expected_vbv = rcc->buffer_size - fills[i];
+
+//    x264_log( h, X264_LOG_INFO, "VBV RC initial iterations: %d \n", iterations);
+
+    x264_free(fills-1);
+}
+
 static int init_pass2( x264_t *h )
 {
     x264_ratecontrol_t *rcc = h->rc;
@@ -1643,7 +1820,6 @@ static int init_pass2( x264_t *h )
         rcc->last_non_b_pict_type = -1;
         rcc->last_accum_p_norm = 1;
         rcc->accum_p_norm = 0;
-        rcc->buffer_fill = rcc->buffer_size * h->param.rc.f_vbv_buffer_init;

         /* find qscale */
         for(i=0; i<rcc->num_entries; i++){
@@ -1680,18 +1856,11 @@ static int init_pass2( x264_t *h )
         /* find expected bits */
         for(i=0; i<rcc->num_entries; i++){
             ratecontrol_entry_t *rce = &rcc->entry[i];
-            double bits;
             rce->new_qscale = clip_qscale(h, rce->pict_type, blurred_qscale[i]);
             assert(rce->new_qscale >= 0);
-            bits = qscale2bits(rce, rce->new_qscale);
-
-            rce->expected_bits = expected_bits;
-            expected_bits += bits;
-            update_vbv(h, bits);
-            rcc->buffer_fill = rcc->buffer_fill_final;
+            expected_bits += qscale2bits(rce, rce->new_qscale);
         }

-//printf("expected:%llu available:%llu factor:%lf avgQ:%lf\n", (uint64_t)expected_bits, all_available_bits, rate_factor);
         if(expected_bits > all_available_bits) rate_factor -= step;
     }

@@ -1699,6 +1868,10 @@ static int init_pass2( x264_t *h )
     if(filter_size > 1)
         x264_free(blurred_qscale);

+    if(rcc->b_vbv)
+        vbv_pass2(h);
+    expected_bits = count_expected_bits(h);
+
     if(fabs(expected_bits/all_available_bits - 1.0) > 0.01)
     {
         double avgq = 0;
@@ -1706,7 +1879,8 @@ static int init_pass2( x264_t *h )
             avgq += rcc->entry[i].new_qscale;
         avgq = qscale2qp(avgq / rcc->num_entries);

-        x264_log(h, X264_LOG_WARNING, "Error: 2pass curve failed to converge\n");
+        if ((expected_bits > all_available_bits) || (!rcc->b_vbv))
+            x264_log(h, X264_LOG_WARNING, "Error: 2pass curve failed to converge\n");
         x264_log(h, X264_LOG_WARNING, "target: %.2f kbit/s, expected: %.2f kbit/s, avg QP: %.4f\n",
                  (float)h->param.rc.i_bitrate,
                  expected_bits * rcc->fps / (rcc->num_entries * 1000.),
@@ -1725,7 +1899,7 @@ static int init_pass2( x264_t *h )
             else
                 x264_log(h, X264_LOG_WARNING, "try increasing target bitrate\n");
         }
-        else
+        else if(!(rcc->b_2pass && rcc->b_vbv))
             x264_log(h, X264_LOG_WARNING, "internal error\n");
     }
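
(Not part of the patch.) The buffer bookkeeping that update_vbv and find_underflow rely on can be illustrated with a small stand-alone model. This is a sketch under assumptions: the struct and function names are hypothetical, the units are bits per frame, and only the update rule itself (add the per-frame rate, subtract the frame size, flag a negative fill as underflow, clip to the buffer size) mirrors the patch.

```c
#include <assert.h>

/* Hypothetical stand-alone model of a VBV buffer. */
typedef struct
{
    double buffer_size;  /* capacity, in bits */
    double buffer_rate;  /* bits added per frame */
    double buffer_fill;  /* current fill, in bits */
} vbv_model_t;

static double clip3f( double v, double lo, double hi )
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* Advance the model by one frame of `bits` bits.
 * Returns 1 if that frame underflowed the buffer, else 0. */
static int vbv_model_update( vbv_model_t *v, double bits )
{
    int underflow;
    assert( v->buffer_size > 0 );
    v->buffer_fill += v->buffer_rate - bits;
    underflow = v->buffer_fill < 0;
    v->buffer_fill = clip3f( v->buffer_fill, 0, v->buffer_size );
    return underflow;
}

/* Count the frames that underflow for a planned list of frame sizes.
 * The model is passed by value, so the caller's state is untouched. */
static int vbv_model_count_underflows( vbv_model_t v, const double *frame_bits, int n )
{
    int i, count = 0;
    for( i = 0; i < n; i++ )
        count += vbv_model_update( &v, frame_bits[i] );
    return count;
}
```

With a 1000-bit buffer filled to 900 bits and 100 bits arriving per frame, planned frame sizes of 50, 2000 and 100 bits give one underflow, on the second frame. Intervals ending in such a frame are what vbv_pass2 looks for before raising the qscale of the frames leading up to it.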

