Mauricio Alvarez | 1 May 2006 11:09

Re: Compilation problem with x264 on Dual Opteron setup (SSE3)

Loren Merritt wrote:
> No, despite what Intel would have you believe, SSE3 provides nothing at
> all for video codecs. We tried it, and there was no measurable speed
> difference.

And what about the LDDQU (load double quadword unaligned) instruction?

Could this instruction be useful in the motion estimation routines,
where macroblocks and sub-macroblocks are accessed at misaligned
addresses?

Mauricio Alvarez

> 
> --Loren Merritt
> 

Guillaume POIRIER | 1 May 2006 11:48

Re: Compilation problem with x264 on Dual Opteron setup (SSE3)

Hi,

On 5/1/06, Mauricio Alvarez <alvarez <at> ac.upc.edu> wrote:
> Loren Merritt wrote:
> > No, despite what Intel would have you believe, SSE3 provides nothing at
> > all for video codecs. We tried it, and there was no measurable speed
> > difference.
>
> And what about the LDDQU (load double quadword unaligned) instruction?
>
> Could this instruction be useful in the motion estimation routines,
> where macroblocks and sub-macroblocks are accessed at misaligned
> addresses?

I made a patch that made use of this instruction, but since I don't
have any machine that supports SSE3, I wasn't able to test it.

It's available here, if you want:
http://tuxrip.free.fr/transperl/MPlayer/SSE3_lddqu.2.diff

I don't know if it still applies cleanly though.

The main problem with that patch is that it unconditionally replaces every
movdqu with lddqu, which isn't very smart. Intel's optimization guide
states quite clearly that this is not how it should be done.
What should be done is: instrument the code so that it tells you
which loads are always badly unaligned, and use lddqu only
in those cases (loads that are sometimes aligned and sometimes not do not
benefit from using lddqu).
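
A minimal sketch of the kind of instrumentation described above, in C. The counter struct and both function names are hypothetical and come neither from x264 nor from the patch. It also assumes that "unaligned" means "not a multiple of 16 bytes"; a stricter test (for example, whether the load crosses a cache line) may be closer to what "badly unaligned" means here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-call-site counters (one instance per load site). */
typedef struct {
    unsigned long aligned;    /* address was a multiple of 16 */
    unsigned long unaligned;  /* address was not a multiple of 16 */
} load_stats_t;

/* Record whether a 16-byte load from p would be aligned. */
static void record_load(load_stats_t *s, const void *p)
{
    assert(s != NULL);
    if (((uintptr_t)p & 15) == 0)
        s->aligned++;
    else
        s->unaligned++;
}

/* A load site is a candidate for lddqu only if it was exercised
 * and never once saw an aligned address. */
static int always_unaligned(const load_stats_t *s)
{
    return s->unaligned > 0 && s->aligned == 0;
}
```

Calling record_load() next to each movdqu during a test encode, then checking always_unaligned() per site afterwards, would separate the "always unaligned" loads from the mixed ones.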


Loren Merritt | 1 May 2006 23:15

Re: Compilation problem with x264 on Dual Opteron setup (SSE3)

On Mon, 1 May 2006, Guillaume POIRIER wrote:
> The main problem with that patch is that it unconditionally replaces every
> movdqu with lddqu, which isn't very smart. Intel's optimization guide
> states quite clearly that this is not how it should be done.
> What should be done is: instrument the code so that it tells you
> which loads are always badly unaligned, and use lddqu only
> in those cases (loads that are sometimes aligned and sometimes not do not
> benefit from using lddqu).

All the variants of SAD are unaligned; SATD and SSD are usually aligned.
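
To illustrate why SAD is the natural place for an unaligned load, here is a hedged sketch of a 16x16 SAD in C with intrinsics. It is not x264's assembly and not the patch. It assumes the block being encoded is 16-byte aligned with a stride that is a multiple of 16, while the reference pointer depends on the candidate motion vector and can land anywhere; that split is an assumption of the sketch. The function names are made up. The SSE3 path is only compiled when the compiler defines __SSE3__ (for example with gcc -msse3); otherwise the plain C version is used.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#ifdef __SSE3__
#include <pmmintrin.h>  /* SSE3 header, provides _mm_lddqu_si128 */
#endif

/* Plain C reference: sum of absolute differences over a 16x16 block. */
static int sad_16x16_c(const uint8_t *cur, int cur_stride,
                       const uint8_t *ref, int ref_stride)
{
    int sum = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sum += abs(cur[x] - ref[x]);
        cur += cur_stride;
        ref += ref_stride;
    }
    return sum;
}

/* Same result; uses lddqu for the reference rows when SSE3 is enabled. */
static int sad_16x16(const uint8_t *cur, int cur_stride,
                     const uint8_t *ref, int ref_stride)
{
#ifdef __SSE3__
    /* Assumption of this sketch: the current block is 16-byte aligned. */
    assert(((uintptr_t)cur & 15) == 0 && (cur_stride & 15) == 0);
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < 16; y++) {
        __m128i a = _mm_load_si128((const __m128i *)cur);   /* aligned load */
        __m128i b = _mm_lddqu_si128((const __m128i *)ref);  /* unaligned load */
        /* psadbw leaves one partial sum in each 64-bit half. */
        acc = _mm_add_epi32(acc, _mm_sad_epu8(a, b));
        cur += cur_stride;
        ref += ref_stride;
    }
    return _mm_cvtsi128_si32(acc) +
           _mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
#else
    return sad_16x16_c(cur, cur_stride, ref, ref_stride);
#endif
}
```

Swapping _mm_lddqu_si128 for _mm_loadu_si128 gives the movdqu variant, so the two can be timed against each other on the same data.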

--Loren Merritt

Pedro Tumusok | 2 May 2006 16:29

Re: Compilation problem with x264 on Dual Opteron setup



On 4/29/06, Pedro Tumusok <pedro.tumusok <at> gmail.com> wrote:
Hi,

We are getting the following error messages when trying to compile x264 on a machine with dual Opteron 280 CPUs.

/usr/bin/ld: warning: i386:x86-64 architecture of input file `libx264.a(dct-a.o)' is incompatible with i386 output
/usr/bin/ld: warning: i386:x86-64 architecture of input file `libx264.a(cpu-a.o)' is incompatible with i386 output
/usr/bin/ld: warning: i386:x86-64 architecture of input file `libx264.a(pixel-a.o)' is incompatible with i386 output
/usr/bin/ld: warning: i386:x86-64 architecture of input file `libx264.a(mc-a.o)' is incompatible with i386 output
/usr/bin/ld: warning: i386:x86-64 architecture of input file `libx264.a(mc-a2.o)' is incompatible with i386 output
/usr/bin/ld: warning: i386:x86-64 architecture of input file `libx264.a(predict-a.o)' is incompatible with i386 output
/usr/bin/ld: warning: i386:x86-64 architecture of input file `libx264.a(pixel-sse2.o)' is incompatible with i386 output
/usr/bin/ld: warning: i386:x86-64 architecture of input file `libx264.a(quant-a.o)' is incompatible with i386 output
/usr/bin/ld: warning: i386:x86-64 architecture of input file `libx264.a(deblock-a.o)' is incompatible with i386 output


It seems the problem was that we were running Debian Sarge, which is 32-bit, and had compiled x264 as 64-bit.
We reinstalled the server from testing/etch, and then the messages above went away.
But now we get the following message from VLC. I know it's not just x264, but if anybody has any pointers, it would be appreciated.

cannot load module `/usr/local/lib/vlc/codec/libx264_plugin.so' (libx264.so.46: cannot open shared object file: No such file or directory)

jpt <at> farnsworth:/usr/local/lib/vlc/codec$ ls -l
total 588
-rwxr-xr-x 1 root staff  14722 2006-05-02 18:08 liba52_plugin.so
-rwxr-xr-x 1 root staff  21850 2006-05-02 18:08 libadpcm_plugin.so
-rwxr-xr-x 1 root staff  26261 2006-05-02 18:08 libaraw_plugin.so
-rwxr-xr-x 1 root staff  15232 2006-05-02 18:08 libcinepak_plugin.so
-rwxr-xr-x 1 root staff  44484 2006-05-02 18:08 libcmml_plugin.so
-rwxr-xr-x 1 root staff  15329 2006-05-02 18:08 libcvdsub_plugin.so
-rwxr-xr-x 1 root staff  15985 2006-05-02 18:08 libdts_plugin.so
-rwxr-xr-x 1 root staff 123271 2006-05-02 18:08 libdvbsub_plugin.so
-rwxr-xr-x 1 root staff  13446 2006-05-02 18:08 libfake_plugin.so
-rwxr-xr-x 1 root staff  87327 2006-05-02 18:08 libffmpeg_plugin.so
-rwxr-xr-x 1 root staff  17579 2006-05-02 18:08 libflacdec_plugin.so
-rwxr-xr-x 1 root staff  15502 2006-05-02 18:08 liblibmpeg2_plugin.so
-rwxr-xr-x 1 root staff  11736 2006-05-02 18:08 liblpcm_plugin.so
-rwxr-xr-x 1 root staff  15380 2006-05-02 18:08 libmpeg_audio_plugin.so
-rwxr-xr-x 1 root staff  11772 2006-05-02 18:08 librawvideo_plugin.so
-rwxr-xr-x 1 root staff  16444 2006-05-02 18:08 libspudec_plugin.so
-rwxr-xr-x 1 root staff  24365 2006-05-02 18:08 libsubsdec_plugin.so
-rwxr-xr-x 1 root staff  15457 2006-05-02 18:08 libsvcdsub_plugin.so
-rwxr-xr-x 1 root staff  23110 2006-05-02 18:08 libvorbis_plugin.so
-rwxr-xr-x 1 root staff  29031 2006-05-02 18:08 libx264_plugin.so





--
Best regards / Mvh
Jan Pedro Tumusok

Another fella told me, he had a sister who looked just fine.
Instead of bein' my deliv'rance, she had a strange resemblance
To a cat named Frankenstein

Guillaume POIRIER | 2 May 2006 22:15

Re: Compilation problem with x264 on Dual Opteron setup (SSE3)

Hi,

On 5/1/06, Loren Merritt <lorenm <at> u.washington.edu> wrote:
> On Mon, 1 May 2006, Guillaume POIRIER wrote:
> > The main problem with that patch is that it unconditionally replaces every
> > movdqu with lddqu, which isn't very smart. Intel's optimization guide
> > states quite clearly that this is not how it should be done.
> > What should be done is: instrument the code so that it tells you
> > which loads are always badly unaligned, and use lddqu only
> > in those cases (loads that are sometimes aligned and sometimes not do not
> > benefit from using lddqu).
>
> All the variants of SAD are unaligned; SATD and SSD are usually aligned.

Okay, here's an updated version of the patch that only uses lddqu in
the SAD routines: http://tuxrip.free.fr/transperl/MPlayer/SSE3_lddqu.3.diff

Please test and report if it helps a bit (I doubt it).

Guillaume
--
I am disillusioned enough to know that no man's opinion on any subject
is worth a damn unless backed up with enough genuine information to
make him really know what he's talking about.

-- H. P. Lovecraft (about the flamewars on FFmpeg and MPlayer-dev mailing lists)
http://www.brainyquote.com/quotes/quotes/h/hplovecr278144.html

Pedro Tumusok | 3 May 2006 01:20

Re: Compilation problem with x264 on Dual Opteron setup



On 5/2/06, Pedro Tumusok <pedro.tumusok <at> gmail.com> wrote:



It seems the problem was that we were running Debian Sarge, which is 32-bit, and had compiled x264 as 64-bit.
We reinstalled the server from testing/etch, and then the messages above went away.
But now we get the following message from VLC. I know it's not just x264, but if anybody has any pointers, it would be appreciated.

cannot load module `/usr/local/lib/vlc/codec/libx264_plugin.so' (libx264.so.46: cannot open shared object file: No such file or directory)

Okay, the solution was to add /usr/local/lib to /etc/ld.so.conf and run ldconfig. After recompiling x264 and VLC, it worked.

--
Best regards / Mvh
Jan Pedro Tumusok

Another fella told me, he had a sister who looked just fine.
Instead of bein' my deliv'rance, she had a strange resemblance
To a cat named Frankenstein

Pedro Tumusok | 3 May 2006 11:49

Another CPU related question

Hi,

Since my innocent SSE3 questions spurred some replies, I thought I should ask another question.

The Cell CPU from IBM/Sony et al.: the PPU is some sort of PPC and, according to what I have read, PPC software should be recompilable and able to run on the Cell, albeit without using the SPUs. According to various presentations, proofs of concept, etc., the Cell with its SPUs is a streaming device, i.e. it can use the SPUs to offload bits of work.

There have been some articles about it being able to decode 48 MPEG-2 streams at once, or something like that. So, to my question: do you believe it would be able to show a similar kind of performance in encoding?



--
Best regards / Mvh
Jan Pedro Tumusok

Another fella told me, he had a sister who looked just fine.
Instead of bein' my deliv'rance, she had a strange resemblance
To a cat named Frankenstein

Steven Tondeur | 3 May 2006 16:25

Bug in expanding borders to mod16?

Hi all,

I think I found a small bug in x264_frame_expand_border_mod16 (frame.c).
At line 257, the loop for expanding vertically is immediately followed
by a semicolon:

for( y = i_height; y < i_height + i_pady; y++ );
    memcpy(....)

Shouldn't it be removed?
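
To spell out the effect in C: with the stray semicolon the for loop has an empty body, so it only counts y up to i_height + i_pady, and the memcpy then runs exactly once, with y already past the last padding row. The memcpy arguments are elided in this mail, so the sketch below does not reproduce x264's code. It is a hypothetical helper (name and parameters invented here) that assumes the padding rows are filled by repeating the last picture row, and it shows the loop with the semicolon removed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not x264's function: fill pad_rows rows below a
 * plane of the given height by repeating its last row (an assumption). */
static void pad_bottom(uint8_t *plane, int stride, int width,
                       int height, int pad_rows)
{
    assert(height > 0 && width <= stride);
    const uint8_t *last = plane + (size_t)(height - 1) * stride;

    /* No ';' after the for(...): the memcpy is the loop body and
     * runs once per padding row. */
    for (int y = height; y < height + pad_rows; y++)
        memcpy(plane + (size_t)y * stride, last, (size_t)width);
}
```

With the semicolon left in, none of the padding rows would be written by the loop, and the single memcpy would target the row after the padded area.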

Regards,
Steven

Andrea Barbieri | 3 May 2006 16:57

Re: Bug in expanding borders to mod16?

Hello,

Looking back at the commit history, x264_frame_expand_border_mod16 was
introduced in revision 327 and has never changed since then.

Looks like a typo to me...

andrea

ST:=Steven Tondeur

ST> Hi all,
ST> 
ST> I think I found a small bug in x264_frame_expand_border_mod16 (frame.c).
ST> At line 257, the loop for expanding vertically is immediately followed by a
ST> semicolon:
ST> 
ST> for( y = i_height; y < i_height + i_pady; y++ );
ST>    memcpy(....)
ST> 
ST> Shouldn't it be removed?
ST> 
ST> Regards,
ST> Steven
ST> 
ST> 

-- 
Andrea Barbieri
KeyID=0x034DFD5A
KeyFingerprint=C1 68 EA 9A 71 89 53 8D  21 4F 12 81 A7 52 9F 32  03 4D FD 5A
Moving Image Research, The Workshop, Hampton Lane, Bristol, BS6 6LE, UK
Tel +44 117 9732200, FAX +44 117 9732210
http://www.movingimageresearch.com/

Loren Merritt | 3 May 2006 19:30

Re: Another CPU related question

On Wed, 3 May 2006, Pedro Tumusok wrote:

> Hi,
>
> Since my innocent SSE3 questions spurred some replies, I thought I should ask
> another question.
>
> The Cell CPU from IBM/Sony et al.: the PPU is some sort of PPC and, according
> to what I have read, PPC software should be recompilable and able to run on
> the Cell, albeit without using the SPUs. According to various presentations,
> proofs of concept, etc., the Cell with its SPUs is a streaming device, i.e.
> it can use the SPUs to offload bits of work.
>
> There have been some articles about it being able to decode 48 MPEG-2 streams
> at once, or something like that. So, to my question: do you believe it would
> be able to show a similar kind of performance in encoding?

48 MPEG-2 streams? You don't need a Cell for that, unless they're 1080p or
something. A DVD takes 2% CPU on my Athlon64 3400, so the only thing
preventing me from displaying 48 of them is bandwidth to the hard drive
and video card.
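
The arithmetic behind that estimate, as a tiny C sketch. The function name is made up, and it assumes CPU load scales linearly with the number of streams, which ignores the disk and video card bandwidth mentioned above.

```c
#include <assert.h>

/* Total CPU load, in percent of one CPU, for a number of identical
 * streams, assuming the load simply adds up per stream. */
static double total_cpu_percent(int streams, double percent_per_stream)
{
    assert(streams >= 0 && percent_per_stream >= 0.0);
    return streams * percent_per_stream;
}
```

total_cpu_percent(48, 2.0) gives 96.0, i.e. 48 streams at 2% each still fit on a single CPU under that assumption.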

--Loren Merritt

