Joakim Tjernlund | 1 Jan 2003 15:44

gcc optimizes loops badly.


I have spent some time optimizing the crc32 function, since JFFS2 uses it heavily. I found that
gcc 2.95.3 optimizes loops badly; even Red Hat's gcc 2.96 produces better code for x86 in some cases.

So I optimized the C code a bit and got much better results.
Now I wonder how recent (>= 3.2) versions of gcc perform. Could somebody run gcc -S -O2 -mregnames on
the functions below and mail me the results?

 Jocke

These are different versions of the same crc32 function:
#include <linux/types.h>

extern  const __u32 crc32_table[256];

/* Return a 32-bit CRC of the contents of the buffer. */

__u32 crc32org(__u32 val, const void *ss, unsigned int len)
{
        const unsigned char *s = ss;

        while (len--){
          val = crc32_table[(val ^ *s++) & 0xff] ^ (val >> 8);
        }
        return val;
}
__u32 crc32do_while(__u32 val, const void *ss, unsigned int len)
{
        const unsigned char *s = ss;


Vishwanath | 1 Jan 2003 07:45

ioremap64 and remap_page_range in 440GP


Hi All,

In PPC Linux, to bring a particular I/O page into the Linux address space, we
use ioremap64(). Is my understanding correct? If I want to map this remapped
address into user space, I have to use the kernel's remap_page_range() in my
driver. remap_page_range() takes a 32-bit physical address as its argument, but
on the 440GP all physical addresses are 36-bit from the processor's point of
view. Is there a 64-bit equivalent of remap_page_range()? If it exists, how do
I use it?

Please help me out, since I am stuck mapping an external peripheral's
registers into user space.

Thanks in advance.

Regards,
vishwa

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/

YanMin Qiao | 1 Jan 2003 18:03

how to put _start at 0x100


We have written an assembly file to initialize our MPC860 custom board.
How can we assemble it and objcopy it into a pure binary file in which
the _start section begins at offset 0x100 from the start of the file?
In our file, VectorTable immediately precedes the _start section.
We've looked through the as, ld and objcopy man pages and got lost.
Thanks in advance.


Joakim Tjernlund | 2 Jan 2003 16:57

RE: gcc optimizes loops badly.


Hi Daniel

Thanks for running the test for me.

The option "-mregnames" exists only on gcc for PPC.

Results are as before for x86. crc32do_while is the
winner followed by crc32do_while_dec.
Gcc should be able to generate the same code for
crc32org and crc32do_while; it's a simple optimization.
crc32do_while_dec is possibly only useful on PPC.

On PPC I expect crc32do_while_dec to be the winner.
Do you have a gcc 3.2 which will generate PPC assembly?

 Jocke
PS.
  You don't have to be on the list to post to it. I will
  CC the list for now.
>
>
> Jocke,
>
>    The option "-mregnames" no longer exists in version 3.2 of gcc.  I
> couldn't find anything equivalent.  I ran it without that option (gcc -S
> -O2 testcode.c) and produced the following on an i686 Red Hat 7.3 box
> using gcc 3.2 (gcc 3.2.1 is the latest release, I believe).
>
>    I am not on the list, hence I cannot CC the list.  This message was

Eisenhut, Daniel (MED | 2 Jan 2003 17:12

RE: gcc optimizes loops badly.


Jocke,

   It just happens that I have a ppc cross compiler using gcc v3.2.1
handy.  "-mregnames" worked fine with it.  Here's the output file:

	.file	"testcode.c"
	.section	".text"
	.align 2
	.globl crc32org
	.type	crc32org, @function
crc32org:
	cmpwi %cr0,%r5,0
	addi %r5,%r5,-1
	beqlr- %cr0
	addi %r5,%r5,1
	lis %r9,crc32_table@ha
	mtctr %r5
	la %r10,crc32_table@l(%r9)
.L18:
	lbz %r0,0(%r4)
	srwi %r11,%r3,8
	addi %r4,%r4,1
	xor %r0,%r3,%r0
	rlwinm %r0,%r0,2,22,29
	lwzx %r9,%r10,%r0
	xor %r3,%r9,%r11
	bdnz .L18
	blr
.Lfe1:

Kerl, John | 2 Jan 2003 17:49

RE: how to put _start at 0x100


What I use is:

myfile.bin: $(OBJS)
	powerpc-linux-ld -Bstatic -Ttext 0x00000000 \
		--oformat binary $(OBJS) -o myfile.bin

where $(OBJS) is the list of object files.

-----Original Message-----
From: YanMin Qiao [mailto:sepherosa <at> sjtu.edu.cn]
Sent: Wednesday, January 01, 2003 10:03 AM
To: linuxppc-embedded <at> lists.linuxppc.org; wd <at> denx.de
Subject: how to put _start at 0x100

We have written an assembly file to initialize our MPC860 custom board.
How can we compile it and objcopy it into a pure binary file with
section _start
beginning at 0x100 from the head?
In our file, VectorTable resides immediately in front of the _start section.
We've looked into as, ld, objcopy man pages and got lost.
Thanks in advance.


Joakim Tjernlund | 2 Jan 2003 18:34

RE: gcc optimizes loops badly.


Thanks Daniel

The results are somewhat better. The loop body for crc32org is now better (the same as crc32do_while), but
crc32org needs two more instructions to set up the loop.
crc32do_while_dec is still the winner, with one fewer instruction in the loop.

It seems like there is lots of loop optimization work to do in gcc when it comes
to the PPC arch.

  Jocke


Dan Malek | 2 Jan 2003 18:40

Re: linuxppc_2_4_devel patch: 8xx FEC extensions


Tom Rini wrote:

> Erm.  That sounds bad.  So the values can change under the function's
> feet?  Even if the values tested for are only tested by this function,
> that still sounds like a bad idea.

If you look at the code, you will notice that the object declared volatile
is part of a shared driver data structure; it isn't a hardware register.
The reason it was originally declared volatile (a valid programming method)
was that the MII interrupt would set this value, and later a normal thread
of execution would read it in another function. Those functions may cache
such global values in a processor register, which is also a valid compiler
optimization. For the normal thread of execution to see the update from the
interrupt handler, you have to do something to force it to read the shared
data structure rather than the cached copy. It's no big deal, and quite a
common multithreaded programming practice.

I don't understand Wolfgang's original "multiple access" comment, since the
code only accesses a data structure in memory, unless some odd software
timing occurred here due to the additional memory accesses. There shouldn't
be any race conditions even though the status is updated in multiple C
statements.

Thanks.

	-- Dan


Dan Malek | 2 Jan 2003 19:11

Re: linuxppc_2_4_devel patch: 8xx FEC extensions


Wolfgang Denk wrote:

> I don't see where a "volatile" gets dropped in any really significant
> way.

No, it doesn't.  The function logically does exactly the same thing,
so I don't understand why these changes were necessary. :-)

The original code simply accessed the 'phy_status' in the data structure
as a volatile object.  The modification from Wolfgang makes any data
structure access volatile, and then updates the 'phy_status' only once
at the end.

If it makes something work better for Wolfgang, that's fine :-).  To me,
it seems to be covering up some other timing problem since the only
thing different is how many times a particular memory location is accessed.

In my past messages, I was just trying to explain why this had to be
volatile, and that is still a valid reason.  This patch doesn't affect
that in any way.

Thanks.

	-- Dan


Tom Rini | 2 Jan 2003 19:21

Re: linuxppc_2_4_devel patch: 8xx FEC extensions


On Thu, Jan 02, 2003 at 01:11:50PM -0500, Dan Malek wrote:
> Wolfgang Denk wrote:
>
>
> >I don't see where a "volatile" gets dropped in any really significant
> >way.
>
> No, it doesn't.  The function logically does exactly the same thing,
> so I don't understand why these changes were necessary. :-)
>
> The original code simply accessed the 'phy_status' in the data structure
> as a volatile object.  The modification from Wolfgang makes any data
> structure access volatile, and then updates the 'phy_status' only once
> at the end.
>
> If it makes something work better for Wolfgang, that's fine :-).  To me,
> it seems to be covering up some other timing problem since the only
> thing different is how many times a particular memory location is accessed.

Would the patch, along with a comment about this potentially covering up
HW timing issues (since this seems to have 'fixed' a problem on a
certain HW config) be OK with everyone?

--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/



Gmane