William Cohen | 1 Mar 2004 16:58

Re: Intel ia32e support


Andi Kleen wrote:
> John Levon <levon <at> movementarian.org> writes:
> 
> 
>>On Wed, Feb 25, 2004 at 12:05:43PM -0500, Will Cohen wrote:
>>
>>
>>>+inline static int __init is_ia32e(void)
>>>+{
>>>+    return (test_bit(X86_FEATURE_LM, current_cpu_data.x86_capability));
>>>+}
>>
>>What's LM? Is this at all like the Intel-preferred way of testing?
> 
> 
> I don't think it's a good idea to only test for LM(=long mode) here.
> Future Intel CPUs may have completely different performance counters,
> but likely will still have LM. And the current shipping P4Es are
> Prescotts and have these counters, but long mode is not enabled.
> 
> Better test for family/model (family == 15, model >= 3).

You are right that using the model number will be more reliable. For the
time being it appears that the ia32e performance monitoring hardware is
the same as the P4's. Future generations of Intel processors may have
different performance monitoring hardware but still have some variety of LM.
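A minimal user-space sketch of the family/model test suggested above (family == 15, model >= 3), decoding the raw CPUID leaf 1 EAX signature using Intel's documented family/model layout. The helper names are hypothetical; in the kernel one would more likely use the already-decoded family/model in the CPU data. The vendor flag is there because AMD K8 parts also report family 15.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Display family: base family in bits 8-11, plus the extended
 * family (bits 20-27) when the base family is 0xF. */
static unsigned int cpuid_family(uint32_t eax)
{
    unsigned int family = (eax >> 8) & 0xf;
    if (family == 0xf)
        family += (eax >> 20) & 0xff;
    return family;
}

/* Display model: base model in bits 4-7, with the extended model
 * (bits 16-19) prepended for base families 0x6 and 0xF. */
static unsigned int cpuid_model(uint32_t eax)
{
    unsigned int base_family = (eax >> 8) & 0xf;
    unsigned int model = (eax >> 4) & 0xf;
    if (base_family == 0x6 || base_family == 0xf)
        model |= ((eax >> 16) & 0xf) << 4;
    return model;
}

/* Hypothetical check: Intel P4-class CPU with Prescott-style counters.
 * Keyed on vendor + family/model, not on the LM feature bit. */
static bool is_intel_p4e(bool vendor_is_intel, uint32_t eax)
{
    return vendor_is_intel && cpuid_family(eax) == 15 && cpuid_model(eax) >= 3;
}
```

For example, a signature of 0x00000F34 decodes to family 15, model 3 and passes, while 0x00000F29 (model 2) does not.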

> I must admit I never even liked the "x86_64" for hammer, because a
> hammer can as well run in 32bit legacy mode and then there is no

Andi Kleen | 1 Mar 2004 17:23

Re: Intel ia32e support

> 
> >I must admit I never even liked the "x86_64" for hammer, because a
> >hammer can as well run in 32bit legacy mode and then there is no
> >x86-64. I think the event files should be keyed only on the CPU
> >family/model, not on 32bit/64bit. oprofile really should not
> >care about the 32bit/64bitness, and I think it doesn't except
> >for the naming. Always call it i386/cpu, like i386/p4e (=prescott)
> 
> So should Opteron processors have been "i386/amd64"? The amd64

amd64 is the architecture too, that would be a bit confusing.

i386/k8 would be probably better. But changing it now would probably
add even more confusion, so better keep it for now.

> performance monitoring events are a superset of the Athlon's. I haven't
> tried it but I would think that those events are still available even 
> when the processor is running in 32-bit legacy mode.

They are.

-Andi

-------------------------------------------------------
SF.Net is sponsored by: Speed Start Your Linux Apps Now.
Build and deploy apps & Web services for Linux with
a free DVD software kit from IBM. Click Now!
http://ads.osdn.com/?ad_id=1356&alloc_id=3438&op=click
Will Cohen | 2 Mar 2004 16:33

OProfile call-graph expectations of frame pointers

I have been looking over the various backends in a current CVS
checkout of GCC to find out how portable the current implementation of
OProfile call-graph support is going to be. Most of the GCC backends
define CAN_DEBUG_WITHOUT_FP. This define causes the compiler to omit
the frame pointer whenever optimization is enabled (-O1 or greater).

The only architecture supported by OProfile that keeps frame pointers
when optimization is turned on is i386; x86_64 omits them. The other
OProfile platforms (s390, s390x, ia64, ppc64, alpha, hppa, sparc, and
arm) all define CAN_DEBUG_WITHOUT_FP. At Red Hat there has been
discussion about setting the compiler options to omit frame pointers on
i386 to improve performance.
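To make the frame-pointer dependence concrete, here is a user-space sketch of a frame-pointer walk over a simulated stack. It assumes the classic i386 layout, where %ebp points at the caller's saved %ebp with the return address just above it; struct frame_head and walk_frames() are illustrative names, not OProfile's code. In code built without a frame pointer, %ebp is an ordinary register, so the "next" link is garbage. Plausibility checks can reject obviously bad links, but a garbage value that happens to look valid still gets followed, which yields random arcs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed i386 frame layout with a frame pointer. */
struct frame_head {
    struct frame_head *ebp;  /* caller's saved frame pointer */
    unsigned long ret;       /* return address into the caller */
};

/* Follow the frame-pointer chain from head, storing up to max return
 * addresses in out; returns how many were stored. Stops at a NULL link
 * or at the first link that looks bogus. */
static size_t walk_frames(const struct frame_head *head,
                          unsigned long *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL || max == 0);
    while (head != NULL && n < max) {
        const struct frame_head *next = head->ebp;

        out[n++] = head->ret;
        if (next == NULL)
            break;
        /* The stack grows down, so each caller's frame must lie above ours. */
        if ((uintptr_t)next <= (uintptr_t)head)
            break;
        /* A saved frame pointer should be word-aligned. */
        if ((uintptr_t)next % sizeof(unsigned long) != 0)
            break;
        head = next;
    }
    return n;
}
```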

The use of the stack unwind information available for gdb has been 
mentioned. However, that wouldn't be available to the nmi handler 
routine and would be too expensive to use. It is also unlikely that this 
information would be available/processable for samples in the kernel.

Phil, why does the daemon need to be built with the frame pointer? If the
daemon suffers from random arcs when it is built without the frame pointer,
wouldn't other applications built without the frame pointer have the
same problem?

-Will


John Levon | 2 Mar 2004 18:30

Re: OProfile call-graph expectations of frame pointers

On Tue, Mar 02, 2004 at 10:33:11AM -0500, Will Cohen wrote:

> The use of the stack unwind information available for gdb has been 
> mentioned. However, that wouldn't be available to the nmi handler 
> routine and would be too expensive to use. It is also unlikely that this 
> information would be available/processable for samples in the kernel.

I've discussed this with some of the GCC people. Basically, the call
graph support is only going to be useful for system-wide profiling for
as long as the distributions keep the frame pointer enabled.

> Phil, why does the daemon need to be built with the frame pointer? If the
> daemon suffers from random arcs when it is built without the frame pointer,
> wouldn't other applications built without the frame pointer have the
> same problem?

No - the daemon is special because we mmap() the sample files. So if we
get a random arc that happens to point to one of those, we try and
encode its name in the new sample file, and get horribly confused.

I can't think of a cleaner fix offhand. In fact, I'm not sure the
current fix is really reliable anyway...

regards
john

--

-- 
"Spammers get STABBED by GOD." - Ron Echeverri


Philippe Elie | 2 Mar 2004 22:29

Re: OProfile call-graph expectations of frame pointers

On Tue, 02 Mar 2004 at 10:33 +0000, Will Cohen wrote:

> I have been looking over the various backends in a current CVS
> checkout of GCC to find out how portable the current implementation of
> OProfile call-graph support is going to be. Most of the GCC backends
> define CAN_DEBUG_WITHOUT_FP. This define causes the compiler to omit
> the frame pointer whenever optimization is enabled (-O1 or greater).

I suggest that the RH people take some measurements of code size and speed;
it's not at all obvious that omitting the frame pointer will really improve
performance, as it depends on the sub-arch. On P4 I get the following
surprising numbers (gcc 3.3.3):

-O2

with frame pointer
real    0m0.905s
   text    data     bss     dec     hex filename
 587859    1464    6668  595991   91817 pp/opreport

w/o frame pointer
real    0m0.626s
   text    data     bss     dec     hex filename
 532798    1268    6988  541054   8417e pp/opreport
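For reference, the relative differences in the -O2 opreport figures above work out as follows (plain arithmetic on the quoted single-run numbers):

```latex
\begin{aligned}
\text{run time:}\quad & \frac{0.905 - 0.626}{0.905} = \frac{0.279}{0.905} \approx 0.308 \;(\approx 31\%\ \text{less without the frame pointer})\\
\text{text size:}\quad & \frac{587859 - 532798}{587859} = \frac{55061}{587859} \approx 0.094 \;(\approx 9.4\%\ \text{smaller})
\end{aligned}
```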

Three povray runs on tomb.pov, w/o display, at 640x480:

-O2 -mcpu=pentium4 -ffast-math -ansi -falign-functions=4 -falign-jumps=4 -falign-loops=4 -mpreferred-stack-boundary=3

with frame pointer

Philippe Elie | 2 Mar 2004 22:37

Re: OProfile call-graph expectations of frame pointers

On Tue, 02 Mar 2004 at 17:30 +0000, John Levon wrote:

> On Tue, Mar 02, 2004 at 10:33:11AM -0500, Will Cohen wrote:
> 
> > Phil, why does the daemon need to be built with the frame pointer? If the
> > daemon suffers from random arcs when it is built without the frame pointer,
> > wouldn't other applications built without the frame pointer have the
> > same problem?
> 
> No - the daemon is special because we mmap() the sample files. So if we
> get a random arc that happens to point to one of those, we try and
> encode its name in the new sample file, and get horribly confused.
> 
> I can't think of a cleaner fix offhand. In fact, I'm not sure the
> current fix is really reliable anyway...

E.g. calling a libc compiled w/o frame pointers from the daemon is likely to
show the same problem.

I'm thinking about an atomic commit of the backtrace in the driver, using
three positions:

read_pos
write_pos
commit_pos

read_pos is incremented by the reader; write_pos and commit_pos by the NMI
handler. The amount of data available to the reader is calculated from
read_pos and commit_pos, and the NMI handler calculates free entries from
read_pos and write_pos. If the backtrace is aborted for any reason (next
frame > previous frame, non-aligned ebp, etc.), it likely means we are
tracing a backtrace w/o frame pointer; in this case we don't
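A single-threaded sketch of the read_pos/write_pos/commit_pos idea described above. Assumptions not in the message: positions are free-running counters indexed modulo a power-of-two buffer size, and the function names are hypothetical. A real NMI/reader pair would also need memory barriers between storing entries and publishing commit_pos, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BT_BUF_SIZE 16  /* hypothetical capacity; power of two so % survives counter wrap */

struct bt_buffer {
    unsigned long data[BT_BUF_SIZE];
    size_t read_pos;    /* advanced only by the reader */
    size_t write_pos;   /* advanced by the NMI handler as it records frames */
    size_t commit_pos;  /* published by the NMI handler once a backtrace is complete */
};

/* Free space uses read_pos and write_pos, so uncommitted entries
 * occupy room until they are committed or aborted. */
static size_t bt_free(const struct bt_buffer *b)
{
    return BT_BUF_SIZE - (b->write_pos - b->read_pos);
}

/* The reader only ever sees committed entries: read_pos..commit_pos. */
static size_t bt_available(const struct bt_buffer *b)
{
    return b->commit_pos - b->read_pos;
}

/* NMI side: append one entry of the backtrace in progress. */
static bool bt_push(struct bt_buffer *b, unsigned long value)
{
    if (bt_free(b) == 0)
        return false;
    b->data[b->write_pos % BT_BUF_SIZE] = value;
    b->write_pos++;
    return true;
}

/* NMI side: the walk finished cleanly; publish the whole trace at once. */
static void bt_commit(struct bt_buffer *b)
{
    b->commit_pos = b->write_pos;
}

/* NMI side: the walk hit a bogus frame; drop everything since the last commit. */
static void bt_abort(struct bt_buffer *b)
{
    b->write_pos = b->commit_pos;
}

/* Reader side: consume one committed entry. */
static bool bt_read(struct bt_buffer *b, unsigned long *out)
{
    if (bt_available(b) == 0)
        return false;
    *out = b->data[b->read_pos % BT_BUF_SIZE];
    b->read_pos++;
    assert(b->read_pos <= b->commit_pos);
    return true;
}
```

The point of the split is that an aborted walk never becomes visible to the reader: only bt_commit() moves the boundary the reader checks.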

Will Cohen | 3 Mar 2004 00:32

normal user dumps of oprofile data

I have reworked the patch for opcontrol to minimize the work done for
the user dump by factoring out code that is common between the root and
normal user paths. I also removed the unneeded check of "OPROFILE_AVAILABLE"
in the data dump. The comment on the check for the presence of the daemon
has been changed to reflect that the check doesn't verify that the daemon
is actually running.

The oproffs.patch changes /dev/oprofile/dump to be world-writable.
The combination of the two patches should allow normal users to flush
sample data.

-Will
--- opcontrol.userdump	2004-03-02 14:37:30.000000000 -0500
+++ opcontrol	2004-03-02 17:21:01.806725157 -0500
@@ -211,6 +211,30 @@
 	fi
 }

+# setup variables related to daemon
+do_init_daemon_vars()
+{
+	# as in op_user.h
+	DIR="/var/lib/oprofile"
+	LOCK_FILE="/var/lib/oprofile/lock"
+	LOG_FILE="$DIR/oprofiled.log"
+	SAMPLES_DIR="$DIR/samples"
+	CURRENT_SAMPLES_DIR=${SAMPLES_DIR}/current
+}

Philippe Elie | 3 Mar 2004 09:14

Re: normal user dumps of oprofile data

On Tue, 02 Mar 2004 at 18:32 +0000, Will Cohen wrote:

> I have reworked the patch for opcontrol to minimize the work done for
> the user dump by factoring out code that is common between the root and
> normal user paths. I also removed the unneeded check of "OPROFILE_AVAILABLE"
> in the data dump. The comment on the check for the presence of the daemon
> has been changed to reflect that the check doesn't verify that the daemon
> is actually running.
>
> The oproffs.patch changes /dev/oprofile/dump to be world-writable.
> The combination of the two patches should allow normal users to flush
> sample data.
> 
> -Will

> +do_dump_data()
>  {
> -	# make sure that the daemon is running
> +	# make sure that the daemon is not dead and gone
>  	if test -e "$DIR/lock"; then
>  		OPROFILED_PID=`cat $DIR/lock`
>  		if test ! -d "/proc/$OPROFILED_PID"; then
> @@ -1073,14 +1084,14 @@
>  		# find current time
>  		TMPFILE=`mktemp /tmp/oprofile.XXXXXX` || exit 1
>  		echo 1 > $MOUNT/dump
> -		# loop until there is a file to check
> -		while [ ! -e "$DIR/complete_dump" ]
> -		do
> -			sleep 1;
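The "not dead and gone" check in the quoted hunk reads the daemon's PID from the lock file and tests whether /proc/<pid> exists. A C rendering of the same logic, as a sketch that assumes Linux with /proc mounted; check_daemon_lock() and enum lock_state are hypothetical names, not OProfile code. It also shows why the comment was reworded: an existing /proc entry proves only that some process has that PID, not that it is still oprofiled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical result codes mirroring the opcontrol shell logic. */
enum lock_state {
    LOCK_ABSENT,   /* no readable lock file: no daemon was started */
    LOCK_STALE,    /* lock names a PID with no /proc entry: daemon died */
    LOCK_PRESENT   /* /proc/<pid> exists: some process has that PID,
                      which is NOT proof that it is still oprofiled */
};

static enum lock_state check_daemon_lock(const char *lock_path)
{
    char buf[32];
    char proc_path[64];
    long pid;
    FILE *f;

    assert(lock_path != NULL);
    f = fopen(lock_path, "r");
    if (f == NULL)
        return LOCK_ABSENT;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return LOCK_STALE;   /* empty lock file: treat it as stale */
    }
    fclose(f);

    pid = strtol(buf, NULL, 10);
    if (pid <= 0)
        return LOCK_STALE;
    snprintf(proc_path, sizeof proc_path, "/proc/%ld", pid);
    /* Counterpart of `test -d /proc/$OPROFILED_PID` (existence only). */
    return access(proc_path, F_OK) == 0 ? LOCK_PRESENT : LOCK_STALE;
}
```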

Will Cohen | 3 Mar 2004 18:20

Re: normal user dumps of oprofile data

Philippe Elie wrote:
> On Tue, 02 Mar 2004 at 18:32 +0000, Will Cohen wrote:
> 
> 
>>I have reworked the patch for opcontrol to minimize the work done for
>>the user dump by factoring out code that is common between the root and
>>normal user paths. I also removed the unneeded check of "OPROFILE_AVAILABLE"
>>in the data dump. The comment on the check for the presence of the daemon
>>has been changed to reflect that the check doesn't verify that the daemon
>>is actually running.
>>
>>The oproffs.patch changes /dev/oprofile/dump to be world-writable.
>>The combination of the two patches should allow normal users to flush
>>sample data.
>>
>>-Will
> 
> 
>>+do_dump_data()
>> {
>>-	# make sure that the daemon is running
>>+	# make sure that the daemon is not dead and gone
>> 	if test -e "$DIR/lock"; then
>> 		OPROFILED_PID=`cat $DIR/lock`
>> 		if test ! -d "/proc/$OPROFILED_PID"; then
>>@@ -1073,14 +1084,14 @@
>> 		# find current time
>> 		TMPFILE=`mktemp /tmp/oprofile.XXXXXX` || exit 1
>> 		echo 1 > $MOUNT/dump
>>-		# loop until there is a file to check

Philippe Elie | 3 Mar 2004 20:59

Re: normal user dumps of oprofile data

On Wed, 03 Mar 2004 at 12:20 +0000, Will Cohen wrote:

One last bit and you can commit it.

I'll send the driver part once 2.6.4 is out (it's too late for 2.6.4).

> All the Red Hat distros use a backport of the 2.6 mechanism, so I
> haven't tried it on 2.4.

dump has access mode 0666 on 2.4, so it should be OK there.

> +do_dump()
> +{
> +	do_dump_data
> +	if test $? -ne 0 -a "$ONLY_DUMP" = "yes"; then
> +		echo "Unable to complete dump of oprofile data" >& 2

Under normal conditions (daemon not dead from a previous run, etc.) this can
occur only if the profiler is not running, so something like this is more
appropriate:

echo "Unable to complete dump of oprofile data: are you sure profiling is on?" >& 2

Feel free to reword it if there is a better way.

regards,
Phil


