Maynard Johnson | 17 Oct 18:30 2014

Transition of OProfile maintainership

To the whole OProfile community:

After 6 years of serving in the role of lead maintainer for the OProfile project, it's time for me to pass the
baton to someone else. I'll be retiring in late December, so I would like to have this transition completed
by early December.  Currently, the project is in pretty stable condition, with just a few open bugs and
feature requests. The primary task of the lead maintainer is to ensure submitted patches are
well-reviewed and tested before being committed.  Sometimes those reviews are handed off to other
community members who have the necessary expertise (in particular, architecture-specific patches are
usually reviewed by one of the architecture sub-maintainers on cc).  For arch-independent patches,
reviews from any community members are helpful, but in reality, the lead maintainer has the final say
and so must also review such patches. Another part of the job is to help answer questions that are posted
to the mailing list. Traffic on the list is pretty low these days, so not a lot of time is required for this.

I encourage all who are interested in this opportunity to contact me, and we can discuss it. Feel free to do so
in a private response if you prefer. Thanks.

-Maynard

Maynard Johnson | 17 Oct 16:14 2014

Re: operf jit problem. No anon samples?

On 10/16/2014 06:46 PM, Maurice Marks wrote:
> Sorry. AFAIK It worked at one time:
Hi, Maurice,
In future, please don't remove oprofile-list from cc, since we often rely on the community at large to help
answer questions.
> 
>  http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20090629/080117.html
I talked to a colleague of mine here at IBM who's involved with LLVM (although not specifically with this
area).  He said that this oprofile support was for the "old" JIT, and the new MCJIT does not yet have
oprofile support, as far as he knew. He suggests you post a message to the LLVM mailing list -- http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev.
> 
>  and more recently:
> 
> http://rubini.us/2013/03/20/profiling-jitted-ruby-code-with-oprofile/
> 
> Regarding the anonymous mapping. My understanding is that malloc will grab memory from the heap if it's
relatively small, but do an mmap call to get larger chunks from the kernel, which are marked anonymous. I
guess I didn't understand the difference from oprofile's point of view.
> If it regarded the heap as part of lli's address space then it should have attributed its pc samples to lli
not ld.so, shouldn't it?
The heap address returned from malloc *is* part of lli's address space.  And if that truly is where LLVM is
putting compiled code, then, as I said earlier, a deep dive debugging session would be needed to figure out
why samples are being attributed to ld.so.  Go ahead and knock yourself out, but you might want to check in
with LLVM developers first to see if it would be worthwhile doing -- i.e., if newer JIT really does not have
support for oprofile yet as my colleague stated.
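If you want to see for yourself where a given allocation lands, here is a quick sketch (assuming glibc, where requests above the mmap threshold -- 128 KB by default -- are served by anonymous mmap rather than the heap):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Print the /proc/self/maps line containing an address, so you can
 * see whether it lives in [heap] or in an anonymous mmap region. */
static void locate(const char *label, void *p)
{
	unsigned long long addr = (uintptr_t)p, start, end;
	char line[1024];
	FILE *f = fopen("/proc/self/maps", "r");

	if (!f)
		return;
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "%llx-%llx", &start, &end) == 2 &&
		    addr >= start && addr < end) {
			printf("%s (%p): %s", label, p, line);
			break;
		}
	}
	fclose(f);
}

int main(void)
{
	void *small = malloc(64);              /* typically from [heap] */
	void *large = malloc(4 * 1024 * 1024); /* typically anonymous mmap */
	locate("small", small);
	locate("large", large);
	free(small);
	free(large);
	return 0;
}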

-Maynard
> If it was outside of lli's space, in anonymous memory then it could be "anon".
> 
> I might have to make up a much larger jit example to test this theory.

Maynard Johnson | 16 Oct 23:59 2014

Re: operf jit problem. No anon samples?

On 10/16/2014 02:09 PM, Maurice Marks wrote:
> Thanks Maynard. I really appreciate your help with this.
> 
>  I made two versions of lli (the llvm interpreter that Jits code, then runs it). One is linked with an llvm
that has oprofile support (/usr/local/bin/lli), one linked with an llvm that has no such support built in (/usr/bin/lli).
> 
> The runs below (on AMD)  show that there are no "anon" samples reported even with no llvm support for oprofile.
> However - there is a clue. This time  I ran opreport with Verbose debug and it reports a discrepancy between
the number
> of samples counted and how many were attributed to DSOs. Plus there are 4 lines of "start_offset is now 0"
which I don't understand but you might.
Regarding the message about the discrepancy, see the opreport man page, under the "--symbols" option, for
an explanation. But if the LLVM JITed code is really being loaded into the heap, then I'm at a loss to explain
why the majority of your sample addresses seem to be in the range of memory where
/lib/x86_64-linux-gnu/ld-2.19.so was loaded. A deep dive debugging session would be needed to compare
the process's memory mappings (/proc/<pid>/maps) with the sample addresses being collected by operf
(which can be seen using the "-V convert" option).
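If you get that far, here is a rough sketch of the comparison step (a hypothetical helper, not part of oprofile): give it the pid and one of the sample addresses, and it prints the mapping containing that address, if any:

#include <stdio.h>
#include <stdlib.h>

/* Usage: findmap <pid> <hex-address>
 * Scans /proc/<pid>/maps for the mapping containing the given
 * sample address and prints the matching line. */
int main(int argc, char **argv)
{
	char path[64], line[1024];
	unsigned long long addr, start, end;
	FILE *f;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <pid> <hex-address>\n", argv[0]);
		return 1;
	}
	addr = strtoull(argv[2], NULL, 16);
	snprintf(path, sizeof(path), "/proc/%s/maps", argv[1]);
	if (!(f = fopen(path, "r"))) {
		perror(path);
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		/* Each maps line begins with "start-end" in hex. */
		if (sscanf(line, "%llx-%llx", &start, &end) == 2 &&
		    addr >= start && addr < end) {
			fputs(line, stdout);
			break;
		}
	}
	fclose(f);
	return 0;
}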

As for the "start_offset is now 0", it's just informational goop for debugging purposes, and these are not unusual.
> 
> Looking at your response I have a question about "anonymous memory mappings". When you say that the JVM's
jit puts code in anonymous memory mappings, is there something it does specifically to advise oprofile of
the range of addresses for jit'd code?
See http://oprofile.sourceforge.net/doc/devel/developing.html, "Chapter 1. Developing a new JIT
agent".  It states the following:

  Ensure your virtual machine provides an API that, at minimum, can provide the following information about
dynamically compiled code:
    - Notification when compilation occurs
    - Name of the symbol (i.e., function or class/method, etc.)
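For illustration, here is a minimal sketch of a VM-side agent using that interface (libopagent's opagent.h, linked with -lopagent). The symbol name and code buffer are just placeholders, and error handling is mostly omitted:

#include <stdint.h>
#include <stdio.h>
#include <opagent.h>

int main(void)
{
	/* Placeholder standing in for a buffer of freshly JIT-compiled code. */
	static unsigned char jitted_code[64];

	/* op_open_agent() creates the jitdump file that opjitconv later
	 * converts into the <tgid>.jo ELF file opreport reads. */
	op_agent_t agent = op_open_agent();
	if (!agent) {
		perror("op_open_agent");
		return 1;
	}

	/* Notification of a compilation: symbol name, the address where
	 * the code was emitted, and its size in bytes. */
	op_write_native_code(agent, "MyClass::myMethod",
			     (uint64_t)(uintptr_t)jitted_code,
			     jitted_code, sizeof(jitted_code));

	op_close_agent(agent);
	return 0;
}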

Maurice Marks | 15 Oct 23:25 2014

operf jit problem. No anon samples?

I'm trying to profile non-Java JIT code (using LLVM's built-in oprofile interface) on Ubuntu 14.04 using operf.
I built the latest git version of oprofile just to be sure I'm up to date.

What I see is that there are lots of JIT samples counted, but rather than being attributed to anon or to (hopefully) <pid>.jo, they are
counted against one of the .so files.

Like this:
CPU: Intel Haswell microarchitecture, speed 3498 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
CPU_CLK_UNHALT...|
  samples|      %|
------------------
    20792 100.000 lli
    CPU_CLK_UNHALT...|
      samples|      %|
    ------------------
        20634 99.2401 ld-2.19.so
           86  0.4136 no-vmlinux
           27  0.1299 libLLVMCore.so
           16  0.0770 libLLVMCodeGen.so
           10  0.0481 libLLVMSelectionDAG.so
            4  0.0192 libLLVMSupport.so
            3  0.0144 libLLVMJIT.so
            3  0.0144 libLLVMX86Desc.so
            2  0.0096 libLLVMScalarOpts.so
            2  0.0096 libLLVMX86CodeGen.so
            1  0.0048 lli
            1  0.0048 libc-2.19.so
            1  0.0048 libpthread-2.19.so
            1  0.0048 libLLVMAnalysis.so
            1  0.0048 libLLVMAsmParser.so

~99% of the samples are actually in the Jit'd code, not in ld-2.19.so.

Debugging I see that the opjitagent calls are being made correctly for the jit'd routines, and jitdump files are being generated.
And opjitconv runs and deletes the jitdump files at the end. But because there are no "anon" samples,
nothing is reported.

With -Vdebug in operf I see:
....
profiled app ended normally.
operf recording finished.
Total bytes recorded from perf events: 841464
operf-record process returned OK
* * * * WARNING: Profiling rate was throttled back by the kernel * * * *
The number of samples actually recorded is less than expected, but is
probably still statistically valid.  Decreasing the sampling rate is the
best option if you want to avoid throttling.
operf_read: Total bytes received from operf_record process: 841280
Calling _do_jitdump_convert
start time/end time is 1413408126/1413408137
opjitconv: Ending with rc = 2. This code is usually OK, but can be useful for debugging purposes.
JIT dump processing complete.
operf-read process returned OK


I'm probably doing something wrong. But I'm not sure what.
Any ideas?



Carl Love | 9 Oct 20:43 2014

Re: [PATCH] Fix configure script to error out when no perf_events support is found

Maynard:

I tested on a Power7 with Red Hat Enterprise Linux Server release 6.5
(Santiago).  I tried installing the patch on a fresh git pull.  The
patch applied cleanly.  

I ran autogen.sh and configure to install locally in my directory.  That
all worked as expected.  I did a test run of operf and ocount to make
sure they built correctly.

I then tweaked the minimum version number and variable/directory location
names in the script.  I tried to make it "appear" as if the needed
version and the expected files were not there, to get the script to fail.
It seemed to fail as expected for each case I was testing.  The
resultant error message was consistent with the expected failure.  

I didn't find any formatting issues.

As far as I can tell, it looks good.

                     Carl Love

On Mon, 2014-10-06 at 16:47 -0500, Maynard Johnson wrote:
> Fix configure script to error out when no perf_events support is found
> 
> Prior to release 1.0.0, if the configure script found that the kernel
> version was not new enough to have perf_events support or if the perf_event.h
> header file could not be found, we would just fall back to building only
> the legacy opcontrol-based profiler and would skip building operf and ocount.
> But as of release 1.0.0, the opcontrol-based profiler is no longer included,
> so if operf and ocount cannot be built, we should error out of the configure
> execution, which is what this patch does.
> 
> Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>
> ---
>  configure.ac |  149 +++++++++++++++++++++++++++++-----------------------------
>  1 files changed, 74 insertions(+), 75 deletions(-)
> 
> Index: op-master/configure.ac
> ===================================================================
> --- op-master.orig/configure.ac
> +++ op-master/configure.ac
> @@ -57,7 +57,7 @@ AC_PROG_CC
>  AC_PROG_CPP
>  AC_PROG_CXX
>  AC_CHECK_PROG(LD,ld,ld,)
> -test "$LD" || AC_ERROR(ld not found)
> +test "$LD" || AC_MSG_ERROR(ld not found)
> 
>  # --with-kernel for cross compilation
>  AC_ARG_WITH(kernel,
> @@ -80,8 +80,6 @@ if test "$KERNELDIR" != ""; then
>  		PERF_EVENT_FLAGS=" -I$KERNELDIR/include"
>  		AC_SUBST(PERF_EVENT_FLAGS)
>  		PERF_EVENT_H="$KERNELDIR/include/linux/perf_event.h"
> -	else
> -		echo "$KERNELDIR does not exist."
>  	fi
>  else
>  	PERF_EVENT_H="/usr/include/linux/perf_event.h"
> @@ -92,6 +90,10 @@ kernel_may_have_perf_events_support="no"
>  AX_KERNEL_VERSION(2, 6, 31, <=, kernel_may_have_perf_events_support="yes",
>  kernel_has_perf_events_support="no")
> 
> +if test "$kernel_has_perf_events_support" = "no"; then
> +	AC_MSG_ERROR(Your kernel version is older than the required level (2.6.31) to build oprofile.)
> +fi
> +
>  dnl The AX_KERNEL_VERSION macro may return kernel_may_have_perf_events_support="yes",
>  dnl indicating a partial answer.  Some architectures do not implement the Performance
>  dnl Events Kernel Subsystem even with kernel versions > 2.6.31 -- i.e., not even
> @@ -135,66 +137,84 @@ else
>  	kernel_has_perf_events_support="no"
>  fi
> 
> -AM_CONDITIONAL(BUILD_FOR_PERF_EVENT, test "$kernel_has_perf_events_support" = "yes")
> -
> -if test "$kernel_has_perf_events_support" = "yes"; then
> -	HAVE_PERF_EVENTS='1'
> -	AC_MSG_CHECKING([whether PERF_RECORD_MISC_GUEST_KERNEL is defined in perf_event.h])
> -	rm -f test-for-PERF_GUEST
> -	AC_LANG_CONFTEST(
> -		[AC_LANG_PROGRAM([[#include <linux/perf_event.h>]],
> -			[[unsigned int pr_guest_kern = PERF_RECORD_MISC_GUEST_KERNEL;
> -			unsigned int pr_guest_user = PERF_RECORD_MISC_GUEST_USER;]])
> -		])
> -	$CC conftest.$ac_ext $CFLAGS $LDFLAGS $LIBS $PERF_EVENT_FLAGS -o test-for-PERF_GUEST  > /dev/null 2>&1
> -	if test -f test-for-PERF_GUEST; then
> -		echo "yes"
> -		HAVE_PERF_GUEST_MACROS='1'
> +if test "$kernel_has_perf_events_support" != "yes"; then
> +	if test "$KERNELDIR" != ""; then
> +		echo "ERROR: You requested to build oprofile with '--with-kernel=$KERNELDIR',"
> +		if ! test -d $KERNELDIR; then
> +			echo "but that directory does not exist."
> +		elif test "$PERF_EVENT_H_EXISTS" != "yes"; then
> +			echo "but headers were not accessible at the given location."
> +			echo "Be sure you have run the following command from within your kernel source tree:"
> +			echo "     make headers_install INSTALL_HDR_PATH=<kernel-hdrs-install-dir>"
> +			echo "Then pass <kernel-hdrs-install-dir> to oprofile's '--with-kernel' configure option."
> +		else
> +			echo "but your kernel does not appear to have the necessary support to run oprofile."
> +		fi
>  	else
> -		echo "no"
> -		HAVE_PERF_GUEST_MACROS='0'
> +		if test "$PERF_EVENT_H_EXISTS" != "yes"; then
> +			echo "Error: perf_event.h not found.  Either install the kernel headers package or"
> +			echo "use the --with-kernel option."
> +		else
> +			echo "Error: Your kernel does not appear to have the necessary support to run oprofile."
> +		fi
>  	fi
> -	AC_DEFINE_UNQUOTED(HAVE_PERF_GUEST_MACROS, $HAVE_PERF_GUEST_MACROS, [PERF_RECORD_MISC_GUEST_KERNEL is defined in perf_event.h])
> -	rm -f test-for-PERF_GUEST*
> +	AC_MSG_ERROR(Unable to build oprofile. Exiting.)
> +fi
> 
> -	AC_MSG_CHECKING([whether precise_ip is defined in perf_event.h])
> -	rm -f test-for-precise-ip
> -	AC_LANG_CONFTEST(
> -		[AC_LANG_PROGRAM([[#include <linux/perf_event.h>]],
> -			[[struct perf_event_attr attr;
> -			attr.precise_ip = 2;]])
> -		])
> -	$CC conftest.$ac_ext $CFLAGS $LDFLAGS $LIBS $PERF_EVENT_FLAGS -o test-for-precise-ip  > /dev/null 2>&1
> -	if test -f test-for-precise-ip; then
> -		echo "yes"
> -		HAVE_PERF_PRECISE_IP='1'
> -	else
> -		echo "no"
> -		HAVE_PERF_PRECISE_IP='0'
> -	fi
> -	AC_DEFINE_UNQUOTED(HAVE_PERF_PRECISE_IP, $HAVE_PERF_PRECISE_IP, [precise_ip is defined in perf_event.h])
> -	rm -f test-for-precise-ip*
> +AM_CONDITIONAL(BUILD_FOR_PERF_EVENT, test "$kernel_has_perf_events_support" = "yes")
> 
> +
> +HAVE_PERF_EVENTS='1'
> +AC_MSG_CHECKING([whether PERF_RECORD_MISC_GUEST_KERNEL is defined in perf_event.h])
> +rm -f test-for-PERF_GUEST
> +AC_LANG_CONFTEST(
> +	[AC_LANG_PROGRAM([[#include <linux/perf_event.h>]],
> +		[[unsigned int pr_guest_kern = PERF_RECORD_MISC_GUEST_KERNEL;
> +		unsigned int pr_guest_user = PERF_RECORD_MISC_GUEST_USER;]])
> +	])
> +$CC conftest.$ac_ext $CFLAGS $LDFLAGS $LIBS $PERF_EVENT_FLAGS -o test-for-PERF_GUEST  > /dev/null 2>&1
> +if test -f test-for-PERF_GUEST; then
> +	echo "yes"
> +	HAVE_PERF_GUEST_MACROS='1'
>  else
> -	HAVE_PERF_EVENTS='0'
> -	AC_MSG_RESULT([No perf_events support available; falling back to legacy oprofile])
> +	echo "no"
> +	HAVE_PERF_GUEST_MACROS='0'
>  fi
> +AC_DEFINE_UNQUOTED(HAVE_PERF_GUEST_MACROS, $HAVE_PERF_GUEST_MACROS, [PERF_RECORD_MISC_GUEST_KERNEL is defined in perf_event.h])
> +rm -f test-for-PERF_GUEST*
> +
> +AC_MSG_CHECKING([whether precise_ip is defined in perf_event.h])
> +rm -f test-for-precise-ip
> +AC_LANG_CONFTEST(
> +	[AC_LANG_PROGRAM([[#include <linux/perf_event.h>]],
> +		[[struct perf_event_attr attr;
> +		attr.precise_ip = 2;]])
> +	])
> +$CC conftest.$ac_ext $CFLAGS $LDFLAGS $LIBS $PERF_EVENT_FLAGS -o test-for-precise-ip  > /dev/null 2>&1
> +if test -f test-for-precise-ip; then
> +	echo "yes"
> +	HAVE_PERF_PRECISE_IP='1'
> +else
> +	echo "no"
> +	HAVE_PERF_PRECISE_IP='0'
> +fi
> +AC_DEFINE_UNQUOTED(HAVE_PERF_PRECISE_IP, $HAVE_PERF_PRECISE_IP, [precise_ip is defined in perf_event.h])
> +rm -f test-for-precise-ip*
> +
> 
>  AC_DEFINE_UNQUOTED(HAVE_PERF_EVENTS, $HAVE_PERF_EVENTS, [Kernel support for perf_events exists])
>  AC_CANONICAL_HOST
> -if test "$HAVE_PERF_EVENTS" = "1"; then
> -	PFM_LIB=
> -	if test "$host_cpu" = "powerpc64le" -o "$host_cpu" = "powerpc64"; then
> -		AC_CHECK_HEADER(perfmon/pfmlib.h,,[AC_MSG_ERROR([pfmlib.h not found; may be provided by libpfm devel or papi devel package])])
> -		AC_CHECK_LIB(pfm,pfm_get_os_event_encoding, HAVE_LIBPFM3='0'; HAVE_LIBPFM='1', [
> -			AC_CHECK_LIB(pfm, pfm_get_event_name, HAVE_LIBPFM3='1'; HAVE_LIBPFM='1',
> -			[AC_MSG_ERROR([libpfm not found; may be provided by libpfm devel or papi devel package])])])
> -		PFM_LIB="-lpfm"
> -		AC_DEFINE_UNQUOTED(HAVE_LIBPFM3, $HAVE_LIBPFM3, [Define to 1 if using libpfm3; 0 if using newer libpfm])
> -		AC_DEFINE_UNQUOTED(HAVE_LIBPFM, $HAVE_LIBPFM, [Define to 1 if libpfm is available])
> -	fi
> -	AC_SUBST(PFM_LIB)
> +PFM_LIB=
> +if test "$host_cpu" = "powerpc64le" -o "$host_cpu" = "powerpc64"; then
> +	AC_CHECK_HEADER(perfmon/pfmlib.h,,[AC_MSG_ERROR([pfmlib.h not found; may be provided by libpfm devel or papi devel package])])
> +	AC_CHECK_LIB(pfm,pfm_get_os_event_encoding, HAVE_LIBPFM3='0'; HAVE_LIBPFM='1', [
> +		AC_CHECK_LIB(pfm, pfm_get_event_name, HAVE_LIBPFM3='1'; HAVE_LIBPFM='1',
> +		[AC_MSG_ERROR([libpfm not found; may be provided by libpfm devel or papi devel package])])])
> +	PFM_LIB="-lpfm"
> +	AC_DEFINE_UNQUOTED(HAVE_LIBPFM3, $HAVE_LIBPFM3, [Define to 1 if using libpfm3; 0 if using newer libpfm])
> +	AC_DEFINE_UNQUOTED(HAVE_LIBPFM, $HAVE_LIBPFM, [Define to 1 if libpfm is available])
>  fi
> +AC_SUBST(PFM_LIB)
> 
>  AC_ARG_WITH(java,
>  [  --with-java=java-home        Path to Java home directory (default is "no"; "yes" will use /usr as Java home)],
> @@ -425,25 +445,3 @@ elif test "`getent passwd oprofile 2>/de
>  		echo "         The 'oprofile' group must be the default group for the 'oprofile' user."
>  	fi
>  fi
> -
> -if  test "$PERF_EVENT_H_EXISTS" != "yes" && test "$kernel_may_have_perf_events_support" = "yes"; then
> -	echo "Warning: perf_event.h not found.  Either install the kernel headers package or"
> -	echo "use the --with-kernel option if you want the non-root, single application"
> -	echo "profiling support provided by operf."
> -	echo ""
> -	echo "If you run 'make' now, only the legacy ocontrol-based profiler will be built."
> -fi
> -
> -if test "$KERNELDIR" != "" && test "$kernel_has_perf_events_support" != "yes"; then
> -	if ! test -d $KERNELDIR; then
> -		echo "WARNING: You passed '--with-kernel=$KERNELDIR', but $KERNELDIR"
> -		echo "does not exist."
> -	else
> -		echo "Warning: You requested to build with the '--with-kernel' option, but your kernel"
> -		echo "headers were not accessible at the given location. Be sure you have run the following"
> -		echo "command from within your kernel source tree:"
> -		echo "     make headers_install INSTALL_HDR_PATH=<kernel-hdrs-install-dir>"
> -		echo "Then pass <kernel-hdrs-install-dir> to oprofile's '--with-kernel' configure option."
> -	fi
> -	echo ""
> -fi
> 

Will Deacon | 8 Oct 12:32 2014

Re: Oprofile cross compilation for arm

On Tue, Oct 07, 2014 at 02:59:58PM +0100, Sabra Gargouri wrote:
> Hi,
> Thank you Maynard for your support!
> I have first executed the make command from within my kernel source tree:
>      make headers_install INSTALL_HDR_PATH=<kernel-hdrs-install-dir> and then pass
<kernel-hdrs-install-dir> to oprofile's '--with-kernel' configure option.
> 
> ./configure --with-kernel-support --with-kernel=/home/sdk-test/Downloads/kernel-headers
--host=i686-linux --build=arm-cortex-linux-gnueabi
--with-extra-libs=/home/sdk-test/Downloads/popt-1.16/.libs/:/home/sdk-test/Downloads/binutils-2.24/libiberty/:/home/sdk-test/Downloads/binutils-2.24/bfd/
--with-binutils=/home/sdk-test/Downloads/binutils-2.24/ --with-extra-includes=/home/sdk-test/Downloads/popt-1.16/:/home/sdk-test/Downloads/binutils-2.24/bfd/

I've not done cross-compilation of oprofile for a while now -- ARMv7 CPUs
tend to be quick enough to build things natively without much pain.

Are we sure that the problem being reported is due to the cross-compilation
step? Does a native compilation appear to fix the problem?

Sorry for the vague questions, but there's really not a lot to go on here.

Will

Maynard Johnson | 7 Oct 22:27 2014

[PATCH] Java profiling: opagent: fwrite_unlocked failed

Note: The patch below has already been committed upstream.

-----------------------------------------------------------------

Java profiling: opagent: fwrite_unlocked failed

Certain Java Virtual Machines do not provide full code information in
the JNI callbacks to oprofile's java agent library; in particular,
the code size argument may be set to zero even if the code address
is non-null. The libopagent/opagent.c:op_write_native_code function
was not properly handling this case and would print the message
   opagent: fwrite_unlocked failed
and return a -1, causing the caller to display a message like
   Error: op_write_native_code(): Success

This patch fixes that issue.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>
---
 libopagent/opagent.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/libopagent/opagent.c b/libopagent/opagent.c
index 59b7bb6..f280880 100644
--- a/libopagent/opagent.c
+++ b/libopagent/opagent.c
@@ -376,7 +376,9 @@ again:
 	    fwrite_unlocked(symbol_name, sz_symb_name, 1, dumpfile)) {
 		size_t expected_sz, sz;
 		expected_sz = sz = 0;
-		if (code) {
+		// Note: Some JVMs always pass size=zero, so it's not enough just to check
+		// if 'code' is non-null.
+		if (code && size) {
 			sz = fwrite_unlocked(code, size, 1, dumpfile);
 			expected_sz++;
 		}
@@ -390,7 +392,7 @@ again:
 		funlockfile(dumpfile);
 		flock(dumpfd, LOCK_UN);
 		if (sz != expected_sz) {
-			printf("opagent: fwrite_unlocked failed");
+			printf("opagent: fwrite_unlocked failed\n");
 			return -1;
 		}
 		return 0;
--

-- 
1.7.1

Sabra Gargouri | 17 Sep 12:06 2014

Idle time in Oprofile

Hi,
In the context of my current activity, which intends to analyse Oprofile features, I would like to check whether Oprofile considers idle time in its profiling results. For that purpose, I ran Oprofile without running any other application (the system is idle). I used Oprofile in timer mode.
First, I ran Oprofile on an SH4 platform and, as the following results show, 93% is used by "poll_idle".

15655    93.8437  vmlinux                  poll_idle
238       1.4267  vmlinux                  arch_local_irq_restore
28        0.1678  oprofiled                do_match
24        0.1439  oprofiled                pop_buffer_value
24        0.1439  vmlinux                  copy_page
18        0.1079  oprofiled                opd_process_samples
18        0.1079  oprofiled                sfile_find
16        0.0959  bash                     shell_getc
16        0.0959  libc-2.14.1.so           __gconv_transform_ascii_internal
16        0.0959  oprofiled                get_file
14        0.0839  libc-2.14.1.so           mbrtowc
14        0.0839  oprofiled                sfile_log_sample_count
13        0.0779  oprofiled                odb_update_node_with_offset
12        0.0719  libc-2.14.1.so           _int_malloc
9         0.0540  oprofiled                find_kernel_image
9         0.0540  vmlinux                  __copy_user
9         0.0540  vmlinux                  link_path_walk
9         0.0540  vmlinux                  nfs_permission
8         0.0480  ld-2.14.1.so             _dl_relocate_object
8         0.0480  vmlinux                  tcp_ack

I have also run Oprofile on an ARM Cortex-A9 (SMP) without any application running (the system is idle) and got 99.44% used by the "fpa_get" function, and there's nothing related to "idle".

samples  %        app name                 symbol name
4148     99.4486  vmlinux                  fpa_get
2         0.0480  vmlinux                  print_cfs_rq
1         0.0240  bash                     hash_search
1         0.0240  bash                     parse_matched_pair
1         0.0240  gawk                     check_special
1         0.0240  ld-2.14.1.so             _dl_lookup_symbol_x
1         0.0240  libc-2.14.1.so           __default_morecore
1         0.0240  libc-2.14.1.so           __gconv_transform_ascii_internal
1         0.0240  libc-2.14.1.so           malloc_consolidate
1         0.0240  libc-2.14.1.so           strcpy
1         0.0240  vmlinux                  create_new_namespaces
1         0.0240  vmlinux                  dup_fd
1         0.0240  vmlinux                  fuse_copy_args
1         0.0240  vmlinux                  mnt_alloc_group_id
1         0.0240  vmlinux                  print_cpu
1         0.0240  vmlinux                  ptrace_request
1         0.0240  vmlinux                  seq_list_start_head
1         0.0240  vmlinux                  seq_write
1         0.0240  vmlinux                  usleep_range
1         0.0240  vmlinux                  vga_arbiter_notify_clients.part.11
1         0.0240  vmlinux                  vga_get
1         0.0240  vmlinux                  vm_insert_page
1         0.0240  vmlinux                  write_wb_reg

When searching in the official Oprofile documentation, I found the following explanation:
"Your kernel is likely to support halting the processor when a CPU is idle. As the typical hardware events like CPU_CLK_UNHALTED do not count when the CPU is halted, the kernel profile will not reflect the actual amount of time spent idle. You can change this behaviour by booting with the idle=poll option, which uses a different idle routine. This will appear as poll_idle() in your kernel profile."
So I rebooted my kernel with the idle=poll option added, but I have not noticed any difference from the previous results.

4707     99.5137  vmlinux                  fpa_get
2         0.0423  vmlinux                  attribute_container_unregister
2         0.0423  vmlinux                  print_cfs_rq
1         0.0211  bash                     execute_command_internal
1         0.0211  bash                     shell_getc
1         0.0211  libc-2.14.1.so           __gconv_transform_ascii_internal
1         0.0211  libc-2.14.1.so           sigprocmask
1         0.0211  libdl-2.14.1.so          call_gmon_start
1         0.0211  vmlinux                  __getnstimeofday
1         0.0211  vmlinux                  bdi_min_pause.isra.19
1         0.0211  vmlinux                  cgroup_scan_tasks
1         0.0211  vmlinux                  dev_alert
1         0.0211  vmlinux                  ext2_block_to_path.isra.19
1         0.0211  vmlinux                  ext4_ext_remove_space
1         0.0211  vmlinux                  iterate_supers
1         0.0211  vmlinux                  lg_local_lock
1         0.0211  vmlinux                  pipe_to_file
1         0.0211  vmlinux                  ptrace_request
1         0.0211  vmlinux                  register_filesystem
1         0.0211  vmlinux                  seq_write
1         0.0211  vmlinux                  sys_prctl
1         0.0211  vmlinux                  ubi_start_leb_change

Why does "poll_idle" not appear in the ARM case? Does it relate to architectural reasons?
Could we say that Oprofile is not intended to determine idle time, or is it related to the configuration of the oprofile daemon?

BR
Narayanan, Krishnaprasad | 15 Sep 14:48 2014

Clarifications regarding the event output

Hello all,

 

I am using Oprofile version 0.9.9 and use the operf command with --separate-cpu to obtain information on the following events: CPU_CLK_UNHALTED, INST_MISSES, INST_RETIRED, LLC_MISSES, LLC_REFS and BR_MISS_PRED_RETIRED, with a sampling rate of 1000000. The sampling rate here refers to the count flag that is specified for every event.

 

I run the operf command in the background, and every 1 sec I obtain the output from opreport, which is dumped to an output file.

 

Can I seek answers for the following questions?

a) Are the values for the events that are generated every 1 sec a cumulative sum from the previous timestamp? For example, if at timestamp T1 there are 100 instructions retired and at timestamp T1+1 there are 200 instructions retired, what is the event value (output) at timestamp T1+1? Is it 200 or 100?

b) Can I also know for which of these events the difference between the current and previous output is applicable?

c) Besides, can I kindly know the methodology to compute the total count of instructions?

 

Regards,

Krishnaprasad

Maynard Johnson | 12 Sep 18:31 2014

Announcement: OProfile 1.0.0

We are pleased to announce the general availability of OProfile 1.0.0.
You can download this release at:
	http://oprofile.sourceforge.net/download/

-Maynard Johnson

-----------------------------------------------------------------------

OProfile 1.0.0 has been released. A major change in this release
is the removal of the legacy opcontrol-based profiler. The legacy
profiling tool has been deprecated since release 0.9.8 when operf
was first introduced. The following components and processor types
that were dependent on opcontrol have also been removed:

   - GUI component (i.e., oprof_start)
   - IBS events (AMD processors)
   - All Alpha processors, except for EV67 (which *is* supported by operf/ocount)
   - Architecture avr32
   - Architecture ia64
   - Processor model IBM Cell
   - Processor model P.A. Semi PA6T
   - RTC (real time clock mode)

OProfile users still running on any of these affected systems or
needing any of the removed components listed above should not upgrade
to OProfile release 1.0. Alternatively, you can obtain all of the new
features, enhancements, and bug fixes described below and still have
access to opcontrol by doing the following:

	git clone git://git.code.sf.net/p/oprofile/oprofile oprofile
	cd oprofile
	git checkout PRE_RELEASE_1_0

and then build/install as usual.

More information about OProfile can be seen at
    http://oprofile.sf.net

Incompatibilities with previous release
---------------------------------------

- Sample data collected with previous releases of OProfile are incompatible
  with release 1.0.
- ophelp schema: Major version changed for removal of unit mask 'extra'
  attribute and addition of unit mask 'name'.

New features
------------

- Enhance ocount to support millisecond time intervals
- Obtain kernel symbols from /proc/kallsyms if no vmlinux file specified

- New/updated Processor Support
    * (New) Freescale e6500 
    * (New) Freescale e500mc
    * (New) Intel Silvermont
    * (New) ARMv7 Krait
    * (New) APM X-Gene (ARMv8)
    * (New) Intel Broadwell
    * (New) ARMv8 Cortex A57
    * (New) ARMv8 Cortex A53
    * Added little endian support for IBM POWER8
    * Update events for IBM POWER8
    * Added edge-detect events for IBM POWER7
    * Update events for Intel Haswell

Bug fixes
---------

Filed bug reports:
-------------------------------------------------------------------------
|  BUG ID   |  Summary 
|-----------|------------------------------------------------------------
|   236     | opreport schema: Fix count field maxOccurs (changed to
|           | 'unbounded')
|   245     | Fix compile error on ppc/uClibc platform: 'AT_BASE_PLATFORM'
|           | undeclared'
|   248     | Duplicate event specs passed to ocount show up twice in
|           | output
|   252     | Fix operf/ocount default unit mask selection
|   253     | ocount: print the unit mask, kernel and user modes if
|           | specified for the event
|   254     | ophelp schema is not included in installed files
|   255     | Remove unused 'extra' attribute from ophelp schema
|   256     | opreport from 'operf --callgraph' profile shows false
|           | recursive calls
|   257     | Fix handling of default named unit masks longer than 11 chars
|   259     | Print unit mask name where applicable in ophelp XML output
|   260     | Fix profiling of multi-threaded apps when using "--pid"
|           | option
|   262     | Fix operf/opreport kernel throttling detection
|   263     | Fix sample attribution problem when using multiple events
|   266     | exclude/include files option doesn't work for opannotate -a
-------------------------------------------------------------------------

Other bug fixes and improvements without a filed report (e.g., posted to the list):
---------------
   - Fix behavior and documentation for '--threshold' option
   - Remove hard-coded timeout for JIT dump conversion
   - Update Alpha EV67 CPU support and remove all other Alpha CPU support
   - operf main process improperly killing conversion process
   - Fix up S390 support to work with operf/ocount
   - Link ocount with librt for clock_gettime only when needed
   - Fix 'Invalid argument' running 'opcontrol --start --callgraph=<n>' in
     Timer mode
   - Allow root to remove old jitdump files from /tmp/.oprofile/jitdump
   - Remove opreport warnings for /no-vmlinux, [vdso], [hypervisor_bucket]
     not found
   - Fix event codes for marked architected events (IBM ppc64)
   - Make operf/ocount detect invalid timer mode from opcontrol
   - Reduce overhead of operf waiting for profiled app to end
   - Fix "Unable to open cpu_type file for reading" for IBM POWER7+
   - Allow all native events for IBM POWER8 in POWER7 compat mode
   - Fix spurious "backtraces skipped due to no file mapping" log entries
   - Fix the units for the reported CPU frequency

Known problems and limitations
-------------------------
- When using operf to profile multiple events, the absolute number of
  events recorded may be substantially fewer than expected. This can be
  due to a known bug in the Linux kernel's Performance Events Subsystem
  that was fixed sometime between Linux kernel versions 3.1 and 3.5.

Andi Kleen | 11 Sep 01:07 2014

[PATCH] Update the Silvermont event files

From: Andi Kleen <ak@linux.intel.com>

On further review, the Silvermont event files had a lot of problems.
I regenerated them completely. This fixes the PEBS events, and
fixes a range of others.

The test suite passes without problems.

I realize it's a hard-to-review patch, but I think
it's the best option for 1.0.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 events/i386/silvermont/events     |  24 +++----
 events/i386/silvermont/unit_masks | 146 +++++++++++++++++++++-----------------
 2 files changed, 93 insertions(+), 77 deletions(-)

diff --git a/events/i386/silvermont/events b/events/i386/silvermont/events
index 077cc0a..434538f 100644
--- a/events/i386/silvermont/events
+++ b/events/i386/silvermont/events
@@ -7,20 +7,18 @@
 # lowered in many cases without ill effect.
 #
 include:i386/arch_perfmon
-event:0x32 counters:0,1 um:l2_prefetcher_throttle minimum:200003 name:l2_prefetcher_throttle :
-event:0x3e counters:0,1 um:one minimum:200003 name:l2_prefetcher_pref_stream_alloc :
-event:0x50 counters:0,1 um:zero minimum:200003 name:l2_prefetch_pend_streams_pref_stream_pend_set :
-event:0x86 counters:0,1 um:nip_stall minimum:200003 name:nip_stall :
-event:0x87 counters:0,1 um:decode_stall minimum:200003 name:decode_stall :
-event:0x96 counters:0,1 um:uip_match minimum:200003 name:uip_match :
+event:0x03 counters:0,1 um:rehabq minimum:200003 name:rehabq :
+event:0x04 counters:0,1 um:mem_uops_retired minimum:200003 name:mem_uops_retired :
+event:0x05 counters:0,1 um:page_walks minimum:200003 name:page_walks :
+event:0x30 counters:0,1 um:zero minimum:200003 name:l2_reject_xq_all :
+event:0x31 counters:0,1 um:zero minimum:200003 name:core_reject_l2q_all :
+event:0x80 counters:0,1 um:icache minimum:200003 name:icache :
 event:0xc2 counters:0,1 um:uops_retired minimum:2000003 name:uops_retired :
-event:0xc3 counters:0,1 um:x10 minimum:200003 name:machine_clears_live_lock_breaker :
-event:0xc4 counters:0,1 um:br_inst_retired minimum:2000003 name:br_inst_retired :
+event:0xc3 counters:0,1 um:machine_clears minimum:200003 name:machine_clears :
+event:0xc4 counters:0,1 um:br_inst_retired minimum:200003 name:br_inst_retired :
 event:0xc5 counters:0,1 um:br_misp_retired minimum:200003 name:br_misp_retired :
 event:0xca counters:0,1 um:no_alloc_cycles minimum:200003 name:no_alloc_cycles :
 event:0xcb counters:0,1 um:rs_full_stall minimum:200003 name:rs_full_stall :
-event:0xcc counters:0,1 um:rs_dispatch_stall minimum:200003 name:rs_dispatch_stall :
-event:0xe6 counters:0,1 um:baclears minimum:2000003 name:baclears :
-event:0xe7 counters:0,1 um:x02 minimum:200003 name:ms_decoded_early_exit :
-event:0xe8 counters:0,1 um:one minimum:200003 name:btclears_all :
-event:0xe9 counters:0,1 um:decode_restriction minimum:200003 name:decode_restriction :
+event:0xcd counters:0,1 um:one minimum:2000003 name:cycles_div_busy_all :
+event:0xe6 counters:0,1 um:baclears minimum:200003 name:baclears :
+event:0xe7 counters:0,1 um:one minimum:200003 name:ms_decoded_ms_entry :
diff --git a/events/i386/silvermont/unit_masks b/events/i386/silvermont/unit_masks
index 6309282..c0dac26 100644
--- a/events/i386/silvermont/unit_masks
+++ b/events/i386/silvermont/unit_masks
@@ -4,68 +4,86 @@
 # See http://ark.intel.com/ for help in identifying Silvermont based CPUs
 #
 include:i386/arch_perfmon
-name:x02 type:mandatory default:0x2
-	0x2 No unit mask
-name:x10 type:mandatory default:0x10
-	0x10 No unit mask
-name:l2_prefetcher_throttle type:exclusive default:0x2
-	0x2 extra:edge conservative Counts the number of cycles the L2 prefetcher spends in throttling mode
-	0x1 extra:edge aggressive Counts the number of cycles the L2 prefetcher spends in throttling mode
-name:nip_stall type:exclusive default:0x3f
-	0x3f extra: all Counts the number of cycles the NIP stalls.
-	0x1 extra: pfb_full Counts the number of cycles the NIP stalls and the PFBs are full.   This DOES NOT inlude
PFB throttler cases.
-	0x2 extra: itlb_miss Counts the number of cycles the NIP stalls and there is an outstanding ITLB miss.
This is a cummulative count of cycles the NIP stalled for all ITLB misses.
-	0x8 extra: pfb_throttler Counts the number of cycles the NIP stalls, the throttler is engaged, and the
PFBs appear full.
-	0x10 extra: do_snoop Counts the number of cycles the NIP stalls because of a SMC compliance snoop to the
MEC is required.
-	0x20 extra: misc_other Counts the number of cycles the NIP stalls due to NUKE, Stop Front End, Inserted flows.
-	0x1e extra: pfb_ready Counts the number of cycles the NIP stalls when the PFBs are not full and the
decoders are able to process bytes.  Does not count PFB_FULL nor MISC_OTHER stall cycles.
-name:decode_stall type:exclusive default:0x1
-	0x1 extra: pfb_empty Counts the number of cycles decoder is stalled because the PFB is empty, this count
is useful to see if the decoder is receiving the bytes from the front end. This event together with the
DECODE_STALL.IQ_FULL may be used to narrow down on the bottleneck.
-	0x2 extra: iq_full Counts the number of cycles decoder is stalled because the IQ is full, this count is
useful to see if the decoder is delivering the decoded uops. This event together with the
DECODE_STALL.PFB_EMPTY may be used to narrow down on the bottleneck.
-name:uip_match type:exclusive default:0x1
-	0x1 extra: first_uip This event is used for counting the number of times a specific micro IP address was decoded
-	0x2 extra: second_uip This event is used for counting the number of times a specific micro IP address was decoded
-name:uops_retired type:exclusive default:0x2
-	0x2 extra: x87 This event counts the number of micro-ops retired that used X87 hardware.
-	0x4 extra: mul This event counts the number of micro-ops retired that used MUL hardware.
-	0x8 extra: div This event counts the number of micro-ops retired that used DIV hardware.
-	0x1 extra: ms_cyles Counts the number of uops that are from the complex flows issued by the
micro-sequencer (MS).  This includes uops from flows due to faults, assists, and inserted flows.
-name:br_inst_retired type:exclusive default:0x1
-	0x1 extra: remove_jcc REMOVE_JCC counts the number of branch instructions retired but removes taken
and not taken conditional branches (JCC).  Branch prediction predicts the branch target and enables the
processor to begin executing instructions long before the branch true execution path is known. All
branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address
not only based on the EIP of the branch but also based on the execution path through which execution reached
this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct
calls and jumps, indirect calls and jumps, returns.
-	0x2 extra: remove_rel_call REMOVE_REL_CALL counts the number of branch instructions retired but
removes near relative CALL.  Branch prediction predicts the branch target and enables the processor to
begin executing instructions long before the branch true execution path is known. All branches utilize
the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on
the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU
can efficiently predict the following branch types: conditional branches, direct calls and jumps,
indirect calls and jumps, returns.
-	0x4 extra: remove_ind_call REMOVE_IND_CALL counts the number of branch instructions retired but
removes near indirect CALL. Branch prediction predicts the branch target and enables the processor to
begin executing instructions long before the branch true execution path is known. All branches utilize
the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on
the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU
can efficiently predict the following branch types: conditional branches, direct calls and jumps,
indirect calls and jumps, returns.
-	0x8 extra: remove_ret REMOVE_RET counts the number of branch instructions retired but removes near
RET.  Branch prediction predicts the branch target and enables the processor to begin executing
instructions long before the branch true execution path is known. All branches utilize the branch
prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of
the branch but also based on the execution path through which execution reached this EIP. The BPU can
efficiently predict the following branch types: conditional branches, direct calls and jumps,
indirect calls and jumps, returns.
-	0x10 extra: remove_ind_jmp REMOVE_IND_JMP counts the number of branch instructions retired but
removes near indirect JMP.  Branch prediction predicts the branch target and enables the processor to
begin executing instructions long before the branch true execution path is known. All branches utilize
the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on
the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU
can efficiently predict the following branch types: conditional branches, direct calls and jumps,
indirect calls and jumps, returns.
-	0x20 extra: remove_rel_jmp REMOVE_REL_JMP counts the number of branch instructions retired but
removes near relative JMP.  Branch prediction predicts the branch target and enables the processor to
begin executing instructions long before the branch true execution path is known. All branches utilize
the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on
the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU
can efficiently predict the following branch types: conditional branches, direct calls and jumps,
indirect calls and jumps, returns.
-	0x40 extra: remove_far REMOVE_FAR counts the number of branch instructions retired but removes all far
branches.  Branch prediction predicts the branch target and enables the processor to begin executing
instructions long before the branch true execution path is known. All branches utilize the branch
prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of
the branch but also based on the execution path through which execution reached this EIP. The BPU can
efficiently predict the following branch types: conditional branches, direct calls and jumps,
indirect calls and jumps, returns.
-	0x80 extra: remove_not_taken_jcc REMOVE_NOT_TAKEN_JCC counts the number of branch instructions
retired but removes taken conditional branches (JCC).  Branch prediction predicts the branch target and
enables the processor to begin executing instructions long before the branch true execution path is
known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the
target address not only based on the EIP of the branch but also based on the execution path through which
execution reached this EIP. The BPU can efficiently predict the following branch types: conditional
branches, direct calls and jumps, indirect calls and jumps, returns.
-name:br_misp_retired type:exclusive default:0x1
-	0x1 extra: remove_jcc REMOVE_JCC counts the number of mispredicted branch instructions retired but
removes taken and not taken conditional branches (JCC).  This event counts the number of retired branch
instructions that were mispredicted by the processor, categorized by type. A branch misprediction
occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the
misprediction is discovered, all the instructions executed in the wrong (speculative) path must be
discarded, and the processor must start fetching from the correct path.
-	0x4 extra: remove_ind_call REMOVE_IND_CALL Counts the number of mispredicted branch instructions
retired but removes near indirect CALL.  This event counts the number of retired branch instructions that
were mispredicted by the processor, categorized by type. A branch misprediction occurs when the
processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is
discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the
processor must start fetching from the correct path.
-	0x8 extra: remove_ret REMOVE_RET Counts the number of mispredicted branch instructions retired but
removes near RET.  This event counts the number of retired branch instructions that were mispredicted by
the processor, categorized by type. A branch misprediction occurs when the processor predicts that the
branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the
instructions executed in the wrong (speculative) path must be discarded, and the processor must start
fetching from the correct path.
-	0x10 extra: remove_ind_jmp REMOVE_IND_JMP counts the number of mispredicted branch instructions
retired but removes near indirect JMP.  This event counts the number of retired branch instructions that
were mispredicted by the processor, categorized by type. A branch misprediction occurs when the
processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is
discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the
processor must start fetching from the correct path.
-	0x80 extra: remove_not_taken_jcc REMOVE_NOT_TAKEN_JCC counts the number of mispredicted branch
instructions retired but removes taken conditional branches (JCC).  This event counts the number of
retired branch instructions that were mispredicted by the processor, categorized by type. A branch
misprediction occurs when the processor predicts that the branch would be taken, but it is not, or
vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong
(speculative) path must be discarded, and the processor must start fetching from the correct path.
+name:rehabq type:exclusive default:0x1
+	0x1 extra: ld_block_st_forward This event counts the number of retired loads that were prohibited from
receiving forwarded data from the store because of address mismatch.
+	0x1 extra:pebs ld_block_st_forward_pebs This event counts the number of retired loads that were
prohibited from receiving forwarded data from the store because of address mismatch.
+	0x2 extra: ld_block_std_notready This event counts the cases where a forward was technically
possible, but did not occur because the store data was not available at the right time
+	0x4 extra: st_splits This event counts the number of retire stores that experienced cache line boundary splits
+	0x8 extra: ld_splits This event counts the number of retire loads that experienced cache line boundary splits
+	0x8 extra:pebs ld_splits_pebs This event counts the number of retire loads that experienced cache line
boundary splits
+	0x10 extra: lock This event counts the number of retired memory operations with lock semantics. These
are either implicit locked instructions such as the XCHG instruction or instructions with an explicit
LOCK prefix (0xF0).
+	0x20 extra: sta_full This event counts the number of retired stores that are delayed because there is not
a store address buffer available.
+	0x40 extra: any_ld This event counts the number of load uops reissued from Rehabq
+	0x80 extra: any_st This event counts the number of store uops reissued from Rehabq
+name:mem_uops_retired type:exclusive default:0x1
+	0x1 extra: l1_miss_loads This event counts the number of load ops retired that miss in L1 Data cache. Note
that prefetch misses will not be counted.
+	0x2 extra: l2_hit_loads This event counts the number of load ops retired that hit in the L2
+	0x2 extra:pebs l2_hit_loads_pebs This event counts the number of load ops retired that hit in the L2
+	0x4 extra: l2_miss_loads This event counts the number of load ops retired that miss in the L2
+	0x4 extra:pebs l2_miss_loads_pebs This event counts the number of load ops retired that miss in the L2
+	0x8 extra: dtlb_miss_loads This event counts the number of load ops retired that had DTLB miss.
+	0x8 extra:pebs dtlb_miss_loads_pebs This event counts the number of load ops retired that had DTLB miss.
+	0x10 extra: utlb_miss This event counts the number of load ops retired that had UTLB miss.
+	0x20 extra: hitm This event counts the number of load ops retired that got data from the other core or from
the other module.
+	0x20 extra:pebs hitm_pebs This event counts the number of load ops retired that got data from the other
core or from the other module.
+	0x40 extra: all_loads This event counts the number of load ops retired
+	0x80 extra: all_stores This event counts the number of store ops retired
+name:page_walks type:exclusive default:0x1
+	0x1 extra:edge d_side_walks This event counts when a data (D) page walk is completed or started.  Since a
page walk implies a TLB miss, the number of TLB misses can be counted by counting the number of pagewalks.
+	0x1 extra: d_side_cycles This event counts every cycle when a D-side (walks due to a load) page walk is in
progress. Page walk duration divided by number of page walks is the average duration of page-walks.
+	0x2 extra:edge i_side_walks This event counts when an instruction (I) page walk is completed or
started.  Since a page walk implies a TLB miss, the number of TLB misses can be counted by counting the number
of pagewalks.
+	0x2 extra: i_side_cycles This event counts every cycle when a I-side (walks due to an instruction fetch)
page walk is in progress. Page walk duration divided by number of page walks is the average duration of page-walks.
+	0x3 extra:edge walks This event counts when a data (D) page walk or an instruction (I) page walk is
completed or started.  Since a page walk implies a TLB miss, the number of TLB misses can be counted by
counting the number of pagewalks.
+	0x3 extra: cycles This event counts every cycle when a data (D) page walk or instruction (I) page walk is in
progress.  Since a pagewalk implies a TLB miss, the approximate cost of a TLB miss can be determined from
this event.
+name:icache type:exclusive default:0x3
+	0x3 extra: accesses This event counts all instruction fetches, including uncacheable fetches.
+	0x1 extra: hit This event counts all instruction fetches from the instruction cache.
+	0x2 extra: misses This event counts all instruction fetches that miss the Instruction cache or produce
memory requests. This includes uncacheable fetches. An instruction fetch miss is counted only once and
not once for every cycle it is outstanding.
+name:uops_retired type:exclusive default:0x10
+	0x10 extra: all This event counts the number of micro-ops retired. The processor decodes complex macro
instructions into a sequence of simpler micro-ops. Most instructions are composed of one or two
micro-ops. Some instructions are decoded into longer sequences such as repeat instructions, floating
point transcendental instructions, and assists. In some cases micro-op sequences are fused or whole
instructions are fused into one micro-op. See other UOPS_RETIRED events for differentiating retired
fused and non-fused micro-ops.
+	0x1 extra: ms This event counts the number of micro-ops retired that were supplied from MSROM.
+name:machine_clears type:exclusive default:0x8
+	0x8 extra: all Machine clears happen when something happens in the machine that causes the hardware to
need to take special care to get the right answer. When such a condition is signaled on an instruction, the
front end of the machine is notified that it must restart, so no more instructions will be decoded from the
current path.  All instructions "older" than this one will be allowed to finish.  This instruction and all
"younger" instructions must be cleared, since they must not be allowed to complete.  Essentially, the
hardware waits until the problematic instruction is the oldest instruction in the machine.  This means
all older instructions are retired, and all pending stores (from older instructions) are completed.
Then the new path of instructions from the front end is allowed to start into the machine.  There are many
conditions that might cause a machine clear (including the receipt of an interrupt, or a trap or a fault).
All those conditions (including but not limited to MACHINE_CLEARS.MEMORY_ORDERING, MACHINE_CLEARS.SMC,
and MACHINE_CLEARS.FP_ASSIST) are captured in the ANY event. In addition, some conditions can be
specifically counted (i.e. SMC, MEMORY_ORDERING, FP_ASSIST).  However, the sum of SMC, MEMORY_ORDERING,
and FP_ASSIST machine clears will not necessarily equal the number of ANY.
+	0x1 extra: smc This event counts the number of times that a program writes to a code section.
Self-modifying code causes a severe penalty in all Intel architecture processors.
+	0x2 extra: memory_ordering This event counts the number of times that the pipeline was cleared due to
memory ordering issues.
+	0x4 extra: fp_assist This event counts the number of times that the pipeline stalled due to FP operations
needing assists.
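
Because the named sub-events need not add up to the ANY count, it can be useful to see how much of the total they actually explain; a sketch with invented counts:

    # Hypothetical machine_clears counts (illustrative only):
    clears = {"all": 5_200, "smc": 40, "memory_ordering": 3_100, "fp_assist": 12}

    accounted = clears["smc"] + clears["memory_ordering"] + clears["fp_assist"]
    # The remainder covers other causes (interrupts, traps, faults, ...).
    clears["other"] = clears["all"] - accounted
    for cause in ("smc", "memory_ordering", "fp_assist", "other"):
        print(f"{cause:16s} {clears[cause] / clears['all']:6.1%}")
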
+name:br_inst_retired type:exclusive default:0x7e
+	0x7e extra: jcc JCC counts the number of conditional branch (JCC) instructions retired. Branch
prediction predicts the branch target and enables the processor to begin executing instructions long
before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for
prediction. This unit predicts the target address not only based on the EIP of the branch but also based on
the execution path through which execution reached this EIP. The BPU can efficiently predict the
following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0x7e extra:pebs jcc_pebs JCC counts the number of conditional branch (JCC) instructions retired.
Branch prediction predicts the branch target and enables the processor to begin executing instructions
long before the branch true execution path is known. All branches utilize the branch prediction unit
(BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also
based on the execution path through which execution reached this EIP. The BPU can efficiently predict the
following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xfe extra: taken_jcc TAKEN_JCC counts the number of taken conditional branch (JCC) instructions
retired. Branch prediction predicts the branch target and enables the processor to begin executing
instructions long before the branch true execution path is known. All branches utilize the branch
prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of
the branch but also based on the execution path through which execution reached this EIP. The BPU can
efficiently predict the following branch types: conditional branches, direct calls and jumps,
indirect calls and jumps, returns.
+	0xfe extra:pebs taken_jcc_pebs TAKEN_JCC counts the number of taken conditional branch (JCC)
instructions retired. Branch prediction predicts the branch target and enables the processor to begin
executing instructions long before the branch true execution path is known. All branches utilize the
branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the
EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can
efficiently predict the following branch types: conditional branches, direct calls and jumps,
indirect calls and jumps, returns.
+	0xf9 extra: call CALL counts the number of near CALL branch instructions retired.  Branch prediction
predicts the branch target and enables the processor to begin executing instructions long before the
branch true execution path is known. All branches utilize the branch prediction unit (BPU) for
prediction. This unit predicts the target address not only based on the EIP of the branch but also based on
the execution path through which execution reached this EIP. The BPU can efficiently predict the
following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xf9 extra:pebs call_pebs CALL counts the number of near CALL branch instructions retired.  Branch
prediction predicts the branch target and enables the processor to begin executing instructions long
before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for
prediction. This unit predicts the target address not only based on the EIP of the branch but also based on
the execution path through which execution reached this EIP. The BPU can efficiently predict the
following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xfd extra: rel_call REL_CALL counts the number of near relative CALL branch instructions retired. 
Branch prediction predicts the branch target and enables the processor to begin executing instructions
long before the branch true execution path is known. All branches utilize the branch prediction unit
(BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also
based on the execution path through which execution reached this EIP. The BPU can efficiently predict the
following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xfd extra:pebs rel_call_pebs REL_CALL counts the number of near relative CALL branch instructions
retired.  Branch prediction predicts the branch target and enables the processor to begin executing
instructions long before the branch true execution path is known. All branches utilize the branch
prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of
the branch but also based on the execution path through which execution reached this EIP. The BPU can
efficiently predict the following branch types: conditional branches, direct calls and jumps,
indirect calls and jumps, returns.
+	0xfb extra: ind_call IND_CALL counts the number of near indirect CALL branch instructions retired. 
Branch prediction predicts the branch target and enables the processor to begin executing instructions
long before the branch true execution path is known. All branches utilize the branch prediction unit
(BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also
based on the execution path through which execution reached this EIP. The BPU can efficiently predict the
following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xfb extra:pebs ind_call_pebs IND_CALL counts the number of near indirect CALL branch instructions
retired.  Branch prediction predicts the branch target and enables the processor to begin executing
instructions long before the branch true execution path is known. All branches utilize the branch
prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of
the branch but also based on the execution path through which execution reached this EIP. The BPU can
efficiently predict the following branch types: conditional branches, direct calls and jumps,
indirect calls and jumps, returns.
+	0xf7 extra: return RETURN counts the number of near RET branch instructions retired.  Branch prediction
predicts the branch target and enables the processor to begin executing instructions long before the
branch true execution path is known. All branches utilize the branch prediction unit (BPU) for
prediction. This unit predicts the target address not only based on the EIP of the branch but also based on
the execution path through which execution reached this EIP. The BPU can efficiently predict the
following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xf7 extra:pebs return_pebs RETURN counts the number of near RET branch instructions retired.  Branch
prediction predicts the branch target and enables the processor to begin executing instructions long
before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for
prediction. This unit predicts the target address not only based on the EIP of the branch but also based on
the execution path through which execution reached this EIP. The BPU can efficiently predict the
following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xeb extra: non_return_ind NON_RETURN_IND counts the number of near indirect JMP and near indirect
CALL branch instructions retired.  Branch prediction predicts the branch target and enables the
processor to begin executing instructions long before the branch true execution path is known. All
branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address
not only based on the EIP of the branch but also based on the execution path through which execution reached
this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct
calls and jumps, indirect calls and jumps, returns.
+	0xeb extra:pebs non_return_ind_pebs NON_RETURN_IND counts the number of near indirect JMP and near
indirect CALL branch instructions retired.  Branch prediction predicts the branch target and enables
the processor to begin executing instructions long before the branch true execution path is known. All
branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address
not only based on the EIP of the branch but also based on the execution path through which execution reached
this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct
calls and jumps, indirect calls and jumps, returns.
+	0xbf extra: far_branch FAR counts the number of far branch instructions retired.  Branch prediction
predicts the branch target and enables the processor to begin executing instructions long before the
branch true execution path is known. All branches utilize the branch prediction unit (BPU) for
prediction. This unit predicts the target address not only based on the EIP of the branch but also based on
the execution path through which execution reached this EIP. The BPU can efficiently predict the
following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xbf extra:pebs far_branch_pebs FAR counts the number of far branch instructions retired.  Branch
prediction predicts the branch target and enables the processor to begin executing instructions long
before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for
prediction. This unit predicts the target address not only based on the EIP of the branch but also based on
the execution path through which execution reached this EIP. The BPU can efficiently predict the
following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+name:br_misp_retired type:exclusive default:0x7e
+	0x7e extra: jcc JCC counts the number of mispredicted conditional branch (JCC) instructions
retired.  This event counts the number of retired branch instructions that were mispredicted by the
processor, categorized by type. A branch misprediction occurs when the processor predicts that the
branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the
instructions executed in the wrong (speculative) path must be discarded, and the processor must start
fetching from the correct path.
+	0x7e extra:pebs jcc_pebs JCC counts the number of mispredicted conditional branch (JCC)
instructions retired.  This event counts the number of retired branch instructions that were
mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor
predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is
discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the
processor must start fetching from the correct path.
+	0xfe extra: taken_jcc TAKEN_JCC counts the number of mispredicted taken conditional branch (JCC)
instructions retired.  This event counts the number of retired branch instructions that were
mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor
predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is
discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the
processor must start fetching from the correct path.
+	0xfe extra:pebs taken_jcc_pebs TAKEN_JCC counts the number of mispredicted taken conditional branch
(JCC) instructions retired.  This event counts the number of retired branch instructions that were
mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor
predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is
discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the
processor must start fetching from the correct path.
+	0xfb extra: ind_call IND_CALL counts the number of mispredicted near indirect CALL branch
instructions retired.  This event counts the number of retired branch instructions that were
mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor
predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is
discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the
processor must start fetching from the correct path.
+	0xfb extra:pebs ind_call_pebs IND_CALL counts the number of mispredicted near indirect CALL branch
instructions retired.  This event counts the number of retired branch instructions that were
mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor
predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is
discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the
processor must start fetching from the correct path.
+	0xf7 extra: return RETURN counts the number of mispredicted near RET branch instructions retired.  This
event counts the number of retired branch instructions that were mispredicted by the processor,
categorized by type. A branch misprediction occurs when the processor predicts that the branch would be
taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed
in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
+	0xf7 extra:pebs return_pebs RETURN counts the number of mispredicted near RET branch instructions
retired.  This event counts the number of retired branch instructions that were mispredicted by the
processor, categorized by type. A branch misprediction occurs when the processor predicts that the
branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the
instructions executed in the wrong (speculative) path must be discarded, and the processor must start
fetching from the correct path.
+	0xeb extra: non_return_ind NON_RETURN_IND counts the number of mispredicted near indirect JMP and
near indirect CALL branch instructions retired.  This event counts the number of retired branch
instructions that were mispredicted by the processor, categorized by type. A branch misprediction
occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the
misprediction is discovered, all the instructions executed in the wrong (speculative) path must be
discarded, and the processor must start fetching from the correct path.
+	0xeb extra:pebs non_return_ind_pebs NON_RETURN_IND counts the number of mispredicted near indirect
JMP and near indirect CALL branch instructions retired.  This event counts the number of retired branch
instructions that were mispredicted by the processor, categorized by type. A branch misprediction
occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the
misprediction is discovered, all the instructions executed in the wrong (speculative) path must be
discarded, and the processor must start fetching from the correct path.
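
Pairing br_inst_retired with the corresponding br_misp_retired mask gives a per-type misprediction rate; a short sketch with purely illustrative counts:

    # Hypothetical retired vs. mispredicted counts per branch type
    # (unit masks as defined above; numbers invented for illustration):
    retired = {"jcc": 9_000_000, "ind_call": 150_000, "return": 800_000}
    mispredicted = {"jcc": 270_000, "ind_call": 30_000, "return": 4_000}

    for kind in retired:
        rate = mispredicted[kind] / retired[kind]
        print(f"{kind:10s} mispredict rate: {rate:6.2%}")
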
 name:no_alloc_cycles type:exclusive default:0x3f
-	0x3f extra:inv all Counts the number of cycles that uops are allocated (inverse of NO_ALLOC_CYCLES.ALL)
-	0x2 extra: sd_buffer_full Counts the number of cycles when no uops are allocated and the store data
buffer is full.
-	0x4 extra: mispredicts Counts the number of cycles when no uops are allocated and the alloc pipe is
stalled waiting for a mispredicted jump to retire.  After the misprediction is detected, the front end
will start immediately but the allocate pipe stalls until the mispredicted
-	0x8 extra: scoreboard Counts the number of cycles when no uops are allocated and a microcode IQ-based
scoreboard stall is active. This includes stalls due to both the retirement scoreboard (at-ret) and
micro-Jcc execution scoreboard (at-jeu).  Does not count cycles when the MS
-	0x10 extra: iq_empty Counts the number of cycles when no uops are allocated and the IQ is empty.  Will
assert immediately after a mispredict and partially overlap with MISPREDICTS sub event.
-name:rs_full_stall type:exclusive default:0x2
-	0x2 extra: iec_port0 Counts the number of cycles the Alloc pipeline is stalled because IEC RS for port 0 is full.
-	0x4 extra: iec_port1 Counts the number of cycles the Alloc pipeline is stalled because IEC RS for port 1 is full.
-	0x8 extra: fpc_port0 Counts the number of cycles the Alloc pipeline is stalled because FPC RS for port 0 is full.
-	0x10 extra: fpc_port1 Counts the number of cycles the Alloc pipeline is stalled because FPC RS for port 1
is full.
-name:rs_dispatch_stall type:exclusive default:0x1
-	0x1 extra: iec0_rs *COUNTER BROKEN - NO FIX* Counts cycles when no uops were disptached from port 0 of IEC
RS while the RS had valid ops left to dispatch
-	0x2 extra: iec1_rs *COUNTER BROKEN - NO FIX* Counts cycles when no uops were disptached from port 1 of IEC
RS while the RS had valid ops left to dispatch
-	0x4 extra: fpc0_rs Counts cycles when no uops were disptached from port 0 of FPC RS while the RS had valid
ops left to dispatch
-	0x8 extra: fpc1_rs Counts cycles when no uops were disptached from port 1 of FPC RS while the RS had valid
ops left to dispatch
-	0x10 extra: mec_rs Counts cycles when no uops were dispatched from the MEC RS or rehab queue while valid
ops were left to dispatch
-name:baclears type:exclusive default:0x2
-	0x2 extra: indirect Counts the number indirect branch baclears
-	0x4 extra: uncond Counts the number unconditional branch baclears
-	0x1e extra: no_corner_case sum of submasks [4:1].  Does not count special case baclears due to things
like parity errors, bogus branches, and pd$ issues.
-name:decode_restriction type:exclusive default:0x1
-	0x1 extra: pdcache_wrong Counts the number of times a decode restriction reduced the decode throughput
due to wrong instruction length prediction
-	0x2 extra: all_3cycle_resteers Counts the number of times a decode restriction reduced the decode
throughput because of all 3 cycle resteer conditions.  Mainly PDCACHE_WRONG and MS_ENTRY cases.
+	0x3f extra: all The NO_ALLOC_CYCLES.ALL event counts the number of cycles when the front-end does not
provide any instructions to be allocated for any reason. This event indicates the cycles where an
allocation stall occurs and no uops are allocated in that cycle.
+	0x1 extra: rob_full Counts the number of cycles when no uops are allocated and the ROB is full (fewer than
2 entries available).
+	0x20 extra: rat_stall Counts the number of cycles when no uops are allocated and a RAT stall is asserted.
+	0x50 extra: not_delivered The NO_ALLOC_CYCLES.NOT_DELIVERED event is used to measure front-end
inefficiencies, i.e. when the front-end of the machine is not delivering micro-ops to the back-end and the
back-end is not stalled. This event can be used to identify if the machine is truly front-end bound.  When
this event occurs, it is an indication that the front-end of the machine is operating at less than its
theoretical peak performance.  Background: We can think of the processor pipeline as being divided into 2
broader parts: front-end and back-end. The front-end is responsible for fetching the instruction,
decoding it into micro-ops (uops) in machine-understandable format, and putting them into a micro-op
queue to be consumed by the back-end. The back-end then takes these micro-ops and allocates the required
resources.  When all resources are ready, micro-ops are executed. If the back-end is not ready to accept
micro-ops from the front-end, then we do not want to count these as front-end bottlenecks.  However,
whenever we have bottlenecks in the back-end, we will have allocation unit stalls, eventually forcing the
front-end to wait until the back-end is ready to receive more uops. This event counts the cycles only when
the back-end is requesting more uops and the front-end is not able to provide them. Some examples of
conditions that cause front-end inefficiencies are: icache misses, ITLB misses, and decoder
restrictions that limit the front-end bandwidth.
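
The NOT_DELIVERED description above is essentially the numerator of a front-end-bound ratio; a rough sketch with hypothetical numbers (the denominator would come from an unhalted core-cycles event, and the 30% threshold is only a rule of thumb, not something this patch defines):

    # Hypothetical counts over one profiling run (illustrative only):
    not_delivered = 2_100_000_000   # NO_ALLOC_CYCLES.NOT_DELIVERED
    total_cycles = 6_000_000_000    # unhalted core cycles

    frontend_bound = not_delivered / total_cycles
    print(f"front-end bound: {frontend_bound:.1%}")   # 35.0%
    if frontend_bound > 0.30:       # arbitrary rule-of-thumb threshold
        print("investigate icache/ITLB misses and decode restrictions")
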
+name:rs_full_stall type:exclusive default:0x1f
+	0x1f extra: all Counts the number of cycles the Alloc pipeline is stalled when any one of the RSs (IEC, FPC
and MEC) is full. This event is a superset of all the individual RS stall event counts.
+	0x1 extra: mec Counts the number of cycles the allocation pipeline is stalled and is waiting for a free
MEC reservation station entry.  The cycles should be appropriately counted in case of the cracked ops e.g.
in case of a cracked load-op, the load portion is sent to M
+name:baclears type:exclusive default:0x1
+	0x1 extra: all The BACLEARS event counts the number of times the front end is resteered, mainly when the
Branch Prediction Unit cannot provide a correct prediction and this is corrected by the Branch Address
Calculator at the front end.  The BACLEARS.ANY event counts the number of baclears for any type of branch.
+	0x8 extra: return The BACLEARS event counts the number of times the front end is resteered, mainly when
the Branch Prediction Unit cannot provide a correct prediction and this is corrected by the Branch
Address Calculator at the front end.  The BACLEARS.RETURN event counts the number of RETURN baclears.
+	0x10 extra: cond The BACLEARS event counts the number of times the front end is resteered, mainly when the
Branch Prediction Unit cannot provide a correct prediction and this is corrected by the Branch Address
Calculator at the front end.  The BACLEARS.COND event counts the number of JCC (Jump on Conditional Code) baclears.
--

-- 
1.9.3
