Michael Petlan | 1 Apr 22:13 2015

[PATCH] fixed default unit masks for Haswells

Since some of the default unit masks for Haswell events cannot be
uniquely specified by numbers, the defaults have had to be replaced
by the named ones. When the affected events are used on Haswell without
specifying unit masks after applying this patch, the default masks
are chosen correctly.

Signed-off-by: Michael Petlan <mpetlan <at> redhat.com>
--
diff --git a/events/i386/haswell/unit_masks b/events/i386/haswell/unit_masks
index 60c2a61..9b4be33 100644
--- a/events/i386/haswell/unit_masks
+++ b/events/i386/haswell/unit_masks
@@ -32,7 +32,7 @@ name:dtlb_load_misses type:exclusive default:0x1
        0x80 extra: pde_cache_miss DTLB demand load misses with low part of linear-to-physical address translation missed
        0xe extra: walk_completed Demand load Miss in all translation lookaside buffer (TLB) levels causes a page walk that completes of any page size.
        0x60 extra: stlb_hit Load operations that miss the first DTLB level but hit the second and do not cause page walks
-name:uops_issued type:exclusive default:0x1
+name:uops_issued type:exclusive default:any
        0x1 extra: any This event counts the number of uops issued by the Front-end of the pipeline to the Back-end. This event is counted at the allocation stage and will count both retired and non-retired uops.
        0x10 extra: flags_merge Number of flags-merge uops being allocated. Such uops considered perf sensitive; added by GSR u-arch.
        0x20 extra: slow_lea Number of slow LEA uops being allocated. A uop is generally considered SlowLea if it has 3 sources (e.g. 2 sources + immediate) regardless if as a result of LEA instruction or not.
@@ -56,7 +56,7 @@ name:l2_rqsts type:exclusive default:0x21
        0xe7 extra: all_demand_references Demand requests to L2 cache
        0x3f extra: miss All requests that miss L2 cache
        0xff extra: references All L2 requests
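The reason a numeric default can be ambiguous can be sketched in C. This is only an illustration, not oprofile's code: `struct um_entry` and both lookup helpers are hypothetical names. Because the affected Haswell entries are not all visible in this excerpt, the table borrows the two Ivy Bridge int_misc masks quoted in William Cohen's 9 Mar message in this archive, which share the value 0x3.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one unit-mask entry (not oprofile's struct). */
struct um_entry {
    unsigned value;    /* numeric unit mask, e.g. 0x3 */
    const char *name;  /* unique mask name, e.g. "recovery_cycles" */
};

/* Count the entries whose numeric value equals `value`. More than one hit
 * means a numeric default cannot identify a single entry. */
size_t um_count_by_value(const struct um_entry *tab, size_t n, unsigned value)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (tab[i].value == value)
            hits++;
    return hits;
}

/* Return the entry with the given name, or NULL if there is none. */
const struct um_entry *um_find_by_name(const struct um_entry *tab, size_t n,
                                       const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* The two int_misc masks from the 9 Mar message: same value, different names. */
const struct um_entry int_misc_masks[] = {
    { 0x3, "recovery_cycles" },
    { 0x3, "recovery_stalls_count" },
};
```

Looking up 0x3 by value matches both entries, while looking up either name matches exactly one, which is why a named default such as `default:any` is unambiguous.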

Simon Arlott | 28 Mar 17:50 2015

[PATCH] build-id files shouldn't have to be symlinks

find_debuginfo_file_by_build requires the file in /usr/lib/debug/.build-id/
to be a symlink, but this is not the case on Ubuntu 14.10 where the -dbg
packages put normal files directly into /usr/lib/debug/.build-id/
---
Index: oprofile-1.0.0/libutil++/bfd_support.cpp
===================================================================
--- oprofile-1.0.0.orig/libutil++/bfd_support.cpp
+++ oprofile-1.0.0/libutil++/bfd_support.cpp
@@ -92,8 +92,8 @@ static bool find_debuginfo_file_by_build
 {
 	size_t build_id_fname_size = strlen (DEBUGDIR) + (sizeof "/.build-id/" - 1) + 1
 			+ (2 * build_id_size) + (sizeof ".debug" - 1) + 1;
-	char * buildid_symlink = (char *) xmalloc(build_id_fname_size);
-	char * sptr = buildid_symlink;
+	char * build_id_fname = (char *) xmalloc(build_id_fname_size);
+	char * sptr = build_id_fname;
 	unsigned char * bptr = buildid;
 	bool retval = false;
 	size_t build_id_segment_len = strlen("/.build-id/");
@@ -110,14 +110,12 @@ static bool find_debuginfo_file_by_build

 	strcpy(sptr, ".debug");

-	if (access (buildid_symlink, F_OK) == 0) {
-		debug_filename = op_realpath (buildid_symlink);
-		if (debug_filename.compare(buildid_symlink)) {
-			retval = true;
-			cverb << vbfd << "Using build-id symlink" << endl;
-		}
+	if (access(build_id_fname, R_OK) == 0) {
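The path layout and the readability check that this patch relies on can be sketched in plain C. This is a minimal illustration under stated assumptions, not the oprofile implementation: `build_id_debug_path` and `debug_file_usable` are hypothetical names, and the buffer size follows the same arithmetic as `build_id_fname_size` in the hunk above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Build "<debugdir>/.build-id/xx/yyyy...yy.debug" from a raw build-id.
 * Returns a malloc'd string (caller frees) or NULL if allocation fails. */
char *build_id_debug_path(const char *debugdir, const unsigned char *id,
                          size_t id_len)
{
    assert(id_len >= 1);
    /* dir + "/.build-id/" + '/' after the first byte
     * + two hex digits per byte + ".debug" + terminating NUL */
    size_t size = strlen(debugdir) + strlen("/.build-id/") + 1
                + 2 * id_len + strlen(".debug") + 1;
    char *path = malloc(size);
    if (path == NULL)
        return NULL;

    char *p = path;
    p += sprintf(p, "%s/.build-id/%02x/", debugdir, (unsigned)id[0]);
    for (size_t i = 1; i < id_len; i++)
        p += sprintf(p, "%02x", (unsigned)id[i]);
    strcpy(p, ".debug");
    return path;
}

/* Nonzero if the file can be read. access() follows symlinks, so a symlink
 * to a readable file and a regular file both pass the same check. */
int debug_file_usable(const char *path)
{
    return access(path, R_OK) == 0;
}
```

With this approach the caller no longer needs to compare the resolved path against the original name, so a regular file placed directly in the .build-id directory is accepted just like a symlink.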

Michael Petlan | 27 Mar 17:48 2015

[PATCH] DATA_CACHE_MISSES unit mask testsuite fix for AMD generic

Resending the patch after fixing it.
--
The default unit mask for the DATA_CACHE_MISSES event on AMD family 15h
CPUs is 0x1, not the 0x0 that is common to other AMD CPUs. When the
"athlon" test event table is used on an AMD family 15h machine, the test
fails. A slightly modified event table for AMD family 15h has been added
to fix the issue.

Signed-off-by: Michael Petlan <mpetlan <at> redhat.com>
--
diff --git a/testsuite/lib/op_events.exp b/testsuite/lib/op_events.exp
index b71811c..70acace 100644
--- a/testsuite/lib/op_events.exp
+++ b/testsuite/lib/op_events.exp
@@ -210,6 +210,18 @@ set op_event_table(athlon)                          \
        }                                           \
     }

+set op_event_table(amd_generic)                          \
+    {                                               \
+       {                                           \
+            {0 CPU_CLK_UNHALTED  0 500000}         \
+            {1 DATA_CACHE_ACCESSES 0 500000}       \
+       }                                           \
+       {                                           \
+            {0 DATA_CACHE_ACCESSES   0 500000}     \
+            {1 DATA_CACHE_MISSES     1 500000}     \
+       }                                           \
+    }
+

Michael Petlan | 26 Mar 23:31 2015

[Fwd: Re: opcontrol does not create results]

Resending this mail to the mailing list so that the thread is
complete there.

-------- Forwarded Message --------
> From: Michael Petlan <mpetlan <at> redhat.com>
> To: "Lentes, Bernd" <bernd.lentes <at> helmholtz-muenchen.de>
> Subject: Re: opcontrol does not create results
> Date: Thu, 26 Mar 2015 18:13:29 +0100
> 
> On Thu, 2015-03-26 at 17:01 +0100, Lentes, Bernd wrote:
> > Hi Michael,
> > 
> > i managed to install the debuginfo package:
> > 
> > idcc-devel:~ # rpm -qa|grep -i kernel
> > kernel-smp-2.6.16.60-0.103.1
> > kernel-debug-2.6.16.60-0.103.1
> > kernel-syms-2.6.16.60-0.103.1
> > kernel-smp-debuginfo-2.6.16.60-0.103.1
> > kernel-source-2.6.16.60-0.103.1
> > 
> > Is this the right one ?
> > 
> 
> I am not very familiar with SUSE, but I suppose that kernel-smp-debuginfo is the right package.
> If kernel-smp is the kernel package, then kernel-smp-debuginfo is the package with its debuginfo.
> It works the same way in RHEL, except that the kernel package is called kernel and the debuginfo
> package is called kernel-debuginfo.
> 
> > 

Lentes, Bernd | 26 Mar 19:30 2015

RE: opcontrol does not create results

Michael wrote:

> -----Original Message-----
> From: Michael Petlan [mailto:mpetlan <at> redhat.com]
> Sent: Thursday, March 26, 2015 6:13 PM
> To: Lentes, Bernd
> Subject: Re: opcontrol does not create results
>
> On Thu, 2015-03-26 at 17:01 +0100, Lentes, Bernd wrote:
> > Hi Michael,
> >
> > i managed to install the debuginfo package:
> >
> > idcc-devel:~ # rpm -qa|grep -i kernel
> > kernel-smp-2.6.16.60-0.103.1
> > kernel-debug-2.6.16.60-0.103.1
> > kernel-syms-2.6.16.60-0.103.1
> > kernel-smp-debuginfo-2.6.16.60-0.103.1
> > kernel-source-2.6.16.60-0.103.1
> >
> > Is this the right one ?
> >
>
> I am not very familiar with SUSE, but I suppose that
> kernel-smp-debuginfo is the right package.
> If kernel-smp is the kernel package, then kernel-smp-debuginfo is the
> package with its debuginfo.
> It works the same way in RHEL, except that the kernel package is called
> kernel and the debuginfo package is called kernel-debuginfo.
>

Michael Petlan | 25 Mar 15:23 2015

[PATCH] DATA_CACHE_MISSES unit mask testsuite fix for AMD family15h

The default unit mask for the DATA_CACHE_MISSES event on AMD family 15h
CPUs is 0x1, not the 0x0 that is common to other AMD CPUs. When the
"athlon" test event table is used on an AMD family 15h machine, the test
fails. A slightly modified event table for AMD family 15h has been added
to fix the issue.

Signed-off-by: Michael Petlan <mpetlan <at> redhat.com>
--

diff --git a/testsuite/lib/op_events.exp b/testsuite/lib/op_events.exp
index b71811c..c2e5f58 100644
--- a/testsuite/lib/op_events.exp
+++ b/testsuite/lib/op_events.exp
@@ -210,6 +210,18 @@ set op_event_table(athlon)                          \
        }                                           \
     }

+set op_event_table(amd_family15h)                          \
+    {                                               \
+       {                                           \
+            {0 CPU_CLK_UNHALTED  0 500000}         \
+            {1 DATA_CACHE_ACCESSES 0 500000}       \
+       }                                           \
+       {                                           \
+            {0 DATA_CACHE_ACCESSES   0 500000}     \
+            {1 DATA_CACHE_MISSES     1 500000}     \
+       }                                           \
+    }
+
 set op_event_table(p4)                              \

Henry May | 23 Mar 13:24 2015

opreport on time window of operf data

Is there a way to make opreport only process events that occurred in a subset window of the total operf collection time?  I have a long running workload that occasionally sees a spike in L2 misses.  I can determine when this happens because throughput drops.  I'd like to profile the workload, but only consider events that fall within the window of decreased throughput when generating the opreport.


Henry May
IBM InfoSphere Streams Performance
hjmay <at> us.ibm.com
720-342-8873
Tie: 963-8873
------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the 
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
oprofile-list mailing list
oprofile-list <at> lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oprofile-list
William Cohen | 10 Mar 16:05 2015

[PATCH] Ensure that umask is set if the extra bits (edge, inv, cmask) are used

While testing ocount on some Intel processors, it was discovered
that the umask was not being set for events that specified the
extra bits.  Below is an example of the problem on an Intel Ivy
Bridge processor, with the event code missing the 0x03 unit mask for
both events:

$  ocount --verbose -e int_misc:recovery_cycles -e int_misc:recovery_stalls_count ls
Final event code is 140000d
Final event code is 144000d
Number of events passed is 2
Exec args are: ls
telling child to start app
parent says start app /usr/bin/ls
calling perf_event_open for pid 240d
perf_event_open returning fd 9
perf_event_open returning fd a
perf counter setup complete
app 240d is running
going into waitpid on monitored app 240d
app process ended normally.
Reading counter data for event int_misc
Reading counter data for event int_misc

Events were actively counted for 1070382 nanoseconds.
Event counts (actual) for /usr/bin/ls:
	Event                                 Count                    % time counted
	int_misc:recovery_cycles              0                        100.00
	int_misc:recovery_stalls_count        0                        100.00

With this patch the umasks are included and the example executes correctly:

$  ocount --verbose -e int_misc:recovery_cycles -e int_misc:recovery_stalls_count ls
Final event code is 140030d
Final event code is 144030d
Number of events passed is 2
Exec args are: ls
telling child to start app
calling perf_event_open for pid 72e1
parent says start app /usr/bin/ls
perf_event_open returning fd 9
perf_event_open returning fd a
perf counter setup complete
app 72e1 is running
going into waitpid on monitored app 72e1
app process ended normally.
Reading counter data for event int_misc
Reading counter data for event int_misc

Events were actively counted for 1216948 nanoseconds.
Event counts (actual) for /usr/bin/ls:
	Event                                 Count                    % time counted
	int_misc:recovery_cycles              69,730                   100.00
	int_misc:recovery_stalls_count        14,800                   100.00

Signed-off-by: William Cohen <wcohen <at> redhat.com>
---
 libop/op_events.c | 3 +++
 libop/op_events.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/libop/op_events.c b/libop/op_events.c
index 99266c6..2badc8e 100644
--- a/libop/op_events.c
+++ b/libop/op_events.c
@@ -238,6 +238,9 @@ static void parse_um_entry(struct op_described_um * entry, char const * line)
 	if (strisprefix(c, "extra:")) {
 		c += 6;
 		entry->extra = parse_extra(c);
+		/* include the regular umask if there are real extra bits */
+		if (entry->extra != EXTRA_NONE)
+			entry->extra |= (entry->value & UMASK_MASK) << UMASK_SHIFT;
 		/* named mask */
 		c = skip_nonws(c);
 		c = skip_ws(c);
diff --git a/libop/op_events.h b/libop/op_events.h
index ec345e5..f09c830 100644
--- a/libop/op_events.h
+++ b/libop/op_events.h
@@ -20,6 +20,9 @@ extern "C" {
 #include "op_types.h"
 #include "op_list.h"

+#define UMASK_SHIFT 8
+#define UMASK_MASK 0xff
+
 #define EXTRA_EDGE (1U << 18)
 #define EXTRA_MIN_VAL EXTRA_EDGE

--

-- 
2.1.0
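The effect of the added lines can be checked with a small standalone sketch. `UMASK_SHIFT` and `UMASK_MASK` are taken from the patch; `fold_umask` and `final_event_code` are hypothetical helper names. Two things are assumptions: the sentinel for "no extra bits" is passed in as a parameter because the value of EXTRA_NONE is not shown in the patch, and the final event code is assumed to be the event number OR-ed with the extra word, which is inferred from the verbose output rather than read from the ocount source.

```c
#include <assert.h>
#include <stdint.h>

/* From the patch to libop/op_events.h. */
#define UMASK_SHIFT 8
#define UMASK_MASK  0xff

/* Fold the 8-bit unit mask into bits 15-8 of the extra word, as the patch
 * does. `none` is whatever sentinel marks "no extra bits" (EXTRA_NONE in
 * oprofile; its value is not shown in the patch, so it is a parameter here). */
uint32_t fold_umask(uint32_t extra, uint32_t um_value, uint32_t none)
{
    if (extra != none)
        extra |= (um_value & UMASK_MASK) << UMASK_SHIFT;
    return extra;
}

/* Assumption inferred from the verbose output: the final event code is the
 * event number OR-ed with the extra word. */
uint32_t final_event_code(uint32_t event, uint32_t extra)
{
    assert(event <= 0xff);
    return event | extra;
}
```

With the extra word 0x1400000 that ophelp reported for int_misc:recovery_cycles (see the 9 Mar message) and unit mask 0x3, `fold_umask` gives 0x1400300, and OR-ing in event 0xd gives 0x140030d, which matches the "Final event code is 140030d" line above.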

William Cohen | 9 Mar 22:31 2015

Improper configuration of events using cmask/edge

There appear to be some cases where ocount/operf are setting up
the event configuration incorrectly.  An Ivy Bridge machine has the following event:

event:0xd counters:cpuid um:int_misc minimum:2000000 name:int_misc : Instruction decoder events

And unit masks:

name:int_misc type:exclusive default:recovery_cycles
	0x3 extra:cmask=1 recovery_cycles Number of cycles waiting for the checkpoints in Resource Allocation
Table (RAT) to be recovered after Nuke due to all other cases except JEClear (e.g. whenever a ucode assist
is needed like SSE exception, memory disambiguation, etc...)
	0x3 extra:cmask=1,edge recovery_stalls_count Number of occurences waiting for the checkpoints in
Resource Allocation Table (RAT) to be recovered after Nuke due to all other cases except JEClear (e.g.
whenever a ucode assist is needed like SSE exception, memory disambiguation, etc...)

ophelp provides the following for the int_misc event:

int_misc: (counter: all)
	Instruction decoder events (min count: 2000000)
	Unit masks (default recovery_cycles)
	----------
	0x03: (name=recovery_cycles) Number of cycles waiting for the checkpoints in Resource 
              Allocation Table (RAT) to be recovered after Nuke due to all other cases except 
              JEClear (e.g. whenever a ucode assist is needed like SSE exception, memory 
              disambiguation, etc...)
	0x03: (name=recovery_stalls_count) Number of occurences waiting for the checkpoints in 
              Resource Allocation Table (RAT) to be recovered after Nuke due to all other cases 
              except JEClear (e.g. whenever a ucode assist is needed like SSE exception, memory 
              disambiguation, etc...)

Based on the event and unit_mask info, running both of these events
should translate to the following perf stat command, and that works:

$ perf stat -e cpu/event=0xd,umask=0x3,cmask=1/ -e cpu/event=0xd,umask=0x3,cmask=1,edge=1/ ls
 Performance counter stats for 'ls':

            72,477      cpu/event=0xd,umask=0x3,cmask=1/                                   
            15,430      cpu/event=0xd,umask=0x3,cmask=1,edge=1/                                   

       0.002052602 seconds time elapsed

However, ocount returns 0 for the counts of both events:

$  ocount  -e int_misc:recovery_cycles -e int_misc:recovery_stalls_count ls

Events were actively counted for 1317554 nanoseconds.
Event counts (actual) for /usr/bin/ls:
	Event                                 Count                    % time counted
	int_misc:recovery_cycles              0                        100.00
	int_misc:recovery_stalls_count        0                        100.00

According to the "Intel® 64 and IA-32 Architectures Software
Developer’s Manual Volume 3 (3A, 3B & 3C): System Programming Guide",
the two events have the following bit fields:

bits   field    int_misc:recovery_cycles  int_misc:recovery_stalls_count
31-24  cmask    0x01                      0x01
23     invert   0                         0
22     enable
20     int
19     pin
18     edge     0                         1
17     os       0                         0
16     usr      0                         0
15-8   umask    0x03                      0x03
7-0    event    0x0d                      0x0d
expected code   0x0100030d                0x0104030d
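The expected values in the last row can be reproduced by packing the fields at the bit positions listed in the table. The sketch below is only a worked check of that arithmetic; `evsel_encode` and the `EVSEL_*` macro names are hypothetical and do not come from oprofile or perf. The os, usr, and enable bits are left at zero, as in the table.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions taken from the table above. */
#define EVSEL_UMASK_SHIFT 8
#define EVSEL_EDGE        (1u << 18)
#define EVSEL_INV         (1u << 23)
#define EVSEL_CMASK_SHIFT 24

/* Pack event, umask, cmask, edge, and invert into one 32-bit config word. */
uint32_t evsel_encode(uint32_t event, uint32_t umask, uint32_t cmask,
                      int edge, int inv)
{
    assert(event <= 0xff && umask <= 0xff && cmask <= 0xff);
    uint32_t cfg = event
                 | (umask << EVSEL_UMASK_SHIFT)
                 | (cmask << EVSEL_CMASK_SHIFT);
    if (edge)
        cfg |= EVSEL_EDGE;
    if (inv)
        cfg |= EVSEL_INV;
    return cfg;
}
```

Calling `evsel_encode(0x0d, 0x03, 1, 0, 0)` yields 0x0100030d for recovery_cycles, and `evsel_encode(0x0d, 0x03, 1, 1, 0)` yields 0x0104030d for recovery_stalls_count.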

Using gdb on perf, running the "perf stat" command above, and setting a
breakpoint on perf_evsel__open_per_thread, I see that the event numbers match up:

(gdb) print/x evsel->attr.config
$4 = 0x100030d
...
(gdb) print/x evsel->attr.config
$5 = 0x104030d

The verbose ocount output shows the events set up incorrectly (zero unit mask, and the enable bit set):

$  ocount --verbose -e int_misc:recovery_cycles -e int_misc:recovery_stalls_count ls
Final event code is 140000d
Final event code is 144000d
Number of events passed is 2
Exec args are: ls 
telling child to start app
parent says start app /usr/bin/ls
calling perf_event_open for pid 240d
perf_event_open returning fd 9
perf_event_open returning fd a
perf counter setup complete
app 240d is running
going into waitpid on monitored app 240d
app process ended normally.
Reading counter data for event int_misc
Reading counter data for event int_misc

Events were actively counted for 1070382 nanoseconds.
Event counts (actual) for /usr/bin/ls:
	Event                                 Count                    % time counted
	int_misc:recovery_cycles              0                        100.00
	int_misc:recovery_stalls_count        0                        100.00

Digging through the code, I find that ocount gets the information from ophelp.
It looks like ophelp is returning the wrong value for the --extra-mask option:

$ /usr/local/bin/ophelp --extra-mask int_misc:0:recovery_cycles
20971520
$ /usr/local/bin/ophelp --extra-mask int_misc:0:recovery_stalls_count
21233664

(gdb) print/x 20971520
$1 = 0x1400000
(gdb) print/x 21233664
$2 = 0x1440000
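Splitting those two values into the bit fields from the table earlier in this message makes the problem visible. The following is a rough decoding sketch; `struct evsel_fields`, `evsel_decode`, and `evsel_decode_decimal` are hypothetical names, and only the bit positions come from the message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Fields of interest, using the bit positions from the table in this message. */
struct evsel_fields {
    unsigned event;   /* bits 7-0   */
    unsigned umask;   /* bits 15-8  */
    unsigned edge;    /* bit 18     */
    unsigned enable;  /* bit 22     */
    unsigned inv;     /* bit 23     */
    unsigned cmask;   /* bits 31-24 */
};

/* Split a 32-bit value into its fields. */
struct evsel_fields evsel_decode(uint32_t v)
{
    struct evsel_fields f;
    f.event  = v & 0xff;
    f.umask  = (v >> 8) & 0xff;
    f.edge   = (v >> 18) & 1;
    f.enable = (v >> 22) & 1;
    f.inv    = (v >> 23) & 1;
    f.cmask  = (v >> 24) & 0xff;
    return f;
}

/* Decode the decimal number as printed by `ophelp --extra-mask`. */
struct evsel_fields evsel_decode_decimal(const char *text)
{
    assert(text != NULL);
    return evsel_decode((uint32_t)strtoul(text, NULL, 10));
}
```

Decoding "20971520" gives cmask 1, enable 1, and umask 0; decoding "21233664" gives the same plus edge 1. In both cases the 0x03 unit mask is absent from bits 15-8.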

Andi, any ideas of where things are going wrong in ophelp?

-Will

Lentes, Bernd | 6 Mar 16:32 2015

opcontrol does not create results

Hi ML,

I'm new to oprofile. I have a VM (KVM) running SLES 10 SP4 64-bit, kernel 2.6.16. The system always shows between
5 and 15 percent si (SoftIrq) in top, but I don't know why. I hope oprofile can help me find the culprit. I set up
oprofile as follows:

idcc-devel:~ # opcontrol --status
Daemon running: pid 21153
Event 0: BUS_DATA_RCV:1000:0:1:1
Event 0: HW_INT_RX:1000:0:1:1
Separate options: library kernel
vmlinux file: /boot/vmlinux-2.6.16.60-0.103.1-smp.gz
Image filter: none
Call-graph depth: 0

(The events were not chosen for any particular reason; they are currently just for testing.)

This is my log:

idcc-devel:~ # cat /var/lib/oprofile/oprofiled.log
oprofiled started Fri Mar  6 15:39:05 2015
kernel pointer size: 8

Fri Mar  6 15:49:05 2015

Nr. sample dumps: 1
Nr. non-backtrace samples: 0
Nr. kernel samples: 0
Nr. lost samples (no kernel/user): 0
Nr. lost kernel samples: 0
Nr. incomplete code structs: 0
Nr. samples lost due to sample file open failure: 0
Nr. samples lost due to no permanent mapping: 0
Nr. event lost due to buffer overflow: 0
Nr. samples lost due to no mapping: 0
Nr. backtraces skipped due to no file mapping: 0
Nr. samples lost due to no mm: 0
Nr. samples lost cpu buffer overflow: 0
Nr. samples received: 0
Nr. backtrace aborted: 0

Fri Mar  6 15:59:05 2015

Nr. sample dumps: 3
Nr. non-backtrace samples: 0
Nr. kernel samples: 0
Nr. lost samples (no kernel/user): 0
Nr. lost kernel samples: 0
Nr. incomplete code structs: 0
Nr. samples lost due to sample file open failure: 0
Nr. samples lost due to no permanent mapping: 0
Nr. event lost due to buffer overflow: 0
Nr. samples lost due to no mapping: 0
Nr. backtraces skipped due to no file mapping: 0
Nr. samples lost due to no mm: 0
Nr. samples lost cpu buffer overflow: 0
Nr. samples received: 0
Nr. backtrace aborted: 0

Fri Mar  6 16:09:05 2015

Nr. sample dumps: 5
Nr. non-backtrace samples: 0
Nr. kernel samples: 0
Nr. lost samples (no kernel/user): 0
Nr. lost kernel samples: 0
Nr. incomplete code structs: 0
Nr. samples lost due to sample file open failure: 0
Nr. samples lost due to no permanent mapping: 0
Nr. event lost due to buffer overflow: 0
Nr. samples lost due to no mapping: 0
Nr. backtraces skipped due to no file mapping: 0
Nr. samples lost due to no mm: 0
Nr. samples lost cpu buffer overflow: 0
Nr. samples received: 0
Nr. backtrace aborted: 0

No results. I read that oprofile is able to profile virtual machines using KVM; only VMs in VMware don't
work. I use KVM.

idcc-devel:~ # l /var/lib/oprofile/samples/current/
total 8
drwxr-xr-x 2 root root 4096 Mar  6 15:39 ./
drwxr-xr-x 6 root root 4096 Mar  6 15:39 ../

How can I get profile data ?

I'm using oprofile 0.9.1-15.

Bernd

--
Bernd Lentes

Systemadministration
Institut für Entwicklungsgenetik
Gebäude 35.34 - Raum 208
HelmholtzZentrum münchen
bernd.lentes <at> helmholtz-muenchen.de
phone: +49 89 3187 1241
fax:   +49 89 3187 2294
http://www.helmholtz-muenchen.de/idg

Je suis Charlie

Helmholtz Zentrum München
Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH)
Ingolstädter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Chair of the Supervisory Board: MinDir´in Bärbel Brumme-Bothe
Managing Directors: Prof. Dr. Günther Wess, Dr. Nikolaus Blum, Dr. Alfons Enhsen
Register court: Amtsgericht München HRB 6466
VAT ID: DE 129521671

Ankur Sharma | 5 Mar 10:17 2015

oprofile help

Hi,

Is there an alternative to the legacy opcontrol --start and --stop in the present version? I want to count
hardware events for a small portion of the program.

Sincere regards
Ankur
