Sabra Gargouri | 17 Sep 12:06 2014

Idle time in Oprofile

Hi,
In the context of my current work, which aims to analyse OProfile's features, I would like to check whether OProfile accounts for idle time in its profiling results. For that purpose, I ran OProfile in timer mode without any other application running (the system was idle).
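
For reference, a legacy timer-mode session of this kind looks roughly as follows (paths and durations are examples rather than my exact commands):

	opcontrol --deinit                    # unload the oprofile module if loaded
	modprobe oprofile timer=1             # force timer interrupt mode
	opcontrol --vmlinux=/path/to/vmlinux  # kernel image with symbols
	opcontrol --start
	sleep 60                              # leave the system idle
	opcontrol --stop
	opreport -l                           # symbol-level report as below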
First, I ran OProfile on an SH4 platform and, as the results below show, about 93% of the samples are attributed to "poll_idle".

samples  %        app name                 symbol name
15655    93.8437  vmlinux                  poll_idle
238       1.4267  vmlinux                  arch_local_irq_restore
28        0.1678  oprofiled                do_match
24        0.1439  oprofiled                pop_buffer_value
24        0.1439  vmlinux                  copy_page
18        0.1079  oprofiled                opd_process_samples
18        0.1079  oprofiled                sfile_find
16        0.0959  bash                     shell_getc
16        0.0959  libc-2.14.1.so           __gconv_transform_ascii_internal
16        0.0959  oprofiled                get_file
14        0.0839  libc-2.14.1.so           mbrtowc
14        0.0839  oprofiled                sfile_log_sample_count
13        0.0779  oprofiled                odb_update_node_with_offset
12        0.0719  libc-2.14.1.so           _int_malloc
9         0.0540  oprofiled                find_kernel_image
9         0.0540  vmlinux                  __copy_user
9         0.0540  vmlinux                  link_path_walk
9         0.0540  vmlinux                  nfs_permission
8         0.0480  ld-2.14.1.so             _dl_relocate_object
8         0.0480  vmlinux                  tcp_ack

I also ran OProfile on an ARM Cortex-A9 (SMP) platform without any application running (system idle) and got 99.44% in the "fpa_get" function, with nothing related to "idle" at all.

samples  %        app name                 symbol name
4148     99.4486  vmlinux                  fpa_get
2         0.0480  vmlinux                  print_cfs_rq
1         0.0240  bash                     hash_search
1         0.0240  bash                     parse_matched_pair
1         0.0240  gawk                     check_special
1         0.0240  ld-2.14.1.so             _dl_lookup_symbol_x
1         0.0240  libc-2.14.1.so           __default_morecore
1         0.0240  libc-2.14.1.so           __gconv_transform_ascii_internal
1         0.0240  libc-2.14.1.so           malloc_consolidate
1         0.0240  libc-2.14.1.so           strcpy
1         0.0240  vmlinux                  create_new_namespaces
1         0.0240  vmlinux                  dup_fd
1         0.0240  vmlinux                  fuse_copy_args
1         0.0240  vmlinux                  mnt_alloc_group_id
1         0.0240  vmlinux                  print_cpu
1         0.0240  vmlinux                  ptrace_request
1         0.0240  vmlinux                  seq_list_start_head
1         0.0240  vmlinux                  seq_write
1         0.0240  vmlinux                  usleep_range
1         0.0240  vmlinux                  vga_arbiter_notify_clients.part.11
1         0.0240  vmlinux                  vga_get
1         0.0240  vmlinux                  vm_insert_page
1         0.0240  vmlinux                  write_wb_reg

When searching the official OProfile documentation, I found the following explanation:
"Your kernel is likely to support halting the processor when a CPU is idle. As the typical hardware events like CPU_CLK_UNHALTED do not count when the CPU is halted, the kernel profile will not reflect the actual amount of time spent idle. You can change this behaviour by booting with the idle=poll option, which uses a different idle routine. This will appear as poll_idle() in your kernel profile."
So I rebooted my kernel with the idle=poll option added, but I did not notice any difference from the previous results.
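
As a sanity check that the option actually reached the kernel, the boot command line can be inspected:

	cat /proc/cmdline    # should include idle=poll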

samples  %        app name                 symbol name
4707     99.5137  vmlinux                  fpa_get
2         0.0423  vmlinux                  attribute_container_unregister
2         0.0423  vmlinux                  print_cfs_rq
1         0.0211  bash                     execute_command_internal
1         0.0211  bash                     shell_getc
1         0.0211  libc-2.14.1.so           __gconv_transform_ascii_internal
1         0.0211  libc-2.14.1.so           sigprocmask
1         0.0211  libdl-2.14.1.so          call_gmon_start
1         0.0211  vmlinux                  __getnstimeofday
1         0.0211  vmlinux                  bdi_min_pause.isra.19
1         0.0211  vmlinux                  cgroup_scan_tasks
1         0.0211  vmlinux                  dev_alert
1         0.0211  vmlinux                  ext2_block_to_path.isra.19
1         0.0211  vmlinux                  ext4_ext_remove_space
1         0.0211  vmlinux                  iterate_supers
1         0.0211  vmlinux                  lg_local_lock
1         0.0211  vmlinux                  pipe_to_file
1         0.0211  vmlinux                  ptrace_request
1         0.0211  vmlinux                  register_filesystem
1         0.0211  vmlinux                  seq_write
1         0.0211  vmlinux                  sys_prctl
1         0.0211  vmlinux                  ubi_start_leb_change

Why does "poll_idle" not appear in the ARM case? Does it relate to architectural reasons?
Could we say that OProfile is not intended to determine idle time, or is it related to the configuration of the OProfile daemon?

BR

Narayanan, Krishnaprasad | 15 Sep 14:48 2014

Clarifications regarding the event output

Hello all,

I am using OProfile version 0.9.9 and run the operf command with --separate-cpu to obtain information on the following events: CPU_CLK_UNHALTED, INST_MISSES, INST_RETIRED, LLC_MISSES, LLC_REFS and BR_MISS_PRED_RETIRED, with a sampling rate of 1000000. The sampling rate here refers to the count value that is specified for every event.

I run the operf command in the background, and every 1 sec I capture the output of opreport, which is dumped to an output file.
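
For clarity, the collection is roughly the following sketch (event list abbreviated; file names are examples):

	# 1000000 is the count value described above
	operf --system-wide --separate-cpu \
	      --events=CPU_CLK_UNHALTED:1000000,INST_RETIRED:1000000,LLC_MISSES:1000000 &
	while true; do
	    opreport > report-$(date +%s).txt 2>/dev/null
	    sleep 1
	done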

Can I seek answers to the following questions?

a)      Are the event values generated every 1 sec a cumulative sum over the previous timestamps? For example, if 100 instructions have been retired at timestamp T1 and 200 at timestamp T1+1, what is the reported value at T1+1: 200 or 100?

b)      Can I also know for which of these events the difference between the current and the previous output is the applicable value?

c)      Besides, could you kindly explain the methodology to compute the total count of instructions?
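
To make (c) concrete: my understanding is that each recorded sample nominally stands for "count" occurrences of its event, so a rough total would be samples × count (e.g., with count=1000000, 4500 INST_RETIRED samples would be about 4.5 billion instructions). Is a post-processing step like this sketch the right approach?

	# assumes a single-event opreport whose first column is the raw
	# sample count, and count=1000000 as above
	opreport 2>/dev/null | awk 'NR > 3 { s += $1 } END { print s * 1000000 }'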

Regards,

Krishnaprasad

Maynard Johnson | 12 Sep 18:31 2014

Announcement: OProfile 1.0.0

We are pleased to announce the general availability of OProfile 1.0.0.
You can download this release at:
	http://oprofile.sourceforge.net/download/

-Maynard Johnson

-----------------------------------------------------------------------

OProfile 1.0.0 has been released. A major change in this release
is the removal of the legacy opcontrol-based profiler. The legacy
profiling tool has been deprecated since release 0.9.8 when operf
was first introduced. The following components and processor types
that were dependent on opcontrol have also been removed:

   - GUI component (i.e., oprof_start)
   - IBS events for AMD processors
   - All Alpha processors, except for EV67 (which *is* supported by operf/ocount)
   - Architecture avr32
   - Architecture ia64
   - Processor model IBM Cell
   - Processor model P.A. Semi PA6T
   - RTC (real time clock mode)

OProfile users still running on any of these affected systems or
needing any of the removed components listed above should not upgrade
to OProfile release 1.0. Alternatively, you can obtain all of the new
features, enhancements, and bug fixes described below and still have
access to opcontrol by doing the following:

	git clone git://git.code.sf.net/p/oprofile/oprofile oprofile
	cd oprofile
	git checkout PRE_RELEASE_1_0

and then build/install as usual.
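
Here, "build/install as usual" means the standard autotools sequence, roughly the following (assuming autoconf, automake, and the usual build prerequisites are installed):

	./autogen.sh    # generate the configure script in a git checkout
	./configure
	make
	make install    # typically as root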

More information about OProfile can be seen at
    http://oprofile.sf.net

Incompatibilities with previous release
---------------------------------------

- Sample data collected with previous releases of OProfile are incompatible
  with release 1.0.
- ophelp schema: Major version changed for removal of unit mask 'extra'
  attribute and addition of unit mask 'name'.

New features
------------

- Enhance ocount to support millisecond time intervals
- Obtain kernel symbols from /proc/kallsyms if no vmlinux file specified

- New/updated Processor Support
    * (New) Freescale e6500 
    * (New) Freescale e500mc
    * (New) Intel Silvermont
    * (New) ARMv7 Krait
    * (New) APM X-Gene (ARMv8)
    * (New) Intel Broadwell
    * (New) ARMv8 Cortex A57
    * (New) ARMv8 Cortex A53
    * Added little endian support for IBM POWER8
    * Update events for IBM POWER8
    * Added edge-detect events for IBM POWER7
    * Update events for Intel Haswell

Bug fixes
---------

Filed bug reports:
-------------------------------------------------------------------------
|  BUG ID   |  Summary 
|-----------|------------------------------------------------------------
|   236     | opreport schema: Fix count field maxOccurs (changed to
|           | 'unbounded')
|   245     | Fix compile error on ppc/uClibc platform: 'AT_BASE_PLATFORM'
|           | undeclared'
|   248     | Duplicate event specs passed to ocount show up twice in
|           | output
|   252     | Fix operf/ocount default unit mask selection
|   253     | ocount: print the unit mask, kernel and user modes if
|           | specified for the event
|   254     | ophelp schema is not included in installed files
|   255     | Remove unused 'extra' attribute from ophelp schema
|   256     | opreport from 'operf --callgraph' profile shows false
|           | recursive calls
|   257     | Fix handling of default named unit masks longer than 11 chars
|   259     | Print unit mask name where applicable in ophelp XML output
|   260     | Fix profiling of multi-threaded apps when using "--pid"
|           | option
|   262     | Fix operf/opreport kernel throttling detection
|   263     | Fix sample attribution problem when using multiple events
|   266     | exclude/include files option doesn't work for opannotate -a
-------------------------------------------------------------------------

Other bug fixes and improvements without a filed report (e.g., posted to the list):
---------------
   - Fix behavior and documentation for '--threshold' option
   - Remove hard-coded timeout for JIT dump conversion
   - Update Alpha EV67 CPU support and remove all other Alpha CPU support
   - operf main process improperly killing conversion process
   - Fix up S390 support to work with operf/ocount
   - Link ocount with librt for clock_gettime only when needed
   - Fix 'Invalid argument' running 'opcontrol --start --callgraph=<n>' in
     Timer mode
   - Allow root to remove old jitdump files from /tmp/.oprofile/jitdump
   - Remove opreport warnings for /no-vmlinux, [vdso], [hypervisor_bucket]
     not found
   - Fix event codes for marked architected events (IBM ppc64)
   - Make operf/ocount detect invalid timer mode from opcontrol
   - Reduce overhead of operf waiting for profiled app to end
   - Fix "Unable to open cpu_type file for reading" for IBM POWER7+
   - Allow all native events for IBM POWER8 in POWER7 compat mode
   - Fix spurious "backtraces skipped due to no file mapping" log entries
   - Fix the units for the reported CPU frequency

Known problems and limitations
-------------------------
- When using operf to profile multiple events, the absolute number of
  events recorded may be substantially fewer than expected. This can be
  due to a known bug in the Linux kernel's Performance Events Subsystem
  that was fixed sometime between Linux kernel versions 3.1 and 3.5; a
  quick check is sketched below.
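
  A quick way to check whether a given kernel is affected is to compare a
  single-event run against a multi-event run of the same workload (event
  names and the workload are examples):

	# multi-event totals well below the single-event baseline
	# suggest the kernel bug described above
	operf -e INST_RETIRED:100000 ./workload && opreport > single.txt
	operf -e INST_RETIRED:100000,CPU_CLK_UNHALTED:100000 ./workload && opreport > multi.txt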

Andi Kleen | 11 Sep 01:07 2014

[PATCH] Update the Silvermont event files

From: Andi Kleen <ak@linux.intel.com>

On further review, the Silvermont event files had a lot of problems.
I regenerated them completely. This fixes the PEBS events, and
fixes a range of others.

The test suite passes without problems.

I realize it's a hard-to-review patch, but I think
it's the best option for 1.0.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 events/i386/silvermont/events     |  24 +++----
 events/i386/silvermont/unit_masks | 146 +++++++++++++++++++++-----------------
 2 files changed, 93 insertions(+), 77 deletions(-)

diff --git a/events/i386/silvermont/events b/events/i386/silvermont/events
index 077cc0a..434538f 100644
--- a/events/i386/silvermont/events
+++ b/events/i386/silvermont/events
@@ -7,20 +7,18 @@
 # lowered in many cases without ill effect.
 #
 include:i386/arch_perfmon
-event:0x32 counters:0,1 um:l2_prefetcher_throttle minimum:200003 name:l2_prefetcher_throttle :
-event:0x3e counters:0,1 um:one minimum:200003 name:l2_prefetcher_pref_stream_alloc :
-event:0x50 counters:0,1 um:zero minimum:200003 name:l2_prefetch_pend_streams_pref_stream_pend_set :
-event:0x86 counters:0,1 um:nip_stall minimum:200003 name:nip_stall :
-event:0x87 counters:0,1 um:decode_stall minimum:200003 name:decode_stall :
-event:0x96 counters:0,1 um:uip_match minimum:200003 name:uip_match :
+event:0x03 counters:0,1 um:rehabq minimum:200003 name:rehabq :
+event:0x04 counters:0,1 um:mem_uops_retired minimum:200003 name:mem_uops_retired :
+event:0x05 counters:0,1 um:page_walks minimum:200003 name:page_walks :
+event:0x30 counters:0,1 um:zero minimum:200003 name:l2_reject_xq_all :
+event:0x31 counters:0,1 um:zero minimum:200003 name:core_reject_l2q_all :
+event:0x80 counters:0,1 um:icache minimum:200003 name:icache :
 event:0xc2 counters:0,1 um:uops_retired minimum:2000003 name:uops_retired :
-event:0xc3 counters:0,1 um:x10 minimum:200003 name:machine_clears_live_lock_breaker :
-event:0xc4 counters:0,1 um:br_inst_retired minimum:2000003 name:br_inst_retired :
+event:0xc3 counters:0,1 um:machine_clears minimum:200003 name:machine_clears :
+event:0xc4 counters:0,1 um:br_inst_retired minimum:200003 name:br_inst_retired :
 event:0xc5 counters:0,1 um:br_misp_retired minimum:200003 name:br_misp_retired :
 event:0xca counters:0,1 um:no_alloc_cycles minimum:200003 name:no_alloc_cycles :
 event:0xcb counters:0,1 um:rs_full_stall minimum:200003 name:rs_full_stall :
-event:0xcc counters:0,1 um:rs_dispatch_stall minimum:200003 name:rs_dispatch_stall :
-event:0xe6 counters:0,1 um:baclears minimum:2000003 name:baclears :
-event:0xe7 counters:0,1 um:x02 minimum:200003 name:ms_decoded_early_exit :
-event:0xe8 counters:0,1 um:one minimum:200003 name:btclears_all :
-event:0xe9 counters:0,1 um:decode_restriction minimum:200003 name:decode_restriction :
+event:0xcd counters:0,1 um:one minimum:2000003 name:cycles_div_busy_all :
+event:0xe6 counters:0,1 um:baclears minimum:200003 name:baclears :
+event:0xe7 counters:0,1 um:one minimum:200003 name:ms_decoded_ms_entry :
diff --git a/events/i386/silvermont/unit_masks b/events/i386/silvermont/unit_masks
index 6309282..c0dac26 100644
--- a/events/i386/silvermont/unit_masks
+++ b/events/i386/silvermont/unit_masks
@@ -4,68 +4,86 @@
 # See http://ark.intel.com/ for help in identifying Silvermont based CPUs
 #
 include:i386/arch_perfmon
-name:x02 type:mandatory default:0x2
-	0x2 No unit mask
-name:x10 type:mandatory default:0x10
-	0x10 No unit mask
-name:l2_prefetcher_throttle type:exclusive default:0x2
-	0x2 extra:edge conservative Counts the number of cycles the L2 prefetcher spends in throttling mode
-	0x1 extra:edge aggressive Counts the number of cycles the L2 prefetcher spends in throttling mode
-name:nip_stall type:exclusive default:0x3f
-	0x3f extra: all Counts the number of cycles the NIP stalls.
-	0x1 extra: pfb_full Counts the number of cycles the NIP stalls and the PFBs are full.   This DOES NOT inlude
PFB throttler cases.
-	0x2 extra: itlb_miss Counts the number of cycles the NIP stalls and there is an outstanding ITLB miss.
This is a cummulative count of cycles the NIP stalled for all ITLB misses.
-	0x8 extra: pfb_throttler Counts the number of cycles the NIP stalls, the throttler is engaged, and the
PFBs appear full.
-	0x10 extra: do_snoop Counts the number of cycles the NIP stalls because of a SMC compliance snoop to the
MEC is required.
-	0x20 extra: misc_other Counts the number of cycles the NIP stalls due to NUKE, Stop Front End, Inserted flows.
-	0x1e extra: pfb_ready Counts the number of cycles the NIP stalls when the PFBs are not full and the
decoders are able to process bytes.  Does not count PFB_FULL nor MISC_OTHER stall cycles.
-name:decode_stall type:exclusive default:0x1
-	0x1 extra: pfb_empty Counts the number of cycles decoder is stalled because the PFB is empty, this count
is useful to see if the decoder is receiving the bytes from the front end. This event together with the
DECODE_STALL.IQ_FULL may be used to narrow down on the bottleneck.
-	0x2 extra: iq_full Counts the number of cycles decoder is stalled because the IQ is full, this count is
useful to see if the decoder is delivering the decoded uops. This event together with the
DECODE_STALL.PFB_EMPTY may be used to narrow down on the bottleneck.
-name:uip_match type:exclusive default:0x1
-	0x1 extra: first_uip This event is used for counting the number of times a specific micro IP address was decoded
-	0x2 extra: second_uip This event is used for counting the number of times a specific micro IP address was decoded
-name:uops_retired type:exclusive default:0x2
-	0x2 extra: x87 This event counts the number of micro-ops retired that used X87 hardware.
-	0x4 extra: mul This event counts the number of micro-ops retired that used MUL hardware.
-	0x8 extra: div This event counts the number of micro-ops retired that used DIV hardware.
-	0x1 extra: ms_cyles Counts the number of uops that are from the complex flows issued by the
micro-sequencer (MS).  This includes uops from flows due to faults, assists, and inserted flows.
-name:br_inst_retired type:exclusive default:0x1
-	0x1 extra: remove_jcc REMOVE_JCC counts the number of branch instructions retired but removes taken
and not taken conditional branches (JCC).  Branch prediction predicts the branch target and enables the
processor to begin executing instructions long before the branch true execution path is known. All
branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address
not only based on the EIP of the branch but also based on the execution path through which execution reached
this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct
calls and jumps, indirect calls and jumps, returns.
-	0x2 extra: remove_rel_call REMOVE_REL_CALL counts the number of branch instructions retired but
removes near relative CALL.  Branch prediction predicts the branch target and enables the processor to
begin executing instructions long before the branch true execution path is known. All branches utilize
the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on
the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU
can efficiently predict the following branch types: conditional branches, direct calls and jumps,
indirect calls and jumps, returns.
-	0x4 extra: remove_ind_call REMOVE_IND_CALL counts the number of branch instructions retired but
removes near indirect CALL. Branch prediction predicts the branch target and enables the processor to
begin executing instructions long before the branch true execution path is known. All branches utilize
the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on
the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU
can efficiently predict the following branch types: conditional branches, direct calls and jumps,
indirect calls and jumps, returns.
-	0x8 extra: remove_ret REMOVE_RET counts the number of branch instructions retired but removes near
RET.  Branch prediction predicts the branch target and enables the processor to begin executing
instructions long before the branch true execution path is known. All branches utilize the branch
prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of
the branch but also based on the execution path through which execution reached this EIP. The BPU can
efficiently predict the following branch types: conditional branches, direct calls and jumps,
indirect calls and jumps, returns.
-	0x10 extra: remove_ind_jmp REMOVE_IND_JMP counts the number of branch instructions retired but
removes near indirect JMP.  Branch prediction predicts the branch target and enables the processor to
begin executing instructions long before the branch true execution path is known. All branches utilize
the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on
the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU
can efficiently predict the following branch types: conditional branches, direct calls and jumps,
indirect calls and jumps, returns.
-	0x20 extra: remove_rel_jmp REMOVE_REL_JMP counts the number of branch instructions retired but
removes near relative JMP.  Branch prediction predicts the branch target and enables the processor to
begin executing instructions long before the branch true execution path is known. All branches utilize
the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on
the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU
can efficiently predict the following branch types: conditional branches, direct calls and jumps,
indirect calls and jumps, returns.
-	0x40 extra: remove_far REMOVE_FAR counts the number of branch instructions retired but removes all far
branches.  Branch prediction predicts the branch target and enables the processor to begin executing
instructions long before the branch true execution path is known. All branches utilize the branch
prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of
the branch but also based on the execution path through which execution reached this EIP. The BPU can
efficiently predict the following branch types: conditional branches, direct calls and jumps,
indirect calls and jumps, returns.
-	0x80 extra: remove_not_taken_jcc REMOVE_NOT_TAKEN_JCC counts the number of branch instructions
retired but removes taken conditional branches (JCC).  Branch prediction predicts the branch target and
enables the processor to begin executing instructions long before the branch true execution path is
known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the
target address not only based on the EIP of the branch but also based on the execution path through which
execution reached this EIP. The BPU can efficiently predict the following branch types: conditional
branches, direct calls and jumps, indirect calls and jumps, returns.
-name:br_misp_retired type:exclusive default:0x1
-	0x1 extra: remove_jcc REMOVE_JCC counts the number of mispredicted branch instructions retired but
removes taken and not taken conditional branches (JCC).  This event counts the number of retired branch
instructions that were mispredicted by the processor, categorized by type. A branch misprediction
occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the
misprediction is discovered, all the instructions executed in the wrong (speculative) path must be
discarded, and the processor must start fetching from the correct path.
-	0x4 extra: remove_ind_call REMOVE_IND_CALL Counts the number of mispredicted branch instructions
retired but removes near indirect CALL.  This event counts the number of retired branch instructions that
were mispredicted by the processor, categorized by type. A branch misprediction occurs when the
processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is
discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the
processor must start fetching from the correct path.
-	0x8 extra: remove_ret REMOVE_RET Counts the number of mispredicted branch instructions retired but
removes near RET.  This event counts the number of retired branch instructions that were mispredicted by
the processor, categorized by type. A branch misprediction occurs when the processor predicts that the
branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the
instructions executed in the wrong (speculative) path must be discarded, and the processor must start
fetching from the correct path.
-	0x10 extra: remove_ind_jmp REMOVE_IND_JMP counts the number of mispredicted branch instructions
retired but removes near indirect JMP.  This event counts the number of retired branch instructions that
were mispredicted by the processor, categorized by type. A branch misprediction occurs when the
processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is
discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the
processor must start fetching from the correct path.
-	0x80 extra: remove_not_taken_jcc REMOVE_NOT_TAKEN_JCC counts the number of mispredicted branch
instructions retired but removes taken conditional branches (JCC).  This event counts the number of
retired branch instructions that were mispredicted by the processor, categorized by type. A branch
misprediction occurs when the processor predicts that the branch would be taken, but it is not, or
vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong
(speculative) path must be discarded, and the processor must start fetching from the correct path.
+name:rehabq type:exclusive default:0x1
+	0x1 extra: ld_block_st_forward This event counts the number of retired loads that were prohibited from
receiving forwarded data from the store because of address mismatch.
+	0x1 extra:pebs ld_block_st_forward_pebs This event counts the number of retired loads that were
prohibited from receiving forwarded data from the store because of address mismatch.
+	0x2 extra: ld_block_std_notready This event counts the cases where a forward was technically
possible, but did not occur because the store data was not available at the right time
+	0x4 extra: st_splits This event counts the number of retire stores that experienced cache line boundary splits
+	0x8 extra: ld_splits This event counts the number of retire loads that experienced cache line boundary splits
+	0x8 extra:pebs ld_splits_pebs This event counts the number of retire loads that experienced cache line
boundary splits
+	0x10 extra: lock This event counts the number of retired memory operations with lock semantics. These
are either implicit locked instructions such as the XCHG instruction or instructions with an explicit
LOCK prefix (0xF0).
+	0x20 extra: sta_full This event counts the number of retired stores that are delayed because there is not
a store address buffer available.
+	0x40 extra: any_ld This event counts the number of load uops reissued from Rehabq
+	0x80 extra: any_st This event counts the number of store uops reissued from Rehabq
+name:mem_uops_retired type:exclusive default:0x1
+	0x1 extra: l1_miss_loads This event counts the number of load ops retired that miss in L1 Data cache. Note
that prefetch misses will not be counted.
+	0x2 extra: l2_hit_loads This event counts the number of load ops retired that hit in the L2
+	0x2 extra:pebs l2_hit_loads_pebs This event counts the number of load ops retired that hit in the L2
+	0x4 extra: l2_miss_loads This event counts the number of load ops retired that miss in the L2
+	0x4 extra:pebs l2_miss_loads_pebs This event counts the number of load ops retired that miss in the L2
+	0x8 extra: dtlb_miss_loads This event counts the number of load ops retired that had DTLB miss.
+	0x8 extra:pebs dtlb_miss_loads_pebs This event counts the number of load ops retired that had DTLB miss.
+	0x10 extra: utlb_miss This event counts the number of load ops retired that had UTLB miss.
+	0x20 extra: hitm This event counts the number of load ops retired that got data from the other core or from
the other module.
+	0x20 extra:pebs hitm_pebs This event counts the number of load ops retired that got data from the other
core or from the other module.
+	0x40 extra: all_loads This event counts the number of load ops retired
+	0x80 extra: all_stores This event counts the number of store ops retired
+name:page_walks type:exclusive default:0x1
+	0x1 extra:edge d_side_walks This event counts when a data (D) page walk is completed or started.  Since a
page walk implies a TLB miss, the number of TLB misses can be counted by counting the number of pagewalks.
+	0x1 extra: d_side_cycles This event counts every cycle when a D-side (walks due to a load) page walk is in
progress. Page walk duration divided by number of page walks is the average duration of page-walks.
+	0x2 extra:edge i_side_walks This event counts when an instruction (I) page walk is completed or
started.  Since a page walk implies a TLB miss, the number of TLB misses can be counted by counting the number
of pagewalks.
+	0x2 extra: i_side_cycles This event counts every cycle when a I-side (walks due to an instruction fetch)
page walk is in progress. Page walk duration divided by number of page walks is the average duration of page-walks.
+	0x3 extra:edge walks This event counts when a data (D) page walk or an instruction (I) page walk is
completed or started.  Since a page walk implies a TLB miss, the number of TLB misses can be counted by
counting the number of pagewalks.
+	0x3 extra: cycles This event counts every cycle when a data (D) page walk or instruction (I) page walk is in
progress.  Since a pagewalk implies a TLB miss, the approximate cost of a TLB miss can be determined from
this event.
+name:icache type:exclusive default:0x3
+	0x3 extra: accesses This event counts all instruction fetches, including uncacheable fetches.
+	0x1 extra: hit This event counts all instruction fetches from the instruction cache.
+	0x2 extra: misses This event counts all instruction fetches that miss the Instruction cache or produce
memory requests. This includes uncacheable fetches. An instruction fetch miss is counted only once and
not once for every cycle it is outstanding.
+name:uops_retired type:exclusive default:0x10
+	0x10 extra: all This event counts the number of micro-ops retired. The processor decodes complex macro
instructions into a sequence of simpler micro-ops. Most instructions are composed of one or two
micro-ops. Some instructions are decoded into longer sequences such as repeat instructions, floating
point transcendental instructions, and assists. In some cases micro-op sequences are fused or whole
instructions are fused into one micro-op. See other UOPS_RETIRED events for differentiating retired
fused and non-fused micro-ops.
+	0x1 extra: ms This event counts the number of micro-ops retired that were supplied from MSROM.
+name:machine_clears type:exclusive default:0x8
+	0x8 extra: all Machine clears happen when something happens in the machine that causes the hardware to
need to take special care to get the right answer. When such a condition is signaled on an instruction, the
front end of the machine is notified that it must restart, so no more instructions will be decoded from the
current path.  All instructions "older" than this one will be allowed to finish.  This instruction and all
"younger" instructions must be cleared, since they must not be allowed to complete.  Essentially, the
hardware waits until the problematic instruction is the oldest instruction in the machine.  This means
all older instructions are retired, and all pending stores (from older instructions) are completed. 
Then the new path of instructions from the front end are allowed to start into the machine.  There are
many conditions that might cause a machine clear (including the receipt of an interrupt, or a trap or a
fault).  All those conditions (including but not limited to MACHINE_CLEARS.MEMORY_ORDERING,
MACHINE_CLEARS.SMC, and MACHINE_CLEARS.FP_ASSIST) are
captured in the ANY event. In addition, some conditions can be specifically counted (i.e. SMC,
MEMORY_ORDERING, FP_ASSIST).  However, the sum of SMC, MEMORY_ORDERING, and FP_ASSIST machine clears
will not necessarily equal the number of ANY.
+	0x1 extra: smc This event counts the number of times that a program writes to a code section.
Self-modifying code causes a severe penalty in all Intel® architecture processors.
+	0x2 extra: memory_ordering This event counts the number of times that pipeline was cleared due to memory
ordering issues.
+	0x4 extra: fp_assist This event counts the number of times that pipeline stalled due to FP operations
needing assists.
+name:br_inst_retired type:exclusive default:0x7e
+	0x7e extra: jcc JCC counts the number of conditional branch (JCC) instructions retired. Branch
prediction predicts the branch target and enables the processor to begin executing instructions long
before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for
prediction. This unit predicts the target address not only based on the EIP of the branch but also based on
the execution path through which execution reached this EIP. The BPU can efficiently predict the
following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0x7e extra:pebs jcc_pebs JCC counts the number of conditional branch (JCC) instructions retired.
Branch prediction predicts the branch target and enables the processor to begin executing instructions
long before the branch true execution path is known. All branches utilize the branch prediction unit
(BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also
based on the execution path through which execution reached this EIP. The BPU can efficiently predict the
following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xfe extra: taken_jcc TAKEN_JCC counts the number of taken conditional branch (JCC) instructions
retired. Branch prediction predicts the branch target and enables the processor to begin executing
instructions long before the branch true execution path is known. All branches utilize the branch
prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of
the branch but also based on the execution path through which execution reached this EIP. The BPU can
efficiently predict the following branch types: conditional branches, direct calls and jumps,
indirect calls and jumps, returns.
+	0xfe extra:pebs taken_jcc_pebs TAKEN_JCC counts the number of taken conditional branch (JCC)
instructions retired. Branch prediction predicts the branch target and enables the processor to begin
executing instructions long before the branch true execution path is known. All branches utilize the
branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the
EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can
efficiently predict the following branch types: conditional branches, direct calls and jumps,
indirect calls and jumps, returns.
+	0xf9 extra: call CALL counts the number of near CALL branch instructions retired.  Branch prediction
predicts the branch target and enables the processor to begin executing instructions long before the
branch true execution path is known. All branches utilize the branch prediction unit (BPU) for
prediction. This unit predicts the target address not only based on the EIP of the branch but also based on
the execution path through which execution reached this EIP. The BPU can efficiently predict the
following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xf9 extra:pebs call_pebs CALL counts the number of near CALL branch instructions retired.  Branch
prediction predicts the branch target and enables the processor to begin executing instructions long
before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for
prediction. This unit predicts the target address not only based on the EIP of the branch but also based on
the execution path through which execution reached this EIP. The BPU can efficiently predict the
following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xfd extra: rel_call REL_CALL counts the number of near relative CALL branch instructions retired. 
Branch prediction predicts the branch target and enables the processor to begin executing instructions
long before the branch true execution path is known. All branches utilize the branch prediction unit
(BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also
based on the execution path through which execution reached this EIP. The BPU can efficiently predict the
following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xfd extra:pebs rel_call_pebs REL_CALL counts the number of near relative CALL branch instructions
retired.  Branch prediction predicts the branch target and enables the processor to begin executing
instructions long before the branch true execution path is known. All branches utilize the branch
prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of
the branch but also based on the execution path through which execution reached this EIP. The BPU can
efficiently predict the following branch types: conditional branches, direct calls and jumps,
indirect calls and jumps, returns.
+	0xfb extra: ind_call IND_CALL counts the number of near indirect CALL branch instructions retired. 
Branch prediction predicts the branch target and enables the processor to begin executing instructions
long before the branch true execution path is known. All branches utilize the branch prediction unit
(BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also
based on the execution path through which execution reached this EIP. The BPU can efficiently predict the
following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xfb extra:pebs ind_call_pebs IND_CALL counts the number of near indirect CALL branch instructions
retired.  Branch prediction predicts the branch target and enables the processor to begin executing
instructions long before the branch true execution path is known. All branches utilize the branch
prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of
the branch but also based on the execution path through which execution reached this EIP. The BPU can
efficiently predict the following branch types: conditional branches, direct calls and jumps,
indirect calls and jumps, returns.
+	0xf7 extra: return RETURN counts the number of near RET branch instructions retired.  Branch prediction
predicts the branch target and enables the processor to begin executing instructions long before the
branch true execution path is known. All branches utilize the branch prediction unit (BPU) for
prediction. This unit predicts the target address not only based on the EIP of the branch but also based on
the execution path through which execution reached this EIP. The BPU can efficiently predict the
following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xf7 extra:pebs return_pebs RETURN counts the number of near RET branch instructions retired.  Branch
prediction predicts the branch target and enables the processor to begin executing instructions long
before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for
prediction. This unit predicts the target address not only based on the EIP of the branch but also based on
the execution path through which execution reached this EIP. The BPU can efficiently predict the
following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xeb extra: non_return_ind NON_RETURN_IND counts the number of near indirect JMP and near indirect
CALL branch instructions retired.  Branch prediction predicts the branch target and enables the
processor to begin executing instructions long before the branch true execution path is known. All
branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address
not only based on the EIP of the branch but also based on the execution path through which execution reached
this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct
calls and jumps, indirect calls and jumps, returns.
+	0xeb extra:pebs non_return_ind_pebs NON_RETURN_IND counts the number of near indirect JMP and near
indirect CALL branch instructions retired.  Branch prediction predicts the branch target and enables
the processor to begin executing instructions long before the branch true execution path is known. All
branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address
not only based on the EIP of the branch but also based on the execution path through which execution reached
this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct
calls and jumps, indirect calls and jumps, returns.
+	0xbf extra: far_branch FAR counts the number of far branch instructions retired.  Branch prediction
predicts the branch target and enables the processor to begin executing instructions long before the
branch true execution path is known. All branches utilize the branch prediction unit (BPU) for
prediction. This unit predicts the target address not only based on the EIP of the branch but also based on
the execution path through which execution reached this EIP. The BPU can efficiently predict the
following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xbf extra:pebs far_branch_pebs FAR counts the number of far branch instructions retired.  Branch
prediction predicts the branch target and enables the processor to begin executing instructions long
before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for
prediction. This unit predicts the target address not only based on the EIP of the branch but also based on
the execution path through which execution reached this EIP. The BPU can efficiently predict the
following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+name:br_misp_retired type:exclusive default:0x7e
+	0x7e extra: jcc JCC counts the number of mispredicted conditional branches (JCC) instructions
retired.  This event counts the number of retired branch instructions that were mispredicted by the
processor, categorized by type. A branch misprediction occurs when the processor predicts that the
branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the
instructions executed in the wrong (speculative) path must be discarded, and the processor must start
fetching from the correct path.
+	0x7e extra:pebs jcc_pebs JCC counts the number of mispredicted conditional branches (JCC)
instructions retired.  This event counts the number of retired branch instructions that were
mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor
predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is
discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the
processor must start fetching from the correct path.
+	0xfe extra: taken_jcc TAKEN_JCC counts the number of mispredicted taken conditional branch (JCC)
instructions retired.  This event counts the number of retired branch instructions that were
mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor
predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is
discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the
processor must start fetching from the correct path.
+	0xfe extra:pebs taken_jcc_pebs TAKEN_JCC counts the number of mispredicted taken conditional branch
(JCC) instructions retired.  This event counts the number of retired branch instructions that were
mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor
predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is
discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the
processor must start fetching from the correct path.
+	0xfb extra: ind_call IND_CALL counts the number of mispredicted near indirect CALL branch
instructions retired.  This event counts the number of retired branch instructions that were
mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor
predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is
discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the
processor must start fetching from the correct path.
+	0xfb extra:pebs ind_call_pebs IND_CALL counts the number of mispredicted near indirect CALL branch
instructions retired.  This event counts the number of retired branch instructions that were
mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor
predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is
discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the
processor must start fetching from the correct path.
+	0xf7 extra: return RETURN counts the number of mispredicted near RET branch instructions retired.  This
event counts the number of retired branch instructions that were mispredicted by the processor,
categorized by type. A branch misprediction occurs when the processor predicts that the branch would be
taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed
in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
+	0xf7 extra:pebs return_pebs RETURN counts the number of mispredicted near RET branch instructions
retired.  This event counts the number of retired branch instructions that were mispredicted by the
processor, categorized by type. A branch misprediction occurs when the processor predicts that the
branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the
instructions executed in the wrong (speculative) path must be discarded, and the processor must start
fetching from the correct path.
+	0xeb extra: non_return_ind NON_RETURN_IND counts the number of mispredicted near indirect JMP and
near indirect CALL branch instructions retired.  This event counts the number of retired branch
instructions that were mispredicted by the processor, categorized by type. A branch misprediction
occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the
misprediction is discovered, all the instructions executed in the wrong (speculative) path must be
discarded, and the processor must start fetching from the correct path.
+	0xeb extra:pebs non_return_ind_pebs NON_RETURN_IND counts the number of mispredicted near indirect
JMP and near indirect CALL branch instructions retired.  This event counts the number of retired branch
instructions that were mispredicted by the processor, categorized by type. A branch misprediction
occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the
misprediction is discovered, all the instructions executed in the wrong (speculative) path must be
discarded, and the processor must start fetching from the correct path.
 name:no_alloc_cycles type:exclusive default:0x3f
-	0x3f extra:inv all Counts the number of cycles that uops are allocated (inverse of NO_ALLOC_CYCLES.ALL)
-	0x2 extra: sd_buffer_full Counts the number of cycles when no uops are allocated and the store data
buffer is full.
-	0x4 extra: mispredicts Counts the number of cycles when no uops are allocated and the alloc pipe is
stalled waiting for a mispredicted jump to retire.  After the misprediction is detected, the front end
will start immediately but the allocate pipe stalls until the mispredicted
-	0x8 extra: scoreboard Counts the number of cycles when no uops are allocated and a microcode IQ-based
scoreboard stall is active. This includes stalls due to both the retirement scoreboard (at-ret) and
micro-Jcc execution scoreboard (at-jeu).  Does not count cycles when the MS
-	0x10 extra: iq_empty Counts the number of cycles when no uops are allocated and the IQ is empty.  Will
assert immediately after a mispredict and partially overlap with MISPREDICTS sub event.
-name:rs_full_stall type:exclusive default:0x2
-	0x2 extra: iec_port0 Counts the number of cycles the Alloc pipeline is stalled because IEC RS for port 0 is full.
-	0x4 extra: iec_port1 Counts the number of cycles the Alloc pipeline is stalled because IEC RS for port 1 is full.
-	0x8 extra: fpc_port0 Counts the number of cycles the Alloc pipeline is stalled because FPC RS for port 0 is full.
-	0x10 extra: fpc_port1 Counts the number of cycles the Alloc pipeline is stalled because FPC RS for port 1
is full.
-name:rs_dispatch_stall type:exclusive default:0x1
-	0x1 extra: iec0_rs *COUNTER BROKEN - NO FIX* Counts cycles when no uops were disptached from port 0 of IEC
RS while the RS had valid ops left to dispatch
-	0x2 extra: iec1_rs *COUNTER BROKEN - NO FIX* Counts cycles when no uops were disptached from port 1 of IEC
RS while the RS had valid ops left to dispatch
-	0x4 extra: fpc0_rs Counts cycles when no uops were disptached from port 0 of FPC RS while the RS had valid
ops left to dispatch
-	0x8 extra: fpc1_rs Counts cycles when no uops were disptached from port 1 of FPC RS while the RS had valid
ops left to dispatch
-	0x10 extra: mec_rs Counts cycles when no uops were dispatched from the MEC RS or rehab queue while valid
ops were left to dispatch
-name:baclears type:exclusive default:0x2
-	0x2 extra: indirect Counts the number indirect branch baclears
-	0x4 extra: uncond Counts the number unconditional branch baclears
-	0x1e extra: no_corner_case sum of submasks [4:1].  Does not count special case baclears due to things
like parity errors, bogus branches, and pd$ issues.
-name:decode_restriction type:exclusive default:0x1
-	0x1 extra: pdcache_wrong Counts the number of times a decode restriction reduced the decode throughput
due to wrong instruction length prediction
-	0x2 extra: all_3cycle_resteers Counts the number of times a decode restriction reduced the decode
throughput because of all 3 cycle resteer conditions.  Mainly PDCACHE_WRONG and MS_ENTRY cases.
+	0x3f extra: all The NO_ALLOC_CYCLES.ALL event counts the number of cycles when the front-end does not
provide any instructions to be allocated for any reason. This event indicates the cycles where an
allocation stalls occurs, and no UOPS are allocated in that cycle.
+	0x1 extra: rob_full Counts the number of cycles when no uops are allocated and the ROB is full (less than 2
entries available)
+	0x20 extra: rat_stall Counts the number of cycles when no uops are allocated and a RATstall is asserted.
+	0x50 extra: not_delivered The NO_ALLOC_CYCLES.NOT_DELIVERED event is used to measure front-end
inefficiencies, i.e. when front-end of the machine is not delivering micro-ops to the back-end and the
back-end is not stalled. This event can be used to identify if the machine is truly front-end bound.  When
this event occurs, it is an indication that the front-end of the machine is operating at less than its
theoretical peak performance.  Background: We can think of the processor pipeline as being divided into 2
broader parts: Front-end and Back-end. Front-end is responsible for fetching the instruction,
decoding into micro-ops (uops) in machine understandable format and putting them into a micro-op queue
to be consumed by back end. The back-end then takes these micro-ops, allocates the required resources.
When all resources are ready, micro-ops are executed. If the back-end is not ready to accept micro-ops
from the front-end, then we do not want to count these as front-end bottlenecks.  However, whenever we
have bottlenecks in the back-end, we will have allocation unit stalls and
eventually forcing the front-end to wait until the back-end is ready to receive more UOPS. This event
counts the cycles only when back-end is requesting more uops and front-end is not able to provide them.
Some examples of conditions that cause front-end inefficiencies are: Icache misses, ITLB misses, and
decoder restrictions that limit the front-end bandwidth.
+name:rs_full_stall type:exclusive default:0x1f
+	0x1f extra: all Counts the number of cycles the Alloc pipeline is stalled when any one of the RSs (IEC, FPC
and MEC) is full. This event is a superset of all the individual RS stall event counts.
+	0x1 extra: mec Counts the number of cycles and allocation pipeline is stalled and is waiting for a free MEC
reservation station entry.  The cycles should be appropriately counted in case of the cracked ops e.g. In
case of a cracked load-op, the load portion is sent to M
+name:baclears type:exclusive default:0x1
+	0x1 extra: all The BACLEARS event counts the number of times the front end is resteered, mainly when the
Branch Prediction Unit cannot provide a correct prediction and this is corrected by the Branch Address
Calculator at the front end.  The BACLEARS.ANY event counts the number of baclears for any type of branch.
+	0x8 extra: return The BACLEARS event counts the number of times the front end is resteered, mainly when
the Branch Prediction Unit cannot provide a correct prediction and this is corrected by the Branch
Address Calculator at the front end.  The BACLEARS.RETURN event counts the number of RETURN baclears.
+	0x10 extra: cond The BACLEARS event counts the number of times the front end is resteered, mainly when the
Branch Prediction Unit cannot provide a correct prediction and this is corrected by the Branch Address
Calculator at the front end.  The BACLEARS.COND event counts the number of JCC (Jump on Condtional Code) baclears.
--

-- 
1.9.3

Maynard Johnson | 9 Sep 18:08 2014

oprofile 1.0.0 to be GA'ed soon

Assuming I do not receive any negative feedback from anyone testing the RC2, I will GA the 1.0.0 release
later this week -- probably Thursday.

-Maynard

Maynard Johnson | 29 Aug 23:29 2014

Announcement: Release Candidate 2 for OProfile 1.0.0

We are pleased to announce OProfile 1.0.0 Release Candidate 2.  You can download this release at:
	http://sourceforge.net/projects/oprofile/files/oprofile/oprofile-1.0.0-rc2/

Changes from RC1:
  * Back out recent change to exclude hypervisor samples and counts
  * Fix behavior and documentation for '--threshold' option
  * Fix Java profiling regression bug from Aug 13 Coverity fixes
  * Fix cryptic objdump error message from opannotate for /proc/kallsyms
  * Remove hard-coded timeout for JIT dump conversion

Please download and test this release candidate, and send your feedback by replying to this message. 
Please include your hardware platform and Linux distribution information in your reply.

Thanks.
-Maynard Johnson

-----------------------------------------------------------------

Release Notes
===============
OProfile 1.0.0 has been released. A major change in this release
is the removal of the legacy opcontrol-based profiler. The legacy
profiling tool has been deprecated since release 0.9.8 when operf
was first introduced. The following components and processor types
that were dependent on opcontrol have also been removed:

   - GUI component (i.e., oprof_start)
   - IBS events for AMD processors
   - All Alpha processors, except for EV67 (which *is* supported by operf/ocount)
   - Architecture avr32
   - Architecture ia64
   - Processor model IBM Cell
   - Processor model P.A. Semi PA6T
   - RTC (real time clock mode)

OProfile users still running on any of these affected systems or
needing any of the removed components listed above should not upgrade
to OProfile release 1.0. Alternatively, you can obtain all of the new
features, enhancements, and bug fixes described below and still have
access to opcontrol by doing the following:

	git clone git://git.code.sf.net/p/oprofile/oprofile oprofile
	cd oprofile
	git checkout PRE_RELEASE_1_0

and then build/install as usual.
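
For reference, "build/install as usual" for a git checkout is roughly the
following; this is only a sketch, and the README in the source tree is the
authoritative procedure:

	./autogen.sh
	./configure
	make
	sudo make install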

More information about OProfile can be seen at
    http://oprofile.sf.net

Incompatibilities with previous release
---------------------------------------

- Sample data collected with previous releases of OProfile are incompatible
  with release 1.0.
- ophelp schema: Major version changed for removal of unit mask 'extra'
  attribute and addition of unit mask 'name'.

New features
------------

- Enhance ocount to support millisecond time intervals
- Obtain kernel symbols from /proc/kallsyms if no vmlinux file specified
  (a usage sketch follows the processor-support list below)

- New/updated Processor Support
    * (New) Freescale e6500 
    * (New) Freescale e500mc
    * (New) Intel Silvermont
    * (New) ARM ARMv7 Krait
    * (New) ARM ARMv8 (AArch64)
    * (New) Intel Broadwell
    * (New) ARM Cortex A57
    * (New) ARM Cortex A53
    * Added little endian support for IBM POWER8
    * Update events for IBM POWER8
    * Added edge-detect events for IBM POWER7
    * Update events for Intel Haswell
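
A usage sketch for the /proc/kallsyms fallback noted above (the profiled
command is just an example):

	# No --vmlinux option given: if /proc/kallsyms is readable, kernel
	# samples are symbolized from it instead of being reported without names.
	operf ls
	opreport --symbols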

Bug fixes
---------

Filed bug reports:
-------------------------------------------------------------------------
|  BUG ID   |  Summary 
|-----------|------------------------------------------------------------
|   236     | opreport schema: Fix count field maxOccurs (changed to
|           | 'unbounded')
|   245     | Fix compile error on ppc/uClibc platform: 'AT_BASE_PLATFORM'
|           | undeclared
|   248     | Duplicate event specs passed to ocount show up twice in
|           | output
|   252     | Fix operf/ocount default unit mask selection
|   253     | ocount: print the unit mask, kernel and user modes if
|           | specified for the event
|   254     | ophelp schema is not included in installed files
|   255     | Remove unused 'extra' attribute from ophelp schema
|   256     | opreport from 'operf --callgraph' profile shows false
|           | recursive calls
|   257     | Fix handling of default named unit masks longer than 11 chars
|   259     | Print unit mask name where applicable in ophelp XML output
|   260     | Fix profiling of multi-threaded apps when using "--pid"
|           | option
|   262     | Fix operf/opreport kernel throttling detection
|   263     | Fix sample attribution problem when using multiple events
|   266     | exclude/include files option doesn't work for opannotate -a
-------------------------------------------------------------------------

Other bug fixes and improvements without a filed report (e.g., posted to the list):
-----------------------------------------------------------------------------------
   - Fix behavior and documentation for '--threshold' option
   - Remove hard-coded timeout for JIT dump conversion
   - Update Alpha EV67 CPU support and remove all other Alpha CPU support
   - operf main process improperly killing conversion process
   - Fix up S390 support to work with operf/ocount
   - Link ocount with librt for clock_gettime only when needed
   - Fix 'Invalid argument' running 'opcontrol --start --callgraph=<n>' in
     Timer mode
   - Allow root to remove old jitdump files from /tmp/.oprofile/jitdump
   - Remove opreport warnings for /no-vmlinux, [vdso], [hypervisor_bucket]
     not found
   - Fix event codes for marked architected events (IBM ppc64)
   - Make operf/ocount detect invalid timer mode from opcontrol
   - Reduce overhead of operf waiting for profiled app to end
   - Fix "Unable to open cpu_type file for reading" for IBM POWER7+
   - Allow all native events for IBM POWER8 in POWER7 compat mode
   - Fix spurious "backtraces skipped due to no file mapping" log entries
   - Fix the units for the reported CPU frequency

Known problems and limitations
------------------------------
- When using operf to profile multiple events, the absolute number of
  events recorded may be substantially fewer than expected. This can be
  due to a known bug in the Linux kernel's Performance Events Subsystem that
  was fixed sometime between Linux kernel versions 3.1 and 3.5.

Maynard Johnson | 29 Aug 15:24 2014

[PATCH] Back out recent change to exclude hypervisor samples and counts

Back out recent change to exclude hypervisor samples and counts

Recent commits, 3f93a3b3 and 9c662bfa, made changes to exclude the
collection of hypervisor samples (for operf) and counts (for ocount).
I learned later that S390 and Alpha architectures do not support mode
exclusion, and so fixes were required to avoid excluding hypervisor
on those architectures. Now it seems that under certain conditions,
the ARM architecture also cannot do mode exclusion, and this exclusion
of hypervisor was causing operf and ocount to fail.

The original changes (in commits 3f93a3b3 and 9c662bfa) were not made
due to a bug report, but because I had noted oprofile had always been
silently collecting hypervisor samples/counts, without ever having a
means of identifying them as such. Additionally, the event specification
(for passing events to operf and ocount) has no support for explicit
inclusion/exclusion of hypervisor data. I thought it would be better to
simply always exclude hypervisor until such time that we expanded the
event specification and operf/ocount interfaces to properly support
hypervisor. But in retrospect, that was a bad decision, causing too
much breakage on various architectures. This patch backs out the
exclusion of hypervisor, as well as the S390 and Alpha architecture-
specific conditional compilation involving same.

Signed-off-by: Maynard Johnson <maynardj <at> us.ibm.com>
---
 libpe_utils/op_pe_utils.cpp      |   29 ++---------------------------
 libperf_events/operf_counter.cpp |    4 +---
 pe_counting/ocount_counter.cpp   |    3 +--
 3 files changed, 4 insertions(+), 32 deletions(-)

diff --git a/libpe_utils/op_pe_utils.cpp b/libpe_utils/op_pe_utils.cpp
index 7c9691b..8c69894 100644
--- a/libpe_utils/op_pe_utils.cpp
+++ b/libpe_utils/op_pe_utils.cpp
@@ -880,24 +880,6 @@ void op_pe_utils::op_process_events_list(set<string> & passed_evts,
 		event.evt_um = 0UL;
 		event.no_kernel = 0;
 		event.no_user = 0;
-		/* Explicitly exclude hypervisor samples since we currently do not have any
-		 * interface support for such.  If we did not do this, we could see situations
-		 * on hypervisor-controlled systems like the following:
-		 * 	$ ocount -e PM_RUN_CYC:0:0:0 /bin/true
-		 *
-		 * 	Events were actively counted for 398213 nanoseconds.
-		 * 	Event counts (actual) for /bin/true:
-		 *	        Event                         Count                    % time counted
-		 *	        PM_RUN_CYC_GRP1:0x0:0:0       123,260                  100.00
-		 *
-		 * Note that the event spec explicitly excludes both kernel and user events, yet
-		 * the output shows a non-zero count. The user could "assume" those counts are from
-		 * hypervisor, but that's ugly.
-		 *
-		 * FIXME: Add full hypervisor support by adding another bit in the event specification
-		 * and documenting it in the man pages and user guide.
-		 */
-		event.no_hv = 1;
 		event.throttled = false;
 		event.mode_specified = false;
 		event.umask_specified = false;
@@ -960,10 +942,8 @@ void op_pe_utils::op_process_events_list(set<string> & passed_evts,
 #endif

 #ifdef __alpha__
-		// Alpha arch does not support any mode exclusion.  We'll just silently enable
-		// hypervisor, but if either user or kernel mode are excluded by the user, we'll
-		// exit with an error message.
-		event.no_hv = 0;
+		// Alpha arch does not support any mode exclusion, so if either user or kernel
+		// mode are excluded by the user, we'll exit with an error message.
 		if (event.no_kernel || event.no_user) {
 			cerr << "Mode exclusion is not supported on Alpha." << endl
 			     << "Re-run the command and simply pass the event name " << endl
@@ -1022,11 +1002,6 @@ void op_pe_utils::op_get_default_event(bool do_callgraph)
 		dft_evt.count = descr.count;
 	}
 	dft_evt.evt_um = descr.um;
-#ifndef __alpha__
-	// See comment in op_process_events_list for why we set no_hv to 1.
-	// Alpha arch does not support any mode exclusion.
-	dft_evt.no_hv = 1;
-#endif
 	strncpy(dft_evt.name, descr.name, OP_MAX_EVT_NAME_LEN - 1);
 	_get_event_code(&dft_evt, cpu_type);
 	events.push_back(dft_evt);
diff --git a/libperf_events/operf_counter.cpp b/libperf_events/operf_counter.cpp
index 32d10a6..42c0cd1 100644
--- a/libperf_events/operf_counter.cpp
+++ b/libperf_events/operf_counter.cpp
@@ -220,10 +220,8 @@ operf_counter::operf_counter(operf_event_t & evt,  bool enable_on_exec, bool do_

 #ifdef __s390__
 	attr.type = PERF_TYPE_HARDWARE;
-	attr.exclude_hv = 0;
 #else
 	attr.type = PERF_TYPE_RAW;
-	attr.exclude_hv = evt.no_hv;
 #endif
 #if ((defined(__i386__) || defined(__x86_64__)) && (HAVE_PERF_PRECISE_IP))
 	if (evt.evt_code & EXTRA_PEBS) {
@@ -231,7 +229,7 @@ operf_counter::operf_counter(operf_event_t & evt,  bool enable_on_exec, bool do_
 		evt.evt_code ^= EXTRA_PEBS;
 	}
 #endif
-	
+	attr.exclude_hv = evt.no_hv;
 	attr.config = evt.evt_code;
 	attr.sample_period = evt.count;
 	attr.inherit = inherit ? 1 : 0;
diff --git a/pe_counting/ocount_counter.cpp b/pe_counting/ocount_counter.cpp
index 2dd6210..1573ed4 100644
--- a/pe_counting/ocount_counter.cpp
+++ b/pe_counting/ocount_counter.cpp
@@ -71,13 +71,12 @@ ocount_counter::ocount_counter(operf_event_t & evt,  bool enable_on_exec,
 	attr.config = evt.evt_code;
 #ifdef __s390__
 	attr.type = PERF_TYPE_HARDWARE;
-	attr.exclude_hv = 0;
 	if (evt.no_kernel && !evt.no_user)
 		attr.config |= 32;
 #else
 	attr.type = PERF_TYPE_RAW;
-	attr.exclude_hv = evt.no_hv;
 #endif
+	attr.exclude_hv = evt.no_hv;
 	attr.inherit = inherit ? 1 : 0;
 	attr.enable_on_exec = enable_on_exec ? 1 : 0;
 	attr.disabled  = attr.enable_on_exec;
--

-- 
1.7.1
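
To make the mode-exclusion discussion above concrete, here is a minimal
standalone sketch (not oprofile code) that opens a counter with the
exclude_hv bit set via perf_event_open(2). On architectures that cannot do
mode exclusion, a request like this can fail outright, which is the kind of
breakage this patch backs out:

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	struct perf_event_attr attr;
	uint64_t count;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.disabled = 1;
	attr.exclude_hv = 1;	/* the bit the reverted commits forced on */

	/* count this process on any CPU; no group leader, no flags */
	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open (mode exclusion may be unsupported)");
		return 1;
	}
	ioctl(fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
	for (volatile long i = 0; i < 1000000; i++)
		;	/* busy work so some cycles accumulate */
	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
	if (read(fd, &count, sizeof(count)) == (ssize_t)sizeof(count))
		printf("cycles (hv excluded): %llu\n", (unsigned long long)count);
	close(fd);
	return 0;
}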

William Cohen | 28 Aug 22:39 2014

Re: Announcement: Release Candidate 1 for OProfile 1.0.0

On 08/28/2014 04:16 PM, Arnaldo Carvalho de Melo wrote:
> On Thu, Aug 28, 2014 at 03:54:58PM -0400, William Cohen wrote:
>> On 08/28/2014 02:59 PM, Will Deacon wrote:
>>> On Thu, Aug 28, 2014 at 07:24:07PM +0100, William Cohen wrote:
>>>> On 08/28/2014 02:07 PM, Will Deacon wrote:
>>>>> Thanks for giving this a whirl. Which was the A9 SoC you used? The PMU
>>>>> interrupts are often not described properly on those, so it could simply be
>>>>> that perf hasn't initialised.
>  
>> Hi Will,
>  
>> Yes, there are cases where the dtb doesn't describe the performance
>> monitoring hardware/interrupts properly, but in this case, on a compulab
>> trimslice with an nvidia tegra2 processor, "perf record ls" appears
>> to work correctly and "perf report" provides sane data.
>  
>> Below is a before-and-after of /proc/interrupts for the "perf record" run.  I
>> suspect that there is some difference in the way that operf and ocount
>> are trying to set up the events when compared to "perf record" and
>> "perf stat".
>  
>> I should use systemtap to look at the various parameters being passed
>> in to set up perf and determine where "perf" and "operf" diverge.
> 
> You can try using perf evlist to see how 'perf record' sets up the event
> attributes:
> 
> [root@zoo ~]# perf record usleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.012 MB perf.data (~525 samples) ]
> [root@zoo ~]# perf evlist -v
> cycles: sample_freq=4000, size: 96, sample_type: IP|TID|TIME|PERIOD,
> disabled: 1, inherit: 1, mmap: 1, mmap2: 1, comm: 1, comm_exec: 1, freq:
> 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1
> [root@zoo ~]#
> 
> - Arnaldo
> 

oprofile uses raw events to set up the pmu hardware.  Both the cortex a9 and cortex a15 should be using the same
basic setup.  One place the setup differs is the processor-specific code in the kernel.  Also, the kernel on
the a15 is a locally built stock 3.15.10 kernel, while the cortex a9 machine is running a fedora kernel that
may have patches and a different config.

I rolled back to the oprofile-0.9.9-2.fc20.armv7hl rpm and operf works.  So it looks like there is a
regression somewhere in the oprofile userspace code.  Doing a git bisect to see where things broke.

-Will

Maynard Johnson | 28 Aug 20:06 2014

[PATCH] Fix behavior and documentation for '--threshold' option

Fix behavior and documentation for '--threshold' option

A user reported some issues with how the opreport and opannotate
'--threshold' option was working. He was using operf to collect
a profile using multiple events (PM_CMPLU_STALL_REJECT_LHS and
PM_MRK_ST_FWD). For a particular symbol in his profile data, he
had 0% for PM_CMPLU_STALL_REJECT_LHS and 12% for PM_MRK_ST_FWD.
But when he ran 'opannotate --assembly -t 1', he was surprised to
see that the function in question was not in the output at all,
even though the ratio of samples for the PM_CMPLU_STALL_REJECT_LHS
was well above the 1% threshold.

The events are stored in alphabetical order in a C++ set. When
applying the threshold level against a symbol, the code was only
looking at the ratio of samples for the first event in the set
(PM_CMPLU_STALL_REJECT_LHS, in this case). This is not the intended
behavior (IMHO), so this patch looks at all ratios for every event
and will only filter out the sample data for a given symbol if none
of the events meets the threshold.

This issue applies to opreport as well, and the same fix works for
both opreport and 'opannotate --assembly'.

On the other hand, 'opannotate --source' applies the threshold to
a given source file (contrary to the man page). The same problem
exists there, where annotation for a given source file was not displayed
if the ratio of samples for the first event in the set did not meet the
specified threshold. This patch fixes that problem as well.

This patch also updates the man pages for opreport and opannotate, as
well as the oprofile user manual, to better explain how the threshold
option works.

Signed-off-by: Maynard Johnson <maynardj <at> us.ibm.com>
---
 doc/opannotate.1.in         |    9 +++++++--
 doc/opreport.1.in           |    3 ++-
 doc/oprofile.xml            |   13 ++++++++++---
 libpp/profile_container.cpp |   23 +++++++++++++++--------
 4 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/doc/opannotate.1.in b/doc/opannotate.1.in
index e0ae1cd..98eda51 100644
--- a/doc/opannotate.1.in
+++ b/doc/opannotate.1.in
@@ -139,8 +139,13 @@ for the binaries.
 .br
 .TP
 .BI "--threshold / -t [percentage]"
-Only output data for symbols that have more than the given percentage
-of total samples.
+For annotated assembly, only output data for symbols that have more than the given percentage
+of total samples. For profiles using multiple events, if the threshold is reached
+for any event, then all sample data for the symbol is shown.
+
+For annotated source, only output data for source files that have more than the given percentage
+of total samples. For profiles using multiple events, if the threshold is reached
+for any event, then all sample data for the source file is shown.
 .br
 .TP
 .BI "--verbose / -V [options]"
diff --git a/doc/opreport.1.in b/doc/opreport.1.in
index 39374c4..0627aa9 100644
--- a/doc/opreport.1.in
+++ b/doc/opreport.1.in
@@ -130,7 +130,8 @@ This difference is typically very small and can be ignored.
 .TP
 .BI "--threshold / -t [percentage]"
 Only output data for symbols that have more than the given percentage
-of total samples.
+of total samples. For profiles using multiple events, if the threshold is reached
+for any event, then all sample data for the symbol is shown.
 .br
 .TP
 .BI "--verbose / -V [options]"
diff --git a/doc/oprofile.xml b/doc/oprofile.xml
index 435ad36..01cd309 100644
--- a/doc/oprofile.xml
+++ b/doc/oprofile.xml
@@ -1680,7 +1680,8 @@ List per-symbol information instead of a binary image summary.
 </para></listitem></varlistentry>
 <varlistentry><term><option>--threshold / -t [percentage]</option></term><listitem><para>
 Only output data for symbols that have more than the given percentage
-of total samples.
+of total samples. For profiles using multiple events, if the threshold is reached
+for any event, then all sample data for the symbol is shown.
 </para></listitem></varlistentry>
 <varlistentry><term><option>--verbose / -V [options]</option></term><listitem><para>
 Give verbose debugging output.
@@ -1886,8 +1887,14 @@ first. If that directory does not exist, the standard session-dir of
 as the session directory.
 </para></listitem></varlistentry>
 <varlistentry><term><option>--threshold / -t [percentage]</option></term><listitem><para>
-Only output data for symbols that have more than the given percentage
-of total samples.
+For annotated assembly, only output data for symbols that have more than the given percentage
+of total samples. For profiles using multiple events, if the threshold is reached
+for any event, then all sample data for the symbol is shown.
+</para>
+<para>
+For annotated source, only output data for source files that have more than the given percentage
+of total samples. For profiles using multiple events, if the threshold is reached
+for any event, then all sample data for the source file is shown.
 </para></listitem></varlistentry>
 <varlistentry><term><option>--verbose / -V [options]</option></term><listitem><para>
 Give verbose debugging output.
diff --git a/libpp/profile_container.cpp b/libpp/profile_container.cpp
index 88de266..dca6fdd 100644
--- a/libpp/profile_container.cpp
+++ b/libpp/profile_container.cpp
@@ -183,13 +183,16 @@ profile_container::select_symbols(symbol_choice & choice) const
 		    && (image_names.name(it->image_name) != choice.image_name))
 			continue;

-		double const percent =
-			op_ratio(it->sample.counts[0], total_count[0]);
+		for (size_t j = 0; j < total_count.size(); j++) {
+			double const percent =
+					op_ratio(it->sample.counts[j], total_count[j]);

-		if (percent >= threshold) {
-			result.push_back(&*it);
+			if (percent >= threshold) {
+				result.push_back(&*it);

-			choice.hints = it->output_hint(choice.hints);
+				choice.hints = it->output_hint(choice.hints);
+				break;
+			}
 		}
 	}

@@ -226,9 +229,13 @@ profile_container::select_filename(double threshold) const
 		// FIXME: is samples_count() the right interface now ?
 		count_array_t counts = samples_count(*it);

-		double const ratio = op_ratio(counts[0], total_count[0]);
-		filename_by_samples const f(*it, ratio);
-
+		double highest_ratio = 0.0;
+		for (size_t j = 0; j < total_count.size(); j++ ) {
+			double const ratio = op_ratio(counts[j], total_count[j]);
+			if (ratio > highest_ratio)
+				highest_ratio = ratio;
+		}
+		filename_by_samples const f(*it, highest_ratio);
 		file_by_samples.push_back(f);
 	}

--

-- 
1.7.1
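
To see the corrected behavior on a multi-event profile like the one described
above, something along these lines should work (the sample counts are
illustrative and ./app is a placeholder for the workload):

	operf -e PM_CMPLU_STALL_REJECT_LHS:100000,PM_MRK_ST_FWD:100000 ./app
	# with the fix, a symbol is reported if ANY of the events meets -t
	opreport --symbols --threshold 1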

Maynard Johnson | 26 Aug 18:58 2014

[PATCH] Fix Java profiling regression bug from Aug 13 Coverity fixes

Fix Java profiling regression bug from Aug 13 Coverity fixes

One of the changes made in the Aug 13 commit to fix issues
identified by Coverity caused a regression in oprofile's JIT
support. The libopagent.so (used by libjvm[t|p]i.so) may incorrectly
return an error from the op_write_native_code function. This patch
fixes that issue.

Signed-off-by: Maynard Johnson <maynardj <at> us.ibm.com>
---
 libopagent/opagent.c |   13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/libopagent/opagent.c b/libopagent/opagent.c
index f567f9e..431dfae 100644
--- a/libopagent/opagent.c
+++ b/libopagent/opagent.c
@@ -374,17 +374,22 @@ again:
 	 */
 	if (fwrite_unlocked(&rec, sizeof(rec), 1, dumpfile) &&
 	    fwrite_unlocked(symbol_name, sz_symb_name, 1, dumpfile)) {
-		size_t sz = 0;
-		if (code)
+		size_t expected_sz, sz;
+		expected_sz = sz = 0;
+		if (code) {
 			sz = fwrite_unlocked(code, size, 1, dumpfile);
-		if (padding_count)
+			expected_sz++;
+		}
+		if (padding_count) {
 			sz += fwrite_unlocked(pad_bytes, padding_count, 1, dumpfile);
+			expected_sz++;
+		}
 		/* Always flush to ensure conversion code to elf will see
 		 * data as soon as possible */
 		fflush_unlocked(dumpfile);
 		funlockfile(dumpfile);
 		flock(dumpfd, LOCK_UN);
-		if (sz != 2) {
+		if (sz != expected_sz) {
 			printf("opagent: fwrite_unlocked failed");
 			return -1;
 		}
--

-- 
1.7.1
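
One clarifying note on the mechanics behind the fix: fwrite_unlocked(ptr,
size, nmemb, stream) returns the number of complete items written, so each
of these nmemb=1 calls contributes exactly 1 on success, as in:

	/* returns 1 on success, 0 on failure: one item of 'size' bytes */
	size_t n = fwrite_unlocked(code, size, 1, dumpfile);

The old code compared the accumulated total against a hard-coded 2, which is
wrong whenever the code or padding write is skipped; the patch counts the
writes actually issued and compares against that instead.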

C K Kashyap | 26 Aug 17:32 2014

Help with oprofile on android

Hi,

I have an Asus Nexus 7 on which I need to profile my C++ app. Is there a place where I can find step-by-step instructions for enabling/collecting oprofile data for my app? I found this link http://brownydev.blogspot.in/2012/06/android-oprofile.html from the mailing list, but it seems to be old - I don't see any "CONFIG_OPROFILE_ARMV7" in the kernel source I am using - https://android.googlesource.com/kernel/msm.git

I'd appreciate any help with this very much. For some reason, googling does not seem to yield any current links.

Regards,
Kashyap
