When using --pid, I have occasionally seen operf does not end by hitting ^C once. By hitting ^C multiple times, operf ends with error messages:
operf --events=CPU_CLK_UNHALTED:100000000:0:1:1 --pid `pgrep -f CassandraDaemon`
Kernel profiling is not possible with current system config.
Set /proc/sys/kernel/kptr_restrict to 0 to collect kernel samples.
operf: Press Ctl-c or 'kill -SIGINT 18042' to stop profiling
operf: Profiler started
^C^Cwaitpid for operf-record process failed: Interrupted system call
^Cwaitpid for operf-read process failed: Interrupted system call
Error running profiler
I am using the master branch in the git repository.
Here is what I found:
The operf-read process was waiting for a read of a sample ID from the operf-record process to return:
#0 0x00007fd90e0fa480 in __read_nocancel ()
#1 0x0000000000412999 in read (__nbytes=8, __buf=0x7ffddc82b620,
__fd=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/unistd.h:44
#2 __handle_fork_event (event=0x98e860) at operf_utils.cpp:125
#3 OP_perf_utils::op_write_event (event=event <at> entry=0x98e860,
sample_type=<optimized out>) at operf_utils.cpp:834
#4 0x0000000000417250 in operf_read::convertPerfData (
this=this <at> entry=0x648000 <operfRead>) at operf_counter.cpp:1147
#5 0x000000000040a4cb in convert_sample_data () at operf.cpp:947
#6 0x0000000000407482 in _run () at operf.cpp:625
#7 main (argc=4, argv=0x7ffddc82be48) at operf.cpp:1539
The operf-record process was waiting for a write of sample data to the operf-read process to complete. Why did the write of the sample data block? My guess is that the sample_data_pipe was full:
#0 0x00007fbe0dc9f4e0 in __write_nocancel ()
#1 0x000000000040cd0e in OP_perf_utils::op_write_output (output=6,
buf=0x7fbe0e4e5140, size=32) at operf_utils.cpp:989
#2 0x000000000040d605 in OP_perf_utils::op_get_kernel_event_data (
md=0xd3c7f0, pr=pr <at> entry=0xd07900) at operf_utils.cpp:1443
#3 0x000000000041bc12 in operf_record::recordPerfData (this=0xd07900)
#4 0x00000000004098b8 in start_profiling () at operf.cpp:402
#5 0x0000000000406305 in _run () at operf.cpp:596
#6 main (argc=4, argv=0x7ffdc6fcde58) at operf.cpp:1539
As a result, when I hit ^C, the operf main process sent SIGUSR1 to the operf-record process, in which the write returned with EINTR and simply got retried. Since the operf-record process did not end, the operf main process waited at waitpid(2) forever.
Do you think my guess makes sense? What would be a fundamental solution? Simply extending the pipe size would not be appropriate....