He who conquers the left side

musings from someone that likes to use the word musings.

Wednesday, June 17, 2009

How to use oprofile correctly in Fedora

First of all, there is no correct way to use oprofile. It's just that people using it w/o --separate=kernel are out of their minds. Very rarely do I want to profile the system as a whole, and frequently I want to look at an individual process. With --separate=kernel you can do both.

Installing kernel-debug info

First make sure your debug-info repo is enabled. You will find that in the /etc/yum.repos.d directory. Then,


$ yum -y install kernel-debuginfo
$ rpm -ql kernel-debuginfo |grep vmlinux
/usr/lib/debug/lib/modules/2.6.18-128.1.14.el5/vmlinux


Starting the profiler using opcontrol

opcontrol --separate=kernel --vmlinux=/usr/lib/debug/lib/modules/2.6.18-128.1.14.el5/vmlinux
opcontrol --start

Obviously you must have the correct location of your vmlinux. Fedora and Red Hat based distros like to keep them in /usr/lib/debug/lib/modules/.

Do stuff

In my case, I'd like to profile the command:

# time dd if=/dev/zero of=/dev/raw/raw1 bs=2048 count=100000
100000+0 records in
100000+0 records out
204800000 bytes (205 MB) copied, 10.6901 seconds, 19.2 MB/s

real 0m10.692s
user 0m0.036s
sys 0m4.884s

Finish profiling

Now you should tell the profiler to finish:

opcontrol --stop
oparchive -o /var/lib/oprofile/samples/current/

oparchive tars all the necessary executables and debug info files such that it can easily be taken offline and analyzed on another machine.

Display results

In my case,

# opreport -l image:/bin/dd
warning: /e1000e could not be found.
warning: /ehci_hcd could not be found.
warning: /ide_cd could not be found.
warning: /libata could not be found.
warning: /scsi_mod could not be found.
warning: /sd_mod could not be found.
warning: /uhci_hcd could not be found.
CPU: Core 2, speed 2992.57 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples % image name symbol name
90680 61.5134 vmlinux do_generic_mapping_read
18997 12.8867 vmlinux iowrite16
12418 8.4238 vmlinux pci_iomap
2983 2.0235 libata (no symbols)
1633 1.1078 vmlinux gen_pool_add
1442 0.9782 vmlinux __pud_alloc
1218 0.8262 scsi_mod (no symbols)
974 0.6607 vmlinux __blockdev_direct_IO
727 0.4932 vmlinux .text.do_gettimeoffset_tsc
547 0.3711 vmlinux schedule
423 0.2869 vmlinux __link_path_walk
343 0.2327 vmlinux tcp_rcv_established
335 0.2272 vmlinux do_gettimeofday
320 0.2171 libc-2.5.so __read_nocancel

This list of symbols and events goes on for miles, so I'm not going to list the entire thing.

Each processor has their own flags to track things like L1/L2 cache misses, TLB misses, and other neat stuff. For extra credit, put some of your favorites in the comments section. Make sure to put your cpu model in with whatever flags you're including. I may update this with some more flags, or post about it later.

For more info, go to http://oprofile.sourceforge.net/doc/index.html