The CERN openlab Platform Competence Centre has just published a report on the overhead of profiling with PMU hardware counters. In the report, Andrzej Nowak and Georgios Bitzes, funded by Intel, explain how run-time profiling of executable binaries can offer valuable insight into a program's performance characteristics and behaviour. Some methods, such as instrumentation, are invasive and modify the profiled binary. This can significantly impact performance, to the point that an instrumented binary runs many times slower than the original. The Performance Monitoring Unit (PMU) found in many modern processors makes low-overhead profiling possible through a wide range of performance events. The authors investigate and quantify this overhead for a variety of tests and configurations, using the Linux kernel's “perf” tool. Results are included for four main usage modes of the PMU: counting, sampling, PEBS events, and Last Branch Record (LBR).
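To give a rough idea of what quantifying this overhead can look like, here is a minimal Python sketch that times a workload with and without perf and reports the relative slowdown for each of the four modes. It is not the report's methodology: the chosen event (`cycles`), the perf options, the helper names (`build_command`, `wall_time`, `relative_overhead`) and the example workload are all assumptions made for illustration.

```python
import shlex
import subprocess
import time

# Illustrative perf prefixes for the four PMU usage modes.
# ASSUMPTION: the event and options are examples only, not the
# configuration used in the report.
PERF_PREFIXES = {
    "counting": "perf stat -e cycles --",
    "sampling": "perf record -e cycles --",
    "pebs": "perf record -e cycles:pp --",   # ':pp' requests precise sampling
    "lbr": "perf record -b -e cycles --",    # '-b' samples taken branches
}


def build_command(mode, workload):
    """Return the argv list that runs `workload` under perf in `mode`."""
    return shlex.split(PERF_PREFIXES[mode]) + shlex.split(workload)


def wall_time(argv, runs=3):
    """Return the best wall-clock time in seconds over `runs` executions."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def relative_overhead(baseline_s, profiled_s):
    """Overhead as a fraction of the baseline runtime (0.25 means 25% slower)."""
    if baseline_s <= 0:
        raise ValueError("baseline runtime must be positive")
    return (profiled_s - baseline_s) / baseline_s


if __name__ == "__main__":
    # Hypothetical workload; replace with the binary you want to profile.
    workload = "gzip -9 -k -f /tmp/sample.bin"
    baseline = wall_time(shlex.split(workload))
    for mode in PERF_PREFIXES:
        profiled = wall_time(build_command(mode, workload))
        print(f"{mode:9s} overhead: {relative_overhead(baseline, profiled):+.2%}")
```

Running the script requires perf to be installed and sufficient permissions to access the PMU; the precise and branch-sampling modes also depend on hardware support. Wall-clock timing of a few runs is a coarse measure, so small overheads may be lost in run-to-run noise.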