All the events and metrics are split into several groups, which are described in detail in the sub-sections below. In addition, most of the groups are further split into two variants:
- RETIRED - for events counted at retirement. For example, the CACHE_RETIRED.L2.LOAD.MISS event will count L2 misses caused by retired load instructions.
- SPEC - for speculative events. For example, the CACHE_SPEC.L2.LOAD.MISS event will count L2 misses caused by load instructions regardless of whether they were retired or not.
In general, RETIRED events are more useful for performance analysis. In addition, in the future it may be possible to provide more context for them, e.g. a precise sample IP. On the other hand, they may be significantly more expensive to implement. It is up to implementations to decide whether to provide the RETIRED variant, the SPEC variant, or both variants of a group.
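When an implementation provides both variants, the gap between them estimates work wasted on squashed (never-retired) instructions. A minimal sketch of that computation (the function name and the example counts are illustrative, not part of this specification):

```python
def squashed_fraction(spec_count: int, retired_count: int) -> float:
    """Fraction of a SPEC event attributable to squashed (non-retired)
    instructions, given SPEC and RETIRED counts of the same event."""
    return (spec_count - retired_count) / spec_count if spec_count else 0.0

# e.g. CACHE_SPEC.L2.LOAD.MISS = 1200, CACHE_RETIRED.L2.LOAD.MISS = 900:
# 25% of L2 load misses came from loads that never retired.
print(squashed_fraction(1200, 900))  # 0.25
```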
This group contains general events not specific to a particular part of the CPU pipeline.
This group contains events measured at retirement that do not fall into any of the specific groups below.
This group contains speculative events that do not fall into any of the specific groups below.
Retirement control flow group, which contains events and metrics for measuring things like branch mispredictions, data mis-speculations, etc.
Speculation prediction group. Unlike most of the groups below, prediction events are naturally counted only at retirement time, so this group contains only the few events that make sense to count speculatively.
This group contains events and metrics for data and instruction caches (all levels) counted at retirement.
This group contains events and metrics for data and instruction caches (all levels) counted speculatively.
This group contains events and metrics for data and instruction TLB caches (all levels) counted at retirement.
This group contains events and metrics for data and instruction TLB caches (all levels) counted speculatively.
This group contains events and metrics related to the Topdown Microarchitecture Analysis (TMA) methodology.
The TMA methodology categorizes CPU execution time at a high level first. This step flags (reports a high fraction value for) some domain(s) for possible investigation. Next, the user can drill down into those flagged domains and can safely ignore all non-flagged domains. The process is repeated in a hierarchical manner until a specific performance issue is determined, or at least a small subset of candidate issues is identified for potential investigation.
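As a sketch of that drill-down loop (the tree representation and the flagging threshold below are assumptions for illustration, not defined by TMA itself):

```python
def drill_down(node, threshold=0.1, path=()):
    """Recursively descend a TMA hierarchy, following only the flagged
    (high-fraction) children and ignoring all non-flagged ones.

    `node` is assumed to be a dict:
      {"name": str, "fraction": float, "children": [node, ...]}
    """
    path = path + (node["name"],)
    flagged = [c for c in node.get("children", []) if c["fraction"] >= threshold]
    if not flagged:
        # End of the investigation: a specific issue (or small candidate set).
        print(" -> ".join(path), f'({node["fraction"]:.0%})')
        return
    for child in flagged:
        drill_down(child, threshold, path)
```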
Given the highly sophisticated microarchitecture, the first interesting question is how and where to do the first-level breakdown. TMA chooses the issue point, as it is the natural border that splits the frontend and backend portions of the machine.
At the issue point, it classifies each pipeline slot into one of four base categories: Frontend Bound, Backend Bound, Bad Speculation and Retiring. If a uop is issued in a given cycle, it will eventually either get retired or cancelled, so it can be attributed to Retiring or Bad Speculation respectively. Otherwise, the slot is classified by whether there was a backend stall or not. A backend stall is a backpressure mechanism the Backend asserts upon resource unavailability (e.g. lack of load buffer entries). In such a case TMA attributes the stall to the Backend, since even if the Frontend were ready with more uops it would not be able to pass them down the pipeline. If there was no backend stall, it means the Frontend should have delivered some uops while the Backend was ready to accept them; hence the slot is tagged as Frontend Bound.
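The decision tree above can be written out directly. A minimal per-slot sketch (the boolean inputs are assumed to be derivable from the events in this group):

```python
def classify_slot(uop_issued: bool, uop_retired: bool, backend_stall: bool) -> str:
    """Attribute one pipeline slot to a TMA top-level category,
    per the decision tree described above."""
    if uop_issued:
        # An issued uop eventually retires or is cancelled.
        return "Retiring" if uop_retired else "Bad Speculation"
    # No uop issued: blame whichever side of the issue point stalled.
    return "Backend Bound" if backend_stall else "Frontend Bound"
```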
This group contains events and metrics related to vectorized operations counted at retirement.
This group contains events and metrics related to vectorized operations counted speculatively.