PMELib 1.0

Introduction

    This library is an interface to the Performance Monitoring Events (PME) available on the Pentium (P5), P6 and Pentium 4 processors. The Pentium 4 has 18 counters that let you gather information about what the processor is doing during execution. They are described in these manuals:

IA-32 Intel Architecture Software Developer's Manual Volume 1: Basic Architecture

IA-32 Intel Architecture Software Developer's Manual Volume 2A: Instruction Set Reference, A-M

IA-32 Intel Architecture Software Developer's Manual Volume 2B: Instruction Set Reference, N-Z

IA-32 Intel Architecture Software Developer's Manual Volume 3: System Programming Guide

        See Chapter 15, Appendix A and Appendix B.


This library extends the utilities from Robert Wyatt's May 1998 Game Developer magazine article. The code is now wrapped in a class and adds support for the Pentium 4 processor. Only Intel processors are currently supported and tested.

Send feedback to doug@nvidia.com

Installation

    Windows NT and Windows XP are supported and tested. Windows 98 may work, but it has not been tested.

    You need to install a driver, set some registry values, and reboot. If you already installed GDPerf.sys from the Game Developer magazine article, you can skip the installation step; PMELib uses the same driver. Let me know if you have problems.

        In the Installation directory:

            Copy GDPerf.sys to the Windows driver directory:

                        copy GDPerf.sys %SystemRoot%\system32\drivers

                    An example batch file is included.

            Run the PMELib.reg file to set the registry values.

            Reboot.


Configuring PMELib 

For P5 and P6 processors (anything before the Pentium 4), you use the same interfaces described in the Game Developer article. They have been incorporated into PMELib as is.

The Pentium 4 has 18 performance monitoring counters and more than 40 event modes that can be captured. Each event mode has a bit mask that selects which sub-events to count. These are described in Appendix A of IA-32 Intel Architecture Software Developer's Manual Volume 3: System Programming Guide. Each event mode has a dedicated class; the event modes are listed below.

See PerfTest2 for an example.

Step 1) Choose an Event Mode class

    Example:

           Event_branch_retired event;


Step 2)  Set the Event Mask for the selected Event Mode

    Example:

        event.eventMask->MMNP = 1;

        event.eventMask->MMTP = 1;
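To make the mask concrete: Appendix A lists four event mask bits for branch_retired (not-taken predicted, not-taken mispredicted, taken predicted, taken mispredicted). Setting MMNP and MMTP as in the example therefore counts every correctly predicted retired branch, taken or not. The sketch below spells out the bit values. The namespace and constant names are hypothetical; PMELib's eventMask structure may store these bits differently (for example as bitfields).

```cpp
#include <cstdint>

// Hypothetical constants, not PMELib's types: bit positions of the
// branch_retired event mask from Appendix A of the IA-32 SDM, Volume 3.
namespace branch_retired_mask {
constexpr std::uint32_t MMNP = 1u << 0; // branch not-taken, predicted
constexpr std::uint32_t MMNM = 1u << 1; // branch not-taken, mispredicted
constexpr std::uint32_t MMTP = 1u << 2; // branch taken, predicted
constexpr std::uint32_t MMTM = 1u << 3; // branch taken, mispredicted
}

// The Step 2 example: correctly predicted branches, taken or not.
constexpr std::uint32_t kPredictedBranches =
    branch_retired_mask::MMNP | branch_retired_mask::MMTP;

// Setting all four bits counts every retired branch.
constexpr std::uint32_t kAllBranches =
    branch_retired_mask::MMNP | branch_retired_mask::MMNM |
    branch_retired_mask::MMTP | branch_retired_mask::MMTM;
```

To count only mispredicted branches instead, you would set MMNM and MMTM.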


Step 3)  Use the SetCaptureMode method to set the privilege levels from which data is captured:

        OS_Only,    // ring 0, driver level only

        USR_Only,   // application level, privilege levels 1, 2 and 3

        OS_and_USR, // all levels: 0, 1, 2 and 3

        Optionally, you can also enable tagging with the SetCaptureMode method.

        Example:

                SetCaptureMode(OS_and_USR, TagEnable, 34);
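On the Pentium 4, the capture mode and tagging options end up as fields of an Event Selection Control Register (ESCR), documented in Chapter 15 of Volume 3: the OS bit enables counting at privilege level 0 and the USR bit at levels 1, 2 and 3. The sketch below shows how such a value can be assembled. MakeEscr and the CaptureMode enum are hypothetical stand-ins rather than PMELib's implementation, and only the thread 0 fields are modeled; check the layout against the manual before relying on it.

```cpp
#include <cstdint>

// Hypothetical mirror of the three capture modes listed above.
enum CaptureMode { OS_Only, USR_Only, OS_and_USR };

// Builds a Pentium 4 ESCR value (thread 0 fields only):
//   bit 2       USR           count at privilege levels 1, 2 and 3
//   bit 3       OS            count at privilege level 0
//   bit 4       Tag Enable
//   bits 8:5    Tag Value     (4-bit field)
//   bits 24:9   Event Mask    (16-bit field)
//   bits 30:25  Event Select  (6-bit field)
std::uint32_t MakeEscr(std::uint32_t eventSelect, std::uint32_t eventMask,
                       CaptureMode mode, bool tagEnable,
                       std::uint32_t tagValue) {
    std::uint32_t escr = 0;
    escr |= (eventSelect & 0x3Fu) << 25;
    escr |= (eventMask & 0xFFFFu) << 9;
    escr |= (tagValue & 0xFu) << 5; // values wider than 4 bits are truncated
    if (tagEnable) escr |= 1u << 4;
    if (mode == OS_Only || mode == OS_and_USR) escr |= 1u << 3;  // OS bit
    if (mode == USR_Only || mode == OS_and_USR) escr |= 1u << 2; // USR bit
    return escr;
}
```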


Step 4) Optional Configuration

        At this point you can configure tagging, filtering, overflow and cascading options. You can also select one of the counters that are valid for the selected event mode.
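The filtering, overflow and cascading options correspond to fields of the Pentium 4 Counter Configuration Control Register (CCCR) that drives each counter. The sketch below assembles a CCCR value to show which field each option controls. CccrOptions and MakeCccr are hypothetical names, not PMELib's API, and the bit positions should be verified against Chapter 15 of Volume 3.

```cpp
#include <cstdint>

// Hypothetical option set, not part of PMELib. Pentium 4 CCCR layout
// (thread 0 fields only):
//   bit 12      Enable
//   bits 15:13  ESCR Select   (which ESCR feeds this counter)
//   bits 17:16  Active Thread on Hyper-Threading parts, otherwise reserved
//               and required to be 11b; always set to 11b here
//   bit 18      Compare       (enable threshold filtering)
//   bit 19      Complement    (count when events <= threshold, not >)
//   bits 23:20  Threshold
//   bit 24      Edge          (count rising edges of the filtered signal)
//   bit 25      FORCE_OVF     (force an overflow on every increment)
//   bit 26      OVF_PMI       (interrupt on overflow)
//   bit 30      Cascade       (count once the paired counter overflows)
struct CccrOptions {
    std::uint32_t escrSelect = 0; // 3-bit field
    bool compare = false;
    bool complement = false;
    std::uint32_t threshold = 0;  // 4-bit field
    bool edge = false;
    bool forceOverflow = false;
    bool overflowInterrupt = false;
    bool cascade = false;
};

std::uint32_t MakeCccr(const CccrOptions& o) {
    std::uint32_t cccr = 1u << 12;       // Enable
    cccr |= 0x3u << 16;                  // Active Thread / required bits
    cccr |= (o.escrSelect & 0x7u) << 13;
    if (o.compare) cccr |= 1u << 18;
    if (o.complement) cccr |= 1u << 19;
    cccr |= (o.threshold & 0xFu) << 20;
    if (o.edge) cccr |= 1u << 24;
    if (o.forceOverflow) cccr |= 1u << 25;
    if (o.overflowInterrupt) cccr |= 1u << 26;
    if (o.cascade) cccr |= 1u << 30;
    return cccr;
}
```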

Step 5) Set the process priority to high

This reduces interference from other processes. If your code contains an infinite loop while the priority is raised, the machine may hang and need to be rebooted.

Example:

    PME * pme = PME::Instance();

    pme->SetProcessPriority(ProcessPriorityHigh);


Step 6) Start using the counters

        Each event mode counter supports the following operations:

        Stop

        Start

        Clear - set to 0

        Read

        Write - write a 64-bit counter value
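A typical measurement combines these operations: clear the counter, start it, run the code under test, stop it, and read the result. The sketch below wraps that sequence in a template. Measure is a hypothetical helper, and the method names Clear, Start, Stop and Read are assumed from the list above rather than taken from PMELib's headers, so adjust them to the actual class.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper, not part of PMELib. Works with any counter type that
// provides Clear(), Start(), Stop() and Read(); these method names are
// assumed from the operation list, not confirmed signatures.
template <class Counter, class Work>
std::uint64_t Measure(Counter& counter, Work&& work) {
    counter.Clear();            // reset the count to 0
    counter.Start();            // begin counting events
    std::forward<Work>(work)(); // run the code being profiled
    counter.Stop();             // freeze the count
    return counter.Read();      // events counted while running
}
```

Under those assumptions, a call would look like `std::uint64_t n = Measure(event, [] { /* code under test */ });`, with `event` being the event mode object configured in Steps 1 to 4.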


Step 7) Set the process priority to normal

        pme->SetProcessPriority(ProcessPriorityNormal);
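Steps 5 and 7 must be paired: if the code between them returns early or throws, the process is left at high priority. One way to guarantee that Step 7 runs is a small RAII scope guard, sketched below. ScopeGuard is an illustrative helper, not part of PMELib; the usage comment reuses only the calls shown in Steps 5 and 7.

```cpp
#include <functional>
#include <utility>

// Illustrative helper, not part of PMELib: runs a callback when the
// enclosing scope exits, including when an exception propagates.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> onExit)
        : onExit_(std::move(onExit)) {}
    ~ScopeGuard() {
        if (onExit_) onExit_();
    }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> onExit_;
};

// Usage with the calls from Steps 5 and 7:
//
//   PME* pme = PME::Instance();
//   pme->SetProcessPriority(ProcessPriorityHigh);
//   ScopeGuard restore([pme] { pme->SetProcessPriority(ProcessPriorityNormal); });
//   // ... clear, start, run, stop and read the counters ...
//   // priority returns to normal here, even if the code above throws
```

The guard does not help with the infinite-loop case from Step 5, because that scope never exits.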


Event Modes:

Event_TC_deliver_mode

Event_BPU_fetch_request

Event_ITLB_reference

Event_memory_cancel

Event_memory_complete

Event_load_port_replay

Event_store_port_replay

Event_MOB_load_replay

Event_page_walk_type

Event_BSQ_cache_reference

Event_IOQ_allocation

Event_IOQ_active_entries

Event_FSB_data_activity

Event_BSQ_allocation

Event_BSQ_active_entries

Event_SSE_input_assist

Event_packed_SP_uop

Event_packed_DP_uop

Event_scalar_SP_uop

Event_scalar_DP_uop

Event_64bit_MMX_uop

Event_128bit_MMX_uop

Event_x87_FP_uop

Event_x87_SIMD_moves_uop

Event_TC_misc

Event_global_power_events

Event_tc_ms_xfer

Event_uop_queue_writes

Event_retired_mispred_branch_type

Event_retired_branch_type

Event_resource_stall

Event_WC_Buffer

Event_b2b_cycles

Event_bnr

Event_snoop

Event_response

Event_front_end_event

Event_execution_event

Event_replay_event

Event_instr_retired

Event_uops_retired

Event_uop_type

Event_branch_retired

Event_mispred_branch_retired

Event_x87_assist

Event_machine_clear

Credits

http://www.gamasutra.com/features/wyatts_world/19990528/pentium3_08.htm

Used some tables from Mikael Pettersson's perfctr

Used the detection code from Kamen Yotov's ia32lib library