Hierarchical Profile Module

From PyWiki

Jump to: navigation, search

While nearing the final stages of my Python-Ogre based game (3d snake game on spheres), i decided it was time to do some profiling of my code. Not only because the game just felt too slow with regards to my system, but also to get some more knowledge about what i had written down over the past two years.

After trying the built-in Python modules 'profile' and 'cprofile', i decided they wouldn't help me much. IMO they show lots of information about internal Python stuff, but thereby clutter the view on the things i wanted too see, which is my own game flow.

Then i used this profiler for a few days: http://code.activestate.com/recipes/496938/

However because i wanted to be able to run profiles nested, and to be able to measure average times, i decided to brew my own... Of course, this method is intrusive; you have to modify your code, which is a bit of a downside.

Contents

Assumption

For my own game i use the sample framework, thus eventually "..renderer\OGRE\sf_OIS.py". I have put my profile module at the same level to be able to see how my game code ('mainloop') compares to the full Ogre cycles. But more on that later.

Before you start

Store the Profile Module

I'd recommend to save the module's sources (see bottom of page) into "..Lib\site-packages\ogre\renderer\OGRE\Profiler.py", as stated earlier. However you are free to place it anywhere else, for example if you don't use the sample framework. The module declares an instance called "GProfiler". That's the one we need.

Use Psyco

Profiling a Python application without Psyco installed, is like profiling a C++ debug build; it's not so relevant. Averagely, the percentages in the results will probably match, but there's always the risk of spending time on the wrong (read: not really slow) pieces of code.

Usage

Assuming you have stored the file in the aforementioned location, use this line to access the profiler:

from ogre.renderer.OGRE.Profiler import GProfiler

Now, there's only three methods which you need: 'start(name)', 'stop(name)' and 'dump'. Start and stop should match, but you may nest your calls. It doesn't matter where you put your calls, here's some quick samples:

# Option 1: surround a function call
GProfiler.start("Somefunction")
SomeFunction()
GProfiler.stop("Somefunction")
 
# Option 2: time inside a function
def SomeFunction(self):
  GProfiler.start("InsideSomeFunction")
  # Some logic here...
  GProfiler.stop("InsideSomeFunction")
 
# Option 3: time some statement
GProfiler.start("Pow")
result = math.pow(2, 64)
GProfiler.stop("Pow")
 
# Print results
GProfiler.dump()

Disable

To quickly disable the profiler, call GProfiler.setEnabled(False). However since this still invokes all calls to 'start' and 'stop', you'd best comment or remove all these calls before shipping your game.

Samples

Now show some samples with results printed. Let's start simple:

GProfiler.start("UpdateHero")
# Game logic here...
GProfiler.stop("UpdateHero")
GProfiler.start("UpdateGUI")
# Game logic here...
GProfiler.stop("UpdateGUI")
GProfiler.dump()

This will output the following:

Results:
-UpdateHero (calls: 1, tm: 0.000000s, avg: 0.000000s, 50.00%)
-UpdateGUI (calls: 1, tm: 0.000000s, avg: 0.000000s, 50.00%)

It shows that each 'slot' has 1 call, and zero time. When no time could be measured (code too fast) then the percentage will be calculated as 100% / slots. Also, both slots are at the same level. (Same indentation.)

Now a more real sample, for which i modified the Demo_Grass.py sample application. In order to get percentages relative to the entire Ogre cycle, i have wrapped the "startRendering" call in sf_OIS.py as follows:

# ...
GProfiler.start("startRendering")
self.root.startRendering()
GProfiler.stop("startRendering")
# ...

Followed by some start/stop calls inside the demo itself: LightWibbler::setValue, GrassListener::waveGrass and GrassListener::frameStarted. Running the sample for a while results in the following output:

Results:
-startRendering (calls: 1, tm: 25.477000s, avg: 25.477000s, 100.00%)
--frameStarted (calls: 580, tm: 2.674000s, avg: 0.004610s, 10.50%)
---waveGrass (calls: 580, tm: 2.583000s, avg: 0.004453s, 96.60%)
--UpdateLightWibbler (calls: 580, tm: 0.047000s, avg: 0.000081s, 0.18%)

Note: the 'dump' call was placed at the end of the application:

application = Grass_Application()
application.go()
GProfiler.dump() # <- output

It shows that 10.5% of the time is spent in our framelistener (frameStarted), and 0.18% on our light. Thus Ogre had almost 90% to spend on rendering frames. This remainder value should be kept to a maximum in order to get highest frame rates. Of this 10.5%, 96.6% is spent on the waveGrass method. And so on...

More on results

Of course, since the profiler methods need to be called, and some administration needs to be done, all the time results include some overhead. However, since this overhead is always the same (at least for everything which you profile) the percentages will remain valid. Also, once you have found and fixed your bottle necks, you can remove the profiler lines again.

As mentioned below in TODO, the results are not sorted yet. However the most important ones are often those with highest percentages or high average call time. But this can heavily depend on the type of application which you are writing.

TODO

  • The results are just printed as they got collected. There is no sorting (yet). Maybe i will add certain options later.
  • Results are now printed to stdout. Add some nice file-writing later.

Full sources

Now, what you've all been waiting for :)

"""
Provides hierarchical profiling of you Python(-Ogre) code. Free to use.
Date:       18-08-2008
Author:     scriptkid
Contact:    pkooman at gmail.com
"""
 
import time
 
TRACING = False
 
class TimedSlot:
    """ A possibly recursive structure to summarize ellapsed times between start/stop cycles for a names slot."""
    def __init__(self, aName, aParent):
        self._name = aName
	# When we are stopped, measure at the parent again
        self._parent = aParent
        self._calls = 0
        self._totalTime = 0
        self._lastStartTime = 0
        self._started = False
        self._children = [] # Possibly recurse
 
    def start(self):
        """ Marks the beginning of (more) timing on this slot."""
        if self._started:
            print "'%s' already started; denied." %self._name
        else:
            #print "Started '%s'" %self._name
            self._started = True	    
            self._calls += 1
	    # Start now
            self._lastStartTime = time.time()
 
    def stop(self):
        """ Marks the end of (more) timing on this slot."""
        if not self._started:
            print "'%s' was not started; denied." %self._name
        else:
            self._started = False
	    # Increase the total time of this slot. This includes the ellapsed time of children.
	    delta = time.time() - self._lastStartTime
            if TRACING:
                print "Add %f to '%s'" %(delta, self.getNamePath())
            self._totalTime += delta
            #print "Stopped at: %f" %self._totalTime
            return self._parent
 
    def startChild(self, aName):
        """ Marks the beginning of additional timing of a (new) child slot.
	This happens when you call start within an earlier start. """
        child = self._findChildByName(aName)
        if child:
            #print "restart child '%s'" %child.getName()
            child.start()
        else:
            child = TimedSlot(aName, self) # We are its parent
	    if TRACING:
		print "Add profile slot for '%s'" %child.getName()
            child.start()
            self._children.append(child)
        return child
 
    def _findChildByName(self, aName):
        """ Looks up the given child by name."""
        for child in self._children:
            if child.getName() == aName:
                return child
        return None
 
    # Member access
    def getName(self):
        """ What it sais """
        return self._name
 
    def getNamePath(self):
        """ For tracing purposes."""
        name = self._name
        parent = self._parent
        while parent:
            name = parent.getName() + "->" + name
            parent = parent.getParent()
        return name
 
    def isStarted(self):
        """ Returns whether this slot is currently being monitored."""
        return self._started
 
    def getCalls(self):
        """ Returns the number of 'starts' on this slot."""
        return self._calls;
 
    def getTotalTime(self):
        """ Returns the total time of all start/stop cycles on this slot."""
        return self._totalTime
 
    def getAverageTime(self):
        """ What it sais """
        return self._totalTime / max(self._calls, 1)
 
    def getPercentage(self):
        """ Our time / parent time, which is considered 100%"""
        if not self._parent:
            # No parent
            return 100 # Root
        elif self._parent.getTotalTime() == 0:
            return 100 / len(self._parent.getChildren())
        else:
            return (self._totalTime / self._parent.getTotalTime()) * 100
 
    def getParent(self):
        """ What it sais """
        return self._parent
 
    def getChildren(self):
        """ What it sais """
        return self._children    
 
class Profiler:
    """ A utility class for profiling execution time for code. Multile start/stops can be used
    to calculate an average and a total."""
 
    def __init__(self):
        # Create a root, easier for the administration
        self._root = None
        self._currentSlot = None
	self._enabled = True
 
    def setEnabled(self, aValue):
	""" This is a quick way to disable the profiling. When your game is done, for max. performance
	you'd better comment or remove all calls to start/stop."""
	self._enabled = aValue
 
    def start(self, aName):
	if not self._enabled:
	    return
        if not self._currentSlot:
            # Very first call, create helper root item and start it
            self._root = TimedSlot("Root", None) # No parent
	    # The first start/stop will count as 100% of the total time.
            self._root.start()
            self._currentSlot = self._root.startChild(aName)
        elif self._currentSlot.isStarted():
            if self._currentSlot.getName() == aName:
                print "Slot '%s' already started; denied." %(aName)
            else:
                # Start nested child
                self._currentSlot = self._currentSlot.startChild(aName)
        else:
            # Restart a stopped slot
            self._currentSlot.start()
 
    def stop(self, aName):
        """ Stops the current time slot, and adds the difference with the most recent 'start' for this slot."""
	if not self._enabled:
	    return
        if self._currentSlot.getName() <> aName:
            print "Start/stop don't match (got '%s', expected '%s')" %(aName, self._currentSlot.getName())
        else:
            # Stop and retrieve parent
            self._currentSlot = self._currentSlot.stop()
 
    def dump(self):
	if not self._enabled:
	    return
        # Stop all
        while self._currentSlot:
            self._currentSlot = self._currentSlot.stop()
        print "Results:"
        self._printOutput(self._root, 0)
 
    def _printOutput(self, aSlot, anIndent):
        """ Recursively print some stats."""
        if aSlot:
            if aSlot.getName() <> "Root":
                line = "%s%s (calls: %d, tm: %fs, avg: %fs, %3.2f%%)" %("-" * anIndent, aSlot.getName(), aSlot.getCalls(), aSlot.getTotalTime(), aSlot.getAverageTime(), aSlot.getPercentage())
                print line
            # Recurse
            for child in aSlot.getChildren():
                self._printOutput(child, anIndent+1)
 
    def reset(self):
        """ Prepare for a next run """
	if not self._enabled:
	    return
        self._root = None
        self._currentSlot = None
 
# Global profiler
GProfiler = Profiler()
Personal tools