The Water Demo Performance -- Part 2 -- Tweak the Python Code
From PyWiki
Step 1 -- rewrite the calculateNormals routine
Let's spend some time making the Python code as fast as possible, so we'll start by breaking down the first 'loop':
    for i in range(self.numFaces):
        p0 = vinds[3*i]
        p1 = vinds[3*i+1]
        p2 = vinds[3*i+2]
        v0 = ogre.Vector3(self.vertexBuffers[buf][3*p0], self.vertexBuffers[buf][3*p0+1], self.vertexBuffers[buf][3*p0+2])
        v1 = ogre.Vector3(self.vertexBuffers[buf][3*p1], self.vertexBuffers[buf][3*p1+1], self.vertexBuffers[buf][3*p1+2])
        v2 = ogre.Vector3(self.vertexBuffers[buf][3*p2], self.vertexBuffers[buf][3*p2+1], self.vertexBuffers[buf][3*p2+2])
        diff1 = v2 - v1
        diff2 = v0 - v1
        fn = diff1.crossProduct(diff2)
        self.vNormals[p0] += fn
        self.vNormals[p1] += fn
        self.vNormals[p2] += fn
We know that self.numFaces = 2 * complexity * complexity, so with complexity set to 64 (the default) we go through the loop 8192 times per frame -- a good place to optimise!
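The loop count can be checked with a couple of lines of Python. This is only a sketch of the arithmetic; `num_faces` is a hypothetical helper name, not something in the demo.

```python
def num_faces(complexity):
    # Same formula as self.numFaces in the demo.
    return 2 * complexity * complexity

passes_at_64 = num_faces(64)   # default complexity
passes_at_32 = num_faces(32)   # half the complexity, a quarter of the passes
print(passes_at_64, passes_at_32)
```

Halving complexity cuts the number of passes to a quarter, which is worth keeping in mind when comparing frame rates at different settings.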
The first issue is likely to be that we create (and delete) 6 Vector3 objects in each pass (v0, v1, v2, diff1, diff2 and fn), each one involving a call into the Ogre library (via boost etc) -- and we also store vNormals as a list of Ogre Vector3 objects.
So let's make a change: instead of Vector3 we'll use simple Python lists and arrays (after all, a Vector3 is simply a way to hang onto 3 floats). This makes the code a little more complex but should improve performance. First let's change the way we use vNormals -- change from this:
    ## allocate space for normal calculation
    self.vNormals = []
    for x in range(self.numVertices):
        self.vNormals.append(ogre.Vector3().ZERO)
to this:
    ## allocate space for normal calculation
    ## (needs 'import array' at the top of the module)
    self.vNormals = array.array('f')
    for x in range(self.numVertices * 3):
        self.vNormals.append(0)
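As an aside, the same zero-filled array can be built without an append loop. The sketch below uses only the standard `array` module; `num_vertices` stands in for `self.numVertices`, and its value here is an assumption chosen for illustration.

```python
import array

num_vertices = 4  # hypothetical value; the demo uses self.numVertices

# Repeating a one-element array allocates all the floats in one step.
v_normals = array.array('f', [0.0]) * (num_vertices * 3)

# The append loop from the text, for comparison.
looped = array.array('f')
for _ in range(num_vertices * 3):
    looped.append(0)

print(len(v_normals), v_normals == looped)
```

This only affects the one-off allocation, so it is a tidiness change rather than a per-frame speed-up.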
Now we change the complete calculateNormals function to use pure Python 'objects':
    def calculateNormals(self):
        ## (needs 'import ctypes' at the top of the module)
        ## zero normals
        for i in range(self.numVertices*3):
            self.vNormals[i] = 0
        ## first, calculate normals for faces, add them to proper vertices
        # use helper function
        vinds = buffer(self.indexBuffer)
        vinds.lock(0, self.indexBuffer.getSizeInBytes(), ogre.HardwareBuffer.HBL_READ_ONLY)
        pNormals = self.normVertexBuffer.lock(0, self.normVertexBuffer.getSizeInBytes(), ogre.HardwareBuffer.HBL_DISCARD)
        pNormalsAddress = (ctypes.c_float * (self.normVertexBuffer.getSizeInBytes()*3)).from_address(ogre.castAsInt(pNormals))

        # make life easier (and faster) by using local variables
        buf = self.vertexBuffers[self.currentBufNumber]
        vNormals = self.vNormals

        ## AJM so here's a case where accessing a C++ object from python shows a performance hit !!
        for count in range(self.numFaces):
            p0 = vinds[3*count]
            p1 = vinds[3*count+1]
            p2 = vinds[3*count+2]
            # this is slow
            # v0 = ogre.Vector3(self.vertexBuffers[buf][3*p0], self.vertexBuffers[buf][3*p0+1], self.vertexBuffers[buf][3*p0+2])
            # v1 = ogre.Vector3(self.vertexBuffers[buf][3*p1], self.vertexBuffers[buf][3*p1+1], self.vertexBuffers[buf][3*p1+2])
            # v2 = ogre.Vector3(self.vertexBuffers[buf][3*p2], self.vertexBuffers[buf][3*p2+1], self.vertexBuffers[buf][3*p2+2])
            # so use python lists instead of Vector3's
            i0 = 3*p0
            i1 = 3*p1
            i2 = 3*p2
            v0 = [buf[i0], buf[i0+1], buf[i0+2]]
            v1 = [buf[i1], buf[i1+1], buf[i1+2]]
            v2 = [buf[i2], buf[i2+1], buf[i2+2]]

            # Do the vector subtraction by 'hand' instead of the original
            # diff1 = v2 - v1
            # diff2 = v0 - v1
            # (every component is subtracted from the matching component of v1)
            diff1 = [v2[0]-v1[0], v2[1]-v1[1], v2[2]-v1[2]]
            diff2 = [v0[0]-v1[0], v0[1]-v1[1], v0[2]-v1[2]]

            # and now we need to do a crossProduct by hand..
            # fn = ogre.Vector3(*diff1).crossProduct(ogre.Vector3(*diff2))
            fn = [diff1[1] * diff2[2] - diff1[2] * diff2[1],
                  diff1[2] * diff2[0] - diff1[0] * diff2[2],
                  diff1[0] * diff2[1] - diff1[1] * diff2[0]]

            # And of course now add the values into the normals
            # self.vNormals[p0] += fn
            # self.vNormals[p1] += fn
            # self.vNormals[p2] += fn
            vNormals[i0] += fn[0]
            vNormals[i0+1] += fn[1]
            vNormals[i0+2] += fn[2]
            vNormals[i1] += fn[0]
            vNormals[i1+1] += fn[1]
            vNormals[i1+2] += fn[2]
            vNormals[i2] += fn[0]
            vNormals[i2+1] += fn[1]
            vNormals[i2+2] += fn[2]

        ## now normalize vertex normals
        complexity = self.complexity
        for y in range(complexity):
            for x in range(complexity):
                numPoint = y*(complexity+1) + x
                v = 3*numPoint
                n = ogre.Vector3(vNormals[v], vNormals[v+1], vNormals[v+2])
                n.normalise()
                pNormalsAddress[v] = n.x
                pNormalsAddress[v+1] = n.y
                pNormalsAddress[v+2] = n.z
        self.indexBuffer.unlock()
        self.normVertexBuffer.unlock()
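The hand-written subtraction and cross product can be checked on their own, without Ogre. The sketch below uses a hypothetical helper, `face_normal` (not part of the demo), which takes three vertices as plain lists and computes diff1 = v2 - v1 and diff2 = v0 - v1, as the original Vector3 version does, followed by diff1 x diff2.

```python
def face_normal(v0, v1, v2):
    # Edge vectors, both measured from v1.
    d1 = [v2[0] - v1[0], v2[1] - v1[1], v2[2] - v1[2]]
    d2 = [v0[0] - v1[0], v0[1] - v1[1], v0[2] - v1[2]]
    # Cross product d1 x d2, written out component by component.
    return [d1[1] * d2[2] - d1[2] * d2[1],
            d1[2] * d2[0] - d1[0] * d2[2],
            d1[0] * d2[1] - d1[1] * d2[0]]

# v1 at the origin, v2 along +x, v0 along +y: the normal points along +z.
normal = face_normal([0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0])
print(normal)
```

A known triangle like this one makes index slips in the subtraction easy to spot, because any wrong component changes the direction of the result.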
With this version of calculateNormals the frame rate rises to 13 FPS (complexity == 64 and psyco enabled) -- a big improvement over the previous figure of under 3 FPS, but still well short of the C++ version. With complexity == 32 we get nearly 50 FPS.
However, we need to take this a step further.
