Reducing Vertex Bandwidth Overhead
After weeks of optimizing our voxel game engine, we still weren't getting the kind of frame rate we needed, so we decided to finally switch to using VBOs (Vertex Buffer Objects) to pass data to the GPU instead of submitting vertex arrays every frame. This predictably improved performance a great deal, gaining an extra 16 FPS!
For the non-technically minded: many performance issues are caused by passing a lot of data from the CPU to the GPU every frame. By storing that data once on the GPU in a VBO, we reduce the bandwidth used, so rendering instructions complete much faster. If all the instructions needed to draw a single frame take more than about 16.7 milliseconds (1000 ms ÷ 60), the frame rate drops below 60 FPS, which is why VBOs are the correct way to render complex scenes with many thousands of vertices.
So that's done, and we're back to getting great performance, but there's more we can do…
Another drag on performance is fill rate: the number of pixels that must be processed, which depends on screen resolution, overdraw, and shader complexity. On mobile, there's no straightforward way to change the screen resolution natively; on each device you get what you're given, and as mobile screens grow denser, so do the demands on your graphics engine. So what's the answer?
Well, we can do what many modern console games do: render the scene to an off-screen texture, which is then upscaled and drawn to the screen. This lets us render the scene into a much smaller viewport, so the GPU has less work to do.
If you're interested in the technical details, Intel has published a very easy-to-understand article on Dynamic Resolution Rendering for GLES 2.0, which uses the same techniques as our implementation.
Since we're now rendering to a full-screen texture using the technique above, there's an added bonus: we can also apply our DUDV (water ripples) shader to the screen for a nice underwater ripple effect. It's relatively cheap to render, with no noticeable impact on performance. This effect would not be possible if we were simply rendering to the backbuffer as before.
Chunk Meshing & Culling (Errors)
Finally, after optimizing the work done by the GPU, it's time to revisit our CPU-side code and see if we can tighten up the way we generate chunk meshes and cull hidden surfaces, which we are already doing, but there is room for improvement!
Our hidden-surface removal works pretty well, but a quick glance in wireframe mode shows that some chunks load with errors, causing them to render more triangles than necessary (outer surfaces that should be occluded by neighboring chunks are not hidden), but only sometimes. So why is this happening?
This weird behavior is caused by the way we stream chunks, loading them in and out of memory as they're needed. If a single chunk is loaded and its mesh built before its neighboring chunks have even loaded, its outer faces are not properly culled, because as far as the mesher can tell, its neighbors don't exist!
We can fix this by ensuring a chunk's neighbors are loaded (but not necessarily built) before we build its mesh. That way, the occlusion algorithm has actual data to work with on all sides instead of poking around in empty space.
The alternative would be to rebuild meshes as more data streams in, but that would be very processor-intensive and wasteful, so we just have to be smarter about how we stream chunk data in the first place.
In the next post, we’ll talk more about gameplay features, but for now, let us know in the comments or via our contact page if you have any questions about this project! You can also stay up-to-date and get early access to info, screenshots and videos by subscribing to our newsletter.