This week's NXT Dev Blog is a deep-dive into how our brave graphics architects tackled the challenge of making RuneScape's 15 years of content look great and perform well on an unprecedented range of hardware, using a unique range of rendering techniques, both old and new.
If you're an aspiring graphics developer or tech enthusiast, this'll be of special interest – read on!
Previous dev blogs have only really scratched the surface in describing the exciting features that make the new RuneScape client so awesome. This blog takes a more detailed look at some of those features: how they're implemented, and the rationale behind adopting them.
One of the biggest challenges during NXT's development has been improving visual fidelity and performance while ensuring the game still looks like the RuneScape you all love. This is how we've achieved it.
Global illumination (GI) is how games and movies model the indirect lighting in a scene (i.e. bounced lighting). Without any global illumination, every pixel in shadow would be black.
No Global Illumination
This is extremely difficult to solve at interactive rates, and even today the majority of games use an offline pre-processing algorithm to bake GI results into texture maps (aka light maps), so they can quickly be looked up at run time.
The classic technique for previous-gen games is pre-baked radiosity (e.g. the Quake and Half-Life series), but more recently games have started to bake more detailed GI data into their light maps, like spherical harmonics and surface-to-surface visibility, giving the added benefit of GI working with moving lights and normal maps.
At the bleeding edge, pure real-time solutions have started to emerge (light propagation volumes and voxel cone tracing), but such techniques are still maturing, and also require serious, cutting-edge GPU technology to work effectively.
This is something that we really wanted to improve on for the new client, but due to tool limitations and the sheer size of RuneScape, an offline GI solution was not a viable option. On top of that, we wanted something that would be supported on all of our target hardware.
So we reached into our graphics tool box and opted for a modern version of hemisphere lighting, using irradiance environment maps by way of spherical harmonics.
Hemisphere lighting uses a manually placed sphere in each environment to define a gradient of sky to ground colour. The surface normals of the scene geometry are then used to pick a colour from this sphere. For example, if a surface points up towards the sky, the colour at the top of the sphere is chosen; a downward-facing surface picks up the ground colour instead.
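To make the idea concrete, here's a minimal CPU-side sketch of classic hemisphere lighting in Python. The real work happens in a shader, and the colours here are purely illustrative:

```python
def hemisphere_light(normal, sky_colour, ground_colour):
    # Remap the unit normal's vertical component from [-1, 1] to [0, 1].
    t = 0.5 * (normal[1] + 1.0)
    # Lerp per channel: t = 1 (facing up) gives pure sky colour,
    # t = 0 (facing down) gives pure ground colour.
    return tuple((1.0 - t) * g + t * s for s, g in zip(sky_colour, ground_colour))

sky = (0.5, 0.7, 1.0)       # made-up sky colour
ground = (0.3, 0.25, 0.2)   # made-up ground colour
up = hemisphere_light((0.0, 1.0, 0.0), sky, ground)     # pure sky colour
down = hemisphere_light((0.0, -1.0, 0.0), sky, ground)  # pure ground colour
```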
Having artists create all these light spheres would have been very time consuming, and an added source of maintenance cost. Therefore, we opted for a real-time, programmatic solution.
This involves rendering a light probe (global environment map) high in the sky over several frames after each map square is loaded, baking the spherical harmonics from that in real time, giving us a highly compressed irradiance environment map in the form of spherical harmonic coefficients.
These coefficients are then used in the pixel shader with some clever maths, taking the surface normal as input to give us the irradiance at that pixel. What this ultimately produces is a single indirect light bounce from the sunlight (irradiance lighting) - or as we like to call it, hemisphere lighting on steroids!
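For the curious, the "clever maths" is well documented: an order-2 spherical harmonic irradiance evaluation needs just nine coefficients per colour channel, with the constants coming from Ramamoorthi and Hanrahan's irradiance environment map work. Here's an illustrative single-channel version in Python - a sketch of the general technique, not our shader code:

```python
# Constants from Ramamoorthi & Hanrahan's irradiance environment map work.
C1, C2, C3, C4, C5 = 0.429043, 0.511664, 0.743125, 0.886227, 0.247708

def sh_irradiance(n, L):
    # n is a unit surface normal (x, y, z); L holds the nine SH
    # coefficients [L00, L1-1, L10, L11, L2-2, L2-1, L20, L21, L22]
    # for a single colour channel (a shader repeats this for R, G and B).
    x, y, z = n
    return (C1 * L[8] * (x * x - y * y)
            + C3 * L[6] * z * z
            + C4 * L[0]
            - C5 * L[6]
            + 2.0 * C1 * (L[4] * x * y + L[7] * x * z + L[5] * y * z)
            + 2.0 * C2 * (L[3] * x + L[1] * y + L[2] * z))
```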
On top of the irradiance lighting, we've also added ambient occlusion into our lighting blender. Ambient occlusion simulates soft, small-scale shadows from the environment lighting based on how visible a surface is. For this we've chosen a form of screen-space ambient occlusion called horizon-based ambient occlusion, which is pretty much the best out there right now.
However, unlike most games that still apply SSAO as a post process - which can give pretty poor and unconvincing results - we apply our ambient occlusion during the forward lighting pass, and only to the indirect ambient lighting. Therefore pixels that are directly lit will not exhibit too much ambient occlusion, which is more physically correct. Our SSAO pass is also performed at full resolution, giving more stable results.
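The difference between the two approaches comes down to where the occlusion term is multiplied in. A toy single-channel comparison (all values made up for illustration):

```python
def lit_forward(direct, ambient, ao):
    # Our approach: AO darkens only the indirect (ambient) term.
    return direct + ambient * ao

def lit_post_process(direct, ambient, ao):
    # Typical post-process SSAO: AO darkens everything, direct light included.
    return (direct + ambient) * ao

# A fully occluded crease (ao = 0) that happens to be in direct sunlight:
forward = lit_forward(0.8, 0.2, 0.0)      # keeps its direct lighting
post = lit_post_process(0.8, 0.2, 0.0)    # goes implausibly black
```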
The following screenshots show both our irradiance and ambient occlusion solutions combined. We hope you agree that the results are a significant improvement over what the old Java client currently offers.
Java Style Global Illumination
New NXT Global Illumination
Java Style Global Illumination
New NXT Global Illumination
HDR, Gamma Correct Rendering and Tone Mapping
Another reason why the Java client looks so flat, dull and oversaturated is that it can't display the full range of colour and light intensities in the scene. To improve this, we first need to ensure that the GPU carries out all its lighting calculations in linear space.
All GPUs perform their calculations using high-precision, floating-point mathematics, but in order to get the full benefit of that, the inputs going into the lighting equations during shader execution must also be in linear space. Therefore we needed to ensure that the textures for the game - which are stored in sRGB space when saving from Photoshop - were converted to linear space before being used by the shaders.
The same process is also applied to other artist-defined inputs, such as light and fog colours. Most GPUs can do this sRGB to linear space conversion in hardware, but when unavailable, we also have a way to perform this manually. Doing this ensures that the high dynamic range of lighting is not polluted by nonlinear inputs and the lighting in the scene remains consistent. This also helps to avoid areas that are heavily lit from burning out too quickly, which can destroy lighting detail.
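The sRGB decode itself is standard - it's defined by the IEC 61966-2-1 specification - whether performed in hardware or manually in the shader. In Python it looks like this:

```python
def srgb_to_linear(c):
    # Official sRGB decode (IEC 61966-2-1) for a channel value in [0, 1]:
    # a small linear toe near black, then a 2.4 power curve.
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4
```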
The next component required to achieve full HDR (high dynamic range) rendering is ensuring that we store the results of these linear space lighting calculations into off-screen textures that themselves can preserve the linearity by using floating point formats. However, floating point textures can be expensive, so we always try and use a packed float texture format when available.
The icing on the HDR cake comes in the form of tone mapping, which is a process that maps one colour range to another. In our case, that means converting HDR linear lighting results into a range that the monitor can handle, as monitors are only capable of displaying LDR (low dynamic range) values.
Without tone mapping, a straightforward conversion from a high to low dynamic range would result in a significant loss of lighting information, and an undesirable look. We were therefore required to work tirelessly with the artists in evaluating many different tone mapping formulas to achieve a result that best matched the existing RuneScape look, while still preserving a good dynamic range of colour and light intensities. This came in the form of filmic tone mapping.
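The exact formula we shipped isn't spelled out here, but as an illustration of the idea, this is John Hable's widely used filmic operator (the "Uncharted 2" curve), with his published constants rather than necessarily ours:

```python
def hable(x):
    # Hable's published constants (shoulder, linear and toe strengths).
    A, B, C, D, E, F = 0.15, 0.50, 0.10, 0.20, 0.02, 0.30
    return ((x * (A * x + C * B) + D * E) / (x * (A * x + B) + D * F)) - E / F

W = 11.2  # linear-space white point

def filmic_tone_map(hdr):
    # Normalise so the white point maps exactly to 1.0.
    return hable(hdr) / hable(W)
```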
With most games that are able to compute their global illumination offline, shadows cast from static scene geometry from dominant light sources (i.e. sunlight) are more often than not included as part of this light baking process (shadows are actually part of global illumination, but we won't get into that here). Once again, this was not an option for us, so we opted for a fully dynamic solution. The challenge of developing any fully dynamic real-time shadow system is achieving good quality and performance combined.
For quality and performance, the best technique for rendering real-time shadows on modern GPUs is one based on shadow mapping. However, there are two major problems with any shadow mapping algorithm: projective and perspective aliasing, primarily due to a lack of resolution in the shadow map. Without sufficient resolution, multiple shadow map texels can map to a single screen pixel, giving rise to serious aliasing artefacts.
We opted for a parallel split cascaded shadow map scheme where the viewable scene is split into segments, with each segment pertaining to a single shadow cascade, thus giving an improved shadow texel map to screen space pixel ratio. This greatly improves the perspective aliasing that would otherwise be visible if using a single shadow map for the entire scene. The downside, however, is that the scene needs to be rendered again for each shadow map cascade, which massively increases the number of objects drawn each frame, and can seriously hurt client performance.
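For reference, a common way to position the cascade boundaries is the "practical split scheme" popularised by NVIDIA's parallel-split shadow maps work, which blends logarithmic and uniform splits. A sketch (view range and blend factor here are illustrative, not our tuned values):

```python
def cascade_splits(near, far, count, blend=0.5):
    # blend = 1.0 is fully logarithmic (best for perspective aliasing),
    # blend = 0.0 is fully uniform; in between is the practical scheme.
    splits = []
    for i in range(1, count + 1):
        f = i / count
        log_split = near * (far / near) ** f
        uniform_split = near + (far - near) * f
        splits.append(blend * log_split + (1.0 - blend) * uniform_split)
    return splits

# Four cascades over a 1..100 unit view range:
splits = cascade_splits(1.0, 100.0, 4)
```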
A multi-pronged attack was required to mitigate this explosion of draw calls. Multiple levels of visibility culling are applied to each shadow cascade render pass: view frustum culling, distance-based culling, shadow map area culling and shadow caster volume culling. On top of that, we can cull more objects by making various assumptions about the scene. For example, we don't render shallow terrain chunks into the shadow maps, as they're unlikely to cast any shadows onto the scene, and we only update far cascades on alternate frames to reduce the average number of draw calls.
Additionally, to further reduce the problem of shadow map aliasing, for the far cascades that cover more of the visible scene, we employ an algorithm commonly known as 'unit cube clipping' that attempts to tightly fit the shadow map orthographic projection to the visible casters/receivers. This can significantly improve shadow map usage in many scenes.
We need to ensure that each shadow map draw call is as cheap as possible. To achieve this a number of tricks are used: disabling colour writes, custom cut down vertex shaders, null pixel shaders and minimal vertex formats. These help to keep GPU usage to a minimum during shadow map generation.
The final piece in terms of quality and performance for any shadow mapping technique is the filtering used to produce soft edge shadows. We take full advantage of GPU PCF (Percentage Closer Filtering) hardware shadow map filtering which gives us smoothly interpolated shadow map samples. This - combined with multi-tap filter kernels and special texture lookup functions to reduce general purpose register usage - allows us to achieve soft-edged shadows at a high rate of performance.
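The core of PCF is simple: compare several neighbouring shadow map taps against the receiver's depth and average the binary results. A toy CPU-side sketch - the hardware does the comparisons and interpolation for us, and the depth values here are made up:

```python
def pcf_shadow(shadow_map, u, v, receiver_depth, radius=1):
    # Average binary depth comparisons over a (2*radius+1)^2 tap kernel.
    # Returns a soft shadow factor in [0, 1]: 1 fully lit, 0 fully shadowed.
    h, w = len(shadow_map), len(shadow_map[0])
    lit, taps = 0, 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x = min(max(u + dx, 0), w - 1)  # clamp taps to the map edge
            y = min(max(v + dy, 0), h - 1)
            # A tap is lit if the stored occluder depth is not in front of us.
            if shadow_map[y][x] >= receiver_depth:
                lit += 1
            taps += 1
    return lit / taps
```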
If you've got this far then you'll probably now have an appreciation for how much of a Jedi art real-time shadow map rendering can be, but the results are well worth the labour!
Light Indexed Deferred Lighting
The environments in RuneScape have a lot of lights! The new client does all its lighting calculations per-pixel without any offline light baking, which means that we have to treat every light as dynamic. One modern approach for such dynamically lit scenes is fully deferred shading. However, going fully deferred has its own shortcomings, and wouldn't be a viable solution for us while still having to support such low spec hardware.
So, sticking with our forward lighting rendering pipeline, the standard approach of lighting with 8-16 lights per object just wasn't sufficient due to the size of individual geometry batches. We had to think outside the box again!
We opted for a solution known as light-indexed deferred lighting, which is a kind of middle ground between fully deferred shading and forward lighting. It allows us to support up to four unique lights per pixel and fits nicely into our forward-lighting rendering pipeline. This pretty much solved all our n-light problems for large static geometry batches, while still allowing for MSAA support and different lighting formulas for future material variation.
Many Point Lights
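The "index" part of light-indexed deferred lighting is the key trick: for each pixel, a small fixed number of light indices is written to a buffer and read back during forward shading. A hedged sketch of packing up to four 8-bit indices into one 32-bit value, roughly as an RGBA8 index buffer would store them (in practice index 0 would typically be reserved to mean "no light"):

```python
def pack_light_indices(indices):
    # Pack up to four 8-bit light indices into a single 32-bit value.
    assert len(indices) <= 4
    padded = list(indices) + [0] * (4 - len(indices))
    value = 0
    for slot, idx in enumerate(padded):
        assert 0 <= idx < 256
        value |= idx << (8 * slot)
    return value

def unpack_light_indices(value):
    # The shader-side equivalent: recover the four per-pixel indices.
    return [(value >> (8 * slot)) & 0xFF for slot in range(4)]
```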
Atmospheric Light Scattering
With increased draw distances, we knew that the Java distance fog just wasn't going to cut it. We started off by completely removing the distance based fog and replacing it with a physically based atmospheric light scattering technique, which looked great, but wasn't allowing us to completely mask off the edge of the world. So we decided to combine the old distance fog with our new atmospheric light scattering for a hybrid solution that augments the old fog. The end result gives a much more natural look to the scene - especially at higher draw distances – and also helps to give a better perception of scene depth.
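To illustrate how such a hybrid can combine, here's a toy single-channel sketch: an atmospheric in-scattering term first, then classic exponential distance fog layered on top to fully mask the world edge. All densities and colours are made up, and a real scattering model is far more involved than a single exponential:

```python
import math

def apply_fog(c, distance, scatter_colour, fog_colour,
              scatter_density=0.02, fog_density=0.002):
    # Single colour channel shown; a shader does this per RGB channel.
    # 1) Atmospheric in-scattering: light scattered into the view path.
    t = math.exp(-scatter_density * distance)  # transmittance: 1 near, 0 far
    c = c * t + scatter_colour * (1.0 - t)
    # 2) Classic distance fog, kept to fully mask the edge of the world.
    f = math.exp(-fog_density * distance)
    return c * f + fog_colour * (1.0 - f)
```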
One of the key features of our new effects system has to be the new water rendering. As with pretty much everything else, the shader was rebuilt from scratch, but we also made the bold decision to go back to the old Java water data set, in order to fix various issues that remained in the patch system we initially inherited from HTML5. We still make use of the water patch data for the real-time planar reflections, but the water geometry itself is now once again constructed from the Java data set. This meant the artists were not required to go back and redo all the patches, saving significant development time.
The shader itself is built from many elements to achieve the final look. The two primary components of any water rendering system are support for real-time reflections and refractions, so those were a must. On top of that, we ensure that the lighting on the water correctly interacts with our shadow system, so direct specular is now correctly masked out where shadows are cast. The wave effect is also improved by changing how the water normal maps are sampled in areas where there is little or no distortion, which greatly reduces tiling artefacts that are commonly seen in water effects. And finally, with the old data set at hand, we had access to improved depth information of underwater terrain, allowing us to fade out various components like fog and wave distortion as water approaches the shore line.
We hope you'll all agree, the results are mouth-watering!
Shadows and Lighting Water Interaction
Draw Call Reduction Techniques
Although not a feature per se, I would be remiss not to mention one of the main reasons why we're able to achieve better performance than Java while still rendering so much more. I already touched briefly on how we were able to significantly reduce draw calls for our shadow map rendering, but even more important was reducing draw calls for our forward lighting and scene depth passes.
Probably the most expensive thing a game engine can do in terms of CPU/GPU time is submitting objects to be drawn on screen. The cost is twofold: first you have to pay the price of the graphics driver overhead on draw call submission when building GPU command buffers, and secondly the actual GPU cost involved in processing vertices and pixels.
I've already mentioned the standard view frustum and distance-based object culling, but with forward lighting pass draw calls being significantly more costly than simple shadow pass draw calls, we needed to go that extra mile.
The biggest draw call saving comes from our dynamic geometry batching system. When objects are loaded, they are also put into groups that share the same material and then physically stitched together so they can be rendered as a single draw call. There is a downside to this, however, in that a complex texture atlasing system is required so that all the textures for these batched objects can be accessed from a single texture page. This has the knock-on effect of bloating the size of the model vertices due to the extra information needed to access an object's texture in an atlas, along with increasing pixel shader instruction counts. However, this additional cost is still small in comparison to the net gain achieved in performance by massively reducing draw calls through batching. This same batching system is also used by all rendering passes to further reduce draw calls.
Each colour denotes a single draw call batch
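The grouping step at the heart of the batching system can be sketched in a few lines - the material names and meshes here are hypothetical, and the real system also merges vertex buffers and atlases the textures:

```python
def batch_by_material(objects):
    # Group (material, mesh) pairs so that everything sharing a material
    # can be stitched together and rendered as a single draw call.
    batches = {}
    for material, mesh in objects:
        batches.setdefault(material, []).append(mesh)
    return batches

# Three objects collapse into two draw-call batches:
scene = [("stone", "wall_a"), ("wood", "door"), ("stone", "wall_b")]
batches = batch_by_material(scene)
```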
The final piece in the draw call reduction puzzle is our unique and innovative occlusion culling system. Unlike most occlusion culling solutions - which require either offline scene processing to generate potentially visible sets at points in the world, or hand-crafted occluder geometry for real-time solutions - our approach requires neither. This was primarily born out of necessity: it wasn't feasible to process RuneScape's vast environments offline, nor to author proxy occluder geometry for a real-time solution with our limited art resources. We therefore developed a hybrid technique built around a CPU-side software rasteriser for carrying out occlusion queries, but instead of generating the scene depth data on the CPU, we read back GPU depth buffer data from previous frames to feed those queries. The depth buffer read-back and software occlusion queries do come with a high fixed cost, but in scenes with high depth complexity the draw call reduction is large enough that we see some serious net gains in performance, especially on machines with poorly performing graphics drivers.
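The occlusion query itself boils down to a conservative depth test of an object's screen-space bounds against the read-back depth data. A toy sketch - a real version tests a bounding volume's nearest depth against a coarse, reprojected depth buffer, and the values here are invented:

```python
def is_occluded(depth_buffer, rect, object_depth):
    # Conservative test: cull the object only if every depth sample under
    # its screen rect is strictly nearer (smaller) than the object's
    # nearest possible depth.
    x0, y0, x1, y1 = rect
    for y in range(y0, y1):
        for x in range(x0, x1):
            if depth_buffer[y][x] >= object_depth:
                return False  # open space here: the object may be visible
    return True

depth = [[0.2, 0.2, 0.9],
         [0.2, 0.2, 0.9],
         [0.9, 0.9, 0.9]]  # a nearby occluder covers the top-left corner
hidden = is_occluded(depth, (0, 0, 2, 2), 0.5)   # fully behind the occluder
visible = is_occluded(depth, (1, 1, 3, 3), 0.5)  # extends into open space
```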
I hope this dev blog has been insightful, and has shown just how challenging it has been to provide you with a new client that runs and looks better than Java with 15-year-old content.
We believe that the decisions made during this project have laid strong foundations that RuneScape can now build upon to deliver you even better visuals at greater performance for years to come.
Lead Graphics Programmer