New Feature Release - Shader Decompiler Rewrite
Written by CaptV0rt3x on July 10 2021

Greetings, yuz-ers! The long-awaited day is finally here. We are very excited to present Project Hades, our shader decompiler rewrite! This massive update includes huge performance improvements, countless bug fixes, and more. Let’s get started!

 

Project Hades is now available in the latest yuzu Early Access build. As always, we ask that you test various games with these builds and if you encounter any issues, bugs, or crashes, please reach out to us via the Discord Patreon channels.

Notice

The entire shader generation process has been redesigned from the ground up, thus existing shader caches have been invalidated. Users will need to build their shader caches again, from scratch, with Project Hades.

What is Project Hades?

Project Hades is the codename for our shader decompiler code rewrite, although at this point it’s become much more than that.

For those who don’t know what a shader decompiler is, it helps to first understand how games render (show/display) anything. Shaders are special programs that perform various tasks on GPUs, typically related to rendering graphics on a display. They are usually written in a high-level shading language such as the OpenGL Shading Language (GLSL), and handed to the driver in a form compatible with the graphics API in use: GLSL source, Standard Portable Intermediate Representation (SPIR-V) binaries, or OpenGL ARB assembly (GLASM). Games often use hundreds or thousands of these shaders to tell the GPU what to render and how to do it.

yuzu Shader Generation

Switch games also use shaders to render graphics on the Switch itself. However, since these shaders are precompiled for the Switch’s GPU, yuzu cannot use them directly to render graphics on the host GPU (the user’s GPU). Therefore, yuzu first decompiles these shaders into an Intermediate Representation (IR), which is then used to generate the GLSL, SPIR-V, or GLASM shaders that the graphics APIs and drivers use to render games on the host GPU.

Shader decompilation is the process of translating the guest (in this case, the Nintendo Switch) GPU machine code to a representation that can be compiled on the host (the user’s PC) GPU. Shader compilation is the process of taking that representation and sending it to the host GPU driver to be compiled and then executed on the user’s GPU.
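The two stages described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the tuple encoding of guest instructions, the `IRInst` type, and the `decompile` and `emit_glsl` helpers are hypothetical and do not reflect yuzu's actual code or the real Switch GPU instruction encoding. It only shows the shape of the idea: guest code is translated to an IR, and a backend turns the IR into text the host driver could compile.

```python
from dataclasses import dataclass

# Hypothetical, already-decoded guest instructions:
# (mnemonic, destination register, source registers).
# Real guest machine code is binary; tuples stand in for decoded words.
GUEST_PROGRAM = [
    ("FMUL", 2, (0, 1)),
    ("FADD", 3, (2, 0)),
]


@dataclass(frozen=True)
class IRInst:
    """One instruction of a made-up intermediate representation."""
    op: str
    dest: str
    args: tuple


# Guest mnemonic -> IR opcode (illustrative subset).
_GUEST_TO_IR = {"FADD": "fadd", "FMUL": "fmul"}

# IR opcode -> GLSL operator (illustrative subset).
_IR_TO_GLSL = {"fadd": "+", "fmul": "*"}


def decompile(program):
    """Translate guest instructions into backend-independent IR."""
    return [
        IRInst(_GUEST_TO_IR[mnemonic], f"r{dest}", tuple(f"r{s}" for s in srcs))
        for mnemonic, dest, srcs in program
    ]


def emit_glsl(ir):
    """Emit GLSL-like statements from the IR (not a complete shader)."""
    return "\n".join(
        f"float {inst.dest} = {inst.args[0]} {_IR_TO_GLSL[inst.op]} {inst.args[1]};"
        for inst in ir
    )


print(emit_glsl(decompile(GUEST_PROGRAM)))
```

Because the IR sits between the two stages, a second emitter for another target could reuse `decompile` unchanged, which is the practical benefit of having an intermediate representation.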

Goals

The main goal of Project Hades was to redesign the decompiler and shader generation code with a focus on simplicity and accuracy. It aimed to make both decompilation and compilation faster overall, improving performance. Rewriting the decompiler would also allow us to audit it through unit testing and, following a design similar to dynarmic, enable proper program analysis and optimizations over a fast-to-emit intermediate representation.

 

Dark Souls

Dragon Quest XI

Taking a leaf from dynarmic’s book, the developers opted to use a static single assignment (SSA) representation, as it would work very nicely with the SPIR-V IR used for shaders, thanks to its native support for SSA. As for the unit testing, Rodrigo wrote homebrew tests for the hardware, which helped the developers accurately emulate hardware behaviour.
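For readers unfamiliar with SSA, the defining property is that every value is assigned exactly once. The following is a minimal sketch of that idea for straight-line code only; the `to_ssa` function and the `(dest, op, srcs)` instruction format are hypothetical and are not taken from yuzu. Real SSA construction must also handle branches and loops (with phi nodes), which this sketch deliberately leaves out.

```python
def to_ssa(instructions):
    """Rename straight-line (dest, op, srcs) instructions so that every
    destination is written exactly once.

    Registers read before any write are treated as inputs with version 0.
    """
    version = {}  # register name -> latest version number
    renamed = []
    for dest, op, srcs in instructions:
        # Reads always refer to the most recent definition of a register.
        new_srcs = tuple(f"{s}_{version.get(s, 0)}" for s in srcs)
        # Each write creates a fresh, never-reused name.
        version[dest] = version.get(dest, 0) + 1
        renamed.append((f"{dest}_{version[dest]}", op, new_srcs))
    return renamed


program = [
    ("r0", "fmul", ("r0", "r1")),  # r0 is overwritten here...
    ("r0", "fadd", ("r0", "r2")),  # ...and again here
]
for inst in to_ssa(program):
    print(inst)
```

After renaming, the two writes to `r0` become distinct values (`r0_1` and `r0_2`), so an analysis never has to ask which write a later read refers to.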

But this was just the beginning. Over the course of a few months, the developers would go on to face and overcome many hurdles with the code rewrite, and the project’s goals would expand to accommodate much more.

Overview of changes

Project Hades was a collaborative effort from developers Rodrigo, Blinkhawk, and epicboy. They distributed the required work among themselves and spent countless hours on coding, unit testing, game testing, and performance validation.

Blinkhawk mainly worked on implementing miscellaneous instructions, including texture sampling instructions required for decompilation. He added support for Nvidia’s VertexA shader stage, a non-standard shader stage available on Nvidia hardware that is executed before the regular VertexB shader stage.

This allowed games such as Catherine: Full Body, Bravely Default II, and A Hat in Time to render graphics for the first time. Blinkhawk also fixed an issue in yuzu’s texture cache relating to the texture streaming used in Unreal Engine 4 (UE4) games, resolving many of their rendering issues.

Note: Due to a race condition in our GPU emulation, you may need to disable Asynchronous GPU Emulation to render Catherine: Full Body correctly.

 

Catherine: Full Body

A Hat in Time

Bravely Default II

epicboy implemented almost all of the arithmetic, logical, and floating-point instructions, and developed the entire GLSL backend. GLSL is the default backend when the OpenGL API is selected in the yuzu configuration settings.

The GLSL backend rewrite was not part of the initial plan for Project Hades, as the developers only intended to work on GLASM and SPIR-V, but it was later included due to how buggy and slow some OpenGL SPIR-V compilers are. That said, some OpenGL drivers benefit greatly from the use of SPIR-V shaders, so the choice of using SPIR-V on OpenGL is left as an experimental setting.

 

Mario and Rabbids

Monochrome

Rodrigo designed the overall structure to support these changes, and developed multiple optimization passes to improve performance wherever possible. In addition to that, he rewrote the entire GLASM (GL Assembly) backend, and integrated the existing frontend rasterizers with the new backends.
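To give a feel for what an optimization pass over an SSA-style IR looks like, here is a minimal constant-folding sketch. The post does not say which passes were written, so this is a generic textbook example rather than a description of yuzu's passes; the `constant_fold` function, the opcode names, and the instruction format are all hypothetical.

```python
import operator

# Opcodes this toy pass knows how to evaluate at translation time.
_FOLDABLE = {"iadd": operator.add, "imul": operator.mul}


def constant_fold(ir):
    """Fold instructions whose operands are all known constants.

    `ir` is a list of (dest, op, args) in SSA form, where each arg is either
    an int immediate or the name (str) of an earlier value.
    Returns (remaining_instructions, known_constants).
    """
    known = {}      # value name -> constant computed at translation time
    remaining = []
    for dest, op, args in ir:
        # Substitute any operand whose value is already known.
        resolved = tuple(known.get(a, a) if isinstance(a, str) else a for a in args)
        if op in _FOLDABLE and all(isinstance(a, int) for a in resolved):
            known[dest] = _FOLDABLE[op](*resolved)   # no instruction emitted
        else:
            remaining.append((dest, op, resolved))
    return remaining, known


ir = [
    ("a", "iadd", (2, 3)),
    ("b", "imul", ("a", 4)),
    ("c", "iadd", ("b", "x")),  # "x" is only known at run time
]
print(constant_fold(ir))
```

The pass relies on each name being defined once, which is one reason an SSA representation makes this kind of optimization straightforward.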

The GLASM backend is a special path where the decompiled shaders are emitted as assembly language, skipping much of the shader compilation work on the host GPU and improving performance. Unfortunately, GLASM is only supported on Nvidia GPUs, limiting the scope of this performance boost.

 

Crash Bandicoot 4

World of Final Fantasy

While these were the major changes, there were also many minor improvements. Changing how the shaders are generated also meant changing the way shader information is presented to the renderers. This was overhauled for simplicity, which enabled some new features and made caching easier.

Vulkan Pipeline Caching

All the information required to generate pipelines from scratch now gets cached to disk, thereby removing shader stutter almost completely on Vulkan. In contrast, OpenGL can still encounter shader stutters due to unpredictable or undocumented state changes. Vulkan can still have minor stutters when new shaders are detected, but these haven’t been noticeable during our testing.
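The caching idea can be sketched as follows. This is only an illustration under stated assumptions: the key layout (a tuple of strings), the JSON file format, and the `save_pipeline_keys`, `load_pipeline_keys`, and `warm_up` helpers are hypothetical and do not describe yuzu's real on-disk format. The point is that if every key holds all the information needed to rebuild a pipeline, the whole set can be rebuilt at boot instead of during gameplay.

```python
import json
from pathlib import Path


def save_pipeline_keys(path, keys):
    """Persist every pipeline key seen so far.

    Each key is a tuple of strings that fully describes one pipeline.
    Sorting keeps the file stable between runs.
    """
    Path(path).write_text(json.dumps(sorted(keys)))


def load_pipeline_keys(path):
    """Return the stored keys, or an empty list if no cache exists yet."""
    cache_file = Path(path)
    if not cache_file.exists():
        return []
    # JSON has no tuple type, so lists are converted back to tuples.
    return [tuple(key) for key in json.loads(cache_file.read_text())]


def warm_up(path, build_pipeline):
    """Rebuild every cached pipeline up front, before gameplay starts."""
    return {key: build_pipeline(key) for key in load_pipeline_keys(path)}
```

A pipeline whose key is missing from the file would still have to be built when first encountered, which matches the remaining minor stutter on new shaders mentioned above.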

Asynchronous Pipeline Creation

yuzu already supported Asynchronous Shaders, where draw calls are skipped (rendering is paused) until the shader, or in Vulkan’s case, the pipeline, is compiled. This is nice in some cases because it allows for more consistent play sessions, minimizing stutter from shader compilations. But this also has a big drawback: it introduces graphical glitches, which are sometimes persistent throughout the play session. While this is not an issue after restarting emulation and having the shaders cached, it’s not optimal.

A better way to implement this for Vulkan was to build pipelines in parallel without skipping draw calls. In other words, yuzu continues processing GPU commands while the pipelines are being built. This allows as many pipelines to be built in parallel as there are CPU threads, minus one, while the game is executing. The result is a reduction in stutter similar to what skipping draw calls achieves.

How does this work?

To understand why this is possible, it's necessary to explain how yuzu's Vulkan command recording works. Commands are recorded and deferred to a separate thread for processing (sometimes called the "Command Submission (CS) thread").

This thread runs in parallel to the main GPU thread. This means the CS thread can build the pipelines sequentially while the main GPU thread continues its execution, periodically pushing new commands to the CS thread.

Sadly, this is not possible at the moment on OpenGL, because drivers wait on their own CS thread more often than yuzu waits on its CS thread. It may become possible if we optimize the whole OpenGL backend to avoid "glGen*" and "glGetSynciv" calls within a draw call.
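The split between the main GPU thread and the CS thread on Vulkan can be sketched with a queue and a worker thread. This is a simplified model, not yuzu's implementation: the command dictionaries, `run_cs_thread`, `worker_count`, and the string "pipelines" are all hypothetical. It shows the recording side pushing commands without waiting, while the slow pipeline build blocks only the CS side and no draw is dropped.

```python
import os
import queue
import threading


def worker_count():
    """CPU threads minus one, with a floor of one (as described in the text)."""
    return max(1, (os.cpu_count() or 1) - 1)


def run_cs_thread(commands, build_pipeline, executed):
    """Consume recorded commands, building each pipeline on first use."""
    pipelines = {}
    while True:
        cmd = commands.get()
        if cmd is None:  # sentinel: recording has finished
            break
        key = cmd["pipeline"]
        if key not in pipelines:
            # Slow step: blocks this thread only, not the recording thread.
            pipelines[key] = build_pipeline(key)
        executed.append((pipelines[key], cmd["draw"]))  # draw is never skipped


commands = queue.Queue()
executed = []
cs_thread = threading.Thread(
    target=run_cs_thread,
    args=(commands, lambda key: f"built:{key}", executed),
)
cs_thread.start()

# The "main GPU thread" keeps recording and pushing work without waiting.
for draw_id, key in enumerate(["sky", "terrain", "sky"]):
    commands.put({"pipeline": key, "draw": draw_id})
commands.put(None)
cs_thread.join()
print(executed)
```

In this model the third draw reuses the pipeline built for the first one, so only new pipelines cost build time.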

Even more!!

On top of these big improvements, we also have many minor optimizations. Some notable ones are listed below:

  • Project Hades keeps track of the number of bytes used in constant buffers and passes this information to the buffer cache. This reduces the number of uploaded bytes on some titles, thus improving performance.
  • Vulkan command submission to the GPU now happens on the separate CS thread, increasing performance by 1 to 2 FPS in Super Mario Odyssey, although presentation to screen is still being synchronized.
  • Synchronization for texture buffers between the texture cache and the buffer cache, fixing some crashes on Koei Tecmo games.
  • Generate specialized Vulkan descriptor pools, sharing pools within similar pipelines. This reduces memory consumption and boot time on most drivers, saving ~700 MiB of VRAM on AMD compared to the previous approach.
  • Usage of VK_KHR_push_descriptor when available. Reduces the overhead of updating descriptor sets on Nvidia by 57% and by 10% on Intel (measured on Super Smash Bros. Ultimate 1v1 on Final Destination). It also reduces memory consumption but this hasn’t been measured.
  • Usage of VK_EXT_conservative_rasterization and VK_EXT_provoking_vertex when available.
  • Use specialized “pre-draw” functions per pipeline to reduce unnecessary work.
  • Texture Reaper, which cleans the least used resources in your VRAM to reduce VRAM usage. We will cover this and others, in detail, in our next progress report.
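The first item in the list, tracking how many constant buffer bytes a shader actually uses, can be sketched like this. The function names and the `(index, offset, size)` read format are hypothetical, and the sketch assumes every read has a statically known offset; a read with a dynamic offset would have to fall back to the full bound size.

```python
def used_cbuf_sizes(reads):
    """Compute the highest byte touched in each constant buffer.

    `reads` is an iterable of (cbuf_index, byte_offset, byte_size) tuples
    collected while translating a shader.
    """
    sizes = {}
    for index, offset, size in reads:
        sizes[index] = max(sizes.get(index, 0), offset + size)
    return sizes


def bytes_to_upload(bound_sizes, used_sizes):
    """Clamp each bound buffer to the range the shader can actually read."""
    return {
        index: min(bound, used_sizes.get(index, 0))
        for index, bound in bound_sizes.items()
    }


reads = [(0, 0, 16), (0, 48, 16), (1, 4, 4)]
bound = {0: 65536, 1: 256, 2: 1024}
print(bytes_to_upload(bound, used_cbuf_sizes(reads)))
```

In this example the game binds 65536 bytes to buffer 0, but the shader only reads the first 64, so a buffer cache given this information could upload far less data.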

Graphical fixes

Thanks to the redesign and reimplementation of our entire shader generation code, the developers were able to investigate and identify the causes of graphical glitches in many games. In fact, some games like Yoshi's Crafted World, Trials of Mana, Minecraft Dungeons, and many others, now render almost perfectly. The Legend of Zelda: Breath of the Wild is now fully playable on Vulkan.

Breath of the Wild (fixed runes in Vulkan)

The Legend of Zelda: Breath of the Wild (EA Vs. HADES)

Yoshi's Crafted World

Bravely Default II

Minecraft Dungeons

The broken bloom, causing sand and fog in Super Mario Odyssey to render incorrectly, is now fixed!

Super Mario Odyssey (EA Vs. HADES)

Thanks to the implementation of tessellation shaders, the sand in Luigi's Mansion 3 is no longer broken!

Various graphical glitches, crashes, and general stability issues in Fire Emblem: Three Houses, Hyrule Warriors: Age of Calamity, Marvel Ultimate Alliance 3, Persona 5 Strikers, and Xenoblade Chronicles were also fixed.

Fire Emblem: Three Houses

Hyrule Warriors: Age of Calamity

Marvel Ultimate Alliance 3

Hyrule Warriors Definitive Edition

Persona 5 Strikers

Xenoblade Chronicles (EA Vs. HADES)

Hollow Knight's issue with transparent textures has been fixed.

Hollow Knight (EA Vs. HADES)

Kirby Star Allies, Mario Kart 8 Deluxe, Tony Hawk Pro Skater, Story of Seasons, and Clubhouse Games were among the many other titles that saw graphical glitches fixed. Rune Factory 4 now renders perfectly, and Rune Factory 5 has improved rendering.

Kirby Star Allies (EA Vs. HADES)

Mario Kart 8 Deluxe (EA Vs. HADES)

Tony Hawk Pro Skater (EA Vs. HADES)

Story of Seasons (EA Vs. HADES)

Rune Factory 4 (EA Vs. HADES)

Rune Factory 5

Trials of Mana

Clubhouse

Farming Simulator 20

And many more!!

Densha de Go

Final Fantasy XII

Hellblade: Senua's Sacrifice

Spyro Reignited Trilogy

Alright! Let’s talk numbers now!

Project Hades rewrote the vast majority of the GPU code and made tons of improvements and optimizations to boost performance. Since the changes touched many areas of GPU emulation, we observed performance improvements in many titles.

Below are some performance comparison charts between yuzu Early Access (1860) and Project Hades on our recommended specifications using the Vulkan API. Please note that at the time of comparison, EA 1860 was almost equivalent to yuzu Mainline, with no changes that would significantly affect performance.

Recommended Specs (* is OpenGL)

That’s not all. The improvements made to the Vulkan backend in Project Hades have greatly improved performance for AMD GPU users on Linux (RADV).

Linux AMD (RADV)

This is just a small testament to the performance improvements that Hades brings. We will be sharing more performance charts in our next progress report.

Fin

Our development efforts were massively accelerated by our testers, who tested dozens of titles for bugs, fixes, and performance regressions. Since our testing couldn’t realistically cover all titles, we request that you test and play your favourite games in yuzu and experience the improvements yourselves. While testing, if you come across any regressions, glitches, bugs, or crashes, please reach out to us via our Discord Patreon Channels. This will assist us in identifying and fixing any potential issues Project Hades might present.

That’s all we have for now, until next time! Happy emulating!

 

Please consider supporting us on Patreon!
If you would like to contribute to this project, check out our GitHub!

