glTF in Unity optimization - 4. New Mesh API - The Failed Attempt
This is part 4 of a mini-series.
Goal
With version 2019.3, Unity introduced a new Advanced Mesh API for creating meshes and announced the following advantages:
- Faster (yay!)
- Flexible vertex attribute data layout
It is supposed to be faster because it skips the validation checks that the simple API performs. In this post we will see whether that holds for my use cases.
My plan is to start at the rear end (mesh data submission) and gradually improve the workflow back towards data retrieval from glTF buffers:
- Replace simple API calls with advanced API calls
- Replace existing data structures (C# arrays) with Unity NativeArrays
- Experiment with data types (instead of using floats for everything, use smaller types, especially if the original glTF type is not a float)
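To illustrate the third item: glTF allows texture coordinates to be stored as normalized unsigned shorts, which decode to floats as `c / 65535`. The sketch below is plain C# with hypothetical names (`NormalizedUvSketch` is not part of glTFast or Unity). It only shows what a float-only path has to do per component and how the size per UV differs.

```csharp
using System;

// Hypothetical helper for illustration; not part of glTFast or Unity.
public static class NormalizedUvSketch
{
    // Bytes needed for one two-component UV in each representation.
    public const int BytesPerUvAsUShort = 2 * sizeof(ushort); // 4 bytes
    public const int BytesPerUvAsFloat = 2 * sizeof(float);   // 8 bytes

    // glTF decodes a normalized unsigned short c as c / 65535.
    public static float Decode(ushort c) => c / 65535f;

    // Float-only path: every component is converted and doubles in size.
    public static float[] ToFloats(ushort[] raw)
    {
        var result = new float[raw.Length];
        for (int i = 0; i < raw.Length; i++)
        {
            result[i] = Decode(raw[i]);
        }
        return result;
    }
}
```

Keeping the original 16-bit values would avoid both the conversion loop and the doubled memory, provided the mesh API accepts that format for the attribute.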
Step 1: Advanced API calls
The first thing I did was replace the simple API calls with the advanced ones (see commit).
This is an approach from the rear end, where all vertex data has already been retrieved from the buffers in the form of arrays (e.g. Vector3[]) and is ready to be pushed. I created one vertex stream per attribute.
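A minimal sketch of what "one vertex stream per attribute" can look like with the advanced API, reduced to positions and normals. The class and method names (`AdvancedMeshSketch`, `Create`) are hypothetical and this is not the code from the commit. It assumes Unity 2019.3 or newer and 32-bit indices.

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical helper for illustration; not taken from glTFast.
public static class AdvancedMeshSketch
{
    public static Mesh Create(Vector3[] positions, Vector3[] normals, int[] indices)
    {
        var mesh = new Mesh();

        // Declare the layout: positions go into stream 0, normals into stream 1.
        mesh.SetVertexBufferParams(
            positions.Length,
            new VertexAttributeDescriptor(VertexAttribute.Position, VertexAttributeFormat.Float32, 3, 0),
            new VertexAttributeDescriptor(VertexAttribute.Normal, VertexAttributeFormat.Float32, 3, 1));

        // Copy each array into its own stream (last argument is the stream index).
        mesh.SetVertexBufferData(positions, 0, 0, positions.Length, 0);
        mesh.SetVertexBufferData(normals, 0, 0, normals.Length, 1);

        // int is 32 bits wide, so the index format has to be UInt32.
        mesh.SetIndexBufferParams(indices.Length, IndexFormat.UInt32);
        mesh.SetIndexBufferData(indices, 0, 0, indices.Length);

        // Sub-meshes have to be described explicitly with the advanced API.
        mesh.subMeshCount = 1;
        mesh.SetSubMesh(0, new SubMeshDescriptor(0, indices.Length, MeshTopology.Triangles));

        // Kept as a separate call because the timings list it as its own step.
        mesh.RecalculateBounds();
        return mesh;
    }
}
```

Every additional attribute adds another descriptor and another stream index in this scheme, so the number of streams grows with the number of attributes.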
Test 1: Full high resolution mesh
The first comparison loads a high-resolution mesh with UVs, normals and tangents (repeated 10 times).
Method | glTFast 0.11.0 | glTFast dev | speedup |
---|---|---|---|
SetVertices | 20.51 ms | 3.17 ms | 6.5 x |
SetIndices | 66.42 ms | 29.34 ms | 2.3 x |
SetUVs | 31.99 ms | 2.07 ms | 15.45 x |
SetNormals | 54.14 ms | 3.16 ms | 17.1 x |
SetTangents | 93.80 ms | 4.42 ms | 21.2 x |
RecalculateBounds | 34.63 ms | 30.02 ms | 1.2 x |
UploadMeshData | 117.70 ms | 121.69 ms | 1.0 x |
Those are some great improvements 😀! Setting vertex data is 6x to 21x faster and setting indices is about twice as fast. As a result, the total loading time for 10 huge meshes went down from 8.0 sec to 5.5 sec (45% faster).
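The "45% faster" figure is consistent with dividing the old total by the new one, which is not the same as the share of time saved. A small sketch of both calculations, with hypothetical names, assuming this is how the percentages were derived:

```csharp
using System;

// Hypothetical helper for illustration; assumes "faster" means old / new - 1.
public static class LoadTimeComparison
{
    // How many times faster the new run is (old time divided by new time).
    public static double SpeedupFactor(double oldSeconds, double newSeconds)
        => oldSeconds / newSeconds;

    // "Percent faster": (old / new - 1) * 100.
    public static double PercentFaster(double oldSeconds, double newSeconds)
        => (oldSeconds / newSeconds - 1.0) * 100.0;

    // Share of the original time that was saved: (1 - new / old) * 100.
    public static double PercentTimeSaved(double oldSeconds, double newSeconds)
        => (1.0 - newSeconds / oldSeconds) * 100.0;
}
```

For 8.0 sec versus 5.5 sec, `PercentFaster` rounds to 45, while `PercentTimeSaved` rounds to 31. The per-call speedup columns in the tables follow the `SpeedupFactor` form.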
Test 2: High resolution mesh without normals and tangents
The second test uses the same mesh, but without normals and tangents.
Note: In previous posts of this mini-series it became clear that normal/tangent calculations are a bottleneck. Still, I'd like to see whether the new mesh API improves the situation.
Method | glTFast 0.11.0 | glTFast dev | speedup |
---|---|---|---|
SetVertices | 32.64 ms | 3.09 ms | 10.6 x |
SetIndices | 69.80 ms | 27.38 ms | 2.6 x |
SetUVs | 35.42 ms | 2.27 ms | 15.6 x |
RecalculateNormals | 130.61 ms | 75.62 ms | 1.7 x |
RecalculateTangents | 960.36 ms | 892.66 ms | 1.1 x |
RecalculateBounds | 35.81 ms | 26.77 ms | 1.3 x |
UploadMeshData | 134.27 ms | 132.60 ms | 1.0 x |
The normal and especially the tangent calculations are still devastating, but they got a bit faster. Other than that, we see similar results. Setting positions is even 10 times faster. The total time went down from 16.4 sec to 14.5 sec (13% faster).
Test 3: Generic sample sets
Moving on to sample sets with more variation and practical, real-world features.
Sample set | glTFast 0.11.0 | glTFast dev | speedup |
---|---|---|---|
glTF sample models | 9.48 sec | 8.82 sec | +7.5% |
furniture set | 9.08 sec | 8.32 sec | +10% |
This looked promising at first sight, but then I noticed that I had introduced regressions (such as missing support for a second UV set or vertex colors). When I tried to fix those in order to re-run the tests, Unity suddenly crashed 😱.
I tracked the crash down to Mesh.SetVertexBufferParams. It turns out the troublesome mesh primitive uses positions, normals, tangents and two sets of texture coordinates. Reading the docs carefully, I found that Unity supports a maximum of four vertex streams. With five attributes, my "one stream per attribute" approach exceeded this limit and caused Unity to crash.
I filed a bug report and Unity fixed the crash in 2020.2. Using more than four streams still won't work, but at least fellow developers no longer have to wonder why it crashes 💯.
I investigated the code base a bit and came to the conclusion that, for the moment, this is a dead end. The experimental Mesh API support stops here 🛑.
Next up
In order to use the advanced Mesh API, I have to group vertex attributes in a way that results in four vertex streams or fewer, no matter how many attributes there are.
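A minimal sketch of such a grouping, assuming a fixed assignment that is purely hypothetical: positions alone, normals with tangents, all UV sets together, and everything else in a fourth group. The `Attr` enum and `StreamPlanner` class are made up for illustration and do not reflect the layout glTFast ended up with.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical attribute list for illustration; not Unity's VertexAttribute enum.
public enum Attr { Position, Normal, Tangent, TexCoord0, TexCoord1, Color }

public static class StreamPlanner
{
    public const int MaxStreams = 4;

    // Assumed grouping: four groups at most, so the stream limit cannot be exceeded.
    static int GroupOf(Attr attribute)
    {
        switch (attribute)
        {
            case Attr.Position: return 0;
            case Attr.Normal:
            case Attr.Tangent: return 1;
            case Attr.TexCoord0:
            case Attr.TexCoord1: return 2;
            default: return 3; // colors and anything else
        }
    }

    // Maps each present attribute to a stream index in the range 0..MaxStreams-1.
    public static Dictionary<Attr, int> Assign(IEnumerable<Attr> present)
    {
        var attributes = new List<Attr>(present);

        // Collect the groups that are actually used, in ascending order.
        var usedGroups = new SortedSet<int>();
        foreach (var attribute in attributes) usedGroups.Add(GroupOf(attribute));

        // Number the used groups 0, 1, 2, ... so that no stream index is skipped.
        var streamOfGroup = new Dictionary<int, int>();
        int nextStream = 0;
        foreach (var group in usedGroups) streamOfGroup[group] = nextStream++;

        var result = new Dictionary<Attr, int>();
        foreach (var attribute in attributes) result[attribute] = streamOfGroup[GroupOf(attribute)];
        return result;
    }
}
```

For the primitive that triggered the crash (positions, normals, tangents and two UV sets) this yields three streams instead of five. The price is that attributes sharing a stream have to be interleaved in memory, which is why the data retrieval itself has to change.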
This totally screwed up my plan of starting at the "rear end" and improving from there in tiny steps. I'll have to refactor the data retrieval from start to end in order to support this. The positive initial results motivate me to do exactly that, so I decided to draw a line under the current version of glTFast and call it version 1.0 before I proceed with this major refactor.
On the plus side, I'll build the refactored version on NativeArrays from the start, so that's two things in one sweep.
Follow me on Twitter or subscribe to the feed to not miss updates on this topic.