The added triangle interpolation unit helps with exactly that. On Turing's RT cores, intersection tests against objects (triangles) in motion were difficult and slow, because the applied motion blur made it harder to pinpoint which triangle a ray had hit. The 2nd-generation RT cores introduce an interesting feature called motion blur acceleration. While the basic BVH traversal and ray-triangle testing are unchanged, NVIDIA has added an extra unit to the RT core that interpolates the triangle's position to the ray's timestamp before the ray-triangle intersection test.

To allow scheduling of both integer and floating-point workloads on the shared datapath, the L1 cache bandwidth had to be doubled: 128 bytes per clock per Ampere SM versus 64 bytes per clock in Turing. That puts L1 bandwidth for the RTX 3080 at 219 GB/s, versus 116 GB/s for the RTX 2080 Super.

This means that the 2x FP32 throughput, or 128 FMA operations per SM per clock, that NVIDIA is touting holds only when a workload is composed purely of FP32 instructions, which is rarely the case. When INT32 instructions (mostly address and index calculations) are in the mix, some of the shared FP32/INT32 cores are occupied by them, reducing the peak FP32 throughput. This is why we don't see a 2x jump in performance even though the FP32 core count doubles. In practice, the 128 FMA per SM figure is a best-case scenario; for the most part you'll get around 75 to 90 FMA per clock. That is still a notable step up over Turing: integer instruction counts are much lower than FP32, so shader utilization should be notably better with this configuration.
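To make the effect of a mixed instruction stream concrete, here is a toy model of an Ampere SM with one 64-lane FP32-only datapath and one 64-lane shared FP32/INT32 datapath. The function name and the simple lane-stealing rule are my own illustration, not NVIDIA's actual scheduling logic:

```python
# Toy model of Ampere's dual-datapath SM (an illustration, not NVIDIA's
# scheduler): one 64-lane FP32-only datapath plus one 64-lane datapath shared
# between FP32 and INT32. INT32 work occupies lanes on the shared datapath,
# pulling the achievable FP32 FMA rate below the 128-per-clock peak.

def effective_fp32_fma_per_clock(int32_fraction: float) -> float:
    """Estimate FP32 FMAs per SM per clock for a given INT32 instruction mix.

    int32_fraction: fraction of issued instructions that are INT32 (0.0-1.0).
    """
    dedicated_fp32 = 64          # FP32-only datapath
    shared = 64                  # FP32/INT32 shared datapath
    # INT32 instructions are assumed to occupy the shared datapath first.
    int32_lanes = min(shared, int32_fraction * (dedicated_fp32 + shared))
    return dedicated_fp32 + (shared - int32_lanes)

print(effective_fp32_fma_per_clock(0.0))   # pure FP32 → 128.0 (peak)
print(effective_fp32_fma_per_clock(0.25))  # 1 INT32 per 3 FP32 → 96.0
print(effective_fp32_fma_per_clock(1.0))   # pure INT32 → 64.0
```

Even this crude model lands in the right ballpark: typical shader mixes put the effective rate well below the 128 FMA peak, consistent with the 75 to 90 FMA range mentioned above.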
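The L1 bandwidth figures quoted earlier can be sanity-checked with simple arithmetic. The boost clocks assumed below (1710 MHz for the RTX 3080, 1815 MHz for the RTX 2080 Super) are NVIDIA's published specs; reading the 219/116 GB/s numbers as per-SM bandwidth at boost clock is my interpretation:

```python
# Back-of-the-envelope check of the L1 bandwidth figures, assuming NVIDIA's
# quoted boost clocks and the stated bytes-per-clock widths per SM.

def l1_bw_gb_per_s(bytes_per_clock: int, boost_clock_mhz: int) -> float:
    return bytes_per_clock * boost_clock_mhz * 1e6 / 1e9

print(l1_bw_gb_per_s(128, 1710))  # Ampere SM → ~218.9 GB/s
print(l1_bw_gb_per_s(64, 1815))   # Turing SM → ~116.2 GB/s
```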
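Conceptually, the interpolation unit's job can be sketched in software: lerp each vertex of a moving triangle to the ray's timestamp, then run an ordinary intersection test (Möller–Trumbore here) on the result. The function names and the linear-motion assumption are illustrative only, not NVIDIA's actual hardware logic:

```python
# Software sketch of motion blur acceleration: interpolate the triangle's
# vertices to the ray's time, then do a standard ray-triangle test.

def lerp(a, b, t):
    return tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def intersect_moving_triangle(origin, direction, tri_t0, tri_t1, ray_time, eps=1e-9):
    """Interpolate the triangle to ray_time, then run Moller-Trumbore.

    tri_t0 / tri_t1: the three vertex positions at the start/end of the
    motion interval (assumed linear motion).
    Returns the hit distance along the ray, or None on a miss.
    """
    v0, v1, v2 = (lerp(a, b, ray_time) for a, b in zip(tri_t0, tri_t1))
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                      # ray parallel to triangle plane
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return t if t > eps else None

# A triangle at z=1 that slides 5 units along x over the motion interval:
tri_t0 = ((-1, -1, 1), (1, -1, 1), (0, 1, 1))
tri_t1 = ((4, -1, 1), (6, -1, 1), (5, 1, 1))

print(intersect_moving_triangle((0, 0, 0), (0, 0, 1), tri_t0, tri_t1, 0.0))  # → 1.0
print(intersect_moving_triangle((0, 0, 0), (0, 0, 1), tri_t0, tri_t1, 1.0))  # → None
```

The same ray hits the triangle at the start of the motion interval but misses it at the end, which is exactly the time-dependent result the hardware unit has to produce before the fixed-function intersection test can run.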