This video may not change your life, but my #FluidX3D software will, if you do research in #CFD. On the same #GPU(s) it’s 100-2000x faster than expensive commercial solvers. The entire source code is on GitHub:
This 10 s video shows 10 s of real time at 1 m/s wind speed. 476×952×476 #LBM grid (215 million voxels), 28k time steps, 23 minutes for compute + rendering on my PC with a Titan Xp GPU.
How is it possible to squeeze 215 million grid points into only 12 GB?
I’m using two techniques here that together form the holy grail of lattice Boltzmann, cutting memory demand down to only 55 bytes/node for D3Q19 LBM, about 1/6 of what conventional LBM codes need:
1. In-place streaming with Esoteric-Pull. This almost cuts memory demand in half and slightly increases performance due to implicit bounce-back boundaries.
Paper:
2. Decoupled arithmetic precision (FP32) and memory precision (FP16): all arithmetic is done in FP32, but LBM density distribution functions in memory are compressed to FP16. This almost cuts memory demand in half and almost doubles performance, without impacting overall accuracy for most setups.
Paper:
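A quick back-of-the-envelope check of these numbers, as a sketch. The per-node layout is an assumed breakdown (not taken from the FluidX3D source) that happens to reproduce the 55 B figure: 19 DDFs, plus 4 B density, 12 B velocity and 1 B flags. "A-B" means two DDF copies (read one, write the other), and the "conventional" reference is assumed to be an FP64 code with that two-copy layout. `bytes_per_node` is a hypothetical helper.

```python
def bytes_per_node(q=19, ddf_bytes=4, ddf_copies=2, rho_bytes=4, u_bytes=12, flag_bytes=1):
    """Bytes per lattice node: q DDFs (times the number of copies) plus density,
    velocity and a flag byte. Field sizes are assumptions for illustration only."""
    return q * ddf_bytes * ddf_copies + rho_bytes + u_bytes + flag_bytes


nodes = 476 * 952 * 476  # grid from the video: 215,700,352 nodes

layouts = {
    "FP64 A-B (assumed conventional)": bytes_per_node(ddf_bytes=8, rho_bytes=8, u_bytes=24),
    "FP32 A-B": bytes_per_node(),                                 # 169 B/node
    "FP32 in-place": bytes_per_node(ddf_copies=1),                # Esoteric-Pull: 93 B/node
    "FP16 in-place": bytes_per_node(ddf_bytes=2, ddf_copies=1),   # + FP16 storage: 55 B/node
}

for name, b in layouts.items():
    print(f"{name:32s} {b:4d} B/node  {nodes * b / 1e9:6.2f} GB")
```

Each step roughly halves the footprint (169 → 93 → 55 B/node). Under these assumptions the grid needs about 11.9 GB at 55 B/node, which just fits in 12 GB, while the assumed FP64 two-copy layout (337 B/node) would need roughly 73 GB.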
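A minimal sketch of the decoupled-precision idea from point 2: DDFs are compressed to 16 bits only when written to memory and expanded again before any arithmetic. Two assumptions: it uses plain IEEE 754 half precision (Python's `struct` format `'e'`) instead of the customized 16-bit formats described in the FP16 paper, and it stores each DDF shifted by its D3Q19 lattice weight (f_i - w_i), a common trick so that the small deviations from equilibrium get the full FP16 mantissa. `store` and `load` are hypothetical names, and Python floats (FP64) stand in for the FP32 arithmetic.

```python
import struct

# D3Q19 lattice weights: 1 rest direction, 6 face neighbours, 12 edge neighbours
W = [1 / 3] + [1 / 18] * 6 + [1 / 36] * 12


def to_fp16(x: float) -> bytes:
    """Round a float to IEEE 754 half precision and pack it into 2 bytes."""
    return struct.pack("<e", x)


def from_fp16(b: bytes) -> float:
    """Unpack 2 bytes of IEEE 754 half precision into a Python float."""
    return struct.unpack("<e", b)[0]


def store(f: list[float]) -> bytes:
    """Hypothetical write path: shift each DDF by its weight, then compress to FP16."""
    return b"".join(to_fp16(fi - wi) for fi, wi in zip(f, W))


def load(buf: bytes) -> list[float]:
    """Hypothetical read path: expand FP16 and undo the shift before any arithmetic."""
    return [from_fp16(buf[2 * i : 2 * i + 2]) + W[i] for i in range(len(W))]


# Fluid close to rest: every DDF deviates from its weight by a small amount
f = [wi + 1e-4 for wi in W]
buf = store(f)  # 19 DDFs x 2 bytes = 38 bytes per node
shifted_err = max(abs(a - b) for a, b in zip(load(buf), f))
plain_err = max(abs(from_fp16(to_fp16(fi)) - fi) for fi in f)
print(len(buf), shifted_err, plain_err)
```

In this example the shifted round trip stays below 1e-7 per DDF, whereas rounding the unshifted rest DDF (about 1/3) straight to FP16 is off by about 6e-5, more than half of the 1e-4 perturbation it is supposed to carry.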
Graphics are done directly in FluidX3D with #OpenCL, with the raw simulation data already residing in ultra-fast video memory. No volumetric data (1 frame of the velocity field is !) ever has to be copied to the CPU or hard drive; only the rendered 1080p frames (8 MB each) are. Once on the CPU side, a copy of each frame is made in memory and a thread is detached to handle the slow .png compression, while the simulation already continues.
Paper:
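The frame hand-off described above can be sketched like this: the render loop snapshots its frame buffer, then a background thread does the slow PNG encoding while the caller moves on to the next time step. This is a Python illustration under assumptions, not FluidX3D's C++ code: `export_frame_async` and `encode_png` are hypothetical names, frames are assumed to be 8-bit RGB, and a daemon thread stands in for a detached C++ thread. For scale, a 1920×1080 frame at an assumed 4 bytes per pixel is 8,294,400 bytes, i.e. the ~8 MB quoted above.

```python
import struct
import threading
import zlib


def encode_png(rgb: bytes, width: int, height: int) -> bytes:
    """Minimal 8-bit RGB PNG encoder (no filtering); this is the slow part."""

    def chunk(tag: bytes, data: bytes) -> bytes:
        # Each PNG chunk: length, type, payload, CRC-32 over type + payload
        crc = zlib.crc32(tag + data) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

    stride = width * 3
    # Every scanline is prefixed with filter type 0 ("None")
    raw = b"".join(b"\x00" + rgb[y * stride : (y + 1) * stride] for y in range(height))
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)  # 8 bit, truecolour RGB
    return (
        b"\x89PNG\r\n\x1a\n"
        + chunk(b"IHDR", ihdr)
        + chunk(b"IDAT", zlib.compress(raw, 9))
        + chunk(b"IEND", b"")
    )


def export_frame_async(frame: bytearray, width: int, height: int, path: str) -> threading.Thread:
    """Hypothetical hand-off: snapshot the frame, then compress and write it in the background."""
    snapshot = bytes(frame)  # copy now, so the renderer may overwrite `frame` immediately

    def worker() -> None:
        with open(path, "wb") as out:
            out.write(encode_png(snapshot, width, height))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread  # caller keeps simulating and can join() before exiting
```

Unlike a truly detached C++ thread, a Python daemon thread is killed when the interpreter exits, so the sketch returns the thread handle and the caller should join any outstanding exports before shutting down.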