FluidX3D

The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.

<a href="https://youtu.be/-MkRBeQkLk8"><img src="https://img.youtube.com/vi/o3TPN142HxM/maxresdefault.jpg" width="50%"></img></a><a href="https://youtu.be/oC6U1M0Fsug"><img src="https://img.youtube.com/vi/oC6U1M0Fsug/maxresdefault.jpg" width="50%"></img></a><br> <a href="https://youtu.be/XOfXHgP4jnQ"><img src="https://img.youtube.com/vi/XOfXHgP4jnQ/maxresdefault.jpg" width="50%"></img></a><a href="https://youtu.be/K5eKxzklXDA"><img src="https://img.youtube.com/vi/K5eKxzklXDA/maxresdefault.jpg" width="50%"></img></a> (click on images to show videos on YouTube)

<details><summary>Update History</summary> </details>

How to get started?

Read the FluidX3D Documentation!

Compute Features - Getting the Memory Problem under Control

<!-- markdown equations don't render properly in mobile browser - streaming (part 2/2): $$j=0\\ \textrm{for}\\ i=0$$ $$j=t\\%2\\ ?\\ i\\ :\\ (i\\%2\\ ?\\ i+1\\ :\\ i-1)\\ \textrm{for}\\ i\in[1,q-1]$$ $$f_i^\textrm{temp}(\vec{x},t)=f_j(i\\%2\\ ?\\ \vec{x}\\ :\\ \vec{x}-\vec{e}_i,\\ t)$$ - collision: $$\rho(\vec{x},t)=\left(\sum_i f_i^\textrm{temp}(\vec{x},t)\right)+1$$ $$\vec{u}(\vec{x},t)=\frac{1}{\rho(\vec{x},t)}\sum_i\vec{c}_i f_i^\textrm{temp}(\vec{x},t)$$ $$f_i^\textrm{eq-shifted}(\vec{x},t)=w_i \rho \cdot\left(\frac{(\vec{u} _{^{^\circ}}\vec{c}_i)^2}{2 c^4}-\frac{\vec{u} _{^{^\circ}}\vec{u}}{2 c^2}+\frac{\vec{u} _{^{^\circ}}\vec{c}_i}{c^2}\right)+w_i (\rho-1)$$ $$f_i^\textrm{temp}(\vec{x},\\ t+\Delta t)=f_i^\textrm{temp}(\vec{x},t)+\Omega_i(f_i^\textrm{temp}(\vec{x},t),\\ f_i^\textrm{eq-shifted}(\vec{x},t),\\ \tau)$$ - streaming (part 1/2): $$j=0\\ \textrm{for}\\ i=0$$ $$j=t\\%2\\ ?\\ (i\\%2\\ ?\\ i+1\\ :\\ i-1)\\ :\\ i\\ \textrm{for}\\ i\in[1,q-1]$$ $$f_j(i\\%2\\ ?\\ \vec{x}+\vec{e}_i\\ :\\ \vec{x},\\ t+\Delta t)=f_i^\textrm{temp}(\vec{x},\\ t+\Delta t)$$ -->

Solving the Visualization Problem

Solving the Compatibility Problem

Single-GPU/CPU Benchmarks

Here are performance benchmarks on various hardware in MLUPs/s, that is, million lattice cell updates per second. The benchmark uses D3Q19 SRT with no extensions enabled (only LBM with implicit mid-grid bounce-back boundaries), and the setup is an empty cubic box of sufficient size (typically 256³). Without extensions, a single lattice cell requires:

In consequence, the arithmetic intensity of this implementation is 2.37 (FP32/FP32), 5.27 (FP32/FP16S) or 16.56 (FP32/FP16C) FLOPs/Byte, so performance is limited only by memory bandwidth. The left 3 columns of the table show the hardware specs as found in the data sheets (theoretical peak FP32 compute performance, memory capacity, theoretical peak memory bandwidth). The right 3 columns show the measured FluidX3D performance for the FP32/FP32, FP32/FP16S and FP32/FP16C floating-point precision settings, with the roofline model efficiency in round brackets, indicating what percentage of the theoretical peak memory bandwidth is being used.
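
The roofline percentages can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the bytes-per-cell figures are not quoted in the paragraph above and are assumed here to be one load and one store of each of the 19 D3Q19 density distribution functions plus one flag byte (153 bytes with FP32 storage, 77 bytes with FP16 storage). The function and dictionary names are made up for illustration and are not part of FluidX3D.

```python
# Bytes moved per cell update. ASSUMPTION (not stated in the text above):
# 19 DDFs loaded + 19 DDFs stored per step, plus 1 flag byte.
BYTES_PER_CELL = {
    "FP32/FP32": 19 * 2 * 4 + 1,   # 153 bytes, 4-byte DDFs
    "FP32/FP16S": 19 * 2 * 2 + 1,  # 77 bytes, 2-byte DDFs
    "FP32/FP16C": 19 * 2 * 2 + 1,  # 77 bytes, 2-byte DDFs
}

# Arithmetic intensity in FLOPs/Byte, as quoted in the text.
ARITHMETIC_INTENSITY = {
    "FP32/FP32": 2.37,
    "FP32/FP16S": 5.27,
    "FP32/FP16C": 16.56,
}


def roofline_efficiency(mlups, bandwidth_gbs, precision):
    """Fraction of the theoretical peak memory bandwidth that is used."""
    used_gbs = mlups * 1e6 * BYTES_PER_CELL[precision] / 1e9
    return used_gbs / bandwidth_gbs


def machine_balance(peak_tflops, bandwidth_gbs):
    """FLOPs the device can execute per byte it can move (data sheet values)."""
    return peak_tflops * 1e12 / (bandwidth_gbs * 1e9)


def is_memory_bound(peak_tflops, bandwidth_gbs, precision):
    """True if the kernel needs fewer FLOPs per byte than the device offers."""
    return ARITHMETIC_INTENSITY[precision] < machine_balance(peak_tflops, bandwidth_gbs)


if __name__ == "__main__":
    # H100 PCIe row: 51.01 TFlops/s, 2000 GB/s, 11128 MLUPs/s in FP32/FP32.
    print(f"{roofline_efficiency(11128, 2000, 'FP32/FP32'):.0%}")
    print(is_memory_bound(51.01, 2000, "FP32/FP16C"))
```

With these assumed byte counts the H100 PCIe row evaluates to the 85% listed in the table. On devices with a low ratio of compute to bandwidth, the same check can come out compute-bound for FP32/FP16C, which is one possible reason for the lower FP16C percentages on some integrated GPUs and CPUs.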

If your GPU/CPU is not on the list yet, you can report your benchmarks here.

gantt

title FluidX3D Performance [MLUPs/s] - FP32 arithmetic, (fastest of FP32/FP16S/FP16C) memory storage
dateFormat X
axisFormat %s
%%{
	init: {
		'theme': 'forest',
		'themeVariables': {
			'sectionBkgColor': '#99999999',
			'sectionBkgColor2': '#99999999',
			'altSectionBkgColor': '#00000000',
			'titleColor': '#7F7F7F',
			'textColor': '#7F7F7F',
			'taskTextColor': 'lightgray',
			'taskBorderColor': '#487E3A'
		}
	}
}%%

section MI300X
	38207 :crit, 0, 38207
section MI250 (1 GCD)
	9030 :crit, 0, 9030
section MI210
	9547 :crit, 0, 9547
section MI100
	8542 :crit, 0, 8542
section MI60
	5111 :crit, 0, 5111
section Radeon VII
	7778 :crit, 0, 7778
section GPU Max 1100
	6209 :done, 0, 6209
section GH200 94GB GPU
	34689 : 0, 34689
section H100 NVL
	32613 : 0, 32613
section H100 PCIe
	20624 : 0, 20624
section A100 SXM4 80GB
	18448 : 0, 18448
section PG506-242/243
	15654 : 0, 15654
section A100 PCIe 80GB
	17896 : 0, 17896
section A100 SXM4 40GB
	16013 : 0, 16013
section A100 PCIe 40GB
	16035 : 0, 16035
section CMP 170HX
	12392 : 0, 12392
section A30
	9721 : 0, 9721
section V100 SXM2 32GB
	8947 : 0, 8947
section V100 PCIe 16GB
	10325 : 0, 10325
section GV100
	6641 : 0, 6641
section Titan V
	7253 : 0, 7253
section P100 PCIe 16GB
	5950 : 0, 5950
section P100 PCIe 12GB
	4141 : 0, 4141
section GTX TITAN
	2500 : 0, 2500
section K40m
	1868 : 0, 1868
section K80 (1 GPU)
	1642 : 0, 1642
section K20c
	1507 : 0, 1507
section RX 7900 XTX
	7716 :crit, 0, 7716
section PRO W7900
	5939 :crit, 0, 5939
section RX 7900 XT
	5986 :crit, 0, 5986
section PRO W7800
	4426 :crit, 0, 4426
section PRO W7700
	2943 :crit, 0, 2943
section RX 7600
	2561 :crit, 0, 2561
section PRO W7600
	2287 :crit, 0, 2287
section PRO W7500
	1682 :crit, 0, 1682
section RX 6900 XT
	4227 :crit, 0, 4227
section RX 6800 XT
	4241 :crit, 0, 4241
section PRO W6800
	3361 :crit, 0, 3361
section RX 6700 XT
	2908 :crit, 0, 2908
section RX 6800M
	3213 :crit, 0, 3213
section RX 6700M
	2429 :crit, 0, 2429
section RX 6600
	1839 :crit, 0, 1839
section RX 6500 XT
	1030 :crit, 0, 1030
section RX 5700 XT
	3253 :crit, 0, 3253
section RX 5700
	3167 :crit, 0, 3167
section RX 5600 XT
	2214 :crit, 0, 2214
section RX Vega 64
	3227 :crit, 0, 3227
section RX 590
	1688 :crit, 0, 1688
section RX 580 4GB
	1848 :crit, 0, 1848
section RX 580 2048SP 8GB
	1622 :crit, 0, 1622
section R9 390X
	2217 :crit, 0, 2217
section HD 7850
	635 :crit, 0, 635
section Arc A770 LE
	4568 :done, 0, 4568
section Arc A750 LE
	4314 :done, 0, 4314
section Arc A580
	3889 :done, 0, 3889
section Arc A380
	1115 :done, 0, 1115
section RTX 4090
	11496 : 0, 11496
section RTX 6000 Ada
	10293 : 0, 10293
section L40S
	7637 : 0, 7637
section RTX 4080 Super
	8218 : 0, 8218
section RTX 4080
	7933 : 0, 7933
section RTX 4070 Ti Super
	7295 : 0, 7295
section RTX 4070
	5016 : 0, 5016
section RTX 4080M
	5114 : 0, 5114
section RTX 4000 Ada
	4221 : 0, 4221
section RTX 4060
	3124 : 0, 3124
section RTX 4070M
	3092 : 0, 3092
section RTX 2000 Ada
	2526 : 0, 2526
section RTX 3090 Ti
	10956 : 0, 10956
section RTX 3090
	10732 : 0, 10732
section RTX 3080 Ti
	9832 : 0, 9832
section RTX 3080 12GB
	9657 : 0, 9657
section RTX A6000
	8814 : 0, 8814
section RTX 3080 10GB
	8118 : 0, 8118
section RTX 3080M Ti
	5908 : 0, 5908
section RTX 3070
	5096 : 0, 5096
section RTX 3060 Ti
	5129 : 0, 5129
section RTX A4000
	4945 : 0, 4945
section RTX A5000M
	4461 : 0, 4461
section RTX 3060
	4070 : 0, 4070
section RTX 3060M
	4012 : 0, 4012
section RTX 3050M Ti
	2341 : 0, 2341
section RTX 3050M
	2339 : 0, 2339
section Titan RTX
	7554 : 0, 7554
section RTX 6000
	6879 : 0, 6879
section RTX 8000 Passive
	5607 : 0, 5607
section RTX 2080 Ti
	6853 : 0, 6853
section RTX 2080 Super
	5284 : 0, 5284
section RTX 5000
	4773 : 0, 4773
section RTX 2070 Super
	4893 : 0, 4893
section RTX 2060 Super
	5035 : 0, 5035
section RTX 4000
	4584 : 0, 4584
section RTX 2060 KO
	3376 : 0, 3376
section RTX 2060
	3604 : 0, 3604
section GTX 1660 Super
	3551 : 0, 3551
section T4
	2887 : 0, 2887
section GTX 1660 Ti
	3041 : 0, 3041
section GTX 1660
	1992 : 0, 1992
section GTX 1650M 896C
	1858 : 0, 1858
section GTX 1650M 1024C
	1400 : 0, 1400
section T500
	665 : 0, 665
section Titan Xp
	5495 : 0, 5495
section GTX 1080 Ti
	4877 : 0, 4877
section GTX 1080
	3182 : 0, 3182
section GTX 1060 6GB
	1925 : 0, 1925
section GTX 1060M
	1882 : 0, 1882
section GTX 1050M Ti
	1224 : 0, 1224
section P1000
	839 : 0, 839
section GTX 970
	1721 : 0, 1721
section M4000
	1519 : 0, 1519
section M60 (1 GPU)
	1571 : 0, 1571
section GTX 960M
	872 : 0, 872
section GTX 770
	1215 : 0, 1215
section GTX 680 4GB
	1274 : 0, 1274
section K2000
	444 : 0, 444
section GT 630 (OEM)
	185 : 0, 185
section NVS 290
	9 : 0, 9
section Arise 1020
	6 :active, 0, 6
section M2 Max (38-CU, 32GB)
	4641 :done, 0, 4641
section M1 Ultra (64-CU, 128GB)
	8418 :done, 0, 8418
section M1 Max (24-CU, 32GB)
	4496 :done, 0, 4496
section M1 Pro (16-CU, 16GB)
	2329 :done, 0, 2329
section M1 (8-CU, 16GB)
	759 :done, 0, 759
section Radeon Graphics (7800X3D)
	498 :crit, 0, 498
section 780M (Z1 Extreme)
	860 :crit, 0, 860
section Vega 8 (4750G)
	511 :crit, 0, 511
section Vega 8 (3500U)
	288 :crit, 0, 288
section Arc 140V GPU (16GB)
	1189 :done, 0, 1189
section Arc Graphics (Ultra 9 185H)
	724 :done, 0, 724
section Iris Xe Graphics (i7-1265U)
	621 :done, 0, 621
section UHD Xe 32EUs
	245 :done, 0, 245
section UHD 770
	475 :done, 0, 475
section UHD 630
	301 :done, 0, 301
section UHD P630
	288 :done, 0, 288
section HD 5500
	192 :done, 0, 192
section HD 4600
	115 :done, 0, 115
section Orange Pi 5 Mali-G610 MP4
	232 :active, 0, 232
section Samsung Mali-G72 MP18 (S9+)
	230 :active, 0, 230
section 2x EPYC 9754
	5179 :crit, 0, 5179
section 2x EPYC 9654
	1814 :crit, 0, 1814
section 2x EPYC 7352
	739 :crit, 0, 739
section 2x EPYC 7313
	498 :crit, 0, 498
section 2x EPYC 7302
	784 :crit, 0, 784
section 2x 6980P
	7875 :done, 0, 7875
section 2x 6979P
	8135 :done, 0, 8135
section 2x Platinum 8592+
	3135 :done, 0, 3135
section 2x CPU Max 9480
	2037 :done, 0, 2037
section 2x Platinum 8480+
	2162 :done, 0, 2162
section 2x Platinum 8380
	1410 :done, 0, 1410
section 2x Platinum 8358
	1285 :done, 0, 1285
section 2x Platinum 8256
	396 :done, 0, 396
section 2x Platinum 8153
	691 :done, 0, 691
section 2x Gold 6248R
	755 :done, 0, 755
section 2x Gold 6128
	254 :done, 0, 254
section Phi 7210
	415 :done, 0, 415
section 4x E5-4620 v4
	460 :done, 0, 460
section 2x E5-2630 v4
	264 :done, 0, 264
section 2x E5-2623 v4
	125 :done, 0, 125
section 2x E5-2680 v3
	304 :done, 0, 304
section GH200 Neoverse-V2
	1323 : 0, 1323
section TR PRO 7995WX
	1715 :crit, 0, 1715
section TR 3970X
	463 :crit, 0, 463
section TR 1950X
	273 :crit, 0, 273
section Ryzen 7800X3D
	363 :crit, 0, 363
section Ryzen 5700X3D
	229 :crit, 0, 229
section FX-6100
	22 :crit, 0, 22
section Athlon X2 QL-65
	3 :crit, 0, 3
section Ultra 7 258V
	287 :done, 0, 287
section Ultra 9 185H
	317 :done, 0, 317
section i7-13700K
	504 :done, 0, 504
section i7-1265U
	128 :done, 0, 128
section i9-11900KB
	208 :done, 0, 208
section i9-10980XE
	286 :done, 0, 286
section E-2288G
	198 :done, 0, 198
section i7-9700
	103 :done, 0, 103
section i5-9600
	147 :done, 0, 147
section i7-8700K
	152 :done, 0, 152
section E-2176G
	201 :done, 0, 201
section i7-7700HQ
	108 :done, 0, 108
section E3-1240 v5
	141 :done, 0, 141
section i5-5300U
	37 :done, 0, 37
section i7-4770
	104 :done, 0, 104
section i7-4720HQ
	80 :done, 0, 80
section N2807
	7 :done, 0, 7
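
Each chart entry above appears to be the fastest of the three precision settings for that device, written as a Mermaid `section` plus a single task line. The following sketch shows how such an entry could be generated from one table row. The mapping from vendor to task tag is inferred from the chart itself and is an assumption, as are the function and variable names.

```python
# Task tags per vendor, inferred from the chart entries (an assumption):
# AMD uses "crit", Intel and Apple use "done", ARM and Glenfly use "active",
# Nvidia uses no tag at all.
VENDOR_TAG = {
    "AMD": "crit",
    "Intel": "done",
    "Apple": "done",
    "Nvidia": "",
    "ARM": "active",
    "Glenfly": "active",
}


def gantt_entry(label, vendor, fp32, fp16s, fp16c):
    """Return the two Mermaid lines for one device, using the fastest setting."""
    best = max(fp32, fp16s, fp16c)
    tag = VENDOR_TAG[vendor]
    prefix = f":{tag}," if tag else ":"
    return f"section {label}\n\t{best} {prefix} 0, {best}"


if __name__ == "__main__":
    print(gantt_entry("MI300X", "AMD", 20711, 38207, 31169))
    print(gantt_entry("H100 PCIe", "Nvidia", 11128, 20624, 13862))
```

Because `dateFormat X` is set, the start and end values are treated as plain numbers, so each bar runs from 0 to the MLUPs/s value.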
<details><summary>Single-GPU/CPU Benchmark Table</summary>

Colors: 🔴 AMD, 🔵 Intel, 🟢 Nvidia, ⚪ Apple, 🟡 ARM, 🟤 Glenfly

| Device | FP32<br>[TFlops/s] | Mem<br>[GB] | BW<br>[GB/s] | FP32/FP32<br>[MLUPs/s] | FP32/FP16S<br>[MLUPs/s] | FP32/FP16C<br>[MLUPs/s] |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: |
| 🔴 Instinct MI300X | 163.40 | 192 | 5300 | 20711 (60%) | 38207 (56%) | 31169 (45%) |
| 🔴 Instinct MI250 (1 GCD) | 45.26 | 64 | 1638 | 5638 (53%) | 9030 (42%) | 8506 (40%) |
| 🔴 Instinct MI210 | 45.26 | 64 | 1638 | 6517 (61%) | 9547 (45%) | 8829 (41%) |
| 🔴 Instinct MI100 | 46.14 | 32 | 1228 | 5093 (63%) | 8133 (51%) | 8542 (54%) |
| 🔴 Instinct MI60 | 14.75 | 32 | 1024 | 3570 (53%) | 5047 (38%) | 5111 (38%) |
| 🔴 Radeon VII | 13.83 | 16 | 1024 | 4898 (73%) | 7778 (58%) | 5256 (40%) |
| 🔵 Data Center GPU Max 1100 | 22.22 | 48 | 1229 | 3487 (43%) | 6209 (39%) | 3252 (20%) |
| 🟢 GH200 94GB GPU | 66.91 | 94 | 4000 | 20595 (79%) | 34689 (67%) | 19407 (37%) |
| 🟢 H100 NVL | 60.32 | 94 | 3938 | 20018 (78%) | 32613 (64%) | 17605 (34%) |
| 🟢 H100 PCIe | 51.01 | 80 | 2000 | 11128 (85%) | 20624 (79%) | 13862 (53%) |
| 🟢 A100 SXM4 80GB | 19.49 | 80 | 2039 | 10228 (77%) | 18448 (70%) | 11197 (42%) |
| 🟢 A100 PCIe 80GB | 19.49 | 80 | 1935 | 9657 (76%) | 17896 (71%) | 10817 (43%) |
| 🟢 PG506-243 / PG506-242 | 22.14 | 64 | 1638 | 8195 (77%) | 15654 (74%) | 12271 (58%) |
| 🟢 A100 SXM4 40GB | 19.49 | 40 | 1555 | 8522 (84%) | 16013 (79%) | 11251 (56%) |
| 🟢 A100 PCIe 40GB | 19.49 | 40 | 1555 | 8526 (84%) | 16035 (79%) | 11088 (55%) |
| 🟢 CMP 170HX | 6.32 | 8 | 1493 | 7684 (79%) | 12392 (64%) | 6859 (35%) |
| 🟢 A30 | 10.32 | 24 | 933 | 5004 (82%) | 9721 (80%) | 5726 (47%) |
| 🟢 Tesla V100 SXM2 32GB | 15.67 | 32 | 900 | 4471 (76%) | 8947 (77%) | 7217 (62%) |
| 🟢 Tesla V100 PCIe 16GB | 14.13 | 16 | 900 | 5128 (87%) | 10325 (88%) | 7683 (66%) |
| 🟢 Quadro GV100 | 16.66 | 32 | 870 | 3442 (61%) | 6641 (59%) | 5863 (52%) |
| 🟢 Titan V | 14.90 | 12 | 653 | 3601 (84%) | 7253 (86%) | 6957 (82%) |
| 🟢 Tesla P100 16GB | 9.52 | 16 | 732 | 3295 (69%) | 5950 (63%) | 4176 (44%) |
| 🟢 Tesla P100 12GB | 9.52 | 12 | 549 | 2427 (68%) | 4141 (58%) | 3999 (56%) |
| 🟢 GeForce GTX TITAN | 4.71 | 6 | 288 | 1460 (77%) | 2500 (67%) | 1113 (30%) |
| 🟢 Tesla K40m | 4.29 | 12 | 288 | 1131 (60%) | 1868 (50%) | 912 (24%) |
| 🟢 Tesla K80 (1 GPU) | 4.11 | 12 | 240 | 916 (58%) | 1642 (53%) | 943 (30%) |
| 🟢 Tesla K20c | 3.52 | 5 | 208 | 861 (63%) | 1507 (56%) | 720 (27%) |
| 🔴 Radeon RX 7900 XTX | 61.44 | 24 | 960 | 3665 (58%) | 7644 (61%) | 7716 (62%) |
| 🔴 Radeon PRO W7900 | 61.30 | 48 | 864 | 3107 (55%) | 5939 (53%) | 5780 (52%) |
| 🔴 Radeon RX 7900 XT | 51.61 | 20 | 800 | 3013 (58%) | 5856 (56%) | 5986 (58%) |
| 🔴 Radeon PRO W7800 | 45.20 | 32 | 576 | 1872 (50%) | 4426 (59%) | 4145 (55%) |
| 🔴 Radeon PRO W7700 | 28.30 | 16 | 576 | 1547 (41%) | 2943 (39%) | 2899 (39%) |
| 🔴 Radeon RX 7600 | 21.75 | 8 | 288 | 1250 (66%) | 2561 (68%) | 2512 (67%) |
| 🔴 Radeon PRO W7600 | 20.00 | 8 | 288 | 1179 (63%) | 2263 (61%) | 2287 (61%) |
| 🔴 Radeon PRO W7500 | 12.20 | 8 | 172 | 856 (76%) | 1630 (73%) | 1682 (75%) |
| 🔴 Radeon RX 6900 XT | 23.04 | 16 | 512 | 1968 (59%) | 4227 (64%) | 4207 (63%) |
| 🔴 Radeon RX 6800 XT | 20.74 | 16 | 512 | 2008 (60%) | 4241 (64%) | 4224 (64%) |
| 🔴 Radeon PRO W6800 | 17.83 | 32 | 512 | 1620 (48%) | 3361 (51%) | 3180 (48%) |
| 🔴 Radeon RX 6700 XT | 13.21 | 12 | 384 | 1408 (56%) | 2883 (58%) | 2908 (58%) |
| 🔴 Radeon RX 6800M | 11.78 | 12 | 384 | 1439 (57%) | 3190 (64%) | 3213 (64%) |
| 🔴 Radeon RX 6700M | 10.60 | 10 | 320 | 1194 (57%) | 2388 (57%) | 2429 (58%) |
| 🔴 Radeon RX 6600 | 8.93 | 8 | 224 | 963 (66%) | 1817 (62%) | 1839 (63%) |
| 🔴 Radeon RX 6500 XT | 5.77 | 4 | 144 | 459 (49%) | 1011 (54%) | 1030 (55%) |
| 🔴 Radeon RX 5700 XT | 9.75 | 8 | 448 | 1368 (47%) | 3253 (56%) | 3049 (52%) |
| 🔴 Radeon RX 5700 | 7.72 | 8 | 448 | 1521 (52%) | 3167 (54%) | 2758 (47%) |
| 🔴 Radeon RX 5600 XT | 6.73 | 6 | 288 | 1136 (60%) | 2214 (59%) | 2148 (57%) |
| 🔴 Radeon RX Vega 64 | 13.35 | 8 | 484 | 1875 (59%) | 2878 (46%) | 3227 (51%) |
| 🔴 Radeon RX 590 | 5.53 | 8 | 256 | 1257 (75%) | 1573 (47%) | 1688 (51%) |
| 🔴 Radeon RX 580 4GB | 6.50 | 4 | 256 | 946 (57%) | 1848 (56%) | 1577 (47%) |
| 🔴 Radeon RX 580 2048SP 8GB | 4.94 | 8 | 224 | 868 (59%) | 1622 (56%) | 1240 (43%) |
| 🔴 Radeon R9 390X | 5.91 | 8 | 384 | 1733 (69%) | 2217 (44%) | 1722 (35%) |
| 🔴 Radeon HD 7850 | 1.84 | 2 | 154 | 112 (11%) | 120 ( 6%) | 635 (32%) |
| 🔵 Arc A770 LE | 19.66 | 16 | 560 | 2663 (73%) | 4568 (63%) | 4519 (62%) |
| 🔵 Arc A750 LE | 17.20 | 8 | 512 | 2555 (76%) | 4314 (65%) | 4047 (61%) |
| 🔵 Arc A580 | 12.29 | 8 | 512 | 2534 (76%) | 3889 (58%) | 3488 (52%) |
| 🔵 Arc A380 | 4.20 | 6 | 186 | 622 (51%) | 1097 (45%) | 1115 (46%) |
| 🟢 GeForce RTX 4090 | 82.58 | 24 | 1008 | 5624 (85%) | 11091 (85%) | 11496 (88%) |
| 🟢 RTX 6000 Ada | 91.10 | 48 | 960 | 4997 (80%) | 10249 (82%) | 10293 (83%) |
| 🟢 L40S | 91.61 | 48 | 864 | 3788 (67%) | 7637 (68%) | 7617 (68%) |
| 🟢 GeForce RTX 4080 Super | 52.22 | 16 | 736 | 4089 (85%) | 7660 (80%) | 8218 (86%) |
| 🟢 GeForce RTX 4080 | 55.45 | 16 | 717 | 3914 (84%) | 7626 (82%) | 7933 (85%) |
| 🟢 GeForce RTX 4070 Ti Super | 44.10 | 16 | 672 | 3694 (84%) | 6435 (74%) | 7295 (84%) |
| 🟢 GeForce RTX 4070 | 29.15 | 12 | 504 | 2646 (80%) | 4548 (69%) | 5016 (77%) |
| 🟢 GeForce RTX 4080M | 33.85 | 12 | 432 | 2577 (91%) | 5086 (91%) | 5114 (91%) |
| 🟢 RTX 4000 Ada | 26.73 | 20 | 360 | 2130 (91%) | 3964 (85%) | 4221 (90%) |
| 🟢 GeForce RTX 4060 | 15.11 | 8 | 272 | 1614 (91%) | 3052 (86%) | 3124 (88%) |
| 🟢 GeForce RTX 4070M | 18.25 | 8 | 256 | 1553 (93%) | 2945 (89%) | 3092 (93%) |
| 🟢 RTX 2000 Ada | 12.00 | 16 | 224 | 1351 (92%) | 2452 (84%) | 2526 (87%) |
| 🟢 GeForce RTX 3090 Ti | 40.00 | 24 | 1008 | 5717 (87%) | 10956 (84%) | 10400 (79%) |
| 🟢 GeForce RTX 3090 | 39.05 | 24 | 936 | 5418 (89%) | 10732 (88%) | 10215 (84%) |
| 🟢 GeForce RTX 3080 Ti | 37.17 | 12 | 912 | 5202 (87%) | 9832 (87%) | 9347 (79%) |
| 🟢 GeForce RTX 3080 12GB | 32.26 | 12 | 912 | 5071 (85%) | 9657 (81%) | 8615 (73%) |
| 🟢 RTX A6000 | 40.00 | 48 | 768 | 4421 (88%) | 8814 (88%) | 8533 (86%) |
| 🟢 GeForce RTX 3080 10GB | 29.77 | 10 | 760 | 4230 (85%) | 8118 (82%) | 7714 (78%) |
| 🟢 GeForce RTX 3080M Ti | 23.61 | 16 | 512 | 2985 (89%) | 5908 (89%) | 5780 (87%) |
| 🟢 GeForce RTX 3070 | 20.31 | 8 | 448 | 2578 (88%) | 5096 (88%) | 5060 (87%) |
| 🟢 GeForce RTX 3060 Ti | 16.49 | 8 | 448 | 2644 (90%) | 5129 (88%) | 4718 (81%) |
| 🟢 RTX A4000 | 19.17 | 16 | 448 | 2500 (85%) | 4945 (85%) | 4664 (80%) |
| 🟢 RTX A5000M | 16.59 | 16 | 448 | 2228 (76%) | 4461 (77%) | 3662 (63%) |
| 🟢 GeForce RTX 3060 | 13.17 | 12 | 360 | 2108 (90%) | 4070 (87%) | 3566 (76%) |
| 🟢 GeForce RTX 3060M | 10.94 | 6 | 336 | 2019 (92%) | 4012 (92%) | 3572 (82%) |
| 🟢 GeForce RTX 3050M Ti | 7.60 | 4 | 192 | 1181 (94%) | 2341 (94%) | 2253 (90%) |
| 🟢 GeForce RTX 3050M | 7.13 | 4 | 192 | 1180 (94%) | 2339 (94%) | 2016 (81%) |
| 🟢 Titan RTX | 16.31 | 24 | 672 | 3471 (79%) | 7456 (85%) | 7554 (87%) |
| 🟢 Quadro RTX 6000 | 16.31 | 24 | 672 | 3307 (75%) | 6836 (78%) | 6879 (79%) |
| 🟢 Quadro RTX 8000 Passive | 14.93 | 48 | 624 | 2591 (64%) | 5408 (67%) | 5607 (69%) |
| 🟢 GeForce RTX 2080 Ti | 13.45 | 11 | 616 | 3194 (79%) | 6700 (84%) | 6853 (86%) |
| 🟢 GeForce RTX 2080 Super | 11.34 | 8 | 496 | 2434 (75%) | 5284 (82%) | 5087 (79%) |
| 🟢 Quadro RTX 5000 | 11.15 | 16 | 448 | 2341 (80%) | 4766 (82%) | 4773 (82%) |
| 🟢 GeForce RTX 2070 Super | 9.22 | 8 | 448 | 2255 (77%) | 4866 (84%) | 4893 (84%) |
| 🟢 GeForce RTX 2060 Super | 7.18 | 8 | 448 | 2503 (85%) | 5035 (87%) | 4463 (77%) |
| 🟢 Quadro RTX 4000 | 7.12 | 8 | 416 | 2284 (84%) | 4584 (85%) | 4062 (75%) |
| 🟢 GeForce RTX 2060 KO | 6.74 | 6 | 336 | 1643 (75%) | 3376 (77%) | 3266 (75%) |
| 🟢 GeForce RTX 2060 | 6.74 | 6 | 336 | 1681 (77%) | 3604 (83%) | 3571 (82%) |
| 🟢 GeForce GTX 1660 Super | 5.03 | 6 | 336 | 1696 (77%) | 3551 (81%) | 3040 (70%) |
| 🟢 Tesla T4 | 8.14 | 15 | 300 | 1356 (69%) | 2869 (74%) | 2887 (74%) |
| 🟢 GeForce GTX 1660 Ti | 5.48 | 6 | 288 | 1467 (78%) | 3041 (81%) | 3019 (81%) |
| 🟢 GeForce GTX 1660 | 5.07 | 6 | 192 | 1016 (81%) | 1924 (77%) | 1992 (80%) |
| 🟢 GeForce GTX 1650M 896C | 2.72 | 4 | 192 | 963 (77%) | 1836 (74%) | 1858 (75%) |
| 🟢 GeForce GTX 1650M 1024C | 3.20 | 4 | 128 | 706 (84%) | 1214 (73%) | 1400 (84%) |
| 🟢 T500 | 3.04 | 4 | 80 | 339 (65%) | 578 (56%) | 665 (64%) |
| 🟢 Titan Xp | 12.15 | 12 | 548 | 2919 (82%) | 5495 (77%) | 5375 (76%) |
| 🟢 GeForce GTX 1080 Ti | 12.06 | 11 | 484 | 2631 (83%) | 4837 (77%) | 4877 (78%) |
| 🟢 GeForce GTX 1080 | 9.78 | 8 | 320 | 1623 (78%) | 3100 (75%) | 3182 (77%) |
| 🟢 GeForce GTX 1060 6GB | 4.57 | 6 | 192 | 997 (79%) | 1925 (77%) | 1785 (72%) |
| 🟢 GeForce GTX 1060M | 4.44 | 6 | 192 | 983 (78%) | 1882 (75%) | 1803 (72%) |
| 🟢 GeForce GTX 1050M Ti | 2.49 | 4 | 112 | 631 (86%) | 1224 (84%) | 1115 (77%) |
| 🟢 Quadro P1000 | 1.89 | 4 | 82 | 426 (79%) | 839 (79%) | 778 (73%) |
| 🟢 GeForce GTX 970 | 4.17 | 4 | 224 | 980 (67%) | 1721 (59%) | 1623 (56%) |
| 🟢 Quadro M4000 | 2.57 | 8 | 192 | 899 (72%) | 1519 (61%) | 1050 (42%) |
| 🟢 Tesla M60 (1 GPU) | 4.82 | 8 | 160 | 853 (82%) | 1571 (76%) | 1557 (75%) |
| 🟢 GeForce GTX 960M | 1.51 | 4 | 80 | 442 (84%) | 872 (84%) | 627 (60%) |
| 🟢 GeForce GTX 770 | 3.33 | 2 | 224 | 800 (55%) | 1215 (42%) | 876 (30%) |
| 🟢 GeForce GTX 680 4GB | 3.33 | 4 | 192 | 783 (62%) | 1274 (51%) | 814 (33%) |
| 🟢 Quadro K2000 | 0.73 | 2 | 64 | 312 (75%) | 444 (53%) | 171 (21%) |
| 🟢 GeForce GT 630 (OEM) | 0.46 | 2 | 29 | 151 (81%) | 185 (50%) | 78 (21%) |
| 🟢 Quadro NVS 290 | 0.03 | 1/4 | 6 | 9 (22%) | 4 ( 5%) | 4 ( 5%) |
| 🟤 Arise 1020 | 1.50 | 2 | 19 | 6 ( 5%) | 6 ( 2%) | 6 ( 2%) |
| ⚪ M2 Max GPU 38CU 32GB | 9.73 | 22 | 400 | 2405 (92%) | 4641 (89%) | 2444 (47%) |
| ⚪ M1 Ultra GPU 64CU 128GB | 16.38 | 98 | 800 | 4519 (86%) | 8418 (81%) | 6915 (67%) |
| ⚪ M1 Max GPU 24CU 32GB | 6.14 | 22 | 400 | 2369 (91%) | 4496 (87%) | 2777 (53%) |
| ⚪ M1 Pro GPU 16CU 16GB | 4.10 | 11 | 200 | 1204 (92%) | 2329 (90%) | 1855 (71%) |
| ⚪ M1 GPU 8CU 16GB | 2.05 | 11 | 68 | 384 (86%) | 758 (85%) | 759 (86%) |
| 🔴 Radeon 780M (Z1 Extreme) | 8.29 | 8 | 102 | 443 (66%) | 860 (65%) | 820 (62%) |
| 🔴 Radeon Graphics (7800X3D) | 0.56 | 12 | 102 | 338 (51%) | 498 (37%) | 283 (21%) |
| 🔴 Radeon Vega 8 (4750G) | 2.15 | 27 | 57 | 263 (71%) | 511 (70%) | 501 (68%) |
| 🔴 Radeon Vega 8 (3500U) | 1.23 | 7 | 38 | 157 (63%) | 282 (57%) | 288 (58%) |
| 🔵 Arc 140V GPU (16GB) | 3.99 | 16 | 137 | 393 (44%) | 1189 (67%) | 608 (34%) |
| 🔵 Arc Graphics (Ultra 9 185H) | 4.81 | 14 | 90 | 271 (46%) | 710 (61%) | 724 (62%) |
| 🔵 Iris Xe Graphics (i7-1265U) | 1.92 | 13 | 77 | 342 (68%) | 621 (62%) | 574 (58%) |
| 🔵 UHD Graphics Xe 32EUs | 0.74 | 25 | 51 | 128 (38%) | 245 (37%) | 216 (32%) |
| 🔵 UHD Graphics 770 | 0.82 | 30 | 90 | 342 (58%) | 475 (41%) | 278 (24%) |
| 🔵 UHD Graphics 630 | 0.46 | 7 | 51 | 151 (45%) | 301 (45%) | 187 (28%) |
| 🔵 UHD Graphics P630 | 0.46 | 51 | 42 | 177 (65%) | 288 (53%) | 137 (25%) |
| 🔵 HD Graphics 5500 | 0.35 | 3 | 26 | 75 (45%) | 192 (58%) | 108 (32%) |
| 🔵 HD Graphics 4600 | 0.38 | 2 | 26 | 105 (63%) | 115 (35%) | 34 (10%) |
| 🟡 Mali-G610 MP4 (Orange Pi 5) | 0.06 | 16 | 34 | 130 (58%) | 232 (52%) | 93 (21%) |
| 🟡 Mali-G72 MP18 (Samsung S9+) | 0.24 | 4 | 29 | 110 (59%) | 230 (62%) | 21 ( 6%) |
| 🔴 2x EPYC 9754 | 50.79 | 3072 | 922 | 3276 (54%) | 5077 (42%) | 5179 (43%) |
| 🔴 2x EPYC 9654 | 43.62 | 1536 | 922 | 1381 (23%) | 1814 (15%) | 1801 (15%) |
| 🔴 2x EPYC 7352 | 3.53 | 512 | 410 | 739 (28%) | 106 ( 2%) | 412 ( 8%) |
| 🔴 2x EPYC 7313 | 3.07 | 128 | 410 | 498 (19%) | 367 ( 7%) | 418 ( 8%) |
| 🔴 2x EPYC 7302 | 3.07 | 128 | 410 | 784 (29%) | 336 ( 6%) | 411 ( 8%) |
| 🔵 2x Xeon 6980P | 98.30 | 6144 | 1690 | 7875 (71%) | 5112 (23%) | 5610 (26%) |
| 🔵 2x Xeon 6979P | 92.16 | 3072 | 1690 | 8135 (74%) | 4175 (19%) | 4622 (21%) |
| 🔵 2x Xeon Platinum 8592+ | 31.13 | 1024 | 717 | 3135 (67%) | 2359 (25%) | 2466 (26%) |
| 🔵 2x Xeon CPU Max 9480 | 27.24 | 256 | 614 | 2037 (51%) | 1520 (19%) | 1464 (18%) |
| 🔵 2x Xeon Platinum 8480+ | 28.67 | 512 | 614 | 2162 (54%) | 1845 (23%) | 1884 (24%) |
| 🔵 2x Xeon Platinum 8380 | 23.55 | 2048 | 410 | 1410 (53%) | 1159 (22%) | 1298 (24%) |
| 🔵 2x Xeon Platinum 8358 | 21.30 | 256 | 410 | 1285 (48%) | 1007 (19%) | 1120 (21%) |
| 🔵 2x Xeon Platinum 8256 | 3.89 | 1536 | 282 | 396 (22%) | 158 ( 4%) | 175 ( 5%) |
| 🔵 2x Xeon Platinum 8153 | 8.19 | 384 | 256 | 691 (41%) | 290 ( 9%) | 328 (10%) |
| 🔵 2x Xeon Gold 6248R | 18.43 | 384 | 282 | 755 (41%) | 566 (15%) | 694 (19%) |
| 🔵 2x Xeon Gold 6128 | 5.22 | 192 | 256 | 254 (15%) | 185 ( 6%) | 193 ( 6%) |
| 🔵 Xeon Phi 7210 | 5.32 | 192 | 102 | 415 (62%) | 193 (15%) | 223 (17%) |
| 🔵 4x Xeon E5-4620 v4 | 2.69 | 512 | 273 | 460 (26%) | 275 ( 8%) | 239 ( 7%) |
| 🔵 2x Xeon E5-2630 v4 | 1.41 | 64 | 137 | 264 (30%) | 146 ( 8%) | 129 ( 7%) |
| 🔵 2x Xeon E5-2623 v4 | 0.67 | 64 | 137 | 125 (14%) | 66 ( 4%) | 59 ( 3%) |
| 🔵 2x Xeon E5-2680 v3 | 1.92 | 128 | 137 | 304 (34%) | 234 (13%) | 291 (16%) |
| 🟢 GH200 Neoverse-V2 CPU | 7.88 | 480 | 384 | 1323 (53%) | 853 (17%) | 683 (14%) |
| 🔴 Threadripper PRO 7995WX | 15.36 | 256 | 333 | 1134 (52%) | 1697 (39%) | 1715 (40%) |
| 🔴 Threadripper 3970X | 3.79 | 128 | 102 | 376 (56%) | 103 ( 8%) | 463 (35%) |
| 🔴 Threadripper 1950X | 0.87 | 128 | 85 | 273 (49%) | 43 ( 4%) | 151 (14%) |
| 🔴 Ryzen 7 7800X3D | 1.08 | 32 | 102 | 296 (44%) | 361 (27%) | 363 (27%) |
| 🔴 Ryzen 7 5700X3D | 0.87 | 32 | 51 | 229 (68%) | 135 (20%) | 173 (26%) |
| 🔴 FX-6100 | 0.16 | 16 | 26 | 11 ( 7%) | 11 ( 3%) | 22 ( 7%) |
| 🔴 Athlon X2 QL-65 | 0.03 | 4 | 11 | 3 ( 4%) | 2 ( 2%) | 3 ( 2%) |
| 🔵 Core Ultra 7 258V | 0.56 | 32 | 137 | 287 (32%) | 123 ( 7%) | 167 ( 9%) |
| 🔵 Core Ultra 9 185H | 1.79 | 16 | 90 | 317 (54%) | 267 (23%) | 288 (25%) |
| 🔵 Core i7-13700K | 2.51 | 64 | 90 | 504 (86%) | 398 (34%) | 424 (36%) |
| 🔵 Core i7-1265U | 1.23 | 32 | 77 | 128 (26%) | 62 ( 6%) | 58 ( 6%) |
| 🔵 Core i9-11900KB | 0.84 | 32 | 51 | 109 (33%) | 195 (29%) | 208 (31%) |
| 🔵 Core i9-10980XE | 3.23 | 128 | 94 | 286 (47%) | 251 (21%) | 223 (18%) |
| 🔵 Xeon E-2288G | 0.95 | 32 | 43 | 196 (70%) | 182 (33%) | 198 (36%) |
| 🔵 Core i7-9700 | 0.77 | 64 | 43 | 103 (37%) | 62 (11%) | 95 (17%) |
| 🔵 Core i5-9600 | 0.60 | 16 | 43 | 146 (52%) | 127 (23%) | 147 (27%) |
| 🔵 Core i7-8700K | 0.71 | 16 | 51 | 152 (45%) | 134 (20%) | 116 (17%) |
| 🔵 Xeon E-2176G | 0.71 | 64 | 42 | 201 (74%) | 136 (25%) | 148 (27%) |
| 🔵 Core i7-7700HQ | 0.36 | 12 | 38 | 81 (32%) | 82 (16%) | 108 (22%) |
| 🔵 Xeon E3-1240 v5 | 0.50 | 32 | 34 | 141 (63%) | 75 (17%) | 88 (20%) |
| 🔵 Core i7-4770 | 0.44 | 16 | 26 | 104 (62%) | 69 (21%) | 59 (18%) |
| 🔵 Core i7-4720HQ | 0.33 | 16 | 26 | 80 (48%) | 23 ( 7%) | 60 (18%) |
| 🔵 Celeron N2807 | 0.01 | 4 | 11 | 7 (10%) | 3 ( 2%) | 3 ( 2%) |

</details>

Multi-GPU Benchmarks

Multi-GPU benchmarks are done at the largest possible grid resolution with cubic domains, with either 2x1x1, 2x2x1 or 2x2x2 of these domains joined together. The percentages in round brackets are the single-GPU roofline model efficiency, and the multipliers in round brackets are scaling factors relative to the benchmarked single-GPU performance.
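
As a rough illustration of how the scaling factors relate to the raw numbers, the sketch below divides a multi-GPU result by the single-GPU result of the same device. The helper names are hypothetical, and the parallel efficiency (scaling factor divided by the number of domains) is an extra quantity added here for illustration; it is not reported in the table.

```python
def domain_count(dx, dy, dz):
    """Number of domains (one per GPU) in a dx x dy x dz arrangement."""
    return dx * dy * dz


def scaling_factor(multi_gpu_mlups, single_gpu_mlups):
    """Speedup of a multi-GPU run relative to the single-GPU benchmark."""
    return multi_gpu_mlups / single_gpu_mlups


def parallel_efficiency(multi_gpu_mlups, single_gpu_mlups, n_domains):
    """Scaling factor divided by the ideal speedup (illustrative only)."""
    return scaling_factor(multi_gpu_mlups, single_gpu_mlups) / n_domains


if __name__ == "__main__":
    # 4x A100 PCIe 80GB in FP32/FP32: 25957 MLUPs/s vs. 9657 MLUPs/s on 1 GPU.
    n = domain_count(2, 2, 1)
    print(f"{scaling_factor(25957, 9657):.1f}x on {n} GPUs")
    print(f"{parallel_efficiency(25957, 9657, n):.0%} parallel efficiency")
```

For the 4x A100 PCIe 80GB row this gives the 2.7x listed in the table.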

<details><summary>Multi-GPU Benchmark Table</summary>

Colors: 🔴 AMD, 🔵 Intel, 🟢 Nvidia, ⚪ Apple, 🟡 ARM, 🟤 Glenfly

| Device | FP32<br>[TFlops/s] | Mem<br>[GB] | BW<br>[GB/s] | FP32/FP32<br>[MLUPs/s] | FP32/FP16S<br>[MLUPs/s] | FP32/FP16C<br>[MLUPs/s] |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: |
| 🔴 1x Instinct MI250 (1 GCD) | 45.26 | 64 | 1638 | 5638 (53%) | 9030 (42%) | 8506 (40%) |
| 🔴 1x Instinct MI250 (2 GCD) | 90.52 | 128 | 3277 | 9460 (1.7x) | 14313 (1.6x) | 17338 (2.0x) |
| 🔴 2x Instinct MI250 (4 GCD) | 181.04 | 256 | 6554 | 16925 (3.0x) | 29163 (3.2x) | 29627 (3.5x) |
| 🔴 4x Instinct MI250 (8 GCD) | 362.08 | 512 | 13107 | 27350 (4.9x) | 52258 (5.8x) | 53521 (6.3x) |
| 🔴 1x Instinct MI210 | 45.26 | 64 | 1638 | 6347 (59%) | 8486 (40%) | 9105 (43%) |
| 🔴 2x Instinct MI210 | 90.52 | 128 | 3277 | 7245 (1.1x) | 12050 (1.4x) | 13539 (1.5x) |
| 🔴 4x Instinct MI210 | 181.04 | 256 | 6554 | 8816 (1.4x) | 17232 (2.0x) | 16892 (1.9x) |
| 🔴 8x Instinct MI210 | 362.08 | 512 | 13107 | 13546 (2.1x) | 27996 (3.3x) | 27820 (3.1x) |
| 🔴 16x Instinct MI210 | 724.16 | 1024 | 26214 | 18094 (2.9x) | 37360 (4.4x) | 37922 (4.2x) |
| 🔴 24x Instinct MI210 | 1086.24 | 1536 | 39322 | 22056 (3.5x) | 45033 (5.3x) | 44631 (4.9x) |
| 🔴 32x Instinct MI210 | 1448.32 | 2048 | 52429 | 23881 (3.8x) | 50952 (6.0x) | 48848 (5.4x) |
| 🔴 1x Radeon VII | 13.83 | 16 | 1024 | 4898 (73%) | 7778 (58%) | 5256 (40%) |
| 🔴 2x Radeon VII | 27.66 | 32 | 2048 | 8113 (1.7x) | 15591 (2.0x) | 10352 (2.0x) |
| 🔴 4x Radeon VII | 55.32 | 64 | 4096 | 12911 (2.6x) | 24273 (3.1x) | 17080 (3.2x) |
| 🔴 8x Radeon VII | 110.64 | 128 | 8192 | 21946 (4.5x) | 30826 (4.0x) | 24572 (4.7x) |
| 🔵 1x DC GPU Max 1100 | 22.22 | 48 | 1229 | 3487 (43%) | 6209 (39%) | 3252 (20%) |
| 🔵 2x DC GPU Max 1100 | 44.44 | 96 | 2458 | 6301 (1.8x) | 11815 (1.9x) | 5970 (1.8x) |
| 🔵 4x DC GPU Max 1100 | 88.88 | 192 | 4915 | 12162 (3.5x) | 22777 (3.7x) | 11759 (3.6x) |
| 🟢 1x A100 PCIe 80GB | 19.49 | 80 | 1935 | 9657 (76%) | 17896 (71%) | 10817 (43%) |
| 🟢 2x A100 PCIe 80GB | 38.98 | 160 | 3870 | 15742 (1.6x) | 27165 (1.5x) | 17510 (1.6x) |
| 🟢 4x A100 PCIe 80GB | 77.96 | 320 | 7740 | 25957 (2.7x) | 52056 (2.9x) | 33283 (3.1x) |
| 🟢 1x PG506-243 / PG506-242 | 22.14 | 64 | 1638 | 8195 (77%) | 15654 (74%) | 12271 (58%) |
| 🟢 2x PG506-243 / PG506-242 | 44.28 | 128 | 3277 | 13885 (1.7x) | 24168 (1.5x) | 20906 (1.7x) |
| 🟢 4x PG506-243 / PG506-242 | 88.57 | 256 | 6554 | 23097 (2.8x) | 41088 (2.6x) | 36130 (2.9x) |
| 🟢 1x A100 SXM4 40GB | 19.49 | 40 | 1555 | 8543 (84%) | 15917 (79%) | 8748 (43%) |
| 🟢 2x A100 SXM4 40GB | 38.98 | 80 | 3110 | 14311 (1.7x) | 23707 (1.5x) | 15512 (1.8x) |
| 🟢 4x A100 SXM4 40GB | 77.96 | 160 | 6220 | 23411 (2.7x) | 42400 (2.7x) | 29017 (3.3x) |
| 🟢 8x A100 SXM4 40GB | 155.92 | 320 | 12440 | 37619 (4.4x) | 72965 (4.6x) | 63009 (7.2x) |
| 🟢 1x A100 SXM4 40GB | 19.49 | 40 | 1555 | 8522 (84%) | 16013 (79%) | 11251 (56%) |
| 🟢 2x A100 SXM4 40GB | 38.98 | 80 | 3110 | 13629 (1.6x) | 24620 (1.5x) | 18850 (1.7x) |
| 🟢 4x A100 SXM4 40GB | 77.96 | 160 | 6220 | 17978 (2.1x) | 30604 (1.9x) | 30627 (2.7x) |
| 🟢 1x Tesla V100 SXM2 32GB | 15.67 | 32 | 900 | 4471 (76%) | 8947 (77%) | 7217 (62%) |
| 🟢 2x Tesla V100 SXM2 32GB | 31.34 | 64 | 1800 | 7953 (1.8x) | 15469 (1.7x) | 12932 (1.8x) |
| 🟢 4x Tesla V100 SXM2 32GB | 62.68 | 128 | 3600 | 13135 (2.9x) | 26527 (3.0x) | 22686 (3.1x) |
| 🟢 1x Tesla K40m | 4.29 | 12 | 288 | 1131 (60%) | 1868 (50%) | 912 (24%) |
| 🟢 2x Tesla K40m | 8.58 | 24 | 577 | 1971 (1.7x) | 3300 (1.8x) | 1801 (2.0x) |
| 🟢 3x K40m + 1x Titan Xp | 17.16 | 48 | 1154 | 3117 (2.8x) | 5174 (2.8x) | 3127 (3.4x) |
| 🟢 1x Tesla K80 (1 GPU) | 4.11 | 12 | 240 | 916 (58%) | 1642 (53%) | 943 (30%) |
| 🟢 1x Tesla K80 (2 GPU) | 8.22 | 24 | 480 | 2086 (2.3x) | 3448 (2.1x) | 2174 (2.3x) |
| 🟢 1x RTX A6000 | 40.00 | 48 | 768 | 4421 (88%) | 8814 (88%) | 8533 (86%) |
| 🟢 2x RTX A6000 | 80.00 | 96 | 1536 | 8041 (1.8x) | 15026 (1.7x) | 14795 (1.7x) |
| 🟢 4x RTX A6000 | 160.00 | 192 | 3072 | 14314 (3.2x) | 27915 (3.2x) | 27227 (3.2x) |
| 🟢 8x RTX A6000 | 320.00 | 384 | 6144 | 19311 (4.4x) | 40063 (4.5x) | 39004 (4.6x) |
| 🟢 1x Quadro RTX 8000 Pa. | 14.93 | 48 | 624 | 2591 (64%) | 5408 (67%) | 5607 (69%) |
| 🟢 2x Quadro RTX 8000 Pa. | 29.86 | 96 | 1248 | 4767 (1.8x) | 9607 (1.8x) | 10214 (1.8x) |
| 🟢 1x GeForce RTX 2080 Ti | 13.45 | 11 | 616 | 3194 (79%) | 6700 (84%) | 6853 (86%) |
| 🟢 2x GeForce RTX 2080 Ti | 26.90 | 22 | 1232 | 5085 (1.6x) | 10770 (1.6x) | 10922 (1.6x) |
| 🟢 4x GeForce RTX 2080 Ti | 53.80 | 44 | 2464 | 9117 (2.9x) | 18415 (2.7x) | 18598 (2.7x) |
| 🟢 7x 2080 Ti + 1x A100 40GB | 107.60 | 88 | 4928 | 16146 (5.1x) | 33732 (5.0x) | 33857 (4.9x) |
| 🔵 1x A770 + 🟢 1x Titan Xp | 24.30 | 24 | 1095 | 4717 (1.7x) | 8380 (1.7x) | 8026 (1.6x) |

</details>

FAQs

General

Hardware

Graphics

Licensing

External Code/Libraries/Images used in FluidX3D

References

Contact

Support

I'm developing FluidX3D in my spare time, to make computational fluid dynamics lightning fast, accessible on all hardware, and free for everyone.