Commit c866c43

Update README.md (#16)
Fix typo in the README.
1 parent 75ad160 commit c866c43

1 file changed

+1
-1
lines changed

README.md

+1-1
@@ -61,7 +61,7 @@ PYTHONPATH=./python:$PYTHONPATH python3 test/test_gemm_only.py 4096 12288 6144 -
 ## Performance
 We measured the examples from the above demo on both A800s and H800s. Each machine has 8 GPUs, with a TP size set to 8. The table below shows the performance comparison between flux and torch+nccl. It can be observed that by overlapping fine-grained computation and communication, Flux is able to effectively hide a significant portion of the communication time
 
-| | M | K | N | Torch Gemm | Torch NCCL | Torch Total | Flux Gemm | Flux NCCL | Flux Total |
+| | M | K | N | Torch Gemm | Torch NCCL | Torch Total | Flux Gemm | Flux Comm | Flux Total |
 |----------|----------|----------|----------|----------|----------|----------|----------|----------|-----------|
 | AG+Gemm(A800) | 4096 | 12288 | 49152 | 2.438ms | 0.662ms | 3.099ms | 2.378ms | 0.091ms | 2.469ms |
 | Gemm+RS(A800) | 4096 | 49152 | 12288 | 2.453ms | 0.646ms | 3.100ms | 2.429ms | 0.080ms | 2.508ms |
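
The README's claim that Flux hides most of the communication time can be checked against the two A800 rows visible in this hunk. The sketch below is not part of the commit. It assumes the `Flux Comm` column reports the exposed (non-overlapped) communication time, which the README text does not state explicitly. The `rows` dictionary and the `summarize` helper are hypothetical names introduced only for this illustration.

```python
# Timings in milliseconds, copied from the two table rows in the diff.
rows = {
    "AG+Gemm(A800)": {
        "torch_nccl": 0.662, "torch_total": 3.099,
        "flux_comm": 0.091, "flux_total": 2.469,
    },
    "Gemm+RS(A800)": {
        "torch_nccl": 0.646, "torch_total": 3.100,
        "flux_comm": 0.080, "flux_total": 2.508,
    },
}


def summarize(t):
    """Return (end-to-end speedup, fraction of communication time hidden).

    Assumption: flux_comm is the communication time that is NOT overlapped
    with the GEMM, so 1 - flux_comm / torch_nccl is the hidden fraction.
    """
    speedup = t["torch_total"] / t["flux_total"]
    hidden = 1.0 - t["flux_comm"] / t["torch_nccl"]
    return speedup, hidden


for name, t in rows.items():
    speedup, hidden = summarize(t)
    print(f"{name}: {speedup:.2f}x end-to-end, {hidden:.0%} of communication hidden")
```

Under that assumption, both rows show well over 80% of the torch+nccl communication time hidden, and most of the end-to-end gain comes from the reduced communication column rather than from the GEMM itself.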
