LEVIATHAN SYSTEMS

What Is Fat-Tree Topology?

Fat-tree is a network topology used in GPU cluster switch fabrics in which link bandwidth increases (the tree gets "fatter") toward the top of the tree. When built without oversubscription, it provides full bisection bandwidth: if the GPUs are split into any two equal halves, the halves can exchange traffic at the full line rate of every GPU simultaneously. Fat-tree requires more switches than oversubscribed designs and other alternative topologies, but it typically delivers the most predictable performance for AI training workloads.
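As a rough illustration of the trade-off between full bisection bandwidth and switch count, the sketch below sizes a two-tier leaf-spine fabric (the two-level form of a fat-tree) built from identical switches. It assumes every port runs at the same speed, each leaf has exactly one uplink to every spine, and every spine port connects to a distinct leaf; `size_leaf_spine` and `LeafSpineSizing` are hypothetical names. Splitting each leaf's ports evenly between hosts and spines gives a non-blocking fabric, while devoting more ports to hosts serves more GPUs with fewer switches at the cost of oversubscription.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeafSpineSizing:
    hosts: int
    leaves: int
    spines: int
    oversubscription: float       # host-facing : spine-facing ports per leaf (1.0 = non-blocking)
    offleaf_gbps_per_host: float  # per-host uplink share when all hosts send off-leaf,
                                  # assuming traffic spreads evenly across uplinks

    @property
    def full_bisection(self) -> bool:
        return self.oversubscription <= 1.0

    @property
    def switches(self) -> int:
        return self.leaves + self.spines


def size_leaf_spine(radix: int, downlinks: int, link_gbps: float) -> LeafSpineSizing:
    """Size a two-tier leaf-spine fabric from identical `radix`-port switches.

    Hypothetical helper. Assumes all ports run at `link_gbps`, each leaf uses
    `downlinks` ports for hosts and the rest as one uplink to each spine, and
    every spine port connects to a distinct leaf.
    """
    if not 0 < downlinks < radix:
        raise ValueError("need at least one host port and one uplink per leaf")
    uplinks = radix - downlinks
    spines = uplinks  # one uplink from every leaf to every spine
    leaves = radix    # each spine port terminates one leaf
    return LeafSpineSizing(
        hosts=leaves * downlinks,
        leaves=leaves,
        spines=spines,
        oversubscription=downlinks / uplinks,
        offleaf_gbps_per_host=link_gbps * min(1.0, uplinks / downlinks),
    )


for split in (32, 48):
    s = size_leaf_spine(radix=64, downlinks=split, link_gbps=400)
    print(f"{split} host ports/leaf: {s.hosts} hosts, {s.switches} switches, "
          f"{s.oversubscription:.0f}:1, {s.offleaf_gbps_per_host:.0f} Gb/s/host off-leaf")
```

With 64-port switches at 400 Gb/s, a 32/32 split yields 2,048 hosts on 96 switches at full bisection bandwidth, while a 48/16 split yields 3,072 hosts on 80 switches but only about 133 Gb/s per host when every host sends off its leaf at once.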

Technical Details

The fat-tree topology was originally proposed by Charles Leiserson and is the dominant topology for InfiniBand GPU cluster fabrics. In a non-blocking fat-tree, each switch has as much uplink bandwidth as downlink bandwidth, so every tier of the switch hierarchy carries the same aggregate bandwidth as the tier below it and there is no oversubscription. As a result, any set of compute nodes can exchange traffic with any other set at full line rate, provided traffic is spread evenly across the available paths. This matters for the collective communication patterns, such as all-reduce and all-to-all, that dominate distributed AI training.

Fat-tree networks consist of leaf switches (connected to compute nodes), spine switches (connecting leaf switches), and, in very large clusters, a third tier of core switches (connecting spine switches). The cabling between switch tiers must be precise and balanced: asymmetric cabling, such as a leaf with more links to one spine than to another, concentrates traffic on some paths and creates bandwidth bottlenecks that degrade training performance.
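For clusters that need a third tier, a common textbook layout is the k-ary fat-tree from the data-center networking literature, which maps onto the leaf, spine, and core tiers described above. The sketch below counts switches and cables for that layout, assuming identical k-port switches with k even; `k_ary_fat_tree` is a hypothetical helper, and production InfiniBand fabrics may split ports differently. It makes the "same aggregate bandwidth at every tier" property concrete: with equal-speed links, host, leaf-spine, and spine-core links all come out to k³/4.

```python
def k_ary_fat_tree(k: int) -> dict[str, int]:
    """Switch and cable counts for a three-tier k-ary fat-tree of k-port switches.

    Hypothetical helper. Assumed layout: k pods, each with k/2 leaf and k/2 spine
    switches fully meshed; (k/2)**2 core switches, each linked to one spine per pod.
    """
    if k < 2 or k % 2:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    leaves = spines = k * half  # k pods x k/2 switches per tier
    cores = half * half
    return {
        "hosts": leaves * half,               # k/2 host ports per leaf
        "switches": leaves + spines + cores,  # 5k^2/4 in total
        "host_links": leaves * half,
        "leaf_spine_links": k * half * half,  # (k/2 x k/2) mesh inside each pod
        "spine_core_links": spines * half,    # k/2 core uplinks per spine
    }


for k in (4, 64):
    print(k, k_ary_fat_tree(k))
```

For k = 64 this gives 65,536 hosts on 5,120 switches, with 65,536 cables in each of the three tiers.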

How Leviathan Systems Works with Fat-Tree Topology

Leviathan Systems installs the physical cabling infrastructure for fat-tree network topologies, ensuring balanced, symmetric fiber connections between switch tiers for optimal cluster performance.
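What "balanced, symmetric" means can be checked mechanically against a cabling plan. The sketch below is a hypothetical validator, not a description of Leviathan Systems' own tooling: it assumes the plan is a list of (leaf, spine) pairs with one entry per physical link and that host and uplink ports run at the same speed. It flags any leaf-spine pair whose link count differs from the most common count, plus any leaf with fewer uplinks than host links.

```python
from collections import Counter


def check_leaf_spine_cabling(cables, leaves, spines, hosts_per_leaf):
    """Return a list of problems found in a leaf-spine cabling plan (empty = balanced).

    Hypothetical helper. `cables` holds one (leaf, spine) pair per physical link;
    `hosts_per_leaf` maps each leaf to its number of host-facing links. All links
    are assumed to run at the same speed.
    """
    if not leaves or not spines:
        raise ValueError("need at least one leaf and one spine")
    links = Counter(cables)  # (leaf, spine) -> number of parallel links
    # Treat the most common per-pair link count as the intended design value.
    expected = Counter(
        links[(lf, sp)] for lf in leaves for sp in spines
    ).most_common(1)[0][0]
    problems = []
    for leaf in leaves:
        for spine in spines:
            n = links[(leaf, spine)]  # Counter returns 0 for missing pairs
            if n != expected:
                problems.append(f"{leaf}->{spine}: {n} links, expected {expected}")
        uplinks = sum(links[(leaf, sp)] for sp in spines)
        if uplinks < hosts_per_leaf[leaf]:
            problems.append(
                f"{leaf}: {hosts_per_leaf[leaf]} host links but only {uplinks} uplinks"
            )
    return problems


leaves, spines = ["leaf1", "leaf2"], ["spine1", "spine2"]
miswired = [("leaf1", "spine1"), ("leaf1", "spine1"),  # both leaf1 uplinks land on spine1
            ("leaf2", "spine1"), ("leaf2", "spine2")]
for problem in check_leaf_spine_cabling(miswired, leaves, spines, {"leaf1": 2, "leaf2": 2}):
    print(problem)
```

In this example leaf1 still has two uplinks, so a per-leaf port count looks fine, but both land on spine1; the validator reports the surplus link to spine1 and the missing link to spine2.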