TL;DR
- Dragonfly organises endpoints into groups; within a group, switches are fully meshed; between groups, every pair of groups is joined by at least one long-haul (global) link.
- Designed by Kim, Dally, et al. (2008) to minimise long-haul cable cost while preserving low diameter and high bisection bandwidth for HPC workloads.
- Used in HPE Slingshot (Frontier, El Capitan, Aurora) and earlier Cray Aries; rarely chosen for commercial AI clusters where fat trees dominate.
- Dragonfly+ is a variant with leaf-spine internal groups, simplifying routing and improving load balancing under adaptive-routing policies.
Overview
Dragonfly is a hierarchical topology in which endpoints are partitioned into groups. Inside each group, every switch is directly connected to every other switch by a local link (an internal full mesh). Between groups, every group has at least one direct global link to every other group. The result is a low-diameter topology: a minimal route between any two switches takes at most three switch-to-switch hops (local, global, local). Its cable cost is also far lower than that of a fat tree of equivalent scale.
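The three-hop bound can be checked on a small instance. The sketch below builds a canonical dragonfly at switch level and measures its diameter by breadth-first search. It uses the parameter names from Kim et al.: a is switches per group and h is global ports per switch. It assumes the maximal size g = a*h + 1, in which every pair of groups shares exactly one global link. The "consecutive" port-to-group wiring and all function names are illustrative assumptions, not any vendor's cabling plan. Endpoints are omitted, so hops are counted switch to switch.

```python
from collections import deque
from itertools import combinations


def build_dragonfly(a: int, h: int):
    """Switch-level adjacency for a maximal canonical dragonfly (illustrative).

    a: switches per group; h: global ports per switch.
    Returns (g, adj) with g = a*h + 1 groups; nodes are (group, switch) tuples.
    """
    g = a * h + 1
    adj = {(grp, sw): set() for grp in range(g) for sw in range(a)}

    # Local links: full mesh inside every group.
    for grp in range(g):
        for s1, s2 in combinations(range(a), 2):
            adj[(grp, s1)].add((grp, s2))
            adj[(grp, s2)].add((grp, s1))

    # Global links (assumed 'consecutive' wiring): port k of group i leads to
    # group k if k < i, else to group k + 1; port k sits on switch k // h.
    for i in range(g):
        for k in range(a * h):
            j = k if k < i else k + 1
            if i < j:  # add each group pair exactly once
                u = (i, k // h)
                v = (j, i // h)  # in group j, port i leads back to group i
                adj[u].add(v)
                adj[v].add(u)
    return g, adj


def eccentricity(adj, src) -> int:
    """Largest BFS distance (switch-to-switch hops) from src to any switch."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(adj) -> int:
    return max(eccentricity(adj, node) for node in adj)


if __name__ == "__main__":
    g, adj = build_dragonfly(a=4, h=2)
    print(g, len(adj), diameter(adj))  # 9 36 3
```

Each switch ends up with a - 1 local links and h global links. Even the smallest case, a=2 and h=1 (a six-switch ring), has a diameter of 3.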
The design was published by Kim, Dally, Scott, and Abts in 2008 with the explicit goal of reducing the optical-cable cost of large supercomputers. Intra-group links are short and can use cheap electrical cables. Only the inter-group (global) links need long optical runs, and a minimally routed packet crosses at most one of them, so fewer long cables are needed than in a fat tree of comparable scale.
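A back-of-envelope count shows where the long cables go. The sketch assumes a maximal canonical dragonfly with one global link per group pair, so g = a*h + 1. It treats intra-group links as short electrical cables and only global links as long optical ones. The example parameters use the balanced a = 2p = 2h configuration for a radix-64-class router, chosen purely for illustration. The function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DragonflyCounts:
    groups: int
    endpoints: int
    router_radix: int   # ports needed per switch: endpoints + local + global
    local_links: int    # short, intra-group cables (assumed electrical)
    global_links: int   # long, inter-group cables (assumed optical)


def dragonfly_counts(p: int, a: int, h: int) -> DragonflyCounts:
    """Cable counts for a maximal canonical dragonfly (one link per group pair).

    p: endpoints per switch, a: switches per group, h: global ports per switch.
    """
    g = a * h + 1  # largest group count with one global link per pair
    return DragonflyCounts(
        groups=g,
        endpoints=p * a * g,
        router_radix=p + (a - 1) + h,
        local_links=g * a * (a - 1) // 2,
        global_links=g * (g - 1) // 2,
    )


if __name__ == "__main__":
    c = dragonfly_counts(p=16, a=32, h=16)  # balanced: a = 2p = 2h
    print(c.groups, c.endpoints, c.local_links, c.global_links)
    # 513 262656 254448 131328
    print(f"global links per endpoint: {c.global_links / c.endpoints:.2f}")  # 0.50
```

Because g - 1 = a*h, the number of global links per endpoint simplifies to h / (2p). A balanced design therefore needs roughly one long cable for every two endpoints, while the larger local-link count stays inside each group.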
Where It Shows Up
Dragonfly has been the topology of choice for several Cray/HPE systems: Aries on the Cray XC series and Slingshot on the HPE Cray EX series (used in Frontier, El Capitan, LUMI, and Aurora). It has also appeared in Mellanox InfiniBand Dragonfly+ deployments at HPC sites.
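In commercial AI clusters, dragonfly is rare. Fat trees with full bisection bandwidth are easier to reason about for training workloads whose AllReduce traffic is highly uniform. In a dragonfly, path length depends on whether traffic stays inside a group and whether adaptive routing takes a minimal or non-minimal route. Per-flow latency therefore varies, and collective libraries find that variability harder to amortise.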
Variants
- Dragonfly (canonical): groups internally fully meshed, group-to-group direct links.
- Dragonfly+: groups internally use a leaf-spine fabric, simplifying cabling at large scale.
- Megafly / Dragonfly-Plus-Plus: deeper variants for very large supercomputers.
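The Dragonfly+ group structure can be sketched the same way. This minimal sketch rests on several assumptions:

- each group is a complete bipartite leaf-spine, with every leaf wired to every spine;
- endpoints attach only to leaves;
- global links hang off spines only;
- by analogy with the canonical case, there is one global link per group pair, so g = spines*h + 1.

All names are hypothetical. Under these assumptions, leaves in the same group are two hops apart, and leaves in different groups are three switch-to-switch hops apart (leaf, spine, remote spine, leaf).

```python
from collections import deque
from itertools import product


def build_dragonfly_plus(leaves: int, spines: int, h: int):
    """Switch-level adjacency for a hypothetical Dragonfly+ layout.

    Each group is a complete bipartite leaf-spine; only spines carry global
    links (h per spine), one per group pair, so g = spines * h + 1.
    Nodes are (role, group, index) tuples.
    """
    g = spines * h + 1
    adj = {}

    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for grp in range(g):
        # Intra-group: every leaf connects to every spine (no leaf-leaf links).
        for leaf, spine in product(range(leaves), range(spines)):
            link(("leaf", grp, leaf), ("spine", grp, spine))
        # Inter-group: port k leads to group k (if k < grp) or k + 1.
        for k in range(spines * h):
            other = k if k < grp else k + 1
            if grp < other:  # add each group pair exactly once
                link(("spine", grp, k // h), ("spine", other, grp // h))
    return g, adj


def bfs_distances(adj, src):
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def max_leaf_to_leaf_hops(adj) -> int:
    """Worst-case switch-to-switch hops between any two leaves."""
    leaf_nodes = [n for n in adj if n[0] == "leaf"]
    worst = 0
    for src in leaf_nodes:
        dist = bfs_distances(adj, src)
        worst = max(worst, max(dist[dst] for dst in leaf_nodes))
    return worst


if __name__ == "__main__":
    g, adj = build_dragonfly_plus(leaves=4, spines=4, h=2)
    print(g, max_leaf_to_leaf_hops(adj))  # 9 3
```

Because every leaf reaches every spine in one hop, a source leaf can always pick the spine that owns the global link toward the destination group.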
Operational Notes
- Adaptive routing is essential: without it, traffic between two groups is funnelled onto the few global links that join them, and that congestion quickly becomes a hotspot.
- Slingshot's Rosetta switch implements adaptive routing that chooses between minimal and non-minimal routes using congestion signalling.
- Compared with fat trees, dragonfly has a worst-case latency one to two hops higher, but meaningfully lower average cable length and cost.
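The choice between minimal and non-minimal paths can be illustrated with the UGAL (Universal Globally-Adaptive Load-balanced) family of heuristics evaluated by Kim et al. The sketch below is a simplified UGAL-style decision, not Slingshot's algorithm. It compares queue depth times hop count for the minimal path against a Valiant-style detour through a randomly chosen intermediate group. The queue-depth callback, the fixed worst-case hop counts of 3 and 5, and the `bias` parameter are all illustrative assumptions.

```python
import random
from typing import Callable, List

# Assumed worst-case path lengths in switch-to-switch hops (illustrative).
MIN_HOPS = 3     # local -> global -> local
NONMIN_HOPS = 5  # local -> global -> local -> global -> local


def ugal_route(
    src_group: int,
    dst_group: int,
    num_groups: int,
    queue_depth: Callable[[int], int],
    rng: random.Random,
    bias: int = 0,
) -> List[int]:
    """Return the groups a packet visits under a simplified UGAL-style rule.

    queue_depth(target) is a hypothetical view of local queue occupancy
    toward the global link that leads to group `target`.
    """
    if src_group == dst_group:
        return [src_group]  # intra-group traffic never uses a global link
    candidates = [grp for grp in range(num_groups) if grp not in (src_group, dst_group)]
    if not candidates:
        return [src_group, dst_group]  # no intermediate group available
    mid = rng.choice(candidates)  # Valiant-style random intermediate group
    minimal_cost = queue_depth(dst_group) * MIN_HOPS
    nonminimal_cost = queue_depth(mid) * NONMIN_HOPS + bias
    if minimal_cost <= nonminimal_cost:
        return [src_group, dst_group]
    return [src_group, mid, dst_group]


if __name__ == "__main__":
    rng = random.Random(0)
    busy = {1: 10}  # the direct global link toward group 1 is congested
    print(ugal_route(0, 1, 9, lambda grp: busy.get(grp, 1), rng))  # detours via a third group
    print(ugal_route(0, 1, 9, lambda grp: 2, rng))  # [0, 1]
```

The `bias` term tilts the decision toward minimal paths, which keeps latency low when the network is lightly loaded.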