QoS policies
Qnet supports transmission over multiple networks and provides several policies for specifying how Qnet should select a network interface for transmission.
- loadbalance (the default)
- Qnet can use all available network links, and will share transmission equally among them.
- preferred
- Qnet uses one specified link, ignoring all other networks (unless the preferred one fails).
- exclusive
- Qnet uses one—and only one—link, ignoring all others, even if the exclusive link fails.
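The difference between the three policies comes down to which links may carry user data when some links are down. The following is a minimal sketch of that rule, not Qnet's implementation: the `Link` type, the `eligible_links()` function, and the interface names `en0` and `en1` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str  # hypothetical interface name, e.g. "en0"
    up: bool   # whether the link is currently considered alive


def eligible_links(policy, links, chosen=None):
    """Return the names of the links that may carry user data.

    policy: "loadbalance", "preferred", or "exclusive"
    chosen: the link named for the preferred/exclusive policies
    """
    live = [link.name for link in links if link.up]
    if policy == "loadbalance":
        # All live links share the traffic.
        return live
    if policy == "preferred":
        # Only the preferred link while it is up; otherwise fall back
        # to the remaining live links.
        return [chosen] if chosen in live else live
    if policy == "exclusive":
        # Only the named link; no fallback if it is down.
        return [chosen] if chosen in live else []
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    links = [Link("en0", up=False), Link("en1", up=True)]
    for policy in ("loadbalance", "preferred", "exclusive"):
        print(policy, eligible_links(policy, links, chosen="en0"))
```

With `en0` down, the sketch returns only `en1` for loadbalance and preferred, and an empty list for exclusive, which mirrors the summaries above.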
If both of a node's links go to the same hub and the link that's currently in use fails, Qnet detects the failure but doesn't switch to the other link. It's up to the application to recover from the error; when the application reestablishes the connection, Qnet switches to the working link.
If the networks are physically separate and a link fails, Qnet automatically switches to another link, depending on the QoS that you chose. The application isn't aware that the first link failed.
You can use the tx_retries option to lsm-qnet.so to limit the number of times that Qnet retries a transmission, and hence control how long Qnet waits before deciding that a link has failed. Note that if the number of retries is too low, Qnet won't tolerate any lost packets and may prematurely decide that a link is down.
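To see why a low retry limit can cause false link-down decisions, consider a rough model in which each transmission attempt is lost independently with a fixed probability. This is an illustrative assumption, not a description of how Qnet behaves internally; the function name is hypothetical, and `attempts` means the total number of tries, which may not map one-to-one onto the value of tx_retries.

```python
def false_failure_probability(loss_rate, attempts):
    """Probability that every attempt is lost on a link that is actually up.

    loss_rate: assumed independent per-attempt loss probability (0.0 to 1.0)
    attempts:  total number of transmission attempts before giving up
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0.0 and 1.0")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return loss_rate ** attempts


if __name__ == "__main__":
    # Compare a few attempt limits on a link that loses 5% of packets.
    for attempts in (1, 2, 5, 10):
        p = false_failure_probability(0.05, attempts)
        print(f"attempts={attempts:2d}  P(false link-down)={p:.3e}")
```

Under this model, each additional attempt multiplies the chance of a false failure by the loss rate, while also lengthening the time needed to detect a link that really is down.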
- loadbalance
- Qnet decides which links to use for sending packets, depending on current load and link speeds as determined by io-pkt*. A packet is queued on the link that can deliver the packet the soonest to the remote end. This effectively provides greater bandwidth between nodes when the links are up (the bandwidth is the sum of the bandwidths of all available links), and allows a graceful degradation of service when links fail.
If a link does fail, Qnet will switch to the next available link. This switch takes a few seconds the first time, because the network driver on the bad link will have timed out, retried, and finally died. But once Qnet knows that a link is down, it will not send user data over that link. While load-balancing among the live links, Qnet will send periodic maintenance packets on the failed link in order to detect recovery. When the link recovers, Qnet places it back into the pool of available links.
- preferred
- With this policy, you specify a preferred link to use for transmissions. Qnet will use only that one link until it fails. If your preferred link fails, Qnet will then turn to the other available links and resume transmission, using the loadbalance policy.
Once your preferred link is available again, Qnet will again use only that link, ignoring all others (unless the preferred link fails).
- exclusive
- You use this policy when you want to lock transmissions to only one link. Regardless of how many other links are available, Qnet will latch onto the one interface you specify. And if that exclusive link fails, Qnet will NOT use any other link.
Why would you want to use the exclusive policy? Suppose you have two networks, one much faster than the other, and you have an application that moves large amounts of data. You might want to restrict transmissions to only the fast network in order to avoid swamping the slow network under failure conditions.
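The loadbalance rule described earlier, queuing each packet on the link that can deliver it soonest, can be illustrated with a simple estimate: the bytes already queued on a link plus the new packet, divided by the link speed. This is only a sketch under that assumption; the real decision relies on load and speed information from io-pkt*, and the function name, link names, and numbers below are hypothetical.

```python
def pick_link(links, packet_bytes):
    """Pick the live link with the earliest estimated delivery time.

    links: dict mapping link name -> (queued_bytes, bytes_per_second),
           containing only links that are currently up
    """
    if not links:
        raise ValueError("no live links available")

    def estimated_delivery(item):
        _name, (queued_bytes, bytes_per_second) = item
        # Time to drain the existing queue plus this packet.
        return (queued_bytes + packet_bytes) / bytes_per_second

    name, _state = min(links.items(), key=estimated_delivery)
    return name


if __name__ == "__main__":
    # A fast link with a backlog versus an idle link ten times slower.
    links = {"fast": (0, 1000.0), "slow": (0, 100.0)}
    for i in range(12):
        chosen = pick_link(links, packet_bytes=100)
        queued, speed = links[chosen]
        links[chosen] = (queued + 100, speed)  # nothing drains in this toy run
        print(i, chosen)
```

In this toy run the slow link is used only once the backlog on the fast link grows large enough, which also shows why the exclusive policy can be useful: it keeps bulk traffic off a slow network entirely.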
