Configuring for benchmarking
Disabling source port randomization
Many high-performance network interfaces support Receive Side Scaling, which efficiently distributes traffic flows to different queues using a hashed value. This value is usually based on a combination of the source and destination addresses and their port numbers.
By default, io-sock uses a random, ephemeral source port when it creates a new traffic flow, which makes carrying out a spoofing attack more difficult.
However, when you benchmark an interface whose traffic distribution depends in part on a random source port, each run may spread the traffic differently across the queues and, therefore, generate different results from run to run.
The use of a random source port is controlled by the sysctl variable net.inet.ip.portrange.randomized, which defaults to 1 (enabled). If you set it to 0 (disabled), io-sock assigns the ephemeral source ports in sequential order, which helps ensure that the hashing of different flows to different queues is done in a consistent manner.
Because disabling source port randomization reduces security, do it only during benchmarking and never in a production system.
Performance may be different when source port randomization is disabled. How different depends on the exact hashing algorithm implemented in the network interface hardware and the mixture of traffic used in the benchmark. Although disabling randomization can help you maintain consistency from run to run, especially when experimenting with other performance changes, it is not in itself a method for improving performance.
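The effect of source port selection on queue distribution can be illustrated with a toy model. The sketch below is not the hash that any particular interface implements: it uses CRC32 as a stand-in, and the queue count, addresses, destination port, and port range are all assumptions chosen for illustration. It shows only that sequential ports give the same distribution on every run, while random ports may not.

```python
import random
import zlib
from collections import Counter

NUM_QUEUES = 4  # assumed queue count for the illustration


def queue_for_flow(src_ip, src_port, dst_ip, dst_port, num_queues=NUM_QUEUES):
    """Map a flow's 4-tuple to a queue index with a stand-in hash (CRC32)."""
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_queues


def distribution(src_ports):
    """Count how many flows land on each queue for the given source ports."""
    counts = Counter(
        queue_for_flow("10.0.0.1", port, "10.0.0.2", 5001) for port in src_ports
    )
    return [counts.get(q, 0) for q in range(NUM_QUEUES)]


# Sequential ports: the same flows hash the same way on every run.
sequential_run_1 = distribution(range(49152, 49160))
sequential_run_2 = distribution(range(49152, 49160))

# Random ports: each run draws different ports, so the spread can change.
random_run_1 = distribution(random.sample(range(49152, 65536), 8))
random_run_2 = distribution(random.sample(range(49152, 65536), 8))

print("sequential:", sequential_run_1, sequential_run_2)
print("random:    ", random_run_1, random_run_2)
```

The two sequential runs always print identical lists. The two random runs usually differ, which is the run-to-run variation described above.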
Benchmarking multiple UDP transmission streams with fragmentation
When a UDP datagram exceeds the MTU of an interface, io-sock fragments the datagram across multiple IP packets before it places them on an interface's transmit queue. This behavior can significantly skew benchmarking results.
By default, all UDP traffic goes to queue 0. When there are multiple streams of traffic, multiple io-sock threads can write to this single queue, but only a single thread is used for dequeuing. As a result, packets are enqueued faster than they are dequeued, the queue overflows, and packets are dropped.
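The overflow can be pictured with a small deterministic model of one bounded queue. This is only a sketch: the per-tick rates, the queue capacity, and the function name simulate_queue are hypothetical and do not reflect io-sock internals.

```python
def simulate_queue(enqueue_per_tick, dequeue_per_tick, capacity, ticks):
    """Model a bounded transmit queue; return (sent, dropped) packet counts."""
    depth = dropped = sent = 0
    for _ in range(ticks):
        # Writers offer packets; anything beyond the free space is dropped.
        accepted = min(enqueue_per_tick, capacity - depth)
        dropped += enqueue_per_tick - accepted
        depth += accepted
        # A single dequeuing thread drains a limited number per tick.
        drained = min(dequeue_per_tick, depth)
        depth -= drained
        sent += drained
    return sent, dropped


# Four writers, one dequeue per tick, room for eight packets.
print(simulate_queue(4, 1, 8, 10))  # (10, 23)
```

In this model, once the queue fills, three of every four offered packets are dropped no matter how long the run lasts.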
For example, an 8 KB UDP datagram is fragmented across six packets, and if any one of them is lost, reassembly on the receiving side fails. As a result, many packets can occupy the transmit queue, be sent over the wire, and be received by the far end, only to be dropped because the datagram cannot be reassembled. The benchmark then reports poor throughput.
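The six-fragment figure can be checked with a short calculation. The sketch below assumes IPv4 with a 20-byte header and no options, an 8-byte UDP header, and a 1500-byte MTU; none of these values are stated in the text above. The delivery estimate further assumes that fragments are lost independently, which real queue overflows may not satisfy.

```python
import math

IP_HEADER = 20   # bytes, IPv4 header without options (assumed)
UDP_HEADER = 8   # bytes


def fragment_count(udp_payload, mtu=1500):
    """Number of IPv4 fragments needed for one UDP datagram."""
    # Every fragment except the last carries a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil((udp_payload + UDP_HEADER) / per_fragment)


def datagram_delivery_probability(udp_payload, packet_loss, mtu=1500):
    """Chance that every fragment arrives, assuming independent losses."""
    return (1.0 - packet_loss) ** fragment_count(udp_payload, mtu)


print(fragment_count(8192))                                  # 6
print(round(datagram_delivery_probability(8192, 0.05), 3))   # 0.735
```

Under these assumptions, a 5% packet drop rate means that roughly one in four 8 KB datagrams fails to reassemble, even though 95% of the packets arrive.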
These transmit packet drops are reported back to the application as ENOBUFS. Because many benchmarking applications were initially written for Linux-style network stacks, where UDP writes block when queues fill (unlike BSD-style network stacks, where the writes return the ENOBUFS error), they tend to ignore ENOBUFS and immediately write again, usually unsuccessfully.
The end result is that, for this UDP transmission scenario, benchmarking applications may report throughput that is much lower than what io-sock and the interface are capable of. Many benchmarking applications can rate-limit the transmission of UDP, and you should use that capability to approach the maximum transmission rate carefully, rather than significantly exceeding it and ending up with excessive packet drops and significantly lower throughput.
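For a custom test program, one way to apply this advice is to pace the writes and treat ENOBUFS as a signal to wait rather than to retry at once. The following is a minimal sketch, not taken from any benchmarking tool: the function name paced_send and its parameters are hypothetical, and the clock and sleep arguments exist only so that the pacing can be exercised without real delays.

```python
import errno
import socket
import time


def paced_send(sock, dest, payload, count, rate_pps,
               clock=time.monotonic, sleep=time.sleep):
    """Send `count` datagrams at about `rate_pps` per second.

    Returns (sent, enobufs): datagrams accepted and ENOBUFS errors seen.
    """
    interval = 1.0 / rate_pps
    next_send = clock()
    sent = enobufs = 0
    while sent < count:
        # Wait until the next send slot instead of writing as fast as possible.
        delay = next_send - clock()
        if delay > 0:
            sleep(delay)
        try:
            sock.sendto(payload, dest)
            sent += 1
        except OSError as exc:
            if exc.errno != errno.ENOBUFS:
                raise
            # Transmit queue is full: record it and retry in the next slot
            # rather than immediately.
            enobufs += 1
        next_send += interval
    return sent, enobufs
```

A caller would pass a socket created with socket.socket(socket.AF_INET, socket.SOCK_DGRAM), a destination address tuple, and a starting rate, then raise the rate between runs until the ENOBUFS count starts to climb.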
