AF_XDP TX Metadata
This document describes how to enable offloads when transmitting packets via AF_XDP. Refer to XDP RX Metadata on how to access similar metadata on the receive side.
General Design
The headroom for the metadata is reserved via tx_metadata_len in struct xdp_umem_reg. The metadata length is therefore the same for every socket that shares the same umem. The metadata layout is a fixed UAPI; refer to struct xsk_tx_metadata in include/uapi/linux/if_xdp.h. Thus, generally, the tx_metadata_len field above should contain sizeof(struct xsk_tx_metadata).
The headroom and the metadata itself should be located right before xdp_desc->addr in the umem frame. Within a frame, the metadata layout is as follows:

           tx_metadata_len
    /                           \
    +-----------------+---------+----------------------------+
    | xsk_tx_metadata | padding |          payload           |
    +-----------------+---------+----------------------------+
                                ^
                                |
                          xdp_desc->addr

An AF_XDP application can request headrooms larger than sizeof(struct xsk_tx_metadata). The kernel will ignore the padding (and will still use xdp_desc->addr - tx_metadata_len to locate the xsk_tx_metadata). For the frames that shouldn’t carry any metadata (i.e., the ones that don’t have the XDP_TX_METADATA option), the metadata area is ignored by the kernel as well.
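
The address arithmetic described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the helper names sketch_payload_addr and sketch_tx_meta are hypothetical (they are not part of any kernel or libxdp API), and frame_base is assumed to be the offset of a frame inside the umem area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the value to store in xdp_desc->addr for a frame
 * starting at frame_base (an offset into the umem), leaving
 * tx_metadata_len bytes of headroom in front of the payload. */
static uint64_t sketch_payload_addr(uint64_t frame_base, uint32_t tx_metadata_len)
{
    return frame_base + tx_metadata_len;
}

/* Hypothetical helper: where the metadata of a descriptor starts.
 * Follows the rule from the text: xdp_desc->addr - tx_metadata_len.
 * When the headroom is larger than the metadata structure, the padding
 * sits between the metadata and the payload. */
static void *sketch_tx_meta(void *umem_area, uint64_t desc_addr, uint32_t tx_metadata_len)
{
    assert(desc_addr >= tx_metadata_len); /* headroom must fit before addr */
    return (uint8_t *)umem_area + desc_addr - tx_metadata_len;
}
```

With a 32-byte headroom and a frame at offset 4096, the payload would start at 4128 and the metadata at 4096, regardless of how much of the headroom is padding.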
The flags field enables the particular offload:
- XDP_TXMD_FLAGS_TIMESTAMP: requests the device to put the transmission timestamp into the tx_timestamp field of struct xsk_tx_metadata.
- XDP_TXMD_FLAGS_CHECKSUM: requests the device to calculate the L4 checksum. csum_start specifies the byte offset where the checksumming should start and csum_offset specifies the byte offset where the device should store the computed checksum.
Besides the flags above, in order to trigger the offloads, the packet’s first struct xdp_desc descriptor should set the XDP_TX_METADATA bit in the options field. Also note that in a multi-buffer packet only the first chunk should carry the metadata.
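
The two steps above (filling in the metadata, then marking the descriptor) might look roughly like the following sketch. Everything prefixed with sketch_ or SKETCH_ is a local stand-in introduced for illustration so the block compiles on its own; the struct layouts and bit values are assumptions modelled on include/uapi/linux/if_xdp.h, and real code should include that header instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMED values and layouts; take the real ones from <linux/if_xdp.h>. */
#define SKETCH_TXMD_FLAGS_TIMESTAMP (1u << 0)
#define SKETCH_TXMD_FLAGS_CHECKSUM  (1u << 1)
#define SKETCH_TX_METADATA          (1u << 1) /* bit in desc options */

struct sketch_tx_metadata {
    uint64_t flags;
    union {
        struct { uint16_t csum_start; uint16_t csum_offset; } request;
        struct { uint64_t tx_timestamp; } completion;
    };
};

struct sketch_desc {
    uint64_t addr;
    uint32_t len;
    uint32_t options;
};

/* Request both offloads for the packet described by desc. The metadata
 * is written at desc->addr - tx_metadata_len, then the descriptor is
 * flagged so the kernel looks at it. */
static void sketch_request_offloads(uint8_t *umem_area, struct sketch_desc *desc,
                                    uint32_t tx_metadata_len,
                                    uint16_t csum_start, uint16_t csum_offset)
{
    struct sketch_tx_metadata meta;

    memset(&meta, 0, sizeof(meta));
    meta.flags = SKETCH_TXMD_FLAGS_TIMESTAMP | SKETCH_TXMD_FLAGS_CHECKSUM;
    meta.request.csum_start = csum_start;
    meta.request.csum_offset = csum_offset;

    /* memcpy avoids alignment assumptions about the umem frame. */
    memcpy(umem_area + desc->addr - tx_metadata_len, &meta, sizeof(meta));
    desc->options |= SKETCH_TX_METADATA;
}
```

For a multi-buffer packet, such a helper would be called for the first descriptor only; the remaining chunks carry no metadata.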
Software TX Checksum
For development and testing purposes it is possible to pass the XDP_UMEM_TX_SW_CSUM flag to the XDP_UMEM_REG UMEM registration call. In this case, when running in XDP_COPY mode, the TX checksum is calculated on the CPU. Do not enable this option in production because it will negatively affect performance.
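
To make the CPU-side calculation more concrete, here is a minimal sketch of a software checksum driven by csum_start and csum_offset. It is not the kernel's implementation. It uses the standard RFC 1071 Internet checksum and assumes that csum_offset is counted from csum_start; both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over len bytes, returned in host order.
 * The 32-bit accumulator is sufficient for frame-sized inputs. */
static uint16_t sketch_inet_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)data[i] << 8 | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8; /* odd trailing byte, zero-padded */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);  /* fold the carries back in */
    return (uint16_t)~sum;
}

/* Hypothetical CPU fallback: sum everything from csum_start to the end
 * of the packet and store the result, big-endian, at
 * csum_start + csum_offset (ASSUMPTION: offset relative to csum_start).
 * Whatever the checksum field held beforehand is included in the sum. */
static void sketch_sw_csum(uint8_t *pkt, size_t len,
                           uint16_t csum_start, uint16_t csum_offset)
{
    size_t field = (size_t)csum_start + csum_offset;
    uint16_t csum;

    assert(field + 2 <= len);
    csum = sketch_inet_csum(pkt + csum_start, len - csum_start);
    pkt[field] = (uint8_t)(csum >> 8);
    pkt[field + 1] = (uint8_t)(csum & 0xff);
}
```

Because the old content of the checksum field is part of the sum, an application can seed that field (for example with a pseudo-header sum for TCP or UDP) before the final checksum is computed. Walking every payload byte on the CPU like this is what makes the software path slow.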
Querying Device Capabilities
Every device exports its offload capabilities via the netlink netdev family. Refer to the xsk-flags features bitmask in Documentation/netlink/specs/netdev.yaml.
- tx-timestamp: device supports XDP_TXMD_FLAGS_TIMESTAMP
- tx-checksum: device supports XDP_TXMD_FLAGS_CHECKSUM
See tools/net/ynl/samples/netdev.c on how to query this information.
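
Once the features bitmask has been obtained over netlink, an application could translate it into the set of metadata flags it is allowed to request. The sketch below covers only that mapping step, not the netlink query itself. All SKETCH_ constants are local stand-ins with assumed bit values; real code should use the definitions from the netdev family UAPI header and from include/uapi/linux/if_xdp.h.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED bit values, for illustration only. */
#define SKETCH_XSK_FLAGS_TX_TIMESTAMP (1u << 0) /* tx-timestamp */
#define SKETCH_XSK_FLAGS_TX_CHECKSUM  (1u << 1) /* tx-checksum */
#define SKETCH_TXMD_FLAGS_TIMESTAMP   (1u << 0)
#define SKETCH_TXMD_FLAGS_CHECKSUM    (1u << 1)

/* Hypothetical helper: map the device's xsk features bitmask to the
 * TX metadata flags the application may set for that device. */
static uint64_t sketch_usable_txmd_flags(uint64_t xsk_features)
{
    uint64_t flags = 0;

    if (xsk_features & SKETCH_XSK_FLAGS_TX_TIMESTAMP)
        flags |= SKETCH_TXMD_FLAGS_TIMESTAMP;
    if (xsk_features & SKETCH_XSK_FLAGS_TX_CHECKSUM)
        flags |= SKETCH_TXMD_FLAGS_CHECKSUM;
    return flags;
}
```

An application would then mask its desired flags with the returned value before writing them into the metadata, so that it never requests an offload the device did not advertise.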
Example
See tools/testing/selftests/bpf/xdp_hw_metadata.c for an example program that handles TX metadata. Also see https://github.com/fomichev/xskgen for a more bare-bones example.