Since our first blog post on how to retrieve packet drop reasons in the Linux kernel, upstream development of the feature has continued and new additions have been made. Drop reasons can be retrieved manually, but they are also used by an increasing number of utilities such as the Network Observability operator for Red Hat OpenShift Container Platform, which can report packets being dropped with their reasons.
Let's see what happened recently in the drop reason space of the Linux kernel and how to avoid pitfalls, especially across kernel versions. Tools built on top of drop reasons, like the operator mentioned above, already do the right thing and need no special care. But as we saw in the previous article, drop reasons can also be retrieved manually when debugging networking issues, and that is error prone without an in-depth understanding of how they work or without the right tools.
Non-core drop reasons
In addition to core drop reasons, discussed in the previous blog post and defined in enum skb_drop_reason, support for registering non-core drop reasons was added. This allows other parts of the Linux networking stack to register their own drop reasons to improve visibility into why packets are being dropped there.
At the time of writing, two non-core parts of the Linux networking stack register their own drop reasons: the IEEE 802.11 stack (mac80211) and Open vSwitch.
This works by allowing an additional set of drop reasons to be registered at runtime, which virtually extends the core definition. Since all drop reasons, core and non-core, have a unique value and can be used in the same core functions, current tools and facilities do not need any modification to report the raw values of the new drop reasons. However, converting those values to text is not supported everywhere, as we'll see below.
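To make the idea of "virtually extending" the core definition more concrete, here is a minimal sketch of how a raw value can be split into a subsystem part and a per-subsystem part. It assumes the upstream kernel layout, where the subsystem ID occupies the upper 16 bits of the value and core reasons use subsystem 0; that detail is not covered in this article, so check it against the headers of your kernel. The function name decode_drop_reason is made up for illustration.

```shell
# Split a raw drop reason into a subsystem ID and a per-subsystem index.
# Assumption: the subsystem ID lives in the upper 16 bits (upstream layout).
decode_drop_reason() {
    raw=$(( $1 ))                        # accepts decimal or 0x-prefixed hex
    subsys=$(( (raw >> 16) & 0xffff ))   # 0 means a core drop reason
    index=$(( raw & 0xffff ))            # position within that subsystem
    echo "subsys=$subsys index=$index"
}

decode_drop_reason 0x10002
```

With 0x10002 this prints subsys=1 index=2: a non-core reason, since the subsystem is not 0. Which subsystem and which reason those numbers map to still depends on the running kernel.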
Drop reasons pitfalls
As we just saw, converting drop reasons to text, especially non-core ones, is not always built in. But that is not the biggest pitfall. Drop reasons are defined in kernel enums and are not part of a stable ABI. This means their raw values can change between kernel releases, and this has already happened a few times: for example, when a new reason is added in between existing ones, or when reasons are rearranged. Because of this, different versions of the Linux kernel, including Red Hat Enterprise Linux (RHEL), might report different raw values for the same drop reason.
This is not an issue for tools that convert the raw value to a text representation, but not all tools perform this raw-to-text translation. When only a raw value is reported, it should be checked against the definitions of the running kernel. Of course, there are better ways.
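A manual check of that kind could look like the sketch below. It reads a C-style dump of kernel types on standard input and prints the name of the core drop reason matching a given decimal value. The function name lookup_drop_reason, the sample enum, and its values are all made up for illustration; real names and values must come from the running kernel.

```shell
# Print the enumerator of "enum skb_drop_reason" whose value equals $1.
# Reads a C-style type dump on stdin; only decimal values are handled.
lookup_drop_reason() {
    awk -v want="$1" '
        /^enum skb_drop_reason [{]/ { inside = 1; next }
        inside && /^[}]/            { inside = 0 }
        inside && $2 == "=" {
            val = $3
            sub(/,$/, "", val)          # strip the trailing comma
            if (val + 0 == want + 0) print $1
        }'
}

# Demonstration on a made-up enum (not real kernel values).
lookup_drop_reason 2 <<'EOF'
enum skb_drop_reason {
    SKB_DROP_REASON_EXAMPLE_A = 1,
    SKB_DROP_REASON_EXAMPLE_B = 2,
};
EOF
```

On a system with BTF and bpftool available, the input could come from `bpftool btf dump file /sys/kernel/btf/vmlinux format c`, assuming its output follows the one-enumerator-per-line layout used here. This sketch only covers the core enum, not reasons registered by other subsystems.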
Recommendations
There are two ways of converting raw drop reason values to text that stay tied to the running kernel version: using an in-kernel conversion, or inspecting the internal definitions of the running kernel and using those.
Below are three tools you can use to inspect drop reasons that (mostly) meet this requirement.
Perf
By adding a probe on the skb:kfree_skb tracepoint, we can use its in-kernel translation of drop reasons. However, at the time of writing, this implementation does not support converting non-core drop reasons to a text representation.
While this is not perfect, using perf on the above tracepoint is a good way of reporting drop reasons when inspecting drops happening in the core networking stack. It is also a very simple way of getting this information, as perf is widely available.
$ perf record -e skb:kfree_skb sleep 10
$ perf script
curl 103998 [010] 40186.014474: skb:kfree_skb: [...] reason: NO_SOCKET
curl 103998 [010] 40186.014555: skb:kfree_skb: [...] reason: NO_SOCKET
irq/178-iwlwifi 1289 [000] 44222.379744: skb:kfree_skb: [...] reason: 0x10002
In the above example, we can see two packets dropped because no matching socket was found, and one packet dropped with a raw drop reason, 0x10002. This is a non-core drop reason; on the machine used, it corresponds to a mac80211 drop reason, namely RX_DROP_U_REPLAY.
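When a capture contains many events, a per-reason summary is often easier to read than the raw list. The following is a minimal sketch that counts drops per reason, assuming each line ends with a "reason: VALUE" pair as in the output above. The function name count_drop_reasons is made up for illustration, and the sample input simply reuses the three lines shown earlier.

```shell
# Count drops per reason from `perf script` output read on stdin.
count_drop_reasons() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "reason:")
                counts[$(i + 1) ""]++    # "" keeps raw values such as 0x10002 as text
    }
    END { for (r in counts) print counts[r], r }' | sort -rn
}

# Demonstration using the sample lines from this article.
count_drop_reasons <<'EOF'
curl 103998 [010] 40186.014474: skb:kfree_skb: [...] reason: NO_SOCKET
curl 103998 [010] 40186.014555: skb:kfree_skb: [...] reason: NO_SOCKET
irq/178-iwlwifi 1289 [000] 44222.379744: skb:kfree_skb: [...] reason: 0x10002
EOF
```

For the sample input this prints "2 NO_SOCKET" followed by "1 0x10002". On a real capture, the same function can be fed with `perf script | count_drop_reasons`. Raw values in the summary are the non-core reasons that perf could not translate.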
Dropwatch
dropwatch uses the kernel dropmon infrastructure which is, at the time of writing, the only in-kernel implementation that reports non-core drop reasons as text. Because of this, using dropwatch is one of the preferred ways of inspecting drops in the kernel with their associated reasons.
For an example of how to use dropwatch, see the previous blog post on drop reasons.
Retis
Last but not least, a new kernel packet inspection tool was developed recently, which can collect packets at various places in the Linux networking stack: Retis. When asked to report drop reasons, Retis performs a runtime conversion of drop reasons to a text representation by inspecting the internal definitions of the running kernel using a technology called BPF Type Format (BTF). This means it always has a correct raw-to-text translation of drop reasons, regardless of the kernel version running on the system.
Retis is highly configurable but provides sane built-in defaults such as its drop monitoring profile, dropmon:
$ retis -p dropmon collect
16:52:39 [INFO] Applying profile dropmon: Default
16:52:39 [INFO] 4 probe(s) loaded
40648351222101 [curl] 104769 [tp] skb:kfree_skb drop (NO_SOCKET)
bpf_prog_0b1566e4b83190c5_sd_devices+0xce8d
bpf_prog_0b1566e4b83190c5_sd_devices+0xce8d
bpf_trace_run3+0x52
kfree_skb_reason+0x8f
tcp_v6_rcv+0x77
ip6_protocol_deliver_rcu+0x6b
ip6_input_finish+0x43
__netif_receive_skb_one_core+0x62
process_backlog+0x85
__napi_poll+0x28
net_rx_action+0x2a4
__do_softirq+0xd1
do_softirq.part.0+0x3d
__local_bh_enable_ip+0x68
__dev_queue_xmit+0x28e
ip6_finish_output2+0x2ae
ip6_finish_output+0x1e0
ip6_xmit+0x2c0
inet6_csk_xmit+0xe9
__tcp_transmit_skb+0x56a
tcp_connect+0xb37
tcp_v6_connect+0x512
__inet_stream_connect+0x10f
inet_stream_connect+0x3a
__sys_connect+0xa8
__x64_sys_connect+0x18
do_syscall_64+0x5d
entry_SYSCALL_64_after_hwframe+0x6e
if 1 (lo) rxif 1 ::1.52414 > ::1.80 ttl 64 label 0x98864 len 40 proto TCP (6) flags [S] seq 2567277025 win 33280
...
In the above example, we can see that an IPv6 packet to [::1]:80 was dropped because no socket was listening for that flow. Retis also reported detailed information about the packet itself, as well as a stack trace.
Thanks to its automatic translation of drop reasons and because it offers flexibility and additional features (probing in many places of the stack in parallel, packet tracking, conntrack and Open vSwitch support, post-processing capabilities, etc.), Retis is a good choice for tracking dropped packets as well as inspecting the Linux networking stack in general. A packet can be seen not only when it is dropped, but also tracked through the whole networking stack.
Conclusion
Kernel support for drop reasons is increasing over time, and now includes drop reasons from non-core parts of the Linux networking stack. This is very good news, as it improves visibility and gives more insight into why some packets are being dropped. While retrieving and making sense of drop reasons can be tricky because of how they are implemented, it's easy to avoid pitfalls by understanding how drop reasons work and by using the right tools. Non-core drop reasons are available in recent RHEL 9.2 releases and in RHEL 9.3.
Last updated: January 29, 2024