Debugging a tproxy problem


This post exposes MAC and IP addresses, which is a bad (lazy) habit, though they are all virtual addresses so it is probably harmless.

Current tproxy rules

table ip clash {
        chain prerouting {
                type filter hook prerouting priority mangle; policy accept;
                iifname { "nrpodman0", "virbr*" } counter packets 320091 bytes 22773419 return
                ip daddr { 0.0.0.0, 10.0.0.0/8, 127.0.0.0/8, 172.0.0.0/8, 192.168.0.0/16, 255.255.255.255 } counter packets 8589080 bytes 15481604885 return
                meta l4proto { tcp, udp } meta mark set 0x00000001 tproxy to 127.0.0.1:9999 counter packets 538989 bytes 35138055 accept
        }
                                                                                                                                              
        chain dns {
                type nat hook prerouting priority dstnat; policy accept;
                iifname { "nrpodman0", "podman*", "virbr*" } counter packets 125240 bytes 6336771 return
                meta l4proto { tcp, udp } th dport 53 counter packets 0 bytes 0 redirect to :1053
        }
                                                                                                                                              
        chain dnsout {
                type nat hook output priority filter; policy accept;
                ip daddr { 52.80.66.66, 117.50.11.11 } return
                socket cgroupv2 level 2 "system.slice/system-clash.slice" counter packets 38961 bytes 2350103 return
                meta l4proto { tcp, udp } th dport 53 counter packets 5494 bytes 382682 redirect to :1053
        }
                                                                                                                                              
        chain output {
                type route hook output priority mangle; policy accept;
                ip daddr { 0.0.0.0, 10.0.0.0/8, 127.0.0.0/8, 172.0.0.0/8, 192.168.0.0/16, 224.0.0.0/4, 255.255.255.255 } counter packets 2687593 bytes 9043504571 return
                socket cgroupv2 level 2 "system.slice/system-clash.slice" counter packets 2811815 bytes 221955079 return
                meta l4proto { tcp, udp } meta mark set 0x00000001 counter packets 528997 bytes 32193984
        }
                                                                                                                                              
        chain divert {
                type filter hook prerouting priority mangle; policy accept;
                meta l4proto tcp socket transparent 1 meta mark set 0x00000001 counter packets 475808 bytes 29798177 accept
        }
}

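The dns, dnsout and divert chains can be ignored here. The core is the output chain, which sets mark 0x01 on outgoing packets.
In netfilter, a packet can be rerouted on its way out. Because output here is a "type route" chain, changing the mark in it makes the kernel redo the routing decision; the other place this can happen is forward.
So the following is also needed: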

ip rule add fwmark 1 table 100
ip route add local 0.0.0.0/0 dev lo table 100

This sends every marked packet to the lo device.
Once rerouted to lo, the outgoing packet effectively becomes an incoming one, so it passes through prerouting:
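
The two ip commands above have a matching teardown, which is handy while experimenting. Below is a minimal sketch that only prints the commands so they can be reviewed before being run as root. The function name tproxy_routing_cmds and the dry-run approach are my own additions, not part of the original setup; mark 1 and table 100 are the values used in this post.

```shell
# Print (not run) the ip commands that add or remove the fwmark policy route.
# Defaults: mark 1 and table 100, as used in this post.
tproxy_routing_cmds() {
    action="$1"   # "add" or "del"
    mark="${2:-1}"
    table="${3:-100}"
    case "$action" in
        add|del) ;;
        *) echo "usage: tproxy_routing_cmds add|del [mark] [table]" >&2; return 2 ;;
    esac
    echo "ip rule $action fwmark $mark table $table"
    echo "ip route $action local 0.0.0.0/0 dev lo table $table"
}

# Example: review the output first, then apply it with
#   tproxy_routing_cmds del | sudo sh
```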

meta l4proto { tcp, udp } meta mark set 0x00000001 tproxy to 127.0.0.1:9999

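This rule hands the packet to tproxy, and everything after that is handled by tproxy. Put simply, tproxy swaps the socket the packet is delivered to, although that detail does not matter here.

That is the end of the tproxy introduction. On to the actual topic.

Main text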

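After I started running docker-in-docker under podman, the tproxy-based proxying broke. The symptom: traffic from any container that left through tproxy never received a reply.
I picked a site for comparison testing, www.baidu.com. Since I use fake-ip globally, curling Baidu directly resolves to a fake IP, so I first looked up a real IP address for baidu.com.
36.155.132.3 is one of baidu.com's IP addresses.

Try connecting directly from the host: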
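
The post does not say how the real address was obtained. One possible way, sketched under assumptions: the dnsout chain above returns early for 52.80.66.66 and 117.50.11.11, so a query sent straight to one of those resolvers should not be redirected to the fake-ip resolver on port 1053. The helper name first_real_ipv4 is hypothetical, and the 198.18.0.0/16 range it skips is only the fake-ip range Clash commonly defaults to, not something confirmed for this setup.

```shell
# Read candidate answers (one per line, e.g. from `dig +short`) and print the
# first dotted-quad IPv4 address that is not in the ASSUMED fake-ip range
# 198.18.0.0/16. CNAME lines and anything else that is not an IPv4 are skipped.
first_real_ipv4() {
    awk '/^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && $0 !~ /^198\.18\./ { print; exit }'
}

# Example (needs network access and dig):
#   dig +short baidu.com @117.50.11.11 | first_real_ipv4
```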

echo > /dev/tcp/36.155.132.3/80;echo $?
0

The connection works.

Now try from inside a container:

[ssfdust@RedLotusX ~]$ sudo podman run -ti --rm ubuntu bash
root@531005b96a39:/# echo > /dev/tcp/36.155.132.3/80;echo $?
^Cbash: connect: Interrupted system call
bash: /dev/tcp/36.155.132.3/80: Interrupted system call

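It simply hung here, so I pressed Ctrl-C to get out.

nft comparison

I had no real lead. It felt like the packet was being dropped, so my first reaction was to look at the firewall.

Set up tracing for the destination address: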
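
To avoid hanging and pressing Ctrl-C on every retry, the same /dev/tcp probe can be wrapped in a timeout. This is a minimal sketch; the function name tcp_probe and the 3 second default are my own choices. It relies on bash for /dev/tcp and on timeout from coreutils.

```shell
# Try a TCP connect through bash's /dev/tcp, giving up after a few seconds.
# Exit status: 0 = connected, 124 = timed out, anything else = connect error.
tcp_probe() {
    host="$1"; port="$2"; secs="${3:-3}"
    timeout "$secs" bash -c 'echo > "/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example:
#   tcp_probe 36.155.132.3 80; echo $?
```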

sudo nft insert rule ip clash prerouting ip daddr 36.155.132.3 meta nftrace set 1

Start tracing:

sudo nft monitor trace
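
The inserted trace rule stays in the chain until it is deleted, and nft deletes rules by handle. A small sketch for finding those handles follows. The helper name nftrace_handles is hypothetical, and it assumes the "# handle N" suffix that nft -a prints at the end of each rule.

```shell
# Read `nft -a list chain ...` output on stdin and print the handle of every
# rule that sets nftrace, one handle per line.
nftrace_handles() {
    awk '/meta nftrace set 1/ && $(NF - 1) == "handle" { print $NF }'
}

# Example cleanup once tracing is finished:
#   sudo nft -a list chain ip clash prerouting | nftrace_handles |
#       while read -r h; do sudo nft delete rule ip clash prerouting handle "$h"; done
```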

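Result when data is received normally

Access from the host works, so I connected to /dev/tcp/36.155.132.3/80 from the host once more.

Looking only at the syn ==> syn/ack exchange: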

trace id e83c19de ip clash prerouting packet:
    iif "lo" @ll,0,112 0x800 ip saddr 192.168.0.107 ip daddr 36.155.132.3
    ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 30562 ip length 60
    tcp sport 45738 tcp dport 80 tcp flags == syn tcp window 64240
trace id e83c19de ip clash prerouting rule ip daddr 36.155.132.3 meta nftrace set 1 (verdict continue)
trace id e83c19de ip clash prerouting rule meta l4proto { tcp, udp } meta mark set 0x00000001 tproxy to 127.0.0.1:9999 counter packets 13417 bytes 1497892 accept (verdict accept)
trace id e83c19de ip filter INPUT packet: iif "lo" @ll,0,112 0x800 ip saddr 192.168.0.107 ip daddr 36.155.132.3 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 30562 ip length 60 tcp sport 45738 tcp dport 80 tcp flags == syn tcp window 64240
trace id e83c19de ip filter INPUT rule xt match "comment" counter packets 78349 bytes 79668105 jump NETAVARK_INPUT (verdict jump NETAVARK_INPUT)
trace id e83c19de ip filter NETAVARK_INPUT verdict continue meta mark 0x00000001
trace id e83c19de ip filter INPUT rule counter packets 78327 bytes 79672697 jump LIBVIRT_INP (verdict jump LIBVIRT_INP)
trace id e83c19de ip filter LIBVIRT_INP verdict continue meta mark 0x00000001
trace id e83c19de ip filter INPUT verdict continue meta mark 0x00000001
trace id e83c19de ip filter INPUT policy accept meta mark 0x00000001

trace id b2ccfd90 ip clash prerouting packet: iif "lo" @ll,0,112 0x800 ip saddr 36.155.132.3 ip daddr 192.168.0.107 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 0 ip length 60 tcp sport 80 tcp dport 45738 tcp flags == 0x12 tcp window 65483
trace id b2ccfd90 ip clash prerouting rule ip saddr 36.155.132.3 meta nftrace set 1 (verdict continue)
trace id b2ccfd90 ip clash prerouting verdict return
trace id b2ccfd90 ip clash prerouting policy accept
trace id b2ccfd90 ip filter INPUT packet: iif "lo" @ll,0,112 0x800 ip saddr 36.155.132.3 ip daddr 192.168.0.107 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 0 ip length 60 tcp sport 80 tcp dport 45738 tcp flags == 0x12 tcp window 65483
trace id b2ccfd90 ip filter INPUT rule xt match "comment" counter packets 78349 bytes 79668105 jump NETAVARK_INPUT (verdict jump NETAVARK_INPUT)
trace id b2ccfd90 ip filter NETAVARK_INPUT verdict continue
trace id b2ccfd90 ip filter INPUT rule counter packets 78327 bytes 79672697 jump LIBVIRT_INP (verdict jump LIBVIRT_INP)
trace id b2ccfd90 ip filter LIBVIRT_INP verdict continue
trace id b2ccfd90 ip filter INPUT verdict continue
trace id b2ccfd90 ip filter INPUT policy accept

Still too long. Simplified:

trace id e83c19de ip clash prerouting packet: iif "lo" @ll,0,112 0x800 ip saddr 192.168.0.107 ip daddr 36.155.132.3 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 30562 ip length 60 tcp sport 45738 tcp dport 80 tcp flags == syn tcp window 64240
trace id e83c19de ip clash prerouting rule meta l4proto { tcp, udp } meta mark set 0x00000001 tproxy to 127.0.0.1:9999 counter packets 13417 bytes 1497892 accept (verdict accept)
trace id e83c19de ip filter INPUT packet: iif "lo" @ll,0,112 0x800 ip saddr 192.168.0.107 ip daddr 36.155.132.3 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 30562 ip length 60 tcp sport 45738 tcp dport 80 tcp flags == syn tcp window 64240
trace id b2ccfd90 ip filter INPUT packet: iif "lo" @ll,0,112 0x800 ip saddr 36.155.132.3 ip daddr 192.168.0.107 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 0 ip length 60 tcp sport 80 tcp dport 45738 tcp flags == 0x12 tcp window 65483

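As you can see, the request goes out via the loopback interface, and the reply also comes back via loopback.

The case where no reply arrives

Again connecting to /dev/tcp/36.155.132.3/80: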
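
The simplification above was done by hand. A rough sketch of doing it with awk: keep only the "packet:" lines and print the trace id, table, chain and interface. The function name trace_summary is my own, and the sketch assumes each packet record is on a single line, which is how nft monitor trace prints it (the wrapped records in this post were reflowed for readability).

```shell
# Reduce `nft monitor trace` output to one short line per "packet:" record:
#   <trace id> <table> <chain> <iif|oif "name">
trace_summary() {
    awk '$1 == "trace" && $2 == "id" && $7 == "packet:" {
        dev = "-"
        for (i = 8; i < NF; i++)
            if ($i == "iif" || $i == "oif") { dev = $i " " $(i + 1); break }
        print $3, $5, $6, dev
    }'
}

# Example:
#   sudo nft monitor trace | trace_summary
```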

trace id b9b63d1d ip clash prerouting packet:
    iif "podman0" ether saddr 62:03:7f:30:c4:81 ether daddr a2:9b:e7:88:99:2e
    ip saddr 10.88.0.3 ip daddr 36.155.132.3 
    ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 38382 ip length 60
    tcp sport 37016 tcp dport 80 tcp flags == syn tcp window 64240
trace id b9b63d1d ip clash prerouting rule ip daddr 36.155.132.3 meta nftrace set 1 (verdict continue)
trace id b9b63d1d ip clash prerouting rule meta l4proto { tcp, udp } meta mark set 0x00000001 tproxy to 127.0.0.1:9999 counter packets 13417 bytes 1497892 accept (verdict accept)
trace id b9b63d1d ip nat PREROUTING packet:
    iif "podman0" ether saddr 62:03:7f:30:c4:81 ether daddr a2:9b:e7:88:99:2e
    ip saddr 10.88.0.3 ip daddr 36.155.132.3
    ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 38382 ip length 60
    tcp sport 37016 tcp dport 80 tcp flags == syn tcp window 64240
trace id b9b63d1d ip nat PREROUTING verdict continue meta mark 0x00000001
trace id b9b63d1d ip nat PREROUTING policy accept meta mark 0x00000001
trace id b9b63d1d ip clash dns packet:
    iif "podman0" ether saddr 62:03:7f:30:c4:81 ether daddr a2:9b:e7:88:99:2e
    ip saddr 10.88.0.3 ip daddr 36.155.132.3
    ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 38382 ip length 60
    tcp sport 37016 tcp dport 80 tcp flags == syn tcp window 64240
trace id b9b63d1d ip clash dns verdict return meta mark 0x00000001
trace id b9b63d1d ip clash dns policy accept meta mark 0x00000001

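The container's packet still arrives on the podman interface, while the working packet arrives on loopback. My guess was that the problem lies here.

Re-checking the tproxy configuration

The core is: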

ip rule add fwmark 1 table 100
ip route add local 0.0.0.0/0 dev lo table 100
# forward to port 9999
nft add rule clash prerouting meta l4proto { tcp, udp } mark set 1 tproxy to 127.0.0.1:$port counter accept
# reroute outgoing packets into prerouting
nft add rule clash output meta l4proto { tcp, udp } mark set 1 counter

I had a feeling something fishy might show up on the output side.

Tracing output packets

sudo nft insert rule ip clash output ip daddr 36.155.132.3 meta nftrace set 1

Result on the host

trace id 0b417a5d ip clash output packet: oif "virtbr0" ip saddr 192.168.0.107 ip daddr 36.155.132.3 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 51137 ip length 60 tcp sport 39780 tcp dport 80 tcp flags == syn tcp window 64240
trace id 0b417a5d ip clash output rule ip daddr 36.155.132.3 meta nftrace set 1 (verdict continue)
trace id 0b417a5d ip clash output rule meta l4proto { tcp, udp } meta mark set 0x00000001 counter packets 718 bytes 80083 (verdict continue)
trace id 0b417a5d ip clash output verdict continue meta mark 0x00000001
trace id 0b417a5d ip clash output policy accept meta mark 0x00000001
trace id 0b417a5d ip nat OUTPUT packet: oif "virtbr0" ip saddr 192.168.0.107 ip daddr 36.155.132.3 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 51137 ip length 60 tcp sport 39780 tcp dport 80 tcp flags == syn tcp window 64240
trace id 0b417a5d ip nat OUTPUT verdict continue meta mark 0x00000001
trace id 0b417a5d ip nat OUTPUT policy accept meta mark 0x00000001
trace id 0b417a5d ip clash dnsout packet: oif "virtbr0" ip saddr 192.168.0.107 ip daddr 36.155.132.3 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 51137 ip length 60 tcp sport 39780 tcp dport 80 tcp flags == syn tcp window 64240
trace id 0b417a5d ip clash dnsout verdict continue meta mark 0x00000001
trace id 0b417a5d ip clash dnsout policy accept meta mark 0x00000001
trace id 0b417a5d ip filter OUTPUT packet: oif "virtbr0" ip saddr 192.168.0.107 ip daddr 36.155.132.3 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 51137 ip length 60 tcp sport 39780 tcp dport 80 tcp flags == syn tcp window 64240
trace id 0b417a5d ip filter OUTPUT rule counter packets 173228 bytes 102129488 jump LIBVIRT_OUT (verdict jump LIBVIRT_OUT)
trace id 0b417a5d ip filter LIBVIRT_OUT verdict continue meta mark 0x00000001
trace id 0b417a5d ip filter OUTPUT verdict continue meta mark 0x00000001
trace id 0b417a5d ip filter OUTPUT policy accept meta mark 0x00000001
trace id 0b417a5d ip mangle POSTROUTING packet: oif "lo" ip saddr 192.168.0.107 ip daddr 36.155.132.3 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 51137 ip length 60 tcp sport 39780 tcp dport 80 tcp flags == syn tcp window 64240

Simplified:

trace id 0b417a5d ip clash output packet: oif "virtbr0" ip saddr 192.168.0.107 ip daddr 36.155.132.3 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 51137 ip length 60 tcp sport 39780 tcp dport 80 tcp flags == syn tcp window 64240
trace id 0b417a5d ip clash output rule meta l4proto { tcp, udp } meta mark set 0x00000001 counter packets 718 bytes 80083 (verdict continue)
trace id 0b417a5d ip nat OUTPUT packet: oif "virtbr0" ip saddr 192.168.0.107 ip daddr 36.155.132.3 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 51137 ip length 60 tcp sport 39780 tcp dport 80 tcp flags == syn tcp window 64240
trace id 0b417a5d ip clash dnsout packet: oif "virtbr0" ip saddr 192.168.0.107 ip daddr 36.155.132.3 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 51137 ip length 60 tcp sport 39780 tcp dport 80 tcp flags == syn tcp window 64240
trace id 0b417a5d ip filter OUTPUT packet: oif "virtbr0" ip saddr 192.168.0.107 ip daddr 36.155.132.3 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 51137 ip length 60 tcp sport 39780 tcp dport 80 tcp flags == syn tcp window 64240
trace id 0b417a5d ip mangle POSTROUTING packet: oif "lo" ip saddr 192.168.0.107 ip daddr 36.155.132.3 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 51137 ip length 60 tcp sport 39780 tcp dport 80 tcp flags == syn tcp window 64240

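The order is clash output --> nat OUTPUT --> clash dnsout --> filter OUTPUT --> mangle POSTROUTING.

Abnormal access from inside the container

Here the container connects straight to the target IP and nothing comes back at all???
The nft trace output shows that the packet is never rerouted.
Fine, time to bring out pwru.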
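
The chain order for one packet can also be derived from the trace mechanically. A minimal sketch, with the hypothetical name chain_path: it joins the table and chain of every "packet:" line that belongs to the given trace id, again assuming one record per line.

```shell
# Print the table/chain sequence a single trace id passed through,
# joined with " --> ". Reads `nft monitor trace` output on stdin.
chain_path() {
    awk -v id="$1" '
        $1 == "trace" && $2 == "id" && $3 == id && $7 == "packet:" {
            path = (path == "" ? "" : path " --> ") $5 " " $6
        }
        END { if (path != "") print path }'
}

# Example, with the trace saved to a file named trace.log (name is illustrative):
#   chain_path 0b417a5d < trace.log
```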

sudo pwru --output-meta host 36.155.132.3 --output-file host

Then I took a virtual machine, set up tproxy with the same configuration, and captured there as well:

sudo pwru --output-meta host 36.155.132.3 --output-file guest

Finding the key difference

Virtual machine (working case)

0xffffa0abc2f6fae8      3     [bash(1187)]   __inet_lookup_listener netns=4026531840 mark=0x0 iface=5(podman0) proto=0x0800 mtu=1500 len=60
0xffffa0abc2f6fae8      3     [bash(1187)]       inet_lhash2_lookup netns=4026531840 mark=0x0 iface=5(podman0) proto=0x0800 mtu=1500 len=60
0xffffa0abc2f6fae8      3     [bash(1187)]       inet_lhash2_lookup netns=4026531840 mark=0x0 iface=5(podman0) proto=0x0800 mtu=1500 len=60
0xffffa0abc2f6fae8      3     [bash(1187)]   __inet_lookup_listener netns=4026531840 mark=0x1 iface=5(podman0) proto=0x0800 mtu=1500 len=60
0xffffa0abc2f6fae8      3     [bash(1187)]       inet_lhash2_lookup netns=4026531840 mark=0x1 iface=5(podman0) proto=0x0800 mtu=1500 len=60
0xffffa0abc2f6fae8      3     [bash(1187)]       inet_lhash2_lookup netns=4026531840 mark=0x1 iface=5(podman0) proto=0x0800 mtu=1500 len=60
0xffffa0abc2f6fae8      3     [bash(1187)]     ip_route_input_noref netns=4026531840 mark=0x1 iface=5(podman0) proto=0x0800 mtu=1500 len=60
0xffffa0abc2f6fae8      3     [bash(1187)]      ip_route_input_slow netns=4026531840 mark=0x1 iface=5(podman0) proto=0x0800 mtu=1500 len=60
0xffffa0abc2f6fae8      3     [bash(1187)]      fib_validate_source netns=4026531840 mark=0x1 iface=5(podman0) proto=0x0800 mtu=1500 len=60
0xffffa0abc2f6fae8      3     [bash(1187)]    __fib_validate_source netns=4026531840 mark=0x1 iface=5(podman0) proto=0x0800 mtu=1500 len=60

Host (broken case)

0xffff9fba5ab566e8     12  [bash(1476784)]   __inet_lookup_listener netns=4026531840 mark=0x0 iface=9(podman5) proto=0x0800 mtu=1500 len=60
0xffff9fba5ab566e8     12  [bash(1476784)]       inet_lhash2_lookup netns=4026531840 mark=0x0 iface=9(podman5) proto=0x0800 mtu=1500 len=60
0xffff9fba5ab566e8     12  [bash(1476784)]       inet_lhash2_lookup netns=4026531840 mark=0x0 iface=9(podman5) proto=0x0800 mtu=1500 len=60
0xffff9fba5ab566e8     12  [bash(1476784)]   __inet_lookup_listener netns=4026531840 mark=0x1 iface=9(podman5) proto=0x0800 mtu=1500 len=60
0xffff9fba5ab566e8     12  [bash(1476784)]       inet_lhash2_lookup netns=4026531840 mark=0x1 iface=9(podman5) proto=0x0800 mtu=1500 len=60
0xffff9fba5ab566e8     12  [bash(1476784)]       inet_lhash2_lookup netns=4026531840 mark=0x1 iface=9(podman5) proto=0x0800 mtu=1500 len=60
0xffff9fba5ab566e8     12  [bash(1476784)]                 skb_push netns=4026531840 mark=0x1 iface=17(veth7) proto=0x0800 mtu=1500 len=60
0xffff9fba5ab566e8     12  [bash(1476784)]             nf_hook_slow netns=4026531840 mark=0x1 iface=17(veth7) proto=0x0800 mtu=1500 len=60
0xffff9fba5ab566e8     12  [bash(1476784)]        netif_receive_skb netns=4026531840 mark=0x1 iface=9(podman5) proto=0x0800 mtu=1500 len=60
0xffff9fba5ab566e8     12  [bash(1476784)]   skb_defer_rx_timestamp netns=4026531840 mark=0x1 iface=9(podman5) proto=0x0800 mtu=1500 len=60
0xffff9fba5ab566e8     12  [bash(1476784)]      __netif_receive_skb netns=4026531840 mark=0x1 iface=9(podman5) proto=0x0800 mtu=1500 len=60

The key difference: after tproxy sets the mark to 0x1, one path goes into skb_push while the other goes into ip_route_input_noref.

Hook each of these two functions and print the stack for comparison:
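
Spotting that divergence by eye in two long pwru captures is tedious. Here is a rough sketch that compares the function-name column of the two output files and reports the first position where they differ. The name first_divergence is hypothetical, and the sketch assumes the column layout shown above (function name in the fourth whitespace-separated field, process names without spaces).

```shell
# Compare the function column of two pwru output files and print the first
# position at which the call sequences differ, as "<n>: <file1 func> vs <file2 func>".
# Prints nothing if no difference is found within the second file's length.
first_divergence() {
    awk '$1 !~ /^0x/ { next }                     # skip headers and stack lines
         FILENAME == ARGV[1] { a[++n] = $4; next }  # remember file 1 sequence
         { m++; if (a[m] != $4) { print m ": " a[m] " vs " $4; exit } }' "$1" "$2"
}

# Example with the two capture files from this post:
#   first_divergence host guest
```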

sudo pwru --filter-func ip_route_input_noref --output-stack host 36.155.132.3 --output-file current1
sudo pwru --output-meta --filter-func skb_push --output-stack host 36.155.132.3

skb_push

0xffff9fba5ab542e8     12  [bash(1476784)]                 skb_push netns=4026531840 mark=0x1 iface=17(veth7) proto=0x0800 mtu=1500 len=60
skb_push
br_nf_pre_routing_finish        [br_netfilter]
br_nf_pre_routing       [br_netfilter]
br_handle_frame [bridge]
__netif_receive_skb_core.constprop.0
__netif_receive_skb_one_core
process_backlog
__napi_poll
net_rx_action
__softirqentry_text_start
do_softirq.part.0
__local_bh_enable_ip
__dev_queue_xmit
ip_finish_output2
__ip_queue_xmit
__tcp_transmit_skb
tcp_connect
tcp_v4_connect
__inet_stream_connect
inet_stream_connect
__sys_connect
__x64_sys_connect
do_syscall_64
entry_SYSCALL_64_after_hwframe

ip_route_input_noref

0xffffa0abd49b84e8      3     [bash(1187)]     ip_route_input_noref netns=4026531840 mark=0x1 iface=5(podman0) proto=0x0800 mtu=1500 len=60
ip_route_input_noref
ip_rcv_finish_core.isra.0
ip_rcv
__netif_receive_skb_one_core
netif_receive_skb
br_handle_frame_finish  [bridge]
br_handle_frame [bridge]
__netif_receive_skb_core.constprop.0
__netif_receive_skb_one_core
process_backlog
__napi_poll
net_rx_action
__do_softirq
do_softirq.part.0
__local_bh_enable_ip
__dev_queue_xmit
ip_finish_output2
__ip_queue_xmit
__tcp_transmit_skb
tcp_connect
tcp_v4_connect
__inet_stream_connect
inet_stream_connect
__sys_connect
__x64_sys_connect
do_syscall_64
entry_SYSCALL_64_after_hwframe

The broken path has an extra module, br_netfilter, and that module is what leads into skb_push.

Cause found. Remove the module:

rmmod br_netfilter
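
Before and after removing it, it can help to confirm whether the module is actually loaded. A minimal read-only sketch follows; the helper name module_listed is my own. As far as I know, the /proc/sys/net/bridge/ entries such as bridge-nf-call-iptables only exist while br_netfilter is loaded, so the example prints that value as an extra hint rather than changing anything.

```shell
# Succeed if the named kernel module appears in /proc/modules-style input
# (the module name is the first whitespace-separated field of each line).
module_listed() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Example:
#   if module_listed br_netfilter < /proc/modules; then
#       echo "br_netfilter is loaded"
#       cat /proc/sys/net/bridge/bridge-nf-call-iptables
#   fi
```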

Enter a container on the host again:

echo > /dev/tcp/36.155.132.3/80;echo $?
0

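And with that, it is rescued.

But I may actually need br_netfilter: docker-in-docker loads it by itself, and things look hard to get working without it. My next thought was that the problem sits around nat prerouting, so I tried to replace the NAT forwarding entirely with tproxy. That did not really work. Then I found the final answer on Stack Overflow: br_netfilter and tproxy are incompatible.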

TPROXY compatibility with Docker

GG.
