Keepalived healthchecker behavior depending on the returned TCP flags (example based on a configured HTTP_GET health check)

Regarding the connect_timeout option, the healthchecker can behave in 2 ways, depending on how the remote host answers the TCP SYN:

  • case A
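    • the healthchecker sends a TCP SYN
    • the remote host rejects it right away (TCP RST; note that a plain iptables REJECT rule, as used below, answers with ICMP port-unreachable by default and needs --reject-with tcp-reset to send a real RST; either way the connect fails immediately)
    • keepalived ignores connect_timeout and drops the RS (or reduces its weight, depending on the config)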
      Test schema:
      
      test RS:
      [root@test5v.dev:~]# date; iptables -A INPUT -s 192.168.2.1 -p tcp --dport 8081 -j REJECT
      Wed Nov 25 14:08:44 ICT 2015
      
      tcpdump of a normal health check TCP session:
      14:08:40.252848 IP 192.168.2.1.37274 > 192.168.2.10.8081: Flags [S], seq 318940877, win 29200, options [mss 1460,sackOK,TS val 629869104 ecr 0,nop,wscale 7], length 0
      14:08:40.252877 IP 192.168.2.10.8081 > 192.168.2.1.37274: Flags [S.], seq 548455626, ack 318940878, win 28960, options [mss 1460,sackOK,TS val 3908139897 ecr 629869104,nop,wscale 7], length 0
      14:08:40.253034 IP 192.168.2.1.37274 > 192.168.2.10.8081: Flags [.], ack 1, win 229, options [nop,nop,TS val 629869104 ecr 3908139897], length 0
      14:08:40.253067 IP 192.168.2.1.37274 > 192.168.2.10.8081: Flags [P.], seq 1:76, ack 1, win 229, options [nop,nop,TS val 629869104 ecr 3908139897], length 75
      14:08:40.253079 IP 192.168.2.10.8081 > 192.168.2.1.37274: Flags [.], ack 76, win 227, options [nop,nop,TS val 3908139897 ecr 629869104], length 0
      14:08:40.253220 IP 192.168.2.10.8081 > 192.168.2.1.37274: Flags [P.], seq 1:225, ack 76, win 227, options [nop,nop,TS val 3908139897 ecr 629869104], length 224
      14:08:40.253238 IP 192.168.2.10.8081 > 192.168.2.1.37274: Flags [FP.], seq 225:320, ack 76, win 227, options [nop,nop,TS val 3908139897 ecr 629869104], length 95
      14:08:40.253350 IP 192.168.2.1.37274 > 192.168.2.10.8081: Flags [.], ack 225, win 237, options [nop,nop,TS val 629869104 ecr 3908139897], length 0
      14:08:40.253503 IP 192.168.2.1.37274 > 192.168.2.10.8081: Flags [R.], seq 76, ack 321, win 237, options [nop,nop,TS val 629869104 ecr 3908139897], length 0
      
      here is the last check before the RS is dropped:
      14:08:46.254560 IP 192.168.2.1.37275 > 192.168.2.10.8081: Flags [S], seq 3293277997, win 29200, options [mss 1460,sackOK,TS val 629870605 ecr 0,nop,wscale 7], length 0
      
      test LB:
      Nov 25 14:08:46 test2v Keepalived_healthcheckers[10690]: Error connecting server [192.168.2.10]:80.
      Nov 25 14:08:46 test2v Keepalived_healthcheckers[10690]: Removing service [192.168.2.10]:80 from VS [10.3.0.144]:80
      
  • case B
    • the healthchecker sends a TCP SYN
    • the remote host returns nothing (all packets are DROPped)
    • keepalived uses connect_timeout
      • configured connect_timeout (8 sec) is less than the total GET retry time, nb_get_retry * delay_before_retry = 5 * 2 sec = 10 sec
            HTTP_GET {
              connect_port 8081
              url {
                path /status/
                status_code 200
                digest 6fb9c6eed1b7f0a50854944905dc9481
              }
              connect_timeout 8
              nb_get_retry 5
              delay_before_retry 2
            }
        
Test schema:

test RS:
test5v.dev:~]# date; iptables -A INPUT -s 192.168.2.1 -p tcp --dport 8081 -j DROP
Wed Nov 25 14:22:40 ICT 2015

the last normal healthcheck TCP session:
14:47:00.340636 IP 192.168.2.1.37687 > 192.168.2.10.8081: Flags [S], seq 3820409120, win 29200, options [mss 1460,sackOK,TS val 630444127 ecr 0,nop,wscale 7], length 0
14:47:00.340665 IP 192.168.2.10.8081 > 192.168.2.1.37687: Flags [S.], seq 2338668368, ack 3820409121, win 28960, options [mss 1460,sackOK,TS val 3908714919 ecr 630444127,nop,wscale 7], length 0
14:47:00.340785 IP 192.168.2.1.37687 > 192.168.2.10.8081: Flags [.], ack 1, win 229, options [nop,nop,TS val 630444127 ecr 3908714919], length 0
14:47:00.340864 IP 192.168.2.1.37687 > 192.168.2.10.8081: Flags [P.], seq 1:76, ack 1, win 229, options [nop,nop,TS val 630444127 ecr 3908714919], length 75
14:47:00.340879 IP 192.168.2.10.8081 > 192.168.2.1.37687: Flags [.], ack 76, win 227, options [nop,nop,TS val 3908714919 ecr 630444127], length 0
14:47:00.340949 IP 192.168.2.10.8081 > 192.168.2.1.37687: Flags [P.], seq 1:225, ack 76, win 227, options [nop,nop,TS val 3908714919 ecr 630444127], length 224
14:47:00.340969 IP 192.168.2.10.8081 > 192.168.2.1.37687: Flags [FP.], seq 225:320, ack 76, win 227, options [nop,nop,TS val 3908714919 ecr 630444127], length 95
14:47:00.341134 IP 192.168.2.1.37687 > 192.168.2.10.8081: Flags [.], ack 225, win 237, options [nop,nop,TS val 630444127 ecr 3908714919], length 0
14:47:00.341184 IP 192.168.2.1.37687 > 192.168.2.10.8081: Flags [R.], seq 76, ack 321, win 237, options [nop,nop,TS val 630444127 ecr 3908714919], length 0
...then 4 SYNs are sent (the initial one plus 3 retransmits):
14:47:06.342363 IP 192.168.2.1.37688 > 192.168.2.10.8081: Flags [S], seq 862050682, win 29200, options [mss 1460,sackOK,TS val 630445627 ecr 0,nop,wscale 7], length 0
14:47:07.339158 IP 192.168.2.1.37688 > 192.168.2.10.8081: Flags [S], seq 862050682, win 29200, options [mss 1460,sackOK,TS val 630445877 ecr 0,nop,wscale 7], length 0
14:47:09.343156 IP 192.168.2.1.37688 > 192.168.2.10.8081: Flags [S], seq 862050682, win 29200, options [mss 1460,sackOK,TS val 630446378 ecr 0,nop,wscale 7], length 0
14:47:13.351145 IP 192.168.2.1.37688 > 192.168.2.10.8081: Flags [S], seq 862050682, win 29200, options [mss 1460,sackOK,TS val 630447380 ecr 0,nop,wscale 7], length 0
...and the RS is dropped when connect_timeout (8 seconds) expires

test LB:
Nov 25 14:47:14 test2v Keepalived_healthcheckers[10782]: Timeout connect, timeout server [192.168.2.10]:80.
Nov 25 14:47:14 test2v Keepalived_healthcheckers[10782]: Removing service [192.168.2.10]:80 from VS [10.3.0.144]:80

so here the RS was dropped after the configured connect_timeout of 8 seconds
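
The spacing of the 4 SYNs in the capture above (roughly 0, 1, 3 and 7 seconds after the first one) looks like ordinary TCP SYN retransmission with exponential backoff. Here is a minimal sketch of that schedule. It assumes an initial retransmission timeout of 1 second that doubles after every retransmit, which is what the capture suggests; the function name and the 1 second default are illustration only and are not keepalived settings.

```python
def syn_send_offsets(connect_timeout, initial_rto=1.0):
    """Return the offsets (seconds after the first SYN) at which SYNs go out
    before connect_timeout expires, assuming the retransmission timer
    doubles after every retransmit."""
    offsets = []
    t = 0.0
    rto = initial_rto
    while t < connect_timeout:
        offsets.append(t)
        t += rto      # next retransmit fires after the current timer
        rto *= 2      # exponential backoff
    return offsets


print(syn_send_offsets(8))  # [0.0, 1.0, 3.0, 7.0]
```

With connect_timeout 8 this gives 4 SYNs, the same number as in the dump: the next retransmit would only come 15 seconds after the first SYN, well after the timeout.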

  • configured connect_timeout (12 sec) is greater than the total GET retry time, nb_get_retry * delay_before_retry = 5 * 2 sec = 10 sec
        HTTP_GET {
          connect_port 8081
          url {
            path /status/
            status_code 200
            digest 6fb9c6eed1b7f0a50854944905dc9481
          }
          connect_timeout 12
          nb_get_retry 5
          delay_before_retry 2
        }
    
Test schema:

test5v.dev:~]# date; iptables -A INPUT -s 192.168.2.1 -p tcp --dport 8081 -j DROP
Wed Nov 25 14:57:04 ICT 2015

the last normal healthcheck TCP session:
14:57:02.912134 IP 192.168.2.1.37751 > 192.168.2.10.8081: Flags [S], seq 3492153841, win 29200, options [mss 1460,sackOK,TS val 630594770 ecr 0,nop,wscale 7], length 0
14:57:02.912168 IP 192.168.2.10.8081 > 192.168.2.1.37751: Flags [S.], seq 2204287936, ack 3492153842, win 28960, options [mss 1460,sackOK,TS val 3908865562 ecr 630594770,nop,wscale 7], length 0
14:57:02.912284 IP 192.168.2.1.37751 > 192.168.2.10.8081: Flags [.], ack 1, win 229, options [nop,nop,TS val 630594770 ecr 3908865562], length 0
14:57:02.912432 IP 192.168.2.1.37751 > 192.168.2.10.8081: Flags [P.], seq 1:76, ack 1, win 229, options [nop,nop,TS val 630594770 ecr 3908865562], length 75
14:57:02.912453 IP 192.168.2.10.8081 > 192.168.2.1.37751: Flags [.], ack 76, win 227, options [nop,nop,TS val 3908865562 ecr 630594770], length 0
14:57:02.912534 IP 192.168.2.10.8081 > 192.168.2.1.37751: Flags [P.], seq 1:225, ack 76, win 227, options [nop,nop,TS val 3908865562 ecr 630594770], length 224
14:57:02.912553 IP 192.168.2.10.8081 > 192.168.2.1.37751: Flags [FP.], seq 225:320, ack 76, win 227, options [nop,nop,TS val 3908865562 ecr 630594770], length 95
14:57:02.912597 IP 192.168.2.1.37751 > 192.168.2.10.8081: Flags [.], ack 225, win 237, options [nop,nop,TS val 630594770 ecr 3908865562], length 0
14:57:02.912789 IP 192.168.2.1.37751 > 192.168.2.10.8081: Flags [R.], seq 76, ack 321, win 237, options [nop,nop,TS val 630594770 ecr 3908865562], length 0
...then 5 SYNs are sent (the initial one plus 4 retransmits):
14:57:08.913861 IP 192.168.2.1.37752 > 192.168.2.10.8081: Flags [S], seq 714362941, win 29200, options [mss 1460,sackOK,TS val 630596270 ecr 0,nop,wscale 7], length 0
14:57:09.911049 IP 192.168.2.1.37752 > 192.168.2.10.8081: Flags [S], seq 714362941, win 29200, options [mss 1460,sackOK,TS val 630596520 ecr 0,nop,wscale 7], length 0
14:57:11.915050 IP 192.168.2.1.37752 > 192.168.2.10.8081: Flags [S], seq 714362941, win 29200, options [mss 1460,sackOK,TS val 630597021 ecr 0,nop,wscale 7], length 0
14:57:15.927067 IP 192.168.2.1.37752 > 192.168.2.10.8081: Flags [S], seq 714362941, win 29200, options [mss 1460,sackOK,TS val 630598024 ecr 0,nop,wscale 7], length 0
14:57:23.943070 IP 192.168.2.1.37752 > 192.168.2.10.8081: Flags [S], seq 714362941, win 29200, options [mss 1460,sackOK,TS val 630600028 ecr 0,nop,wscale 7], length 0
...and the RS is dropped only after the 5th SYN, sent 15 seconds after the first one (> the configured 12 sec)

test LB:
Nov 25 14:57:28 test2v Keepalived_healthcheckers[11075]: Timeout connect, timeout server [192.168.2.10]:80.
Nov 25 14:57:28 test2v Keepalived_healthcheckers[11075]: Removing service [192.168.2.10]:80 from VS [10.3.0.144]:80

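so here the RS was dropped later than the configured connect_timeout of 12 seconds: the last SYN went out 15 seconds after the first one, and the log shows the removal at 14:57:28, about 19 seconds after the first SYN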
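
To double-check the timing, the offsets can be computed straight from the timestamps in the capture above. This is just a small helper sketch (the function name is made up for illustration); the SYN timestamps are copied from the dump, and the removal time comes from the syslog line, which only has one-second resolution, so the last number is approximate.

```python
from datetime import datetime


def offsets_from_first(timestamps):
    """Seconds elapsed between the first timestamp and each timestamp."""
    parsed = [datetime.strptime(ts, "%H:%M:%S.%f") for ts in timestamps]
    return [(p - parsed[0]).total_seconds() for p in parsed]


# SYN timestamps copied from the capture (connect_timeout 12)
syn_times = [
    "14:57:08.913861",
    "14:57:09.911049",
    "14:57:11.915050",
    "14:57:15.927067",
    "14:57:23.943070",
]
syn_offsets = offsets_from_first(syn_times)
print([round(o, 1) for o in syn_offsets])  # [0.0, 1.0, 3.0, 7.0, 15.0]

# Removal time taken from the syslog line (one-second resolution only)
removal = offsets_from_first([syn_times[0], "14:57:28.000000"])[1]
print(round(removal))  # 19
```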

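Conclusion: if the check already fails on L4 (TCP), the L7 (HTTP) part of the check never runs. In these tests the RS was dropped as soon as the TCP connect failed, so the HTTP retry settings had no effect on how quickly it was dropped.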
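
The difference between the two cases can also be seen from the client side with a plain TCP connect. Below is a rough sketch, not keepalived's own code: the probe function and its return values are made up for illustration. A rejected connect fails immediately and never uses the timeout (case A), while a silently dropped SYN only fails when the timeout expires (case B).

```python
import socket
import time


def probe(host, port, timeout):
    """Try a TCP connect and report how it ended and how long it took."""
    start = time.monotonic()
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        outcome = "connected"
    except ConnectionRefusedError:
        outcome = "refused"   # case A: immediate failure, timeout is not used
    except socket.timeout:
        outcome = "timeout"   # case B: no reply at all, the timeout expires
    finally:
        sock.close()
    return outcome, time.monotonic() - start


# Demo against a local port with no listener (a stand-in for a rejecting RS).
tmp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tmp.bind(("127.0.0.1", 0))
free_port = tmp.getsockname()[1]
tmp.close()

outcome, elapsed = probe("127.0.0.1", free_port, timeout=8)
print(outcome, elapsed < 8)
```

On a typical Linux host the demo should report a refused connect almost instantly, long before the 8 seconds are up. Pointing probe at a port whose packets are DROPped should instead end in a timeout after the full 8 seconds.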


Now let's test with the inhibit_on_failure option enabled: on a healthchecker failure it sets the RS weight to 0 instead of removing the RS (and should keep existing connections).

Request from test1v.dev to VIP 10.3.0.144:80

test1v.dev:~]# curl http://test.localhost
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Test LB:
Every 1.0s: ipvsadm -L Wed Nov 25 17:27:59 2015

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.3.0.144:http wlc
  -> 192.168.2.10:http            Route   10     0          15

Now reject TCP SYNs to the health check port (8081, which serves /status/):

test5v.dev:~]# date; iptables -A INPUT -s 192.168.2.1 -p tcp --dport 8081 -j REJECT
Wed Nov 25 17:28:53 ICT 2015
Test LB:
Every 1.0s: ipvsadm -L Wed Nov 25 17:29:01 2015

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.3.0.144:http wlc
  -> 192.168.2.10:http            Route   0      0          15


Nov 25 17:28:59 test2v Keepalived_healthcheckers[11371]: Error connecting server [192.168.2.10]:80.
Nov 25 17:28:59 test2v Keepalived_healthcheckers[11371]: Disabling service [192.168.2.10]:80 from VS [10.3.0.144]:80

…existing connections are kept, but a new one cannot be established:

test1v.dev:~]# curl http://test.localhost
curl: (7) couldn't connect to host

Let's bring the RS back:

test5v.dev:~]# date; iptables -D INPUT -s 192.168.2.1 -p tcp --dport 8081 -j REJECT
Wed Nov 25 17:31:17 ICT 2015
Test LB:
Every 1.0s: ipvsadm -L Wed Nov 25 17:31:24 2015

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.3.0.144:http wlc
  -> 192.168.2.10:http            Route   10     0          0


Nov 25 17:31:17 test2v Keepalived_healthcheckers[11371]: MD5 digest success to [192.168.2.10]:80 url(1).
Nov 25 17:31:23 test2v Keepalived_healthcheckers[11371]: Remote Web server [192.168.2.10]:80 succeed on service.
Nov 25 17:31:23 test2v Keepalived_healthcheckers[11371]: Enabling service [192.168.2.10]:80 to VS [10.3.0.144]:80
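
If you want to spot inhibited real servers without watching the screen, the weight column of the ipvsadm -L output can be parsed. A minimal sketch, assuming the column layout shown in the listings above (the function name is made up, and the sample text is copied from the listing taken while the RS was disabled):

```python
def real_server_weights(ipvsadm_output):
    """Map each real server ("addr:port") to its weight from `ipvsadm -L` text."""
    weights = {}
    for line in ipvsadm_output.splitlines():
        fields = line.split()
        # Real server rows: -> addr:port Forward Weight ActiveConn InActConn
        # The header row has the same shape but a non-numeric weight field.
        if len(fields) == 6 and fields[0] == "->" and fields[3].isdigit():
            weights[fields[1]] = int(fields[3])
    return weights


sample = """\
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.3.0.144:http wlc
  -> 192.168.2.10:http            Route   0      0          15
"""

weights = real_server_weights(sample)
inhibited = [rs for rs, w in weights.items() if w == 0]
print(inhibited)  # ['192.168.2.10:http']
```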
