Sophos IPsec failover group stopped working

Hi,

I set up an IPsec failover group on a branch-office device. Everything worked well last week, but for the past 2 days the failover group has stopped working.

When the internet connection drops at the head office, the branch-office device does not switch to the second IPsec tunnel.

So I have to turn the failover group off and on again manually every time.

I tested both IPsec tunnels separately, and both work fine.

Firmware: SFOS 17.5.14 MR-14-1



This thread was automatically locked due to age.
  • After checking the logs, I found something interesting in the lines below (I changed the IPs on both ends for confidentiality):

    2020-12-18 00:02:22 13[APP] <SITE2_SITE1_ISP1-1|66> [COP-UPDOWN] (ref_counting) ref_count: 2 to 1 -- down -- (192.168.2.0/24#192.168.1.0/24)
    2020-12-18 00:02:22 13[APP] <SITE2_SITE1_ISP1-1|66> [COP-UPDOWN] (ref_counting_remote) ref_count_remote: 2 to 1 -- down -- (s1.s1.s1.s1#s2.s2.s2.s2)
    2020-12-18 00:02:22 13[APP] <SITE2_SITE1_ISP1-1|66> [COP-UPDOWN] (cop_updown_invoke_once) UID: 66 Net: Local s1.s1.s1.s1 Remote s2.s2.s2.s2 Connection: SITE2_SITE1_ISP1 Fullname: SITE2_SITE1_ISP1-1
    2020-12-18 00:02:22 13[APP] <SITE2_SITE1_ISP1-1|66> [COP-UPDOWN] (cop_updown_invoke_once) Tunnel: User '' Peer-IP '' my-IP '' down-client
    2020-12-18 00:02:22 16[APP] [COP-UPDOWN][DB] (db_conn_info) hostname: 'SITE2_SITE1_ISP1' result --> id: '4', mode: 'ntn', tunnel_type: '0', subnet_family:'0'
    2020-12-18 00:02:22 16[APP] [COP-UPDOWN] (do_cop_updown_invoke_once) !!SKIP!! IPsec IKE for remotes (s1.s1.s1.s1 to s2.s2.s2.s2) already set up
    2020-12-18 00:02:22 16[APP] [COP-UPDOWN] (do_cop_updown_invoke_once) !!SKIP!! IPsec SA for subnet (192.168.2.0/24 to 192.168.1.0/24) already set up
    2020-12-18 00:02:22 13[IKE] <SITE2_SITE1_ISP1-1|66> sending DELETE for ESP CHILD_SA with SPI c92bc96b
    2020-12-18 00:02:22 13[ENC] <SITE2_SITE1_ISP1-1|66> generating INFORMATIONAL_V1 request 3944376634 [ HASH D ]
    2020-12-18 00:02:22 13[NET] <SITE2_SITE1_ISP1-1|66> sending packet: from s1.s1.s1.s1[4500] to s2.s2.s2.s2[4500] (92 bytes)
    2020-12-18 00:02:22 13[IKE] <SITE2_SITE1_ISP1-1|66> deleting IKE_SA SITE2_SITE1_ISP1-1[66] between s1.s1.s1.s1[sw1.sw1.sw1.sw1]...s2.s2.s2.s2[s2.s2.s2.s2]
    2020-12-18 00:02:22 13[IKE] <SITE2_SITE1_ISP1-1|66> sending DELETE for ITE2_SITE1_ISP1-1[66]
    2020-12-18 00:02:22 13[ENC] <SITE2_SITE1_ISP1-1|66> generating INFORMATIONAL_V1 request 1673290694 [ HASH D ]
    2020-12-18 00:02:22 13[NET] <SITE2_SITE1_ISP1-1|66> sending packet: from s1.s1.s1.s1[4500] to s2.s2.s2.s2[4500]  (108 bytes)
    

    Sometimes I find a duplicate tunnel for the same site. As I understand it, after the connection between the two sites is lost, the failover process tears down the current tunnel and then checks whether any tunnel is still active before switching to the backup.

    As a result, the failover process finds that a tunnel already exists, so it skips the switch to the backup.

    The line ( [COP-UPDOWN] (ref_counting) ref_count: 2 to 1 ) should instead be ( [COP-UPDOWN] (ref_counting) ref_count: 1 to 0 ).

    I think the problem comes from the duplicate tunnel. This bug needs a fix.
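
For anyone who wants to check their own logs for the same symptom, here is a minimal Python sketch that scans saved log lines for [COP-UPDOWN] ref_counting "down" events whose count does not reach 0. It assumes the line layout matches the excerpt above. The function name find_stale_refs and the regular expression are illustrative only and are not part of any Sophos tooling. A non-zero count after "down" only suggests a leftover duplicate tunnel; it is not proof of one.

```python
import re

# Pattern for the ref_counting lines as they appear in the excerpt above.
# Assumption: other firmware versions may format these lines differently.
REF_COUNT_RE = re.compile(
    r"\[COP-UPDOWN\] \((?P<kind>ref_counting(?:_remote)?)\) "
    r"ref_count(?:_remote)?: (?P<old>\d+) to (?P<new>\d+) "
    r"-- (?P<event>\w+) -- \((?P<pair>[^)]*)\)"
)


def find_stale_refs(lines):
    """Return (kind, pair, old, new) for 'down' events that leave a count above 0."""
    stale = []
    for line in lines:
        match = REF_COUNT_RE.search(line)
        if match is None:
            continue  # not a ref_counting line
        new_count = int(match.group("new"))
        if match.group("event") == "down" and new_count > 0:
            stale.append(
                (
                    match.group("kind"),
                    match.group("pair"),
                    int(match.group("old")),
                    new_count,
                )
            )
    return stale


if __name__ == "__main__":
    # Two lines copied from the excerpt above (already anonymized).
    sample = [
        "2020-12-18 00:02:22 13[APP] <SITE2_SITE1_ISP1-1|66> [COP-UPDOWN] "
        "(ref_counting) ref_count: 2 to 1 -- down -- (192.168.2.0/24#192.168.1.0/24)",
        "2020-12-18 00:02:22 13[APP] <SITE2_SITE1_ISP1-1|66> [COP-UPDOWN] "
        "(ref_counting_remote) ref_count_remote: 2 to 1 -- down -- (s1.s1.s1.s1#s2.s2.s2.s2)",
    ]
    for kind, pair, old, new in find_stale_refs(sample):
        print(f"{kind}: {pair} went {old} -> {new} on down (still referenced)")
```

To use it on a real log, read the file into a list of lines and pass that list to find_stale_refs. Any entry it returns marks a teardown where something still held a reference to the tunnel.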
