Sophos IPsec failover group stopped working

Hi,

I set up an IPsec failover group on a branch-office device. Everything worked well last week, but for the past 2 days the failover group has stopped working.

When the internet connection drops at the head office, the branch-office device does not switch to the second IPsec tunnel.

So I have to turn the failover group off and on manually every time.

I tested both IPsec tunnels separately, and both work fine.

Firmware: SFOS 17.5.14 MR-14-1



  • After checking the logs, I found something interesting in the lines below (I changed the IPs on both ends for confidentiality):

    2020-12-18 00:02:22 13[APP] <SITE2_SITE1_ISP1-1|66> [COP-UPDOWN] (ref_counting) ref_count: 2 to 1 -- down -- (192.168.2.0/24#192.168.1.0/24)
    2020-12-18 00:02:22 13[APP] <SITE2_SITE1_ISP1-1|66> [COP-UPDOWN] (ref_counting_remote) ref_count_remote: 2 to 1 -- down -- (s1.s1.s1.s1#s2.s2.s2.s2)
    2020-12-18 00:02:22 13[APP] <SITE2_SITE1_ISP1-1|66> [COP-UPDOWN] (cop_updown_invoke_once) UID: 66 Net: Local s1.s1.s1.s1 Remote s2.s2.s2.s2 Connection: SITE2_SITE1_ISP1 Fullname: SITE2_SITE1_ISP1-1
    2020-12-18 00:02:22 13[APP] <SITE2_SITE1_ISP1-1|66> [COP-UPDOWN] (cop_updown_invoke_once) Tunnel: User '' Peer-IP '' my-IP '' down-client
    2020-12-18 00:02:22 16[APP] [COP-UPDOWN][DB] (db_conn_info) hostname: 'SITE2_SITE1_ISP1' result --> id: '4', mode: 'ntn', tunnel_type: '0', subnet_family:'0'
    2020-12-18 00:02:22 16[APP] [COP-UPDOWN] (do_cop_updown_invoke_once) !!SKIP!! IPsec IKE for remotes (s1.s1.s1.s1 to s2.s2.s2.s2) already set up
    2020-12-18 00:02:22 16[APP] [COP-UPDOWN] (do_cop_updown_invoke_once) !!SKIP!! IPsec SA for subnet (192.168.2.0/24 to 192.168.1.0/24) already set up
    2020-12-18 00:02:22 13[IKE] <SITE2_SITE1_ISP1-1|66> sending DELETE for ESP CHILD_SA with SPI c92bc96b
    2020-12-18 00:02:22 13[ENC] <SITE2_SITE1_ISP1-1|66> generating INFORMATIONAL_V1 request 3944376634 [ HASH D ]
    2020-12-18 00:02:22 13[NET] <SITE2_SITE1_ISP1-1|66> sending packet: from s1.s1.s1.s1[4500] to s2.s2.s2.s2[4500] (92 bytes)
    2020-12-18 00:02:22 13[IKE] <SITE2_SITE1_ISP1-1|66> deleting IKE_SA SITE2_SITE1_ISP1-1[66] between s1.s1.s1.s1[sw1.sw1.sw1.sw1]...s2.s2.s2.s2[s2.s2.s2.s2]
    2020-12-18 00:02:22 13[IKE] <SITE2_SITE1_ISP1-1|66> sending DELETE for IKE_SA SITE2_SITE1_ISP1-1[66]
    2020-12-18 00:02:22 13[ENC] <SITE2_SITE1_ISP1-1|66> generating INFORMATIONAL_V1 request 1673290694 [ HASH D ]
    2020-12-18 00:02:22 13[NET] <SITE2_SITE1_ISP1-1|66> sending packet: from s1.s1.s1.s1[4500] to s2.s2.s2.s2[4500]  (108 bytes)
    

    Sometimes I find a duplicate tunnel for the same site. As I understand it, after the connection between the two sites is lost, the failover process kills the current tunnel and then checks whether any tunnel is still active before switching to the backup.

    As a result, the failover process finds that a tunnel already exists, so it skips the switch to the backup.

    The line ([COP-UPDOWN] (ref_counting) ref_count: 2 to 1) should be ([COP-UPDOWN] (ref_counting) ref_count: 1 to 0).

    I think the problem comes from the duplicate tunnel. This bug needs a fix.
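
For anyone who wants to check their own logs for the same pattern, here is a rough Python sketch. It only assumes the line format shown in the excerpt above. The function name `find_stuck_teardowns` and the regular expressions are my own illustration, not part of any Sophos tool. It flags "down" events where the reference count does not reach 0, and counts the `!!SKIP!!` lines that follow.

```python
import re

# Matches ref_count transitions in the format seen in the excerpt, e.g.
# "<SITE2_SITE1_ISP1-1|66> [COP-UPDOWN] (ref_counting) ref_count: 2 to 1 -- down --"
REF_RE = re.compile(
    r"<(?P<conn>[^|>]+)\|\d+>\s+\[COP-UPDOWN\]\s+"
    r"\((?P<kind>ref_counting(?:_remote)?)\)\s+"
    r"ref_count(?:_remote)?:\s+(?P<old>\d+)\s+to\s+(?P<new>\d+)"
    r"\s+--\s+(?P<event>\w+)\s+--"
)
# Matches the "already set up" skip messages from the same hook.
SKIP_RE = re.compile(r"\[COP-UPDOWN\].*!!SKIP!!")


def find_stuck_teardowns(lines):
    """Return (stuck, skips).

    stuck: list of (connection, counter, old, new) for every "down" event
           whose counter stays above 0, i.e. another SA is still referenced.
    skips: number of !!SKIP!! lines seen.
    """
    stuck = []
    skips = 0
    for line in lines:
        m = REF_RE.search(line)
        if m:
            if m.group("event") == "down" and int(m.group("new")) > 0:
                stuck.append(
                    (m.group("conn"), m.group("kind"),
                     int(m.group("old")), int(m.group("new")))
                )
        elif SKIP_RE.search(line):
            skips += 1
    return stuck, skips


# Four lines copied from the excerpt above.
sample = [
    "2020-12-18 00:02:22 13[APP] <SITE2_SITE1_ISP1-1|66> [COP-UPDOWN] (ref_counting) ref_count: 2 to 1 -- down -- (192.168.2.0/24#192.168.1.0/24)",
    "2020-12-18 00:02:22 13[APP] <SITE2_SITE1_ISP1-1|66> [COP-UPDOWN] (ref_counting_remote) ref_count_remote: 2 to 1 -- down -- (s1.s1.s1.s1#s2.s2.s2.s2)",
    "2020-12-18 00:02:22 16[APP] [COP-UPDOWN] (do_cop_updown_invoke_once) !!SKIP!! IPsec IKE for remotes (s1.s1.s1.s1 to s2.s2.s2.s2) already set up",
    "2020-12-18 00:02:22 16[APP] [COP-UPDOWN] (do_cop_updown_invoke_once) !!SKIP!! IPsec SA for subnet (192.168.2.0/24 to 192.168.1.0/24) already set up",
]

stuck, skips = find_stuck_teardowns(sample)
for conn, kind, old, new in stuck:
    print(f"{conn}: {kind} went {old} -> {new} on down (expected to reach 0)")
print(f"!!SKIP!! lines: {skips}")
```

To run it against a real log, pass the lines of the file to `find_stuck_teardowns` instead of `sample`. A non-empty `stuck` list together with `!!SKIP!!` lines would match what I described above, but it is only a hint, not proof of the cause.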

  • I have started getting the same problem on multiple devices running SFOS 17.5.14 MR-14-1.

    I'm still waiting for feedback from support.

  • Hello there,

    Thank you for contacting the Sophos Community!

    Are both devices Sophos XG?

    Are you configuring the Failover groups only on the Branch Office? 

    Make sure the BO is the one initiating the connection.

    Regards,

  • Hello

    1 - Are both devices Sophos XG?

    Yes, both devices are Sophos XG (135/85).

    2 - Are you configuring the Failover groups only on the Branch Office? 

    Yes, I configured the failover group only on the BO.

    3 - The BO is the one initiating the connection; the HO only responds.

    Note that I followed this KB: https://support.sophos.com/support/s/article/KB-000035828?language=en_US

    Everything was working until this month.

  • I got the same problem yesterday on another device: failover and failback were working, but after restarting the device the failover stopped working.