Because I could not find much helpful information about the issue we had today, here are some notes that someone may find handy in the future.
Today our XG v18 HA cluster did a failover. As we often notice, this did not go well: the failover took more than 10 minutes.
Afterwards, incoming site-to-site VPN connections from an SG firewall could not be established. We checked the settings and everything was fine, but there was still no connection.
On the shell, I found something interesting in strongswan.log:
2020-10-07 11:22:42 17[CFG] loaded RSA public key for "172.1xx.xxx.xxx"
2020-10-07 11:22:42 17[CFG] loaded RSA public key for "xxx.mydomain.com"
2020-10-07 11:22:42 17[CFG] added configuration 'VPNCONN-1'
2020-10-07 11:23:27 03[NET] ### drop_ike_sa_init(): rejecting new connections ###
2020-10-07 11:24:07 03[NET] ### drop_ike_sa_init(): rejecting new connections ###
2020-10-07 11:24:47 03[NET] ### drop_ike_sa_init(): rejecting new connections ###
....
So the XG firewall itself was actively rejecting new IKE connections.
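To check quickly whether a box is in this state, you can count the rejection lines in the log. This is only a minimal sketch: the function name `count_ike_rejects` is made up for illustration, and the log path is passed as an argument because the location of strongswan.log on your system is an assumption you need to verify yourself.

```shell
# Hypothetical helper: count "rejecting new connections" lines in a strongswan log.
# Usage: count_ike_rejects <path-to-strongswan.log>
count_ike_rejects() {
    # -c prints the number of matching lines, -F matches the pattern as a fixed string.
    # grep exits with status 1 when nothing matches (it still prints 0),
    # so "|| true" keeps the function from failing under "set -e".
    grep -cF 'drop_ike_sa_init(): rejecting new connections' "$1" || true
}
```

Call it as `count_ike_rejects strongswan.log` from the directory that holds the log. A count that keeps growing while the remote side retries suggests the same situation as described here.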
On the other side, the connecting SG reported:
2020:10:07-10:50:53 Firewall-1 pluto[5518]: ERROR: "S_REF_IpsSitTunnelxghe_0" #442834: sendto on eth0 to 2xx.xxx.xxx.xxx:500 failed in main_outI1. Errno 1: Operation not permitted
...
2020:10:07-11:07:47 Firewall-1 pluto[5518]: "S_REF_IpsSitTunnelxghe_0" #442852: max number of retransmissions (20) reached STATE_MAIN_I1. No response (or no acceptable response) to our first IKE message
In the end, the only solution was to restart the strongswan service on the XG side:
service strongswan:restart -ds nosync
Afterwards the connection could be established immediately.
Be aware that restarting the service drops all existing IPsec connections, if there are any.