HA Mode Master active Slave Unlinked

Hello,

I currently have the problem that the slave node goes into the UNLINKED state. As a result, I cannot update our cluster at the moment without disrupting operations. So far I have only tried restarting the slave. Is there a way to reset only the slave node to factory settings, without disrupting operations, and then sync it again?

In the web interface I only saw a global reset. Could ha_utils perhaps be used to reset only the slave?

###############################   Log output after restarting the slave ################################################
2019:07:22-14:32:26 oytf00001-1 repctl[16339]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 1
2019:07:22-14:32:26 oytf00001-1 ha_mode[13290]: topology_changed done (started at 14:32:26)
2019:07:22-14:32:26 oytf00001-1 ha_daemon[4457]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 928 26.825" name="Reading cluster configuration"
2019:07:22-14:32:42 oytf00001-1 ha_daemon[4457]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 929 42.065" name="Monitoring interfaces for link beat: eth4 eth3 eth7 lag0"
2019:07:22-14:33:27 oytf00001-1 ha_daemon[4457]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 930 27.515" name="Access granted to remote node 2!"
2019:07:22-14:33:42 oytf00001-1 ha_daemon[4457]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 931 42.515" name="Request reset MTU size to 1500 (ignored)"
2019:07:22-14:33:43 oytf00001-1 ha_daemon[4457]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 932 43.615" name="Node 2 changed version! 0.000000 -> 9.509003"
2019:07:22-14:33:43 oytf00001-1 ha_daemon[4457]: id="38C0" severity="info" sys="System" sub="ha" seq="M: 933 43.615" name="Node 2 is alive"
2019:07:22-14:33:43 oytf00001-1 ha_daemon[4457]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 934 43.615" name="Node 2 changed state: DEAD(2048) -> SYNCING(2)"
2019:07:22-14:33:43 oytf00001-1 ha_daemon[4457]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 935 43.615" name="Node 2 changed role: DEAD -> SLAVE"
2019:07:22-14:33:43 oytf00001-1 ha_daemon[4457]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 936 43.615" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.1 slave_ip 198.19.250.2"
2019:07:22-14:33:43 oytf00001-1 ha_daemon[4457]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 937 43.726" name="Executing (nowait) /etc/init.d/ha_mode topology_changed"
2019:07:22-14:33:43 oytf00001-1 ha_mode[14000]: calling topology_changed
2019:07:22-14:33:43 oytf00001-1 ha_mode[14000]: topology_changed: waiting for last ha_mode done
2019:07:22-14:33:43 oytf00001-1 repctl[14015]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2019:07:22-14:33:43 oytf00001-1 repctl[14015]: [i] daemonize_check(1497): trying to signal daemon and exit
2019:07:22-14:33:43 oytf00001-1 repctl[16339]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 1
2019:07:22-14:33:43 oytf00001-1 ha_mode[14000]: repctl[14015]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2019:07:22-14:33:43 oytf00001-1 ha_mode[14000]: topology_changed done (started at 14:33:43)
2019:07:22-14:33:44 oytf00001-1 ha_daemon[4457]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 938 44.196" name="Reading cluster configuration"
2019:07:22-14:33:49 oytf00001-1 ha_daemon[4457]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 939 49.257" name="Set syncing.files for node 2"
2019:07:22-14:33:54 oytf00001-1 ha_daemon[4457]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 940 54.797" name="Clear syncing.files for node 2"
2019:07:22-14:33:58 oytf00001-1 ha_daemon[4457]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 941 58.631" name="Node 2 changed state: SYNCING(2) -> SYNCING(3)"
2019:07:22-14:33:59 oytf00001-1 ha_daemon[4457]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 942 59.430" name="Monitoring interfaces for link beat: eth4 eth3 eth7 lag0"
2019:07:22-14:34:15 oytf00001-2 ha_daemon[4327]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 100 15.827" name="Reading cluster configuration"
2019:07:22-14:34:20 oytf00001-2 ha_daemon[4327]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 101 20.981" name="Monitoring interfaces for link beat: eth4 eth3 eth7 lag0"
2019:07:22-14:35:28 oytf00001-2 repctl[4391]: [i] stop_backup_mode(765): stopped backup mode at 000000010000053C0000004D
2019:07:22-14:35:28 oytf00001-2 repctl[4391]: [i] execute(1768): waiting for server to start....
2019:07:22-14:35:29 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:30 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:31 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:32 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:33 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:34 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:35 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:36 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:37 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:38 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:39 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:40 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:41 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:42 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:43 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:44 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:45 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:46 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:47 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:48 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:49 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:50 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:51 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:52 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:53 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:54 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:55 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:56 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:57 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:35:58 oytf00001-2 repctl[4391]: [i] execute(1768): stopped waiting
2019:07:22-14:35:58 oytf00001-2 repctl[4391]: [i] execute(1768): server is still starting up
2019:07:22-14:35:58 oytf00001-2 ha_daemon[4327]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 102 58.949" name="HA control: cmd = 'sync stop 1 database'"
2019:07:22-14:35:58 oytf00001-2 ha_daemon[4327]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 103 58.949" name="Deactivating sync process for database on node 1"
2019:07:22-14:35:58 oytf00001-2 repctl[4391]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 0
2019:07:22-14:35:58 oytf00001-2 repctl[4391]: [i] execute(1768): pg_ctl: server is running (PID: 8603)
2019:07:22-14:35:58 oytf00001-2 repctl[4391]: [i] execute(1768): /usr/pgsql92/bin/postgres "-D" "/var/storage/pgsql92/data"
2019:07:22-14:35:59 oytf00001-2 ha_daemon[4327]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 104 59.067" name="HA control: cmd = 'sync start 1 database'"
2019:07:22-14:35:59 oytf00001-2 ha_daemon[4327]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 105 59.067" name="Activating sync process for database on node 1"
2019:07:22-14:35:59 oytf00001-2 repctl[4391]: [i] execute(1768): waiting for server to shut down...
2019:07:22-14:35:59 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:36:00 oytf00001-2 repctl[4391]: [i] execute(1768): done
2019:07:22-14:36:00 oytf00001-2 repctl[4391]: [i] execute(1768): server stopped
2019:07:22-14:36:01 oytf00001-2 repctl[4391]: [i] start_backup_mode(744): starting backup mode at 000000010000053C0000004F
2019:07:22-14:36:01 oytf00001-2 ha_daemon[4327]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 106 01.552" name="HA control: cmd = 'sync start 1 database'"
2019:07:22-14:36:01 oytf00001-2 ha_daemon[4327]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 107 01.552" name="Activating sync process for database on node 1"
2019:07:22-14:37:29 oytf00001-2 repctl[4391]: [i] stop_backup_mode(765): stopped backup mode at 000000010000053C0000004F
2019:07:22-14:37:29 oytf00001-2 repctl[4391]: [i] execute(1768): waiting for server to start....
2019:07:22-14:37:30 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:31 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:32 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:33 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:34 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:35 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:36 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:37 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:38 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:39 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:40 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:41 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:42 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:43 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:44 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:45 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:46 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:47 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:48 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:49 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:50 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:51 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:52 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:53 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:54 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:55 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:56 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:57 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:58 oytf00001-2 repctl[4391]: [i] execute(1768): .
2019:07:22-14:37:59 oytf00001-2 repctl[4391]: [i] execute(1768): stopped waiting
2019:07:22-14:37:59 oytf00001-2 repctl[4391]: [i] execute(1768): server is still starting up
2019:07:22-14:37:59 oytf00001-2 ha_daemon[4327]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 108 59.790" name="HA control: cmd = 'sync stop 1 database'"
2019:07:22-14:37:59 oytf00001-2 ha_daemon[4327]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 109 59.790" name="Deactivating sync process for database on node 1"
2019:07:22-14:37:59 oytf00001-2 repctl[4391]: [i] setup_replication(278): checkinterval 300
2019:07:22-14:37:59 oytf00001-2 repctl[4391]: [i] setup_replication(278): checkinterval 300
2019:07:22-14:38:43 oytf00001-2 ha_daemon[4327]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 110 43.509" name="Initial synchronization finished!"
2019:07:22-14:38:43 oytf00001-2 ha_daemon[4327]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 111 43.509" name="state change SYNCING(3) -> UNLINKED(1)"
2019:07:22-14:38:43 oytf00001-1 ha_daemon[4457]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 943 43.865" name="Node 2 changed state: SYNCING(3) -> UNLINKED(1)"
2019:07:22-14:38:44 oytf00001-1 repctl[16339]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
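
To make the sequence in the log above easier to follow, here is a minimal Python sketch that pulls only the node state changes out of such a log. The regular expression is an assumption based on the line format shown in this post, not on any documented Sophos log format, and the names `STATE_RE`, `state_transitions` and `SAMPLE` are made up for illustration. It only matches the master-side "Node N changed state" messages, not the slave-side "state change" message.

```python
import re

# Pattern for the master-side ha_daemon message, e.g.
#   name="Node 2 changed state: SYNCING(3) -> UNLINKED(1)"
# Assumption: derived only from the lines pasted in this thread.
STATE_RE = re.compile(
    r'^(?P<ts>\S+) (?P<host>\S+) ha_daemon\[\d+\]: .*'
    r'name="Node (?P<node>\d+) changed state: '
    r'(?P<old>\w+\(\d+\)) -> (?P<new>\w+\(\d+\))"'
)


def state_transitions(lines):
    """Return (timestamp, node, old_state, new_state) for each state change."""
    found = []
    for line in lines:
        m = STATE_RE.match(line.strip())
        if m:
            found.append((m["ts"], int(m["node"]), m["old"], m["new"]))
    return found


# Three lines copied from the log above.
SAMPLE = [
    '2019:07:22-14:33:43 oytf00001-1 ha_daemon[4457]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 934 43.615" name="Node 2 changed state: DEAD(2048) -> SYNCING(2)"',
    '2019:07:22-14:33:58 oytf00001-1 ha_daemon[4457]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 941 58.631" name="Node 2 changed state: SYNCING(2) -> SYNCING(3)"',
    '2019:07:22-14:38:43 oytf00001-1 ha_daemon[4457]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 943 43.865" name="Node 2 changed state: SYNCING(3) -> UNLINKED(1)"',
]

for ts, node, old, new in state_transitions(SAMPLE):
    print(f"{ts} node {node}: {old} -> {new}")
```

Run against the full log, this reduces the output to the three steps DEAD to SYNCING(2), SYNCING(2) to SYNCING(3), and SYNCING(3) to UNLINKED(1), which is the part relevant to the question.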
 


  • Hello Jens,

    Option 1: fix the cause of the UNLINKED status. Is a link (physical or logical) perhaps missing, so that the status reports this? A regular HA failover would not work in this state either.

    Option 2: remove the unlinked node from the HA cluster, then carry out the update as planned.

    Best regards,

    Alex

  • Hello Alex,

    ideally, of course, the slave node would go back to READY. I have already posted the screenshots of the HA NIC; it is connected on both sides. The slave does get synchronized, but then it deactivates the sync process.

    Did something change with the last update? I don't recall it being normal for the cluster to go to UNLINKED.

     

    2019:07:22-14:37:59 oytf00001-2 ha_daemon[4327]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 109 59.790" name="Deactivating sync process for database on node 1"
    2019:07:22-14:37:59 oytf00001-2 repctl[4391]:  setup_replication(278): checkinterval 300
    2019:07:22-14:37:59 oytf00001-2 repctl[4391]:  setup_replication(278): checkinterval 300
    2019:07:22-14:38:43 oytf00001-2 ha_daemon[4327]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 110 43.509" name="Initial synchronization finished!"
    2019:07:22-14:38:43 oytf00001-2 ha_daemon[4327]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 111 43.509" name="state change SYNCING(3) -> UNLINKED(1)"
    2019:07:22-14:38:43 oytf00001-1 ha_daemon[4457]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 943 43.865" name="Node 2 changed state: SYNCING(3) -> UNLINKED(1)"
    2019:07:22-14:38:44 oytf00001-1 repctl[16339]:  recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
     
     
  • Hello Jens,

    the UNLINKED status does not refer to the HA link, but to all the other "links". The slave node must be cabled identically to the master node.

    Is that the case?

    Best regards,

    Alex
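
    One way to check the cabling question is to compare the link state of the monitored interfaces on both nodes. The following is only a sketch in Python: the interface names come from the "Monitoring interfaces for link beat" log line, but the link states in the example are invented, and the function names are hypothetical. `read_carrier` relies on the standard Linux `/sys/class/net/<iface>/carrier` file; whether that is readable on a given appliance is an assumption.

```python
from pathlib import Path


def read_carrier(iface, base="/sys/class/net"):
    """Return True/False for link up/down, or None if it cannot be read."""
    try:
        return Path(base, iface, "carrier").read_text().strip() == "1"
    except OSError:
        # Interface missing, administratively down, or sysfs not available.
        return None


def link_mismatches(master, slave):
    """Return {iface: (master_state, slave_state)} where the nodes differ."""
    diffs = {}
    for iface in sorted(set(master) | set(slave)):
        m, s = master.get(iface), slave.get(iface)
        if m != s:
            diffs[iface] = (m, s)
    return diffs


# Interfaces named in the log; the states below are made-up example values.
MONITORED = ["eth4", "eth3", "eth7", "lag0"]
master_links = {"eth4": True, "eth3": True, "eth7": True, "lag0": True}
slave_links = {"eth4": True, "eth3": True, "eth7": False, "lag0": True}

print(link_mismatches(master_links, slave_links))
```

    On each node, the dictionary could be filled with `{i: read_carrier(i) for i in MONITORED}`. Any interface listed in the result is up on one node and down (or unreadable) on the other, which would be the first place to look.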
