The clusterXL_admin command is quite useful, but I think it is not finished yet. It has just one big problem, in my view. This whole post has really low priority, though... it is really only for geeks. :-)
Let's say the cluster members should be restarted, but in a controlled way. How can I use this command?
1. Put the active firewall into the down state:
[Expert@firewall1]# clusterXL_admin down
Setting member to administratively down state ...
Member current state is Down
2. Check what happened:
[Expert@firewall1]# cpstat -f all ha

Product name: High Availability
Major version: 6
Minor version: 0
Service pack: 2
Version string: N/A
Status code: 2
Status short: Problem
Status long: Refer to the Notification and Interfaces tables for information about the problem
HA installed: 1
Working mode: High Availability (Active Up)
HA protocol version: 2
HA started: yes
HA state: down
HA identifier: 2

Interface table
----------------------------------------------------------------------
|Name   |IP           |Status      |Verified  |Trusted|Shared|Netmask|
----------------------------------------------------------------------
|Mgmt   |  222.44.0.30|Disconnected|2516509204|      0|     2|0.0.0.0|
|eth3-01|      0.0.0.0|Disconnected|2516509204|      0|     2|0.0.0.0|
|eth3-02|      0.0.0.0|Disconnected|2516509204|      0|     2|0.0.0.0|
|bond1  |  20.240.40.3|Up          |       300|      0|     2|0.0.0.0|
|bond0  |232.35.228.19|Up          |       300|      1|     2|0.0.0.0|
|Sync   |      0.0.0.0|Up          |       300|      0|     2|0.0.0.0|
----------------------------------------------------------------------

Problem Notification table
-------------------------------------------------
|Name           |Status |Priority|Verified|Descr|
-------------------------------------------------
|Synchronization|OK     |       0|     852|     |
|Filter         |OK     |       0|     852|     |
|fwd            |OK     |       0|       0|     |
|cphad          |OK     |       0|       0|     |
|admin_down     |problem|       0|      10|     |
-------------------------------------------------

Cluster IPs table
--------------------------------------------------------------------
|Name |IP           |Netmask        |Member Network|Member Netmask |
--------------------------------------------------------------------
|bond1|  10.250.81.1|255.255.255.192|   10.250.81.0|255.255.255.192|
|bond1|  20.240.40.1|255.255.255.128|   20.240.40.0|255.255.255.128|
|bond0|232.35.228.17|255.255.255.248| 232.35.228.16|255.255.255.248|
--------------------------------------------------------------------

Sync table
-------------------------------------
|Name |IP           |Netmask        |
-------------------------------------
|bond0|232.35.228.19|255.255.255.248|
-------------------------------------
The command registers a device called admin_down in the problem state:
[Expert@firewall1]# cphaprob list

Built-in Devices:

Device Name: Interface Active Check
Current state: OK

Registered Devices:

Device Name: Synchronization
Registration number: 0
Timeout: none
Current state: OK
Time since last report: 910.2 sec

Device Name: Filter
Registration number: 1
Timeout: none
Current state: OK
Time since last report: 910.2 sec

Device Name: fwd
Registration number: 2
Timeout: 2 sec
Current state: OK
Time since last report: 0.9 sec

Device Name: cphad
Registration number: 3
Timeout: 2 sec
Current state: OK
Time since last report: 0.7 sec

Device Name: admin_down
Registration number: 4
Timeout: none
Current state: problem
Time since last report: 68.7 sec
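If you want to check this from a script instead of reading the list by eye, something like the following Python sketch could pick out the devices that are not OK. It is only an illustration: the function name devices_not_ok and the SAMPLE text are my own, and it assumes that cphaprob list prints one field per line with the "Device Name:" and "Current state:" labels seen in the output above.

```python
from __future__ import annotations


def devices_not_ok(cphaprob_list_output: str) -> list[str]:
    """Return the names of all devices whose 'Current state' is not OK.

    Assumes one 'Label: value' field per line (hypothetical helper,
    not part of any Check Point tooling).
    """
    problems = []
    current_name = None
    for raw_line in cphaprob_list_output.splitlines():
        line = raw_line.strip()
        if line.startswith("Device Name:"):
            # Remember the device until we reach its state line.
            current_name = line.split(":", 1)[1].strip()
        elif line.startswith("Current state:") and current_name is not None:
            state = line.split(":", 1)[1].strip()
            if state.upper() != "OK":
                problems.append(current_name)
            current_name = None
    return problems


# Shortened copy of the output shown in this post (two devices left out).
SAMPLE = """\
Built-in Devices:

Device Name: Interface Active Check
Current state: OK

Registered Devices:

Device Name: fwd
Registration number: 2
Timeout: 2 sec
Current state: OK
Time since last report: 0.9 sec

Device Name: admin_down
Registration number: 4
Timeout: none
Current state: problem
Time since last report: 68.7 sec
"""

if __name__ == "__main__":
    print(devices_not_ok(SAMPLE))
```

With the sample text this reports only admin_down, which is exactly the device that clusterXL_admin down registered.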
Check whether the other firewall is active now:
[Expert@firewall2]# cpstat -f all ha

Product name: High Availability
Major version: 6
Minor version: 0
Service pack: 2
Version string: N/A
Status code: 0
Status short: OK
Status long: Refer to the Notification and Interfaces tables for information about the problem
HA installed: 1
Working mode: High Availability (Active Up)
HA protocol version: 2
HA started: yes
HA state: active
HA identifier: 1

Interface table
--------------------------------------------------------------------
|Name   |IP           |Status      |Verified|Trusted|Shared|Netmask|
--------------------------------------------------------------------
|Sync   |      0.0.0.0|Up          |       0|      0|     2|0.0.0.0|
|Mgmt   |  222.44.0.29|Disconnected|  275100|      0|     2|0.0.0.0|
|eth3-01|      0.0.0.0|Disconnected|  275100|      0|     2|0.0.0.0|
|eth3-02|      0.0.0.0|Disconnected|  275100|      0|     2|0.0.0.0|
|bond0  |232.35.228.18|Up          |       0|      0|     2|0.0.0.0|
|bond1  |  20.240.40.2|Up          |       0|      0|     2|0.0.0.0|
--------------------------------------------------------------------

Problem Notification table
------------------------------------------------
|Name           |Status|Priority|Verified|Descr|
------------------------------------------------
|Synchronization|OK    |       0|     244|     |
|Filter         |OK    |       0|     244|     |
|cphad          |OK    |       0|       0|     |
|fwd            |OK    |       0|       0|     |
------------------------------------------------

Cluster IPs table
--------------------------------------------------------------------
|Name |IP           |Netmask        |Member Network|Member Netmask |
--------------------------------------------------------------------
|bond0|232.35.228.17|255.255.255.248| 232.35.228.16|255.255.255.248|
|bond1|  10.250.81.1|255.255.255.192|   10.250.81.0|255.255.255.192|
|bond1|  20.240.40.1|255.255.255.128|   20.240.40.0|255.255.255.128|
--------------------------------------------------------------------

Sync table
-----------------
|Name|IP|Netmask|
-----------------
-----------------
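Before rebooting, it is worth making sure that the local member really is down and the peer really is active. A minimal Python sketch of that check, assuming you have already captured the text of cpstat -f all ha from both members (how you collect it is up to you). The helper names ha_state and safe_to_reboot are hypothetical; the only thing taken from the real output is the "HA state:" label.

```python
import re
from typing import Optional


def ha_state(cpstat_output: str) -> Optional[str]:
    """Extract the value after 'HA state:' (lowercased), or None if absent."""
    match = re.search(r"HA state:\s*(\S+)", cpstat_output)
    return match.group(1).lower() if match else None


def safe_to_reboot(local_output: str, peer_output: str) -> bool:
    """True only if this member is down and the peer has taken over."""
    return ha_state(local_output) == "down" and ha_state(peer_output) == "active"


if __name__ == "__main__":
    # Fragments copied from the two outputs shown in this post.
    firewall1 = "HA started: yes\nHA state: down\nHA identifier: 2\n"
    firewall2 = "HA started: yes\nHA state: active\nHA identifier: 1\n"
    print(ha_state(firewall1), ha_state(firewall2))
    print(safe_to_reboot(firewall1, firewall2))
```

The check is deliberately strict: if either output is missing the "HA state:" line, it answers False rather than guessing.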
Reboot the firewall:
[Expert@firewall1]# reboot
...
When the firewall is back, it stays in the standby state because it sees another active member. This is good; it is better to stay standby (automatic failback without human control is not the safest approach):
[Expert@firewall1]# cpstat -f all ha

Product name: High Availability
Major version: 6
Minor version: 0
Service pack: 2
Version string: N/A
Status code: 0
Status short: OK
Status long: Refer to the Notification and Interfaces tables for information about the problem
HA installed: 1
Working mode: High Availability (Active Up)
HA protocol version: 2
HA started: yes
HA state: standby
HA identifier: 2

Interface table
--------------------------------------------------------------------
|Name   |IP           |Status      |Verified|Trusted|Shared|Netmask|
--------------------------------------------------------------------
|Sync   |      0.0.0.0|Up          |       0|      0|     2|0.0.0.0|
|Mgmt   |  222.44.0.30|Disconnected|   95200|      0|     2|0.0.0.0|
|eth3-01|      0.0.0.0|Disconnected|   95200|      0|     2|0.0.0.0|
|eth3-02|      0.0.0.0|Disconnected|   95200|      0|     2|0.0.0.0|
|bond0  |232.35.228.19|Up          |       0|      1|     2|0.0.0.0|
|bond1  |  20.240.40.3|Up          |       0|      0|     2|0.0.0.0|
--------------------------------------------------------------------

Problem Notification table
------------------------------------------------
|Name           |Status|Priority|Verified|Descr|
------------------------------------------------
|Synchronization|OK    |       0|      64|     |
|Filter         |OK    |       0|      64|     |
|cphad          |OK    |       0|       0|     |
|fwd            |OK    |       0|       0|     |
------------------------------------------------

Cluster IPs table
--------------------------------------------------------------------
|Name |IP           |Netmask        |Member Network|Member Netmask |
--------------------------------------------------------------------
|bond0|232.35.228.17|255.255.255.248| 232.35.228.16|255.255.255.248|
|bond1|  10.250.81.1|255.255.255.192|   10.250.81.0|255.255.255.192|
|bond1|  20.240.40.1|255.255.255.128|   20.240.40.0|255.255.255.128|
--------------------------------------------------------------------

Sync table
-------------------------------------
|Name |IP           |Netmask        |
-------------------------------------
|bond0|232.35.228.19|255.255.255.248|
-------------------------------------
The question is:
What should I do to make it active again and put the secondary back into standby? You can't simply use the “clusterXL_admin down” command on the secondary, since it puts that firewall into a state where it can no longer take over as master if the other firewall fails.
Answer:
You either have to install ("burn") the policy again, which in my view is not the best way, or use the cphaprob command. A policy installation on a firewall with high CPU load or a really big rulebase can easily go wrong. If you install the policy, the configured order of the cluster members is applied again.
On a Cisco ASA it is just one command that works in both directions: “no failover active”. Another example, if we stay with Check Point, is VRRP, which I find much better than ClusterXL.
With VRRP you just change the VRRP priority directly on the firewall and that’s it. VRRP used to be free; in GAIA it may not be free anymore. But in the open source world the code is free: http://www.keepalived.org/. I do not think Check Point wrote its own code when an implementation is publicly available, but I guess VRRP support will be dropped, since they have their own proprietary HA software.
wml
February 8, 2013
clusterxl_admin up does not work?
itsecworks
February 8, 2013
No :-) But test it. Every firewall administrator using SPLAT or GAIA or whatever should know how to fail over and fail back their own firewall. Test your failover scenario in a planned maintenance window or it may surprise you during a real cluster problem.
wml
February 8, 2013
Well, then the following sequence should work:
PRI (ACTIVE): clusterxl_admin down --> PRI goes STANDBY, SEC goes ACTIVE
PRI (STANDBY): clusterxl_admin up --> SEC still ACTIVE
SEC (ACTIVE): clusterxl_admin down --> SEC goes STANDBY, PRI goes ACTIVE
SEC (STANDBY): clusterxl_admin up --> PRI still ACTIVE, both are up
Will test at the earliest opportunity.
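The sequence above can be traced with a small toy model in Python. This is only a sketch of the idea, not a description of how ClusterXL really decides: it assumes an "Active Up" style rule (the current active member keeps its role as long as it is not administratively down, otherwise the first member that is not down takes over), and the names Member, Cluster and trace_sequence are made up for the illustration. Note that the model reports a member as "down" right after clusterxl_admin down, as the cpstat output in the post shows, rather than as standby. Whether clusterxl_admin up behaves like this on a real cluster is exactly what still needs testing.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    admin_down: bool = False


class Cluster:
    """Toy two-member cluster: the active member keeps its role while eligible."""

    def __init__(self, primary: str, secondary: str) -> None:
        self.members = {primary: Member(primary), secondary: Member(secondary)}
        self.active = primary

    def _elect(self) -> None:
        # Assumed rule: no failback while the current active member is healthy.
        if self.active is not None and not self.members[self.active].admin_down:
            return
        candidates = [m.name for m in self.members.values() if not m.admin_down]
        self.active = candidates[0] if candidates else None

    def admin_down(self, name: str) -> None:
        self.members[name].admin_down = True
        self._elect()

    def admin_up(self, name: str) -> None:
        self.members[name].admin_down = False
        self._elect()

    def state(self, name: str) -> str:
        if self.members[name].admin_down:
            return "down"
        return "active" if self.active == name else "standby"


def trace_sequence() -> list:
    """Run the four proposed steps and record (PRI state, SEC state) after each."""
    cluster = Cluster("PRI", "SEC")
    steps = [
        lambda: cluster.admin_down("PRI"),
        lambda: cluster.admin_up("PRI"),
        lambda: cluster.admin_down("SEC"),
        lambda: cluster.admin_up("SEC"),
    ]
    history = []
    for step in steps:
        step()
        history.append((cluster.state("PRI"), cluster.state("SEC")))
    return history


if __name__ == "__main__":
    for number, states in enumerate(trace_sequence(), start=1):
        print(number, states)
```

Under these assumptions the model ends with PRI active and SEC standby, and neither member is left administratively down.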
itsecworks
February 9, 2013
Sounds really good, thanks for the idea!
raul diaz
September 17, 2014
On the problematic cluster member, the output of cphaprob -ia list showed this error:
Device Name: IPSO member status
Current state: problem
The solution was to push the policy; after that there were no more error messages.