My problem with the clusterXL_admin command

Posted on February 6, 2013

The clusterXL_admin command is quite good, but I think it is not finished yet. From my point of view it has one big problem. This whole post is low priority, though; it is really only for geeks. :-)
Let's say the cluster members need to be restarted, but in a controlled way. How can I use this command?

1. Take the active firewall down:

[Expert@firewall1]# clusterXL_admin down
Setting member to administratively down state ...
Member current state is Down

2. Check what happened:

[Expert@firewall1]# cpstat -f all ha

Product name:        High Availability
Major version:       6
Minor version:       0
Service pack:        2
Version string:      N/A
Status code:         2
Status short:        Problem
Status long:         Refer to the Notification and Interfaces tables for information about the problem
HA installed:        1
Working mode:        High Availability (Active Up)
HA protocol version: 2
HA started:          yes
HA state:            down
HA identifier:       2


Interface table
----------------------------------------------------------------------
|Name   |IP           |Status      |Verified  |Trusted|Shared|Netmask|
----------------------------------------------------------------------
|Mgmt   |  222.44.0.30|Disconnected|2516509204|      0|     2|0.0.0.0|
|eth3-01|      0.0.0.0|Disconnected|2516509204|      0|     2|0.0.0.0|
|eth3-02|      0.0.0.0|Disconnected|2516509204|      0|     2|0.0.0.0|
|bond1  |  20.240.40.3|Up          |       300|      0|     2|0.0.0.0|
|bond0  |232.35.228.19|Up          |       300|      1|     2|0.0.0.0|
|Sync   |      0.0.0.0|Up          |       300|      0|     2|0.0.0.0|
----------------------------------------------------------------------



Problem Notification table
-------------------------------------------------
|Name           |Status |Priority|Verified|Descr|
-------------------------------------------------
|Synchronization|OK     |       0|     852|     |
|Filter         |OK     |       0|     852|     |
|fwd            |OK     |       0|       0|     |
|cphad          |OK     |       0|       0|     |
|admin_down     |problem|       0|      10|     |
-------------------------------------------------



Cluster IPs table
--------------------------------------------------------------------
|Name |IP           |Netmask        |Member Network|Member Netmask |
--------------------------------------------------------------------
|bond1|  10.250.81.1|255.255.255.192|   10.250.81.0|255.255.255.192|
|bond1|  20.240.40.1|255.255.255.128|   20.240.40.0|255.255.255.128|
|bond0|232.35.228.17|255.255.255.248| 232.35.228.16|255.255.255.248|
--------------------------------------------------------------------



Sync table
-------------------------------------
|Name |IP           |Netmask        |
-------------------------------------
|bond0|232.35.228.19|255.255.255.248|
-------------------------------------
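
To check the state from a script instead of reading the whole output, the "Key: value" header lines can be parsed. The following is only a minimal Python sketch: the function name parse_cpstat_ha is made up for this example, the sample text is a shortened copy of the output above, and it assumes the header keeps the "Key: value" layout shown here.

```python
def parse_cpstat_ha(text):
    """Return the 'Key: value' header fields of `cpstat -f all ha` output as a dict."""
    fields = {}
    for line in text.splitlines():
        # Table rows and rulers start with '|' or '-'; section titles have no colon.
        if line.startswith(("|", "-")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


# Shortened copy of the output shown above (assumed layout).
sample = """\
Product name:        High Availability
Status code:         2
Status short:        Problem
Working mode:        High Availability (Active Up)
HA started:          yes
HA state:            down
HA identifier:       2

Interface table
|Name   |IP           |Status      |
|bond1  |  20.240.40.3|Up          |
"""

fields = parse_cpstat_ha(sample)
print(fields["HA state"], "/", fields["Status short"])
```

On a real member the text would come from running cpstat itself; only the parsing is shown here.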

The command registers a device called admin_down in the problem state:

[Expert@firewall1]# cphaprob list

Built-in Devices:

Device Name: Interface Active Check
Current state: OK

Registered Devices:

Device Name: Synchronization
Registration number: 0
Timeout: none
Current state: OK
Time since last report: 910.2 sec

Device Name: Filter
Registration number: 1
Timeout: none
Current state: OK
Time since last report: 910.2 sec

Device Name: fwd
Registration number: 2
Timeout: 2 sec
Current state: OK
Time since last report: 0.9 sec

Device Name: cphad
Registration number: 3
Timeout: 2 sec
Current state: OK
Time since last report: 0.7 sec

Device Name: admin_down
Registration number: 4
Timeout: none
Current state: problem
Time since last report: 68.7 sec
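
The same list can be filtered for devices that are not OK. This is a rough Python sketch under the assumption that the output keeps the "Device Name:" / "Current state:" layout shown above; the helper names are hypothetical and the sample is a shortened copy of that output.

```python
def parse_cphaprob_list(text):
    """Parse `cphaprob list` output into a list of dicts, one per device."""
    devices = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or text without a colon
        key, value = key.strip(), value.strip()
        if key == "Device Name":
            devices.append({"name": value})
        elif devices and value:
            # Attribute line that belongs to the most recent device.
            devices[-1][key] = value
    return devices


def problem_devices(text):
    """Names of all devices whose current state is not OK."""
    return [d["name"] for d in parse_cphaprob_list(text)
            if d.get("Current state") != "OK"]


# Shortened copy of the output shown above (assumed layout).
sample = """\
Registered Devices:

Device Name: fwd
Registration number: 2
Timeout: 2 sec
Current state: OK
Time since last report: 0.9 sec

Device Name: admin_down
Registration number: 4
Timeout: none
Current state: problem
Time since last report: 68.7 sec
"""

print(problem_devices(sample))
```

Section titles such as "Registered Devices:" have an empty value after the colon, so they are skipped.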

Check whether the other firewall is active now:

[Expert@firewall2]# cpstat -f all ha

Product name:        High Availability
Major version:       6
Minor version:       0
Service pack:        2
Version string:      N/A
Status code:         0
Status short:        OK
Status long:         Refer to the Notification and Interfaces tables for information about the problem
HA installed:        1
Working mode:        High Availability (Active Up)
HA protocol version: 2
HA started:          yes
HA state:            active
HA identifier:       1


Interface table
--------------------------------------------------------------------
|Name   |IP           |Status      |Verified|Trusted|Shared|Netmask|
--------------------------------------------------------------------
|Sync   |      0.0.0.0|Up          |       0|      0|     2|0.0.0.0|
|Mgmt   |  222.44.0.29|Disconnected|  275100|      0|     2|0.0.0.0|
|eth3-01|      0.0.0.0|Disconnected|  275100|      0|     2|0.0.0.0|
|eth3-02|      0.0.0.0|Disconnected|  275100|      0|     2|0.0.0.0|
|bond0  |232.35.228.18|Up          |       0|      0|     2|0.0.0.0|
|bond1  |  20.240.40.2|Up          |       0|      0|     2|0.0.0.0|
--------------------------------------------------------------------



Problem Notification table
------------------------------------------------
|Name           |Status|Priority|Verified|Descr|
------------------------------------------------
|Synchronization|OK    |       0|     244|     |
|Filter         |OK    |       0|     244|     |
|cphad          |OK    |       0|       0|     |
|fwd            |OK    |       0|       0|     |
------------------------------------------------



Cluster IPs table
--------------------------------------------------------------------
|Name |IP           |Netmask        |Member Network|Member Netmask |
--------------------------------------------------------------------
|bond0|232.35.228.17|255.255.255.248| 232.35.228.16|255.255.255.248|
|bond1|  10.250.81.1|255.255.255.192|   10.250.81.0|255.255.255.192|
|bond1|  20.240.40.1|255.255.255.128|   20.240.40.0|255.255.255.128|
--------------------------------------------------------------------



Sync table
-----------------
|Name|IP|Netmask|
-----------------
-----------------

Reboot the firewall:

[Expert@firewall1]# reboot
...

When the firewall comes back, it stays in standby state because it sees another active member. This is good; it is better to stay in standby (automatic failback without human control is not the safest approach):

[Expert@firewall1]# cpstat -f all ha

Product name:        High Availability
Major version:       6
Minor version:       0
Service pack:        2
Version string:      N/A
Status code:         0
Status short:        OK
Status long:         Refer to the Notification and Interfaces tables for information about the problem
HA installed:        1
Working mode:        High Availability (Active Up)
HA protocol version: 2
HA started:          yes
HA state:            standby
HA identifier:       2


Interface table
--------------------------------------------------------------------
|Name   |IP           |Status      |Verified|Trusted|Shared|Netmask|
--------------------------------------------------------------------
|Sync   |      0.0.0.0|Up          |       0|      0|     2|0.0.0.0|
|Mgmt   |  222.44.0.30|Disconnected|   95200|      0|     2|0.0.0.0|
|eth3-01|      0.0.0.0|Disconnected|   95200|      0|     2|0.0.0.0|
|eth3-02|      0.0.0.0|Disconnected|   95200|      0|     2|0.0.0.0|
|bond0  |232.35.228.19|Up          |       0|      1|     2|0.0.0.0|
|bond1  |  20.240.40.3|Up          |       0|      0|     2|0.0.0.0|
--------------------------------------------------------------------



Problem Notification table
------------------------------------------------
|Name           |Status|Priority|Verified|Descr|
------------------------------------------------
|Synchronization|OK    |       0|      64|     |
|Filter         |OK    |       0|      64|     |
|cphad          |OK    |       0|       0|     |
|fwd            |OK    |       0|       0|     |
------------------------------------------------



Cluster IPs table
--------------------------------------------------------------------
|Name |IP           |Netmask        |Member Network|Member Netmask |
--------------------------------------------------------------------
|bond0|232.35.228.17|255.255.255.248| 232.35.228.16|255.255.255.248|
|bond1|  10.250.81.1|255.255.255.192|   10.250.81.0|255.255.255.192|
|bond1|  20.240.40.1|255.255.255.128|   20.240.40.0|255.255.255.128|
--------------------------------------------------------------------



Sync table
-------------------------------------
|Name |IP           |Netmask        |
-------------------------------------
|bond0|232.35.228.19|255.255.255.248|
-------------------------------------

The question is:
What should I do to make it active again and put the secondary back into standby? You can't use the “clusterXL_admin down” command for this, since it puts the firewall into a state in which it can no longer become master if the other firewall fails.

Answer:
You have to either install the policy again, which from my point of view is not the best way, or use the cphaprob command. A policy installation on a firewall with high CPU load or a really big rulebase can easily go wrong. When you install the policy, the configured order of the cluster members takes effect again.
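
Which cphaprob invocation is meant is not spelled out above. One possibility is to register a temporary problem device on the currently active member and remove it again once the other member has taken over. This is only a sketch of that idea, not a tested procedure. It assumes the documented "cphaprob -d <device> -t <timeout> -s <state> register" and "cphaprob -d <device> unregister" syntax; the device name manual_failover and the helper names are invented for this example. The code only builds and prints the command lines, it does not run them.

```python
DEVICE = "manual_failover"  # hypothetical device name, chosen for this sketch


def register_problem(device=DEVICE):
    # -t 0: no timeout, the device keeps its state until it is changed.
    return ["cphaprob", "-d", device, "-t", "0", "-s", "problem", "register"]


def unregister(device=DEVICE):
    return ["cphaprob", "-d", device, "unregister"]


def failback_plan(active_member, device=DEVICE):
    """Commands to run, in order, on the member that is active right now."""
    return [
        (active_member, register_problem(device)),  # member goes down, peer takes over
        # Between the two steps, confirm on the peer that it is active.
        (active_member, unregister(device)),        # member may rejoin as standby
    ]


for host, argv in failback_plan("firewall2"):
    print("[Expert@%s]# %s" % (host, " ".join(argv)))
```

Whether the member really comes back as standby depends on the cluster's working mode; the outputs above show "Active Up", so that is the assumption here.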

On a Cisco ASA it is just one command that works in both directions: “no failover active”. Another example, staying with Check Point, is VRRP, which I find much better than ClusterXL.
With VRRP you change the VRRP priority directly on the firewall and that’s it. VRRP used to be free; in Gaia it may not be free anymore. But in the open-source world the code is free: http://www.keepalived.org/. I do not think Check Point wrote its own code when an implementation is publicly available, but I guess VRRP support will be dropped, since they have their own proprietary HA software.
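
Why a priority change is enough can be shown with a toy model of the VRRP election: the router with the highest priority becomes master, and a tie goes to the higher primary IP address. The Python sketch below assumes that preemption is enabled and both members are healthy. The priority values are invented for the example; the addresses are the bond1 addresses from the outputs above.

```python
from ipaddress import ip_address


def elect_master(routers):
    """routers maps name -> (priority, primary IP).

    Highest priority wins; on a tie the higher IP address wins.
    """
    return max(routers,
               key=lambda name: (routers[name][0], ip_address(routers[name][1])))


# Hypothetical priorities; the IPs are the bond1 addresses of the two members.
routers = {
    "firewall1": (100, "20.240.40.3"),
    "firewall2": (110, "20.240.40.2"),
}
before = elect_master(routers)

# Raise the priority on firewall1 and it takes over.
routers["firewall1"] = (120, "20.240.40.3")
after = elect_master(routers)

print(before, "->", after)
```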
