my problem with clusterXL_admin command

Posted on February 6, 2013

The clusterXL_admin command is really useful, but I think it is not finished yet. From my point of view it has just one big problem. The whole post has really low priority, though... it is really only for geeks. :-)
Let's say the cluster members should be restarted, but in a controlled way. How can I use this command?

1. Take the active firewall down:

[Expert@firewall1]# clusterXL_admin down
Setting member to administratively down state ...
Member current state is Down

2. Check what happened:

[Expert@firewall1]# cpstat -f all ha

Product name:        High Availability
Major version:       6
Minor version:       0
Service pack:        2
Version string:      N/A
Status code:         2
Status short:        Problem
Status long:         Refer to the Notification and Interfaces tables for information about the problem
HA installed:        1
Working mode:        High Availability (Active Up)
HA protocol version: 2
HA started:          yes
HA state:            down
HA identifier:       2


Interface table
----------------------------------------------------------------------
|Name   |IP           |Status      |Verified  |Trusted|Shared|Netmask|
----------------------------------------------------------------------
|Mgmt   |  222.44.0.30|Disconnected|2516509204|      0|     2|0.0.0.0|
|eth3-01|      0.0.0.0|Disconnected|2516509204|      0|     2|0.0.0.0|
|eth3-02|      0.0.0.0|Disconnected|2516509204|      0|     2|0.0.0.0|
|bond1  |  20.240.40.3|Up          |       300|      0|     2|0.0.0.0|
|bond0  |232.35.228.19|Up          |       300|      1|     2|0.0.0.0|
|Sync   |      0.0.0.0|Up          |       300|      0|     2|0.0.0.0|
----------------------------------------------------------------------



Problem Notification table
-------------------------------------------------
|Name           |Status |Priority|Verified|Descr|
-------------------------------------------------
|Synchronization|OK     |       0|     852|     |
|Filter         |OK     |       0|     852|     |
|fwd            |OK     |       0|       0|     |
|cphad          |OK     |       0|       0|     |
|admin_down     |problem|       0|      10|     |
-------------------------------------------------



Cluster IPs table
--------------------------------------------------------------------
|Name |IP           |Netmask        |Member Network|Member Netmask |
--------------------------------------------------------------------
|bond1|  10.250.81.1|255.255.255.192|   10.250.81.0|255.255.255.192|
|bond1|  20.240.40.1|255.255.255.128|   20.240.40.0|255.255.255.128|
|bond0|232.35.228.17|255.255.255.248| 232.35.228.16|255.255.255.248|
--------------------------------------------------------------------



Sync table
-------------------------------------
|Name |IP           |Netmask        |
-------------------------------------
|bond0|232.35.228.19|255.255.255.248|
-------------------------------------
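If you want to script this check instead of reading the output by eye, here is a minimal Python sketch. It only parses the "Key: value" header that comes before the first table. The function name parse_cpstat_header and the shortened sample text are my own assumptions for illustration; how you capture the cpstat output (SSH, a file, whatever) is up to you.

```python
# Sketch only: parse the "Key: value" header of `cpstat -f all ha` output.
# SAMPLE is a shortened copy of the output shown above.

SAMPLE = """\
Product name:        High Availability
Status code:         2
Status short:        Problem
Working mode:        High Availability (Active Up)
HA started:          yes
HA state:            down
HA identifier:       2


Interface table
----------------------------------------------------------------------
|Name   |IP           |Status      |Verified  |Trusted|Shared|Netmask|
"""


def parse_cpstat_header(text):
    """Return the header fields that come before the first table."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("table"):
            break  # "Interface table" starts the tabular part
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


header = parse_cpstat_header(SAMPLE)
print(header["HA state"], "/", header["Status short"])
```

The two fields that matter for a controlled restart are "HA state" and "Status short"; the tables below the header are ignored on purpose.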

The command registers a device called admin_down in the problem state:

[Expert@firewall1]# cphaprob list

Built-in Devices:

Device Name: Interface Active Check
Current state: OK

Registered Devices:

Device Name: Synchronization
Registration number: 0
Timeout: none
Current state: OK
Time since last report: 910.2 sec

Device Name: Filter
Registration number: 1
Timeout: none
Current state: OK
Time since last report: 910.2 sec

Device Name: fwd
Registration number: 2
Timeout: 2 sec
Current state: OK
Time since last report: 0.9 sec

Device Name: cphad
Registration number: 3
Timeout: 2 sec
Current state: OK
Time since last report: 0.7 sec

Device Name: admin_down
Registration number: 4
Timeout: none
Current state: problem
Time since last report: 68.7 sec
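The same idea works for the device list. The sketch below walks through text in the format shown above and returns the names of all devices whose state is not OK. The function name problem_devices is made up for this example, and the sample is a shortened copy of the output above, so treat it as an illustration and not as a tested tool.

```python
# Sketch only: find devices in a non-OK state in `cphaprob list` output.

CPHAPROB_SAMPLE = """\
Built-in Devices:

Device Name: Interface Active Check
Current state: OK

Registered Devices:

Device Name: fwd
Registration number: 2
Timeout: 2 sec
Current state: OK
Time since last report: 0.9 sec

Device Name: admin_down
Registration number: 4
Timeout: none
Current state: problem
Time since last report: 68.7 sec
"""


def problem_devices(text):
    """Return the names of all devices whose current state is not OK."""
    found = []
    name = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or something without a colon
        key, value = key.strip(), value.strip()
        if key == "Device Name":
            name = value
        elif key == "Current state" and name is not None:
            if value.lower() != "ok":
                found.append(name)
            name = None  # wait for the next "Device Name" line
    return found


print(problem_devices(CPHAPROB_SAMPLE))
```

After "clusterXL_admin down" the only entry in the result should be admin_down; anything else in the list points to a real problem.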

Check whether the other firewall is active now:

[Expert@firewall2]# cpstat -f all ha

Product name:        High Availability
Major version:       6
Minor version:       0
Service pack:        2
Version string:      N/A
Status code:         0
Status short:        OK
Status long:         Refer to the Notification and Interfaces tables for information about the problem
HA installed:        1
Working mode:        High Availability (Active Up)
HA protocol version: 2
HA started:          yes
HA state:            active
HA identifier:       1


Interface table
--------------------------------------------------------------------
|Name   |IP           |Status      |Verified|Trusted|Shared|Netmask|
--------------------------------------------------------------------
|Sync   |      0.0.0.0|Up          |       0|      0|     2|0.0.0.0|
|Mgmt   |  222.44.0.29|Disconnected|  275100|      0|     2|0.0.0.0|
|eth3-01|      0.0.0.0|Disconnected|  275100|      0|     2|0.0.0.0|
|eth3-02|      0.0.0.0|Disconnected|  275100|      0|     2|0.0.0.0|
|bond0  |232.35.228.18|Up          |       0|      0|     2|0.0.0.0|
|bond1  |  20.240.40.2|Up          |       0|      0|     2|0.0.0.0|
--------------------------------------------------------------------



Problem Notification table
------------------------------------------------
|Name           |Status|Priority|Verified|Descr|
------------------------------------------------
|Synchronization|OK    |       0|     244|     |
|Filter         |OK    |       0|     244|     |
|cphad          |OK    |       0|       0|     |
|fwd            |OK    |       0|       0|     |
------------------------------------------------



Cluster IPs table
--------------------------------------------------------------------
|Name |IP           |Netmask        |Member Network|Member Netmask |
--------------------------------------------------------------------
|bond0|232.35.228.17|255.255.255.248| 232.35.228.16|255.255.255.248|
|bond1|  10.250.81.1|255.255.255.192|   10.250.81.0|255.255.255.192|
|bond1|  20.240.40.1|255.255.255.128|   20.240.40.0|255.255.255.128|
--------------------------------------------------------------------



Sync table
-----------------
|Name|IP|Netmask|
-----------------
-----------------
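Before rebooting anything I want to be sure that exactly one member is active. A small sketch of that sanity check, assuming you have already collected the "HA state" value of each member into a dictionary (the function name check_failover and the dictionary layout are my own choice):

```python
# Sketch only: sanity-check the HA states collected from both members.

def check_failover(states):
    """Return a list of complaints; an empty list means exactly one active member."""
    complaints = []
    active = sorted(m for m, s in states.items() if s == "active")
    if not active:
        complaints.append("no active member")
    elif len(active) > 1:
        complaints.append("more than one active member: " + ", ".join(active))
    return complaints


# States as seen in the two outputs above.
print(check_failover({"firewall1": "down", "firewall2": "active"}))
```

An empty list means it is safe to continue with the reboot; anything else means stop and look at the cluster first.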

Reboot the firewall:

[Expert@firewall1]# reboot
...

When the firewall is back, it stays in standby state because it sees another active member. This is good; it is better to stay standby (automatic failback without human control is not the safest way):

[Expert@firewall1]# cpstat -f all ha

Product name:        High Availability
Major version:       6
Minor version:       0
Service pack:        2
Version string:      N/A
Status code:         0
Status short:        OK
Status long:         Refer to the Notification and Interfaces tables for information about the problem
HA installed:        1
Working mode:        High Availability (Active Up)
HA protocol version: 2
HA started:          yes
HA state:            standby
HA identifier:       2


Interface table
--------------------------------------------------------------------
|Name   |IP           |Status      |Verified|Trusted|Shared|Netmask|
--------------------------------------------------------------------
|Sync   |      0.0.0.0|Up          |       0|      0|     2|0.0.0.0|
|Mgmt   |  222.44.0.30|Disconnected|   95200|      0|     2|0.0.0.0|
|eth3-01|      0.0.0.0|Disconnected|   95200|      0|     2|0.0.0.0|
|eth3-02|      0.0.0.0|Disconnected|   95200|      0|     2|0.0.0.0|
|bond0  |232.35.228.19|Up          |       0|      1|     2|0.0.0.0|
|bond1  |  20.240.40.3|Up          |       0|      0|     2|0.0.0.0|
--------------------------------------------------------------------



Problem Notification table
------------------------------------------------
|Name           |Status|Priority|Verified|Descr|
------------------------------------------------
|Synchronization|OK    |       0|      64|     |
|Filter         |OK    |       0|      64|     |
|cphad          |OK    |       0|       0|     |
|fwd            |OK    |       0|       0|     |
------------------------------------------------



Cluster IPs table
--------------------------------------------------------------------
|Name |IP           |Netmask        |Member Network|Member Netmask |
--------------------------------------------------------------------
|bond0|232.35.228.17|255.255.255.248| 232.35.228.16|255.255.255.248|
|bond1|  10.250.81.1|255.255.255.192|   10.250.81.0|255.255.255.192|
|bond1|  20.240.40.1|255.255.255.128|   20.240.40.0|255.255.255.128|
--------------------------------------------------------------------



Sync table
-------------------------------------
|Name |IP           |Netmask        |
-------------------------------------
|bond0|232.35.228.19|255.255.255.248|
-------------------------------------
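To make the observed behaviour easier to reason about, here is a toy model in Python. It is not Check Point's election algorithm, only my simplified reading of the "Active Up" working mode seen in the outputs above: the active member keeps its role until it goes down, and a returning member joins as standby. The class and method names are invented for this sketch.

```python
# Toy model only: a simplified view of the "Active Up" behaviour seen above.

class Cluster:
    def __init__(self, members):
        self.order = list(members)  # configured order, first entry preferred
        self.state = {m: "standby" for m in self.order}
        self.state[self.order[0]] = "active"

    def down(self, member):
        """Member leaves the cluster (admin down, reboot, failure)."""
        self.state[member] = "down"
        self._elect()

    def up(self, member):
        """Member comes back; it never takes the role from a running active member."""
        if self.state[member] == "down":
            self.state[member] = "standby"
        self._elect()

    def _elect(self):
        if "active" in self.state.values():
            return  # Active Up: the current active member stays active
        for m in self.order:
            if self.state[m] == "standby":
                self.state[m] = "active"
                return


cluster = Cluster(["firewall1", "firewall2"])
cluster.down("firewall1")  # step 1: firewall2 takes over
cluster.up("firewall1")    # after the reboot: firewall1 comes back
print(cluster.state)
```

In this model firewall1 ends up as standby although it is first in the configured order, which matches what the cpstat output shows after the reboot.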

The question is:
What should I do to make it active again and put the secondary back into standby? You can't simply use the “clusterXL_admin down” command on the currently active member, since it puts that firewall in a state where it can no longer become master if the other firewall fails.

Answer:
You either have to install (burn) the policy again, which from my point of view is not the best way, or you use the cphaprob command. A policy installation on a firewall with high CPU load or a really big rulebase can easily go wrong. When you install the policy, the configured order of the cluster members takes effect again.

On a Cisco ASA it is just one command that works in both directions: “no failover active”. Another example, if we stay with Checkpoint, is VRRP, which I find much better than ClusterXL.
With VRRP you just change the VRRP priority directly on the firewall and that's it. VRRP used to be free; in GAIA it may not be free anymore. But in the open source world the code is free: http://www.keepalived.org/. I do not think Checkpoint wrote its own code when it is publicly available, but I guess VRRP support will be dropped, since they have their own proprietary HA software.

Tagged: Checkpoint Clusterxl, failover, vrrp

Posted in: Checkpoint, High Availability, Security

5 Responses to “my problem with clusterXL_admin command”

wml

February 8, 2013

clusterxl_admin up does not work?


itsecworks

February 8, 2013

No :-) But test it. Every firewall administrator using SPLAT or GAIA or whatever should know how to fail over and fail back their own firewall. Test your failover scenario in a planned maintenance window, or it may surprise you during a real cluster problem.


wml

February 8, 2013

Well, then the following sequence should work:

PRI (ACTIVE): clusterxl_admin down --> PRI goes STANDBY, SEC goes ACTIVE
PRI (STANDBY): clusterxl_admin up --> SEC still ACTIVE
SEC (ACTIVE): clusterxl_admin down --> SEC goes STANDBY, PRI goes ACTIVE
SEC (STANDBY): clusterxl_admin up --> PRI still ACTIVE, both are up

will test at earliest opportunity


itsecworks

February 9, 2013

Sounds really good, thanks for the idea!


raul diaz

September 17, 2014

On the problematic cluster member, the output of cphaprob -ia list showed this error:

Device Name: IPSO member status
Current state: problem

The solution was to push the policy; after that there were no more error messages.
