
How to Sync and Start Redhat Cluster to Verify Failover Scenario

In the first part, we explained in detail how to install and configure a 2-node RedHat cluster.

We covered the following high-level steps in the previous tutorial:

  • Install and start RICCI cluster service
  • Create cluster on active node
  • Add a node to cluster
  • Add fencing to cluster
  • Configure failover domain
  • Add resources to cluster

In this tutorial, we’ll cover the following high-level steps to finish the cluster setup:

  • Sync cluster configuration across nodes
  • Start the cluster
  • Verify failover by shutting down an active node

1. Sync Configurations across Nodes

Anytime a configuration change is made, and the first time you install and configure the cluster, you should sync the configuration from the active node to all the other nodes.

The following command will sync the cluster configurations to all the available nodes:

[root@rh1 ~]# ccs -h rh1 --sync --activate
rh2 password:
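Behind the scenes, `--sync` copies /etc/cluster/cluster.conf to every node and `--activate` loads it. One way to confirm the copies match is to compare the `config_version` attribute on each node. The following sketch extracts that attribute; a trimmed sample file is inlined here for illustration, but on a real node you would point `conf` at /etc/cluster/cluster.conf.

```shell
#!/bin/sh
# Extract the config_version attribute from a cluster.conf file.
# A trimmed sample is written to a temp file for illustration;
# on a cluster node, set conf=/etc/cluster/cluster.conf instead.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<?xml version="1.0"?>
<cluster config_version="28" name="mycluster"/>
EOF
version=$(sed -n 's/.*config_version="\([0-9]*\)".*/\1/p' "$conf")
echo "config_version=$version"
rm -f "$conf"
```

Run the same extraction on every node (for example over ssh) and verify the numbers are identical; a mismatch means a node missed the sync.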

2. Verify Cluster Configuration

Next, verify that the configurations are valid as shown below.

[root@rh1 ~]# ccs -h rh1 --checkconf
All nodes in sync.

If there are any configuration issues, or if the configuration on the active node does not match the configuration on the other nodes in the cluster, the above command will report them accordingly.

3. Start the Cluster

To start the cluster on Node1, do the following:

[root@rh1 ~]# ccs -h rh1 --start

To start the cluster on both nodes, do the following:

[root@rh1 ~]# ccs -h rh1 --startall

To stop the cluster on Node1, do the following:

[root@rh1 ~]# ccs -h rh1 --stop

To stop the cluster on both nodes, do the following:

[root@rh1 ~]# ccs -h rh1 --stopall
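Note that all of these are double-dash long options (--start, not a single dash; copy-pasting from web pages often mangles this). If you script these operations, a small wrapper keeps the flag mapping in one place. The sketch below only prints the command it would run (a dry run), so it is safe to try anywhere:

```shell
#!/bin/sh
# Dry-run wrapper: maps an action to the matching ccs long option and prints
# the command instead of executing it, so the flag spelling is easy to verify.
cluster_ctl() {
  host=$1; action=$2
  case "$action" in
    start|startall|stop|stopall) echo "ccs -h $host --$action" ;;
    *) echo "usage: cluster_ctl <host> start|startall|stop|stopall" >&2; return 1 ;;
  esac
}
cmd=$(cluster_ctl rh1 startall)
echo "$cmd"
```

To actually execute instead of print, replace the echo with the command itself.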

4. View Cluster Status

When everything is up and running in your Redhat or CentOS Linux Cluster, you can view the cluster status as shown below:

[root@rh1 cluster]# clustat
Cluster Status for mycluster @ Sat Mar 15 02:05:59 2015
Member Status: Quorate

 Member Name      ID   Status
 ------ ----      ---- ------
 rh1                 1 Online, Local, rgmanager
 rh2                 2 Online

 Service Name          Owner (Last)  State
 ------- ----          ----- ------  -----
 service:webservice1   rh1           started

As you see in the above output, there are two nodes in our cluster, both nodes are online, and rh1 is the active node.
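clustat output is stable enough to parse in monitoring scripts. As a sketch, the following pulls the current owner of service:webservice1 from a captured copy of the output above; on a live cluster you would pipe clustat in directly instead of using the heredoc:

```shell
#!/bin/sh
# Find which member currently owns service:webservice1 by parsing clustat
# output. A captured sample is used here; on a live cluster, replace the
# heredoc with the output of the real clustat command.
owner=$(awk '$1 == "service:webservice1" { print $2 }' <<'EOF'
Cluster Status for mycluster @ Sat Mar 15 02:05:59 2015
Member Status: Quorate

 Member Name      ID   Status
 ------ ----      ---- ------
 rh1                 1 Online, Local, rgmanager
 rh2                 2 Online

 Service Name          Owner (Last)  State
 ------- ----          ----- ------  -----
 service:webservice1   rh1           started
EOF
)
echo "owner=$owner"
```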

5. Verify Cluster Failover

To verify the failover of the cluster, stop the cluster on the active node or shut down the active node. This should force the cluster to automatically fail over the IP and filesystem resources to the next available node defined in the failover domain.

This is what we currently see on node1:

[root@rh1 ~]# clustat
Cluster Status for mycluster @ Sat Mar 15 14:16:00 2015
Member Status: Quorate

 Member Name  ID   Status
 ------ ----  ---- ------
 rh1             1 Online, Local, rgmanager
 rh2             2 Online, rgmanager

 Service Name         Owner (Last)  State
 ------- ----         ----- ------  -----
 service:webservice1  rh1           started

[root@rh1 ~]# hostname
rh1.mydomain.net

[root@rh1 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:e6:6d:b7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.10/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.12/24 scope global secondary eth0
    inet6 fe80::a00:27ff:fee6:6db7/64 scope link
       valid_lft forever preferred_lft forever

[root@rh1 ~]# df -h /var/www
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/cluster_vg-vol01  993M   18M  925M   2% /var/www
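In a health-check script you can confirm that the node owning the service also holds the floating service IP (192.168.1.12 in this tutorial). The sketch below greps a captured snippet of the `ip addr` output above; on a live node you would pipe `ip addr show eth0` into the grep instead:

```shell
#!/bin/sh
# Check that the floating service IP is plumbed as a secondary address.
# A captured snippet of `ip addr show eth0` output is used for illustration.
ip_sample='    inet 192.168.1.10/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.12/24 scope global secondary eth0'
if printf '%s\n' "$ip_sample" | grep -q '192\.168\.1\.12.*secondary'; then
  status="service IP present"
else
  status="service IP missing"
fi
echo "$status"
```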

6. Force Cluster Failover

Now bring down node1; the service and its resources should fail over to the second node, and you will see output like the following.

[root@rh1 ~]# shutdown -h now

After node1 is down, the following is what you’ll see on node2.

[root@rh2 ~]# clustat
Cluster Status for mycluster @ Sat Mar 18 14:41:23 2015
Member Status: Quorate

 Member Name   ID   Status
 ------ ----   ---- ------
 rh1              1 Offline
 rh2              2 Online, Local, rgmanager

 Service Name         Owner (Last)  State
 ------- ----         ----- ------  -----
 service:webservice1  rh2           started

The above output indicates that there are two nodes in the cluster (rh1 and rh2). rh1 is down, and currently rh2 is the active node.

Also, as you see below, on rh2 the filesystem and the IP address failed over from rh1 without any issues.

[root@rh2 ~]# df -h /var/www
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/cluster_vg-vol01  993M   18M  925M   2% /var/www

[root@rh2 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:e6:6d:b7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.11/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.12/24 scope global secondary eth0
    inet6 fe80::a00:27ff:fee6:6db7/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
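You can detect the failed node the same way, by parsing clustat for members reporting Offline. The sketch below runs against a captured copy of the member table above; on a live cluster, pipe clustat in directly:

```shell
#!/bin/sh
# List cluster members that clustat reports as Offline. A captured sample
# from after the failover is used here for illustration.
offline=$(awk 'NF >= 3 && $NF == "Offline" { print $1 }' <<'EOF'
 Member Name   ID   Status
 ------ ----   ---- ------
 rh1              1 Offline
 rh2              2 Online, Local, rgmanager
EOF
)
echo "offline=$offline"
```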

7. Full Working cluster.conf Example File

The following is the final working cluster.conf configuration file for a 2-node RedHat cluster.

[root@rh1 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="28" name="mycluster">
  <fence_daemon post_join_delay="25"/>
  <clusternodes>
    <clusternode name="rh1" nodeid="1">
      <fence>
        <method name="mthd1">
          <device name="myfence"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="rh2" nodeid="2">
      <fence>
        <method name="mthd1">
          <device name="myfence"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_virt" name="myfence"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="webserverdomain" nofailback="0" ordered="1" restricted="0">
        <failoverdomainnode name="rh1"/>
        <failoverdomainnode name="rh2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <fs device="/dev/cluster_vg/vol01" fstype="ext4" mountpoint="/var/www" name="web_fs"/>
    </resources>
    <service autostart="1" domain="webserverdomain" name="webservice1" recovery="relocate">
      <fs ref="web_fs"/>
      <ip address="192.168.1.12" monitor_link="yes" sleeptime="10"/>
    </service>
  </rm>
</cluster>
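Since cman refuses to start with a malformed cluster.conf, it is worth validating the file before syncing it. The cluster suite ships ccs_config_validate for a full schema check; as a portable fallback, a plain XML well-formedness check already catches most typos. The sketch below validates an inlined, trimmed sample; on a node, set `conf=/etc/cluster/cluster.conf` instead:

```shell
#!/bin/sh
# Well-formedness check for cluster.conf. ccs_config_validate (from the
# cluster suite) does full schema validation; this portable fallback only
# checks that the XML parses. A trimmed sample is inlined for illustration.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<?xml version="1.0"?>
<cluster config_version="28" name="mycluster">
  <cman expected_votes="1" two_node="1"/>
</cluster>
EOF
if python3 -c "import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1])" "$conf" 2>/dev/null; then
  result="well-formed"
else
  result="parse error"
fi
echo "$result"
rm -f "$conf"
```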


Comments

  • HK March 20, 2015, 7:31 am

    This is one of best articles out there on how to setup a cluster. Thank you.

    While trying, I do get the following error when doing clustat.

    “Could not connect to CMAN: No such file or directory”

    Is there something I missed?

  • Monte March 20, 2015, 2:36 pm

    Hello Ramesh
    Curious no mention of a quorum for 2
    node cluster possible brain split ?
    Thank you

  • Jalal Hajigholamali March 21, 2015, 1:09 am

    Hi,
    Thanks a lot
    very nice article…

  • RovshanP March 30, 2015, 10:44 pm

    Hi Karthikeyan Sadhasivam,

    First of all, thanks for such nice tutorial.
    Secondly, I would like to ask which RedHat (CentOS) version are you presenting here.
    Can we use this tutorial for Version 7?

    Thank you in advance.

  • Chandran April 29, 2015, 6:01 am

    very much useful articles, please share redhat cluster troubleshooting article also..thanks a lot..

  • TheUnF June 7, 2015, 7:35 am

    I’m having a hard time creating a pcs resource.

    Do you have more information about creating services ?

  • Joe June 30, 2015, 1:51 pm

    To initiate a failover you could also use

    clusvcadm -r <service> -m <member>

    Where -r = relocate and -m = member to relocate to. This is the more graceful way to initiate the moving of the service to the passive node.

    Now on to my question. I am having issues with my failover/relocate where the shutdown on the active is successful, but the start on the passive fails. I can use clusvcadm -e on either node to start the service, but the failover will not start the service on the other node and restarts it on the same node.

    There is no detail in the logs I’ve found. Just “clurgmgrd: #68: Failed to start service:xxx”. Do you have any hints for me to find more info on what is failing?

    thanks

  • Noufal August 29, 2015, 1:21 am

    sir
    Is this article is for RHEL 6?

  • gaurav sharma April 3, 2016, 12:31 am

    It’s very nice article very helpful to me but can you help me
    while sync mine cluster i am getting error

    “unable to connect node2 , make sure ricci server started”

    as i am making cluster in mine VMware workstation
