In an active-standby Linux cluster configuration, all the critical services including IP, filesystem will failover from one node to another node in the cluster.
This tutorials explains in detail on how to create and configure two node redhat cluster using command line utilities.
The following are the high-level steps involved in configuring Linux cluster on Redhat or CentOS:
- Install and start RICCI cluster service
- Create cluster on active node
- Add a node to cluster
- Add fencing to cluster
- Configure failover domain
- Add resources to cluster
- Sync cluster configuration across nodes
- Start the cluster
- Verify failover by shutting down an active node
1. Required Cluster Packages
First make sure the following cluster packages are installed. If you don’t have these packages install them using yum command.
[root@rh1 ~]# rpm -qa | egrep -i "ricci|luci|cluster|ccs|cman" modcluster-0.16.2-28.el6.x86_64 luci-0.26.0-48.el6.x86_64 ccs-0.16.2-69.el6.x86_64 ricci-0.16.2-69.el6.x86_64 cman-220.127.116.11-59.el6.x86_64 clusterlib-18.104.22.168-59.el6.x86_64
2. Start RICCI service and Assign Password
Next, start ricci service on both the nodes.
[root@rh1 ~]# service ricci start Starting oddjobd: [ OK ] generating SSL certificates... done Generating NSS database... done Starting ricci: [ OK ]
You also need to assign a password for the RICCI on both the nodes.
[root@rh1 ~]# passwd ricci Changing password for user ricci. New password: Retype new password: passwd: all authentication tokens updated successfully.
Also, If you are running iptables firewall, keep in mind that you need to have appropriate firewall rules on both the nodes to be able to talk to each other.
3. Create Cluster on Active Node
From the active node, please run the below command to create a new cluster.
The following command will create the cluster configuration file /etc/cluster/cluster.conf. If the file already exists, it will replace the existing cluster.conf with the newly created cluster.conf.
[root@rh1 ~]# ccs -h rh1.mydomain.net --createcluster mycluster rh1.mydomain.net password: [root@rh1 ~]# ls -l /etc/cluster/cluster.conf -rw-r-----. 1 root root 188 Sep 26 17:40 /etc/cluster/cluster.conf
Also keep in mind that we are running these commands only from one node on the cluster and we are not yet ready to propagate the changes to the other node on the cluster.
4. Initial Plain cluster.conf File
After creating the cluster, the cluster.conf file will look like the following:
[root@rh1 ~]# cat /etc/cluster/cluster.conf <?xml version="1.0"?> <cluster config_version="1" name="mycluster"> <fence_daemon/> <clusternodes/> <cman/> <fencedevices/> <rm> <failoverdomains/> <resources/> </rm> </cluster>
5. Add a Node to the Cluster
Once the cluster is created, we need to add the participating nodes to the cluster using the ccs command as shown below.
First, add the first node rh1 to the cluster as shown below.
[root@rh1 ~]# ccs -h rh1.mydomain.net --addnode rh1.mydomain.net Node rh1.mydomain.net added.
Next, add the second node rh2 to the cluster as shown below.
[root@rh1 ~]# ccs -h rh1.mydomain.net --addnode rh2.mydomain.net Node rh2.mydomain.net added.
Once the nodes are created, you can use the following command to view all the available nodes in the cluster. This will also display the node id for the corresponding node.
[root@rh1 ~]# ccs -h rh1 --lsnodes rh1.mydomain.net: nodeid=1 rh2.mydomain.net: nodeid=2
6. cluster.conf File After Adding Nodes
This above will also add the nodes to the cluster.conf file as shown below.
[root@rh1 ~]# cat /etc/cluster/cluster.conf <?xml version="1.0"?> <cluster config_version="3" name="mycluster"> <fence_daemon/> <clusternodes> <clusternode name="rh1.mydomain.net" nodeid="1"/> <clusternode name="rh2.mydomain.net" nodeid="2"/> </clusternodes> <cman/> <fencedevices/> <rm> <failoverdomains/> <resources/> </rm> </cluster>
7. Add Fencing to Cluster
Fencing is the disconnection of a node from shared storage. Fencing cuts off I/O from shared storage, thus ensuring data integrity.
A fence device is a hardware device that can be used to cut a node off from shared storage.
This can be accomplished in a variety of ways: powering off the node via a remote power switch, disabling a Fiber Channel switch port, or revoking a host’s SCSI 3 reservations.
A fence agent is a software program that connects to a fence device in order to ask the fence device to cut off access to a node’s shared storage (via powering off the node or removing access to the shared storage by other means).
Execute the following command to enable fencing.
[root@rh1 ~]# ccs -h rh1 --setfencedaemon post_fail_delay=0 [root@rh1 ~]# ccs -h rh1 --setfencedaemon post_join_delay=25
Next, add a fence device. There are different types of fencing devices available. If you are using virtual machine to build a cluster, use fence_virt device as shown below.
[root@rh1 ~]# ccs -h rh1 --addfencedev myfence agent=fence_virt
Next, add fencing method. After creating the fencing device, you need to created the fencing method and add the hosts to the fencing method.
[root@rh1 ~]# ccs -h rh1 --addmethod mthd1 rh1.mydomain.net Method mthd1 added to rh1.mydomain.net. [root@rh1 ~]# ccs -h rh1 --addmethod mthd1 rh2.mydomain.net Method mthd1 added to rh2.mydomain.net.
Finally, associate fence device to the method created above as shown below:
[root@rh1 ~]# ccs -h rh1 --addfenceinst myfence rh1.mydomain.net mthd1 [root@rh1 ~]# ccs -h rh1 --addfenceinst myfence rh2.mydomain.net mthd1
8. cluster.conf File after Fencing
Your cluster.conf will look like below after the fencing devices, methods are added.
[root@rh1 ~]# cat /etc/cluster/cluster.conf <?xml version="1.0"?> <cluster config_version="10" name="mycluster"> <fence_daemon post_join_delay="25"/> <clusternodes> <clusternode name="rh1.mydomain.net" nodeid="1"> <fence> <method name="mthd1"> <device name="myfence"/> </method> </fence> </clusternode> <clusternode name="rh2.mydomain.net" nodeid="2"> <fence> <method name="mthd1"> <device name="myfence"/> </method> </fence> </clusternode> </clusternodes> <cman/> <fencedevices> <fencedevice agent="fence_virt" name="myfence"/> </fencedevices> <rm> <failoverdomains/> <resources/> </rm> </cluster>
9. Types of Failover Domain
A failover domain is an ordered subset of cluster members to which a resource group or service may be bound.
The following are the different types of failover domains:
- Restricted failover-domain: Resource groups or service bound to the domain may only run on cluster members which are also members of the failover domain. If no members of failover domain are availables, the resource group or service is placed in stopped state.
- Unrestricted failover-domain: Resource groups bound to this domain may run on all cluster members, but will run on a member of the domain whenever one is available. This means that if a resource group is running outside of the domain and member of the domain transitions online, the resource group or
- service will migrate to that cluster member.
- Ordered domain: Nodes in the ordered domain are assigned a priority level from 1-100. Priority 1 being highest and 100 being the lowest. A node with the highest priority will run the resource group. The resource if it was running on node 2, will migrate to node 1 when it becomes online.
- Unordered domain: Members of the domain have no order of preference. Any member may run in the resource group. Resource group will always migrate to members of their failover domain whenever possible.
10. Add a Filover Domain
To add a failover domain, execute the following command. In this example, I created domain named as “webserverdomain”,
[root@rh1 ~]# ccs -h rh1 --addfailoverdomain webserverdomain ordered
Once the failover domain is created, add both the nodes to the failover domain as shown below:
[root@rh1 ~]# ccs -h rh1 --addfailoverdomainnode webserverdomain rh1.mydomain.net priority=1 [root@rh1 ~]# ccs -h rh1 --addfailoverdomainnode webserverdomain rh2.mydomain.net priority=2
You can view all the nodes in the failover domain using the following command.
[root@rh1 ~]# ccs -h rh1 --lsfailoverdomain webserverdomain: restricted=0, ordered=1, nofailback=0 rh1.mydomain.net: 1 rh2.mydomain.net: 2
11. Add Resources to Cluster
Now it is time to add a resources. This indicates the services that also should failover along with ip and filesystem when a node fails. For example, the Apache webserver can be part of the failover in the Redhat Linux Cluster.
When you are ready to add resources, there are 2 ways you can do this.
You can add as global resources or add a resource directly to resource group or service.
The advantage of adding it as global resource is that if you want to add the resource to more than one service group you can just reference the global resource on your service or resource group.
In this example, we added the filesystem on a shared storage as global resource and referenced it on the service.
[root@rh1 ~]# ccs –h rh1 --addresource fs name=web_fs device=/dev/cluster_vg/vol01 mountpoint=/var/www fstype=ext4
To add a service to the cluster, create a service and add the resource to the service.
[root@rh1 ~]# ccs -h rh1 --addservice webservice1 domain=webserverdomain recovery=relocate autostart=1
Now add the following lines in the cluster.conf for adding the resource references to the service. In this example, we also added failover IP to our service.
<fs ref="web_fs"/> <ip address="192.168.1.12" monitor_link="yes" sleeptime="10"/>
In the 2nd part of this tutorial (tomorrow), we’ll explain how to sync the configurations across multiple nodes in a cluster, and how to verify the failover scenario in a cluster setup.