
How to Set Up an Apache Zookeeper Cluster on Multiple Nodes in Linux

If you are running Apache Zookeeper in your infrastructure, you should set it up to run in cluster mode. A Zookeeper cluster is called an ensemble.

For the cluster to stay up and running, a majority of its nodes must be up. That is why it is recommended to run a Zookeeper cluster on an odd number of servers, for example 3 or 5 nodes. An even number adds no resilience: a 4-node cluster needs 3 nodes for a majority, so, just like a 3-node cluster, it survives only one failed node.
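The majority rule can be made concrete with a little arithmetic. The sketch below assumes Zookeeper's default majority quorums (the "Defaulting to majority quorums" log line later in this article); the helper names quorum_size and tolerated_failures are hypothetical and not part of any Zookeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest majority of the ensemble: strictly more than half of the servers."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"{n} servers: majority={quorum_size(n)}, "
              f"can lose {tolerated_failures(n)}")
```

Running it shows that 3 and 4 servers both tolerate one failure, and 5 and 6 both tolerate two, which is why an odd-sized ensemble is the usual choice.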

In this tutorial, we’ll set up a 3-node Zookeeper cluster on the following servers: node1, node2, and node3.

Java Pre-req

Zookeeper needs Java, so make sure it is already installed on your system. JDK version 6 or above will work with Zookeeper.

On a yum-based system such as CentOS/RHEL, the following installs OpenJDK 1.8:

yum install java-1.8.0-openjdk

Verify that Java is installed properly:

# java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
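If you want to script this check, a sketch like the following parses the version string printed by java -version and compares it with the JDK 6 minimum mentioned above. It assumes the usual version "..." banner format (either the old 1.x style such as 1.8.0_91 or the newer style such as 11.0.2); java_major_version is a hypothetical helper name.

```python
import re
import shutil
import subprocess


def java_major_version(banner: str) -> int:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        raise ValueError("no version string found")
    first = int(match.group(1))
    # Old scheme: "1.8.0_91" means Java 8. New scheme: "11.0.2" means Java 11.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


if __name__ == "__main__":
    java = shutil.which("java")
    if java is None:
        print("java not found on PATH")
    else:
        # `java -version` prints its banner on stderr, not stdout.
        result = subprocess.run([java, "-version"], capture_output=True, text=True)
        major = java_major_version(result.stderr)
        print(f"Java {major}:", "OK" if major >= 6 else "too old for Zookeeper")
```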

Verify: Start Zookeeper in Standalone Mode for Testing

Before starting the Zookeeper cluster, first start Zookeeper on each individual machine in a single-node configuration (without the cluster setup) to make sure it works properly.

This way, we’ll isolate any issues that are not cluster-related and fix them first on the individual nodes.

In this example, Zookeeper is installed under the /opt/zookeeper directory, using Zookeeper 3.4.9 (the latest version at the time of writing):

export ZOOKEEPER_HOME=/opt/zookeeper

On node1, use Zookeeper’s sample configuration file zoo_sample.cfg as a baseline:

cd $ZOOKEEPER_HOME/conf
cp zoo_sample.cfg zoo.cfg

From now on, we’ll use zoo.cfg as our configuration file. We’ll modify it for the cluster setup later.

On node1, execute the following command to start Zookeeper as a single node:

cd $ZOOKEEPER_HOME
java -cp zookeeper-3.4.9.jar:lib/log4j-1.2.16.jar:lib/slf4j-log4j12-1.6.1.jar:lib/slf4j-api-1.6.1.jar:conf \
  org.apache.zookeeper.server.quorum.QuorumPeerMain \
  conf/zoo.cfg

In the above command:

  • -cp specifies the classpath, i.e. the jar files to include when Zookeeper starts: the Zookeeper jar, log4j, slf4j-log4j12 and slf4j-api. All these jar files come with the Zookeeper installation, so you don’t have to download them separately. The conf directory is also on the classpath so that log4j can find its log4j.properties file.
  • org.apache.zookeeper.server.quorum.QuorumPeerMain is the main class that starts Zookeeper.
  • conf/zoo.cfg is the zookeeper configuration file.
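The same command can be assembled programmatically, which makes each piece listed above explicit. This is a minimal Python sketch, assuming the 3.4.9 jar names from this article and a java binary on the PATH; build_command and start_zookeeper are hypothetical helper names.

```python
import os
import subprocess

# Classpath entries from the command above (Zookeeper 3.4.9 jar names).
CLASSPATH = [
    "zookeeper-3.4.9.jar",
    "lib/log4j-1.2.16.jar",
    "lib/slf4j-log4j12-1.6.1.jar",
    "lib/slf4j-api-1.6.1.jar",
    "conf",
]
MAIN_CLASS = "org.apache.zookeeper.server.quorum.QuorumPeerMain"


def build_command(config: str = "conf/zoo.cfg") -> list:
    """Return the java argv: classpath, main class, then the config file."""
    return ["java", "-cp", os.pathsep.join(CLASSPATH), MAIN_CLASS, config]


def start_zookeeper(zookeeper_home: str = "/opt/zookeeper") -> int:
    """Run Zookeeper in the foreground from its home directory; return the exit code."""
    # The classpath entries are relative, so the working directory matters.
    return subprocess.run(build_command(), cwd=zookeeper_home).returncode
```

On Linux, os.pathsep is ":", so " ".join(build_command()) reproduces the command shown above.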

If everything goes well, you’ll see output like the following on the screen. Each line begins with a timestamp followed by “[myid:] - INFO ”, which is omitted below.

[main:QuorumPeerConfig@124] - Reading configuration from: conf/zoo.cfg
[main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
[main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
[main:DatadirCleanupManager@101] - Purge task is not scheduled.
[main:QuorumPeerMain@113] - Either no config or no quorum defined in config, running  in standalone mode
[main:QuorumPeerConfig@124] - Reading configuration from: conf/zoo.cfg
[main:ZooKeeperServerMain@96] - Starting server
[main:Environment@100] - Server environment:zookeeper.version=3.4.9-1757313, built on 08/23/2016 06:50 GMT
[main:Environment@100] - Server environment:host.name=node1.thegeekstuff.com
[main:Environment@100] - Server environment:java.version=1.8.0_91
[main:Environment@100] - Server environment:java.vendor=Oracle Corporation
[main:Environment@100] - Server environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.91-0.b14.el7_2.x86_64/jre
[main:Environment@100] - Server environment:java.class.path=zookeeper-3.4.9.jar:lib/log4j-1.2.16.jar:lib/slf4j-log4j12-1.6.1.jar:lib/slf4j-api-1.6.1.jar:conf
[main:Environment@100] - Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
[main:Environment@100] - Server environment:java.io.tmpdir=/tmp
[main:Environment@100] - Server environment:java.compiler=NA
[main:Environment@100] - Server environment:os.name=Linux
[main:Environment@100] - Server environment:os.arch=amd64
[main:Environment@100] - Server environment:os.version=3.10.0-327.18.2.el7.x86_64
[main:Environment@100] - Server environment:user.name=root
[main:Environment@100] - Server environment:user.home=/root
[main:Environment@100] - Server environment:user.dir=/opt/zookeeper
[main:ZooKeeperServer@815] - tickTime set to 2000
[main:ZooKeeperServer@824] - minSessionTimeout set to -1
[main:ZooKeeperServer@833] - maxSessionTimeout set to -1
[main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181
[main:FileSnap@83] - Reading snapshot /tmp/zookeeper/version-2/snapshot.363
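While the standalone server is still running, you can also check from a second terminal that it answers on client port 2181. Zookeeper 3.4 replies imok to the four-letter command ruok when the server is running. The sketch below sends that command over a plain TCP socket; the helper names are hypothetical, and newer Zookeeper releases may require four-letter commands to be enabled through the 4lw.commands.whitelist setting.

```python
import socket


def four_letter_word(cmd: str, host: str = "localhost", port: int = 2181,
                     timeout: float = 5.0) -> str:
    """Send a four-letter command and return the server's full reply."""
    if len(cmd) != 4:
        raise ValueError("four-letter commands are exactly four characters")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def is_ok(host: str = "localhost", port: int = 2181) -> bool:
    """True if the server replies "imok" to "ruok"; False if unreachable."""
    try:
        return four_letter_word("ruok", host, port) == "imok"
    except OSError:  # connection refused, timeout, unknown host, ...
        return False


if __name__ == "__main__":
    print("imok" if is_ok() else "no answer on localhost:2181")
```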

Note: Now that we know Zookeeper works properly on a single node, press Ctrl-C to stop it.

Repeat the above test on node2 and node3 as well, to make sure Zookeeper works in standalone mode on all the nodes.

Possible Errors and Solutions during Zookeeper Startup

While testing the standalone startup above, you might encounter these errors:

Error 1: You might get the following “java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory” error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
  at org.apache.zookeeper.server.quorum.QuorumPeerMain.<clinit>(QuorumPeerMain.java:64)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

Solution 1: This error happens when the slf4j jar files are missing from the classpath (the org.slf4j.LoggerFactory class itself lives in slf4j-api). Include both the slf4j-api and slf4j-log4j12 jar files during startup, as shown below.

java -cp zookeeper-3.4.9.jar:lib/log4j-1.2.16.jar:lib/slf4j-log4j12-1.6.1.jar:lib/slf4j-api-1.6.1.jar:conf org.apache.zookeeper.server.quorum.QuorumPeerMain conf/zoo.cfg	

Error 2: You might get the following “ERROR [main:QuorumPeerMain@85] - Invalid config, exiting abnormally” error:

[myid:] - INFO  [main:QuorumPeerConfig@124] - Reading configuration from: zoo.cfg
[myid:] - ERROR [main:QuorumPeerMain@85] - Invalid config, exiting abnormally
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error processing zoo.cfg
  at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:144)
  at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:101)
  at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: java.lang.IllegalArgumentException: zoo.cfg file is missing
  at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:128)
  ... 2 more

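Solution 2: This error happens when Zookeeper can’t find the configuration file at the path given on the command line. In the log above, it was started with just zoo.cfg, which is looked up relative to the current directory instead of under conf/. Make sure you pass “conf/zoo.cfg” as the last argument of the command, as shown below.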

java -cp zookeeper-3.4.9.jar:lib/log4j-1.2.16.jar:lib/slf4j-log4j12-1.6.1.jar:lib/slf4j-api-1.6.1.jar:conf \
 org.apache.zookeeper.server.quorum.QuorumPeerMain \
 conf/zoo.cfg	

Error 3: You might get the following “Could not find or load main class” error message:

Could not find or load main class org.apache.zookeeper.server.quorum.QuorumPeerMain

Solution 3: The classpath entries in the command are relative paths, so make sure you start Zookeeper from the Zookeeper home directory. For example, if you’ve installed Zookeeper under /opt/zookeeper, start it as shown below:

export ZOOKEEPER_HOME=/opt/zookeeper
cd $ZOOKEEPER_HOME
java -cp zookeeper-3.4.9.jar:lib/log4j-1.2.16.jar:lib/slf4j-log4j12-1.6.1.jar:lib/slf4j-api-1.6.1.jar:conf \
 org.apache.zookeeper.server.quorum.QuorumPeerMain \
 conf/zoo.cfg	
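All three errors above come down to files that can’t be found from the directory Zookeeper is started in. A small pre-flight check, sketched below under the assumption of the 3.4.9 file layout used in this article, lists which of the expected files are missing under a given Zookeeper home directory; missing_files is a hypothetical helper.

```python
from pathlib import Path

# Files referenced (relative to ZOOKEEPER_HOME) by the startup command above.
REQUIRED = [
    "zookeeper-3.4.9.jar",
    "lib/log4j-1.2.16.jar",
    "lib/slf4j-log4j12-1.6.1.jar",
    "lib/slf4j-api-1.6.1.jar",
    "conf/zoo.cfg",
]


def missing_files(zookeeper_home: str) -> list:
    """Return the required relative paths that do not exist under zookeeper_home."""
    home = Path(zookeeper_home)
    return [rel for rel in REQUIRED if not (home / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files("/opt/zookeeper")
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all startup files found")
```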

Set Up the Zookeeper Cluster: Modify the zoo.cfg File

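Append the following lines to your $ZOOKEEPER_HOME/conf/zoo.cfg file. These parameters are required for the cluster setup. Note that zoo_sample.cfg already sets initLimit=10 and syncLimit=5; edit or remove those lines so that each setting appears only once.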

initLimit=5
syncLimit=2
server.1=node1.thegeekstuff.com:2888:3888
server.2=node2.thegeekstuff.com:2888:3888
server.3=node3.thegeekstuff.com:2888:3888

In the above:

  • initLimit: the amount of time, in ticks, that the follower nodes in the quorum have to connect to and sync with the leader.
  • syncLimit: the amount of time, in ticks, that a follower can be out of sync (i.e. out of date) with the leader. Followers that fall too far behind are dropped.
  • Both limits are expressed in units of tickTime. By default, tickTime is set to 2000 in zoo.cfg, which means 2000 milliseconds. Multiply each limit by tickTime to get the actual timeout: initLimit = 5 * 2000 = 10000 ms = 10 seconds, and syncLimit = 2 * 2000 = 4000 ms = 4 seconds.
  • server.1, server.2 and server.3 list all three nodes. Instead of the full hostname, you can also specify the IP address of each node. The number after “server.” is the node’s unique id, which you’ll store in that node’s myid file later.
  • Don’t change the “:2888:3888” at the end of each entry. The nodes use the first port (2888) for followers to connect to the leader, and the second port (3888) for leader election.
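The tick arithmetic can be checked against an actual zoo.cfg. The sketch below is a simplified reader for key=value lines only (Zookeeper itself loads zoo.cfg with java.util.Properties, which also accepts other separators); like Properties, it lets a later duplicate key override an earlier one. parse_cfg and timeouts_ms are hypothetical helpers, and the session-timeout bounds are Zookeeper’s defaults of 2 and 20 ticks when minSessionTimeout and maxSessionTimeout are not set.

```python
def parse_cfg(text: str) -> dict:
    """Read key=value lines, skipping blanks and comments; later keys win."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def timeouts_ms(props: dict) -> dict:
    """Convert tick-based limits to milliseconds."""
    # Assumption of this sketch: 2000 (the zoo_sample.cfg value) if tickTime is absent.
    tick = int(props.get("tickTime", "2000"))
    return {
        "initLimit": int(props["initLimit"]) * tick,
        "syncLimit": int(props["syncLimit"]) * tick,
        # Defaults used when minSessionTimeout/maxSessionTimeout are not set.
        "minSessionTimeout": 2 * tick,
        "maxSessionTimeout": 20 * tick,
    }


if __name__ == "__main__":
    sample = ("tickTime=2000\ninitLimit=10\nsyncLimit=5\n"
              "# appended for the cluster\ninitLimit=5\nsyncLimit=2\n")
    print(timeouts_ms(parse_cfg(sample)))
```

For the sample, initLimit comes out as 10000 ms and syncLimit as 4000 ms, and the 4000/40000 ms session bounds match the “Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000” line in the cluster log.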

Also, by default, dataDir in zoo.cfg points to the /tmp/zookeeper directory. Since the system may clean up the contents of /tmp, change it to a persistent location.

In zoo.cfg, set the dataDir to the following:

dataDir=/var/zookeeper

Make sure this directory exists:

mkdir /var/zookeeper

Note: Make the above zoo.cfg changes on all the nodes (i.e. node1, node2 and node3).
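Because every node needs the same file, it can help to generate it once and copy it to node1, node2 and node3. The sketch below renders a compact zoo.cfg containing only the settings discussed in this section (clientPort 2181 is the port from the standalone log above); render_zoo_cfg is a hypothetical helper, and it assigns server ids in list order starting at 1.

```python
def render_zoo_cfg(hosts, data_dir="/var/zookeeper", tick_time=2000,
                   init_limit=5, sync_limit=2, client_port=2181,
                   peer_port=2888, election_port=3888) -> str:
    """Return zoo.cfg text for an ensemble; server ids follow the order of hosts."""
    lines = [
        f"tickTime={tick_time}",
        f"initLimit={init_limit}",
        f"syncLimit={sync_limit}",
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
    ]
    for server_id, host in enumerate(hosts, start=1):
        # host : port followers use to reach the leader : leader-election port
        lines.append(f"server.{server_id}={host}:{peer_port}:{election_port}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    nodes = [f"node{i}.thegeekstuff.com" for i in (1, 2, 3)]
    print(render_zoo_cfg(nodes), end="")
```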

Create Unique Zookeeper Id on Individual Nodes

On each node, create a unique Zookeeper id and store it in a file named “myid” in the directory specified by dataDir in zoo.cfg. The id must match the number N of that node’s server.N entry in zoo.cfg.

On node1, the unique id will be “1”, which will be stored in the /var/zookeeper/myid file.

# echo "1" > /var/zookeeper/myid

# cat /var/zookeeper/myid 
1

On node2, the unique id will be “2”.

echo "2" > /var/zookeeper/myid

On node3, the unique id will be “3”.

echo "3" > /var/zookeeper/myid
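Rather than typing each id by hand, you can derive it from the server.N entries in zoo.cfg. The sketch below assumes each node’s hostname appears in zoo.cfg exactly as socket.getfqdn() reports it (IP-address entries are not handled); myid_for_host and write_myid are hypothetical helpers.

```python
import socket
from pathlib import Path


def myid_for_host(cfg_text: str, hostname: str) -> int:
    """Return N from the server.N=host:port:port line whose host matches hostname."""
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line.startswith("server."):
            continue
        key, _, value = line.partition("=")
        host = value.split(":", 1)[0].strip()
        if host == hostname:
            return int(key[len("server."):])
    raise LookupError(f"{hostname} is not listed in any server.N entry")


def write_myid(cfg_path="/opt/zookeeper/conf/zoo.cfg",
               data_dir="/var/zookeeper", hostname=None) -> int:
    """Write this node's id to <data_dir>/myid, like the echo commands above."""
    hostname = hostname or socket.getfqdn()
    server_id = myid_for_host(Path(cfg_path).read_text(), hostname)
    Path(data_dir, "myid").write_text(f"{server_id}\n")
    return server_id
```

Run write_myid() on each node as a user that can write to /var/zookeeper.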

Note: If you don’t set up myid properly, when you start Zookeeper you’ll get the following “/var/zookeeper/myid file is missing” error message:

[myid:] - INFO  [main:QuorumPeerConfig@124] - Reading configuration from: conf/zoo.cfg
[myid:] - INFO  [main:QuorumPeer$QuorumServer@149] - Resolved hostname: node1.thegeekstuff.com to address: /192.168.101.1
[myid:] - INFO  [main:QuorumPeer$QuorumServer@149] - Resolved hostname: node2.thegeekstuff.com to address: /192.168.101.2
[myid:] - INFO  [main:QuorumPeer$QuorumServer@149] - Resolved hostname: node3.thegeekstuff.com to address: /192.168.101.3
[myid:] - WARN  [main:QuorumPeerConfig@305] - No server failure will be tolerated. You need at least 3 servers.
[myid:] - INFO  [main:QuorumPeerConfig@352] - Defaulting to majority quorums
[myid:] - ERROR [main:QuorumPeerMain@85] - Invalid config, exiting abnormally
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error processing conf/zoo.cfg
  at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:144)
  at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:101)
  at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: java.lang.IllegalArgumentException: /var/zookeeper/myid file is missing
  at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parseProperties(QuorumPeerConfig.java:362)
  at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:140)
  ... 2 more
Invalid config, exiting abnormally

Note: You’ll see “[myid:]” in the above output, without any id number in it. Once you fix the problem, you’ll see “[myid:1]” in the log messages on node1, [myid:2] on node2, and [myid:3] on node3. This is an easy and quick way to identify which Zookeeper node a log message comes from.
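The [myid:N] tag also makes it easy to split logs collected from several nodes. A minimal sketch, assuming the log line format shown in this article; lines_by_node is a hypothetical helper, and lines without an id (such as “[myid:]” or stack-trace lines) are grouped under None.

```python
import re
from collections import defaultdict

MYID_TAG = re.compile(r"\[myid:(\d*)\]")


def lines_by_node(lines):
    """Group log lines by the node id in their [myid:N] tag."""
    groups = defaultdict(list)
    for line in lines:
        match = MYID_TAG.search(line)
        # "[myid:]" (no id yet) and untagged lines such as stack traces go under None
        node = int(match.group(1)) if match and match.group(1) else None
        groups[node].append(line)
    return dict(groups)
```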

Start the Zookeeper Cluster

Now, to start the cluster, start Zookeeper on each individual node, one by one, as shown below:

export ZOOKEEPER_HOME=/opt/zookeeper
cd $ZOOKEEPER_HOME
java -cp zookeeper-3.4.9.jar:lib/log4j-1.2.16.jar:lib/slf4j-log4j12-1.6.1.jar:lib/slf4j-api-1.6.1.jar:conf \
 org.apache.zookeeper.server.quorum.QuorumPeerMain \
 conf/zoo.cfg

Note: It’s convenient to put the above lines inside a script such as zookeeper-start.sh, make it executable, and use the nohup command to start it in the background, as shown below:

chmod +x zookeeper-start.sh
nohup ./zookeeper-start.sh &

Note: To stop the Zookeeper cluster, on each individual node, locate the Zookeeper process with ps and grep (for example, grep for QuorumPeerMain), and terminate it with the kill command.
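If you prefer to manage the process from Python instead of nohup, ps and kill, the sketch below starts a command in its own session (so, similar to nohup, closing the terminal won’t hang it up), records its PID in a file, and later stops it with SIGTERM, the same signal a plain kill sends. The function names and the PID-file approach are assumptions for illustration; you would pass it the java command above as a list, with cwd set to /opt/zookeeper.

```python
import os
import signal
import subprocess
from pathlib import Path


def start_background(cmd, cwd, log_path, pid_path):
    """Start cmd detached from the terminal, append its output to log_path, save its PID."""
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            cmd,
            cwd=cwd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # own session, not tied to the terminal
        )
    Path(pid_path).write_text(f"{proc.pid}\n")
    return proc


def stop_background(pid_path):
    """Send SIGTERM (what a plain `kill PID` sends) to the saved PID and remove the file."""
    pid = int(Path(pid_path).read_text())
    os.kill(pid, signal.SIGTERM)
    Path(pid_path).unlink()
    return pid
```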

At this stage, node1 will start logging error messages like the ones below. You can ignore them for now.

These errors appear because only node1 is up so far. Once node2 and node3 are up, you won’t see them anymore.

[myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer$QuorumServer@149] - Resolved hostname: node1.thegeekstuff.com to address: /192.168.101.1
[myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 400
[myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@400] - Cannot open channel to 2 at election address /192.168.101.2:3888
[myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@400] - Cannot open channel to 3 at election address /192.168.101.3:3888
java.net.ConnectException: Connection refused
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
  at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
  at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
  at java.net.Socket.connect(Socket.java:589)
  at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:381)
  at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:426)
  at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:843)
  at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:822)

Note: Just by looking at the log lines above, we can tell they are from node1, since each log message starts with “[myid:1]”.

After starting Zookeeper on node2 and node3, you’ll see messages like the following in the logs of the individual nodes, indicating that the Zookeeper cluster is up and running. The output below is from node1.

Each line starts with a timestamp followed by “[myid:1] - INFO ”. On lines that don’t show their own thread name, the prefix “[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:” has also been removed.

QuorumPeer$QuorumServer@149] - Resolved hostname: node1.thegeekstuff.com to address: /192.168.101.1
FastLeaderElection@852] - Notification time out: 6400
[/192.168.101.1:3888:QuorumCnxManager$Listener@541] - Received connection request /192.168.101.2:56214
[WorkerReceiver[myid=1]:FastLeaderElection@600] - Notification: 1 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
[WorkerReceiver[myid=1]:FastLeaderElection@600] - Notification: 1 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
QuorumPeer@844] - FOLLOWING
Learner@86] - TCP NoDelay set to: true
Environment@100] - Server environment:zookeeper.version=3.4.9-1757313, built on 08/23/2016 06:50 GMT
Environment@100] - Server environment:host.name=node1.thegeekstuff.com
Environment@100] - Server environment:java.version=1.8.0_91
Environment@100] - Server environment:java.vendor=Oracle Corporation
Environment@100] - Server environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.91-0.b14.el7_2.x86_64/jre
Environment@100] - Server environment:java.class.path=zookeeper-3.4.9.jar:lib/log4j-1.2.16.jar:lib/slf4j-log4j12-1.6.1.jar:lib/slf4j-api-1.6.1.jar:conf
Environment@100] - Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Environment@100] - Server environment:java.io.tmpdir=/tmp
Environment@100] - Server environment:java.compiler=NA
Environment@100] - Server environment:os.name=Linux
Environment@100] - Server environment:os.arch=amd64
Environment@100] - Server environment:os.version=3.10.0-327.18.2.el7.x86_64
Environment@100] - Server environment:user.name=root
Environment@100] - Server environment:user.home=/root
Environment@100] - Server environment:user.dir=/opt/zookeeper
ZooKeeperServer@173] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /var/zookeeper/version-2 snapdir /var/zookeeper/version-2
Follower@61] - FOLLOWING - LEADER ELECTION TOOK - 8869
QuorumPeer$QuorumServer@149] - Resolved hostname: node2.thegeekstuff.com to address: /192.168.101.2
QuorumPeer$QuorumServer@149] - Resolved hostname: node3.thegeekstuff.com to address: /192.168.101.3
Learner@326] - Getting a diff from the leader 0x0
FileTxnSnapLog@240] - Snapshotting: 0x0 to /var/zookeeper/version-2/snapshot.0
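Besides reading the logs, you can ask each node for its role with the srvr four-letter command, whose reply includes a “Mode:” line (leader, follower or standalone). A healthy 3-node ensemble has exactly one leader and two followers. The sketch below is a minimal version of that check; the helper names are hypothetical, and the hostnames are passed on the command line.

```python
import socket
import sys


def srvr(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send the srvr command and return the full reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply: str):
    """Return the value of the "Mode:" line, or None if there is none."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.partition(":")[2].strip()
    return None


def ensemble_healthy(modes: dict) -> bool:
    """Exactly one leader and every other node a follower (None means unreachable)."""
    values = list(modes.values())
    return values.count("leader") == 1 and all(
        m in ("leader", "follower") for m in values)


if __name__ == "__main__":
    modes = {}
    # for example: node1.thegeekstuff.com node2.thegeekstuff.com node3.thegeekstuff.com
    for host in sys.argv[1:]:
        try:
            modes[host] = parse_mode(srvr(host))
        except OSError:
            modes[host] = None
        print(f"{host}: {modes[host] or 'unreachable'}")
    print("ensemble healthy" if ensemble_healthy(modes) else "ensemble not fully formed")
```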

Comments on this entry are closed.

  • Sankalp November 4, 2016, 6:33 am

    Thanks for sharing, this perfectly worked.
    I follows zookeeper website to setup but key things that I missed and found your article helpful were :-
    1. Copy zoo.cfg in all the follower zookeeper instances
    2. Start zookeeper on all followers after leader
    3. Create myid file for leader zookeeper as well and do entry there.

  • Güvenlik Kamera Sistemleri November 15, 2016, 6:37 am

    Thanks for sharing your nice experiences. It worked.

  • Sankalp January 18, 2017, 10:28 pm

    This is indeed helpful but is there a way in zookeeper to dynamically add nodes as plug and play ?
    IN The current setup we need to have cluster info beforehand.