Apache Hadoop Single Node Standalone Installation Tutorial

by Ramesh Natarajan on February 1, 2012

When you implement Apache Hadoop in a production environment, you’ll need multiple server nodes. If you are just exploring distributed computing, you might want to experiment with Hadoop by installing it on a single node.

This article explains how to set up and configure a single node standalone Hadoop environment. You can also simulate a multi node Hadoop installation on a single server using the pseudo distributed Hadoop installation, which we’ll cover in detail in the next article of this series. (next article: How To Install Apache Hadoop Pseudo Distributed Mode on a Single Node)

The standalone Hadoop environment is a good place to start, because it confirms that your server is set up properly with all the prerequisites to run Hadoop.

If you are new to Hadoop, read our Apache Hadoop Fundamentals – HDFS and MapReduce article.

1. Create a Hadoop User

You can download and install Hadoop as root, but it is recommended to install it as a separate user. So, log in as root and create a user called hadoop.

# adduser hadoop
# passwd hadoop

2. Download Hadoop Common

Download Apache Hadoop Common and move it to the server where you want to install it.

You can also download it directly to your server using wget.

# su - hadoop
$ wget http://mirror.nyi.net/apache//hadoop/common/stable/hadoop-0.20.203.0rc1.tar.gz

Make sure Java 1.6 is installed on your system.

$ java -version
java version "1.6.0_20"
OpenJDK Runtime Environment (IcedTea6 1.9.7) (rhel-1.39.1.9.7.el6-x86_64)
OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
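If you want to check the Java version from a script rather than by eye, the following is a minimal sketch. It assumes the first line of the java -version output has the form shown above; parse_java_version is a hypothetical helper name used only for this illustration.

```shell
#!/bin/sh
# parse_java_version (hypothetical helper): read `java -version` output on
# stdin and print only the version string, for example 1.6.0_20.
parse_java_version() {
    # The first line is expected to look like: java version "1.6.0_20"
    sed -n 's/^.* version "\([^"]*\)".*$/\1/p' | head -n 1
}

# java -version writes to stderr, so redirect stderr into the pipe.
if command -v java >/dev/null 2>&1; then
    version=$(java -version 2>&1 | parse_java_version)
    echo "Found Java version: ${version:-unknown}"
else
    echo "java not found on PATH"
fi
```

If the script reports that java is not found, install a JDK before continuing with the next step.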

3. Unpack under hadoop User

As the hadoop user, unpack the package.

$ tar xvfz hadoop-0.20.203.0rc1.tar.gz

This will create a directory named after the release, "hadoop-0.20.204.0" in the examples that follow.

$ ls -l hadoop-0.20.204.0
total 6780
drwxr-xr-x.  2 hadoop hadoop    4096 Oct 12 08:50 bin
-rw-rw-r--.  1 hadoop hadoop  110797 Aug 25 16:28 build.xml
drwxr-xr-x.  4 hadoop hadoop    4096 Aug 25 16:38 c++
-rw-rw-r--.  1 hadoop hadoop  419532 Aug 25 16:28 CHANGES.txt
drwxr-xr-x.  2 hadoop hadoop    4096 Nov  2 05:29 conf
drwxr-xr-x. 14 hadoop hadoop    4096 Aug 25 16:28 contrib
drwxr-xr-x.  7 hadoop hadoop    4096 Oct 12 08:49 docs
drwxr-xr-x.  3 hadoop hadoop    4096 Aug 25 16:29 etc

Modify the hadoop-0.20.204.0/conf/hadoop-env.sh file and make sure the JAVA_HOME environment variable points to the location of the Java installation on your system.

$ grep JAVA ~/hadoop-0.20.204.0/conf/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0_27
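If you are not sure which value to use, JAVA_HOME should be the directory that contains bin/java. The sketch below derives a candidate value from the java binary found on your PATH. It assumes GNU readlink (for the -f option) is available, and java_home_from_path is a hypothetical helper name, not something Hadoop provides.

```shell
#!/bin/sh
# java_home_from_path (hypothetical helper): strip the trailing /bin/java
# from a full path to the java binary and print what is left.
java_home_from_path() {
    printf '%s\n' "${1%/bin/java}"
}

if java_bin=$(command -v java 2>/dev/null); then
    # readlink -f follows symlinks, such as /usr/bin/java pointing into
    # an alternatives directory, until it reaches the real file.
    real_bin=$(readlink -f "$java_bin")
    echo "Candidate line for conf/hadoop-env.sh:"
    echo "export JAVA_HOME=$(java_home_from_path "$real_bin")"
else
    echo "java not found on PATH"
fi
```

Treat the printed line as a suggestion and verify that the directory really contains bin/java before adding it to hadoop-env.sh.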

4. Test Sample Hadoop Program

In single node standalone mode, you don’t need to start any Hadoop background processes. Instead, just call ~/hadoop-0.20.204.0/bin/hadoop, which runs Hadoop as a single Java process for testing purposes.

This example program ships with Hadoop, and the Hadoop documentation uses it as a simple example to check whether the setup works.

First, create an input directory, where all the input files will be stored. In a real Hadoop environment, this would be the location where the incoming data files are stored.

$ cd ~/hadoop-0.20.204.0
$ mkdir input

For testing purposes, add some sample data files to the input directory. Let us just copy all the xml files from the conf directory to the input directory, so that these xml files serve as the data files for the example program.

$ cp conf/*.xml input

Execute the sample Hadoop test program. This is a simple Hadoop program that simulates grep. It searches for the regex pattern “dfs[a-z.]+” in all the input/*.xml files and stores the output in the output directory.

$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

When everything is set up properly, the above sample Hadoop test program displays the following messages on the screen while it is executing.

$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
12/01/14 23:38:46 INFO mapred.FileInputFormat: Total input paths to process : 6
12/01/14 23:38:46 INFO mapred.JobClient: Running job: job_local_0001
12/01/14 23:38:46 INFO mapred.MapTask: numReduceTasks: 1
12/01/14 23:38:46 INFO mapred.MapTask: io.sort.mb = 100
12/01/14 23:38:46 INFO mapred.MapTask: data buffer = 79691776/99614720
12/01/14 23:38:46 INFO mapred.MapTask: record buffer = 262144/327680
12/01/14 23:38:46 INFO mapred.MapTask: Starting flush of map output
12/01/14 23:38:46 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/01/14 23:38:47 INFO mapred.JobClient:  map 0% reduce 0%
...

This will create the output directory with the results as shown below.

$ ls -l output
total 4
-rwxrwxrwx. 1 root root 11 Aug 23 08:39 part-00000
-rwxrwxrwx. 1 root root  0 Aug 23 08:39 _SUCCESS

$ cat output/*
1       dfsadmin
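To cross-check the result without Hadoop, you can count the same matches with standard command line tools. This is only a rough sketch: count_matches is a hypothetical helper name, and it assumes your grep supports the -o, -h and -E options. The counts should agree with the Hadoop output for a simple pattern like this one, although the column formatting differs.

```shell
#!/bin/sh
# count_matches (hypothetical helper): count how often each distinct match
# of PATTERN occurs in the given files, most frequent first.
# Usage: count_matches PATTERN FILE...
count_matches() {
    pattern=$1
    shift
    # -o prints only the matched text, -h omits file names,
    # -E enables extended regular expressions (needed for '+').
    grep -ohE "$pattern" "$@" | sort | uniq -c | sort -rn
}

# Run against the same input directory used by the Hadoop example, if present.
if [ -d input ]; then
    count_matches 'dfs[a-z.]+' input/*.xml
fi
```

Run it from the Hadoop directory after copying the xml files into the input directory.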

The source code of the example programs is located under the src/examples/org/apache/hadoop/examples directory.

$ ls -l ~/hadoop-0.20.204.0/src/examples/org/apache/hadoop/examples
-rw-rw-r--. 1 hadoop hadoop  2395 Jan 14 23:28 WordCount.java
-rw-rw-r--. 1 hadoop hadoop  8040 Jan 14 23:28 Sort.java
-rw-rw-r--. 1 hadoop hadoop  9156 Jan 14 23:28 SleepJob.java
-rw-rw-r--. 1 hadoop hadoop  7809 Jan 14 23:28 SecondarySort.java
-rw-rw-r--. 1 hadoop hadoop 10190 Jan 14 23:28 RandomWriter.java
-rw-rw-r--. 1 hadoop hadoop 40350 Jan 14 23:28 RandomTextWriter.java
-rw-rw-r--. 1 hadoop hadoop 11914 Jan 14 23:28 PiEstimator.java
-rw-rw-r--. 1 hadoop hadoop   853 Jan 14 23:28 package.html
-rw-rw-r--. 1 hadoop hadoop  8276 Jan 14 23:28 MultiFileWordCount.java
-rw-rw-r--. 1 hadoop hadoop  6582 Jan 14 23:28 Join.java
-rw-rw-r--. 1 hadoop hadoop  3334 Jan 14 23:28 Grep.java
-rw-rw-r--. 1 hadoop hadoop  3751 Jan 14 23:28 ExampleDriver.java
-rw-rw-r--. 1 hadoop hadoop 13089 Jan 14 23:28 DBCountPageView.java
-rw-rw-r--. 1 hadoop hadoop  2879 Jan 14 23:28 AggregateWordHistogram.java
-rw-rw-r--. 1 hadoop hadoop  2797 Jan 14 23:28 AggregateWordCount.java
drwxr-xr-x. 2 hadoop hadoop  4096 Jan 14 08:49 dancing
drwxr-xr-x. 2 hadoop hadoop  4096 Jan 14 08:49 terasort

5. Troubleshooting Issues

Issue: “Temporary failure in name resolution”

While executing the sample hadoop program, you might get the following error message.

12/01/14 23:34:57 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-root/mapred/staging/root-1040516815/.staging/job_local_0001
java.net.UnknownHostException: hadoop: hadoop: Temporary failure in name resolution
        at java.net.InetAddress.getLocalHost(InetAddress.java:1438)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:815)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
        at java.security.AccessController.doPrivileged(Native Method)

Solution: Add the following entry to the /etc/hosts file. It contains the IP address, the fully qualified domain name (FQDN), and the host name.

192.168.1.10 hadoop.thegeekstuff.com hadoop
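Before running the sample program again, you may want to confirm that the hosts file actually lists your host name. The following sketch does a simple text check; hosts_has_name is a hypothetical helper name, and the check only looks at the hosts file, so a host that resolves through DNS alone will be reported as not listed.

```shell
#!/bin/sh
# hosts_has_name (hypothetical helper): succeed if the hosts file contains a
# non-comment line that lists NAME as a host name or alias.
# Usage: hosts_has_name NAME [HOSTS_FILE]
hosts_has_name() {
    name=$1
    hosts_file=${2:-/etc/hosts}
    awk -v name="$name" '
        /^[[:space:]]*#/ { next }        # skip comment lines
        {
            sub(/#.*/, "")               # drop trailing comments
            # field 1 is the IP address; the rest are names and aliases
            for (i = 2; i <= NF; i++) if ($i == name) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$hosts_file"
}

# Fall back to uname -n if the hostname command is not installed.
host=$(hostname 2>/dev/null || uname -n)
if hosts_has_name "$host"; then
    echo "$host is listed in /etc/hosts"
else
    echo "$host is not listed in /etc/hosts (it may still resolve through DNS)"
fi
```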

In the next article of this series, we’ll discuss in detail how to simulate a multi node Hadoop installation on a single server.

4 comments

1 kashyap February 1, 2012 at 6:02 am

when i type java -version below is the output
java version "1.6.0_20"
OpenJDK Runtime Environment (IcedTea6 1.9.10) (rhel-1.23.1.9.10.el5_7-i386)
OpenJDK Client VM (build 19.0-b09, mixed mode)

how should I install OpenJDK Server VM..??

2 Unam February 1, 2012 at 7:39 am

I’m a fresher (6 months) working as a Perl developer. I wanted to ask a question.
How is Big Data (Hadoop) as a career option for people like me who are in the early stage of their careers, compared to other established languages like C, C++ and Perl?
Thanks in advance for the answers.

3 kd February 10, 2012 at 12:02 am

bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
Error: JAVA_HOME is not set.

Please help me, I am facing this error.

4 msp April 8, 2012 at 3:46 am

In UBUNTU:
$whereis java
java: /usr/bin/java /etc/java /usr/share/java /usr/share/man/man1/java.1.gz

set the JAVA_HOME variable to “/usr”
no need to add bin/java to environment variable
