Apache Hadoop Single Node Standalone Installation Tutorial

by Ramesh Natarajan on February 1, 2012

When you implement Apache Hadoop in a production environment, you’ll need multiple server nodes. If you are just exploring distributed computing, you might want to play around with Hadoop by installing it on a single node.

This article explains how to set up and configure a single node standalone Hadoop environment. Please note that you can also simulate a multi node Hadoop installation on a single server using a pseudo distributed Hadoop installation, which we’ll be covering in detail in the next article of this series. (next article: How To Install Apache Hadoop Pseudo Distributed Mode on a Single Node)

The standalone Hadoop environment is a good place to start to make sure your server environment is set up properly with all the prerequisites to run Hadoop.

If you are new to Hadoop, read our Apache Hadoop Fundamentals – HDFS and MapReduce article.

1. Create a Hadoop User

You can download and install Hadoop as root. But it is recommended to install it as a separate user. So, log in as root and create a user called hadoop.

# adduser hadoop
# passwd hadoop
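Before moving on, you may want to confirm that the account was really created. The following is a minimal sketch, assuming the standard id command is available; the function name user_exists is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named account exists on this system.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

if user_exists hadoop; then
    echo "user hadoop exists"
else
    echo "user hadoop not found; create it as root with: adduser hadoop"
fi
```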

2. Download Hadoop Common

Download the Apache Hadoop Common and move it to the server where you want to install it.

You can also use wget to download it directly to your server.

# su - hadoop
$ wget http://mirror.nyi.net/apache//hadoop/common/stable/hadoop-0.20.203.0rc1.tar.gz

Make sure Java 1.6 is installed on your system.

$ java -version
java version "1.6.0_20"
OpenJDK Runtime Environment (IcedTea6 1.9.7) (rhel-1.39.1.9.7.el6-x86_64)
OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
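If you want to script this check, the version string can be pulled out of the first line of the java -version output. This is only a sketch: the function names are made up for this example, and treating "1.6 or newer" as acceptable is an assumption, since this article was tested only with Java 1.6.

```shell
#!/bin/sh
# Hypothetical helper: read `java -version` text on stdin and print the
# quoted version string from its first line (for example 1.6.0_20).
java_version_from_text() {
    sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}

# Hypothetical helper: succeed if a version string is at least 1.6.
# Assumption: anything newer than 1.6 is treated as acceptable.
is_java_16_or_newer() {
    major=$(echo "$1" | cut -d. -f1)
    minor=$(echo "$1" | cut -d. -f2)
    # Keep only the leading digits, and fall back to 0 if nothing is left.
    major=${major%%[!0-9]*}; major=${major:-0}
    minor=${minor%%[!0-9]*}; minor=${minor:-0}
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 6 ]; }
}

# java -version writes to stderr, so redirect it into the pipe.
if command -v java >/dev/null 2>&1; then
    v=$(java -version 2>&1 | java_version_from_text)
    if is_java_16_or_newer "$v"; then
        echo "found Java $v"
    else
        echo "Java $v looks older than 1.6"
    fi
else
    echo "java is not on the PATH"
fi
```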

3. Unpack under hadoop User

As the hadoop user, unpack this package.

$ tar xvfz hadoop-0.20.203.0rc1.tar.gz

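This will create a directory named after the release inside the tarball. The rest of this article uses “hadoop-0.20.204.0”; if your download unpacks to a different name (for example, a 0.20.203.0 release), substitute that name in the commands below.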

$ ls -l hadoop-0.20.204.0
total 6780
drwxr-xr-x.  2 hadoop hadoop    4096 Oct 12 08:50 bin
-rw-rw-r--.  1 hadoop hadoop  110797 Aug 25 16:28 build.xml
drwxr-xr-x.  4 hadoop hadoop    4096 Aug 25 16:38 c++
-rw-rw-r--.  1 hadoop hadoop  419532 Aug 25 16:28 CHANGES.txt
drwxr-xr-x.  2 hadoop hadoop    4096 Nov  2 05:29 conf
drwxr-xr-x. 14 hadoop hadoop    4096 Aug 25 16:28 contrib
drwxr-xr-x.  7 hadoop hadoop    4096 Oct 12 08:49 docs
drwxr-xr-x.  3 hadoop hadoop    4096 Aug 25 16:29 etc
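If you are not sure which directory name a tarball will unpack to, you can list it without extracting anything. The sketch below assumes a gzip-compressed tarball; the function name tarball_top_dirs is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct top-level directory names
# stored in a gzip-compressed tarball, without extracting it.
tarball_top_dirs() {
    tar tzf "$1" | cut -d/ -f1 | sort -u
}

# Example: inspect the download from step 2 if it is in the current directory.
if [ -f hadoop-0.20.203.0rc1.tar.gz ]; then
    tarball_top_dirs hadoop-0.20.203.0rc1.tar.gz
fi
```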

Modify the hadoop-0.20.204.0/conf/hadoop-env.sh file and make sure the JAVA_HOME environment variable points to the location of the Java installation on your system.

$ grep JAVA ~/hadoop-0.20.204.0/conf/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0_27
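If you don’t know where Java is installed, one way to find a candidate value is to resolve the java binary on your PATH and strip the trailing /bin/java. This is a sketch that assumes readlink -f is available (it is on typical Linux systems); the function name is made up, and you should verify the printed path before putting it into hadoop-env.sh.

```shell
#!/bin/sh
# Hypothetical helper: given the full path of a java binary,
# print the directory that would serve as JAVA_HOME.
java_home_from_binary() {
    echo "$1" | sed 's|/bin/java$||'
}

if command -v java >/dev/null 2>&1; then
    # Follow symlinks such as /usr/bin/java to the real binary.
    real=$(readlink -f "$(command -v java)")
    echo "export JAVA_HOME=$(java_home_from_binary "$real")"
fi
```

Hadoop only needs $JAVA_HOME/bin/java to exist, so any directory that satisfies this is a reasonable candidate.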

4. Test Sample Hadoop Program

In single node standalone mode, you don’t need to start any Hadoop background processes. Instead, just call ~/hadoop-0.20.204.0/bin/hadoop, which will execute Hadoop as a single Java process for testing purposes.

This example program ships with Hadoop, and the Hadoop documentation uses it as a simple example to check whether the setup works.

First, create an input directory, where all the input files will be stored. This could be the location where all incoming data files are stored in your Hadoop environment.

$ cd ~/hadoop-0.20.204.0
$ mkdir input

For testing purposes, add some sample data files to the input directory. Let us just copy all the xml files from the conf directory to the input directory. These xml files will be treated as the data files for the example program.

$ cp conf/*.xml input

Execute the sample Hadoop test program. This is a simple Hadoop program that simulates grep. It searches for the regex pattern “dfs[a-z.]+” in all the input/*.xml files and stores the result in the output directory.

$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

When everything is set up properly, the above sample Hadoop test program will display the following messages on the screen while it is executing.

$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
12/01/14 23:38:46 INFO mapred.FileInputFormat: Total input paths to process : 6
12/01/14 23:38:46 INFO mapred.JobClient: Running job: job_local_0001
12/01/14 23:38:46 INFO mapred.MapTask: numReduceTasks: 1
12/01/14 23:38:46 INFO mapred.MapTask: io.sort.mb = 100
12/01/14 23:38:46 INFO mapred.MapTask: data buffer = 79691776/99614720
12/01/14 23:38:46 INFO mapred.MapTask: record buffer = 262144/327680
12/01/14 23:38:46 INFO mapred.MapTask: Starting flush of map output
12/01/14 23:38:46 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/01/14 23:38:47 INFO mapred.JobClient:  map 0% reduce 0%
...
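Two things commonly stop this command before the job even starts. If hadoop-examples-*.jar does not match a file in the current directory, the shell passes the pattern through literally and Hadoop reports that it cannot open the job jar. And Hadoop refuses to write into an output directory that already exists, so a rerun needs the old one removed first. The pre-flight check below is a sketch of how to test for both; the function names are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the single file matching hadoop-examples-*.jar
# in directory $1, or fail if there is no match or more than one.
find_examples_jar() {
    set -- "$1"/hadoop-examples-*.jar
    if [ "$#" -eq 1 ] && [ -f "$1" ]; then
        echo "$1"
    else
        return 1
    fi
}

# Hypothetical helper: check the Hadoop directory $1 and output path $2,
# and print the examples jar path when both checks pass.
preflight() {
    dir=$1
    out=$2
    jar=$(find_examples_jar "$dir") || {
        echo "expected exactly one hadoop-examples-*.jar in $dir" >&2
        return 1
    }
    if [ -e "$out" ]; then
        echo "$out already exists; remove it before rerunning" >&2
        return 1
    fi
    echo "$jar"
}

# Run from inside the Hadoop directory.
if jar=$(preflight . output); then
    echo "ready to run: bin/hadoop jar $jar grep input output 'dfs[a-z.]+'"
fi
```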

This will create the output directory with the results as shown below.

$ ls -l output
total 4
-rwxrwxrwx. 1 root root 11 Aug 23 08:39 part-00000
-rwxrwxrwx. 1 root root  0 Aug 23 08:39 _SUCCESS

$ cat output/*
1       dfsadmin
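To cross-check this result without Hadoop, the same count can be approximated with standard Unix tools: extract every match, count the duplicates, and sort by count. This is a sketch of the idea rather than what Hadoop runs internally; the function name count_matches is made up, and it assumes your grep supports the -o and -h options.

```shell
#!/bin/sh
# Hypothetical helper: count every match of an extended regex ($1) across
# the given files, most frequent first.
# Output format: <count><TAB><match>, one line per distinct match.
count_matches() {
    pattern=$1
    shift
    grep -ohE -- "$pattern" "$@" \
        | sort \
        | uniq -c \
        | sort -rn \
        | awk '{ c = $1; sub(/^ *[0-9]+ /, ""); print c "\t" $0 }'
}

# Run from inside the Hadoop directory after step 4.
if [ -d input ]; then
    count_matches 'dfs[a-z.]+' input/*.xml
fi
```

If the two outputs disagree, compare the contents of the input directory first, since both commands depend on which xml files were copied there.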

The source code of the example programs is located under the src/examples/org/apache/hadoop/examples directory.

$ ls -l ~/hadoop-0.20.204.0/src/examples/org/apache/hadoop/examples
-rw-rw-r--. 1 hadoop hadoop  2395 Jan 14 23:28 WordCount.java
-rw-rw-r--. 1 hadoop hadoop  8040 Jan 14 23:28 Sort.java
-rw-rw-r--. 1 hadoop hadoop  9156 Jan 14 23:28 SleepJob.java
-rw-rw-r--. 1 hadoop hadoop  7809 Jan 14 23:28 SecondarySort.java
-rw-rw-r--. 1 hadoop hadoop 10190 Jan 14 23:28 RandomWriter.java
-rw-rw-r--. 1 hadoop hadoop 40350 Jan 14 23:28 RandomTextWriter.java
-rw-rw-r--. 1 hadoop hadoop 11914 Jan 14 23:28 PiEstimator.java
-rw-rw-r--. 1 hadoop hadoop   853 Jan 14 23:28 package.html
-rw-rw-r--. 1 hadoop hadoop  8276 Jan 14 23:28 MultiFileWordCount.java
-rw-rw-r--. 1 hadoop hadoop  6582 Jan 14 23:28 Join.java
-rw-rw-r--. 1 hadoop hadoop  3334 Jan 14 23:28 Grep.java
-rw-rw-r--. 1 hadoop hadoop  3751 Jan 14 23:28 ExampleDriver.java
-rw-rw-r--. 1 hadoop hadoop 13089 Jan 14 23:28 DBCountPageView.java
-rw-rw-r--. 1 hadoop hadoop  2879 Jan 14 23:28 AggregateWordHistogram.java
-rw-rw-r--. 1 hadoop hadoop  2797 Jan 14 23:28 AggregateWordCount.java
drwxr-xr-x. 2 hadoop hadoop  4096 Jan 14 08:49 dancing
drwxr-xr-x. 2 hadoop hadoop  4096 Jan 14 08:49 terasort

5. Troubleshooting Issues

Issue: “Temporary failure in name resolution”

While executing the sample hadoop program, you might get the following error message.

12/01/14 23:34:57 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-root/mapred/staging/root-1040516815/.staging/job_local_0001
java.net.UnknownHostException: hadoop: hadoop: Temporary failure in name resolution
        at java.net.InetAddress.getLocalHost(InetAddress.java:1438)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:815)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
        at java.security.AccessController.doPrivileged(Native Method)

Solution: Add an entry like the following to the /etc/hosts file, containing the IP address, the fully qualified domain name (FQDN), and the host name.

192.168.1.10 hadoop.thegeekstuff.com hadoop
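The stack trace shows the failure coming from InetAddress.getLocalHost, which means Java could not resolve the machine’s own host name. To check whether /etc/hosts already lists that name, you can use something like the following sketch. The function name is made up for this example, and it only inspects the hosts file, not DNS.

```shell
#!/bin/sh
# Hypothetical helper: succeed if hosts-format file $2 has a non-comment
# line that lists name $1 in any field after the address.
hosts_file_has_name() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # drop comments
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "$2"
}

# uname -n prints this machine's host name.
me=$(uname -n)
if hosts_file_has_name "$me" /etc/hosts; then
    echo "$me is listed in /etc/hosts"
else
    echo "$me is not listed in /etc/hosts"
fi
```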

In the next article of this series, we’ll discuss in detail how to simulate a multi node Hadoop installation on a single server.

kashyap February 1, 2012, 6:02 am

when i type java -version below is the output
java version “1.6.0_20”
OpenJDK Runtime Environment (IcedTea6 1.9.10) (rhel-1.23.1.9.10.el5_7-i386)
OpenJDK Client VM (build 19.0-b09, mixed mode)

how should I install OpenJDK Server VM..??

∞
Unam February 1, 2012, 7:39 am

I’m a fresher (6 months) working as a Perl developer. I wanted to ask a question.
How is Big Data (Hadoop) as a career option for people like me who are in the early stage of their careers, compared to other established languages like C, C++ and Perl?
Thanks in advance for the answers.

∞
kd February 10, 2012, 12:02 am

bin/hadoop jar hadoop-examples-*.jar grep input output ‘dfs[a-z.]+’
Error: JAVA_HOME is not set.

please help me i m facing this error……….

∞
msp April 8, 2012, 3:46 am

In UBUNTU:
$whereis java
java: /usr/bin/java /etc/java /usr/share/java /usr/share/man/man1/java.1.gz

set the JAVA_HOME variable to “/usr”
no need to add bin/java to environment variable

∞
rejina June 11, 2012, 11:57 pm

performed all the steps till step 4… in test run of sample program we got following :

[root@station1 hadoop-0.20.203.0]# bin/hadoop jar hadoop-examples-*.jar grep input output ‘dfs[a-z]+’
12/06/12 10:51:50 INFO mapred.FileInputFormat: Total input paths to process : 6
12/06/12 10:51:51 INFO mapred.JobClient: Running job: job_local_0001
12/06/12 10:51:51 INFO mapred.MapTask: numReduceTasks: 1
12/06/12 10:51:51 INFO mapred.MapTask: io.sort.mb = 100
12/06/12 10:51:51 WARN mapred.LocalJobRunner: job_local_0001

java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:948)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:427)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
12/06/12 10:51:52 INFO mapred.JobClient: map 0% reduce 0%
12/06/12 10:51:52 INFO mapred.JobClient: Job complete: job_local_0001
12/06/12 10:51:52 INFO mapred.JobClient: Counters: 0
12/06/12 10:51:52 INFO mapred.JobClient: Job Failed: NA
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1204)
at org.apache.hadoop.examples.Grep.run(Grep.java:69)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.Grep.main(Grep.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

and after this output directory was also not created…
[root@station1 hadoop-0.20.203.0]# ls -l output
ls: output: No such file or directory

∞
BobbyD_FL June 13, 2012, 2:32 pm

@rejina,

For anyone using RPM or DEB packages, the documentation and common advice is misleading. These packages install hadoop configuration files into /etc/hadoop. These will take priority over other settings.

The /etc/hadoop/hadoop-env.sh sets the maximum Java heap memory; by default it is:
export HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS"
This Xmx setting is too low; simply change it to this and rerun:
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"

http://stackoverflow.com/questions/8464048/out-of-memory-error-in-hadoop

∞
xiaomai November 2, 2012, 2:44 am

Hello!When I run the programs of “bin/hadoop jar hadoop-examples-*.jar grep input output ‘dfs[a-z.]+’ “.It cause the problem:
Exception in thread “main” java.io.IOException: Error opening job jar: hadoop-examples-*.jar
at org.apache.hadoop.util.RunJar.main(RunJar.java:130)
Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:131)
at java.util.jar.JarFile.<init>(JarFile.java:150)
at java.util.jar.JarFile.<init>(JarFile.java:87)
at org.apache.hadoop.util.RunJar.main(RunJar.java:128)

I do not know why .Please help me!
Thank you very much!

∞
Deepak January 29, 2013, 5:53 am

hi when i executed the below command am getting the following msg :

hadoop@ubuntu:~/Hadoop/hadoop-0.20.204.0$ jar hadoop-examples-0.20.204.0.jar grep input output ‘dfs[a-z.]+’
The program ‘jar’ can be found in the following packages:
* openjdk-6-jdk
* fastjar
* gcj-4.4-jdk
* gcj-4.3
Ask your administrator to install one of them

but i got java installed on my ubuntu machine
hadoop@ubuntu:~/Hadoop/hadoop-0.20.204.0$ java -version
java version “1.6.0_20”
OpenJDK Runtime Environment (IcedTea6 1.9.10) (6b20-1.9.10-0ubuntu1~10.04.3)
OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)

and i have set java path as below in the file hadoop-env.sh

hadoop@ubuntu:~/Hadoop/hadoop-0.20.204.0/conf$ grep ‘JAVA’ hadoop-env.sh

# The only required environment variable is JAVA_HOME. All others are
# set JAVA_HOME in this file, so that it is correctly defined on
export JAVA_HOME=/usr/java/jdk1.6.0_27

plz help me on this

∞
Deepak January 30, 2013, 3:33 am

the above issues was , jar command was not installed , after installation i was able to run the command successfully. below is my output

hadoop@ubuntu:~/Hadoop/hadoop-0.20.204.0$ bin/hadoop jar hadoop-examples-*.jar grep input output ‘dfs[a-z.]+’
13/01/30 01:28:28 INFO mapred.FileInputFormat: Total input paths to process : 6
13/01/30 01:28:29 INFO mapred.JobClient: Running job: job_local_0001
13/01/30 01:28:30 INFO mapred.MapTask: numReduceTasks: 1
13/01/30 01:28:30 INFO mapred.MapTask: io.sort.mb = 100
13/01/30 01:28:30 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/30 01:28:30 INFO mapred.MapTask: record buffer = 262144/327680
13/01/30 01:28:30 INFO mapred.MapTask: Starting flush of map output
13/01/30 01:28:30 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/01/30 01:28:30 INFO mapred.JobClient: map 0% reduce 0%
13/01/30 01:28:33 INFO mapred.LocalJobRunner: file:/home/hadoop/Hadoop/hadoop-0.20.204.0/input/capacity-scheduler.xml:0+7457
13/01/30 01:28:33 INFO mapred.Task: Task ‘attempt_local_0001_m_000000_0’ done.
13/01/30 01:28:33 INFO mapred.MapTask: numReduceTasks: 1
13/01/30 01:28:33 INFO mapred.MapTask: io.sort.mb = 100
13/01/30 01:28:33 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/30 01:28:33 INFO mapred.MapTask: record buffer = 262144/327680
13/01/30 01:28:33 INFO mapred.MapTask: Starting flush of map output
13/01/30 01:28:33 INFO mapred.MapTask: Finished spill 0
13/01/30 01:28:33 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
13/01/30 01:28:33 INFO mapred.JobClient: map 100% reduce 0%
13/01/30 01:28:36 INFO mapred.LocalJobRunner: file:/home/hadoop/Hadoop/hadoop-0.20.204.0/input/hadoop-policy.xml:0+4644
13/01/30 01:28:36 INFO mapred.LocalJobRunner: file:/home/hadoop/Hadoop/hadoop-0.20.204.0/input/hadoop-policy.xml:0+4644
13/01/30 01:28:36 INFO mapred.Task: Task ‘attempt_local_0001_m_000001_0’ done.
13/01/30 01:28:36 INFO mapred.MapTask: numReduceTasks: 1
13/01/30 01:28:36 INFO mapred.MapTask: io.sort.mb = 100
13/01/30 01:28:36 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/30 01:28:36 INFO mapred.MapTask: record buffer = 262144/327680
13/01/30 01:28:36 INFO mapred.MapTask: Starting flush of map output
13/01/30 01:28:36 INFO mapred.Task: Task:attempt_local_0001_m_000002_0 is done. And is in the process of commiting
13/01/30 01:28:39 INFO mapred.LocalJobRunner: file:/home/hadoop/Hadoop/hadoop-0.20.204.0/input/mapred-queue-acls.xml:0+2033
13/01/30 01:28:39 INFO mapred.LocalJobRunner: file:/home/hadoop/Hadoop/hadoop-0.20.204.0/input/mapred-queue-acls.xml:0+2033
13/01/30 01:28:39 INFO mapred.Task: Task ‘attempt_local_0001_m_000002_0’ done.
13/01/30 01:28:39 INFO mapred.MapTask: numReduceTasks: 1
13/01/30 01:28:39 INFO mapred.MapTask: io.sort.mb = 100
13/01/30 01:28:39 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/30 01:28:39 INFO mapred.MapTask: record buffer = 262144/327680
13/01/30 01:28:39 INFO mapred.MapTask: Starting flush of map output
13/01/30 01:28:39 INFO mapred.Task: Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
13/01/30 01:28:42 INFO mapred.LocalJobRunner: file:/home/hadoop/Hadoop/hadoop-0.20.204.0/input/hdfs-site.xml:0+178
13/01/30 01:28:42 INFO mapred.LocalJobRunner: file:/home/hadoop/Hadoop/hadoop-0.20.204.0/input/hdfs-site.xml:0+178
13/01/30 01:28:42 INFO mapred.Task: Task ‘attempt_local_0001_m_000003_0’ done.
13/01/30 01:28:42 INFO mapred.MapTask: numReduceTasks: 1
13/01/30 01:28:42 INFO mapred.MapTask: io.sort.mb = 100
13/01/30 01:28:42 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/30 01:28:42 INFO mapred.MapTask: record buffer = 262144/327680
13/01/30 01:28:42 INFO mapred.MapTask: Starting flush of map output
13/01/30 01:28:42 INFO mapred.Task: Task:attempt_local_0001_m_000004_0 is done. And is in the process of commiting
13/01/30 01:28:45 INFO mapred.LocalJobRunner: file:/home/hadoop/Hadoop/hadoop-0.20.204.0/input/mapred-site.xml:0+178
13/01/30 01:28:45 INFO mapred.LocalJobRunner: file:/home/hadoop/Hadoop/hadoop-0.20.204.0/input/mapred-site.xml:0+178
13/01/30 01:28:45 INFO mapred.Task: Task ‘attempt_local_0001_m_000004_0’ done.
13/01/30 01:28:45 INFO mapred.MapTask: numReduceTasks: 1
13/01/30 01:28:45 INFO mapred.MapTask: io.sort.mb = 100
13/01/30 01:28:45 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/30 01:28:45 INFO mapred.MapTask: record buffer = 262144/327680
13/01/30 01:28:45 INFO mapred.MapTask: Starting flush of map output
13/01/30 01:28:45 INFO mapred.Task: Task:attempt_local_0001_m_000005_0 is done. And is in the process of commiting
13/01/30 01:28:48 INFO mapred.LocalJobRunner: file:/home/hadoop/Hadoop/hadoop-0.20.204.0/input/core-site.xml:0+178
13/01/30 01:28:48 INFO mapred.Task: Task ‘attempt_local_0001_m_000005_0’ done.
13/01/30 01:28:48 INFO mapred.LocalJobRunner:
13/01/30 01:28:48 INFO mapred.Merger: Merging 6 sorted segments
13/01/30 01:28:48 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 21 bytes
13/01/30 01:28:48 INFO mapred.LocalJobRunner:
13/01/30 01:28:48 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
13/01/30 01:28:48 INFO mapred.LocalJobRunner:
13/01/30 01:28:48 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
13/01/30 01:28:48 INFO mapred.FileOutputCommitter: Saved output of task ‘attempt_local_0001_r_000000_0’ to file:/home/hadoop/Hadoop/hadoop-0.20.204.0/grep-temp-1606033621
13/01/30 01:28:51 INFO mapred.LocalJobRunner: reduce > reduce
13/01/30 01:28:51 INFO mapred.Task: Task ‘attempt_local_0001_r_000000_0’ done.
13/01/30 01:28:51 INFO mapred.JobClient: map 100% reduce 100%
13/01/30 01:28:51 INFO mapred.JobClient: Job complete: job_local_0001
13/01/30 01:28:51 INFO mapred.JobClient: Counters: 17
13/01/30 01:28:51 INFO mapred.JobClient: File Input Format Counters
13/01/30 01:28:51 INFO mapred.JobClient: Bytes Read=14668
13/01/30 01:28:51 INFO mapred.JobClient: File Output Format Counters
13/01/30 01:28:51 INFO mapred.JobClient: Bytes Written=123
13/01/30 01:28:51 INFO mapred.JobClient: FileSystemCounters
13/01/30 01:28:51 INFO mapred.JobClient: FILE_BYTES_READ=1108807
13/01/30 01:28:51 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1233130
13/01/30 01:28:51 INFO mapred.JobClient: Map-Reduce Framework
13/01/30 01:28:51 INFO mapred.JobClient: Map output materialized bytes=55
13/01/30 01:28:51 INFO mapred.JobClient: Map input records=357
13/01/30 01:28:51 INFO mapred.JobClient: Reduce shuffle bytes=0
13/01/30 01:28:51 INFO mapred.JobClient: Spilled Records=2
13/01/30 01:28:51 INFO mapred.JobClient: Map output bytes=17
13/01/30 01:28:51 INFO mapred.JobClient: Map input bytes=14668
13/01/30 01:28:51 INFO mapred.JobClient: SPLIT_RAW_BYTES=713
13/01/30 01:28:51 INFO mapred.JobClient: Combine input records=1
13/01/30 01:28:51 INFO mapred.JobClient: Reduce input records=1
13/01/30 01:28:51 INFO mapred.JobClient: Reduce input groups=1
13/01/30 01:28:51 INFO mapred.JobClient: Combine output records=1
13/01/30 01:28:51 INFO mapred.JobClient: Reduce output records=1
13/01/30 01:28:51 INFO mapred.JobClient: Map output records=1
13/01/30 01:28:52 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/01/30 01:28:52 INFO mapred.FileInputFormat: Total input paths to process : 1
13/01/30 01:28:52 INFO mapred.JobClient: Running job: job_local_0002
13/01/30 01:28:52 INFO mapred.MapTask: numReduceTasks: 1
13/01/30 01:28:52 INFO mapred.MapTask: io.sort.mb = 100
13/01/30 01:28:52 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/30 01:28:52 INFO mapred.MapTask: record buffer = 262144/327680
13/01/30 01:28:52 INFO mapred.MapTask: Starting flush of map output
13/01/30 01:28:52 INFO mapred.MapTask: Finished spill 0
13/01/30 01:28:52 INFO mapred.Task: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
13/01/30 01:28:53 INFO mapred.JobClient: map 0% reduce 0%
13/01/30 01:28:55 INFO mapred.LocalJobRunner: file:/home/hadoop/Hadoop/hadoop-0.20.204.0/grep-temp-1606033621/part-00000:0+111
13/01/30 01:28:55 INFO mapred.LocalJobRunner: file:/home/hadoop/Hadoop/hadoop-0.20.204.0/grep-temp-1606033621/part-00000:0+111
13/01/30 01:28:55 INFO mapred.Task: Task ‘attempt_local_0002_m_000000_0’ done.
13/01/30 01:28:55 INFO mapred.LocalJobRunner:
13/01/30 01:28:55 INFO mapred.Merger: Merging 1 sorted segments
13/01/30 01:28:55 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 21 bytes
13/01/30 01:28:55 INFO mapred.LocalJobRunner:
13/01/30 01:28:55 INFO mapred.Task: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
13/01/30 01:28:55 INFO mapred.LocalJobRunner:
13/01/30 01:28:55 INFO mapred.Task: Task attempt_local_0002_r_000000_0 is allowed to commit now
13/01/30 01:28:55 INFO mapred.FileOutputCommitter: Saved output of task ‘attempt_local_0002_r_000000_0’ to file:/home/hadoop/Hadoop/hadoop-0.20.204.0/output
13/01/30 01:28:56 INFO mapred.JobClient: map 100% reduce 0%
13/01/30 01:28:58 INFO mapred.LocalJobRunner: reduce > reduce
13/01/30 01:28:58 INFO mapred.Task: Task ‘attempt_local_0002_r_000000_0’ done.
13/01/30 01:28:59 INFO mapred.JobClient: map 100% reduce 100%
13/01/30 01:28:59 INFO mapred.JobClient: Job complete: job_local_0002
13/01/30 01:28:59 INFO mapred.JobClient: Counters: 17
13/01/30 01:28:59 INFO mapred.JobClient: File Input Format Counters
13/01/30 01:28:59 INFO mapred.JobClient: Bytes Read=123
13/01/30 01:28:59 INFO mapred.JobClient: File Output Format Counters
13/01/30 01:28:59 INFO mapred.JobClient: Bytes Written=23
13/01/30 01:28:59 INFO mapred.JobClient: FileSystemCounters
13/01/30 01:28:59 INFO mapred.JobClient: FILE_BYTES_READ=607981
13/01/30 01:28:59 INFO mapred.JobClient: FILE_BYTES_WRITTEN=701613
13/01/30 01:28:59 INFO mapred.JobClient: Map-Reduce Framework
13/01/30 01:28:59 INFO mapred.JobClient: Map output materialized bytes=25
13/01/30 01:28:59 INFO mapred.JobClient: Map input records=1
13/01/30 01:28:59 INFO mapred.JobClient: Reduce shuffle bytes=0
13/01/30 01:28:59 INFO mapred.JobClient: Spilled Records=2
13/01/30 01:28:59 INFO mapred.JobClient: Map output bytes=17
13/01/30 01:28:59 INFO mapred.JobClient: Map input bytes=25
13/01/30 01:28:59 INFO mapred.JobClient: SPLIT_RAW_BYTES=127
13/01/30 01:28:59 INFO mapred.JobClient: Combine input records=0
13/01/30 01:28:59 INFO mapred.JobClient: Reduce input records=1
13/01/30 01:28:59 INFO mapred.JobClient: Reduce input groups=1
13/01/30 01:28:59 INFO mapred.JobClient: Combine output records=0
13/01/30 01:28:59 INFO mapred.JobClient: Reduce output records=1
13/01/30 01:28:59 INFO mapred.JobClient: Map output records=1

∞
Rudra March 9, 2013, 4:38 pm

In your explanation above, you mentioned that to look out for source-codes; we shall look inside the following path:-
“$ ls -l ~/hadoop-0.20.204.0/src/examples/org/apache/hadoop/examples”

I downloaded and installed the stable release “hadoop-1.0.4-bin.tar.gz” from one of mirror-URLs available on the apache.org website. I downloaded the release from here.

Now, all the installation is complete, but, I don’t find anything beyond:-
~/hadoop-0.20.204.0/src/

There is no examples directory after this, only “contrib” exists. So, my question is how can I get the JAVA source-codes and where should I look them for. Is there somewhere over Internet that I can find a bunch of these source-codes.

∞
EVA May 29, 2013, 9:34 pm

:~/hadoop-1.0.4$ bin/hadoop jar hadoop-examples-*.jar grep input output ‘dfs[a-z.]+’
Warning: $HADOOP_HOME is deprecated.

i am getting this whilen running can any one help me??

∞
Debashis Mandal August 21, 2013, 12:56 am

Hello, I have installed Hadoop 1.1.2 on Ubuntu 12.04 LTS.
It was running OK few days back.
But for the last few days, when I start my hadoop single node cluster using “start-all.sh”, I get messages like “Starting namenode, starting datanode, starting tasktracker….” etc.
But on running the “jps” command I am returned only the ID of jps and not the namenode, tasktracker or jobtracker.
“4801 jps”.
But previously I used to get the IDs of the namenode as well as the tasktracker and the jobtracker.

Any solution to my problem??

∞
Naveen December 12, 2013, 11:47 am

When I run the jar i am getting this error

Exception in thread “main” java.io.IOException: Error opening job jar: hadoop-examples-*.jar
at org.apache.hadoop.util.RunJar.main(RunJar.java:130)
Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:131)
at java.util.jar.JarFile.<init>(JarFile.java:150)
at java.util.jar.JarFile.<init>(JarFile.java:87)
at org.apache.hadoop.util.RunJar.main(RunJar.java:128)

Pls help

∞
Manoj March 12, 2015, 3:21 am

Hi Team,
Hope you are doing well, I am getting this ERROR while running this command bin/hadoop jar hadoop-examples-*.jar grep input output ‘dfs[a-z.]+’
First ERROR Showing:
java.io.IOException: The temporary job-output directory file:/usr/local/hadoop/grep-temp-573322172/_temporary doesn’t exist!
at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
at org.apache.hadoop.mapred.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:44)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:449)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:491)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)

Second ERROR showing :
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.hadoop.examples.Grep.run(Grep.java:69)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.Grep.main(Grep.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)

∞
Swami January 4, 2016, 6:51 am

Is there a way to change single node hadoop installation to distributed one? Pl. let me know!

∞
