Friday, February 24, 2012

Hadoop 1.0 Quick Start Update

Purpose
Hadoop 1.0 introduced quite a few changes that are not reflected in Hadoop's official setup guide. This document supplements that guide with updates for version 1.0.


Please follow the official Hadoop setup guide first, then check the sections below for the 1.0-specific changes.


Standalone Operations
In Hadoop 1.0, all configuration files have been moved to the etc/hadoop directory.


The following example copies the configuration files from etc/hadoop to use as input, then finds and displays every match of the given regular expression. Output is written to the given output directory.


$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

$ cat output/*
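
To get a feel for what the grep example computes, the same counting can be sketched with ordinary Unix tools. This is only an approximation, assuming the example counts each match of the regular expression and lists the matches by descending count. The function name `local_grep_count` is made up for illustration and is not part of Hadoop.

```shell
# Hypothetical helper, not part of Hadoop: count regex matches across files
# and print "count<TAB>match", most frequent first.
local_grep_count() {
    pattern="$1"
    shift
    # -o prints only the matched text, -h drops file names, -E enables '+'
    grep -ohE "$pattern" "$@" \
        | sort \
        | uniq -c \
        | sort -k1,1nr -k2 \
        | awk '{ print $1 "\t" $2 }'
}

# Usage: local_grep_count 'dfs[a-z.]+' input/*.xml
```

Comparing this with the contents of the output directory is a quick sanity check that the job read the files you expected. The awk step assumes matches contain no whitespace, which holds for the pattern used here.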

Execution
Start the Hadoop daemons:
$ sbin/start-all.sh
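
Before moving on, it can help to confirm that the daemons actually came up. The following is a rough sketch, assuming the JDK's `jps` tool is on your PATH and that your setup runs the five Hadoop 1.x daemons named in the loop. `missing_daemons` is a hypothetical helper name, not something shipped with Hadoop.

```shell
# Hypothetical helper: read `jps` output on stdin and print the name of
# every expected Hadoop 1.x daemon that is not listed.
missing_daemons() {
    listing=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # jps prints "<pid> <name>"; anchor both ends so that NameNode
        # does not match the SecondaryNameNode line
        if ! printf '%s\n' "$listing" | grep -qE "^[0-9]+ ${daemon}\$"; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Usage: jps | missing_daemons
```

If the command prints nothing, all five daemons are running. Any name it does print is a daemon whose log file is worth checking.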


Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put etc/hadoop/ input
Run some of the examples provided:
$ bin/hadoop jar share/hadoop/hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
You should see output similar to this:

12/02/23 16:48:04 INFO mapred.FileInputFormat: Total input paths to process : 16
12/02/23 16:48:05 INFO mapred.JobClient: Running job: job_201202231031_0001
12/02/23 16:48:06 INFO mapred.JobClient:  map 0% reduce 0%
12/02/23 16:48:19 INFO mapred.JobClient:  map 12% reduce 0%
12/02/23 16:48:28 INFO mapred.JobClient:  map 25% reduce 0%
12/02/23 16:48:34 INFO mapred.JobClient:  map 25% reduce 4%
12/02/23 16:48:37 INFO mapred.JobClient:  map 37% reduce 4%
12/02/23 16:48:40 INFO mapred.JobClient:  map 37% reduce 8%
12/02/23 16:48:43 INFO mapred.JobClient:  map 50% reduce 8%
12/02/23 16:48:50 INFO mapred.JobClient:  map 56% reduce 12%
12/02/23 16:48:53 INFO mapred.JobClient:  map 62% reduce 12%
12/02/23 16:48:56 INFO mapred.JobClient:  map 68% reduce 18%
12/02/23 16:48:58 INFO mapred.JobClient:  map 75% reduce 22%
12/02/23 16:49:01 INFO mapred.JobClient:  map 81% reduce 22%
12/02/23 16:49:04 INFO mapred.JobClient:  map 87% reduce 22%
12/02/23 16:49:07 INFO mapred.JobClient:  map 93% reduce 27%
12/02/23 16:49:10 INFO mapred.JobClient:  map 100% reduce 27%
12/02/23 16:49:13 INFO mapred.JobClient:  map 100% reduce 29%
12/02/23 16:49:20 INFO mapred.JobClient:  map 100% reduce 100%
12/02/23 16:49:25 INFO mapred.JobClient: Job complete: job_201202231031_0001
12/02/23 16:49:25 INFO mapred.JobClient: Counters: 30
12/02/23 16:49:25 INFO mapred.JobClient:   Job Counters 
12/02/23 16:49:25 INFO mapred.JobClient:     Launched reduce tasks=1
12/02/23 16:49:25 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=99019
12/02/23 16:49:25 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/02/23 16:49:25 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/02/23 16:49:25 INFO mapred.JobClient:     Launched map tasks=16
12/02/23 16:49:25 INFO mapred.JobClient:     Data-local map tasks=16
12/02/23 16:49:25 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=60471
12/02/23 16:49:25 INFO mapred.JobClient:   File Input Format Counters 
12/02/23 16:49:25 INFO mapred.JobClient:     Bytes Read=26852
12/02/23 16:49:25 INFO mapred.JobClient:   File Output Format Counters 
12/02/23 16:49:25 INFO mapred.JobClient:     Bytes Written=180
12/02/23 16:49:25 INFO mapred.JobClient:   FileSystemCounters
12/02/23 16:49:25 INFO mapred.JobClient:     FILE_BYTES_READ=82
12/02/23 16:49:25 INFO mapred.JobClient:     HDFS_BYTES_READ=28574
12/02/23 16:49:25 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=367327
12/02/23 16:49:25 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=180
12/02/23 16:49:25 INFO mapred.JobClient:   Map-Reduce Framework
12/02/23 16:49:25 INFO mapred.JobClient:     Map output materialized bytes=172
12/02/23 16:49:25 INFO mapred.JobClient:     Map input records=758
12/02/23 16:49:25 INFO mapred.JobClient:     Reduce shuffle bytes=166
12/02/23 16:49:25 INFO mapred.JobClient:     Spilled Records=6
12/02/23 16:49:25 INFO mapred.JobClient:     Map output bytes=70
12/02/23 16:49:25 INFO mapred.JobClient:     Total committed heap usage (bytes)=2596864000
12/02/23 16:49:25 INFO mapred.JobClient:     CPU time spent (ms)=12500
12/02/23 16:49:25 INFO mapred.JobClient:     Map input bytes=26852
12/02/23 16:49:25 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1722
12/02/23 16:49:25 INFO mapred.JobClient:     Combine input records=3
12/02/23 16:49:25 INFO mapred.JobClient:     Reduce input records=3
12/02/23 16:49:25 INFO mapred.JobClient:     Reduce input groups=3
12/02/23 16:49:25 INFO mapred.JobClient:     Combine output records=3
12/02/23 16:49:25 INFO mapred.JobClient:     Physical memory (bytes) snapshot=2788790272
12/02/23 16:49:25 INFO mapred.JobClient:     Reduce output records=3
12/02/23 16:49:25 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=9619705856
12/02/23 16:49:25 INFO mapred.JobClient:     Map output records=3
12/02/23 16:49:25 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/02/23 16:49:25 INFO mapred.FileInputFormat: Total input paths to process : 1
12/02/23 16:49:25 INFO mapred.JobClient: Running job: job_201202231031_0002
12/02/23 16:49:26 INFO mapred.JobClient:  map 0% reduce 0%
12/02/23 16:49:41 INFO mapred.JobClient:  map 100% reduce 0%
12/02/23 16:49:53 INFO mapred.JobClient:  map 100% reduce 100%
12/02/23 16:49:58 INFO mapred.JobClient: Job complete: job_201202231031_0002
12/02/23 16:49:58 INFO mapred.JobClient: Counters: 30
12/02/23 16:49:58 INFO mapred.JobClient:   Job Counters 
12/02/23 16:49:58 INFO mapred.JobClient:     Launched reduce tasks=1
12/02/23 16:49:58 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=13874
12/02/23 16:49:58 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/02/23 16:49:58 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/02/23 16:49:58 INFO mapred.JobClient:     Launched map tasks=1
12/02/23 16:49:58 INFO mapred.JobClient:     Data-local map tasks=1
12/02/23 16:49:58 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10575
12/02/23 16:49:58 INFO mapred.JobClient:   File Input Format Counters 
12/02/23 16:49:58 INFO mapred.JobClient:     Bytes Read=180
12/02/23 16:49:58 INFO mapred.JobClient:   File Output Format Counters 
12/02/23 16:49:58 INFO mapred.JobClient:     Bytes Written=52
12/02/23 16:49:58 INFO mapred.JobClient:   FileSystemCounters
12/02/23 16:49:58 INFO mapred.JobClient:     FILE_BYTES_READ=82
12/02/23 16:49:58 INFO mapred.JobClient:     HDFS_BYTES_READ=296
12/02/23 16:49:58 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=42387
12/02/23 16:49:58 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=52
12/02/23 16:49:58 INFO mapred.JobClient:   Map-Reduce Framework
12/02/23 16:49:58 INFO mapred.JobClient:     Map output materialized bytes=82
12/02/23 16:49:58 INFO mapred.JobClient:     Map input records=3
12/02/23 16:49:58 INFO mapred.JobClient:     Reduce shuffle bytes=82
12/02/23 16:49:58 INFO mapred.JobClient:     Spilled Records=6
12/02/23 16:49:58 INFO mapred.JobClient:     Map output bytes=70
12/02/23 16:49:58 INFO mapred.JobClient:     Total committed heap usage (bytes)=220725248
12/02/23 16:49:58 INFO mapred.JobClient:     CPU time spent (ms)=1990
12/02/23 16:49:58 INFO mapred.JobClient:     Map input bytes=94
12/02/23 16:49:58 INFO mapred.JobClient:     SPLIT_RAW_BYTES=116
12/02/23 16:49:58 INFO mapred.JobClient:     Combine input records=0
12/02/23 16:49:58 INFO mapred.JobClient:     Reduce input records=3
12/02/23 16:49:58 INFO mapred.JobClient:     Reduce input groups=1
12/02/23 16:49:58 INFO mapred.JobClient:     Combine output records=0
12/02/23 16:49:58 INFO mapred.JobClient:     Physical memory (bytes) snapshot=235311104
12/02/23 16:49:58 INFO mapred.JobClient:     Reduce output records=3
12/02/23 16:49:58 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1171832832
12/02/23 16:49:58 INFO mapred.JobClient:     Map output records=3
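
The log above shows the example running two jobs in sequence (job_201202231031_0001 and job_201202231031_0002). If you save the console output to a file, a small filter can confirm that both finished. This is a minimal sketch that assumes the "Job complete:" line format shown above. The helper name `completed_jobs` and the file name `job.log` are illustrative only.

```shell
# Hypothetical helper: print the ID of every job that a saved JobClient
# log reports as complete, one per line.
completed_jobs() {
    sed -n 's/.*Job complete: \(job_[0-9_]*\).*/\1/p' "$@"
}

# Usage (job.log is an example file name):
#   bin/hadoop jar share/hadoop/hadoop-examples-*.jar grep input output 'dfs[a-z.]+' 2>&1 | tee job.log
#   completed_jobs job.log
```

The `2>&1` redirect is there so the log lines are captured whether they are written to standard output or standard error.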



Examine the output.

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
or
View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
When you're done, stop the daemons with:
$ sbin/stop-all.sh


