Hadoop Research: March 2011

Wednesday, March 30, 2011

Map & Reduce class variable

In my tests, the member variables of a Map or Reduce class persist only within a single Job.

Once the Job finishes, those variables no longer exist.

For example:

Suppose we write the following Reduce class:

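public class TEST_Reduce extends Reducer<Text, Text, Text, Text> {

    // Lazily initialized on the first call to reduce()
    private double[][] C = null;

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        if (C == null) {
            // r_int (the row count) is defined elsewhere in the program
            C = new double[r_int][];
            context.write(new Text("Read C"), new Text("1"));
        }
        // ... rest of reduce() omitted ...
    }
}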

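Within a single Job, no matter how many keys the reducer receives, this program outputs (Read C, 1) only once, because C belongs to the Reducer object, which is reused for every key that a reduce task handles. (This assumes the Job runs a single reduce task; each additional reduce task has its own Reducer object and would write its own record.)

If the Job is run N times, however, (Read C, 1) is output N times.

Feel free to try it yourself.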
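The behavior can be imitated in plain Java without a cluster. The sketch below is only a simulation under stated assumptions: ReducerSketch, runJob and countMarkers are hypothetical names, no Hadoop classes are involved, and one "job" is modeled as one fresh object that handles all keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation (no Hadoop): all names here are hypothetical.
class FieldLifetimeDemo {

    // Stand-in for TEST_Reduce: the field c is set on the first reduce() call only.
    static class ReducerSketch {
        private double[][] c = null;          // plays the role of the field C
        private final List<String> output;    // plays the role of context.write(...)

        ReducerSketch(List<String> output) {
            this.output = output;
        }

        void reduce(String key) {
            if (c == null) {
                c = new double[4][];          // arbitrary size, for illustration only
                output.add("Read C\t1");
            }
        }
    }

    // One simulated job: a fresh reducer object handles every key.
    static List<String> runJob(List<String> keys) {
        List<String> output = new ArrayList<>();
        ReducerSketch reducer = new ReducerSketch(output);
        for (String key : keys) {
            reducer.reduce(key);
        }
        return output;
    }

    // Runs the simulated job several times and counts the marker records.
    static int countMarkers(int times, List<String> keys) {
        int total = 0;
        for (int i = 0; i < times; i++) {
            total += runJob(keys).size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c");
        System.out.println("1 job:  " + countMarkers(1, keys) + " marker record(s)");
        System.out.println("3 jobs: " + countMarkers(3, keys) + " marker record(s)");
    }
}
```

Because every simulated job builds a new ReducerSketch, the field starts out null again each time, which mirrors what the test above observed on a real cluster.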

Tuesday, March 29, 2011

How to use MapReduce DistributedCache v0.20.2

First, add the following imports:

import org.apache.hadoop.filecache.*;   // for DistributedCache

import java.net.URI;

import org.apache.hadoop.fs.Path;

As mentioned in an earlier post, MapReduce v0.20.2 requires the Configuration and the Job to be set up separately.

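The way DistributedCache is used changes accordingly:

1. In main, first tell the Configuration which file should be treated as a DistributedCache file. Do this before the Job is created from that Configuration:

DistributedCache.addCacheFile(new URI("$(path in HDFS)"), $(configuration object));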

For example:

Configuration conf = new Configuration();

DistributedCache.addCacheFile(new URI("/myDis/test.dat"),conf);

2. In map or reduce, call

DistributedCache.getLocalCacheFiles(context.getConfiguration());

to get the local paths of the cache files as a Path array.

For example:

Path[] localFiles;

localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());

From here on it is ordinary Java file I/O. Here is one way to do it:

FileReader fr = new FileReader(localFiles[0].toString());

BufferedReader br = new BufferedReader(fr);

String fileline = br.readLine(); // Read the first line of the file
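If the whole file is needed rather than just its first line, a small helper can read it and close the reader afterwards. This is a minimal sketch using only the Java standard library; CacheFileReaderSketch and readLines are hypothetical names, and the localPath argument is assumed to be what localFiles[0].toString() returns.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: reads a local file line by line.
class CacheFileReaderSketch {

    // localPath is assumed to be the string form of a local cache file path.
    static List<String> readLines(String localPath) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader br = new BufferedReader(new FileReader(localPath))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: CacheFileReaderSketch <local file>");
            return;
        }
        for (String line : readLines(args[0])) {
            System.out.println(line);
        }
    }
}
```

Inside a mapper or reducer, the call would look like readLines(localFiles[0].toString()).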

If we print the paths in localFiles, we can see that Hadoop quietly places the file under the hadoop.tmp.dir configured in core-site.xml (see the Hadoop setup documentation for details), at:

mapred/local/taskTracker/archive/${ServerName}/${path in HDFS}

Example Code:

http://cmlab.csie.ntu.edu.tw/~wfuny/MapReduce/myDistributedCache.java

P.S. Before running the program, you must first put the file "test.dat" under the directory "/myDis" in HDFS.

Monday, March 21, 2011

New MapReduce API Slide

http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api

How to compile MapReduce file and run it

The following uses hadoop-0.20.2 as an example.
Compile:
javac -classpath /{yourPathToHadoop}/hadoop-0.20.2-core.jar yourfile -d {outputDir}
For example:
javac -classpath /usr/local/hadoop/hadoop-0.20.2-core.jar myWordCount.java -d bin/

Package the files in bin into a jar file:
jar -cvf {OutputJarName} -C bin/ .
For example:
jar -cvf output.jar -C bin/ .

Run it on Hadoop:
hadoop jar {yourJarName} {MainClassName}
For example:
hadoop jar output.jar myWordCount input output
P.S. The trailing input and output are arguments to the myWordCount program.

Sunday, March 20, 2011

MapReduce 0.20.2 passing parameters to Mapper

MapReduce passing parameter

In 0.20.2, MapReduce replaces the old JobConf: you now set everything up in a Configuration first and then pass it into a Job.

For example:

Configuration conf = new Configuration();

Job job = new Job(conf, "wordcount");

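As a result, passing parameters in 0.20.2 also works differently from the earlier JobConf approach.

The new way is:

In the main function, pass the value with conf.set("key","value").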

For example:

Configuration conf = new Configuration();

conf.set("round","1000");

Job job = new Job(conf, "wordcount");

In the Mapper, simply call context.getConfiguration().get("key") to retrieve the corresponding value.

For example:

in map(...) function

{

String round = context.getConfiguration().get("round");

...

}
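The value comes back as a String, and get returns null when the key was never set, so a numeric parameter such as "round" has to be converted before use. Below is a minimal sketch of that step using only the Java standard library; RoundParameterSketch and parseRound are hypothetical names, and falling back to a default on bad input is an assumption about what the job wants. Hadoop's Configuration also offers getInt(name, defaultValue), which may be simpler when it fits.

```java
// Hypothetical helper, not part of Hadoop: converts a raw configuration string to an int.
class RoundParameterSketch {

    // raw is assumed to be the result of context.getConfiguration().get("round"),
    // which is null when the key was never set.
    static int parseRound(String raw, int defaultValue) {
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // Assumption for this sketch: a malformed value falls back to the default
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseRound("1000", 1));   // value set with conf.set("round","1000")
        System.out.println(parseRound(null, 1));     // key missing
    }
}
```

In map(...), this would be used as int round = parseRound(context.getConfiguration().get("round"), 1);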

Reference source code:

http://www.cmlab.csie.ntu.edu.tw/~wfuny/MapReduce/myWordCount.java
