Hadoop Study: How to Use the MapReduce DistributedCache (v0.20.2)

First, add the following imports:

import org.apache.hadoop.filecache.*; // needed for DistributedCache

import java.net.URI;

import org.apache.hadoop.fs.Path;

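As mentioned in an earlier post, MapReduce v0.20.2 requires the Configuration and the Job to be kept separate.

The way DistributedCache is used changes accordingly:

1. In main, first tell the Configuration which file should be treated as a DistributedCache file. Do this before constructing the Job, because the Job works on its own copy of the Configuration.

DistributedCache.addCacheFile(new URI("$(path in HDFS)"),
$(configuration object));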

For example:

Configuration conf = new Configuration();

DistributedCache.addCacheFile(new URI("/myDis/test.dat"),conf);
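One detail the call above hides is that the URI constructor throws the checked URISyntaxException, so main has to either declare it or handle it. The following is only a minimal sketch of one way to handle it; CacheUriSketch and toCacheUri are hypothetical names made up for illustration and are not part of Hadoop.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration; not part of the Hadoop API.
class CacheUriSketch {

    // Builds the URI that would be handed to DistributedCache.addCacheFile,
    // converting the checked URISyntaxException into an unchecked exception
    // that names the offending path.
    static URI toCacheUri(String hdfsPath) {
        try {
            return new URI(hdfsPath);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Bad cache file path: " + hdfsPath, e);
        }
    }
}
```

With such a helper the call would read DistributedCache.addCacheFile(CacheUriSketch.toCacheUri("/myDis/test.dat"), conf); a path containing an unescaped space, for instance, is rejected before the job is ever submitted.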

2. In map or reduce, call

DistributedCache.getLocalCacheFiles(context.getConfiguration());

to obtain a Path array of the cached files.

For example:

Path[] localFiles;

localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());

From here on it is ordinary Java file I/O. Here is one way to do it:

FileReader fr = new FileReader(localFiles[0].toString());

BufferedReader br = new BufferedReader(fr);

String fileline = br.readLine(); //Read first line in the file
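The snippet above reads only the first line and never closes the reader. As a rough sketch of the same plain-Java step, the helper below reads every line and always closes the file. CacheFileLines and readLines are hypothetical names chosen for illustration, not Hadoop APIs; inside a mapper the path argument would be localFiles[0].toString().

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Hadoop API.
class CacheFileLines {

    // Reads every line from the given Reader and closes it afterwards,
    // even if reading fails part way through.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Convenience overload for a local path such as localFiles[0].toString().
    static List<String> readLines(String localPath) throws IOException {
        return readLines(new FileReader(localPath));
    }
}
```

Loading the lines once (for example in the mapper's setup method) and keeping them in a field avoids reopening the cached file for every record.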

If we print the Path of the local file, it is easy to see that Hadoop quietly places the file under the hadoop.tmp.dir configured in core-site.xml (see the Hadoop setup documentation for details), at

mapred/local/taskTracker/archive/${ServerName}/${path in HDFS}
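To make the layout above concrete, here is a small sketch that simply assembles the expected local path from its three parts, following the pattern observed in this post. It is not how Hadoop itself computes the location; LocalCachePathSketch, expectedLocalPath, and the sample values "/tmp/hadoop" and "namenode" are all assumptions made up for illustration.

```java
// Hypothetical helper for illustration; it mirrors the directory pattern
// described in the text and is not part of the Hadoop API.
class LocalCachePathSketch {

    // Joins hadoop.tmp.dir, the fixed archive sub-directory, the server name,
    // and the path in HDFS into the local path described above.
    static String expectedLocalPath(String hadoopTmpDir, String serverName, String hdfsPath) {
        // Drop a trailing slash so the result never contains "//".
        String base = hadoopTmpDir.endsWith("/")
                ? hadoopTmpDir.substring(0, hadoopTmpDir.length() - 1)
                : hadoopTmpDir;
        // Make sure the HDFS path starts with exactly one slash.
        String rel = hdfsPath.startsWith("/") ? hdfsPath : "/" + hdfsPath;
        return base + "/mapred/local/taskTracker/archive/" + serverName + rel;
    }
}
```

Under these assumed values, expectedLocalPath("/tmp/hadoop", "namenode", "/myDis/test.dat") yields /tmp/hadoop/mapred/local/taskTracker/archive/namenode/myDis/test.dat, which can be compared against the Path printed from the task.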

Example Code:

P.S. Before running the program, you must first put the file "test.dat" under the directory "/myDis" in HDFS.
