Sunday, April 24, 2011

How to use Hadoop TwoDArrayWritable datatype

The following example assumes we want to build a 2D array of doubles.

First, we need to create a new class (the name is up to you).
(Without this new class, the Reducer will throw a NoSuchMethodException.)

Suppose our class is named DoubleTwoDArrayWritable.

Create a file DoubleTwoDArrayWritable.java containing:
import org.apache.hadoop.io.*;

public class DoubleTwoDArrayWritable extends TwoDArrayWritable
{
    public DoubleTwoDArrayWritable()
    {
        super(DoubleWritable.class);
    }
}

With this, we have created a new datatype.

Next, configure the job with job.setMapOutputValueClass(DoubleTwoDArrayWritable.class);
(to use it as a key you would also have to implement the comparison methods, which does not appear to be supported at the moment).
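
For reference, a minimal driver sketch might look like the following. WTWDriver, WTWMapper, and WTWReducer are hypothetical names (the Mapper and Reducer sketches appear further below), and the final output key/value classes are my own assumption:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WTWDriver                                              // hypothetical driver class
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "WTW");                             // Hadoop 0.20.x-style constructor
        job.setJarByClass(WTWDriver.class);
        job.setMapperClass(WTWMapper.class);                        // assumed Mapper class name
        job.setReducerClass(WTWReducer.class);                      // assumed Reducer class name
        job.setMapOutputKeyClass(Text.class);                       // the map key is just a Text ("1")
        job.setMapOutputValueClass(DoubleTwoDArrayWritable.class);  // our new datatype
        job.setOutputKeyClass(NullWritable.class);                  // assumed final output types
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}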

In the Mapper, values are assigned through a Writable[][] (here a DoubleWritable[][]):

private Text one = new Text("1");
private DoubleTwoDArrayWritable WTW = new DoubleTwoDArrayWritable();
private DoubleWritable [][] Result = null;
if (Result == null)
{
Result = new DoubleWritable[r_int][];
for (i=0;i < r_int ;i++)
{
Result[i] = new DoubleWritable[r_int];
for (j=0;j
<r_int;j++)
Result[i][j] = new DoubleWritable();
}

for ( i=0; i
<r_int ; i++ )
for ( j=i ; j
<r_int ; j++ )
{
tmpR = W[i]*W[j];
Result[i][j].set(tmpR);
Result[j][i].set(tmpR);
}
WTW.set(Result);
context.write(one,WTW);
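
To show where the snippet above would live, here is a sketch of a complete Mapper under some assumptions of mine: the class name WTWMapper, and that each input line holds one vector W of r_int doubles:

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical Mapper: each input line is assumed to hold one vector W of r_int doubles,
// and map() emits the r_int x r_int matrix W*W^T under the key "1".
public class WTWMapper extends Mapper<LongWritable, Text, Text, DoubleTwoDArrayWritable>
{
    private final Text one = new Text("1");
    private final DoubleTwoDArrayWritable WTW = new DoubleTwoDArrayWritable();
    private DoubleWritable[][] Result = null;

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException
    {
        String[] tokens = value.toString().trim().split("\\s+");   // assumed input format
        int r_int = tokens.length;
        double[] W = new double[r_int];
        for (int i = 0; i < r_int; i++)
            W[i] = Double.parseDouble(tokens[i]);

        if (Result == null)                                         // allocate the DoubleWritable matrix once
        {
            Result = new DoubleWritable[r_int][r_int];
            for (int i = 0; i < r_int; i++)
                for (int j = 0; j < r_int; j++)
                    Result[i][j] = new DoubleWritable();
        }

        for (int i = 0; i < r_int; i++)                             // fill the symmetric outer product
            for (int j = i; j < r_int; j++)
            {
                double tmpR = W[i] * W[j];
                Result[i][j].set(tmpR);
                Result[j][i].set(tmpR);
            }

        WTW.set(Result);
        context.write(one, WTW);
    }
}

Note that one, WTW, and Result are fields reused across map() calls, so the objects are not reallocated for every record.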

In the Reducer, a Writable[][] is also needed to receive the values:

private Writable[][] getArray = null;

// inside reduce(): sum all of the received matrices into C (a double[r_int][r_int])
for (DoubleTwoDArrayWritable value : values)
{
    getArray = value.get();
    for (int i = 0; i < r_int; i++)
        for (int j = 0; j < r_int; j++)
            C[i][j] = C[i][j] + ((DoubleWritable) getArray[i][j]).get();
}
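
And a matching Reducer sketch; the class name WTWReducer and the idea of writing the summed matrix C out as text at the end are my own assumptions:

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical Reducer: sums every matrix received under the key "1" into C,
// then writes C out row by row as plain text.
public class WTWReducer extends Reducer<Text, DoubleTwoDArrayWritable, NullWritable, Text>
{
    @Override
    protected void reduce(Text key, Iterable<DoubleTwoDArrayWritable> values, Context context)
            throws IOException, InterruptedException
    {
        double[][] C = null;
        for (DoubleTwoDArrayWritable value : values)
        {
            Writable[][] getArray = value.get();
            if (C == null)                                          // size C from the first matrix we see
                C = new double[getArray.length][getArray.length];
            for (int i = 0; i < getArray.length; i++)
                for (int j = 0; j < getArray[i].length; j++)
                    C[i][j] += ((DoubleWritable) getArray[i][j]).get();
        }
        if (C == null)
            return;
        for (int i = 0; i < C.length; i++)                          // one text line per matrix row
        {
            StringBuilder row = new StringBuilder();
            for (int j = 0; j < C[i].length; j++)
            {
                if (j > 0)
                    row.append(" ");
                row.append(C[i][j]);
            }
            context.write(NullWritable.get(), new Text(row.toString()));
        }
    }
}

Writing the final result as text here is only one option; a SequenceFileOutputFormat could keep the output binary end to end.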

2 comments:

  1. Could you please update the post with an example. What are your tmpR and W?

  2. Hi, thanks for your interest in this article. Basically, tmpR is just a temporary double value, and W is a vector of r_int doubles.
    You can use any values you want.
    This article just shows a way to pass a 2D double matrix from the Mapper to the Reducer without converting it to a string. We want to do so because converting to a string can introduce precision errors and needs more network bandwidth.
