Sunday, May 8, 2011

Converting a Text File to a Binary Sequence File

1. In the Main class, set the output key/value classes and the input/output formats (a complete driver sketch follows below):
job.setOutputKeyClass(BytesWritable.class);
job.setOutputValueClass(BytesWritable.class);

job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(SequenceFileAsBinaryOutputFormat.class);

job.setNumReduceTasks(0); // map-only job: no Reduce class is used
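
For context, here is a minimal, self-contained driver that wires these settings together. This is a sketch of mine, not from the original post: the class name TextToSeq, the job name, and the command-line paths are all placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileAsBinaryOutputFormat;

public class TextToSeq // placeholder driver class name
{
    public static void main(String[] args) throws Exception
    {
        Job job = new Job(new Configuration(), "text to binary sequence file");
        job.setJarByClass(TextToSeq.class);
        job.setMapperClass(Map.class); // the Map class from step 2

        job.setOutputKeyClass(BytesWritable.class);
        job.setOutputValueClass(BytesWritable.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(SequenceFileAsBinaryOutputFormat.class);

        job.setNumReduceTasks(0); // map-only job

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}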

2. In the Map class (note: use the byte-array length when setting the value, not the string's character count, or multi-byte characters get truncated):
import java.io.IOException;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map extends
Mapper<LongWritable, Text, BytesWritable, BytesWritable>
{
    // every record is written under the same one-byte key "1"
    private final BytesWritable one = new BytesWritable("1".getBytes());
    private final BytesWritable val = new BytesWritable();

    @Override
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException
    {
        // copy the line's bytes into the reusable BytesWritable
        byte[] bytes = value.toString().getBytes();
        val.set(bytes, 0, bytes.length);
        context.write(one, val);
    }
}
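
One way to sanity-check the result (my own addition, not from the original post) is to read the generated file back with SequenceFile.Reader; both key and value should come back as BytesWritable. The helper class name SeqDump and the path argument are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;

public class SeqDump // hypothetical helper, not part of the job
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Reader reader =
            new SequenceFile.Reader(fs, new Path(args[0]), conf); // e.g. a part-m-00000 output file
        try
        {
            BytesWritable key = new BytesWritable();
            BytesWritable val = new BytesWritable();
            while (reader.next(key, val))
            {
                // getBytes() may return a padded buffer; use only the first getLength() bytes
                System.out.println(new String(val.getBytes(), 0, val.getLength()));
            }
        }
        finally
        {
            reader.close();
        }
    }
}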

Using SequenceFileAsBinaryInputFormat & OutputFormat v0.21.0

1. First, make sure your input files are binary sequence files.
For more detail, please see
http://www.hadoop.tw/2008/12/hadoop-uncompressed-sequencefi.html
If you don't know how to convert a text file to a binary sequence file, please see
http://kuanyuhadoop.blogspot.com/2011/05/coverting-text-file-to-binary-sequence.html

2. In the Main class, you have to set the following (a complete driver sketch follows below):
job.setOutputKeyClass(BytesWritable.class);
job.setOutputValueClass(BytesWritable.class);

job.setInputFormatClass(SequenceFileAsBinaryInputFormat.class);
job.setOutputFormatClass(SequenceFileAsBinaryOutputFormat.class);
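
Putting it together, a driver for such a binary-in, binary-out job might look like the sketch below. Again, the class name BinaryJob, the job name, and the paths are placeholders of mine, not from the original post; the Map and Reduce classes are the ones sketched in step 3 and after it.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileAsBinaryInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileAsBinaryOutputFormat;

public class BinaryJob // placeholder driver class name
{
    public static void main(String[] args) throws Exception
    {
        Job job = new Job(new Configuration(), "binary in, binary out");
        job.setJarByClass(BinaryJob.class);
        job.setMapperClass(Map.class);     // the Map class from step 3
        job.setReducerClass(Reduce.class); // see the Reduce sketch below

        job.setOutputKeyClass(BytesWritable.class);
        job.setOutputValueClass(BytesWritable.class);

        job.setInputFormatClass(SequenceFileAsBinaryInputFormat.class);
        job.setOutputFormatClass(SequenceFileAsBinaryOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}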

3. In the Map class (the type parameters are required so that map() actually overrides the inherited method):
public class Map extends
Mapper<BytesWritable, BytesWritable, BytesWritable, BytesWritable>
{
    @Override
    public void map(BytesWritable key, BytesWritable value, Context context)
        throws IOException, InterruptedException
    {
        ...
    }
}

The Reduce class follows the same pattern as the Map class: keys and values are all BytesWritable (a sketch follows below).
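
For completeness, here is a minimal identity-style Reduce class with the matching BytesWritable types. This is my sketch of the pattern the post describes, not code from the post; note that reduce() receives an Iterable of values rather than a single value.

import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce extends
Reducer<BytesWritable, BytesWritable, BytesWritable, BytesWritable>
{
    @Override
    public void reduce(BytesWritable key, Iterable<BytesWritable> values, Context context)
        throws IOException, InterruptedException
    {
        // write every value back out unchanged
        for (BytesWritable value : values)
        {
            context.write(key, value);
        }
    }
}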