Sunday, May 8, 2011

Converting a Text File to a Binary Sequence File

1. In the Main class, set the output key/value classes and the input/output formats (a complete driver sketch follows below):
job.setOutputKeyClass(BytesWritable.class);
job.setOutputValueClass(BytesWritable.class);

job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(SequenceFileAsBinaryOutputFormat.class);

job.setNumReduceTasks(0); // map-only job: no Reduce class is used
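
For context, here is a minimal, self-contained driver that wires these settings together. This is a sketch of mine, not from the original post: the class name TextToSeq, the job name, and the command-line paths are all placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileAsBinaryOutputFormat;

public class TextToSeq // placeholder driver class name
{
    public static void main(String[] args) throws Exception
    {
        Job job = new Job(new Configuration(), "text to binary sequence file");
        job.setJarByClass(TextToSeq.class);
        job.setMapperClass(Map.class); // the Map class from step 2

        job.setOutputKeyClass(BytesWritable.class);
        job.setOutputValueClass(BytesWritable.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(SequenceFileAsBinaryOutputFormat.class);

        job.setNumReduceTasks(0); // map-only job

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}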

2. In the Map class (note: use the byte-array length when setting the value, not the string's character count, or multi-byte characters get truncated):
import java.io.IOException;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map extends
Mapper<LongWritable, Text, BytesWritable, BytesWritable>
{
    // every record is written under the same one-byte key "1"
    private final BytesWritable one = new BytesWritable("1".getBytes());
    private final BytesWritable val = new BytesWritable();

    @Override
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException
    {
        // copy the line's bytes into the reusable BytesWritable
        byte[] bytes = value.toString().getBytes();
        val.set(bytes, 0, bytes.length);
        context.write(one, val);
    }
}
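
One way to sanity-check the result (my own addition, not from the original post) is to read the generated file back with SequenceFile.Reader; both key and value should come back as BytesWritable. The helper class name SeqDump and the path argument are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;

public class SeqDump // hypothetical helper, not part of the job
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Reader reader =
            new SequenceFile.Reader(fs, new Path(args[0]), conf); // e.g. a part-m-00000 output file
        try
        {
            BytesWritable key = new BytesWritable();
            BytesWritable val = new BytesWritable();
            while (reader.next(key, val))
            {
                // getBytes() may return a padded buffer; use only the first getLength() bytes
                System.out.println(new String(val.getBytes(), 0, val.getLength()));
            }
        }
        finally
        {
            reader.close();
        }
    }
}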

Using SequenceFileAsBinaryInputFormat & OutputFormat v0.21.0

1. First, make sure your input files are binary sequence files.
For more detail, please see
http://www.hadoop.tw/2008/12/hadoop-uncompressed-sequencefi.html
If you don't know how to convert a text file to a binary sequence file, please see
http://kuanyuhadoop.blogspot.com/2011/05/coverting-text-file-to-binary-sequence.html

2. In the Main class, you have to set the following (a complete driver sketch follows below):
job.setOutputKeyClass(BytesWritable.class);
job.setOutputValueClass(BytesWritable.class);

job.setInputFormatClass(SequenceFileAsBinaryInputFormat.class);
job.setOutputFormatClass(SequenceFileAsBinaryOutputFormat.class);
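
Putting it together, a driver for such a binary-in, binary-out job might look like the sketch below. Again, the class name BinaryJob, the job name, and the paths are placeholders of mine, not from the original post; the Map and Reduce classes are the ones sketched in step 3 and after it.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileAsBinaryInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileAsBinaryOutputFormat;

public class BinaryJob // placeholder driver class name
{
    public static void main(String[] args) throws Exception
    {
        Job job = new Job(new Configuration(), "binary in, binary out");
        job.setJarByClass(BinaryJob.class);
        job.setMapperClass(Map.class);     // the Map class from step 3
        job.setReducerClass(Reduce.class); // see the Reduce sketch below

        job.setOutputKeyClass(BytesWritable.class);
        job.setOutputValueClass(BytesWritable.class);

        job.setInputFormatClass(SequenceFileAsBinaryInputFormat.class);
        job.setOutputFormatClass(SequenceFileAsBinaryOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}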

3. In the Map class (the type parameters are required so that map() actually overrides the inherited method):
public class Map extends
Mapper<BytesWritable, BytesWritable, BytesWritable, BytesWritable>
{
    @Override
    public void map(BytesWritable key, BytesWritable value, Context context)
        throws IOException, InterruptedException
    {
        ...
    }
}

The Reduce class follows the same pattern as the Map class: keys and values are all BytesWritable (a sketch follows below).
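
For completeness, here is a minimal identity-style Reduce class with the matching BytesWritable types. This is my sketch of the pattern the post describes, not code from the post; note that reduce() receives an Iterable of values rather than a single value.

import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce extends
Reducer<BytesWritable, BytesWritable, BytesWritable, BytesWritable>
{
    @Override
    public void reduce(BytesWritable key, Iterable<BytesWritable> values, Context context)
        throws IOException, InterruptedException
    {
        // write every value back out unchanged
        for (BytesWritable value : values)
        {
            context.write(key, value);
        }
    }
}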