com.datasalt.pangool.tuplemr.mapred.lib.output
Class TupleTextOutputFormat

java.lang.Object
  extended by org.apache.hadoop.mapreduce.OutputFormat<K,V>
      extended by org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<ITuple,org.apache.hadoop.io.NullWritable>
          extended by com.datasalt.pangool.tuplemr.mapred.lib.output.TupleTextOutputFormat
All Implemented Interfaces:
Serializable

public class TupleTextOutputFormat
extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<ITuple,org.apache.hadoop.io.NullWritable>
implements Serializable

A special output format that converts an ITuple into text. It supports CSV-like semantics: a separator character, a quote character and an escape character. It uses OpenCSV underneath (http://opencsv.sourceforge.net/).

See Also:
Serialized Form

Nested Class Summary
static class TupleTextOutputFormat.CustomCSVWriter
          An almost complete re-implementation of CSVWriter, needed to support null strings properly.
static class TupleTextOutputFormat.TupleTextRecordWriter
           
 
Field Summary
static char NO_ESCAPE_CHARACTER
           
static char NO_QUOTE_CHARACTER
           
 
Constructor Summary
TupleTextOutputFormat(Schema schema, boolean addHeader, char separatorCharacter, char quoteCharacter, char escapeCharacter)
           
TupleTextOutputFormat(Schema schema, boolean addHeader, char separatorCharacter, char quoteCharacter, char escapeCharacter, String nullString)
          You must specify the Schema that will be used for Tuples being written and the CSV semantics (if any).
 
Method Summary
 org.apache.hadoop.mapreduce.RecordWriter<ITuple,org.apache.hadoop.io.NullWritable> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context)
           
 
Methods inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
checkOutputSpecs, getCompressOutput, getDefaultWorkFile, getOutputCommitter, getOutputCompressorClass, getOutputPath, getPathForWorkFile, getUniqueFile, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputPath
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NO_QUOTE_CHARACTER

public static final char NO_QUOTE_CHARACTER
See Also:
Constant Field Values

NO_ESCAPE_CHARACTER

public static final char NO_ESCAPE_CHARACTER
See Also:
Constant Field Values
Constructor Detail

TupleTextOutputFormat

public TupleTextOutputFormat(Schema schema,
                             boolean addHeader,
                             char separatorCharacter,
                             char quoteCharacter,
                             char escapeCharacter)

TupleTextOutputFormat

public TupleTextOutputFormat(Schema schema,
                             boolean addHeader,
                             char separatorCharacter,
                             char quoteCharacter,
                             char escapeCharacter,
                             String nullString)
You must specify the Schema of the Tuples being written and the CSV semantics (if any). Use NO_ESCAPE_CHARACTER and NO_QUOTE_CHARACTER if you don't want to add CSV semantics to the output. If addHeader is true, the names of the Fields in the Schema are used to add a header to the file.

Use nullString to specify the string that is written in place of null values.
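
The effect of the separator, quote, escape and null-string parameters can be illustrated with a small standalone sketch. It is not Pangool's or OpenCSV's implementation: the class CsvLineSketch, its formatLine method and the NONE sentinel are hypothetical names, and the choice to write the null string unquoted is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Pangool or OpenCSV.
class CsvLineSketch {
    // Assumption: a "no character" sentinel, following the '\u0000' convention
    // that OpenCSV uses for its own NO_QUOTE_CHARACTER / NO_ESCAPE_CHARACTER.
    static final char NONE = '\u0000';

    // Formats one row as a single line of text using CSV-like semantics.
    static String formatLine(List<String> fields, char separator, char quote,
                             char escape, String nullString) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(separator);
            }
            String field = fields.get(i);
            if (field == null) {
                // Assumption: nulls are replaced by nullString and left unquoted.
                sb.append(nullString);
                continue;
            }
            if (quote != NONE) {
                sb.append(quote);
            }
            for (char c : field.toCharArray()) {
                // Prefix embedded quote and escape characters when escaping is enabled.
                if (escape != NONE && (c == quote || c == escape)) {
                    sb.append(escape);
                }
                sb.append(c);
            }
            if (quote != NONE) {
                sb.append(quote);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("1", null, "say \"hi\"");
        // Full CSV semantics: comma separator, double-quote quoting, backslash escaping.
        System.out.println(formatLine(row, ',', '"', '\\', "NULL"));
        // No CSV semantics: tab separator, no quoting, no escaping, empty null string.
        System.out.println(formatLine(row, '\t', NONE, NONE, ""));
    }
}
```

In this sketch, passing NONE for both the quote and escape characters plays the role that NO_QUOTE_CHARACTER and NO_ESCAPE_CHARACTER play in the constructor: fields are joined by the separator and otherwise written verbatim.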

Method Detail

getRecordWriter

public org.apache.hadoop.mapreduce.RecordWriter<ITuple,org.apache.hadoop.io.NullWritable> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                                                                                   throws IOException,
                                                                                                          InterruptedException
Specified by:
getRecordWriter in class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<ITuple,org.apache.hadoop.io.NullWritable>
Throws:
IOException
InterruptedException


Copyright © –2014 Datasalt Systems S.L. All rights reserved.