com.datasalt.pangool.tuplemr.avro
Class AvroOutputFormat<T>

java.lang.Object
  extended by org.apache.hadoop.mapreduce.OutputFormat<K,V>
      extended by org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<org.apache.avro.mapred.AvroWrapper<T>,org.apache.hadoop.io.NullWritable>
          extended by com.datasalt.pangool.tuplemr.avro.AvroOutputFormat<T>
All Implemented Interfaces:
Serializable

public class AvroOutputFormat<T>
extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<org.apache.avro.mapred.AvroWrapper<T>,org.apache.hadoop.io.NullWritable>
implements Serializable

This is the Pangool's version of AvroOutputFormat. It implements the new Hadoop's api in package org.apache.hadoop.mapreduce.lib.output Any AvroOutputFormat instance is stateful and is not configured via Configuration. Instead, it uses Java-serialization to store its state in a Distributed Cache file.

See Also:
Serialized Form

Field Summary
static int DEFAULT_DEFLATE_LEVEL
          The default deflate level.
static String DEFLATE_LEVEL_KEY
          The configuration key for Avro deflate level.
static String EXT
          The file name extension for avro data files.
static String SYNC_INTERVAL_KEY
          The configuration key for Avro sync interval.
 
Constructor Summary
AvroOutputFormat(org.apache.avro.Schema schema)
           
AvroOutputFormat(org.apache.avro.Schema schema, String codecName)
           
AvroOutputFormat(org.apache.avro.Schema schema, String codecName, int deflateLevel)
           
 
Method Summary
 org.apache.hadoop.mapreduce.RecordWriter<org.apache.avro.mapred.AvroWrapper<T>,org.apache.hadoop.io.NullWritable> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext job)
           
 org.apache.avro.Schema getSchema()
           
static void setDeflateLevel(org.apache.hadoop.mapreduce.Job job, int level)
          Enable output compression using the deflate codec and specify its level.
static void setSyncInterval(org.apache.hadoop.mapreduce.Job job, int syncIntervalInBytes)
          Set the sync interval to be used by the underlying DataFileWriter.
 
Methods inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
checkOutputSpecs, getCompressOutput, getDefaultWorkFile, getOutputCommitter, getOutputCompressorClass, getOutputPath, getPathForWorkFile, getUniqueFile, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputPath
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

EXT

public static final String EXT
The file name extension for avro data files.

See Also:
Constant Field Values

DEFLATE_LEVEL_KEY

public static final String DEFLATE_LEVEL_KEY
The configuration key for Avro deflate level.

See Also:
Constant Field Values

SYNC_INTERVAL_KEY

public static final String SYNC_INTERVAL_KEY
The configuration key for Avro sync interval.

See Also:
Constant Field Values

DEFAULT_DEFLATE_LEVEL

public static final int DEFAULT_DEFLATE_LEVEL
The default deflate level.

See Also:
Constant Field Values
Constructor Detail

AvroOutputFormat

public AvroOutputFormat(org.apache.avro.Schema schema)

AvroOutputFormat

public AvroOutputFormat(org.apache.avro.Schema schema,
                        String codecName)

AvroOutputFormat

public AvroOutputFormat(org.apache.avro.Schema schema,
                        String codecName,
                        int deflateLevel)
Method Detail

setDeflateLevel

public static void setDeflateLevel(org.apache.hadoop.mapreduce.Job job,
                                   int level)
Enable output compression using the deflate codec and specify its level.


setSyncInterval

public static void setSyncInterval(org.apache.hadoop.mapreduce.Job job,
                                   int syncIntervalInBytes)
Set the sync interval to be used by the underlying DataFileWriter.


getSchema

public org.apache.avro.Schema getSchema()

getRecordWriter

public org.apache.hadoop.mapreduce.RecordWriter<org.apache.avro.mapred.AvroWrapper<T>,org.apache.hadoop.io.NullWritable> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext job)
                                                                                                                                  throws IOException,
                                                                                                                                         InterruptedException
Specified by:
getRecordWriter in class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<org.apache.avro.mapred.AvroWrapper<T>,org.apache.hadoop.io.NullWritable>
Throws:
IOException
InterruptedException


Copyright © –2014 Datasalt Systems S.L.. All rights reserved.