com.datasalt.pangool.tuplemr.mapred.lib.output
Class PangoolMultipleOutputs<KEYOUT,VALUEOUT>

java.lang.Object
  extended by com.datasalt.pangool.tuplemr.mapred.lib.output.PangoolMultipleOutputs<KEYOUT,VALUEOUT>

public class PangoolMultipleOutputs<KEYOUT,VALUEOUT>
extends Object

This class is inspired by the MultipleOutputs class of Hadoop. The difference is that it allows writing with an arbitrary OutputFormat into sub-folders of the output path.
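A minimal usage sketch for the driver side, assuming Hadoop's standard TextOutputFormat and the made-up output names "sales" and "errors"; whether a given OutputFormat instance can be serialized into Pangool's instance files depends on the format, so treat this as illustrative rather than canonical:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
  import com.datasalt.pangool.tuplemr.mapred.lib.output.PangoolMultipleOutputs;

  public class MultipleOutputsDriver {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = new Job(conf, "multiple-outputs-example");
      // ... configure mapper/reducer, input and output paths as usual ...

      // Declare two named outputs; each one carries its own OutputFormat
      // instance and key/value classes ("sales" and "errors" are example names).
      PangoolMultipleOutputs.addNamedOutput(job, "sales",
          new TextOutputFormat<Text, Text>(), Text.class, Text.class);
      PangoolMultipleOutputs.addNamedOutput(job, "errors",
          new TextOutputFormat<Text, Text>(), Text.class, Text.class);

      job.waitForCompletion(true);
    }
  }

Records are then written to the named outputs from the Mapper or Reducer through an instance of this class; see the write methods below.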


Nested Class Summary
static class PangoolMultipleOutputs.InvalidNamedOutputException
          Exception that is thrown when someone tries to access an invalid named output.
 
Constructor Summary
PangoolMultipleOutputs(org.apache.hadoop.mapreduce.TaskInputOutputContext<?,?,KEYOUT,VALUEOUT> context)
          Creates and initializes multiple outputs support; it should be instantiated in the Mapper/Reducer setup method.
 
Method Summary
static String addNamedOutput(org.apache.hadoop.mapreduce.Job job, String namedOutput, org.apache.hadoop.mapreduce.OutputFormat outputFormat, Class<?> keyClass, Class<?> valueClass)
          Adds a named output for the job.
static void addNamedOutputContext(org.apache.hadoop.mapreduce.Job job, String namedOutput, String key, String value)
          Allows specific (key, value) configuration entries to be set for each named output.
 void close()
          Closes all the opened outputs.
static boolean getCountersEnabled(org.apache.hadoop.mapreduce.JobContext job)
          Returns whether the counters for the named outputs are enabled.
 org.apache.hadoop.mapreduce.RecordWriter getRecordWriter(String baseFileName)
           
static void setCountersEnabled(org.apache.hadoop.mapreduce.Job job, boolean enabled)
          Enables or disables counters for the named outputs.
static String setDefaultNamedOutput(org.apache.hadoop.mapreduce.Job job, org.apache.hadoop.mapreduce.OutputFormat outputFormat, Class<?> keyClass, Class<?> valueClass)
          Adds the specs of the default named output for the job (used for any named output which is not explicitly defined).
static void setSpecificNamedOutputContext(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.mapreduce.Job job, String namedOutput)
          Iterates over the Configuration and sets the specific context found for the namedOutput in the Job instance.
static void validateOutputName(String namedOutput)
          Convenience method for validating output names externally. Throws InvalidArgumentException if the given name is not a valid output name according to this implementation.
 <K,V> void write(String namedOutput, K key, V value)
          Write key and value to the namedOutput.
 <K,V> void write(String namedOutput, K key, V value, String baseOutputPath)
          Write key and value to baseOutputPath using the namedOutput.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PangoolMultipleOutputs

public PangoolMultipleOutputs(org.apache.hadoop.mapreduce.TaskInputOutputContext<?,?,KEYOUT,VALUEOUT> context)
Creates and initializes multiple outputs support; it should be instantiated in the Mapper/Reducer setup method.

Parameters:
context - the TaskInputOutputContext object
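
For example, a sketch that keeps the instance as a field of a Reducer (key/value types are illustrative):

  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Reducer;
  import com.datasalt.pangool.tuplemr.mapred.lib.output.PangoolMultipleOutputs;

  public class MyReducer extends Reducer<Text, Text, Text, Text> {

    private PangoolMultipleOutputs<Text, Text> multipleOutputs;

    @Override
    protected void setup(Context context) {
      // One instance per task, created in setup() as recommended above.
      multipleOutputs = new PangoolMultipleOutputs<Text, Text>(context);
    }

    // reduce() and cleanup() are sketched under write() and close() below.
  }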
Method Detail

validateOutputName

public static void validateOutputName(String namedOutput)
Convenience method for validating output names externally. Throws InvalidArgumentException if the given name is not a valid output name according to this implementation.

Parameters:
namedOutput - the name to validate (e.g. "part" not allowed)
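
For instance, a quick guard before registering a user-supplied name (the name shown is illustrative):

  // Throws if "sales" is not a valid output name for this implementation
  // (reserved names such as "part" are rejected).
  PangoolMultipleOutputs.validateOutputName("sales");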

addNamedOutput

public static String addNamedOutput(org.apache.hadoop.mapreduce.Job job,
                                    String namedOutput,
                                    org.apache.hadoop.mapreduce.OutputFormat outputFormat,
                                    Class<?> keyClass,
                                    Class<?> valueClass)
                             throws FileNotFoundException,
                                    IOException,
                                    URISyntaxException
Adds a named output for the job. Returns the instance file that has been created.

Throws:
FileNotFoundException
IOException
URISyntaxException
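
For example, continuing the driver sketch above ("logs" and the format instance are placeholders):

  String instanceFile = PangoolMultipleOutputs.addNamedOutput(job, "logs",
      new TextOutputFormat<Text, Text>(), Text.class, Text.class);
  // The output spec is serialized to an instance file; the returned path
  // is mainly useful for debugging.
  System.out.println("Named output spec written to " + instanceFile);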

setDefaultNamedOutput

public static String setDefaultNamedOutput(org.apache.hadoop.mapreduce.Job job,
                                           org.apache.hadoop.mapreduce.OutputFormat outputFormat,
                                           Class<?> keyClass,
                                           Class<?> valueClass)
                                    throws FileNotFoundException,
                                           IOException,
                                           URISyntaxException
Adds the specs of the default named output for the job (used for any named output which is not explicitly defined). Returns the instance file that has been created.

Throws:
FileNotFoundException
IOException
URISyntaxException
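
A sketch of providing a catch-all spec so that output names not registered explicitly still have a format and key/value classes (format and classes are illustrative):

  // Used for every named output that was not added with addNamedOutput().
  PangoolMultipleOutputs.setDefaultNamedOutput(job,
      new TextOutputFormat<Text, Text>(), Text.class, Text.class);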

addNamedOutputContext

public static void addNamedOutputContext(org.apache.hadoop.mapreduce.Job job,
                                         String namedOutput,
                                         String key,
                                         String value)
Allows specific (key, value) configuration entries to be set for each named output. Some OutputFormats read specific configuration values and act based on them.

Parameters:
namedOutput -
key -
value -
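
For example, a sketch that scopes a TextOutputFormat setting to one named output (the property key shown is the Hadoop 2 separator key and is only illustrative; use whatever key your OutputFormat reads):

  // Only the OutputFormat behind the "sales" named output sees this setting.
  PangoolMultipleOutputs.addNamedOutputContext(job, "sales",
      "mapreduce.output.textoutputformat.separator", ";");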

setSpecificNamedOutputContext

public static void setSpecificNamedOutputContext(org.apache.hadoop.conf.Configuration conf,
                                                 org.apache.hadoop.mapreduce.Job job,
                                                 String namedOutput)
Iterates over the Configuration and sets the specific context found for the namedOutput in the Job instance. Package-access so it can be unit tested. The specific context is configured with addNamedOutputContext(Job, String, String, String).

Parameters:
conf - The configuration that may contain specific context for the named output
job - The Job where we will set the specific context
namedOutput - The named output

setCountersEnabled

public static void setCountersEnabled(org.apache.hadoop.mapreduce.Job job,
                                      boolean enabled)
Enables or disables counters for the named outputs. The counters group is the PangoolMultipleOutputs class name. The names of the counters are the same as the named outputs. These counters count the number of records written to each output name. By default these counters are disabled.

Parameters:
job - job to enable counters
enabled - indicates if the counters will be enabled or not.
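
For example, in the driver:

  // One counter per named output, grouped under the PangoolMultipleOutputs
  // class name, counting the records written to that output.
  PangoolMultipleOutputs.setCountersEnabled(job, true);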

getCountersEnabled

public static boolean getCountersEnabled(org.apache.hadoop.mapreduce.JobContext job)
Returns whether the counters for the named outputs are enabled. By default these counters are disabled.

Parameters:
job - the job
Returns:
TRUE if the counters are enabled, FALSE if they are disabled.

write

public <K,V> void write(String namedOutput,
                        K key,
                        V value)
           throws IOException,
                  InterruptedException
Write key and value to the namedOutput. The output path is a unique file generated for the namedOutput, for example {namedOutput}-(m|r)-{part-number}.

Parameters:
namedOutput - the named output name
key - the key
value - the value
Throws:
IOException
InterruptedException
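
A sketch of routing records inside reduce(), continuing the Reducer from the constructor example (the "errors" output and the "ERR" prefix are illustrative):

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text value : values) {
      if (value.toString().startsWith("ERR")) {
        // Lands in files named errors-r-{part-number} for this named output.
        multipleOutputs.write("errors", key, value);
      } else {
        context.write(key, value); // regular job output
      }
    }
  }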

write

public <K,V> void write(String namedOutput,
                        K key,
                        V value,
                        String baseOutputPath)
           throws IOException,
                  InterruptedException
Write key and value to baseOutputPath using the namedOutput.

Parameters:
namedOutput - the named output name
key - the key
value - the value
baseOutputPath - base output path to write the record to. Note: the framework will generate a unique filename for the baseOutputPath.
Throws:
IOException
InterruptedException
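
For example, a sketch that spreads records over per-month sub-folders of the output path (the month() helper is hypothetical):

  // With baseOutputPath "2014/03/part" the record ends up in something like
  // ${job-output}/2014/03/part-r-{part-number}.
  multipleOutputs.write("sales", key, value, month(value) + "/part");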

getRecordWriter

public org.apache.hadoop.mapreduce.RecordWriter getRecordWriter(String baseFileName)
                                                         throws IOException,
                                                                InterruptedException
Throws:
IOException
InterruptedException

close

public void close()
           throws IOException,
                  InterruptedException
Closes all the opened outputs. This should be called from the cleanup method of the map/reduce task. If overridden, subclasses must invoke super.close() at the end of their close().

Throws:
IOException
InterruptedException
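
For example, in the task's cleanup() method, continuing the Reducer sketch above:

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    // Flushes and closes every RecordWriter opened through this instance.
    multipleOutputs.close();
  }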


Copyright © 2014 Datasalt Systems S.L. All rights reserved.