com.datasalt.pangool.tuplemr.mapred.lib.input
Class HCatTupleInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<ITuple,org.apache.hadoop.io.NullWritable>
      extended by com.datasalt.pangool.tuplemr.mapred.lib.input.HCatTupleInputFormat
All Implemented Interfaces:
Serializable

public class HCatTupleInputFormat
extends org.apache.hadoop.mapreduce.InputFormat<ITuple,org.apache.hadoop.io.NullWritable>
implements Serializable

A bridge between HCatalog and Pangool that makes any table readable through HCatInputFormat usable from a Pangool job. It delegates to HCatInputFormat and converts each HCatRecord it reads into a Pangool Tuple.

For the type mapping between HCatalog and Pangool types, see: http://incubator.apache.org/hcatalog/docs/r0.4.0/inputoutput.html

See Also:
Serialized Form
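A minimal usage sketch, assuming a Hadoop installation with an HCatalog metastore on the classpath. The database name "default", the table name "my_table", and the package of the Pangool Schema class are illustrative and should be checked against the Pangool and HCatalog versions in use:

```java
import org.apache.hadoop.conf.Configuration;

import com.datasalt.pangool.io.Schema;
import com.datasalt.pangool.tuplemr.mapred.lib.input.HCatTupleInputFormat;

public class HCatTupleExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // The constructor consults HCatalog for the table's schema,
    // so it declares IOException. "default" and "my_table" are
    // placeholder database/table names.
    HCatTupleInputFormat inputFormat =
        new HCatTupleInputFormat("default", "my_table", conf);

    // The Pangool Schema is derived from the table's HCatSchema;
    // it can be used when declaring the schemas of a Pangool job.
    Schema pangoolSchema = inputFormat.getPangoolSchema();
    System.out.println(pangoolSchema);

    // getSplits() and createRecordReader() are normally invoked by
    // the MapReduce framework rather than by user code; each record
    // then reaches the mapper as an ITuple keyed against NullWritable.
  }
}
```

Note that because the constructor contacts the metastore, the input format is typically built at job-setup time, on the client side, with the job's Configuration.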

Constructor Summary
HCatTupleInputFormat(String dbName, String tableName, org.apache.hadoop.conf.Configuration conf)
           Creates an input format that reads the given table from the given HCatalog database.
 
Method Summary
 org.apache.hadoop.mapreduce.RecordReader<ITuple,org.apache.hadoop.io.NullWritable> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext taskContext)
           Creates a RecordReader that returns each HCatRecord as a Pangool ITuple.
 Schema getPangoolSchema()
           Returns the Pangool Schema derived from the table's HCatalog schema.
 org.apache.hcatalog.data.schema.HCatSchema getSchema()
           Returns the HCatalog schema of the table.
 List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext jobcontext)
           Delegates split computation to the underlying HCatInputFormat.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HCatTupleInputFormat

public HCatTupleInputFormat(String dbName,
                            String tableName,
                            org.apache.hadoop.conf.Configuration conf)
                     throws IOException
Throws:
IOException
Method Detail

getSchema

public org.apache.hcatalog.data.schema.HCatSchema getSchema()

getPangoolSchema

public Schema getPangoolSchema()

createRecordReader

public org.apache.hadoop.mapreduce.RecordReader<ITuple,org.apache.hadoop.io.NullWritable> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
                                                                                                             org.apache.hadoop.mapreduce.TaskAttemptContext taskContext)
                                                                                                      throws IOException,
                                                                                                             InterruptedException
Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<ITuple,org.apache.hadoop.io.NullWritable>
Throws:
IOException
InterruptedException

getSplits

public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext jobcontext)
                                                       throws IOException,
                                                              InterruptedException
Specified by:
getSplits in class org.apache.hadoop.mapreduce.InputFormat<ITuple,org.apache.hadoop.io.NullWritable>
Throws:
IOException
InterruptedException


Copyright © –2014 Datasalt Systems S.L. All rights reserved.