com.datasalt.pangool.solr
Class TupleSolrOutputFormat

java.lang.Object
  extended by org.apache.hadoop.mapreduce.OutputFormat<ITuple,org.apache.hadoop.io.NullWritable>
      extended by org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<ITuple,org.apache.hadoop.io.NullWritable>
          extended by com.datasalt.pangool.solr.TupleSolrOutputFormat
All Implemented Interfaces:
Serializable

public class TupleSolrOutputFormat
extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<ITuple,org.apache.hadoop.io.NullWritable>
implements Serializable

Instantiable OutputFormat that can be used in Pangool for indexing ITuple instances in SOLR. It behaves similarly to SOLR-1301's SolrOutputFormat, with the difference that configuration is passed per instance (via constructor parameters) rather than through static methods. This makes it possible to use multiple TupleSolrOutputFormat instances, each with a different configuration, in the same Pangool Job. It is also much easier to configure: it just needs to be instantiated, and everything is configured underneath.

Things that can be configured via the constructor (see the constructor summary below): the Solr home directory (solrHome), an optional TupleDocumentConverter, whether the output is packaged as a zip file (outputZipFile), and the indexing tuning parameters batchSize, threadCount and queueSize.

For a usage example see test class TupleSolrOutputFormatExample.
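The referenced test class is not reproduced on this page, but a minimal sketch of wiring this output format into a Pangool job might look as follows. This is a hypothetical illustration: the TupleMRBuilder method names, the "solr-conf" directory, and the input/output paths are assumptions, not taken from this page.

```java
import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;

import com.datasalt.pangool.io.ITuple;
import com.datasalt.pangool.solr.TupleSolrOutputFormat;
import com.datasalt.pangool.tuplemr.TupleMRBuilder;

public class SolrIndexJob {

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // solrHome must point to a local directory with the usual Solr
    // configuration (schema.xml, solrconfig.xml). "solr-conf" is an
    // assumed path for this sketch.
    File solrHome = new File("solr-conf");

    // All configuration travels through the constructor: no static
    // setup calls, so several differently configured instances can
    // coexist in the same job.
    TupleSolrOutputFormat outputFormat =
        new TupleSolrOutputFormat(solrHome, conf);

    TupleMRBuilder builder = new TupleMRBuilder(conf, "solr-index-job");
    // ... intermediate schema, input and mapper/reducer setup omitted ...
    builder.setOutput(new Path("out-index"), outputFormat,
        ITuple.class, NullWritable.class);
    builder.createJob().waitForCompletion(true);
  }
}
```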

See Also:
Serialized Form

Field Summary
static String ZIP_FILE_BASE_NAME
          The base name of the zip file containing the configuration information.
 
Constructor Summary
TupleSolrOutputFormat(File solrHome, org.apache.hadoop.conf.Configuration hadoopConf)
           
TupleSolrOutputFormat(File solrHome, org.apache.hadoop.conf.Configuration hadoopConf, TupleDocumentConverter converter)
           
TupleSolrOutputFormat(File solrHome, org.apache.hadoop.conf.Configuration hadoopConf, TupleDocumentConverter converter, boolean outputZipFile, int batchSize, int threadCount, int queueSize)
           
 
Method Summary
 void checkOutputSpecs(org.apache.hadoop.mapreduce.JobContext job)
           
 org.apache.hadoop.mapreduce.RecordWriter<ITuple,org.apache.hadoop.io.NullWritable> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context)
           
 
Methods inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
getCompressOutput, getDefaultWorkFile, getOutputCommitter, getOutputCompressorClass, getOutputPath, getPathForWorkFile, getUniqueFile, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputPath
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ZIP_FILE_BASE_NAME

public static final String ZIP_FILE_BASE_NAME
The base name of the zip file containing the configuration information. This file is passed via the distributed cache using a unique name, obtained via #getZipName(Configuration jobConf).

See Also:
Constant Field Values
Constructor Detail

TupleSolrOutputFormat

public TupleSolrOutputFormat(File solrHome,
                             org.apache.hadoop.conf.Configuration hadoopConf)
                      throws IOException
Throws:
IOException

TupleSolrOutputFormat

public TupleSolrOutputFormat(File solrHome,
                             org.apache.hadoop.conf.Configuration hadoopConf,
                             TupleDocumentConverter converter)
                      throws IOException
Throws:
IOException

TupleSolrOutputFormat

public TupleSolrOutputFormat(File solrHome,
                             org.apache.hadoop.conf.Configuration hadoopConf,
                             TupleDocumentConverter converter,
                             boolean outputZipFile,
                             int batchSize,
                             int threadCount,
                             int queueSize)
                      throws IOException
Throws:
IOException
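As a hedged illustration of the fully parameterized constructor above, the fragment below shows plausible argument values. The semantics of batchSize, threadCount and queueSize are inferred from their names and this class's SOLR-1301 heritage, and the concrete values are illustrative assumptions, not recommendations from this page.

```java
// 'conf' is an org.apache.hadoop.conf.Configuration and 'converter'
// a TupleDocumentConverter implementation mapping ITuple to a Solr
// document; both are assumed to exist in the surrounding code.
TupleSolrOutputFormat outputFormat = new TupleSolrOutputFormat(
    new File("solr-conf"), // solrHome: local Solr config dir (assumed path)
    conf,                  // hadoopConf
    converter,             // converter: ITuple -> Solr document
    true,                  // outputZipFile: package the index as a zip file
    100,                   // batchSize: documents sent per indexing batch
    2,                     // threadCount: concurrent indexing threads
    1000);                 // queueSize: pending-document queue capacity
```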
Method Detail

checkOutputSpecs

public void checkOutputSpecs(org.apache.hadoop.mapreduce.JobContext job)
                      throws IOException
Overrides:
checkOutputSpecs in class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<ITuple,org.apache.hadoop.io.NullWritable>
Throws:
IOException

getRecordWriter

public org.apache.hadoop.mapreduce.RecordWriter<ITuple,org.apache.hadoop.io.NullWritable> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                                                                                   throws IOException,
                                                                                                          InterruptedException
Specified by:
getRecordWriter in class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<ITuple,org.apache.hadoop.io.NullWritable>
Throws:
IOException
InterruptedException


Copyright © –2014 Datasalt Systems S.L. All rights reserved.