com.datasalt.pangool.solr
Class SolrRecordWriter

java.lang.Object
  extended by org.apache.hadoop.mapreduce.RecordWriter<ITuple,org.apache.hadoop.io.NullWritable>
      extended by com.datasalt.pangool.solr.SolrRecordWriter

public class SolrRecordWriter
extends org.apache.hadoop.mapreduce.RecordWriter<ITuple,org.apache.hadoop.io.NullWritable>

Instantiate a record writer that will build a Solr index. A zip file containing the solr config and additional libraries is expected to be passed via the distributed cache. The incoming written records are converted via the specified document converter, and written to the index in batches. When the job is done, the close copies the index to the destination output file system.

This class has been copied from SOLR-1301 patch although it might be slightly different from it.


Field Summary
static List<String> allowedConfigDirectories
           
static Set<String> requiredConfigDirectories
           
 
Constructor Summary
SolrRecordWriter(int batchSize, boolean outputZipFile, int threadCount, int queueSize, String localSolrHome, String zipName, TupleDocumentConverter converter, org.apache.hadoop.mapreduce.TaskAttemptContext context)
           
 
Method Summary
static void addReducerContext(org.apache.hadoop.mapreduce.Reducer.Context context)
           
 void close(org.apache.hadoop.mapreduce.TaskAttemptContext context)
           
static List<String> getAllowedConfigDirectories()
          Return the list of directories names that may be included in the configuration data passed to the tasks.
static void incrementCounter(org.apache.hadoop.mapreduce.TaskID taskId, String groupName, String counterName, long incr)
           
protected  boolean isClosing()
           
static boolean isRequiredConfigDirectory(String directory)
          check if the passed in directory is required to be present in the configuration data set.
static void process(org.apache.log4j.Logger log, String level)
           
static void process(Logger log, String level)
           
protected  void setClosing(boolean closing)
           
 void write(ITuple key, org.apache.hadoop.io.NullWritable value)
          Write a record.
static int zipDirectory(org.apache.hadoop.conf.Configuration conf, ZipOutputStream zos, String baseName, String root, org.apache.hadoop.fs.Path itemToZip)
          Write a file to a zip output stream, removing leading path name components from the actual file name when creating the zip file entry.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

allowedConfigDirectories

public static final List<String> allowedConfigDirectories

requiredConfigDirectories

public static final Set<String> requiredConfigDirectories
Constructor Detail

SolrRecordWriter

public SolrRecordWriter(int batchSize,
                        boolean outputZipFile,
                        int threadCount,
                        int queueSize,
                        String localSolrHome,
                        String zipName,
                        TupleDocumentConverter converter,
                        org.apache.hadoop.mapreduce.TaskAttemptContext context)
Method Detail

getAllowedConfigDirectories

public static List<String> getAllowedConfigDirectories()
Return the list of directories names that may be included in the configuration data passed to the tasks.

Returns:
an UnmodifiableList of directory names

isRequiredConfigDirectory

public static boolean isRequiredConfigDirectory(String directory)
check if the passed in directory is required to be present in the configuration data set.

Parameters:
directory - The directory to check
Returns:
true if the directory is required.

isClosing

protected boolean isClosing()

setClosing

protected void setClosing(boolean closing)

incrementCounter

public static void incrementCounter(org.apache.hadoop.mapreduce.TaskID taskId,
                                    String groupName,
                                    String counterName,
                                    long incr)

addReducerContext

public static void addReducerContext(org.apache.hadoop.mapreduce.Reducer.Context context)

write

public void write(ITuple key,
                  org.apache.hadoop.io.NullWritable value)
           throws IOException
Write a record. This method accumulates records in to a batch, and when batchSize items are present flushes it to the indexer. The writes can take a substantial amount of time, depending on batchSize. If there is heavy disk contention the writes may take more than the 600 second default timeout.

Specified by:
write in class org.apache.hadoop.mapreduce.RecordWriter<ITuple,org.apache.hadoop.io.NullWritable>
Throws:
IOException

close

public void close(org.apache.hadoop.mapreduce.TaskAttemptContext context)
           throws IOException,
                  InterruptedException
Specified by:
close in class org.apache.hadoop.mapreduce.RecordWriter<ITuple,org.apache.hadoop.io.NullWritable>
Throws:
IOException
InterruptedException

zipDirectory

public static int zipDirectory(org.apache.hadoop.conf.Configuration conf,
                               ZipOutputStream zos,
                               String baseName,
                               String root,
                               org.apache.hadoop.fs.Path itemToZip)
                        throws IOException
Write a file to a zip output stream, removing leading path name components from the actual file name when creating the zip file entry. The entry placed in the zip file is baseName/ relativePath, where relativePath is constructed by removing a leading root from the path for itemToZip. If itemToZip is an empty directory, it is ignored. If itemToZip is a directory, the contents of the directory are added recursively.

Parameters:
zos - The zip output stream
baseName - The base name to use for the file name entry in the zip file
root - The path to remove from itemToZip to make a relative path name
itemToZip - The path to the file to be added to the zip file
Returns:
the number of entries added
Throws:
IOException

process

public static void process(org.apache.log4j.Logger log,
                           String level)

process

public static void process(Logger log,
                           String level)


Copyright © –2014 Datasalt Systems S.L.. All rights reserved.