com.datasalt.pangool.tuplemr
Class TupleMRConfigBuilder

java.lang.Object
  extended by com.datasalt.pangool.tuplemr.TupleMRConfigBuilder
Direct Known Subclasses:
TupleMRBuilder

public class TupleMRConfigBuilder
extends Object

TupleMRConfigBuilder creates immutable TupleMRConfig instances.

See Also:
TupleMRConfig

Constructor Summary
TupleMRConfigBuilder()
           
 
Method Summary
 void addIntermediateSchema(Schema schema)
          Adds a Map-output schema.
 TupleMRConfig buildConf()
          Creates a brand new and immutable TupleMRConfig instance.
static void initializeComparators(org.apache.hadoop.conf.Configuration conf, TupleMRConfig groupConfig)
          Initializes the custom comparator instances inside the given config's criteria, calling the Configurable.setConf(Configuration) method.
 void setCustomPartitionFields(String... fields)
          Sets the fields used to partition the tuples emitted by TupleMapper.
 void setFieldAliases(String schemaName, Aliases aliases)
          Sets aliases, or alternate names, for fields that belong to intermediate schemas.
 void setGroupByFields(String... groupByFields)
          Defines the fields used to group tuples by.
 void setOrderBy(OrderBy ordering)
          Sets the criteria to sort the tuples by.
 void setRollupFrom(String rollupFrom)
           
 void setSpecificOrderBy(String schemaName, OrderBy ordering)
          Sets how tuples from the schema named schemaName are sorted after being sorted by commonOrderBy and schemaOrder.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TupleMRConfigBuilder

public TupleMRConfigBuilder()
Method Detail

addIntermediateSchema

public void addIntermediateSchema(Schema schema)
                           throws TupleMRException
Adds a Map-output schema. Tuples emitted by TupleMapper will use one of the schemas added by this method. Every schema added through this method must have a distinct name.
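The distinct-name rule can be illustrated with a small, self-contained sketch. SchemaRegistrySketch is a hypothetical class, not part of Pangool: schemas are reduced to a name and a list of field names, and the sketch throws IllegalArgumentException where the real method throws TupleMRException.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "distinct schema names" rule; not Pangool's implementation.
class SchemaRegistrySketch {
    // Keeps insertion order so schemas could later be referred to by position
    // (an assumption of this sketch).
    private final Map<String, List<String>> schemas = new LinkedHashMap<>();

    // Registers a schema and rejects any name that was already added.
    void addIntermediateSchema(String name, List<String> fields) {
        if (schemas.containsKey(name)) {
            throw new IllegalArgumentException("Schema already added: " + name);
        }
        schemas.put(name, List.copyOf(fields));
    }

    int size() {
        return schemas.size();
    }
}
```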

Throws:
TupleMRException

setGroupByFields

public void setGroupByFields(String... groupByFields)
                      throws TupleMRException
Defines the fields used to group tuples by. Similar to the GROUP BY in SQL. Tuples whose group-by fields are the same will be grouped and received in the same TupleReducer.reduce(com.datasalt.pangool.io.ITuple, java.lang.Iterable, com.datasalt.pangool.tuplemr.TupleReducer.TupleMRContext, com.datasalt.pangool.tuplemr.TupleReducer.Collector) call.

When multiple intermediate schemas are defined, the groupBy fields are used to co-group tuples that have different schemas. In such a multi-schema scenario, every groupBy field specified in this method must be present in every intermediate schema.

A field that's named differently among the intermediate schemas must be aliased in order to be used in the groupBy. For that purpose, use setFieldAliases(String, Aliases).
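Co-grouping can be pictured with the following in-memory sketch. It is not how Pangool groups tuples (that happens through the MapReduce shuffle); CoGroupSketch and its nested Tuple class are hypothetical. Each tuple carries a schema name and a field map; tuples from any schema whose group-by values are equal end up in the same group, and a tuple that lacks a group-by field is rejected, mirroring the multi-schema rule above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of (co-)grouping by field values.
class CoGroupSketch {
    // A tuple is modeled as a schema name plus a field-name -> value map.
    static final class Tuple {
        final String schema;
        final Map<String, Object> fields;

        Tuple(String schema, Map<String, Object> fields) {
            this.schema = schema;
            this.fields = fields;
        }
    }

    // Groups tuples whose group-by values are equal, regardless of their schema.
    static Map<List<Object>, List<Tuple>> group(List<Tuple> tuples, List<String> groupBy) {
        Map<List<Object>, List<Tuple>> groups = new LinkedHashMap<>();
        for (Tuple t : tuples) {
            List<Object> key = new ArrayList<>();
            for (String f : groupBy) {
                // Every group-by field must exist in every schema.
                if (!t.fields.containsKey(f)) {
                    throw new IllegalArgumentException(t.schema + " lacks group-by field " + f);
                }
                key.add(t.fields.get(f));
            }
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(t);
        }
        return groups;
    }
}
```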

Throws:
TupleMRException

setRollupFrom

public void setRollupFrom(String rollupFrom)
                   throws TupleMRException
Throws:
TupleMRException

setCustomPartitionFields

public void setCustomPartitionFields(String... fields)
                              throws TupleMRException
Sets the fields used to partition the tuples emitted by TupleMapper. The default implementation performs a partial hashing over the group-by fields.
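The idea behind field-based partitioning can be shown with a generic hash-partitioning sketch. PartitionSketch is hypothetical and its hashing scheme is an assumption, not the code of TupleHashPartitioner. Only the chosen fields feed the hash, so tuples that agree on those fields always reach the same reducer; when the partition fields are drawn from the group-by fields, all tuples of one group therefore share a partition.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Generic hash partitioning over a subset of fields (hypothetical sketch).
class PartitionSketch {
    static int partition(Map<String, Object> tuple, List<String> partitionFields, int numPartitions) {
        int hash = 1;
        for (String f : partitionFields) {
            // Only the selected fields contribute to the hash.
            hash = 31 * hash + Objects.hashCode(tuple.get(f));
        }
        // Clear the sign bit so the result is in [0, numPartitions).
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }
}
```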

Throws:
TupleMRException
See Also:
TupleHashPartitioner

setFieldAliases

public void setFieldAliases(String schemaName,
                            Aliases aliases)
                     throws TupleMRException
Sets aliases, or alternate names, for fields that belong to intermediate schemas. This makes it possible to group tuples by fields that are named differently across the schemas. For instance:
 b.addIntermediateSchema(new Schema("schema1", Fields.parse("my_url:string, my_id:int")));
 b.addIntermediateSchema(new Schema("schema2", Fields.parse("site:string, visits:int")));
 b.setFieldAliases("schema1", new Aliases().add("url", "my_url"));
 b.setFieldAliases("schema2", new Aliases().add("url", "site"));
 b.setGroupByFields("url");
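The effect of the aliases in the example can be modeled as a per-schema lookup table. AliasSketch is a hypothetical class, not Pangool's Aliases: it maps each alias to the field it references, and a name without an alias resolves to itself, which is how "url" can stand for my_url in schema1 and for site in schema2.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-schema alias table; not Pangool's Aliases class.
class AliasSketch {
    // schema name -> (alias -> referenced field)
    private final Map<String, Map<String, String>> aliases = new HashMap<>();

    void setFieldAliases(String schemaName, Map<String, String> aliasToField) {
        aliases.put(schemaName, Map.copyOf(aliasToField));
    }

    // Resolves a name used in group-by or order-by to the schema's real field;
    // names without an alias resolve to themselves.
    String resolve(String schemaName, String name) {
        return aliases.getOrDefault(schemaName, Map.of()).getOrDefault(name, name);
    }
}
```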
  
 

Parameters:
schemaName - The schema the fields to be aliased belong to.
aliases - An Aliases instance that contains (alias, referenced_field) pairs.
Throws:
TupleMRException

setOrderBy

public void setOrderBy(OrderBy ordering)
                throws TupleMRException
Sets the criteria to sort the tuples by. In a multi-schema scenario all the fields defined in the specified ordering must be present in every intermediate schema defined.
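A sort criterion of this kind can be sketched as a list of (field, direction) pairs turned into a comparator, plus a check of the multi-schema rule. OrderBySketch, Criterion and Order are hypothetical names, not Pangool's OrderBy API; the sketch assumes every sort field holds a Comparable value.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an order-by criterion; not Pangool's OrderBy class.
class OrderBySketch {
    enum Order { ASC, DESC }

    static final class Criterion {
        final String field;
        final Order order;

        Criterion(String field, Order order) {
            this.field = field;
            this.order = order;
        }
    }

    // Multi-schema rule: each sort field must exist in every schema.
    static void validate(List<Criterion> orderBy, Map<String, List<String>> schemas) {
        for (Criterion c : orderBy) {
            for (Map.Entry<String, List<String>> s : schemas.entrySet()) {
                if (!s.getValue().contains(c.field)) {
                    throw new IllegalArgumentException(s.getKey() + " lacks sort field " + c.field);
                }
            }
        }
    }

    // Applies the criteria left to right; later fields only break ties.
    @SuppressWarnings("unchecked")
    static Comparator<Map<String, Object>> comparator(List<Criterion> orderBy) {
        Comparator<Map<String, Object>> cmp = (x, y) -> 0;
        for (Criterion c : orderBy) {
            Comparator<Map<String, Object>> byField =
                (x, y) -> ((Comparable<Object>) x.get(c.field)).compareTo(y.get(c.field));
            cmp = cmp.thenComparing(c.order == Order.ASC ? byField : byField.reversed());
        }
        return cmp;
    }
}
```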

Throws:
TupleMRException
See Also:
OrderBy

setSpecificOrderBy

public void setSpecificOrderBy(String schemaName,
                               OrderBy ordering)
                        throws TupleMRException
Sets how tuples from the schema named schemaName are sorted after being sorted by commonOrderBy and schemaOrder.
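The three sorting levels named here can be sketched as one chained comparator: the common order first, then the schema order, then the order specific to each schema. SpecificOrderSketch is hypothetical, and modeling schemaOrder as each schema's index in ascending order is an assumption of this sketch, not a statement about Pangool's defaults.

```java
import java.util.Comparator;
import java.util.Map;

// Hypothetical sketch of common order -> schema order -> schema-specific order.
// Assumption: schemaOrder is modeled as each schema's index, ascending.
class SpecificOrderSketch {
    static final class Tuple {
        final int schemaIndex;
        final Map<String, Object> fields;

        Tuple(int schemaIndex, Map<String, Object> fields) {
            this.schemaIndex = schemaIndex;
            this.fields = fields;
        }
    }

    static Comparator<Tuple> comparator(Comparator<Map<String, Object>> common,
                                        Map<Integer, Comparator<Map<String, Object>>> specific) {
        return Comparator
            // 1. Common order, applied to tuples of every schema.
            .<Tuple, Map<String, Object>>comparing(t -> t.fields, common)
            // 2. Ties on the common order are broken by schema index.
            .thenComparingInt(t -> t.schemaIndex)
            // 3. Within one schema, its specific order (if any was set).
            .thenComparing((x, y) -> {
                Comparator<Map<String, Object>> s = specific.get(x.schemaIndex);
                return s == null ? 0 : s.compare(x.fields, y.fields);
            });
    }
}
```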

Throws:
TupleMRException

buildConf

public TupleMRConfig buildConf()
                        throws TupleMRException
Creates a brand new and immutable TupleMRConfig instance.
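The builder-to-immutable-instance pattern can be sketched as follows. ConfigSketch and its nested Builder are hypothetical, not Pangool's classes, and the "group-by fields must be set" check is only illustrative. The built object keeps an unmodifiable copy of the builder's state, so later calls on the builder do not affect configurations that were already built.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder / immutable-config pair; not Pangool's classes.
final class ConfigSketch {
    private final List<String> groupByFields;

    private ConfigSketch(List<String> groupByFields) {
        // Unmodifiable defensive copy: later builder changes cannot leak in.
        this.groupByFields = List.copyOf(groupByFields);
    }

    List<String> groupByFields() {
        return groupByFields;
    }

    static final class Builder {
        private final List<String> groupByFields = new ArrayList<>();

        void setGroupByFields(String... fields) {
            groupByFields.clear();
            groupByFields.addAll(List.of(fields));
        }

        // Each call returns a brand new, independent instance.
        ConfigSketch buildConf() {
            // Illustrative validation only.
            if (groupByFields.isEmpty()) {
                throw new IllegalStateException("group-by fields not set");
            }
            return new ConfigSketch(groupByFields);
        }
    }
}
```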

Throws:
TupleMRException

initializeComparators

public static void initializeComparators(org.apache.hadoop.conf.Configuration conf,
                                         TupleMRConfig groupConfig)
Initializes the custom comparator instances inside the given config's criteria, calling the Configurable.setConf(Configuration) method.
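The initialization step amounts to handing the job configuration to every custom comparator that can accept it. The sketch below avoids a Hadoop dependency: ConfigurableSketch is a hypothetical stand-in for Hadoop's Configurable (only setConf is modeled, and a Map replaces Configuration), and ComparatorInitSketch is not Pangool code. Comparators that are not configurable are left untouched.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configurable; a Map replaces Configuration.
interface ConfigurableSketch {
    void setConf(Map<String, String> conf);
}

class ComparatorInitSketch {
    // Injects the configuration into each configurable comparator and
    // returns how many comparators were configured.
    static int initializeComparators(Map<String, String> conf, List<Object> comparators) {
        int configured = 0;
        for (Object c : comparators) {
            if (c instanceof ConfigurableSketch) {
                ((ConfigurableSketch) c).setConf(conf);
                configured++;
            }
        }
        return configured;
    }
}
```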



Copyright © –2014 Datasalt Systems S.L.. All rights reserved.