com.datasalt.pangool.tuplemr
Class SerializationInfo

java.lang.Object
  extended by com.datasalt.pangool.tuplemr.SerializationInfo

public class SerializationInfo
extends Object

Contains information about how to perform binary internal serialization and comparison.

This is used, among others, in TupleSerialization, TupleHashPartitioner, SortComparator, as well as SimpleReducer and RollupReducer.


Constructor Summary
SerializationInfo(TupleMRConfig tupleMRConfig)
           
 
Method Summary
 Schema getCommonSchema()
          Returns the schema containing the fields that are serialized/deserialized before the schemaId.
 org.apache.hadoop.io.serializer.Deserializer[] getCommonSchemaDeserializers()
           
 int[] getCommonSchemaIndexTranslation(int schemaId)
          Given an intermediate schema id, returns an index correlation from the common schema indexes to the indexes of the specified intermediate schema.
 org.apache.hadoop.io.serializer.Serializer[] getCommonSchemaSerializers()
           
static org.apache.hadoop.io.serializer.Deserializer[] getDeserializers(Schema readSchema, Schema targetSchema, org.apache.hadoop.conf.Configuration conf)
           
 int[] getFieldsToPartition(int schemaId)
          Given a schema id, returns the indexes of the fields that TupleHashPartitioner uses to compute a partial hash.
 Schema getGroupSchema()
          Returns the schema containing the group-by fields ordered by the common sorting criteria.
 org.apache.hadoop.io.serializer.Deserializer[] getGroupSchemaDeserializers()
           
 int[] getGroupSchemaIndexTranslation(int schemaId)
          Given an intermediate schema id, returns an index correlation from the group schema to the intermediate schema.
 org.apache.hadoop.io.serializer.Serializer[] getGroupSchemaSerializers()
           
 List<int[]> getPartitionFieldsIndexes()
           
static org.apache.hadoop.io.serializer.Serializer[] getSerializers(Schema schema, org.apache.hadoop.conf.Configuration conf)
           
 Schema getSpecificSchema(int schemaId)
          Given an intermediate schema id, returns a subschema of that intermediate schema containing the fields that are serialized after the schemaId.
 List<org.apache.hadoop.io.serializer.Deserializer[]> getSpecificSchemaDeserializers()
           
 int[] getSpecificSchemaIndexTranslation(int schemaId)
          Given an intermediate schema id, returns an index correlation from the specific schema to the intermediate schema.
 List<Schema> getSpecificSchemas()
          Returns a list containing all the specific schemas ordered by schema id.
 List<org.apache.hadoop.io.serializer.Serializer[]> getSpecificSchemaSerializers()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SerializationInfo

public SerializationInfo(TupleMRConfig tupleMRConfig)
                  throws TupleMRException
Throws:
TupleMRException
Method Detail

getPartitionFieldsIndexes

public List<int[]> getPartitionFieldsIndexes()

getFieldsToPartition

public int[] getFieldsToPartition(int schemaId)
Given a schema id, returns the indexes of the fields that TupleHashPartitioner uses to compute a partial hash.
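
The idea of hashing only a subset of a tuple's fields can be illustrated with a minimal sketch. It does not use Pangool classes and is not the actual TupleHashPartitioner implementation: the class PartialHashSketch, the method partition and the Object[] tuple representation are hypothetical names chosen for this example. The fieldsToPartition argument plays the role of the array returned by getFieldsToPartition(int).

```java
// Illustrative sketch only: a plain Object[] stands in for a tuple.
class PartialHashSketch {

  /** Combines the hash codes of the selected fields and maps the result to a partition. */
  static int partition(Object[] tuple, int[] fieldsToPartition, int numPartitions) {
    int hash = 1;
    for (int index : fieldsToPartition) {
      Object value = tuple[index];
      hash = 31 * hash + (value == null ? 0 : value.hashCode());
    }
    // Clear the sign bit so the modulo result is never negative.
    return (hash & Integer.MAX_VALUE) % numPartitions;
  }

  public static void main(String[] args) {
    int[] fieldsToPartition = {0}; // hash only the first field
    Object[] first = {"alice", 10L, "x"};
    Object[] second = {"alice", 99L, "y"};
    // Tuples that agree on the partition fields land in the same partition.
    System.out.println(partition(first, fieldsToPartition, 4)
        == partition(second, fieldsToPartition, 4));
  }
}
```

Fields outside fieldsToPartition do not influence the result, so tuples that share the same values in the partition fields go to the same partition.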


getCommonSchemaIndexTranslation

public int[] getCommonSchemaIndexTranslation(int schemaId)
Given an intermediate schema id, returns an index correlation from the common schema indexes to the indexes of the specified intermediate schema. The length of this array matches the number of fields in the common schema.
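
As a rough illustration of what such an index translation array looks like, the sketch below builds one from two lists of field names. It assumes that position i of the array holds the index, within the intermediate schema, of the i-th field of the source schema, which is consistent with the array length stated above. The class IndexTranslationSketch, its method and the field names are hypothetical, and plain lists of names stand in for Schema objects.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: lists of field names stand in for Schema objects.
class IndexTranslationSketch {

  /** For each field of the source schema, finds its position in the intermediate schema. */
  static int[] translation(List<String> sourceFields, List<String> intermediateFields) {
    int[] result = new int[sourceFields.size()];
    for (int i = 0; i < result.length; i++) {
      int position = intermediateFields.indexOf(sourceFields.get(i));
      if (position < 0) {
        throw new IllegalArgumentException(
            "Field not present in intermediate schema: " + sourceFields.get(i));
      }
      result[i] = position;
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> common = Arrays.asList("user", "timestamp");
    List<String> intermediate = Arrays.asList("timestamp", "url", "user");
    // prints [2, 0]
    System.out.println(Arrays.toString(translation(common, intermediate)));
  }
}
```

Under the same assumption, the translations for the specific schema and the group schema can be pictured the same way, with the array length equal to the number of fields in the respective source schema.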


getSpecificSchemaIndexTranslation

public int[] getSpecificSchemaIndexTranslation(int schemaId)
Given an intermediate schema id, returns an index correlation from the specific schema to the intermediate schema. The length of this array matches the number of fields in the specific schema.


getGroupSchemaIndexTranslation

public int[] getGroupSchemaIndexTranslation(int schemaId)
Given an intermediate schema id, returns an index correlation from the group schema to the intermediate schema. The length of this array matches the number of fields in the group schema.


getSpecificSchemaSerializers

public List<org.apache.hadoop.io.serializer.Serializer[]> getSpecificSchemaSerializers()

getSpecificSchemaDeserializers

public List<org.apache.hadoop.io.serializer.Deserializer[]> getSpecificSchemaDeserializers()

getCommonSchemaSerializers

public org.apache.hadoop.io.serializer.Serializer[] getCommonSchemaSerializers()

getCommonSchemaDeserializers

public org.apache.hadoop.io.serializer.Deserializer[] getCommonSchemaDeserializers()

getGroupSchemaSerializers

public org.apache.hadoop.io.serializer.Serializer[] getGroupSchemaSerializers()

getGroupSchemaDeserializers

public org.apache.hadoop.io.serializer.Deserializer[] getGroupSchemaDeserializers()

getSerializers

public static org.apache.hadoop.io.serializer.Serializer[] getSerializers(Schema schema,
                                                                          org.apache.hadoop.conf.Configuration conf)

getDeserializers

public static org.apache.hadoop.io.serializer.Deserializer[] getDeserializers(Schema readSchema,
                                                                              Schema targetSchema,
                                                                              org.apache.hadoop.conf.Configuration conf)

getCommonSchema

public Schema getCommonSchema()
Returns the schema containing the fields that are serialized/deserialized before the schemaId. If only one intermediate schema is used, returns a schema containing all the fields of that intermediate schema, sorted by the common criteria.


getSpecificSchema

public Schema getSpecificSchema(int schemaId)
Given an intermediate schema id, returns a subschema of that intermediate schema containing the fields that are serialized after the schemaId. Returns null if no fields are serialized after the schemaId.
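
The relationship between an intermediate schema, the common schema and a specific schema can be sketched as follows. This is only an illustration under a stated assumption: the common fields are taken as given, and the specific fields are simply the remaining fields of the intermediate schema. How SerializationInfo actually derives the common fields is not described here. The class SchemaSplitSketch and its method are hypothetical, and lists of field names stand in for Schema objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: lists of field names stand in for Schema objects.
class SchemaSplitSketch {

  /**
   * Returns the fields of the intermediate schema that are not part of the
   * common schema, or null when nothing is left, mirroring the documented
   * null return of getSpecificSchema(int).
   */
  static List<String> specificFields(List<String> intermediateFields, List<String> commonFields) {
    List<String> rest = new ArrayList<>(intermediateFields);
    rest.removeAll(commonFields);
    return rest.isEmpty() ? null : rest;
  }

  public static void main(String[] args) {
    List<String> common = Arrays.asList("user", "timestamp");
    // prints [url]
    System.out.println(specificFields(Arrays.asList("user", "timestamp", "url"), common));
    // prints null
    System.out.println(specificFields(Arrays.asList("user", "timestamp"), common));
  }
}
```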


getSpecificSchemas

public List<Schema> getSpecificSchemas()
Returns a list containing all the specific schemas ordered by schema id. See getSpecificSchema(int).


getGroupSchema

public Schema getGroupSchema()
Returns the schema containing the group-by fields ordered by the common sorting criteria.



Copyright © –2014 Datasalt Systems S.L.. All rights reserved.