Tuple MapReduce for Hadoop: MapReduce made easy
Hadoop has a steep learning curve. Pangool aims to simplify Hadoop development without losing the performance and flexibility that the low-level Hadoop API provides.
The most common patterns that arise when writing MapReduce jobs are easier to implement with Pangool, with performance similar to that of the low-level API.
Although it is commonly needed in parallel data processing, secondary sorting is a nightmare to accomplish with the standard Java MapReduce Hadoop API.
Check how easy secondary sorting is with Pangool:
job.setGroupByFields("word");
job.setOrderBy(new OrderBy()
    .add("word", Order.ASC)
    .add("count", Order.DESC));
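For intuition, the ordering those two lines request (group by "word", then sort counts descending within each group) behaves like the comparator in this minimal sketch. It is plain Java with no Pangool dependency; the WordCount record is a hypothetical stand-in for a Pangool tuple:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SecondarySortSketch {

    // Hypothetical stand-in for a Pangool tuple with fields "word" and "count".
    record WordCount(String word, int count) {}

    public static void main(String[] args) {
        List<WordCount> tuples = new ArrayList<>(List.of(
            new WordCount("b", 1),
            new WordCount("a", 2),
            new WordCount("a", 5),
            new WordCount("b", 3)));

        // Equivalent of the OrderBy above: "word" ascending, then "count" descending.
        tuples.sort(Comparator.comparing(WordCount::word)
            .thenComparing(Comparator.comparingInt(WordCount::count).reversed()));

        // Within each "word" group, the reducer now sees counts largest-first.
        for (WordCount wc : tuples) {
            System.out.println(wc.word() + "\t" + wc.count());
        }
    }
}
```

With the framework doing this sorting during the shuffle, the reducer receives each group's values already ordered, which is exactly what secondary sort buys you.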
Although it is a common pattern when working with big data, joining heterogeneous data sources is extremely complex to implement with the standard Java MapReduce Hadoop API.
With Pangool it is as easy as it can get:
job.addIntermediateSchema(urlMapSchema);
job.addIntermediateSchema(urlRegisterSchema);
job.setGroupByFields("url");
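To see what grouping two schemas by "url" achieves, here is a minimal reduce-side-join sketch in plain Java, with no Pangool dependency. The record types and field names (canonicalUrl, timestamp, ip) are hypothetical examples of what the two intermediate schemas might contain:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinSketch {

    // Hypothetical stand-ins for tuples of the two intermediate schemas.
    record UrlMap(String url, String canonicalUrl) {}
    record UrlRegister(String url, long timestamp, String ip) {}

    public static void main(String[] args) {
        List<UrlMap> urlMaps = List.of(
            new UrlMap("http://a.com", "http://a.com/index"));
        List<UrlRegister> registers = List.of(
            new UrlRegister("http://a.com", 1L, "10.0.0.1"),
            new UrlRegister("http://a.com", 2L, "10.0.0.2"));

        // Group the first source by "url" so it can be looked up per group.
        Map<String, String> canonicalByUrl = new HashMap<>();
        for (UrlMap m : urlMaps) {
            canonicalByUrl.put(m.url(), m.canonicalUrl());
        }

        // The "reducer" for each url group: join every register record
        // against the mapping record that shares its url.
        List<String> joined = new ArrayList<>();
        for (UrlRegister r : registers) {
            joined.add(canonicalByUrl.get(r.url()) + "\t" + r.timestamp() + "\t" + r.ip());
        }
        joined.forEach(System.out::println);
    }
}
```

In the real job, the framework performs the grouping during the shuffle, so the reducer receives all tuples sharing a "url" together, regardless of which schema each tuple came from.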