Instant MapReduce Patterns – Hadoop Essentials How-to
Recently, I got a chance to read 'Instant MapReduce Patterns - Hadoop Essentials How-to' by Dr. Srinath Perera (thanks to Packt Publishing, who offered me a free e-book for this).
I found this a pretty handy book for anybody who wants to get started with MapReduce on Hadoop. There are many MapReduce books with a lot of theory and very abstract use cases; this book tries to fill that gap by giving readers a complete end-to-end experience of how to program with MapReduce.

When you are solving a problem with MapReduce/Hadoop, you often find that just knowing some HelloWorld-type examples won't be enough, and you hardly find resources on how to solve different types of MapReduce problems using Hadoop. This book addresses that requirement and provides a concise introduction to solving different types of problems using MapReduce.
The book is available for purchase at:
Here I have summarized the various types of recipes presented in the book; each comes with complete code samples and all the instructions needed to run them.
Word Count
This would be the very first program you write when you start learning MapReduce. It is a very good starting point to learn the basics of MapReduce and understand the Hadoop Mapper and Reducer APIs. Throughout the book, a real data set (about 1 GB) from
http://snap.stanford.edu/data/#amazon is used for all examples. Using that sort of data set makes more sense to readers, as it gives the feeling of dealing with real 'large' data sets.
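To show the shape of the pattern without a Hadoop cluster, here is a minimal, Hadoop-free sketch of word count in plain Java: the "map" step emits (word, 1) pairs and the "reduce" step sums the counts per word, mirroring what the book's Mapper and Reducer do (the class and method names here are my own, not the book's).

```java
import java.util.*;

// A minimal, Hadoop-free sketch of the word-count pattern: the map phase
// emits (word, 1) pairs, the framework groups them by key (the shuffle),
// and the reduce phase sums the counts for each word.
public class WordCountSketch {

    // Map phase: tokenize each input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle + reduce: group the emitted pairs by key and sum the values.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("to be or not to be")));
        // {be=2, not=1, or=1, to=2}
    }
}
```

In real Hadoop the shuffle and grouping are done by the framework between the Mapper and Reducer; the sketch just inlines that step.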
Installing Hadoop in a distributed setup and running Word Count
This example gives readers a good understanding of a typical Hadoop deployment. The responsibilities of the name node, data nodes, job tracker, and task trackers are clearly explained. The example given in the book can be deployed either on a single machine or on a set of machines.
Formatters
When we run MapReduce jobs on a given data set, by default Hadoop reads the input line by line. You can write your own formatter to process records that span multiple lines and then feed them into the MapReduce jobs. This recipe provides a complete example of how to use formatters with Hadoop.
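To illustrate what a custom formatter buys you, here is a small Hadoop-free sketch of the record-grouping idea: instead of handing the mapper one line at a time, a record reader assembles multi-line records. I assume for illustration that records are separated by blank lines; the book's actual recipe targets the Amazon data set's own record format.

```java
import java.util.*;

// A minimal, Hadoop-free sketch of what a custom formatter
// (InputFormat/RecordReader) does: group raw input lines into logical
// multi-line records before they reach the mapper. Here a record is a
// block of lines separated by a blank line (an illustrative assumption).
public class RecordFormatterSketch {

    // Turn raw lines into logical records, joining each record's lines.
    static List<String> readRecords(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            if (line.isBlank()) {
                if (current.length() > 0) {
                    records.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) current.append(' ');
                current.append(line.trim());
            }
        }
        if (current.length() > 0) records.add(current.toString());
        return records;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("id: 1", "title: Book A", "", "id: 2", "title: Book B");
        System.out.println(readRecords(raw));
        // [id: 1 title: Book A, id: 2 title: Book B]
    }
}
```

In Hadoop proper, this logic would live inside a custom InputFormat and its RecordReader, which the job configuration points at instead of the default line-oriented input.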
Analytics
This sample is about doing a statistical analysis of a given data set: how to compute a frequency distribution histogram with Hadoop and then plot the results with gnuplot.
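The histogram computation maps naturally onto MapReduce: the map step assigns each value to a bucket, and the reduce step counts the values per bucket. Here is a Hadoop-free sketch of that idea in plain Java (the bucket width of 10 is an arbitrary choice for illustration, not from the book); the resulting (bucket, count) pairs are exactly what you would feed to gnuplot.

```java
import java.util.*;

// A minimal sketch of the frequency-distribution pattern: map each value
// to the lower bound of its bucket, then count values per bucket
// (the reduce step). Bucket width is an illustrative constant.
public class HistogramSketch {

    static final int BUCKET_WIDTH = 10;

    // Map phase: a value maps to the lower bound of its bucket.
    static int bucket(int value) {
        return (value / BUCKET_WIDTH) * BUCKET_WIDTH;
    }

    // Shuffle + reduce: count how many values fall into each bucket.
    static SortedMap<Integer, Integer> histogram(List<Integer> values) {
        SortedMap<Integer, Integer> counts = new TreeMap<>();
        for (int v : values) {
            counts.merge(bucket(v), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // e.g. review counts per item, bucketed into ranges of 10
        System.out.println(histogram(List.of(3, 7, 12, 18, 25)));
        // {0=2, 10=2, 20=1}
    }
}
```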
Relational Operations - Join two data sets
It is often required to process two large data sets and merge them with a relational operation such as a join. The example provided combines two data sets, a 'list of most frequent customers' and 'items bought by each customer', to find the 'items bought by the 100 most frequent customers'.
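The essence of such a reduce-side join is that records from both data sets are keyed by customer ID in the map phase, so the reducer sees a customer's records from both sides together and can emit the items only for customers who are also in the frequent-customers list. Here is a Hadoop-free sketch of that join logic (all names and sample data are my own, illustrative choices, not the book's):

```java
import java.util.*;

// A minimal sketch of the join step of a reduce-side join: keep a
// customer's purchased items only if the customer also appears in the
// frequent-customers data set. In real Hadoop the grouping by customer
// ID is done by the shuffle between map and reduce.
public class JoinSketch {

    static Map<String, List<String>> join(Set<String> frequentCustomers,
                                          Map<String, List<String>> itemsByCustomer) {
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : itemsByCustomer.entrySet()) {
            if (frequentCustomers.contains(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> frequent = Set.of("alice", "carol");
        Map<String, List<String>> purchases = Map.of(
                "alice", List.of("book1", "book2"),
                "bob", List.of("book3"),
                "carol", List.of("book4"));
        System.out.println(join(frequent, purchases));
        // {alice=[book1, book2], carol=[book4]}
    }
}
```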
In addition to the above recipes, you will find complete examples of Set Operations, Cross Correlation, Simple Search, Graph Operations, and K-Means in the 'Instant MapReduce Patterns - Hadoop Essentials How-to' book.
So, in summary, 'Instant MapReduce Patterns - Hadoop Essentials How-to' is a must-have for anyone who wants to take a deep dive into MapReduce with Hadoop :).