MapReduce Tutorial: A Word Count Example of MapReduce

 

Let us understand, how a MapReduce works by taking an example where I have a text file called example.txt whose contents are as follows:

 

Dear, Bear, River, Car, Car, River, Deer, Car and Bear

 

Now, suppose, we have to perform a word count on the sample.txt using MapReduce. So, we will be finding the unique words and the number of occurrences of those unique words.

 

 

  • First, we divide the input into three splits as shown in the figure. This will distribute the work among all the map nodes.
  • Then, we tokenize the words in each of the mappers and give a hardcoded value (1) to each of the tokens or words. The rationale behind giving a hardcoded value equal to 1 is that every word, in itself, will occur once.
  • Now, a list of key-value pair will be created where the key is nothing but the individual words and value is one. So, for the first line (Dear Bear River) we have 3 key-value pairs – Dear, 1; Bear, 1; River, 1. The mapping process remains the same on all the nodes.
  • After the mapper phase, a partition process takes place where sorting and shuffling happen so that all the tuples with the same key are sent to the corresponding reducer.
  • So, after the sorting and shuffling phase, each reducer will have a unique key and a list of values corresponding to that very key. For example, Bear, [1,1]; Car, [1,1,1].., etc. 
  • Now, each Reducer counts the values which are present in that list of values. As shown in the figure, reducer gets a list of values which is [1,1] for the key Bear. Then, it counts the number of ones in the very list and gives the final output as – Bear, 2.
  • Finally, all the output key/value pairs are then collected and written in the output file.

 

References:

  1. https://thirdeyedata.ai/hadoop-mapreduce/