June 19, 2016

Apache Spark | Map and FlatMap

03:12 Posted by DurgaSwaroop , , , , No comments
Map and FlatMap functions transform one collection in to another just like the map and flatmap functions in several other functional languages. In the context of Apache Spark, they transform one RDD in to another RDD.

Here is how they differ from each other.

Map converts an RDD of size ’n’ in to another RDD of size ‘n’. The input and output size of the RDD's will be the same. Or to put it in another way, one element in input gets mapped to only one element in the output.
So, for example let’s say I have an array [1,2,3,4] and I want to increment each element by 10. The input size and output size are same, so we can use map for this transformation.
Required :
[1,2,3,4] -> [11,12,13,14]
Spark code :
myRdd.map(x -> x+10)
So, that is what map function does. While using map, you can be sure that the size of input and output will remain the same and so even if you put a hundred map functions in series, the output and the input will have the same number of elements.


Coming to FlatMap, it does a similar job. Transforming one collection to another. Or in spark terms, one RDD to another RDD. But there is no condition that output size has to be equal to the input size. Or to put it in another way, one element in input can map to zero or more elements in the output.
Also, the output of flatMap is flattened . Though the function in flatMap returns a list of element(s) for each individual element of the input, the output of FlatMap will be an RDD which has all the elements flattened to a single list.
Let’s see this with an example.
Say you have a text file as follows
Hello World
Who are you
Now, if you run a flatMap on the textFile rdd,
words = linesRDD.flatMap(x -> List(x.split(“ “)))

And, the value in the words RDD would be,
[“Hello”, “World”, “Who”, “are”, “you”]
so, the transformation process looks like this,
 linesRDD -> [ [“Hello”, “World”],[“Who”,”are”,”you”] ]
                                   -> [“Hello”, “World”, “Who”, “are”, “you”]
So, those are the differences between Map and FlatMap of Apache Spark.
Keep Practicing and Keep Learning!

If you have liked this article and would like to see more, subscribe to our Facebook and G+ pages.
Facebook page @ Facebook.com/freblogg
Google Plus Page @ Google.com/freblogg
Image Credits : http://spark.apache.org/images/spark-logo-trademark.png


Post a comment

Please Enter your comment here......