Apache Spark | Map and FlatMap

Map and FlatMap functions transform one collection in to another just like the map and flatmap functions in several other functional languages. In the context of Apache Spark, they transform one RDD in to another RDD.

Apache Spark Logo

Here is how they differ from each other.

Map

Map converts an RDD of size ’n’ in to another RDD of size ‘n’. The input and output size of the RDD's will be the same. Or to put it in another way, [one element in input gets mapped to only one element in the output.

Map venn diagram

So, for example let’s say I have an array [1,2,3,4] and I want to increment each element by 10. The input size and output size are same, so we can use map for this transformation.

Required :

[1,2,3,4] -> [11,12,13,14]

Spark code :

myRdd.map(x -> x+10)

So, that is what map function does. While using map, you can be sure that the size of input and output will remain the same and so even if you put a hundred map functions in series, the output and the input will have the same number of elements.

FlatMap

Coming to FlatMap, it does a similar job. Transforming one collection to another. Or in spark terms, one RDD to another RDD. But there is no condition that output size has to be equal to the input size. Or to put it in another way, [one element in input can map to zero or more elements in the output.

Flatmap venn diagram

Also, the output of flatMap is flattened . Though the function in flatMap returns a list of element(s) for each individual element of the input, the output of FlatMap will be an RDD which has all the elements flattened to a single list.

Let’s see this with an example.

Say you have a text file as follows

Hello World
Who are you

Now, if you run a flatMap on the textFile rdd,

words = linesRDD.flatMap(x -> List(x.split( )))

And, the value in the words RDD would be,

[“Hello”, “World”, “Who”, “are”, “you”]

so, the transformation process looks like this,

 linesRDD -> [ [Hello, World],[Who,are,you] ]
          -> [Hello, World, Who, are, you]

So, those are the differences between Map and FlatMap of Apache Spark.
Keep Practicing and Keep Learning!


If you have liked this article and would like to see more, subscribe to our Facebook and G+ pages.
Facebook page @ Facebook.com/freblogg

Google Plus Page @ Google.com/freblogg

Image Credits : http://spark.apache.org/images/spark-logo-trademark.png

links

social