How to flatten (or explode) DataFrame rows based on a column's data?






The DataFrame should be exploded on the SPC column: each character of SPC becomes its own row, with the other columns repeated. Below is an example.



My input DataFrame:


ID Name Level SPC Rating salary
23 sam 3 HBS 3.5 4000
43 Nair 4 KSTk 4 5000
56 Rom 5 MNC 3 3000



My output should be:


ID Name Level SPC Rating salary
23 sam 3 H 3.5 4000
23 sam 3 B 3.5 4000
23 sam 3 S 3.5 4000
43 Nair 4 K 4 5000
43 Nair 4 S 4 5000
43 Nair 4 T 4 5000
43 Nair 4 k 4 5000



How can I resolve this problem in Scala or Java code?




2 Answers



If you have a DataFrame/Dataset like this


+---+----+-----+----+------+------+
|ID |Name|Level|SPC |Rating|salary|
+---+----+-----+----+------+------+
|23 |sam |3 |HBS |3.5 |4000 |
|43 |Nair|4 |KSTk|4.0 |5000 |
|56 |Rom |5 |MNC |3.0 |3000 |
+---+----+-----+----+------+------+



then you can write a UDF that converts each SPC string into an array of single-character strings, and then apply the explode function to that array:



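import org.apache.spark.sql.functions._

// UDF: split an SPC string into one-character strings, e.g. "HBS" -> List("H", "B", "S").
// Note: a null SPC value would make spc.toList throw a NullPointerException.
val flattenStringUdf = udf((spc: String) => spc.toList.map(_.toString))

// explode turns each array element into its own row, repeating the other columns
df.withColumn("SPC", explode(flattenStringUdf(col("SPC")))).show(false)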



which should give you


+---+----+-----+---+------+------+
|ID |Name|Level|SPC|Rating|salary|
+---+----+-----+---+------+------+
|23 |sam |3 |H |3.5 |4000 |
|23 |sam |3 |B |3.5 |4000 |
|23 |sam |3 |S |3.5 |4000 |
|43 |Nair|4 |K |4.0 |5000 |
|43 |Nair|4 |S |4.0 |5000 |
|43 |Nair|4 |T |4.0 |5000 |
|43 |Nair|4 |k |4.0 |5000 |
|56 |Rom |5 |M |3.0 |3000 |
|56 |Rom |5 |N |3.0 |3000 |
|56 |Rom |5 |C |3.0 |3000 |
+---+----+-----+---+------+------+
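The function passed to `udf` is ordinary Scala, so its logic can be checked without a Spark session. The sketch below is a hypothetical variant (the name `splitChars` and the null handling are assumptions, not part of this answer) that returns an empty list for a null input instead of throwing. Wrapped as `udf(splitChars _)`, `explode` would then simply drop rows whose SPC is null, since `explode` emits no rows for a null or empty array.

```scala
object SplitCharsDemo {
  // Hypothetical helper: the UDF body with a null guard added.
  // Each character of the input becomes its own one-character string;
  // a null input yields an empty list rather than a NullPointerException.
  def splitChars(spc: String): List[String] =
    Option(spc).map(_.toList.map(_.toString)).getOrElse(Nil)

  def main(args: Array[String]): Unit = {
    println(splitChars("KSTk")) // List(K, S, T, k)
    println(splitChars(null))   // List()
  }
}
```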



I hope the answer is helpful.





It is working for me. Thank you very much, Ramesh Maharajan.
– Siddesh H K
Jul 1 at 7:12



Try the flatMap method.



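Example (a sketch; it assumes `MyRow` is a case class whose fields match the input columns and that `spark` is the active SparkSession):


import spark.implicits._ // encoders for Dataset.as and Dataset.flatMap
case class MyRow(ID: Int, Name: String, Level: Int, SPC: String, Rating: Double, salary: Int)
val input = df.as[MyRow]
// one output row per character of SPC; all other fields are copied unchanged
val output = input.flatMap(row => row.SPC.map(ch => row.copy(SPC = ch.toString)))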
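The same `flatMap` pattern can be tried on plain Scala collections, without Spark. The row type and sample values below are a hypothetical mirror of the question's data; on a Spark `Dataset` the call has the same shape but also needs an implicit `Encoder` for the result type, typically from `import spark.implicits._`.

```scala
// Hypothetical row type mirroring the sample columns
case class MyRow(ID: Int, Name: String, Level: Int, SPC: String, Rating: Double, salary: Int)

object FlatMapDemo {
  val input: List[MyRow] = List(
    MyRow(23, "sam", 3, "HBS", 3.5, 4000),
    MyRow(43, "Nair", 4, "KSTk", 4.0, 5000),
    MyRow(56, "Rom", 5, "MNC", 3.0, 3000)
  )

  // One output row per character of SPC; every other field is copied unchanged
  val output: List[MyRow] =
    input.flatMap(row => row.SPC.toList.map(ch => row.copy(SPC = ch.toString)))

  def main(args: Array[String]): Unit =
    output.foreach(println)
}
```

Running it prints ten `MyRow` values, one per character, starting with `MyRow(23,sam,3,H,3.5,4000)`. Using `copy` keeps the code short and avoids listing every field again when only SPC changes.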





