A way to Merge Columns of DataFrames in Spark with no Common Column Key

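I made a post on the Databricks forum about how to take two DataFrames with the same number of rows and merge all of their columns into one DataFrame. This is fairly straightforward: we can use the monotonically_increasing_id() function to assign a unique ID to each row of each DataFrame. One caveat is that the generated IDs are unique and increasing but not consecutive, and they depend on how the rows are partitioned, so the IDs only line up across the two DataFrames when both are partitioned the same way. If the DataFrames had different numbers of rows, it would be ideal to pad the one with fewer rows with null rows so they match, although the code below does not do this.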

Once the IDs are added, a DataFrame join on the ID column will merge all the columns into one DataFrame.
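
Why the IDs depend on partitioning can be seen in a small pure-Python model. The Spark documentation describes the current implementation of monotonically_increasing_id() as putting the partition index in the upper 31 bits and the record number within the partition in the lower 33 bits. The sketch below only imitates that layout; simulated_ids is a hypothetical helper for illustration and not part of Spark.

```python
# Pure-Python model of the ID layout documented for monotonically_increasing_id():
# partition index in the upper 31 bits, record number in the lower 33 bits.
# simulated_ids is a hypothetical helper for illustration, not a Spark API.

def simulated_ids(partition_sizes):
    """Return the IDs that layout would produce, given the row count of each partition."""
    ids = []
    for partition_index, size in enumerate(partition_sizes):
        for record_number in range(size):
            ids.append((partition_index << 33) + record_number)
    return ids


# Three rows held in a single partition.
one_partition = simulated_ids([3])
# The same three rows split across two partitions (2 rows + 1 row).
two_partitions = simulated_ids([2, 1])

print(one_partition)   # [0, 1, 2]
print(two_partitions)  # [0, 1, 8589934592]
```

Under this model the third row gets ID 2 in one layout and 8589934592 in the other, so a join on the ID would not pair those rows. The IDs match only when both DataFrames happen to be split into partitions in the same way.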


# For two Dataframes that have the same number of rows, merge all columns, row by row.

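# Get monotonically_increasing_id and row_number so we can assign matching ids to each row.
# monotonically_increasing_id alone depends on partitioning, so its ids are not guaranteed
# to match across DataFrames; row_number over that ordering yields 1, 2, 3, ... in both.
from pyspark.sql import Window
from pyspark.sql.functions import monotonically_increasing_id, row_number

# Create some test data with 4 and 3 columns.
# sqlContext is predefined in notebooks and the pyspark shell; spark.createDataFrame also works.
df1 = sqlContext.createDataFrame([("foo", "bar","too","aaa"), ("bar", "bar","aaa","foo"), ("aaa", "bbb","ccc","ddd")], ("k", "K" ,"v" ,"V"))
df2 = sqlContext.createDataFrame([("aaa", "bbb","ddd"), ("www", "eee","rrr"), ("jjj", "rrr","www")], ("m", "M" ,"n"))

# Add consecutive ids. The un-partitioned window moves all rows to a single partition,
# so this suits small to moderate DataFrames.
w = Window.orderBy(monotonically_increasing_id())
df1 = df1.withColumn("id", row_number().over(w))
df2 = df2.withColumn("id", row_number().over(w))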

# Perform a join on the ids.
df3 = df2.join(df1, "id", "outer").drop("id")
df3.show()

I also started a GitHub repository to collect code snippets for Apache Spark.


