Threads profile for Darrell Ulm

I recently joined Threads as Darrell Ulm (https://www.threads.com/@darrell_ulm) as I set out to relearn and expand what I know about artificial intelligence. My current focus is Large Language Models (LLMs): how these models are developed and how they are used. I'm also revisiting the fundamentals of neural networks, the building blocks that let AI systems learn and make predictions. Because these fields are computationally demanding, I also want to build on what I previously learned in parallel processing, which plays a crucial role in handling the heavy computation in AI efficiently.

Darrell R. Ulm

Popular posts from this blog

Getting back into parallel computing with Apache Spark

Returning to parallel computing with Apache Spark has been insightful, especially seeing the McColl and Valiant BSP (Bulk Synchronous Parallel) model gain mainstream adoption beyond GPUs. This structured approach to parallel computation, with its emphasis on synchronized supersteps, offers a practical framework for diverse parallel architectures. While setting up Spark on clusters can take effort and introduce overhead, ongoing optimizations are expected to improve its efficiency over time. Improvements in data handling, memory management, and query execution aim to streamline parallel processing. I have created a GitHub repository of Spark snippets as a resource for practical examples. As Apache Spark continues to evolve alongside HDFS (the Hadoop Distributed File System), this repository is intended to showcase solutions that use their combined strengths for scalable data processing.
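To make the idea of synchronized supersteps concrete, here is a minimal sketch in plain Python threads rather than Spark. The function name bsp_prefix_totals and the sample data are hypothetical, chosen only for illustration. Each worker does purely local work, waits at a barrier, and only then reads what the other workers produced.

```python
import threading


def bsp_prefix_totals(chunks):
    """Run two BSP-style supersteps over `chunks`, one worker thread per chunk.

    Hypothetical helper for illustration only; not part of Spark.
    """
    n = len(chunks)
    if n == 0:
        return [], []

    barrier = threading.Barrier(n)
    local_sums = [0] * n  # each worker writes its own slot in superstep 1
    offsets = [0] * n     # each worker writes its own slot in superstep 2

    def worker(i):
        # Superstep 1: local computation only, no reads from other workers.
        local_sums[i] = sum(chunks[i])
        # Synchronization point: after this, every local sum is written.
        barrier.wait()
        # Superstep 2: safe to read results the other workers produced.
        offsets[i] = sum(local_sums[:i])
        barrier.wait()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return local_sums, offsets


local_sums, offsets = bsp_prefix_totals([[1, 2], [3, 4], [5]])
print(local_sums, offsets)
```

The barrier is what separates one superstep from the next: no worker reads shared results until every worker has finished writing its own.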

A way to Merge Columns of DataFrames in Spark with no Common Column Key

I posted on the Databricks forum about how to take two DataFrames with the same number of rows and merge all of their columns into one DataFrame. One approach is to use the monotonically_increasing_id() function to assign a unique ID to each row of each DataFrame and then join on that ID. Note that the generated IDs depend on how each DataFrame is partitioned, so they line up across the two DataFrames only when both are partitioned the same way. It would be ideal to add extra null rows to the DataFrame with fewer rows so that the two match, although the code below does not do this. Once the IDs are added, a DataFrame join merges all the columns into one DataFrame.

# For two DataFrames that have the same number of rows, merge all columns, row by row.
# Get the function monotonically_increasing_id so we can assign ids to each row, when the
# DataFrames have the same number of rows.
from pyspark.sql.functions import monotonically_increasing_id
# Create some test data with 3 and 4 columns.
df1 = sqlContext.createDataFrame([("foo", "bar","too","aaa"), ("bar...
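The excerpt above is cut off, so the following is not the original code. It is a small plain-Python sketch (no Spark required) of the same join-on-row-ID logic, including the null padding that the post says its code does not do. The helper names with_row_ids and merge_by_row_id and the sample rows are hypothetical. Row position stands in for the ID column, and None stands in for a null.

```python
def with_row_ids(rows):
    """Map each row's position to the row, standing in for a per-row ID column."""
    return {row_id: row for row_id, row in enumerate(rows)}


def merge_by_row_id(rows1, width1, rows2, width2):
    """Merge two row lists column-wise by row ID, like a full outer join on the ID.

    Rows missing on one side are padded with None, one per missing column.
    """
    ids1 = with_row_ids(rows1)
    ids2 = with_row_ids(rows2)
    merged = []
    for row_id in sorted(ids1.keys() | ids2.keys()):
        left = ids1.get(row_id, (None,) * width1)
        right = ids2.get(row_id, (None,) * width2)
        merged.append(left + right)
    return merged


# Hypothetical sample data: three rows of 2 columns, two rows of 3 columns.
left_rows = [("foo", "bar"), ("baz", "qux"), ("one", "two")]
right_rows = [(1, 2, 3), (4, 5, 6)]

merged = merge_by_row_id(left_rows, 2, right_rows, 3)
for row in merged:
    print(row)
```

In Spark, the corresponding step would presumably be a full outer join on the ID column, which leaves nulls in the columns of whichever DataFrame has no row for a given ID.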