
Getting back into parallel computing with Apache Spark

Returning to parallel computing with Apache Spark has been insightful, especially seeing the Bulk Synchronous Parallel (BSP) model of McColl and Valiant gain mainstream adoption beyond GPUs. This structured approach to parallel computation is built around supersteps, each ending in a global synchronization, and it offers a practical framework for a wide range of parallel architectures.

Setting up Spark on a cluster takes effort and adds overhead, but ongoing optimizations should improve its efficiency over time. Improvements in data handling, memory management, and query execution all aim to streamline parallel processing.

A GitHub repository of Spark snippets has been created as a resource for practical examples. As Apache Spark continues to evolve alongside HDFS (the Hadoop Distributed File System), the repository aims to showcase solutions that combine their strengths for scalable data processing.
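To make the superstep idea concrete, here is a minimal sketch of a BSP-style global sum in plain Python. It is not Spark code and not taken from the snippets repository. It simply assumes a fixed number of workers, modeled as threads, that each do local work, send a message, and then meet at a barrier before the next superstep. The function name `bsp_global_sum` and the round-robin partitioning are hypothetical choices made for illustration.

```python
import threading
from typing import Dict, List


def bsp_global_sum(data: List[int], num_workers: int = 4) -> int:
    """Sum `data` with a toy two-superstep BSP schedule (illustrative only)."""
    # Hypothetical round-robin partitioning of the input across workers.
    partitions = [data[i::num_workers] for i in range(num_workers)]
    # Per-worker inbox: messages sent in superstep k are read in superstep k+1.
    inboxes: Dict[int, List[int]] = {w: [] for w in range(num_workers)}
    inbox_lock = threading.Lock()
    # The barrier marks the end of each superstep for all workers.
    barrier = threading.Barrier(num_workers)
    result: List[int] = []

    def worker(wid: int) -> None:
        # Superstep 1: purely local computation on this worker's partition...
        partial = sum(partitions[wid])
        # ...followed by communication: send the partial sum to worker 0.
        with inbox_lock:
            inboxes[0].append(partial)
        # Global synchronization: after this, every message has been delivered.
        barrier.wait()
        # Superstep 2: worker 0 combines the messages it received.
        if wid == 0:
            result.append(sum(inboxes[0]))
        barrier.wait()

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]


if __name__ == "__main__":
    print(bsp_global_sum(list(range(1, 101))))  # 5050
```

The threads only model the structure; because of Python's global interpreter lock they give no real speedup. Loosely speaking, a Spark stage of narrow transformations that ends at a shuffle plays a role similar to one superstep: tasks compute independently, data is exchanged, and the next stage starts only once that exchange is complete.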


