Getting back into parallel computing with Apache Spark has been great, and it has been interesting to see the McColl and Valiant BSP (Bulk Synchronous Parallel) model finally start becoming mainstream beyond GPUs.
While Spark can be some effort to setup on actual clusters and does have an overhead, thinking that these will be optimized over time and Spark will become more and more efficient.
I have started a GitHub repo for Spark snippets if any are of interest as Apache Spark moves forward 'in parallel' to the HDFS (Hadoop Distributed File System).