Popular posts from this blog

Python for Data Science

I have been looking at more online resources on Python for Data Science, and there are many good ones available. The main tools are NumPy, Pandas, Matplotlib, and scikit-learn, which has some amazing tools. Kaggle, for instance, hosts Data Science contests, but it is good to install a local setup such as Jupyter Notebook to speed things up, as the Kaggle editor can lag and take some time to run even on small datasets. The newer DataCamp has some neat tutorials and a simple app for daily exercises on your mobile device. The Python Data Science Handbook is really useful. A short tutorial, Learn Python for Data Science, is a fun read. There is also a list of cool DataSci tutorials, and another guide on how to get started with Python for DS. Will add more later.

Drupal 7 EOL and how long will Drupal 9 be Supported

How long will Drupal 9 be supported? Currently, support is scheduled to end in 2023. This is a crucial question for Drupal site owners and builders. While that may seem like a long time away, upgrading from Drupal 8 to 9 is relatively easy compared with earlier upgrades from Drupal 5 to 6 and from 6 to 7. Where does this leave the Drupal 7 sites that still need to be upgraded to Drupal 9? The year is 2022, and Drupal 7 continues to be supported for a bit longer to help developers and owners with the upgrade process. Drupal 8 and 9 have really come into their own in recent years. Drupal 8 offered significant enhancements over Drupal 7, and the contributed modules look promising for the future of Drupal.

Getting back into parallel computing with Apache Spark

Returning to parallel computing with Apache Spark has been insightful, especially observing the increasing mainstream adoption of the McColl and Valiant BSP (Bulk Synchronous Parallel) model beyond GPUs. This structured approach to parallel computation, with its emphasis on synchronized supersteps, offers a practical framework for diverse parallel architectures. While setting up Spark on clusters can involve effort and introduce overhead, ongoing optimizations are expected to enhance its efficiency over time. Improvements in data handling, memory management, and query execution aim to streamline parallel processing. A GitHub repository for Spark snippets has been created as a resource for practical examples. As Apache Spark continues to evolve in parallel with HDFS (Hadoop Distributed File System), this repository intends to showcase solutions leveraging their combined strengths for scalable data processing.
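
To make the idea of synchronized supersteps a little more concrete, here is a minimal sketch in plain Python. It is not Spark code and does not use any Spark API: it only imitates the BSP pattern (local computation, communication, barrier) with threads from the standard library. The function name `bsp_total` and the message layout are assumptions made up for this illustration.

```python
import threading


def bsp_total(chunks):
    """Sum numbers in two BSP-style supersteps (illustrative only, not Spark)."""
    n = len(chunks)
    if n == 0:
        return []

    inbox = [[] for _ in range(n)]   # inbox[i] holds messages delivered to worker i
    results = [None] * n
    barrier = threading.Barrier(n)   # marks the end of a superstep
    lock = threading.Lock()          # protects the inboxes during communication

    def worker(rank):
        # Superstep 1: local computation on this worker's own chunk...
        partial = sum(chunks[rank])
        # ...followed by communication: send the partial sum to every worker.
        with lock:
            for dest in range(n):
                inbox[dest].append(partial)
        # Barrier synchronization: nobody continues until all messages are sent.
        barrier.wait()

        # Superstep 2: combine the messages received in the previous superstep.
        results[rank] = sum(inbox[rank])

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(bsp_total([[1, 2], [3, 4], [5]]))
```

Each worker ends up holding the same global total, because the barrier guarantees that every partial sum has been delivered before any worker starts combining them. Without the barrier, a fast worker could read its inbox before slower workers had sent their messages.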