Popular posts from this blog

Apache Spark Knapsack Approximation Algorithm in Python

The code shown below implements a greedy heuristic, an approximation algorithm for the 0-1 knapsack problem, in Apache Spark. Having worked a good deal with parallel dynamic programming algorithms, I wanted to see what this would look like in Spark. The GitHub code repo for the knapsack approximation algorithms is here, and it includes a Scala solution. Work on a Java version is in progress at the time of this writing. Below is the code that computes a solution that fits within the knapsack capacity W for a set of items, each with its own weight and profit value. We look to maximize the sum of the selected items' profits while not exceeding the total allowed weight, W. First we import some Spark libraries into Python.
# Knapsack 0-1 function weights, values and size-capacity.
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit
from pyspark.sql.functions import col
from pyspark.sql.functions import sum
Now define the function, which will take a Spark ...
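As a rough illustration of the greedy idea described in this excerpt, here is a minimal plain-Python sketch without Spark. It is not the code from the post or the repo: the function name `greedy_knapsack`, the `(name, weight, profit)` tuple layout, and the choice to rank items by profit-to-weight ratio and skip any item that no longer fits are all assumptions made for the example. Weights are assumed to be positive.

```python
def greedy_knapsack(items, capacity):
    """Greedy heuristic for the 0-1 knapsack problem (illustrative sketch).

    items: list of (name, weight, profit) tuples with positive weights.
    capacity: the total weight W the knapsack can hold.
    Returns (chosen_names, total_weight, total_profit).
    """
    # Rank items by profit per unit of weight, best ratio first.
    ranked = sorted(items, key=lambda item: item[2] / item[1], reverse=True)

    chosen, total_weight, total_profit = [], 0, 0
    for name, weight, profit in ranked:
        # Take the item only if it still fits; otherwise skip it.
        if total_weight + weight <= capacity:
            chosen.append(name)
            total_weight += weight
            total_profit += profit
    return chosen, total_weight, total_profit


# Hypothetical sample data: (name, weight, profit).
items = [("a", 10, 60), ("b", 20, 100), ("c", 30, 120)]
chosen, total_weight, total_profit = greedy_knapsack(items, 50)
print(chosen, total_weight, total_profit)
```

On this sample data the heuristic picks items a and b for a profit of 160, while the optimal choice (b and c) yields 220, which shows why the greedy result is only an approximation.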

Threads profile for Darrell Ulm

I recently joined Threads as Darrell Ulm ( https://www.threads.com/@darrell_ulm ) as I set out to relearn and expand my knowledge in areas like artificial intelligence. My current focus is on going deeper into AI, particularly Large Language Models (LLMs) and how these models are developed and used. I'm also revisiting the fundamentals of Neural Networks, the core building blocks that enable AI systems to learn and make predictions. Given the computational demands of these fields, I'm also keen to extend what I previously learned in parallel processing, which plays a crucial role in handling the complex computations involved in AI efficiently. Darrell R. Ulm

Python for Data Science

I have been looking at more online resources for Python for Data Science, and there are many good ones available. The main tools are NumPy, pandas, Matplotlib, and scikit-learn, which has some amazing tools. Kaggle, for instance, has data science contests, but it is good to install a local system like the Jupyter Notebook to speed things up, as the Kaggle editor can lag and take some time to run even on small datasets. The newer DataCamp has some neat tutorials and a simple app for doing daily exercises on your mobile device. Here is the Python Data Science Handbook, which is really useful. A short tutorial, Learn Python for Data Science, is a fun read. A list of cool DataSci tutorials is here, and another covers how to get started with Python for DS. Will add more later.