IEEE Xplore Digital Library Search on author Darrell Ulm

A search for IEEE Xplore entries of research papers in parallel computing for Darrell Ulm.



The GitHub code repository for the knapsack approximation algorithms is here, and it includes a Scala solution. Work on a Java version is in progress at the time of this writing.

Below is the code that computes a solution fitting within the knapsack capacity W for a set of items, each with its own weight and profit value. We aim to maximize the total profit of the selected items while not exceeding the total allowed weight, W.
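Before looking at the Spark version, the underlying idea can be shown in a few lines of plain Python. This is an illustrative greedy ratio approximation (not the repo's actual implementation): sort items by profit-per-weight and take each item that still fits. The function name and example data here are made up for the sketch.

```python
# Greedy ratio approximation for the 0/1 knapsack: sort items by
# profit density (profit / weight) and take each item while it fits.
def knapsack_greedy(weights, profits, W):
    # Pair items and sort by profit density, best first.
    items = sorted(zip(weights, profits), key=lambda wp: wp[1] / wp[0], reverse=True)
    total_weight = total_profit = 0
    taken = []
    for w, p in items:
        if total_weight + w <= W:
            taken.append((w, p))
            total_weight += w
            total_profit += p
    return total_profit, taken

# Example: capacity W = 10; items (3, 50) and (4, 40) are selected.
print(knapsack_greedy([5, 4, 6, 3], [10, 40, 30, 50], 10))  # → (90, [(3, 50), (4, 40)])
```

Greedy selection by density does not always find the optimum, but it is fast and is the usual starting point for knapsack approximation schemes.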

First we import some Spark libraries into Python.

# Knapsack 0-1 function weights, values and size-capacity.
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit
from pyspark.sql.functions import col
from pyspark.sql.functions import sum

Now define the function, which will take a Spark DataFrame w…
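Since the function body is truncated above, here is a plain-Python sketch of the column-style pipeline a DataFrame version would typically run: add a value/weight ratio column, sort descending by ratio, take a running sum of the weights, and keep rows whose running sum stays within W. The function name and the column names `item`, `weight`, and `value` are assumptions for illustration, not the repo's actual schema.

```python
from itertools import accumulate

# Sketch of a DataFrame-style knapsack approximation pipeline:
# 1. rank rows by value/weight ratio, descending,
# 2. compute a running (cumulative) sum of weights,
# 3. keep the prefix of rows whose cumulative weight is <= W.
def knapsack_approx(rows, W):
    ranked = sorted(rows, key=lambda r: r["value"] / r["weight"], reverse=True)
    running = accumulate(r["weight"] for r in ranked)
    return [r for r, cum_w in zip(ranked, running) if cum_w <= W]

rows = [
    {"item": "a", "weight": 2, "value": 3},
    {"item": "b", "weight": 3, "value": 9},
    {"item": "c", "weight": 4, "value": 4},
]
# Capacity 5: "b" (ratio 3.0) then "a" (ratio 1.5) fit; "c" does not.
print(knapsack_approx(rows, 5))
```

The cumulative-sum filter maps cleanly onto Spark primitives (a window-function running sum followed by a filter on `col(...)`), which is why this shape is common in Spark knapsack examples.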

I ran this on a local setup, so it may require modification if you are using something like a Databricks environment. You will also likely need to set up your Scala environment.

All the code for this is on GitHub.

First, let's import all the libraries we need.

Again, we'll define the knapsack approximation algorithm, expecting a DataFrame with the profits and weights, as well as W, the total weight capacity.

Calculate t…

While Spark can take some effort to set up on actual clusters and does carry overhead, I expect these costs to shrink over time as Spark becomes more and more efficient.

I have started a GitHub repo for Spark snippets, if any are of interest, as Apache Spark moves forward 'in parallel' with HDFS (the Hadoop Distributed File System).