
Darrell Ulm About and Comp. Sci.

Algorithms, Parallel Computing, Apache Spark, C/C++, Unity3d / C#, Graphics, Open Source Software, Signal Processing, Assembly Language, PHP, MySQL, Drupal, Software Development

A quick review and overview of the development and major site profiles evaluated; more details are in the blog posts:
  • The micro blog: Twitter
  • Possibly the most complex content management system (CMS): Drupal
  • Git code repository with the best front end, Ruby implementation: GitHub
  • Excellent Apache Spark in the cloud with an amazing notebook-style front end: Databricks
  • Hadoop and more Hadoop, Hortonworks
  • Tumblr is interesting, somewhat like Twitter meets WordPress: Darrell Ulm Tumblr site
  • WordPress is great, has added all kinds of useful enhancements in recent years, and is very widespread: Darrell Ulm WordPress profile
  • Find answers and ask questions about tech and many other useful things, in this case about Drupal: StackExchange Drupal, Darrell Ulm Profile
  • An older site for open source code; I keep some small code exercises on SourceForge, just because SourceForge is there.
  • Evaluated Weebly just to be complete and made a quick Weebly Site Profile to see how it worked. Surprisingly, for a quick, small website, it is fine.
  • The OpenHub profile is on an interesting site that pulls in open source code by user and presents a listing. Not sure how much it is used these days, but it is reminiscent of GitHub in its early days.
  • I had to try out TED profiles; they are pretty clean, and the idea is mainly to like and keep track of TED Talks a user is interested in.
  • Along the same lines, I created a more useful Darrell Ulm GoodReads list for books read and books I am interested in reading or rating. The software is pretty good, with a few interface issues.
  • There are many good presentations at SlideShare, so I had to make an account. Some of the profile linkage does not work, which makes me wonder whether the software is still being updated. Even so, SlideShare is a useful tool for bookmarking slide presentations when researching a topic.
  • The Quora site is an interesting place to have a profile; I made one for Darrell Ulm to learn and answer questions about technologies, software development, computer science, and math.
  • So that I was ready when I needed to work with WordPress, I made a support account for Darrell Ulm, and again WordPress is impressive these days, useful for more enterprise custom sites than ever before, with an active plug-in and theme development community.
  • There are two kinds of WordPress profiles, so among my Darrell Ulm profiles there is a second one, separate from the support profile, for rating modules and similar functions.
  • Have to have a Google+ profile page for Darrell Ulm because, well, it's Google. People seem to be using it, and it could be a useful tool.
  • I designed and developed the Google Books Drupal module, which is here on GitHub; it queries the Google Books API for a search term or ISBN and returns data for use in a Drupal text filter.
  • The CodeProject profile: again, a good site for finding coding standards and tips and tricks on writing software in many languages.
  • This is an outdated link for Kent State University research papers, mine as well as others'. The link is still a decent compilation of the PDF files, up to a certain date, on associative computing (data-parallel computing).
  • At this site is a listing of publications on ResearchGate, a site with possibly the best interface for an online academic publication profile.
  • Made an Etsy Profile for Darrell Ulm, as there is some interesting tech- or nerd-related gadgetry available on Etsy.
  • There is a public blank Trello Profile for Darrell Ulm out there. Odd that it is essentially empty.
  • Another profile: Instagram Profile for Darrell Ulm, and someday I could post something. Apparently that is still OK.
  • Codecademy is a genuinely excellent site, and I've got a Profile for Darrell Ulm there too; it has great tutorials for several popular programming languages and is worth checking out.
  • A new page popped up called Libraries.IO Github, and it appears to be an automated overview of GitHub users with a short list of code contributed.
  • The site has a nice Computer Science Bibliography which has been around for some time and is pretty accurate as far as the data goes.
  • The Mozilla project has a profile for plug-in developers here; it's honed down to a simple setup, and this is where you can post developer Firefox plug-ins.
  • As for a gamified developer profile, Microsoft has one, i.e. Microsoft MSDN, and it has some gamified elements that other developer profiles are starting to show. The whole idea of goal setting for development is interesting.
  • As with most of these profiles, I have been evaluating my TopCoder Profile, and this one is pretty great. When things are less intense with projects, I need to try out a couple of contests.
  • And a Pinterest profile, mostly of handmade ceramic tiles.
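The Google Books Drupal module mentioned above essentially builds a query against the public Google Books API volumes endpoint for a search term or an ISBN. A minimal sketch of that lookup in Python (the helper name and structure here are hypothetical, for illustration, not the module's actual code):

```python
# Hypothetical helper mirroring what the module does: build a Google Books
# API "volumes" query URL for either a free-text search term or an ISBN.
from urllib.parse import quote

API_BASE = "https://www.googleapis.com/books/v1/volumes?q="

def books_query_url(term=None, isbn=None):
    """Return the API URL for an ISBN lookup or a free-text search."""
    if isbn is not None:
        return API_BASE + "isbn:" + quote(isbn)
    if term is not None:
        return API_BASE + quote(term)
    raise ValueError("need a search term or an ISBN")

url = books_query_url(isbn="9780131103627")
# url == "https://www.googleapis.com/books/v1/volumes?q=isbn:9780131103627"
```

Fetching that URL returns JSON with a list of matching volumes, from which fields like title and authors can be pulled into the Drupal text filter's output.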

Popular posts from this blog

Scala Version of Approximation Algorithm for Knapsack Problem for Apache Spark

This is the Scala version of the approximation algorithm for the knapsack problem using Apache Spark.

I ran this on a local setup, so it may require modification if you are using something like a Databricks environment. You will also likely need to set up your Scala environment.

All the code for this is on GitHub.

First, let's import all the libraries we need.

import org.apache.spark._
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

We'll define this object knapsack; although it could be named more specifically for what it does, it's good enough for this simple test.

object knapsack {
Next we define the knapsack approximation function, which expects a DataFrame with the profits and weights, as well as W, the total weight capacity.

def knapsackApprox(knapsackDF: DataFrame, W: Double): DataFrame = {
Calculate t…

A way to Merge Columns of DataFrames in Spark with no Common Column Key

I made a post at the Databricks forum, thinking about how to take two DataFrames with the same number of rows and merge all columns into one DataFrame. This is straightforward: we can use the monotonically_increasing_id() function to assign the same sequence of unique IDs to the rows of each DataFrame. Ideally we would pad the DataFrame with fewer rows with null rows so the counts match, although the code below does not do this.

Once the IDs are added, a DataFrame join will merge all the columns into one Dataframe.

# For two DataFrames that have the same number of rows, merge all columns, row by row.
# Get the function monotonically_increasing_id so we can assign ids to each row, when the
# DataFrames have the same number of rows.
from pyspark.sql.functions import monotonically_increasing_id

# Create some test data with 3 and 4 columns.
df1 = sqlContext.createDataFrame([("foo", "bar","too","aaa"), ("bar", "bar","aa…
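The mechanics of the id-and-join approach can be sketched without Spark at all, in plain Python, which makes the idea clear (the row data here is made up for illustration):

```python
# Plain-Python sketch of the same idea: give each row of two equal-length
# tables the same sequential id, then join the rows on that id, mirroring
# monotonically_increasing_id() followed by a DataFrame join.
# (Row contents here are hypothetical, for illustration only.)

def merge_rows(rows1, rows2):
    # Assign ids with enumerate, one id per row in each table.
    ids1 = dict(enumerate(rows1))
    ids2 = dict(enumerate(rows2))
    # Inner join on the shared id, concatenating the column tuples.
    return [ids1[i] + ids2[i] for i in sorted(ids1) if i in ids2]

df1 = [("foo", 1), ("bar", 2)]
df2 = [("x", 10.0), ("y", 20.0)]
merged = merge_rows(df1, df2)
# merged == [("foo", 1, "x", 10.0), ("bar", 2, "y", 20.0)]
```

As in the Spark version, rows beyond the length of the shorter table are simply dropped by the inner join rather than padded with nulls.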

Apache Spark Knapsack Approximation Algorithm in Python

The code shown below computes an approximation algorithm, a greedy heuristic, for the 0-1 knapsack problem in Apache Spark. Having worked a good amount with parallel dynamic programming algorithms, I wanted to see what this would look like in Spark.

The GitHub code repo for the knapsack approximation algorithms is here, and it includes a Scala solution. Work on a Java version is in progress at the time of this writing.

Below is the code that computes a solution fitting within the knapsack capacity W for a set of items, each with its own weight and profit value. We aim to maximize the sum of the selected items' profits while not exceeding the total weight W.

First we import some Spark libraries into Python.

# Knapsack 0-1 function weights, values and size-capacity.
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit
from pyspark.sql.functions import col
from pyspark.sql.functions import sum
Now define the function, which will take a Spark Dataframe w…
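For reference, the greedy heuristic itself, independent of Spark, can be sketched in a few lines of plain Python: sort items by profit/weight ratio and take each item whole while it still fits. The function and item names here are illustrative, not the post's actual Spark code:

```python
# Greedy 0-1 knapsack heuristic: rank items by profit/weight ratio
# (descending, weights assumed positive), then take each item whole
# if it still fits within capacity W. This is an approximation; it
# does not always find the optimal 0-1 solution.

def knapsack_greedy(items, W):
    """items: list of (name, weight, profit) tuples; W: weight capacity."""
    ranked = sorted(items, key=lambda it: it[2] / it[1], reverse=True)
    taken, total_weight, total_profit = [], 0.0, 0.0
    for name, weight, profit in ranked:
        if total_weight + weight <= W:
            taken.append(name)
            total_weight += weight
            total_profit += profit
    return taken, total_profit

items = [("a", 2, 10), ("b", 3, 9), ("c", 5, 11)]
taken, profit = knapsack_greedy(items, 5)
# taken == ["a", "b"], profit == 19.0
```

The Spark versions distribute the same idea: the ratio is computed as a DataFrame column, the items are sorted by it, and a running weight determines which rows are selected.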