In this workshop we will explain what Spark is and how it works. We will then cover the basics of running Spark on your own computer and programming it through an IPython notebook. We will write some Python to perform text mining with Spark on a small data set. At the end of the workshop, we will run the same code on a cluster with a much larger data set to show how Spark parallelizes and distributes computation.
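
To give a flavor of the kind of code involved, here is a minimal sketch of a word count, a common first step in text mining. It is not the workshop's actual material: the function names and the sample lines are made up for illustration, and `sc` is assumed to be a SparkContext that your notebook already provides. The plain-Python version runs anywhere; the Spark version expresses the same computation as RDD transformations and only needs Spark when it is called.

```python
import re
from collections import Counter


def tokenize(line):
    """Lowercase a line and split it into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", line.lower())


def count_words_local(lines):
    """Single-machine reference: count words with plain Python."""
    return Counter(word for line in lines for word in tokenize(line))


def count_words_spark(sc, lines):
    """Same computation as Spark RDD transformations.

    `sc` is assumed to be an existing pyspark SparkContext.
    """
    return dict(
        sc.parallelize(lines)                # distribute the lines across partitions
        .flatMap(tokenize)                   # one record per word
        .map(lambda word: (word, 1))         # pair each word with a count of 1
        .reduceByKey(lambda a, b: a + b)     # sum the counts for each word
        .collect()                           # bring the results back to the driver
    )


# Hypothetical sample data, standing in for a small text data set.
sample = ["Spark makes text mining simple", "Text mining with Spark"]
local_counts = count_words_local(sample)
print(local_counts.most_common(3))
```

With a running Spark installation, `count_words_spark(sc, sample)` should give the same counts as `count_words_local(sample)`. Only the first call in the chain would change on a cluster, for example by reading lines from a distributed file instead of a Python list.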

Software Configuration

Please try to set up Spark on your computer before the workshop by following these instructions: . A web-based Spark environment will be available during the workshop in case Spark does not run on your computer.