# Apache Spark 2 for Beginners

This is the code repository for Apache Spark 2 for Beginners, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.

## Instructions and Navigation

All of the code is organized into folders. Each folder starts with a number followed by the application name. For example, Chapter02.

## Software and Hardware List

| Chapter number | Software required (with version) | Free/Proprietary | If proprietary, can code testing be performed using a trial version | If proprietary, cost of the software | Download links to the software | Hardware specifications | OS required |
| --- | --- | --- | --- | --- | --- | --- | --- |
| All | Apache Spark 2.0.0 | Free | NA | NA | http://spark.apache.org/downloads.html | x86 | UNIX or Mac OS X |
| 6 | Apache Kafka 0.9.0.0 | Free | NA | NA | http://kafka.apache.org/downloads.html | x86 | UNIX or Mac OS X |

## Detailed installation steps (software-wise)

The following steps prepare the system environment for testing the code in the book.

### 1. Apache Spark

a. Download the Spark version mentioned in the table.<br>
b. Build Spark from source, or use the binary download, following the detailed instructions given at http://spark.apache.org/docs/latest/building-spark.html.<br>
c. If building Spark from source, make sure that the R profile is also built; the instructions for that are given in the link in step b.<br>

### 2. Apache Kafka

a. Download the Kafka version mentioned in the table.<br>
b. The "Quick Start" section of the Kafka documentation gives the instructions to set up Kafka: http://kafka.apache.org/documentation.html#quickstart<br>
c. Apart from the installation instructions, topic creation and the other Kafka setup prerequisites are covered in detail in the relevant chapter of the book.<br>
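The Kafka quickstart steps referenced in (b) and (c) boil down to a handful of commands. The sketch below assumes the commands are run from the root of an unpacked Kafka 0.9.0.0 download with its default configuration files; the topic name `spark-topic` is an illustration, not a name from the book, so substitute whatever topic the chapter specifies.

```shell
# Start ZooKeeper (ships with Kafka), then start the Kafka broker
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &

# Create a topic for the streaming examples
# (the topic name "spark-topic" is hypothetical; use the name from the chapter)
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic spark-topic

# Produce a few test messages from the command line;
# type messages and press Enter, Ctrl-C to stop
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic spark-topic
```

These are environment-setup commands against a locally running broker, so run them interactively rather than as a script.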

The code will look like the following:

```
Python 3.5.0 (v3.5.0:374f501f4567, Sep 12 2015, 11:00:19)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
```

Spark 2.0.0 or above must be installed, at least on a standalone machine, to run the code samples and to explore the subject further. For the Spark stream processing examples, Kafka needs to be installed and configured as the message broker, with its command-line producer producing messages and the application developed using Spark acting as the consumer of those messages.
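As a sketch of that producer/broker/consumer arrangement, the following hypothetical PySpark Streaming application consumes messages from Kafka and counts the words in each batch. It is not code from the book: the topic name, broker address, and batch interval are assumptions, and it requires the `spark-streaming-kafka-0-8` package on the classpath when submitted with `spark-submit`.

```python
# Minimal Kafka-consumer sketch using Spark Streaming (DStream API, Spark 2.0).
# Assumptions: Kafka broker on localhost:9092, topic "spark-topic",
# 10-second micro-batches. Adjust all three to match your setup.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="KafkaWordCount")
ssc = StreamingContext(sc, 10)  # 10-second batch interval

# Connect directly to the broker; each record is a (key, value) pair
stream = KafkaUtils.createDirectStream(
    ssc, ["spark-topic"], {"metadata.broker.list": "localhost:9092"})

# Count words across the messages in each batch
counts = (stream.map(lambda kv: kv[1])
                .flatMap(lambda line: line.split(" "))
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()
```

A submission command along these lines would pull in the Kafka connector (the artifact coordinates are for Spark 2.0.0 built against Scala 2.11):

```shell
spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0 app.py
```

Messages typed into the console producer from the Kafka setup should then appear as word counts in the application's output every batch interval.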

## Related Products