awesome-ApacheSpark-collections or Awesome Spark
A bookkeeping list of Apache Spark web search results.
Also a curated list of awesome Apache Spark packages and resources.
Other GitHub awesome links
Online Free Clusters
Notebooks and IDEs
- Apache Zeppelin - Web-based notebook that enables interactive data analytics with pluggable backends, integrated plotting, and extensive Spark support out of the box.
- Spark Notebook - Scalable and stable Scala and Spark focused notebook bridging the gap between JVM and Data Scientists (incl. extendable, typesafe and reactive charts).
- sparkmagic - Jupyter magics and kernels for working interactively with remote Spark clusters through Livy.
Books on Apache Spark
- Spark Internals
- Mastering Apache Spark 2.0
- Jacek Laskowski's GitBooks
- RDD Collection API Examples
- Databricks Knowledge Base GitBook
- Spark in Action - https://github.com/spark-in-action
- Learning Spark: Lightning-Fast Big Data Analysis (2015) by Holden Karau, Andy Konwinski, Matei Zaharia, Patrick Wendell - https://github.com/databricks/learning-spark
- Fast Data Processing with Spark by Holden Karau
- Advanced Analytics with Spark: Patterns for Learning from Data at Scale by Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills - https://github.com/sryza/aas
- Machine Learning with Spark by Nick Pentreath
- Apache Spark Graph Processing by Rindra Ramamonjison
- Big Data Analytics with Spark - A Practitioner’s Guide to Using Spark for Large Scale Data Analysis, by Mohammed Guller
- Mastering Apache Spark by Packt
- Apache Spark 2 for Beginners - https://github.com/PacktPublishing/Apache-Spark-2-for-Beginners
Blogs
Must Read list
Introduction
- Exploring Stack Overflow Dataset
Spark + Hadoop
Spark Internals
SparkSQL
Streaming
Spark on GPU / DeepLearning
- https://databricks.com/blog/2016/10/27/gpu-acceleration-in-databricks.html
- https://databricks.com/blog/2016/12/21/deep-learning-on-databricks.html
- https://docs.databricks.com/applications/deep-learning/tensorflow.html
Tips & Tricks
- http://blog.smaato.com/tuning-spark-streaming-applications/
Spark Packages
- http://spark-packages.org/
- File I/O CSV
- Neo4j: [Neo4j-Spark Connector](https://github.com/neo4j-contrib/neo4j-spark-connector)
Videos on Apache Spark
Channels
- https://www.youtube.com/user/TheApacheSpark
- https://www.youtube.com/channel/UC3q8O3Bh2Le8Rj1-Q-_UUbA
- https://www.youtube.com/user/BerkeleyAMPLab
- https://www.youtube.com/user/typesafehub
Playlists
- Apache Spark and Scala
- Popular Apache Spark & Machine learning videos
- Popular Videos - Apache Spark & Tutorial
- Popular Videos - Apache Spark
GitHub Projects - Ever-Growing List!
Setup
- https://github.com/clearstorydata-cookbooks/apache_spark
- https://github.com/gwik/spark-cookbook
- https://github.com/azavea/ansible-spark
- https://github.com/tzolov/apache-spark-build-pipeline
- https://github.com/aur-atomica-net/apache-spark
- https://github.com/GELOG/docker-ubuntu-spark
- https://github.com/kbastani/spark-neo4j
Spark Internals
Spark Learning/Workshop
- https://github.com/Mageswaran1989/aja
- https://github.com/deanwampler/spark-workshop
- https://github.com/ceteri/spark-exercises
- https://github.com/lenards/explore-spark
- https://github.com/seglo/learning-spark
- https://github.com/ceteri/intro_spark
- https://github.com/HadoopTW/CS100.1x
- https://github.com/EvanZ/myvagrant
- https://github.com/zfz/spark-cs100.1x
- https://github.com/StephenHarrington/spark
- https://github.com/gudiseva/Spark
- https://github.com/hoangtamvo/spark
- https://github.com/okaram/spark
- https://github.com/linshiu/spark
- https://github.com/jingjinggu/Apache_Spark
- https://github.com/dhesse/SparkTalk
- https://github.com/adamliesko/bigdata-spark
- https://github.com/skrusche63/spark-connect
- https://github.com/spirom/LearningSpark
Spark
- https://github.com/hohonuuli/sparknotebook
- https://github.com/googlegenomics/spark-examples
- https://github.com/sujee81/SparkApps
- https://github.com/praveensripati/spark-examples
- https://github.com/jdutton/spark-playground
- https://github.com/arjones/spark-news
- https://github.com/felixcheung/spark-notebook-examples
- https://github.com/manku-timma/spark
- https://github.com/joseratts/Spark
- https://github.com/giocode/SparkTutorial
- https://github.com/eenov8/apacheSpark
- https://github.com/yu-iskw/spark-dataframe-introduction
- https://github.com/rajanpupa/ApacheSparkExample
- https://github.com/XD-DENG/Spark-practice
Streaming
- https://github.com/prabeesh/SparkTwitterAnalysis
- https://github.com/cotdp/spark-example-clickstream-social
- https://github.com/ippontech/metrics-spark-receiver
- https://github.com/aleph-w/ApacheSparkLearning
SQL
MLlib
- https://github.com/OndraFiedler/spark-recommender
- https://github.com/marklit/recommend
- https://github.com/staple/spark-agd
- https://github.com/tizfa/sparkboost
- https://github.com/rahmanusta/Spark-Bayes
- https://github.com/spacedotworks/decisiontree_ApacheSpark
Spark Machine Learning
- https://github.com/PredictionIO/PredictionIO
- https://github.com/BaiGang/spark_multiboost
- https://github.com/alitouka/spark_dbscan
- https://github.com/amplab/keystone
- https://github.com/krasserm/akka-analytics
Spark Streaming
- https://github.com/miguno/kafka-storm-starter
- https://github.com/killrweather/killrweather
- https://github.com/NFLabs/ambari
- https://github.com/rustyrazorblade/killranalytics
Spark + Visualization
Spark + WebServer
Spark + REST
Spark + Cassandra
Spark + NoSQL datastore
- https://github.com/Stratio/deep-spark
- https://github.com/RussellSpitzer/spark-cassandra-csv
- https://github.com/haosdent/spark-hbase
Spark + Elasticsearch
- https://github.com/skrusche63/spark-elastic
- https://github.com/mhausenblas/elsa
- https://github.com/SHSE/spark-es
Spark + Azure + PowerBI
Spark + Genomics
Spark + Ruby
Useful Add-ons
- https://github.com/amplab/spark-indexedrdd
- https://github.com/mrsqueeze/spark-hash
- https://github.com/simplymeasured/phoenix-spark
- https://github.com/calrissian/spark-jetty-server
- https://github.com/cloudera/spark-timeseries
- https://github.com/skrusche63/spark-weblog