DAT8 Course Repository

Course materials for General Assembly's Data Science course in Washington, DC (8/18/15 - 10/29/15).

Instructor: Kevin Markham (Data School blog, email newsletter, YouTube channel)

| Tuesday | Thursday |
| --- | --- |
| 8/18: Introduction to Data Science | 8/20: Command Line, Version Control |
| 8/25: Data Reading and Cleaning | 8/27: Exploratory Data Analysis |
| 9/1: Visualization | 9/3: Machine Learning |
| 9/8: Getting Data | 9/10: K-Nearest Neighbors |
| 9/15: Basic Model Evaluation | 9/17: Linear Regression |
| 9/22: First Project Presentation | 9/24: Logistic Regression |
| 9/29: Advanced Model Evaluation | 10/1: Naive Bayes and Text Data |
| 10/6: Natural Language Processing | 10/8: Kaggle Competition |
| 10/13: Decision Trees | 10/15: Ensembling |
| 10/20: Advanced scikit-learn, Clustering | 10/22: Regularization, Regex |
| 10/27: Course Review | 10/29: Final Project Presentation |

<!--
### Before the Course Begins
* Install [Git](http://git-scm.com/downloads).
* Create an account on the [GitHub](https://github.com/) website.
    * It is not necessary to download "GitHub for Windows" or "GitHub for Mac"
* Install the [Anaconda distribution](http://continuum.io/downloads) of Python 2.7x.
    * If you choose not to use Anaconda, here is a list of the [Python packages](other/python_packages.md) you will need to install during the course.
* We would like to check the setup of your laptop before the course begins:
    * You can have your laptop checked before the intermediate Python workshop on Tuesday 8/11 (5:30pm-6:30pm), at the [15th & K Starbucks](http://www.yelp.com/biz/starbucks-washington-15) on Saturday 8/15 (1pm-3pm), or before class on Tuesday 8/18 (5:30pm-6:30pm).
    * Alternatively, you can walk through the [setup checklist](other/setup_checklist.md) yourself.
* Once you receive an email invitation from Slack, join our "DAT8 team" and add your photo.
* Practice Python using the resources below.
-->

Python Resources

<!--
### Submission Forms
* [Feedback form](http://bit.ly/dat8feedback)
* [Homework and project submissions](http://bit.ly/dat8homework)
-->

Course project

Comparison of machine learning models

Comparison of model evaluation procedures and metrics

Advice for getting better at data science

Additional resources


Class 1: Introduction to Data Science

Homework:

Resources:


Class 2: Command Line and Version Control

Homework:

Git and Markdown Resources:

Command Line Resources:


Class 3: Data Reading and Cleaning

Homework:

Resources:


Class 4: Exploratory Data Analysis

Homework:

Resources:


Class 5: Visualization

Homework:

Pandas Resources:

Visualization Resources:


Class 6: Machine Learning

Homework:

Machine Learning Resources:

IPython Notebook Resources:


Class 7: Getting Data

Homework:

API Resources:

Web Scraping Resources:


Class 8: K-Nearest Neighbors

Homework:

KNN Resources:

Seaborn Resources:


Class 9: Basic Model Evaluation

Homework:

Model Evaluation Resources:

Reproducibility Resources:


Class 10: Linear Regression

Homework:

Linear Regression Resources:

Other Resources:


Class 11: First Project Presentation

Homework:


Class 12: Logistic Regression

Homework:

Logistic Regression Resources:

Confusion Matrix Resources:


Class 13: Advanced Model Evaluation

Homework:

ROC Resources:

Cross-Validation Resources:

Other Resources:


Class 14: Naive Bayes and Text Data

Homework:

Resources:


Class 15: Natural Language Processing

Homework:

NLP Resources:


Class 16: Kaggle Competition

Homework:

Resources:


Class 17: Decision Trees

Homework:

Resources:


Class 18: Ensembling

Resources:


Class 19: Advanced scikit-learn and Clustering

Homework:

scikit-learn Resources:

Clustering Resources:


Class 20: Regularization and Regular Expressions

Homework:

Regularization Resources:

Regular Expressions Resources:


Class 21: Course Review and Final Project Presentation

Resources:


Class 22: Final Project Presentation


Additional Resources

Tidy Data

Databases and SQL

Recommendation Systems