Tutorial Objective

Let's move our data analysis out of spreadsheet programs and learn Python and Pandas along the way.

Don't get me wrong: I use spreadsheets, just not for data analysis.

Also, the notes folder contains some notes from people I talked to during the conference. Click a .md file, and GitHub will render the document on the website (like this README.md file you are reading now).

2016-pydata-carolinas-pandas

Material for the Pandas Tutorial at PyData Carolinas 2016

PyData Carolinas 2016
September 14-16, 2016
Hosted by IBM Emerging Technologies
Research Triangle Park, NC

IBM RTP Activity Center, 3039 East Cornwallis Road, Building 400, Research Triangle, NC 27709

PyData Schedule

http://pydata.org/carolinas2016/schedule/

Syllabus

Covered in the tutorial

  1. Pandas DataFrame basics
  2. Data assembly
  3. Missing Data
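As a rough preview of the three covered topics, here is a minimal sketch. The two small DataFrames and all of their values are made up for illustration and are not part of the tutorial datasets.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (NOT from the tutorial datasets).
left = pd.DataFrame({
    "country": ["A", "B"],
    "year": [2000, 2000],
    "pop": [1.0, np.nan],  # one value is deliberately missing
})
right = pd.DataFrame({
    "country": ["A", "B"],
    "continent": ["X", "Y"],
})

# 1. DataFrame basics: inspect the size and the first rows.
shape = left.shape
first_rows = left.head()

# 2. Data assembly: combine two tables on a shared key column.
merged = left.merge(right, on="country", how="left")

# 3. Missing data: count missing values, then fill them with a default.
n_missing = merged["pop"].isna().sum()
filled = merged["pop"].fillna(0.0)

print(shape)
print(merged)
print(n_missing)
```

Filling with `0.0` is only one option. Whether to fill, drop, or keep missing values depends on the analysis.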

Not covered in the tutorial

  1. Plotting

Setup

The easiest way to get everything you need for the tutorial is to install Anaconda.

You can download and install it here: https://www.continuum.io/downloads

I will be using the Python 3 version during the tutorial.

I actually ended up using Python 2 because I had a last-minute computer change.

Install seaborn for plotting

conda install seaborn
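To confirm the setup worked, one option is a quick check like the sketch below. The helper name `is_installed` is hypothetical. It only tests whether Python can locate each package, without importing it.

```python
import importlib.util


def is_installed(name):
    """Return True if a top-level package called `name` can be found."""
    return importlib.util.find_spec(name) is not None


# Packages this tutorial relies on.
for pkg in ("pandas", "seaborn"):
    status = "installed" if is_installed(pkg) else "missing"
    print(pkg, status)
```

If a package is reported as missing, rerun the matching `conda install` command in the same environment you use to start Python.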

Data

  1. Gapminder: https://github.com/jennybc/gapminder/raw/master/inst/gapminder.tsv
  2. Survey: Comes from the Software-Carpentry SQL lesson
  3. Ebola: www.github.com/cmrivers/ebola
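The Gapminder file is tab-separated, so it can be read with `pandas.read_csv` and `sep="\t"`. The sketch below is an assumption about how you might load it, not necessarily how the tutorial does it. The helper `load_tsv` is a hypothetical name, and the two sample rows are made-up placeholders that only mimic the file's column layout so the example runs without network access.

```python
import io

import pandas as pd

GAPMINDER_URL = "https://github.com/jennybc/gapminder/raw/master/inst/gapminder.tsv"


def load_tsv(source):
    """Read a tab-separated file from a path, URL, or file-like object."""
    return pd.read_csv(source, sep="\t")


# Made-up placeholder rows (NOT real Gapminder values), same column layout.
sample = io.StringIO(
    "country\tcontinent\tyear\tlifeExp\tpop\tgdpPercap\n"
    "Countryland\tAsia\t1952\t30.0\t1000\t500.0\n"
    "Countryland\tAsia\t1957\t32.0\t1200\t550.0\n"
)

df = load_tsv(sample)
print(df.shape)
print(df.dtypes)

# With network access, the real data can be loaded the same way:
# df = load_tsv(GAPMINDER_URL)
```

Downloading the file once and passing the local path to `load_tsv` avoids depending on a network connection during the tutorial.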