Tutorial Objective
Let's move our data analysis out of spreadsheet programs and learn Python and Pandas along the way.
Don't get me wrong, I use spreadsheets, but not for data analysis.
Also, the notes folder contains some notes from people I talked to during the conference.
Click a .md file, and GitHub will render the document on the website (like this README.md file you are reading now).
2016-pydata-carolinas-pandas
Material for the Pandas Tutorial at PyData Carolinas 2016
PyData Carolinas 2016
September 14-16, 2016
Hosted by IBM Emerging Technologies
Research Triangle Park, NC
IBM RTP Activity Center 3039 East Cornwallis Road, Building 400 Research Triangle, NC 27709
PyData Schedule
http://pydata.org/carolinas2016/schedule/
Syllabus
Covered in the tutorial
- Pandas DataFrame basics
- Data assembly
- Missing Data
Not covered in the tutorial
- Plotting
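As a rough taste of the three covered topics, here is a minimal sketch. The frames, column names, and values are made up for illustration and are not taken from the tutorial material.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for this sketch.
left = pd.DataFrame({"id": [1, 2], "value": [10.0, np.nan]})
right = pd.DataFrame({"id": [3], "value": [30.0]})

# DataFrame basics: look at the shape and pull out one column.
print(left.shape)
print(left["value"])

# Data assembly: stack the two frames row-wise with a fresh index.
combined = pd.concat([left, right], ignore_index=True)

# Missing data: count the gaps, then fill them with a placeholder value.
n_missing = int(combined["value"].isnull().sum())
filled = combined.fillna({"value": 0.0})

print(combined)
print(n_missing)
print(filled)
```

Filling with 0.0 is only one option; dropping the rows with `dropna()` is another, and which one is appropriate depends on the data.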
Setup
The easiest way to get everything you need for the tutorial is to install Anaconda.
You can download and install it here: https://www.continuum.io/downloads
I will be using the Python 3 version during the tutorial.
I actually ended up using Python 2 because I had a last-minute computer change.
Install seaborn for plotting
conda install seaborn
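To check that the setup worked, something like the following can be run in a Python session. This is only a sketch: the helper name `installed_version` is made up here, and it assumes each package exposes a `__version__` attribute (pandas and seaborn both do).

```python
import importlib


def installed_version(name):
    """Return a package's version string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Fall back to "unknown" for modules without a __version__ attribute.
    return getattr(module, "__version__", "unknown")


# Print the version of each package the tutorial relies on.
for name in ("pandas", "seaborn"):
    print(name, installed_version(name))
```

If either line prints `None`, the package is not visible to the Python interpreter you are running.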
Data
- Gapminder: https://github.com/jennybc/gapminder/raw/master/inst/gapminder.tsv
- Survey: Comes from the Software-Carpentry SQL lesson
- Ebola: www.github.com/cmrivers/ebola
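The Gapminder file is tab-separated, so it can be read with `pandas.read_csv` and `sep="\t"`. The sketch below assumes the column layout shown in the inline sample (`country`, `continent`, `year`, `lifeExp`, `pop`, `gdpPercap`). The two sample rows are an illustrative stand-in so the code runs without network access, and `load_tsv` is a helper name invented for this example.

```python
import io

import pandas as pd

GAPMINDER_URL = "https://github.com/jennybc/gapminder/raw/master/inst/gapminder.tsv"


def load_tsv(source):
    """Read a tab-separated file from a path, URL, or file-like object."""
    return pd.read_csv(source, sep="\t")


# Offline stand-in: assumed column layout with two illustrative rows.
sample = io.StringIO(
    "country\tcontinent\tyear\tlifeExp\tpop\tgdpPercap\n"
    "Afghanistan\tAsia\t1952\t28.801\t8425333\t779.4453145\n"
    "Afghanistan\tAsia\t1957\t30.332\t9240934\t820.8530296\n"
)
df = load_tsv(sample)

print(df.shape)
print(df.dtypes)

# To read the real file instead (needs network access):
# df = load_tsv(GAPMINDER_URL)
```

Passing the URL straight to `read_csv` avoids a separate download step, at the cost of needing a connection every time the code runs.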