Awesome
Workshop: Development and Coding Standards
Fork and clone
As always you should first fork https://github.com/AICoE/workshop-coding-best-practices into your github account.
Click on the Fork
button in the upper right corner.
Then clone your fork to your local machine.
REPLACE <my_account_name> with your account name
git clone git@github.com:<my_account_name>/workshop-coding-best-practices.git
If the above fails, make sure you have setup git correctly, for help see: https://help.github.com/en/articles/testing-your-ssh-connection
Now you can cd workshop-coding-best-practices
into your fork.
Installation and execution the model code
To manage python requirements we're using Pipenv. On Fedora you should be
good with running sudo dnf install pipenv
.
TODO: check if this works and if the interns have rhel CSB?!
Now install all requirements with pipenv install
and enter a shell with the correct python settings: pipenv shell
If some dependencies are missing when you run your code, you might be in another shell without the correct settings.
Again, run pipenv shell
in this directory or always execute your code with pipenv run python model.py
Then you can run the code via python model.py
Clean Code
You'll go through 5 steps of refactoring to make the code a little bit better to understand and read. This is important, because not only you will need to understand your code one month ahead, but also your peers. So Clean Code is a book and a Google Search Term you should remember.
Naming Convention
When you write Python code, you have to name a lot of things: variables, functions, classes, packages, and so on. Choosing sensible names will save you time and energy later. You’ll be able to figure out, from the name, what a certain variable, function, or class represents. You’ll also avoid using inappropriate names that might result in errors that are difficult to debug.
Meaningful names
This is what makes code self-documenting. When you read over your old code, you shouldn’t have to look over every little comment and run every small piece of code to figure out what it all does! The code should roughly read like plain English. This is especially true for variable names, classes, and functions. Those three items should always have names that are self-explanatory. Rather than use a default name like “x” for example, call it “width” or “distance” or whatever the variable is supposed to represent in “read-world” terms. Coding in “real-world” terms will help make your code read in that way
Single Responsibility
One method should be responsible only for one action. If your method does two or three different things at a time then you should consider splitting the functionality of this method into other methods. Such method is easier to understand, scale and reuse in other methods.
10 Tips To Keep Your Code Clean
Keep it DRY
The DRY principle, formulated by Any Hunt and Dave Thomas in The Pragmatic Programmer, is the use of functions, classes and instances to allow you to avoid retyping code that has already been written once. This fundamental principle allows developers to avoid duplication to produce much cleaner code compared to the programmer who uses unnecessary repetition. Optimizing the code is what often separates a great coder from an average one.
Comments
- Always try to explain yourself in code.
- Don't be redundant.
- Don't add obvious noise.
🐍 Zen of Python
What happens if you run the following import in the Python interpreter?
import this
An easter egg shows up - Zen of Python:
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Data Science Perspective
Understand Data
Preview Data
It is always important to scan through the raw data to gain insights like the amount of data you are working with, understand data types or errors to see if any preprocessing is required, decide on how to split the data set for training and prediction etc., Here are some of the commands useful in previewing raw data. Try out the commands and observe what information is displayed.
print(dataset.shape)
print(dataset.dtypes)
print(dataset.head(15))
Descriptive Statistics
Descriptive statistics is a crucial step that helps you understand what the data is trying to tell you. Without descriptive statistics, you may miss valuable information that may eventually lead the model to provide incorrect conclusions. Try out the following command and observe the statistics displayed.
print(dataset.describe())
Correlation
In the model.py code, we have used linear regression to determine the linear relationship between the dependent and independent variable. But how did we know that there was any association between the variables in the data set before deciding on a dependent or independent variable. It is always recommended to use correlation to observe if there exists any association and if it exists, then progress to determining the linear relationship. Try out the following command to see the correlation matrix that shows the magnitude and direction of association between the variables.
print(dataset.corr(method='pearson'))
Model accuracy
Now that we have a model fitted to the data and performed prediction on the test data, it is important to evaluate the quality of this prediction. There are several metrics available to evaluate the accuracy of a predictive model. Try out the following command to view one such metric that shows the average error between the predicted and observed value.
from sklearn import metrics
print(metrics.mean_absolute_error(yPrediction, yTest))
Create a PR
- create a PR to the original Repo
- ask for a review by a peer
🤖 Bots
Project Thoth provides bots which simplify workflow for developers. This repository has already set up a bot which automatically performs tests on the source code each time a new pull request is opened. If you are interested in bot's configuration, take a look at .zuul.yaml configuration file which states two types of jobs performing checks - Coala and PyTest.
PyTest is a framework for building unit-tests for your applications (Python has its own unittest which is often sufficient enough but PyTest provides flexibility, extensibility and better user experience).
🤓 Task 0: Prepare your development environment
Before we run tests, let's clone this repository and prepare our virtual
environment if you haven't done so run the following command (note
the --dev
flag which will install development dependencies):
pipenv install --dev
All tests are available under tests/
directory - to run the test suite, you
can issue the following command:
PYTHONPATH=. pipenv run python3 setup.py test
The PYTHONPATH
directive tells the Python interpreter to consider importing
from local directory to make our codebase importable in tests. Tests are
executed in the virtual environment to guarantee presence of required
development libraries (hence pipenv run ...
).
If the command above looks too long for you, you can run a prepared script
test.sh
which will in fact run the command above:
./test.sh
🤔 Task 1: Let's fix tests!
After running commands above, your can see produced output stating each test
and its success or failure with detailed description on what went wrong. Your
real first task would be to fix test suite in a way it runs correctly (all the
tests pass). As a starting point take a look into tests/test_generic.py
file which will help you understand how the testing framework works:
vim tests/test_generic.py
Now, take a look at failing tests which verify if implementation of Fibonacci
function returns correct results (hint: the actual implementation of fib()
function is implemented correctly, errors are in the test suite).
vim tests/test_module.py
The implementation of fib()
function can be found at:
vim myproject/module.py
😎 Task 3: Open a pull request fixing issues in the test suite and interact with the bot.
Once you fix issues in the test suite, you can open a pull request. See how bots comment to your pull-requests and capture the output of test suite. This helps others verify your changes are correct, and also test changes in a reproducible environment against the current master branch.
Make sure thoth-pytest
is in successful state. You can check results of a run
by clicking on "thoth-pytest", follow link "ara-report", click on "5 Tasks" in
pytest.yaml
row and click on link in "Status column", raw "thoth-pytest : run pytest
" (either successful or failed state).
📝 Some notes...
-
note that each file has it's header stating author and license - this helps you to know who is the module author so you can reach out to her/him directly in case of any issues
-
each file has a brief docstring describing what the module is used for - it helps others and your future self to immediately understand what's the purpose of the module
-
tests follow package/module structure - it helps others and your future self to quickly navigate in the project structure and relate sources
-
names of tests are based on function (the same would apply for method names) and stating what functionality they test so the output of the test suite is human readable
-
at the end of each test run you can see code coverage - how well is your code covered with tests (it does not measure the quality of test suite though!!!)
🧚 Automatic management of package updates
Thoth provides a bot called
Kebechet. It can manage your
dependencies. If you work on a Python project which can benefit from package
updates, feel free to use it (contact one of the thoth-station inhabitants
present in AICoE - Thoth Station
channel on Google Chat).
Relevant info can be also found at Thoth's page.
🐨 Coala - check your sources
Coala is a tool which can automatically check quality of sources - besides Python sources, it can also check other source types - for example YAML files. It comes with plugins called "Bears" which can be configured with paraneters. See .coafile for configuration used in this repository and Coala's homepage for more info.
💾 Code formatting hints...
To automatically format your code, you can use popular tool called black.
You can install it using:
pip3 install black
Now you just run it and it will automatically format source code for you (note: some formatting changes might not be compatible with PEP8 and Coala might complain).:
black . # Directory or path to a file to be formatted (dot for the current directory).
Profiling Code
zak will add stuff