Awesome
Güttli's opinionated Python Tips
How to start?
If you are new to Python or to programmming, then my recommendation: Buy a book, switch off your PC. Read.
After you learned the basics, this text might help you.
Avoid
Avoid to write native GUIs (tkinter, gtk, Qt, ...) or native mobile apps.
If you need a GUI, then use HTML.
After the basics
After you learned the basics of Python learn web development:
- http
- html
- css
Javascript is not important. SQL is important.
IDE
I like PyCharm. See My PyCharm Introduction
Testing
PyTest
Use pytest, and if you use Django, then use pytest-django.
Reasons:
assert a == b
is far more easy to read and write thanself.assertEqual(a, b)
.- If your assertion fails, pytest will show you the values. Example:
assert a == b
fails. Then pytest will show you the value ofa
andb
. - The fixture system is really great. This is much more flexible than
setUp()
. In the old unittestsetUp()
method you tend to create things you finally don't need for all tests. This makes the inner edit-test feedbackloop slower. - You can avoid TestCase classes. A simple method starting with
def test_...()
is enough.
pytest -k keyword
is very handy. Just a keyword (or some characters) which are part of the filename or test-function, and only the functions containing this string will get called.
Execute test from your IDE. This way you can jump directly from the nice stacktrace to your beautiful code.
BTW, pytest caching allows you the re-run only the failed tests.
pytest parametrize is handy. It helps you to write concise tests.
TestCase.setUp() should not be used
This tip applies, if you have not switched to pytest fixtures yet.
The method TestCase.setUp() gets called for every test of this TestCase.
Avoid this method. It is very likely that you just waste time since things get done in this method which is not needed for every test of this TestCase.
If possible, please switch to pytest fixtures.
I have seen code where the setUp() method of a class called MinimalFooTestCase
was 660 lines long.
This slows down tests, since most test methods of this class don't need all the stuff that gets created during setUp(). Remember that setUp() gets called before every test__method(). So 660 lines get executed before every test_method().
arrange/act/assert pattern
def test_foo():
# arrange
obj = ...
# act
foo(obj)
# assert
assert obj.bar == baz
Directory/File Layout
Concerning tests, I like this layout:
setup.py
myapp/utils.py
myapp/utils_test.py
myapp/conftest.py
...
conftest.py is for configuring pytest.
This layout follows the LoB (Locality of Behaviour) Prinicple.
I even use a small test which ensures that for every python file, there is a correspondig ..._test.py
file. Of course the is a small exclude list, but
nevertheless this test helps and reminds me to write tests.
pytest-xdist
The pytest-xdist plugin extends pytest with some unique test execution modes:
test run parallelization: if you have multiple CPUs or hosts you can use those for a combined test run. This allows to speed up development or to use special resources of remote machines.
--looponfail: run your tests repeatedly in a subprocess. After each run pytest waits until a file in your project changes and then re-runs the previously failing tests. This is repeated until all tests pass after which again a full run is performed.
Multi-Platform coverage: you can specify different Python interpreters or different platforms and run tests in parallel on all of them.
pytest cuts your output
Sometimes pytests cuts your output, and you don't see what you want to see.
One way to work around this: Write your data to a temporary file:
with open('/tmp/x', 'wt') as fd:
fd.write(json.dumps(data, indent=2))
assert 0
Run your test and then inspect the file /tmp/x
.
But only add this snippet temporarily, since this is vulnurable to a symlink race
Coverage
Coverage is a handy tool to check if most of your code is tested.
If you have a huge code base, and you only care for a small part, you can do this:
# run only tests matching this pattern and collect coverage data:
coverage run -m pytest -k job
# Only create the coverage report for files which match this pattern:
coverage html --include '*job.py'
# Open browser with the created index.html:
run-mailcap htmlcov/index.html
Let tests fail, if coverage is below PERCENT. I use 85 to 95.
pytest --cov-fail-under=PERCENT
Coverage with Context
With Contexts coverage can answer you the question "What test ran this line?"
Output in Tests is cut
The output gets cut by pytest, if it is too long. You want to see the whole data instead of ...
?
You can use a debugger, set a breakpoint and inspect the current state of the local variables.
Or you can help yourself by temporary adding this snippet to your test. For example you want to
see the value of response.content
. Because the tools like IDE provide so many cool features, it is
easily forgotten to use the basics. This creates a file /tmp/o.html
.
with open('/tmp/o.html', 'wb') as f:
f.write(response.content)
Freezegun
FreezeGun is a library that allows your Python tests to travel through time by mocking the datetime module.
@freeze_time("2012-01-14")
def test():
assert datetime.datetime.now() == datetime.datetime(2012, 1, 14)
Raising an Exception in a mock
You want to raise an excepion in a mock. First solution: do raise Exception()
in a lambda, since you
don't want create a new method. Then you realize that this is not possible.
Solution: Mock.side_effect()
Mocking works locally, but not in CI?
Mocking in Python exchanges a name. If you patch foo.utils.my_method
, this might work if you use from foo.utils import my_method
.
It depends which code was run first. If your call to mock.path()
was called before from foo.utils import my_method
, then it works.
But if the import happends before the patch()
, then it does not work.
This means you test works locally, but in CI the test might fail because in CI the import happened in a previous test.
Imagine you import and use my_method()
in caller.py
. The you can patch like this mock.patch('caller.my_method
, ...)`.
If you test calls my_method()
several times from different files, then this won't help. The you need to patch the internals of my_method()
to return the desired result.
Your options:
- use
utils.my_method()
instead ofmy_method()
. Not nice. - Refactor the implementation of
my_method()
so that you patch something inside it, so that it returns the desired result. Not nice. - Patch all places which use
my_method()
during your test. Not nice.
Up to now I know no nice way to solve this.
See Python Docs "Where to patch?"
Type Annotations
For me it feels much more productive to write tests, compared to write type annotations. I don't think type annotations are important. They increase the code size, which means my eyes read more and my brain needs to process more data. This increases the cognitive load. With other words type annotations sometimes decreases the readability.
Web Development
Use Django. Related Django-Tips
Avoid "as" imports
Example:
import datetime as dt
That's possible, but it is confusing. I don't recommend this. If you can type with ten fingers, then typing "datetime" is fast.
Use Virtualenv
Virtualenv is a great tool to get isolated environments. It is very light-weighted and I almost always use it.
I avoid to develop in Docker, virtual machines or Vagrant.
If a database is needed, then I usualy set it up on my local machine.
If the application needs a lot of servers (redis, solr, s3, ...) then I create containers to provide the service. Nevertheless during development my code runs directly on my local machine, not inside a container or VM.
This keeps the inner dev loop of edit-run-test fast.
If your desktop operating system is Windows, then you it might make sense to get Linux via WSL or VirtualBox.
If the application is a web application (for example with Django), I use http
server (like manage.py runserver
) and access the application this way. I don't
set up a https server for development. Serving the application via https is
only needed for the production environment, not for development.
This way I can easily run and debug my code.
I know that some IDEs have plugins to connect to vagrant/docker/ssh, but I avoid this for daily development. I want a fast edit/test loop.
See "How do you develop for the cloud?" in Python Developer Survey: Most people develop locally with virtualenv.
direnv
direnv sets environment variables as soon as you enter a directory with the terminal (cd my-dir
).
I use it to activate the venv without calling . bin/activate
.
Example:
> mkdir my-new-project
> cd my-new-project
> python3 -m venv venv
> open .envrc
Enter this into your .envrc
:
export PATH=$PWD/venv/bin:$PATH
export VIRTUAL_ENV=$PWD/venv
You need to allow the new config once:
> direnv allow
The environment variables get unloaded if you leave the directory, and activated again as soon as you enter the directory.
pre-commit.com
reorder_python_imports (instead of isort)
Automatically format your code
By using Black, you agree to cede control over minutiae of hand-formatting. In return, Black gives you speed, determinism, and freedom from pycodestyle nagging about formatting. You will save time and mental energy for more important matters.
Black makes code review faster by producing the smallest diffs possible. Blackened code looks the same regardless of the project you’re reading. Formatting becomes transparent after a while and you can focus on the content instead.
I use black -S
. The -S
options: Don't normalize string quotes or prefixes. Related Black Docs "Strings". I prefer single quotes, since they are easier to type.
But if you prefer single quotes to double quotes and a way to configure the process, then blue might your tool.
Related: darker
Apply black reformatting to Python files only in regions changed since a given commit.
Related https://github.com/asottile/pyupgrade
Related https://github.com/asottile/reorder_python_imports
Iterators are overrated
For me this code is perfectly fine:
def my_method(...):
ret = []
for foo in ...:
if ...:
continue
...
ret.append(...)
return ret
Of course I could return an iterator instead of plain and boring list. But what do I gain?
I think iterators make things more complicated. One reason for this: You can't loop over the iterator several times.
In general: a list is stateless, an interator is stateful. In most cases the stateless solution is simpler and more mature.
If you need to long list of items, and handling all data in memory does not work any more, then it is maybe time to use a Task Queue. This way you can split your work into small tasks. This gives you much more power than an iterator.
Finally there are two kind of happy developers: Some are happy because they know fancy methods like more_itertools.spy() and some developers are happy because they don't need to these fancy methods.
Avoid map(), filter() and reduce()
Use list- or dict-comprehension instead.
# Example List-Comprehension: Remove items from list where are 0:
old_list = [0, 1, 2, 3, 4, 5, 0]
new_list = [item for item in old_list if item != 0]
# Example: Dict-Comprehension: Remove values which are not True:
old_dict = {'foo': 1, 'bar': 2, 'empty': 0}
new_dict = {k:v for k, v in old_dict.items() if v}
functools.partials()
functools.partial() is cool.
You can create new methods which get additional arguments.
In this example we needed to provide an old interface after refactoring. We remove a lot of code
by creating a general method my_getter()
:
def my_getter(foo, bar, my_model):
...
for foo in ...:
for bar in ...:
setattr(MyModel, foo + '_' + bar, property(functools.partial(my_getter, foo, bar)))
Don't use tox during your inner dev loop.
I think tox is a tool which should get used during CI.
During your inner dev loop (edit, test, edit, test, ...) I think the additional virtualenv in .tox
confuses
more than it brings you value.
Unicode Symbols
Often you can avoid fancy SVG/PNG icons. You can use the unicode symbols: For example \N{Lock}
🔒
I like classmethod
My rule of thumb: If a method of a class does not need the variable "self", then I use @classmethod
. I never use @staticmethod
.
This makes my life easier (reduces cognitive load), since I don't need to think about "a vs b".
Related article: https://medium.com/school-of-code/classmethod-vs-staticmethod-in-python-8fe63efb1797
Detect confusable Unicode Characters.
There a many confusable Unicode characters
To detect them you can use this:
>>> 'TEST_DАТА_MANAGEMENT_ACCOUNT'.encode()
b'TEST_D\xd0\x90\xd0\xa2\xd0\x90_MANAGEMENT_ACCOUNT'
More details:
>>> from unicodedata import name
>>> for char in 'TEST_DАТА_MANAGEMENT_ACCOUNT':
... print(name(char))
...
LATIN CAPITAL LETTER T
LATIN CAPITAL LETTER E
LATIN CAPITAL LETTER S
LATIN CAPITAL LETTER T
LOW LINE
LATIN CAPITAL LETTER D
CYRILLIC CAPITAL LETTER A
CYRILLIC CAPITAL LETTER TE
CYRILLIC CAPITAL LETTER A
LOW LINE
...
HTML sanitizing library
Parsing and Changing HTML
BeautifulSoup, which supports CSS Selectors via SoupSieve
Statistical Profiling
pathlib
pathlib offers classes representing filesystem paths with semantics appropriate for different operating systems.
I don't use it.
Lately I don't play around with file paths that much.
In the past it was different. But dealing with files is becoming less and less important to me (and in general).
I store data in a database, not in files.
Packaging
Please follow the official and maintained guide: Packaging Projects
If you use google to find a packaging guide, then you might read outdated and not maintained blog articles.
Tracing
Standard library module trace:
python -m trace --trace --ignore-dir=/usr:$VIRTUAL_ENV/lib/ your-script.py
hunter has a cool and simple domain language to filter the lines you want to log.
Time Travel Debugging PyTrace
PySnooper Like set -x
in the Bash Shell. Or snoop
Tracing Python Code with settrace
Stacktraces are beautiful
I have seen code, where the developer tried to provide a simple error message without a stracktrace:
if not os.path.exists(some_file):
print(f'{some_file} does not exist.')
sys.exit(1)
with open(some_file) as f:
...
Most people will prefer this short message to a traceback.
If you get an error message like "foo.yaml does not exist", and you are responsible for fixing this, then you love stracktraces. Imagine the code contains 8 places where the above error messages gets created, then things are getting complicated. Which place created the error message?
With a stacktrace a developer can find the root cause much easier.
My point of view: embrace stacktraces. Their are beautiful, since they help you fix issues.
Side effect: Less code. In above example the first three lines (if ... sys.exit(1)
) are not needed. The
open()
call will raise an exception if the file is missing.
Of course it depends on the use-case. If it is very likely that the file exists, then above way is ok.
If it is likely that the file does not exist, then it might make sense to provide a short message and abort.
Async http client
I recommend aiohttp. Unfortunately there are many old and unmaintained async http solutions. AFAIK aiohttp is the best solution today.
Avoid to modify sys.path
Don't fiddle with sys.path or PYTHONPATH. It is not needed, if you use the common patterns.
Docker
If you use pip
in a Dockerfile, pip
downloads files from the internet again and again
if you build the container several times. The usual cache method does not work.
Here is a solution how to provide a cache to pip running in a Dockerfile: Using a pip cache directory in docker builds
Plugin System
https://pluggy.readthedocs.io/en/latest/
Poetry vs Pipenv vs pip-tools
If unsure take setup.cfg and pip. If you need more take pip-tools.
Why Anthony Sottile does not use Python-Poetry
Create setup.cfg from setup.py
upgrade a setup.py to declarative metadata
Using Python versions, which are not available for your operating system
If you want to test your software on a Python version which is not available for your operating system, you can use pyenv to get the right version.
Example: You are running Ubuntu 20.04 which ships with Python 3.8, but you want to test your code with Python 3.10.
Drawbacks
Not as fast as C/Golang/Rust
Python is not made for hyperscaling. It uses too much ressources. You can server several hundret http requests per second, but if you need to server several thousand requests per second, then you might get troubles. But who cares? If your product is very successful you can hire developers to rewrite critical parts in a more performant language.
Magic underscore to dash replacement
Somewhere underscores get changed to dashes, if you install with pip install -e ...
.
I want to automate stuff. I want repo "foo_bar" to be in src/foo_bar
, not src/foo-bar
.
Related pip -e: magic underscore to dash replacement
In sum I already wasted several hours because of this strange "feature".
No TemplateLiterals (like in JS)
...
No auto-escaping html templates in stdlib
Go has a very powerful html templates: html/template
Don't use random.seed()
use
my_random = random.Random()
my_random.seed(...)
This way, you don't rely on global state.
Show dependencies
It is sad, or even hurts. You can't see the dependencies of a pypi package. See Why PyPI Doesn't Know Your Projects Dependencies.
But you can list the dependencies after installing it with pipdeptree.