Lecture 6: Engineering tools

December, 2021 - François HU

Master of Science - EPITA

This lecture is available here: https://curiousml.github.io/


Table of contents

  1. Code robustness
  2. Code quality
  3. [optional] Tools and project management

1. Code robustness

  • Exceptions
  • Unit tests
  • Profiling
  • [optional] Deployment with ansible

Exceptions: definition

An exception is an object that indicates that the program cannot continue its normal execution. The type of the exception indicates the kind of error encountered, and it is usually accompanied by a more detailed message. All built-in exceptions inherit from BaseException, and all except the system-exiting ones (such as KeyboardInterrupt and SystemExit) inherit from Exception.

In [15]:
def division(a, b):
    return(a/b)

division(1, 0)
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
/var/folders/55/c2kg8h2d2wnfzx5wnypmt8kr0000gn/T/ipykernel_1123/719298754.py in <module>
      2     return(a/b)
      3 
----> 4 division(1, 0)

/var/folders/55/c2kg8h2d2wnfzx5wnypmt8kr0000gn/T/ipykernel_1123/719298754.py in division(a, b)
      1 def division(a, b):
----> 2     return(a/b)
      3 
      4 division(1, 0)

ZeroDivisionError: division by zero
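
As a quick check of where ZeroDivisionError sits in the exception hierarchy, the sketch below walks its parent classes using only built-in names. The helper `ancestors` is a hypothetical name introduced for illustration.

```python
def ancestors(exc_type):
    """Return the names of the classes exc_type inherits from, nearest first."""
    return [cls.__name__ for cls in exc_type.__mro__]

print(ancestors(ZeroDivisionError))
# ['ZeroDivisionError', 'ArithmeticError', 'Exception', 'BaseException', 'object']

# KeyboardInterrupt derives from BaseException but not from Exception,
# which is why "except Exception" does not intercept Ctrl+C
print(issubclass(KeyboardInterrupt, Exception))   # False
```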

Exceptions: catch the error

In order to catch the error, we insert the code likely to produce an error between the try and except keywords:

try:
    # ... the program
except:
    # ... what to do in case of an error
else:
    # ... OPTIONAL: what to do when no errors appear
In [16]:
try:
    division(1, 0)
except:
    print("we have an error")
else:
    print("no errors")
we have an error
In [19]:
try:
    print("hello !")
    print(division(1, 2))
    print(division(3, 4))
    print(division(2, 0))
    print(division(1, 10))
except ZeroDivisionError:
    print("division by zero")
except Exception as exc:
    print("unexpected error:", exc.__class__)
    print("message ", exc)
hello !
0.5
0.75
division by zero
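
The clauses can be combined in a single function. The sketch below is only an illustration: `safe_division` is a hypothetical helper, and the `finally` clause, which is not covered above, is an optional clause that runs whether or not an error occurred.

```python
def safe_division(a, b):
    """Return a / b, or None when b is zero (hypothetical helper)."""
    try:
        result = a / b
    except ZeroDivisionError as exc:
        # only the error we know how to handle is caught
        print("cannot divide:", exc)
        return None
    else:
        # runs only when the try block raised nothing
        return result
    finally:
        # runs in every case, even when a return was reached above
        print("division attempted with", a, "and", b)

print(safe_division(1, 2))
print(safe_division(1, 0))
```

Catching a specific exception type rather than using a bare `except:` keeps unrelated errors (a typo in a variable name, for instance) visible.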

Exceptions: raise an error

In [20]:
def division(a, b):
    if b == 0:
        raise ValueError
    return(a/b)

try:
    division(3, 0)  # error
except ValueError:
    print("error type: ValueError")
error type: ValueError
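
An exception can also carry a message and have its own type. The following is a minimal sketch under stated assumptions: `InvalidDenominatorError` is a hypothetical class invented for this example, not a built-in.

```python
class InvalidDenominatorError(ValueError):
    """Hypothetical custom exception; subclassing ValueError keeps
    existing 'except ValueError' handlers working."""

def division(a, b):
    if b == 0:
        # the message is available to the caller through str(exc)
        raise InvalidDenominatorError(f"cannot divide {a} by zero")
    return a / b

try:
    division(3, 0)
except ValueError as exc:
    print("error type:", type(exc).__name__)
    print("message:", exc)
```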

Unit tests (unittest, pytest)

Unit tests are small tests (by convention in a file named test_*.py or *_test.py) that check individual functions and help locate the source of an error in a program. Example packages: unittest, pytest.

image.png

Example of a unit test with unittest: we want to calculate the area of a circle

In [1]:
import numpy as np

def circle_area(r):
    return np.pi*(r**2)
In [2]:
# test the function
message = "Area of the circle of radius r = {radius} is {area}"
for r in [2, -3, 2+5j, True, "radius"]:
    A = circle_area(r)
    print(message.format(radius = r, area = A))
Area of the circle of radius r = 2 is 12.566370614359172
Area of the circle of radius r = -3 is 28.274333882308138
Area of the circle of radius r = (2+5j) is (-65.97344572538566+62.83185307179586j)
Area of the circle of radius r = True is 3.141592653589793
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/55/c2kg8h2d2wnfzx5wnypmt8kr0000gn/T/ipykernel_47557/360307895.py in <module>
      2 message = "Area of the circle of radius r = {radius} is {area}"
      3 for r in [2, -3, 2+5j, True, "radius"]:
----> 4     A = circle_area(r)
      5     print(message.format(radius = r, area = A))

/var/folders/55/c2kg8h2d2wnfzx5wnypmt8kr0000gn/T/ipykernel_47557/4063705862.py in circle_area(r)
      2 
      3 def circle_area(r):
----> 4     return np.pi*(r**2)

TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'
In [13]:
import unittest

class TestCircleArea(unittest.TestCase):
    def test_area(self):
        # test the areas when the radius is >= 0
        self.assertAlmostEqual(circle_area(1), np.pi)
        self.assertAlmostEqual(circle_area(0), 0)
        self.assertAlmostEqual(circle_area(1.1), np.pi * 1.1**2)
In [14]:
#unittest.main(argv=[''], verbosity=2, exit=False);
! python -m unittest
.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK
In [15]:
class TestCircleArea(unittest.TestCase):
    def test_area(self):
        # test the areas when the radius is >= 0
        self.assertAlmostEqual(circle_area(1), np.pi)
        self.assertAlmostEqual(circle_area(0), 0)
        self.assertAlmostEqual(circle_area(1.1), np.pi * 1.1**2)

    def test_values(self):
        # a negative radius must raise an error
        self.assertRaises(ValueError, circle_area, -2)

    def test_types(self):
        # a radius that is not an int or a float must raise an error
        self.assertRaises(ValueError, circle_area, 3+5j)
        self.assertRaises(ValueError, circle_area, True)
        self.assertRaises(ValueError, circle_area, "radius")
In [16]:
#unittest.main(argv=[''], verbosity=2, exit=False);
! python -m unittest
.FF
======================================================================
FAIL: test_types (test_my_program.TestCircleArea)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/Faugon/Documents/3_enseignements/EPITA/python/lecture 6/test_my_program.py", line 17, in test_types
    self.assertRaises(ValueError, circle_area, 3+5j)
AssertionError: ValueError not raised by circle_area

======================================================================
FAIL: test_values (test_my_program.TestCircleArea)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/Faugon/Documents/3_enseignements/EPITA/python/lecture 6/test_my_program.py", line 13, in test_values
    self.assertRaises(ValueError, circle_area, -2)
AssertionError: ValueError not raised by circle_area

----------------------------------------------------------------------
Ran 3 tests in 0.001s

FAILED (failures=2)
In [17]:
import numpy as np

def circle_area(r):
    if type(r) not in [int, float]:
        raise ValueError("the radius must be an int or a float")
    if r < 0:
        raise ValueError("the radius cannot be negative")
    return(np.pi*(r**2))
In [18]:
#unittest.main(argv=[''], verbosity=2, exit=False)
! python -m unittest
...
----------------------------------------------------------------------
Ran 3 tests in 0.000s

OK
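
The tests can also be run from Python itself rather than from the command line. The sketch below is one possible way to do so, assuming the same checks as above but with `math.pi` so that it depends only on the standard library. The test name `test_invalid_radius` is introduced for illustration.

```python
import math
import unittest

def circle_area(r):
    # same checks as the version above, with math.pi instead of np.pi
    if type(r) not in [int, float]:
        raise ValueError("the radius must be an int or a float")
    if r < 0:
        raise ValueError("the radius cannot be negative")
    return math.pi * r**2

class TestCircleArea(unittest.TestCase):
    def test_area(self):
        self.assertAlmostEqual(circle_area(1.1), math.pi * 1.1**2)

    def test_invalid_radius(self):
        # subTest reports each failing value separately
        for bad_radius in [-2, 3+5j, True, "radius"]:
            with self.subTest(radius=bad_radius):
                with self.assertRaises(ValueError):
                    circle_area(bad_radius)

# build the suite by hand instead of relying on test discovery
suite = unittest.TestLoader().loadTestsFromTestCase(TestCircleArea)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print("all tests passed:", result.wasSuccessful())
```

Building the suite explicitly is convenient in a notebook, where `python -m unittest` would otherwise look for test files on disk.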

Example of a unit test with pytest (recommended): we want to calculate the area of a circle

In [24]:
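from my_program import circle_area
import numpy as np
import pytest

def test_area():
    # test the areas when the radius is >= 0
    # (without "assert", a comparison checks nothing)
    assert circle_area(1) == pytest.approx(np.pi)
    assert circle_area(0) == 0
    assert circle_area(1.1) == pytest.approx(np.pi * 1.1**2)

def test_values():
    # a negative radius must raise an error
    with pytest.raises(ValueError):
        circle_area(-2)

def test_types():
    # a radius that is not an int or a float must raise an error
    # (one "with" block per call: a block stops at the first exception)
    for bad_radius in [3+5j, True, "radius"]:
        with pytest.raises(ValueError):
            circle_area(bad_radius)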
In [26]:
! pytest test_my_program.py
============================= test session starts ==============================
platform darwin -- Python 3.8.0, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /Users/Faugon/Documents/3_enseignements/EPITA/python/lecture 6
collected 3 items                                                              

test_my_program.py ...                                                   [100%]

============================== 3 passed in 0.18s ===============================

OR

In [27]:
! pytest
============================= test session starts ==============================
platform darwin -- Python 3.8.0, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /Users/Faugon/Documents/3_enseignements/EPITA/python/lecture 6
collected 3 items                                                              

test_my_program.py ...                                                   [100%]

============================== 3 passed in 0.16s ===============================

Unit test coverage

Test coverage measures the proportion of lines of code actually executed when the unit tests run. A common objective is about 80%. It can be measured with, e.g., the coverage module.

image.png

In [67]:
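# coverage needs the pytest-cov plugin and a double dash: "pytest --cov"
# (or "pytest --cov test_my_program.py"); with a single dash, as below,
# pytest reads "-cov" as "-c ov" (configfile: ov) and reports no coverage
! pytest -cov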
============================= test session starts ==============================
platform darwin -- Python 3.8.0, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /Users/Faugon/Documents/3_enseignements/EPITA/python/lecture 6, configfile: ov
collected 3 items                                                              

test_my_program.py ...                                                   [100%]

============================== 3 passed in 0.20s ===============================

OR

In [69]:
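# "-report" appears to be read by pytest as "-r eport": it prints a short summary
# of the test outcomes, not a coverage report
# (for one, use "pytest --cov --cov-report=term" with the pytest-cov plugin)
! pytest -report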
============================= test session starts ==============================
platform darwin -- Python 3.8.0, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /Users/Faugon/Documents/3_enseignements/EPITA/python/lecture 6
collected 3 items                                                              

test_my_program.py ...                                                   [100%]

=========================== short test summary info ============================
PASSED test_my_program.py::test_area
PASSED test_my_program.py::test_values
PASSED test_my_program.py::test_types
============================== 3 passed in 0.16s ===============================

So all in all, here is our program and our test program:

image.png

In [78]:
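# single dash again: read as "-c ov", so no coverage table is printed
# (use "pytest --cov" with the pytest-cov plugin)
! pytest -cov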
============================= test session starts ==============================
platform darwin -- Python 3.8.0, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /Users/Faugon/Documents/3_enseignements/EPITA/python/lecture 6, configfile: ov
collected 3 items                                                              

test_my_program.py ...                                                   [100%]

============================== 3 passed in 0.25s ===============================
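
To make the notion of coverage concrete, here is a toy sketch that records which lines of a function are executed for given inputs, using `sys.settrace` from the standard library. It only illustrates the idea: in practice one would use a dedicated tool such as the coverage module. The helper `executed_lines` and the simplified `circle_area` are assumptions made for this example.

```python
import sys

def circle_area(r):
    if r < 0:
        raise ValueError("the radius cannot be negative")
    return 3.141592653589793 * r**2

def executed_lines(func, *args):
    """Return the line offsets (relative to the 'def' line) run by func(*args)."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        # only follow frames that execute func itself
        if frame.f_code is not code:
            return None
        if event == "line":
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    except ValueError:
        pass  # the exception is expected for invalid inputs
    finally:
        sys.settrace(previous)
    return hit

body = {1, 2, 3}  # offsets of the three statements inside circle_area
only_valid = executed_lines(circle_area, 2)
with_invalid = only_valid | executed_lines(circle_area, -2)
print(f"valid input only:   {len(only_valid & body)}/{len(body)} lines")
print(f"plus invalid input: {len(with_invalid & body)}/{len(body)} lines")
```

The `raise` line is reached only when a test passes a negative radius, which is why tests for invalid inputs raise the coverage.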

Profiling

Profiling measures the execution time and the resource consumption (e.g. memory) of each function or each line. It is used when a program is slow and we want to know why.

A naive way to measure the running time of a piece of code:

In [4]:
# Let us generate 100M ones in a container and compare the running times

import numpy as np
import time
n = 100000000

# with list and for range
start = time.time()
l = [1 for _ in range(n)]
end = time.time()
print("time needed (in sec): ", round(end-start, 3))

# with list and repetition
start = time.time()
l = [1]*n
end = time.time()
print("time needed (in sec): ", round(end-start, 3))

# with the method "ones" of numpy
start = time.time()
l = np.ones(n)
end = time.time()
print("time needed (in sec): ", round(end-start, 3))
time needed (in sec):  5.941
time needed (in sec):  1.657
time needed (in sec):  1.055
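
A single measurement with `time.time()` is sensitive to whatever else the machine is doing. A slightly more robust sketch, assuming a much smaller `n` so that it runs quickly, uses the standard `timeit` module and keeps the best of several runs. The dictionary `candidates` is a name chosen for this example.

```python
import timeit

n = 100_000  # far smaller than the 100M above, so the sketch runs quickly

candidates = {
    "list comprehension": lambda: [1 for _ in range(n)],
    "list repetition": lambda: [1] * n,
}

for name, build in candidates.items():
    # repeat the measurement and keep the best run to reduce noise
    best = min(timeit.repeat(build, number=10, repeat=5))
    print(f"{name}: {best / 10:.6f} s per call")
```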

Why and how?

  • In theoretical computer science: we would study the algorithmic complexity, which measures how the running time of an algorithm grows with the size $n$ of its input.
  • In practice: most of the time, in a high-level programming language such as Python, the external functions are too complex (or too time-consuming) to study the algorithmic complexity "theoretically". Instead, we rely on profiling methods which, for a particular run, scan the entire program and highlight the functions (or lines) in which the program spends its time.
pip install snakeviz
In [3]:
%load_ext snakeviz
In [6]:
# All methods
import numpy as np

def method1(n):
    return [1 for _ in range(n)]

def method2(n):
    return [1]*n

def method3(n):
    return np.ones(n)

def main_results():
    n = 100000000
    
    l = []
    for _ in range(n):
        l.append(1)
    
    l1 = method1(n)
    l2 = method2(n)
    l3 = method3(n)
In [ ]:
%snakeviz main_results()

image.png

In a terminal,

  1. print the profile of the program my_program_methods.py (sorted by internal time) with:

    python -m cProfile -s time my_program_methods.py
  2. you can also store the profile of my_program_methods.py in the file my_program_methods.prof (the -s option only applies when -o is absent, so it is dropped here):

    python -m cProfile -o my_program_methods.prof my_program_methods.py

    then visualize it in a browser with snakeviz:

    snakeviz my_program_methods.prof
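
The same profile can be collected from inside a Python program with the standard `cProfile` and `pstats` modules. The sketch below assumes a small `n` and leaves out the NumPy method so that it depends only on the standard library.

```python
import cProfile
import io
import pstats

def method1(n):
    return [1 for _ in range(n)]

def method2(n):
    return [1] * n

def main_results(n=200_000):  # small n so the sketch runs quickly
    method1(n)
    method2(n)

profiler = cProfile.Profile()
profiler.enable()
main_results()
profiler.disable()

# "tottime" is the internal time, the same ordering as "-s time"
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("tottime")
stats.print_stats(5)   # keep the 5 most expensive entries
print(stream.getvalue())
```

Collecting the profile in code makes it possible to profile a single call rather than a whole script.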

Your my_program_methods.py should look as follows:

image.png

2. Code quality

  • Logging
  • Annotations
  • PEP8
  • [optional] Documentation with Sphinx

Logging

Logging consists of recording events and operations performed by a program in order to analyze them later (e.g. in case of an error).

image-2.png


In [1]:
import logging
from math import sqrt

# create and configure our logging
LOG_FORMAT = "%(levelname)s %(asctime)s - %(message)s"
logging.basicConfig(filename = "my_program.log", filemode = "w", level = logging.DEBUG, format = LOG_FORMAT)
logger = logging.getLogger()

# test the logging
def quadratic_formula(a,b,c):
    """Return the solution of the equation ax^2 + bx + c = 0"""
    logger.info("quadratic_formula({0},{1},{2})".format(a,b,c))
    
    # calculation of the discriminant
    delta = b**2 - 4*a*c
    
    # calculation of the 2 roots
    logger.debug("# calculation of the 2 roots")
    root1 = (-b + sqrt(delta)) / (2*a)
    root2 = (-b - sqrt(delta)) / (2*a)
    
    # returns the roots
    logger.debug("# returns the roots")
    return(root1, root2)
In [2]:
quadratic_formula(1,0,-4)
Out[2]:
(2.0, -2.0)

image-2.png

In [4]:
quadratic_formula(1,0,1)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/55/c2kg8h2d2wnfzx5wnypmt8kr0000gn/T/ipykernel_28525/3037663929.py in <module>
----> 1 quadratic_formula(1,0,1)

/var/folders/55/c2kg8h2d2wnfzx5wnypmt8kr0000gn/T/ipykernel_28525/3190562289.py in quadratic_formula(a, b, c)
     17     # calculation of the 2 roots
     18     logger.debug("# calculation of the 2 roots")
---> 19     root1 = (-b + sqrt(delta)) / (2*a)
     20     root2 = (-b - sqrt(delta)) / (2*a)
     21 

ValueError: math domain error

image-2.png

Logging levels, by increasing severity: DEBUG, INFO, WARNING, ERROR, CRITICAL
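
The level set on a logger acts as a threshold: messages below it are discarded. The sketch below illustrates this with a logger that writes to memory instead of a file. The logger name `quadratic_demo` and the function `real_roots` are hypothetical names introduced for this example.

```python
import io
import logging
from math import sqrt

# a dedicated logger writing to memory, so the example leaves no file behind
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
logger = logging.getLogger("quadratic_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)   # DEBUG messages are filtered out
logger.propagate = False

def real_roots(a, b, c):
    logger.debug("computing the discriminant")  # below INFO: not recorded
    delta = b**2 - 4*a*c
    if delta < 0:
        logger.warning("negative discriminant %s: no real root", delta)
        return None
    logger.info("discriminant is %s", delta)
    return (-b + sqrt(delta)) / (2*a), (-b - sqrt(delta)) / (2*a)

real_roots(1, 0, -4)
real_roots(1, 0, 1)
print(stream.getvalue())
```

Checking the discriminant before calling `sqrt` turns the "math domain error" seen above into a logged warning.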

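Annotations

Python is dynamically typed: the type of a variable is not declared and is only known at run time. Sometimes we specify the expected type in the documentation of a function or a class. Annotations are a formal way to do this. Even though they have no effect at run time, this information can be used by tools such as help() or static type checkers (e.g. mypy).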

In [18]:
def foo(prefix, suffix):
    """
    prefix: first word
    suffix: last word
    
    returns : the two words with "and" between
    """
    return prefix + " and " + suffix

foo("eat", "drink")
Out[18]:
'eat and drink'
In [19]:
help(foo)
Help on function foo in module __main__:

foo(prefix, suffix)
    prefix: first word
    suffix: last word
    
    returns : the two words with "and" between

In [22]:
def foo(prefix: str, suffix: str) -> str:
    """
    prefix: first word
    suffix: last word
    returns : the two words with "and" between
    """
    return prefix + " and " + suffix

foo("eat", "drink")
Out[22]:
'eat and drink'
In [23]:
help(foo)
Help on function foo in module __main__:

foo(prefix: str, suffix: str) -> str
    prefix: first word
    suffix: last word
    returns : the two words with "and" between

In [24]:
foo.__annotations__
Out[24]:
{'prefix': str, 'suffix': str, 'return': str}
In [ ]:
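def include(prefix: list, suffix: str) -> str:
    """
    prefix: first word
    suffix: last word
    returns : the two words with "and" between
    """
    return prefix + " and " + suffix

# annotations are not enforced: prefix is annotated as a list,
# yet passing a str raises no error
include("eat", "drink")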
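
Since annotations are plain data attached to the function, a program can read them and act on them. The sketch below is one possible use, assuming annotations that are simple classes such as `str`. The helper `check_arguments` is hypothetical and not part of the standard library.

```python
from typing import get_type_hints

def foo(prefix: str, suffix: str) -> str:
    return prefix + " and " + suffix

def check_arguments(func, *args):
    """Hypothetical helper: raise TypeError when a positional argument
    does not match the plain class given in its annotation."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    for (name, expected), value in zip(hints.items(), args):
        if not isinstance(value, expected):
            raise TypeError(f"{name} should be {expected.__name__}, "
                            f"got {type(value).__name__}")
    return func(*args)

print(check_arguments(foo, "eat", "drink"))
try:
    check_arguments(foo, "eat", 3)
except TypeError as exc:
    print(exc)
```

Static type checkers perform this kind of verification without running the program.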

PEP8

Code is easier to read when it always follows the same writing rules. In Python, these rules are described in PEP8. They do not have to be followed to the letter, but most Python developers follow them. Some conventions:

  • do not exceed 79 characters per line of code ...

  • leave two blank lines between functions and classes (but one blank line between methods of a class)

  • regular_variables: in lower case with "_" if needed

  • CONSTANTS: all upper case

  • function_name(): lowercase with "_" if needed

  • ClassNames: each word starts with an upper-case letter

  • _non_public_properties: "_" at the beginning

  • conflictname_: "_" at the end if there are name conflicts
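
The conventions above can be seen together in a short sketch. All names in it (`MAX_RADIUS`, `CircleCollection`, `build_collection`) are invented for illustration.

```python
MAX_RADIUS = 100.0  # constant: all upper case


class CircleCollection:  # class name: each word capitalized
    def __init__(self):
        self._radii = []  # non-public attribute: leading underscore

    def add_radius(self, radius):  # method and variable: lower case with "_"
        if radius > MAX_RADIUS:
            raise ValueError("radius too large")
        self._radii.append(radius)

    def largest(self):
        # trailing underscore: avoids hiding the built-in max
        max_ = max(self._radii)
        return max_


def build_collection(radii):
    collection = CircleCollection()
    for radius in radii:
        collection.add_radius(radius)
    return collection


print(build_collection([1.0, 2.5]).largest())
```

Note the two blank lines around the class and the function, and the single blank line between methods.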

3. [optional] Tools and project management

  • Source tracking software with Github or Gitlab
  • Project management with KanBan
  • Code review with gerrit
  • Continuous integration with Jenkins (locally)
  • Source tracking software (Git): It has become an essential tool for keeping track of the changes made to a program. Its first use is the ability to go back to an earlier version. Today, most new projects start on Git. Three sites that host open source projects for free: GitHub, GitLab, Bitbucket.
  • Project management (KanBan): The accepted term is KanBan, and it is commonly used with agile methods. In concrete terms, the more people work on the same project, the more difficult it is to keep track of what everyone is doing; a KanBan board makes each person's tasks and their progress visible. This approach has been somewhat validated by practice.
  • Code review (gerrit): A code review occurs before a software code update. It is an opportunity for developers to share their modifications with the rest of the team, who comment on the parts of the code they do not like, or approve the update if it suits them. The rule is often that a change can only be merged into the application code once one or two other developers have approved it.

Pushed to the extreme, this becomes pair programming, which consists of two people programming in front of the same screen. This is quite nightmarish if it is permanent. In practice, code review is a useful exercise. An open source tool for it: gerrit.

  • Continuous integration (Jenkins, BuildBot): Ideally, all the unit tests should be run at each modification. In practice, this is rarely done by hand because it takes too much time, so the task is delegated to a remote machine (see the services available to open source projects: Travis CI on Linux, CircleCI, AppVeyor on Windows). Locally, you can use Jenkins, which is very easy to use, or BuildBot.