.. _debugging_chapter: ============== Debugging code ============== **Author**: *Gaël Varoquaux* This section explores tools to understand better your code base: debugging, to find and fix bugs. It is not specific to the scientific Python community, but the strategies that we will employ are tailored to its needs. .. topic:: Prerequisites * Numpy * IPython * `nosetests `__ * `pyflakes `__ * gdb for the C-debugging part. .. contents:: Chapter contents :local: :depth: 2 Avoiding bugs ============= Coding best practices to avoid getting in trouble -------------------------------------------------- .. sidebar:: Brian Kernighan *“Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?”* * We all write buggy code. Accept it. Deal with it. * Write your code with testing and debugging in mind. * Keep It Simple, Stupid (KISS). * What is the simplest thing that could possibly work? * Don't Repeat Yourself (DRY). * Every piece of knowledge must have a single, unambiguous, authoritative representation within a system. * Constants, algorithms, etc... * Try to limit interdependencies of your code. (Loose Coupling) * Give your variables, functions and modules meaningful names (not mathematics names) pyflakes: fast static analysis ------------------------------- They are several static analysis tools in Python; to name a few: * `pylint `_ * `pychecker `_ * `pyflakes `_ * `flake8 `_ Here we focus on `pyflakes`, which is the simplest tool. * **Fast, simple** * Detects syntax errors, missing imports, typos on names. Another good recommendation is the `flake8` tool which is a combination of pyflakes and pep8. Thus, in addition to the types of errors that pyflakes catches, flake8 detects violations of the recommendation in `PEP8 `_ style guide. Integrating pyflakes (or flake8) in your editor or IDE is highly recommended, it **does yield productivity gains**. Running pyflakes on the current edited file ............................................ You can bind a key to run pyflakes in the current buffer. * **In kate** Menu: 'settings -> configure kate * In plugins enable 'external tools' * In external Tools', add `pyflakes`:: kdialog --title "pyflakes %filename" --msgbox "$(pyflakes %filename)" * **In TextMate** Menu: TextMate -> Preferences -> Advanced -> Shell variables, add a shell variable:: TM_PYCHECKER = /Library/Frameworks/Python.framework/Versions/Current/bin/pyflakes Then `Ctrl-Shift-V` is binded to a pyflakes report * **In vim** In your `.vimrc` (binds F5 to `pyflakes`):: autocmd FileType python let &mp = 'echo "*** running % ***" ; pyflakes %' autocmd FileType tex,mp,rst,python imap [15~ :make!^M autocmd FileType tex,mp,rst,python map [15~ :make!^M autocmd FileType tex,mp,rst,python set autowrite * **In emacs** In your `.emacs` (binds F5 to `pyflakes`):: (defun pyflakes-thisfile () (interactive) (compile (format "pyflakes %s" (buffer-file-name))) ) (define-minor-mode pyflakes-mode "Toggle pyflakes mode. With no argument, this command toggles the mode. Non-null prefix argument turns on the mode. Null prefix argument turns off the mode." ;; The initial value. nil ;; The indicator for the mode line. " Pyflakes" ;; The minor mode bindings. '( ([f5] . pyflakes-thisfile) ) ) (add-hook 'python-mode-hook (lambda () (pyflakes-mode t))) A type-as-go spell-checker like integration ............................................ * **In vim** * Use the pyflakes.vim plugin: #. download the zip file from http://www.vim.org/scripts/script.php?script_id=2441 #. extract the files in ``~/.vim/ftplugin/python`` #. make sure your vimrc has ``filetype plugin indent on`` .. image:: vim_pyflakes.png * Alternatively: use the `syntastic `_ plugin. This can be configured to use ``flake8`` too and also handles on-the-fly checking for many other languages. .. image:: vim_syntastic.png * **In emacs** Use the flymake mode with pyflakes, documented on https://www.emacswiki.org/emacs/FlyMake and included in Emacs 26 and more recent. To activate it, use ``M-x`` (meta-key then x) and enter `flymake-mode` at the prompt. To enable it automatically when opening a Python file, add the following line to your .emacs file:: (add-hook 'python-mode-hook '(lambda () (flymake-mode))) Debugging workflow =================== If you do have a non trivial bug, this is when debugging strategies kick in. There is no silver bullet. Yet, strategies help: **For debugging a given problem, the favorable situation is when the problem is isolated in a small number of lines of code, outside framework or application code, with short modify-run-fail cycles** #. Make it fail reliably. Find a test case that makes the code fail every time. #. Divide and Conquer. Once you have a failing test case, isolate the failing code. * Which module. * Which function. * Which line of code. => isolate a small reproducible failure: a test case #. Change one thing at a time and re-run the failing test case. #. Use the debugger to understand what is going wrong. #. Take notes and be patient. It may take a while. .. note:: Once you have gone through this process: isolated a tight piece of code reproducing the bug and fix the bug using this piece of code, add the corresponding code to your test suite. Using the Python debugger ========================= The python debugger, ``pdb``: https://docs.python.org/library/pdb.html, allows you to inspect your code interactively. Specifically it allows you to: * View the source code. * Walk up and down the call stack. * Inspect values of variables. * Modify values of variables. * Set breakpoints. .. topic:: **print** Yes, ``print`` statements do work as a debugging tool. However to inspect runtime, it is often more efficient to use the debugger. Invoking the debugger ----------------------- Ways to launch the debugger: #. Postmortem, launch debugger after module errors. #. Launch the module with the debugger. #. Call the debugger inside the module Postmortem ........... **Situation**: You're working in IPython and you get a traceback. Here we debug the file :download:`index_error.py`. When running it, an :class:`IndexError` is raised. Type ``%debug`` and drop into the debugger. .. sourcecode:: ipython In [1]: %run index_error.py --------------------------------------------------------------------------- IndexError Traceback (most recent call last) /home/varoquau/dev/scipy-lecture-notes/advanced/debugging/index_error.py in () 6 7 if __name__ == '__main__': ----> 8 index_error() 9 /home/varoquau/dev/scipy-lecture-notes/advanced/debugging/index_error.py in index_error() 3 def index_error(): 4 lst = list('foobar') ----> 5 print lst[len(lst)] 6 7 if __name__ == '__main__': IndexError: list index out of range In [2]: %debug > /home/varoquau/dev/scipy-lecture-notes/advanced/debugging/index_error.py(5)index_error() 4 lst = list('foobar') ----> 5 print lst[len(lst)] 6 ipdb> list 1 """Small snippet to raise an IndexError.""" 2 3 def index_error(): 4 lst = list('foobar') ----> 5 print lst[len(lst)] 6 7 if __name__ == '__main__': 8 index_error() 9 ipdb> len(lst) 6 ipdb> print(lst[len(lst)-1]) r ipdb> quit In [3]: .. topic:: Post-mortem debugging without IPython In some situations you cannot use IPython, for instance to debug a script that wants to be called from the command line. In this case, you can call the script with ``python -m pdb script.py``:: $ python -m pdb index_error.py > /home/varoquau/dev/scipy-lecture-notes/advanced/optimizing/index_error.py(1)() -> """Small snippet to raise an IndexError.""" (Pdb) continue Traceback (most recent call last): File "/usr/lib/python2.6/pdb.py", line 1296, in main pdb._runscript(mainpyfile) File "/usr/lib/python2.6/pdb.py", line 1215, in _runscript self.run(statement) File "/usr/lib/python2.6/bdb.py", line 372, in run exec cmd in globals, locals File "", line 1, in File "index_error.py", line 8, in index_error() File "index_error.py", line 5, in index_error print lst[len(lst)] IndexError: list index out of range Uncaught exception. Entering post mortem debugging Running 'cont' or 'step' will restart the program > /home/varoquau/dev/scipy-lecture-notes/advanced/optimizing/index_error.py(5)index_error() -> print(lst[len(lst)]) (Pdb) Step-by-step execution ....................... **Situation**: You believe a bug exists in a module but are not sure where. For instance we are trying to debug :download:`wiener_filtering.py`. Indeed the code runs, but the filtering does not work well. * Run the script in IPython with the debugger using ``%run -d wiener_filtering.p`` : .. sourcecode:: ipython In [1]: %run -d wiener_filtering.py *** Blank or comment *** Blank or comment *** Blank or comment Breakpoint 1 at /home/varoquau/dev/scipy-lecture-notes/advanced/optimizing/wiener_filtering.py:4 NOTE: Enter 'c' at the ipdb> prompt to start your script. > (1)() * Set a break point at line 34 using ``b 34``: .. sourcecode:: ipython ipdb> n > /home/varoquau/dev/scipy-lecture-notes/advanced/optimizing/wiener_filtering.py(4)() 3 1---> 4 import numpy as np 5 import scipy as sp ipdb> b 34 Breakpoint 2 at /home/varoquau/dev/scipy-lecture-notes/advanced/optimizing/wiener_filtering.py:34 * Continue execution to next breakpoint with ``c(ont(inue))``: .. sourcecode:: ipython ipdb> c > /home/varoquau/dev/scipy-lecture-notes/advanced/optimizing/wiener_filtering.py(34)iterated_wiener() 33 """ 2--> 34 noisy_img = noisy_img 35 denoised_img = local_mean(noisy_img, size=size) * Step into code with ``n(ext)`` and ``s(tep)``: ``next`` jumps to the next statement in the current execution context, while ``step`` will go across execution contexts, i.e. enable exploring inside function calls: .. sourcecode:: ipython ipdb> s > /home/varoquau/dev/scipy-lecture-notes/advanced/optimizing/wiener_filtering.py(35)iterated_wiener() 2 34 noisy_img = noisy_img ---> 35 denoised_img = local_mean(noisy_img, size=size) 36 l_var = local_var(noisy_img, size=size) ipdb> n > /home/varoquau/dev/scipy-lecture-notes/advanced/optimizing/wiener_filtering.py(36)iterated_wiener() 35 denoised_img = local_mean(noisy_img, size=size) ---> 36 l_var = local_var(noisy_img, size=size) 37 for i in range(3): * Step a few lines and explore the local variables: .. sourcecode:: ipython ipdb> n > /home/varoquau/dev/scipy-lecture-notes/advanced/optimizing/wiener_filtering.py(37)iterated_wiener() 36 l_var = local_var(noisy_img, size=size) ---> 37 for i in range(3): 38 res = noisy_img - denoised_img ipdb> print(l_var) [[5868 5379 5316 ..., 5071 4799 5149] [5013 363 437 ..., 346 262 4355] [5379 410 344 ..., 392 604 3377] ..., [ 435 362 308 ..., 275 198 1632] [ 548 392 290 ..., 248 263 1653] [ 466 789 736 ..., 1835 1725 1940]] ipdb> print(l_var.min()) 0 Oh dear, nothing but integers, and 0 variation. Here is our bug, we are doing integer arithmetic. .. topic:: Raising exception on numerical errors When we run the :download:`wiener_filtering.py` file, the following warnings are raised: .. sourcecode:: ipython In [2]: %run wiener_filtering.py wiener_filtering.py:40: RuntimeWarning: divide by zero encountered in divide noise_level = (1 - noise/l_var ) We can turn these warnings in exception, which enables us to do post-mortem debugging on them, and find our problem more quickly: .. sourcecode:: ipython In [3]: np.seterr(all='raise') Out[3]: {'divide': 'print', 'invalid': 'print', 'over': 'print', 'under': 'ignore'} In [4]: %run wiener_filtering.py --------------------------------------------------------------------------- FloatingPointError Traceback (most recent call last) /home/esc/anaconda/lib/python2.7/site-packages/IPython/utils/py3compat.pyc in execfile(fname, *where) 176 else: 177 filename = fname --> 178 __builtin__.execfile(filename, *where) /home/esc/physique-cuso-python-2013/scipy-lecture-notes/advanced/debugging/wiener_filtering.py in () 55 pl.matshow(noisy_face[cut], cmap=pl.cm.gray) 56 ---> 57 denoised_face = iterated_wiener(noisy_face) 58 pl.matshow(denoised_face[cut], cmap=pl.cm.gray) 59 /home/esc/physique-cuso-python-2013/scipy-lecture-notes/advanced/debugging/wiener_filtering.py in iterated_wiener(noisy_img, size) 38 res = noisy_img - denoised_img 39 noise = (res**2).sum()/res.size ---> 40 noise_level = (1 - noise/l_var ) 41 noise_level[noise_level<0] = 0 42 denoised_img += noise_level*res FloatingPointError: divide by zero encountered in divide Other ways of starting a debugger .................................... * **Raising an exception as a poor man break point** If you find it tedious to note the line number to set a break point, you can simply raise an exception at the point that you want to inspect and use IPython's ``%debug``. Note that in this case you cannot step or continue the execution. * **Debugging test failures using nosetests** You can run ``nosetests --pdb`` to drop in post-mortem debugging on exceptions, and ``nosetests --pdb-failure`` to inspect test failures using the debugger. In addition, you can use the IPython interface for the debugger in nose by installing the nose plugin `ipdbplugin `_. You can than pass ``--ipdb`` and ``--ipdb-failure`` options to nosetests. * **Calling the debugger explicitly** Insert the following line where you want to drop in the debugger:: import pdb; pdb.set_trace() .. warning:: When running ``nosetests``, the output is captured, and thus it seems that the debugger does not work. Simply run the nosetests with the ``-s`` flag. .. topic:: Graphical debuggers and alternatives * `pudb `_ is a good semi-graphical debugger with a text user interface in the console. * The `Visual Studio Code `_ integrated development environment includes a debugging mode. * The `Mu editor `_ is a simple Python editor that includes a debugging mode. Debugger commands and interaction ---------------------------------- ============ ====================================================================== ``l(list)`` Lists the code at the current position ``u(p)`` Walk up the call stack ``d(own)`` Walk down the call stack ``n(ext)`` Execute the next line (does not go down in new functions) ``s(tep)`` Execute the next statement (goes down in new functions) ``bt`` Print the call stack ``a`` Print the local variables ``!command`` Execute the given **Python** command (by opposition to pdb commands ============ ====================================================================== .. warning:: **Debugger commands are not Python code** You cannot name the variables the way you want. For instance, if in you cannot override the variables in the current frame with the same name: **use different names than your local variable when typing code in the debugger**. Getting help when in the debugger ................................. Type ``h`` or ``help`` to access the interactive help: .. sourcecode:: pycon ipdb> help Documented commands (type help ): ======================================== EOF bt cont enable jump pdef r tbreak w a c continue exit l pdoc restart u whatis alias cl d h list pinfo return unalias where args clear debug help n pp run unt b commands disable ignore next q s until break condition down j p quit step up Miscellaneous help topics: ========================== exec pdb Undocumented commands: ====================== retval rv Debugging segmentation faults using gdb ========================================== If you have a segmentation fault, you cannot debug it with pdb, as it crashes the Python interpreter before it can drop in the debugger. Similarly, if you have a bug in C code embedded in Python, pdb is useless. For this we turn to the gnu debugger, `gdb `_, available on Linux. Before we start with gdb, let us add a few Python-specific tools to it. For this we add a few macros to our ``~/.gdbinit``. The optimal choice of macro depends on your Python version and your gdb version. I have added a simplified version in :download:`gdbinit`, but feel free to read `DebuggingWithGdb `_. To debug with gdb the Python script :download:`segfault.py`, we can run the script in gdb as follows .. sourcecode:: console $ gdb python ... (gdb) run segfault.py Starting program: /usr/bin/python segfault.py [Thread debugging using libthread_db enabled] Program received signal SIGSEGV, Segmentation fault. _strided_byte_copy (dst=0x8537478 "\360\343G", outstrides=4, src= 0x86c0690
, instrides=32, N=3, elsize=4) at numpy/core/src/multiarray/ctors.c:365 365 _FAST_MOVE(Int32); (gdb) We get a segfault, and gdb captures it for post-mortem debugging in the C level stack (not the Python call stack). We can debug the C call stack using gdb's commands: .. sourcecode:: console (gdb) up #1 0x004af4f5 in _copy_from_same_shape (dest=, src=, myfunc=0x496780 <_strided_byte_copy>, swap=0) at numpy/core/src/multiarray/ctors.c:748 748 myfunc(dit->dataptr, dest->strides[maxaxis], As you can see, right now, we are in the C code of numpy. We would like to know what is the Python code that triggers this segfault, so we go up the stack until we hit the Python execution loop: .. sourcecode:: console (gdb) up #8 0x080ddd23 in call_function (f= Frame 0x85371ec, for file /home/varoquau/usr/lib/python2.6/site-packages/numpy/core/arrayprint.py, line 156, in _leading_trailing (a=, _nc=), throwflag=0) at ../Python/ceval.c:3750 3750 ../Python/ceval.c: No such file or directory. in ../Python/ceval.c (gdb) up #9 PyEval_EvalFrameEx (f= Frame 0x85371ec, for file /home/varoquau/usr/lib/python2.6/site-packages/numpy/core/arrayprint.py, line 156, in _leading_trailing (a=, _nc=), throwflag=0) at ../Python/ceval.c:2412 2412 in ../Python/ceval.c (gdb) Once we are in the Python execution loop, we can use our special Python helper function. For instance we can find the corresponding Python code: .. sourcecode:: console (gdb) pyframe /home/varoquau/usr/lib/python2.6/site-packages/numpy/core/arrayprint.py (158): _leading_trailing (gdb) This is numpy code, we need to go up until we find code that we have written: .. sourcecode:: console (gdb) up ... (gdb) up #34 0x080dc97a in PyEval_EvalFrameEx (f= Frame 0x82f064c, for file segfault.py, line 11, in print_big_array (small_array=, big_array=), throwflag=0) at ../Python/ceval.c:1630 1630 ../Python/ceval.c: No such file or directory. in ../Python/ceval.c (gdb) pyframe segfault.py (12): print_big_array The corresponding code is: .. literalinclude:: segfault.py :language: py :lines: 8-14 Thus the segfault happens when printing ``big_array[-10:]``. The reason is simply that ``big_array`` has been allocated with its end outside the program memory. .. note:: For a list of Python-specific commands defined in the `gdbinit`, read the source of this file. ____ .. topic:: **Wrap up exercise** :class: green The following script is well documented and hopefully legible. It seeks to answer a problem of actual interest for numerical computing, but it does not work... Can you debug it? **Python source code:** :download:`to_debug.py ` .. only:: html .. literalinclude:: to_debug.py