I like developing using Python. I like its program structure, its performance and the way it supports coroutines. What I don’t like is distributing Python programs. Python installations from one computer to the next are notoriously different from one another. I may have two seemingly similar Linux computers running the same version of Python and a program may run well on one and may not find a dependency on another.
While the language and run-time system are well-defined, the way in which search paths are set up and resolved does not seem to be. (See http://stackoverflow.com/questions/25715039/python-interplay-between-lib-site-packages-site-py-and-lib-site-py). Virtualenv aims to help, but the contract between Python and “the system” seems confusing. The result for me has been that it is difficult to produce shrink-wrapped programs for distribution from Python source.
The types of programs I’ve developed usually have dependencies on locally-developed SWIG-generated shared-libraries (.so), so my case may not be typical. (However, it is interesting :-) I’ve looked at using Cython to compile Python modules, and it works well for building an extension module in C, or for extending a C program with Python. However, if your aim is software construction using Python for the distribution of a complete application, then Cython seems lacking.
Nuitka (http://nuitka.net) is a “Python Compiler.” Nuitka can compile just one or a few modules, or it can compile an entire Python program. It takes a higher-level view of the job of compiling an entire project.
In standalone mode, its aim is to comprehend an entire Python program and emit a compiled version of that program with all libraries (shared objects) included. The compiled version captures all of the dependencies of the program on the system and places them in a binary executable and distribution directory. The resulting distribution should then work on any system with the same ABI (Application Binary Interface) as the compiling system.
In my tests, I’ve found it to be very capable. This article describes some of what I’ve learned.
In basic use, compiling a Python program for distribution is as simple as:
% nuitka --standalone main.py
When completed, Nuitka will produce a directory `main.dist` containing a main executable and the collection of all shared libraries used by your project. Because Nuitka walks the parse tree of all of the Python used by your program, the main executable includes the compiled version of all modules used by your program.
main.dist/main.exe - the main executable
The copies of shared libraries include many system libraries, as well as the shared objects accompanying locally-compiled extension modules.
main.dist/libc.so.6
main.dist/libstdc++.so.6
main.dist/_curses.so
main.dist/libsqlite3.so.0
main.dist/mymod/submod/_my_swig_mod.so
You can run the binary of the resulting distribution like this
% main.dist/main.exe
and it will run exactly as if you had typed
% python main.py
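One way to check the "runs exactly the same" claim on your own program is a small smoke test that runs both commands and compares what they produce. The sketch below is only an illustration: the helper names `run_and_capture` and `same_behavior` are hypothetical, and it compares just the exit code and standard output, which will not catch every difference.

```python
import subprocess
import sys

def run_and_capture(command):
    """Run `command` (a list of arguments) and return (exit code, stdout bytes)."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return proc.returncode, out

def same_behavior(command_a, command_b):
    """Return True when both commands exit with the same code and stdout."""
    return run_and_capture(command_a) == run_and_capture(command_b)

# Self-contained demonstration: the same interpreter command is run twice.
# For a real check you would compare something like
#   ["python", "main.py"]  against  ["main.dist/main.exe"]
demo = [sys.executable, "-c", "print(42)"]
print(same_behavior(demo, demo))
```

For a program that takes arguments or reads input, the same arguments and input would need to be passed to both commands.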
Nuitka parses your entire Python program. It begins at your “main.py” and builds a parse tree of all statements. “import” statements are handled at compile time, and imported modules are included in the parse tree.
This is different from some other compilation systems that compile Python code on a module-by-module basis.
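To get a feel for what "handling imports at compile time" means, here is a minimal sketch that uses the standard `ast` module to collect the modules named in import statements. It is not how Nuitka is implemented; the function name `static_imports` and the sample source are made up for illustration. It does show why an import written as a statement is visible to a static tool, while a module loaded through a function call with a string argument is not.

```python
import ast

def static_imports(source):
    """Return the sorted module names that appear in import statements."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
    return sorted(names)

# Hypothetical program text used only for this demonstration.
SAMPLE = '''
import os
from mymod.submod import my_swig_mod

def helper():
    import json  # nested imports are still statements, so they are visible
    return __import__("sqlite3")  # a call with a string argument is not
'''

print(static_imports(SAMPLE))
```

The `sqlite3` load never shows up in the result, because to the parser it is just a function call.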
Nuitka analyzes the Python program for all of the shared objects it references and gathers copies of them for inclusion in the distribution.
Nuitka compiles Python into C++ and then compiles that C++. Nuitka optimizes the intermediate form so that the resulting code can perform much faster than the original interpreted version.
While speed may be the attraction of compilation for some, producing a shrink-wrapped binary distribution was more interesting to me.
Analyzing your Distribution
If you are preparing a distribution, it is important to make sure that Nuitka has found all of the components your application relies on. You can see the dynamic libraries your distribution is loading by using the Linux `ldd` command.
% ldd main.dist/main.exe
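Reading `ldd` output by eye gets tedious for a large program. The sketch below parses the output and lists the libraries that are either missing or resolved from somewhere outside the distribution directory. The helper names (`parse_ldd`, `outside_dist`, `run_ldd`) and the sample paths are hypothetical, and the parser assumes the usual `name => path (address)` layout printed by glibc's `ldd`. Whether a given system library is supposed to come from the distribution depends on your build, so treat the result as a list of things to look at, not a list of errors.

```python
import os
import subprocess

def parse_ldd(output):
    """Map each library name in ldd output to its resolved path (None if not found)."""
    resolved = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # e.g. the dynamic loader line, which has no "=>"
        name, _, target = line.partition("=>")
        target = target.strip()
        if target.startswith("("):
            continue  # only a load address, no file path (older vdso lines)
        if target.startswith("not found"):
            resolved[name.strip()] = None
        else:
            # Drop the trailing load address, e.g. " (0x00007f...)".
            resolved[name.strip()] = target.split(" (")[0]
    return resolved

def outside_dist(resolved, dist_dir):
    """Return names of libraries that are missing or resolved outside dist_dir."""
    root = os.path.abspath(dist_dir) + os.sep
    return sorted(name for name, path in resolved.items()
                  if path is None or not os.path.abspath(path).startswith(root))

def run_ldd(executable):
    """Run ldd on an executable and return its text output (Linux only)."""
    return subprocess.check_output(["ldd", executable]).decode()

# Hypothetical ldd output used only for this demonstration.
SAMPLE = """\
    linux-vdso.so.1 (0x00007ffd5a5f2000)
    libsqlite3.so.0 => /opt/app/main.dist/libsqlite3.so.0 (0x00007f1c2a000000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c29c00000)
    libmissing.so => not found
"""

print(outside_dist(parse_ldd(SAMPLE), "/opt/app/main.dist"))
```

On a real distribution you would pass `run_ldd("main.dist/main.exe")` to `parse_ldd` instead of the sample text.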
It is also useful to watch what your distribution is loading dynamically by setting the PYTHONVERBOSE environment variable.
% PYTHONVERBOSE=1 main.dist/main.exe
You will get a very detailed listing of all of the imports your Python program performs. If your program is dynamically loading a module, you will see something like this:
import mymod.submod.my_swig_mod # dlopen("/home/foo/blah/.../mymod/submod/_my_swig_mod.so")
In my case, my SWIG module “my_swig_mod” was loading its shared object dynamically. I could see the location it was loading from, and it was obvious it was not part of the packed distribution.
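The verbose listing is long, so it can help to filter it. The sketch below assumes you have captured the verbose output as text (Python writes it to standard error) and that dynamic loads appear with a `dlopen("...")` fragment as in the line above. The function names and the sample paths are hypothetical. It reports every shared object that was loaded from outside the distribution directory.

```python
import os
import re

# Matches the path inside dlopen("...") wherever it appears on a line.
DLOPEN_RE = re.compile(r'dlopen\("([^"]+)"')

def dlopen_paths(verbose_log):
    """Return every path passed to dlopen in the captured verbose output."""
    return DLOPEN_RE.findall(verbose_log)

def loaded_outside(verbose_log, dist_dir):
    """Return the dlopen paths that do not live under dist_dir."""
    root = os.path.abspath(dist_dir) + os.sep
    return [path for path in dlopen_paths(verbose_log)
            if not os.path.abspath(path).startswith(root)]

# Hypothetical verbose output used only for this demonstration.
SAMPLE = '''\
import _curses # dlopen("/opt/app/main.dist/_curses.so")
import mymod.submod.my_swig_mod # dlopen("/opt/build/mymod/submod/_my_swig_mod.so")
import os # builtin
'''

print(loaded_outside(SAMPLE, "/opt/app/main.dist"))
```

Anything this reports is a candidate for a module that the packed distribution is still picking up from the build machine.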
Nuitka does a good job at finding shared objects that are loaded as a result of an “import” statement, but it cannot find shared objects that are loaded dynamically. While you can explicitly ask Nuitka to include specific Python modules, it still may not find an associated “.so” file since the library remains dynamically loaded.
If you want to include such modules in your distribution, they may need a little help.
I ran into this while using a custom module generated by [SWIG – “Simple Wrapper Interface Generator”](https://swig.org) . In the description here, I’ll use a fictitious module called “my_swig_mod” that is a sub-module of a module called “mymod.submod”.
The difficult Python code is shown below. It is generated by SWIG itself (https://github.com/swig/swig/blob/master/Source/Modules/python.cxx), and so the only way to “fix” it is to alter SWIG.
This code attempts to find the “.so” file related to module “my_swig_mod.py”. SWIG expects that the shared object should live in the same directory as the Python wrapper and that it should begin with an underscore.
==== my_swig_mod.py
from sys import version_info
if version_info >= (2,6,0):
    def swig_import_helper():
        from os.path import dirname
        import imp
        fp = None
        try:
            fp, pathname, description = imp.find_module('_my_swig_mod', [dirname(__file__)])
        except ImportError:
            import _my_swig_mod
            return _my_swig_mod
        if fp is not None:
            try:
                _mod = imp.load_module('_my_swig_mod', fp, pathname, description)
            finally:
                fp.close()
            return _mod
    _my_swig_mod = swig_import_helper()
    del swig_import_helper
else:
    import _my_swig_mod
del version_info
SWIG tries hard to produce code that can run on many different release levels of Python. To accomplish its goals, the SWIG Python handler performs some tricky things (like the excerpt shown above).
If you are using Nuitka to produce a distribution, your aim is different. You have exactly ONE Python release that you are compiling to. In my case, that was Python 2.7, so the entire block of code in the file “my_swig_mod.py” above can be replaced with the direct import below.
==== my_swig_mod.py
import _my_swig_mod
Nuitka handles this import nicely.
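Since SWIG regenerates the wrapper every time it runs, a hand edit is easily lost. One option is a small post-generation step that rewrites the wrapper source. The sketch below is an assumption-laden illustration: `HELPER_RE` and `use_direct_import` are hypothetical names, and the pattern matches only the helper block in the form shown in this article. Other SWIG versions emit different helper code, in which case the function raises an error and leaves the source alone.

```python
import re

# Matches from the version check through the final "del version_info" line.
HELPER_RE = re.compile(
    r"^from sys import version_info\n"
    r"if version_info >= \(2, ?6, ?0\):\n"
    r".*?"
    r"^del version_info\n?",
    re.DOTALL | re.MULTILINE,
)

def use_direct_import(wrapper_source, extension_name):
    """Replace SWIG's import helper block with a plain import statement."""
    replacement = "import %s\n" % extension_name
    patched, count = HELPER_RE.subn(lambda match: replacement,
                                    wrapper_source, count=1)
    if count != 1:
        raise ValueError("SWIG import helper block not found")
    return patched

# Abbreviated, hypothetical wrapper text used only for this demonstration.
SAMPLE = '''\
# This file was automatically generated by SWIG
from sys import version_info
if version_info >= (2,6,0):
    def swig_import_helper():
        import imp
        # body omitted in this sample
    _my_swig_mod = swig_import_helper()
    del swig_import_helper
else:
    import _my_swig_mod
del version_info

def later_code():
    pass
'''

print(use_direct_import(SAMPLE, "_my_swig_mod"))
```

In a build you would read the generated “my_swig_mod.py”, pass its text through the function, and write the result back before running Nuitka.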
Software construction and distribution seem to be going through a period of rethinking at the current time. In the past few years I have seen the proliferation of systems like ‘virtualenv’ for Python and ‘rbenv’ for Ruby. These applications patch the system to help in running applications that may have conflicting dependencies. These tools can help in supporting a few broadly different language versions (Python 2.7, Python 3.2), (Ruby 1.8.7, Ruby 2.1.1). While these tools can help in setting up a few development environments, they do not do much to isolate dependencies and help with application distribution.
I see an interesting parallel between Nuitka and the excitement behind [Docker](https://www.docker.com/) containers. A container describes the file system and libraries necessary to run a binary on a Linux ABI. A Dockerfile is a recipe for producing a container and building the binary. Nuitka analyzes a Python program and produces a somewhat similar container: its distribution. A Nuitka distribution includes a binary and the shared objects necessary to run on an ABI (it does not include a file system). Nuitka distributions are shrink-wrapped applications ready to run.
A quick comparison to [Cython](http://cython.org/)
Cython is an optimizing static compiler for Python and an extended language called Cython.
- it is module based
- it does not walk through the dependency tree
- it does not automatically compile nested modules (when a top-level module is composed of other modules)
- it does not find shared objects used by your application