Where did THAT come from?

All too often Python programmers ask themselves, “Who took the last cup of coffee and didn’t start another pot brewing!?”  Probably more often they’ll wonder “Where the hell is my proggy loading that function (or library or module) from?”   Whereas one can usually track down the SOB who left the empty pot and go all BobP* on him, it is often harder, depending on the platform, to answer the latter question.

Recently I found myself with several virtualenv paths as well as two or three actual installations (talking Python, here) along with several environment variables (PATH, LD_LIBRARY_PATH, PYTHONPATH, DYLD_LIBRARY_PATH, DYLD_FRAMEWORK_PATH, etc.) that were, to be perfectly blunt, FUBAR. The code I was working on needed to load a modified version of the SQLITE3 module, which itself loaded a shared object (.so) file somewhere within its bowels. No matter how I changed the several *PATH variables, I kept getting results that indicated the modified shared library simply wasn’t being referenced. I could have spent a great deal of time trying various permutations of the paths in each of the PATH variables. Having no desire to make a career (let alone a life-long pursuit) of this, I decided to take a somewhat more sane approach. Surely, there must be a way to tell which module Python actually loads when it encounters a line like:

import sqlite3

Enter the modulefinder module (section 30.6 of the Python Library Reference). This contains a class, ModuleFinder, that gathers information from the import statements found in a script. It doesn’t actually execute the script; it analyzes the script’s code and follows the imports it finds there. Its report method can then be invoked to list the modules imported and the actual files where those modules were found. So, to find out which sqlite3 module was being imported, I first tried:

from modulefinder import ModuleFinder
mf = ModuleFinder()
mf.run_script('import sqlite3')

which rewards me with:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/modulefinder.py", line 112, in run_script
    fp = open(pathname, READ_MODE)
IOError: [Errno 2] No such file or directory: 'import sqlite3'

What modulefinder really wants is a script that contains the import statements you’re looking to trace.   So, I put

import sqlite3

into a file named import_this.py and tried again:

mf.run_script('import_this.py')
mf.report()

and … Success!

Name                      File
----                      ----
m __main__                  import_this.py
m _sqlite3                  /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_sqlite3.so
m datetime                  /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/datetime.so
P sqlite3                   /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sqlite3/__init__.py
m sqlite3.dbapi2            /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sqlite3/dbapi2.py
m time                      /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/time.so

This canned report is nice, but you might want to use ModuleFinder’s results
programmatically, in which case you’ll want to refer to its modules attribute, which is
a Python dict object mapping module names to Module objects. Each of these has a
__file__ attribute naming the file in which the module was actually found.
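
Here is a rough sketch of how that might look. The helper name imported_files and the throwaway script are my own inventions for illustration; only ModuleFinder, run_script, and the modules dict come from the library itself. Built-in modules have no file of their own, so I'd expect them to show up as None.

```python
import os
import tempfile
from modulefinder import ModuleFinder


def imported_files(script_path):
    """Map each module name that script_path imports to its file (or None)."""
    mf = ModuleFinder()
    mf.run_script(script_path)  # analyzes the script; does not execute it
    # Each value in mf.modules is a Module object; built-ins have no __file__.
    return {name: mod.__file__ for name, mod in mf.modules.items()}


if __name__ == "__main__":
    # Write a throwaway script holding the import we want to trace.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as fh:
        fh.write("import sqlite3\n")
    try:
        for name, path in sorted(imported_files(fh.name).items()):
            # Only show the entries related to sqlite3 itself.
            if name.startswith(("sqlite3", "_sqlite3")):
                print(name, path)
    finally:
        os.remove(fh.name)
```

Returning a plain dict of names to paths keeps the result easy to filter or compare against the directories you expected.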

Beyond Python

Although this doesn’t tell me where my modified SQLITE3 library is being loaded, it does bring me a step closer.
I can now tell where Python is picking up its sqlite3 module along with several other modules, some of which I can readily dismiss (datetime.so and time.so). It’s obvious that sqlite3 is a Python package from the final part of its path (…/sqlite3/__init__.py) as well as from the capital “P” (for “package”) to the left of the module’s name in the report. Looking inside this file we find just one (executable) line of Python:

from dbapi2 import *

From the ModuleFinder report we already knew this module was being imported, so we continue digging there:

from _sqlite3 import *

which, again, confirms what we already knew from the report. But, that’s all we find in this file, aside from a few class declarations that are nothing more than wrappers and dressing for the contents of _sqlite3. If we go fossicking about inside the source code for _sqlite3.so (which can be found under the Modules subdirectory of the Python source tree) we’ll find a very simple and typical Python extension module that wraps the functions found in the SQLITE3 library itself. This seems to be a bit of a dead end. We still don’t know the actual location of the SQLITE3 library (i.e. libsqlite3.dylib) that the extension module is referencing. For this, we’ll need a different tool: the ctypes module.

Ctypes lets us load and easily access any dynamically linked library (DLL, .so, .dylib, etc.) and call the functions it contains from within Python code. Its ctypes.util submodule also contains a very useful function called find_library that will … well … find shared libraries. It searches for the specified library in much the same way the dynamic linker would; on OS X that means consulting the environment’s DYLD_* path variables in the same order. The details vary from platform to platform. So, we look for sqlite3:

from ctypes.util import find_library

find_library('sqlite3')

and get:

'/opt/local/lib/libsqlite3.dylib'

Looking at my DYLD_LIBRARY_PATH environment variable, I have only /opt/local/lib in the path. I can either replace the libsqlite3.dylib file in this directory with my own, modified version, or I can prepend the path where the modified library can be found to this environment variable:

export DYLD_LIBRARY_PATH="/home/nick/tryit/sqlite-3.6.23/.libs:$DYLD_LIBRARY_PATH"

and then use find_library as before

from ctypes.util import find_library

find_library('sqlite3')

which yields

'/home/nick/tryit/sqlite-3.6.23/.libs/libsqlite3.dylib'

just as I’d expect.
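
If you want more reassurance than a path, ctypes can also load whatever find_library reports and ask the library which version it is. The sketch below assumes the library exports sqlite3_libversion, which is part of SQLite's C API; the helper name loaded_sqlite_version is made up for this example. Note that on some platforms (Linux, for instance) find_library returns just a file name rather than a full path, and it returns None when nothing is found.

```python
import ctypes
from ctypes.util import find_library


def loaded_sqlite_version(name="sqlite3"):
    """Return (location, version) for the library find_library picks, or None."""
    location = find_library(name)
    if location is None:
        return None  # nothing by that name was found
    lib = ctypes.CDLL(location)
    # C prototype: const char *sqlite3_libversion(void);
    # Tell ctypes about the return type so we get bytes back, not an int.
    lib.sqlite3_libversion.restype = ctypes.c_char_p
    lib.sqlite3_libversion.argtypes = []
    return location, lib.sqlite3_libversion().decode("ascii")


if __name__ == "__main__":
    print(loaded_sqlite_version())
```

If the modified build really is first in line, the version string printed here should be the modified build's rather than the system one's.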

Conclusions

There is always the temptation to automate processes such as these. I could have tied this all together into a nice neat little function that would take the module name as its argument and return the path, if any, of the associated dynamic library. If I were trying to develop a large scale code scanning tool, say, for doing a security audit of the code, I might do just that. Here, I just wanted to find out which of several possible dynamic libraries was being loaded by importing a single, particular module. A few lines of code are all that’s needed in this case. I’ll leave the fancier tool development to the intrepid reader. Right now, I have to go make some more coffee … before I get yelled at. (Again.)