Friday, January 1, 2010

Thoughts on pip - what's the big deal?

I have been a Python user since Python 2.0. Since then I’ve stayed fairly current with the language and its newer methodologies (currently running 2.6.4). I am always striving to become a more proficient developer, seeking out more efficient ways of doing things, and I’m not afraid to give them a try.
I am currently responsible for a cross-platform Python environment for our engineering teams (>150 people), which spans various Unix flavors, Macs and the occasional Windows machine. I’ve been doing this for the past several years and have managed Python transitions from 2.2 through 2.6 with relative ease.
The methodology we employ to maintain this has been around since I started using Python, and I expect it will remain so in the future. We extend the default Python path using a pth file. The pth file is coupled to an environment variable, which lets developers change that variable and quickly test their code in their respective sandbox. This, coupled with Perforce, gives us a nice branched virtualenv without using virtualenv. We use this methodology for both external modules (Django, BeautifulSoup, paramiko, etc.) and internally developed modules. Currently our single pth file looks like this.

import os, site; tech = os.environ.get("TECHROOT", "/"); tech = tech if os.path.isdir(os.path.join(tech, "tools/python/Mac/lib/python2.6/site-packages")) else "/"; site.addsitedir(os.path.join(tech, "tools/python/Mac/lib/python2.6/site-packages"))
import os, site; tech = os.environ.get("TECHROOT", "/"); tech = tech if os.path.isdir(os.path.join(tech, "tools/python/modules")) else "/"; site.addsitedir(os.path.join(tech, "tools/python/modules"))


Admittedly this may be a bit overkill. However, a limitation of pth files (each entry, including an executable import statement, must fit on a single line) forces the need. FWIW, all this file does is read the environment variable TECHROOT if it is defined, or fall back to a default path otherwise. The first line is used for external modules and the second line is for internally developed modules. Module installation is then a breeze when using a ~/.pydistutils.cfg file which looks like this.

[install]
install_lib = $TECHROOT/tools/python/Mac/lib/python$py_version_short/site-packages
install_scripts = $TECHROOT/tools/python/Mac/bin

Again, this is all tied to our revision control system (Perforce), which allows any developer to easily add external modules or build modules for others to use. By changing the environment variable we can thoroughly test our code prior to release. Simple enough.
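For readers who find the one-line pth entries hard to follow, here is a minimal sketch of the same lookup unrolled into a function. The function name resolve_site_dir and its parameters are made up for illustration; only TECHROOT and the two relative paths come from the pth file above. A real pth entry would still pass the result to site.addsitedir().

```python
import os
import tempfile

# Relative locations copied from the two pth lines above.
EXTERNAL = "tools/python/Mac/lib/python2.6/site-packages"
INTERNAL = "tools/python/modules"


def resolve_site_dir(relative, environ=None, default_root="/"):
    """Return the directory a pth line would hand to site.addsitedir().

    Hypothetical helper: it mirrors the pth logic but adds nothing to sys.path.
    """
    environ = os.environ if environ is None else environ
    root = environ.get("TECHROOT", default_root)
    # Fall back to the default root when the sandbox lacks this directory.
    if not os.path.isdir(os.path.join(root, relative)):
        root = default_root
    return os.path.join(root, relative)


if __name__ == "__main__":
    # Build a throwaway sandbox that only contains the internal modules tree.
    sandbox = tempfile.mkdtemp()
    os.makedirs(os.path.join(sandbox, INTERNAL))
    env = {"TECHROOT": sandbox}
    print(resolve_site_dir(INTERNAL, env))  # resolved inside the sandbox
    print(resolve_site_dir(EXTERNAL, env))  # falls back to the default root
```

Each directory is checked on its own, so a sandbox that holds only some of the trees still picks up the rest from the default location.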

Over the past couple of years I have really come to enjoy easy_install. It certainly worked for everything I needed. It has some limitations which I didn’t like, but for the most part it worked. Some of the notable limitations, and how I usually deal with them, are as follows.
  • Dependency tracking must be thought out and tested ahead of time.
  • easy_install should be named easy_install<ver> (easy_install2.6)
  • Any bin files should have their header nailed down (/usr/bin/env python2.6) or be named file<ver> (django-admin2.5.py)
  • Uninstalling is a manual process, but a simple one, because it’s removing a single file and tweaking one file.
Barring the above limitations, I like easy_install. I like the packaging of egg archives; it’s a single file. True, this forced me to be very conscious of the platforms and to make sure that everyone was on the same page with respect to upgrades, but that’s simple planning. Not every package builds with easy_install, but then again the same is true of pip.

So I decided that I would use my shift to Snow Leopard and 2.6.4 as the starting point for using pip. After reading comments like this I figured surely I must be missing something. After playing with pip for a week now I can honestly say - nope, I don’t think so.

To be fair, pip is still at release 0.6.1. It appears pip is really geared towards those who use a virtualenv environment. As detailed above, we don’t need virtualenv; we effectively have an automated one with Perforce and an environment variable. That being said, I found several things I didn’t like about pip and other things which are broken altogether.
  • Dependency tracking still must be thought out and tested ahead of time.
  • Uninstalling packages doesn’t always work.
  • pip doesn’t understand a package’s test constructs if they are provided.
  • pip doesn’t always respect the ~/.pydistutils.cfg install_scripts directive. Most of the time I just needed to pip install --install-option="--prefix=$TECHROOT/tools/python/Mac"
So what’s the big deal? Is it me, or am I missing something larger? I thought pip would track dependencies, allowing you to have, say, two versions of a module installed simultaneously. But no. It’s a nice and clean installer (I’ll give you that). The freeze is pretty slick if you are distributing your own system without a real SCM system in place. Overall I’m not overly impressed, and I feel pip is overrated or easy_install is unfairly maligned. Maybe it’s me?
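As a rough illustration of that workaround, the sketch below assembles the same command line from TECHROOT without running it. The helper name pip_install_command is hypothetical, and the --install-option flag is simply the one quoted in the bullet above.

```python
import os


def pip_install_command(package, environ=None):
    """Build (but do not run) a pip command with an explicit install prefix."""
    environ = os.environ if environ is None else environ
    # Assumption: TECHROOT is set; a missing variable raises KeyError.
    prefix = os.path.join(environ["TECHROOT"], "tools/python/Mac")
    return ["pip", "install", "--install-option=--prefix=" + prefix, package]


if __name__ == "__main__":
    # Example sandbox root; replace with a real TECHROOT value.
    print(" ".join(pip_install_command("paramiko", {"TECHROOT": "/sandbox"})))
```

Returning a list rather than a string keeps the command easy to pass to subprocess without shell quoting concerns.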

3 comments:

Christopher said...

Interesting read. Don't know what to make of it just yet, but food for thought for sure.

metapundit.net said...

Well the reason for the existence of pip is pretty much better virtualenv integration imo, so if you're not using virtualenv I could see being unimpressed.

The other feature I use all the time, however, is the ability to install from source trees. I build web apps using components (sometimes my own) that may not have release cycles. But I can create repeatable installs specifying particular revisions directly from the scm of each component - and that makes my develop/test/deploy cycle much easier to manage...

Ycros said...

Somewhat of a tangent, but have you ever looked at zc.buildout? I use it for all my projects; it handles the setup of an environment and will pull dependencies into the local directory (based on a setup.py, or otherwise). It really eliminates the use of pip, easy_install and virtualenv - except if I want to install something that I want to use in my general environment (i.e. useful command-line tools).