Python Package Popularity Contest
I wanted to find out which Python packages are used most to give me a rough idea about the order in which you’d want to process Python packages if you were to, say, build packages for your favorite Linux distribution, starting from the most widely used.
I know that Debian runs a popularity contest that users can opt in to, and you can easily query it for your favorite package. Unfortunately, I don’t know whether they offer this data as a list, or in any form I could easily download. Other distros may run something similar, but I don’t know if their databases are publicly available.
The Python Package Index (PyPI) has download counters for the files it hosts, but not every project listed on PyPI actually uploads its releases there. Even those that do often show only the latest uploaded version, with download statistics for that version alone. And PyPI does not list every Python package anyway.
With those caveats I thought I would still get some interesting data from the PyPI download numbers. I wrote a little script to go through the roughly 5000 package pages and count the download numbers if any. Here’s my script (it cannot handle all of the URLs it encounters, and it has some other bugs as well, but I was not interested in complete accuracy at this time):
    #!/usr/bin/env python
    import urllib2, time, sys
    from BeautifulSoup import BeautifulSoup

    BASE_URL = 'http://pypi.python.org/pypi/'

    soup = BeautifulSoup(urllib2.urlopen('http://pypi.python.org/simple/'))

    for i, a in enumerate(soup('a')):
        name = a.contents[0]
        url = a.attrs[0][1]
        url = BASE_URL + url
        try:
            package_soup = BeautifulSoup(urllib2.urlopen(url))
            try:
                # Build a list (not a lazy generator) so that parsing
                # errors are raised here, inside the try block
                values = [int(td.contents[0])
                          for td in package_soup('td', style='text-align: right;')
                          if td.contents and td.contents[0].isdigit()]
            except Exception:
                values = [-1]  # Page fetched, but the counters could not be parsed
        except Exception:
            values = [-2]  # Page could not be fetched
        print sum(values), name
        sys.stdout.flush()  # So that I can tee to file and watch stdout
        # Be nice and don't hit PyPI too frequently
        time.sleep(1)
        # Break early when you are debugging the script
        # Comment this out once you are ready to run for real
        if i > 9:
            break

    print 'Fetched statistics from %d packages' % (i + 1)
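For readers without Python 2 and BeautifulSoup 3 at hand, the counting step can be tried offline. What follows is only a sketch in Python 3 using the standard library's html.parser. The names DownloadCounter and sum_downloads and the sample HTML are made up for illustration. The one thing taken from the script above is its rule: sum every right-aligned td cell whose text is all digits.

```python
from html.parser import HTMLParser


class DownloadCounter(HTMLParser):
    """Sum the all-digit text of <td style="text-align: right;"> cells."""

    def __init__(self):
        super().__init__()
        self.in_cell = False  # True while inside a matching <td>
        self.total = 0

    def handle_starttag(self, tag, attrs):
        if tag == "td" and dict(attrs).get("style") == "text-align: right;":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Unlike the original script, surrounding whitespace is ignored
        text = data.strip()
        if self.in_cell and text.isdigit():
            self.total += int(text)


def sum_downloads(html):
    """Return the summed download counters found in an HTML string."""
    parser = DownloadCounter()
    parser.feed(html)
    parser.close()
    return parser.total


# Hypothetical page fragment for illustration, not real PyPI markup
SAMPLE = """
<table>
  <tr><td>example-1.0.tar.gz</td><td style="text-align: right;">120</td></tr>
  <tr><td>example-1.0.egg</td><td style="text-align: right;">30</td></tr>
  <tr><td>notes</td><td style="text-align: right;">n/a</td></tr>
</table>
"""

print(sum_downloads(SAMPLE))
```

Cells that are not right-aligned, or whose text is not purely digits, contribute nothing to the total.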
The winner is zc.buildout at over 93,000 downloads! I guess now I really need to learn how to use it.
The top 20 entries are as follows:
    93627 zc.buildout
    65512 zope.interface
    50544 setuptools
    47690 zope.event
    40940 zope.dottedname
    40318 Paste
    39446 zope.configuration
    38995 kid
    38306 zope.dublincore
    37698 zope.formlib
    37255 zope.location
    37118 PasteDeploy
    36403 zope.copypastemove
    35943 zope.filerepresentation
    35257 plone.recipe.distros
    35126 RestrictedPython
    35051 zope.error
    34569 zope.app.error
    32811 zope.pagetemplate
    32722 zope.tales
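A ranking like this can be produced from the script's saved output, which has one "count name" pair per line. Here is a minimal Python 3 sketch, assuming that format. The function name top_packages and the entry some.broken.package are illustrative only. Lines carrying the script's negative error markers (-1, -2) are skipped.

```python
def top_packages(lines, n=20):
    """Return the n (count, name) pairs with the highest counts."""
    entries = []
    for line in lines:
        parts = line.split(None, 1)  # count first, the rest is the name
        if len(parts) != 2:
            continue  # blank or malformed line
        try:
            count = int(parts[0])
        except ValueError:
            continue  # e.g. the trailing "Fetched statistics ..." line
        if count < 0:
            continue  # negative values mark pages the script could not read
        entries.append((count, parts[1].strip()))
    # Highest count first; ties are broken alphabetically by name
    entries.sort(key=lambda entry: (-entry[0], entry[1]))
    return entries[:n]


# Sample input: three real rows from the list, plus one made-up failure row
sample = [
    "50544 setuptools",
    "93627 zc.buildout",
    "-2 some.broken.package",
    "65512 zope.interface",
    "Fetched statistics from 4 packages",
]

for count, name in top_packages(sample, n=3):
    print(count, name)
```

To rank a saved results file, pass the open file object as lines, since iterating over it yields one line at a time.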
To prevent everyone from running this script and bringing PyPI to its knees, you can download the full results here: pypi-popularity-contest-2008-10-23.txt.
In retrospect it would be nice if PyPI could provide download statistics as a simple list itself.



Tarek Ziadé:
You can get this info here: http://pypi.python.org/webstats/
Also, notice that zc.buildout is automatically downloaded every time someone in the world builds a Plone or any other buildout-based application, so these stats do not reflect ‘who’ did a download.
But there’s some work going on for this.
Cheers
October 24, 2008, 12:46 am
Seo Sanghyeon:
Entire Debian Popularity Contest raw data is available at http://popcon.debian.org/
October 24, 2008, 2:08 am
Christopher Arndt:
These statistics are heavily skewed because some packages make more intensive usage of the PyPI infrastructure for installation than others.
For example, I suspect the reason that kid is so high in this list is because it is downloaded whenever TurboGears 1.x is installed via easy_install/tgsetup.py and it is one of the few packages that are not hosted on turbogears.org itself.
October 24, 2008, 4:28 am
Heikki Toivonen:
Thank you for the comments, everyone. This is useful information I was not aware of.
October 24, 2008, 8:09 am
mike bayer:
this is also only taking the most recent version of each project into account. So projects that have been released more recently are bumped down. There’s no temporal element to the number of downloads in general, so it’s a fairly useless metric overall.
October 24, 2008, 12:00 pm
kevin gill:
Great script, thanks.
There are now thousands of packages on PyPI, over 700 of them Zope related. It is impossible (for me) to track the new useful stuff coming out. This script is a great help.
However, I agree with your comments: PyPI should be extended to provide this kind of data easily.
October 26, 2008, 2:04 pm