Archive for the ‘Python’ Category.

M2Crypto Has New Maintainer

I am excited to announce that M2Crypto has a new maintainer and homepage! As of last summer, Martin Paljak agreed to take over the project since I had not been able to dedicate enough time to keep things going.

Martin is a longtime M2Crypto contributor and was a security expert even before his contributions to M2Crypto. He has probably forgotten more about software security than I ever knew, so I know the project will be in safe hands.

There is also a new version of M2Crypto available on PyPI, so go check it out!

Beware of cPickle

The Python pickle module provides a way to serialize and deserialize Python objects. A large downside of the pickle format is that it is not secure, meaning you should not deserialize pickles received from untrusted sources.

There is also a cPickle version of the pickle module which implements the algorithm in C and is much faster than the pure Python module. This provides somewhat surprising use cases for the cPickle module besides the obvious application save format: it turns out cPickle can be the fastest way to make a copy of nested structures. Due to speed, using cPickle can also be attractive as a data format between trusted servers.
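The copy trick mentioned above can be sketched as a dumps/loads round trip. This is a minimal illustration, not a benchmark: the helper name pickle_copy and the sample data are made up for the example, and whether it beats copy.deepcopy depends on your data, so measure with timeit before relying on it. On Python 3 the C implementation is used automatically by the plain pickle module.

```python
import copy

try:
    import cPickle as pickle  # Python 2: the C implementation
except ImportError:
    import pickle  # Python 3: C accelerator is picked up automatically


def pickle_copy(obj):
    """Return a deep copy of obj by serializing and deserializing it.

    Only works for picklable objects, and only makes sense for trusted data.
    """
    return pickle.loads(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))


# Hypothetical nested structure, just for illustration.
nested = {"users": [{"name": "a", "tags": ["x", "y"]}], "count": 1}
clone = pickle_copy(nested)

# The clone is equal to the original but shares no mutable parts with it.
clone["users"][0]["tags"].append("z")
```

After the append, nested still has two tags while clone has three, which is the same behavior copy.deepcopy(nested) would give.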

There is an issue that you need to watch out for in the cPickle module, though. When you are serializing to or deserializing from a string using the dumps and loads functions respectively, the functions do not release the GIL! This took me by surprise: I did not expect anything in the stdlib to hold on to the GIL for anything that could potentially take a long time. You can try this out easily by creating a multithreaded application where one thread calls cPickle.dumps on a multimegabyte data structure while the other threads print to the screen, for example. You will see that while dumps is running, the other threads are stopped.

Luckily there is an easy workaround: you can use the load and dump functions with a cStringIO buffer or other file-like objects.
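A minimal sketch of that workaround, assuming you still want a string in and a string out. The helper names dumps_via_buffer and loads_via_buffer are made up for this example. The import fallback lets the same code run on Python 3, where cPickle and cStringIO no longer exist as separate modules; whether the GIL behavior differs there is not something this sketch verifies.

```python
try:
    import cPickle as pickle                  # Python 2
    from cStringIO import StringIO as Buffer  # Python 2 in-memory file
except ImportError:
    import pickle                             # Python 3
    from io import BytesIO as Buffer          # Python 3 in-memory file


def dumps_via_buffer(obj):
    """Serialize obj with dump() into an in-memory buffer instead of dumps()."""
    buf = Buffer()
    pickle.dump(obj, buf, pickle.HIGHEST_PROTOCOL)
    return buf.getvalue()


def loads_via_buffer(data):
    """Deserialize with load() from an in-memory buffer instead of loads()."""
    return pickle.load(Buffer(data))


# Hypothetical payload, just for illustration.
payload = {"rows": list(range(1000)), "label": "example"}
restored = loads_via_buffer(dumps_via_buffer(payload))
```

As with any pickle use, only call loads_via_buffer on data from a source you trust.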

Note that I haven’t checked if this problem applies to Python 3.x.

Decorator to Log Slow Calls

One of the most common examples given for Python decorators is a decorator that tracks how long the execution of the wrapped function took. While this is very useful in and of itself, it can cause issues when you apply it in production.

The issue I faced was that when I was trying to find out why my servers were too slow (only under production loads), I first added the simple timing decorators to everything I thought might be slow in the hopes of catching all the slow calls and maybe finding some patterns. Well, this approach worked in the sense that I did find the slow parts quickly, but it was producing far more log output than before, and I wasn’t really interested in most of this timing information.

What I really wanted was a timing decorator that would log only when the wrapped callable took too long to execute. But there were still some calls that I wanted to log always for accurate statistical purposes. I figured the best way was to make my decorator take a threshold argument with some reasonable default that I could override if I wanted.

Now while I have written decorators before, this was the first decorator that called for optional arguments. Python treats decorators that don’t take any arguments slightly differently from those that require arguments, so this complicates things a bit. The sample in the Python decorator library is almost scary! I think my approach is nice and simple yet fairly sophisticated:

from time import time
import logging
import functools
 
log = logging.getLogger(__name__)
 
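def time_slow(f=None, logger=log, threshold=0.01):
    # Works both as @time_slow and as @time_slow(threshold=...); pass
    # logger and threshold as keyword arguments.
    def decorated(f):
        @functools.wraps(f)
        def wrapper(*args, **kw):
            start = time()
            try:
                ret = f(*args, **kw)
            finally:
                # Runs even if f raises, so slow failing calls are logged too.
                duration = time() - start
                if duration > threshold:
                    logger.info('slow: %s %.9f seconds', f.__name__, duration)
            return ret
        return wrapper
    if f is not None:
        # Bare @time_slow: f is the decorated callable itself.
        return decorated(f)
    # @time_slow(...): return the real decorator to be applied next.
    return decorated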

This decorator can be placed on a callable without any arguments, or with a custom logger or threshold value. In other words, both of these would work:

@time_slow
def myfast():
    pass
 
@time_slow(threshold=0)
def myslow():
    from time import sleep
    sleep(0.001)

and typically only calling myslow would produce log output. I chose 0.01 seconds as a reasonable default threshold, but this of course depends a lot on the use case. The log includes the slow function’s name, as well as the time formatted with 9 decimals in order to avoid the exponential notation, which makes it easier to work with the log output (sort -n, for example). I have just used a single threshold, but an easy improvement would be to pass a list of thresholds and log at different levels depending on the duration.
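The multi-threshold idea could look something like the following. This is only a sketch of one possible design: the names time_slow_levels, pick_level and DEFAULT_LEVELS, and the particular threshold values, are assumptions made up for the example.

```python
import functools
import logging
from time import time

log = logging.getLogger(__name__)

# Hypothetical defaults: (threshold in seconds, logging level) pairs.
DEFAULT_LEVELS = (
    (0.01, logging.INFO),
    (0.1, logging.WARNING),
    (1.0, logging.ERROR),
)


def pick_level(duration, levels):
    """Return the level of the largest threshold exceeded, or None."""
    chosen = None
    for threshold, level in sorted(levels):
        if duration > threshold:
            chosen = level
    return chosen


def time_slow_levels(f=None, logger=log, levels=DEFAULT_LEVELS):
    # Same optional-argument pattern as the single-threshold decorator.
    def decorated(f):
        @functools.wraps(f)
        def wrapper(*args, **kw):
            start = time()
            try:
                return f(*args, **kw)
            finally:
                duration = time() - start
                level = pick_level(duration, levels)
                if level is not None:
                    logger.log(level, 'slow: %s %.9f seconds',
                               f.__name__, duration)
        return wrapper
    if f is not None:
        return decorated(f)
    return decorated


@time_slow_levels
def add(a, b):
    return a + b
```

Keeping the level selection in its own function makes it easy to check the thresholds without having to time anything.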

M2Crypto Supports OpenSSL 1.0.x

I was supposed to release a new M2Crypto version in the summer of 2010 but “real life” got in the way, and this extended all the way until this week. I finally decided that I’d better push out a new release even though there was just one significant change: OpenSSL 1.0.x support. However, I felt this was really important since OpenSSL 1.0.x has been out for almost a year now, and it is starting to get difficult to deal with software that only works with pre-1.0.x versions.

Unfortunately I made a mistake in my first release to PyPI: I used the setup.py commands to build, sign and upload a source distribution, but I did this from a tree I had svn exported. Due to the way the M2Crypto setup.py was constructed, this meant that the tarball was lacking vital files. Yesterday I did a new 0.21.1 release from the Subversion checkout, so the tarball now includes everything.

SSL in Python 2.7

It has been almost two years since I wrote about the state of SSL in Python 2.6. If you haven’t read that yet, I suggest you read that first and then continue here, since I will mostly just be talking about things that have changed since then, or things that I have discovered since then.

The good news is that things have improved in the stdlib ssl module. The bad news is that it is still missing some critical pieces to make SSL secure.

Python 2.7 enables you to specify the ciphers to use explicitly, rather than just relying on the defaults that come with the SSL version selection. Additionally, if you compile the ssl module with OpenSSL 1.0 or later, using ssl.PROTOCOL_SSLv23 is safe (as in, it will not pick the insecure SSLv2 protocol) as long as you don’t enable SSLv2 specific ciphers (see the ssl module documentation for details).

Servers

With that out of the way, there isn’t really much difference to how you would write a simple SSL server with Python 2.7 compared to what I wrote in 2008. If you know your ssl module was compiled with OpenSSL 1.0 you can pick ssl.PROTOCOL_SSLv23 for maximum compatibility. Otherwise you should stick with an explicit version other than v2.

The documentation for the ssl module in 2.7 has improved a lot, and includes good sample code for servers.

The M2Crypto code hasn’t changed. The next M2Crypto release will add support for OpenSSL 1.0.

Clients

Checking the peer certificate’s hostname is still the weak point of the ssl module. The SSL version selection situation has improved slightly, as I explained above. Otherwise follow the example I wrote in 2008.

Again, the M2Crypto API hasn’t changed.

Lately I have been working with pycurl at Egnyte, so I decided to give a client example using that module.

import pycurl
 
c = pycurl.Curl()
c.setopt(pycurl.URL, 'https://www.google.com')
c.setopt(pycurl.HTTPGET, 1)
c.setopt(pycurl.SSL_VERIFYPEER, 1)
c.setopt(pycurl.CAINFO, 'ca.pem')
c.setopt(pycurl.SSL_VERIFYHOST, 2)
 
try:
    c.perform()
finally:
    c.close()

I am not a big fan of pycurl due to difficulties getting it compiled and the non-Pythonic API. But it is based on the very powerful curl library, so it comes full featured out of the box.

Other Resources

Besides the Python crypto libraries capable of doing SSL that I mentioned in my SSL in Python 2.6 article, I have found pycurl. Another find on the Python crypto front is cryptlib.

Mike Ivanov wrote a great series about crypto in Python: part 2, part 3 (link to part 1 seems to have rotted). Mike also produced a comparison of different Python crypto libraries (PDF).

The future is also looking bright for the ssl module. The upcoming Python 3.2 ssl module will include support for SSLContexts, so that you can set options for multiple SSL connections at once. It will also allow you to selectively disable SSL versions and to check the OpenSSL version.
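For readers on a current Python 3, here is a rough sketch of what the context-based approach looks like. Note the assumptions: ssl.create_default_context and the minimum_version attribute arrived in releases later than 3.2, the function name make_client_context is made up for the example, and the cipher string is just one plausible OpenSSL cipher list, not a recommendation.

```python
import ssl


def make_client_context(cafile=None):
    """Build one context that can be shared by many client connections."""
    # Enables certificate verification and hostname checking for clients.
    # With cafile=None the system's default CA certificates are used.
    ctx = ssl.create_default_context(cafile=cafile)
    # Selectively disable old protocol versions.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Specify ciphers explicitly using an OpenSSL cipher list string.
    ctx.set_ciphers("HIGH:!aNULL:!eNULL")
    return ctx


context = make_client_context()

# The OpenSSL version the ssl module was built against can be inspected.
print(ssl.OPENSSL_VERSION)
```

Each connection would then be created with context.wrap_socket(sock, server_hostname=...), so the options are set once and reused.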

baconrank.com Powered by Turbogears 2

The baconrank.com website was written with the Turbogears 2 web application framework. It is pretty lightly modified from the quickstart project.

On the UI front the quickstart project used a lot of images to make rounded corners and so on to make everything look good in a cross-platform way. I dislike this practice, so I decided to see if I could take the basic layout and not use any images for corners and backgrounds and the like. I think the result looks pretty good in modern browsers, including the default browser in Android 2.1. border-radius and -moz-linear-gradient did the bulk of the work. Which reminds me that I forgot to look for the equivalent of -moz-linear-gradient for other browsers…

The template language I used was Genshi, both because it is the default that comes with TG2 and the quickstart project, and because I wanted to try out another template language. The thing I like about Genshi is that it guarantees I get well-formed output. However, its use of XPath can mean a pretty steep learning curve for someone who doesn’t know XPath (like me). I did not see how to control the whitespace around elements: in some cases I have elements one on each row, other times they end up on the same row. I think I also could not get a comment to appear at the end of the document (to put in page rendering time, for example, although it turned out you’d need to hack tg2 itself to get the rendering time; unfortunately I lost the link).

I originally had bigger plans for the application, including forum, password recovery and so on, but canned those plans for now because I realized I could not find enough time to do all those and ship something before the summer. Part of that I did get in, although it is only exposed to the admin user. Being the security-minded geek that I am, I hacked the auth code to use bcrypt. I also learned about doing password comparison in a way that does not leak information due to timing.
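The timing-safe comparison idea can be sketched as follows. This is not the code from the application: hmac.compare_digest was only added to the standard library later, and the helper names here are made up for the example. The manual version is included only to show the technique of always examining every byte.

```python
import hmac


def constant_time_equals(expected, provided):
    """Compare two byte strings without exiting early at the first mismatch."""
    return hmac.compare_digest(expected, provided)


def manual_constant_time_equals(a, b):
    """Illustration of the technique: accumulate differences over all bytes."""
    if len(a) != len(b):
        return False
    result = 0
    for x, y in zip(bytearray(a), bytearray(b)):
        result |= x ^ y  # stays 0 only if every byte pair matches
    return result == 0


# Hypothetical stored and submitted hashes, just for illustration.
stored = b"0123456789abcdef"
good = b"0123456789abcdef"
bad = b"0123456789abcdeX"
```

A plain == on strings may return as soon as the first differing byte is found, which is what lets an attacker learn something from response times. In real code, prefer the standard library function over a hand-written loop.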

I designed the small protocol that the Android client uses to communicate with the server. Most operations are covered by the following:

Initial sync:

>>>request<<<
POST /users
@JSON@

>>>response<<<
201
Location: /users/123

>>>request<<<
GET /users/123

>>>response<<<
200
@JSON@

Add bacon:

>>>request<<<
POST /users/123
@JSON@

>>>response<<<
200
@JSON@
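As a rough sketch, a Python client for the exchanges above could build its requests like this. Everything specific here is an assumption: the base URL is a placeholder, the function names are made up, and the JSON fields are hypothetical since the actual payloads are not shown. The requests are only constructed, not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # assumption: placeholder, not the real server


def json_post(path, payload):
    """Build (but do not send) a POST request carrying a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return Request(BASE_URL + path, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")


def initial_sync_request(profile):
    # POST /users; the expected answer is 201 with a Location header.
    return json_post("/users", profile)


def fetch_user_request(location):
    # GET the path given in the Location header, e.g. /users/123.
    return Request(BASE_URL + location, method="GET")


def add_bacon_request(location, entry):
    # POST /users/123; the expected answer is 200 with a JSON body.
    return json_post(location, entry)


create = initial_sync_request({"name": "example"})     # hypothetical field
fetch = fetch_user_request("/users/123")
add = add_bacon_request("/users/123", {"strips": 3})   # hypothetical field
```

Sending any of these would be a matter of passing the object to urllib.request.urlopen and reading the status, headers and body from the response.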

Incidentally, if you'd like to write another client (iPhone anyone?), let me know...

I wanted to wrap the client API in its own class, but could not get that to work as a separate class, so I had to settle for a method that does way too much. Something I hope I can revisit later.

I really wanted to like Turbogears 2 since it is based on my favorite, Pylons, but I actually felt that TG2 was getting in my way almost as often as it was helping me. Not having to write the auth code from scratch and having an admin area almost as easy as in Django were big wins, but since I did not yet implement the forum (where I think I could really have used more of tg2's strengths) I would probably have been better off with plain Pylons. The documentation has been in flux the whole time I've been working on my app, some things working, some broken, and I had to resort to comments at the end of docs and web searches to try and figure things out. There were periods of time when the dependent packages were out of sync, and you could not even get a working quickstart project without setting version limits. I think that the controller methods not returning the actual string to render can potentially be a big headache (for me, I just could not get the timing information output into the end of the returned HTML). I'd say TG2 has the potential to be great, but right now things feel a little bit too unfinished. I know 2.1 will fix some of the issues I ran into, but I don't know if anyone is looking at checking and cleaning up the documentation.

Bacon Rank Released

I am pleased to release Bacon Rank, the application you have been waiting for to count your bacon! (You may not have realized this, of course. I forgive you.)

I’ve been working on this application combo since last November, a few hours here and there. The server piece is a Turbogears 2 application, while the client piece is an Android application. As a whole, they form the Bacon Rank application ecosystem. The idea is that whenever you eat bacon, you use the Android client to submit the number of strips you ate to the server. The server returns your statistics, including your rank compared to other users of the system. It is clearly not a very serious application, but it presented some interesting programming challenges.

I’ve been going through various Python web frameworks, and this time I wanted to learn about Turbogears 2. There isn’t anything especially groundbreaking about it, but I think it marks the first time I am actually running a real service. Of course it is also tied to the Android application, which makes it interesting.

The Android application is a bit more ambitious from an engineering point of view. In the UI the trickiest piece was a customized SeekBar: uncooked bacon that becomes cooked as you move a skillet over it. I was also able to finally figure out how to make a background network request survive a screen orientation change gracefully.

I am also experimenting with Twitter for the first time as a communications channel for the project.

In the next two posts I will explore the server and client pieces in more detail.

New Adventures

This news is a couple of weeks old, but I thought better late than never… For the past couple of years I was working at SpikeSource, on a variety of things ranging from Pylons and CherryPy applications to designing RESTful APIs and client libraries and even CakePHP and Java web applications (horror of horrors ;) ). It was never a good fit culturally, though, so just before my two-year anniversary came around I decided I needed a change.

I decided to be very focused on my search: Python web application development, startups, and ideally located from Sunnyvale to Palo Alto. I used mostly craigslist.org and startuply.com for my search, with a couple of hints from friends. I applied to eight jobs, and got five interviews. I was quite surprised that there were so many companies matching my preferences within such a small area hiring at the time when I was looking! I was also pleasantly surprised that I got as many interviews as I did; last time around I had a somewhat worse success rate. Hint: requirements aren’t necessarily requirements in job postings.

Unfortunately I learned I am bad at interviews aimed at Python web application developers. For some reason I find coding questions a lot easier to answer when they are in C. I was also badly prepared to answer a few questions related to web development; having not thought about something for a while makes a bad interview impression. Note to my future self: start practicing at least a week before interviews, practice in Python on problems designed to be solved in Python (maybe Python challenges, Facebook programming puzzles and the like), and check with friends for some common web developer questions. Assuming I were to look for that kind of job next time around, of course.

To make a long story short, I decided to join Aahz at Egnyte. Egnyte provides cloud storage, backup and sharing services. My first task involves integrating a new UI design into a CherryPy application using Cheetah templates. I am also working on a Mac after many years on Linux.