Allowing Safe HTML in Pylons Applications

I have had some free time again to work on horsetrailratings.com, and I should be able to put version 0.2 online by the end of the week.

One of the major undertakings has been changing the form validators to accept any Unicode text and, for the textareas, a safe subset of HTML as well. Unicode was simple once I realized I needed to change my SQLAlchemy column types from String and Text to Unicode and UnicodeText; otherwise I got errors when loading strings back from MySQL. Internal strings were already Unicode, since Pylons coerces strings to Unicode by default, which is nice.
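A minimal sketch of that type change, assuming a made-up table: the table and column names below are hypothetical, and only the Unicode and UnicodeText types come from the change described above.

```python
from sqlalchemy import Column, Integer, MetaData, Table, Unicode, UnicodeText

metadata = MetaData()

# Hypothetical table; the real schema of the site is not shown in this post.
trails = Table(
    "trails",
    metadata,
    Column("id", Integer, primary_key=True),
    # Was String(255): short, single-line text field.
    Column("name", Unicode(255)),
    # Was Text: long free-form field fed by a textarea.
    Column("description", UnicodeText),
)
```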

Simple text fields that are not supposed to contain any HTML I just HTML-escape with Mako at output time. The downside is that potentially unsafe strings are saved in the database as they are, so I have to remember to escape them every time they are rendered. I might change my strategy and also run these strings through BeautifulCleaner (see below) before storing them. In theory Mako can be configured to HTML-escape by default, but I was unable to make that work. The documentation also states that this is a serious performance hit.
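To illustrate what escaping on output means: in a Mako template this is what the h filter does, as in ${name | h}. The sketch below shows the same idea in plain Python 3 using the standard library, not Mako itself; the helper name and the sample string are made up for illustration.

```python
from html import escape


def render_trail_name(name):
    # Hypothetical helper: escape at output time, so whatever is stored
    # in the database is rendered as text and never as markup.
    return "<h2>%s</h2>" % escape(name, quote=True)


# A string stored unescaped in the database.
stored = '<script>alert("x")</script> Trail'
print(render_trail_name(stored))
```

The stored value stays untouched, which is exactly why every template that prints it has to remember the filter.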

Safe HTML for the textareas, on the other hand, was more complicated. I looked at the HTML cleaner in feedparser and some others, but nothing seemed easy to use, configurable, and built on proven technology all at once. When I asked about this on the #pylons IRC channel, someone pointed me to BeautifulCleaner, which he was working on (not yet on CheeseShop), and I put that in. It uses the same code that is in feedparser, but it also uses BeautifulSoup, and it is all packaged behind a nice, simple API. What it does is basically strip unsafe HTML constructs (script tags etc.).

It doesn't do that automatically (there is no Pylons integration), so I created a custom text validator that hooks BeautifulCleaner in. Cleaning is maybe not exactly a job for a validator, but it fits in nicely that way. I also created a helper to use in Mako templates, although filtering both incoming and outgoing strings could be considered a bit of overkill.

While BeautifulCleaner is nice, I wish its defaults were safer: it allows style and images by default. Elements can be whitelisted or blacklisted, but there is no way to do the same for attributes. I think things like class attributes should be filtered out, because they can mess with the site layout. I want HTML that has no access to scripts or the network and that cannot mess up the site layout.
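For illustration, here is a rough sketch of the kind of cleaner I have in mind, one that whitelists attributes as well as elements. This is not BeautifulCleaner's API: it is a Python 3 sketch on top of the standard library's html.parser, and the whitelists, the class name, and the clean_html function are all assumptions made up for this example.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelists; anything not listed here is stripped.
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li", "a"}
ALLOWED_ATTRS = {"a": {"href"}}  # no class, style, or on* attributes anywhere
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:")
DROP_CONTENT = {"script", "style"}  # drop these tags and everything inside
VOID_TAGS = {"br"}  # tags that have no closing tag


class WhitelistCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a script or style element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # the tag is dropped; any text inside it is kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, ()):
                continue
            value = value or ""
            # Only keep links with an explicitly allowed URL scheme.
            if name == "href" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                continue
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_html(text):
    parser = WhitelistCleaner()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


dirty = '<p class="x" onclick="evil()">Hi <script>alert(1)</script><b>there</b></p>'
print(clean_html(dirty))
```

Because this is a whitelist, images, style, class attributes, and javascript: links all disappear without being named anywhere. It does not balance unclosed tags, so a real cleaner would still want something like BeautifulSoup to repair the markup first.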

Now that I was allowing HTML in textareas, it was time to look for a nice WYSIWYG editor. JJ recommended tinyMCE, and I put that in. One problem I am still battling with is that I seem to be unable to put any padding or margin around the tinyMCE editor using CSS.
