Pulling Android Market Sales Data Programmatically

Android Market handles sales through Google Checkout. I haven’t tried selling anything else online before, but what this setup provides for me as the seller leaves a lot to be desired. One issue you will have trouble with is getting the data needed to file taxes.

Google provides a Google Checkout Notification History API that lets you programmatically query sales data. For my purposes the API requests are really simple: just post a small XML document with the date range I am interested in, get back XML documents that contain my data. If there is more data that fits in a single response, look for an element that specifies the token for the next page and keep pulling until you get all data.

Below is a really simple Python script that uses M2Crypto to handle the SSL parts for the connection (needed since Python doesn’t do secure SSL out of the box). You will also need to grab certificates. You should save the script as gnotif.py, save the certificates as cacert.pem and create gnotif.ini as described in the script below all in the same directory. When you execute it, it will ask for start and end date (in YYYY-MM-DD format) and then fetch all the data, saving them in response-N.xml files, where N is a number.

#!/usr/bin/env python
# Script to query Google Checkout Notification History
# http://code.google.com/apis/checkout/developer/Google_Checkout_XML_API_Notification_History_API.html
 
# Supporting file gnotif.ini:
#[gnotif]
# merchant_id = YOUR_MERCHANT_ID_HERE
# merchant_key = YOUR_MERCHANT_KEY_HERE
 
import base64
import re
from ConfigParser import ConfigParser
 
from M2Crypto import SSL, httpslib
 
ENVIRONMENT = "https://checkout.google.com/api/checkout/v2/reports/Merchant/"
XML = """\
<notification-history-request xmlns="http://checkout.google.com/schema/2">
%(query)s
</notification-history-request>
"""
 
config = ConfigParser()
config.read('gnotif.ini')
MERCHANT_ID = config.get('gnotif', 'merchant_id')
MERCHANT_KEY = config.get('gnotif', 'merchant_key')
 
rawstr = r"""<next-page-token>(.*)</next-page-token>"""
compile_obj = re.compile(rawstr, re.MULTILINE)
 
auth = base64.encodestring('%s:%s' % (MERCHANT_ID, MERCHANT_KEY))[:-1]
 
ctx = SSL.Context('sslv3')
# If you comment out the next 2 lines, the connection won't be secure
ctx.set_verify(SSL.verify_peer | SSL.verify_fail_if_no_peer_cert, depth=9)
if ctx.load_verify_locations('cacert.pem') != 1: raise Exception('No CA certs')
 
start = raw_input('Start date: ')
end = raw_input('End date: ')
 
data = XML % {'query': """<start-time>%(start)s</start-time>
<end-time>%(end)s</end-time>""" % {'start': start, 'end': end}}
 
i = 0
 
while True:
    c = httpslib.HTTPSConnection(host='checkout.google.com', port=443, ssl_context=ctx)
    c.request('POST', ENVIRONMENT + MERCHANT_ID, data,
             {'content-type': 'application/xml; charset=UTF-8',
              'accept': 'application/xml; charset=UTF-8',
              'authorization': 'Basic ' + auth})
 
    r = c.getresponse()
 
    f=open('response-%d.xml' % i, 'w')
    result = r.read()
    f.write(result)
    f.close()
 
    print i, r.status
 
    c.close()
 
    match_obj = compile_obj.search(result)
    if match_obj:
        i += 1
        data = XML % {'query': """<next-page-token>%s</next-page-token>""" % match_obj.group(1)}
    else:
        break

As you take a look at the data you will probably notice that you are only getting the sale price information, but no information about the fees that Google is deducting. Officially it is a flat 30%, but I have found out a number of my sales have the fee as 5%. So we need to get this information somehow. Luckily you can toggle a checkbox in your Google Checkout Merchant Settings. Unfortunately there is a bug, and the transaction fee shows as $0 for Android Market sales. I have reported this to Google, and they acknowledged it, but there is no ETA on when this will be fixed.

I also haven’t found any way to programmatically query when and how much did Google Checkout actually pay me. (I can get this info from my bank, but it would be nice to query for that with the Checkout API as well.)

Last but certainly not least, working with the monster XML files returned from Google Checkout API is a real pain. If someone has a script to turn those into a format that could be imported into a spreadsheet or database that would be nice…

2009 Statistics

Looking back at my website traffic in 2009 shows pretty amazing growth:

year	#reqs	#pages
2006	2362	1657	+
2007	14982	9867	+
2008	335897	227906	++++++++++++
2009	1149221	826814	++++++++++++++++++++++++++++++++++++++++++

Compared to 2008, I blogged much more frequently. I actually tried to post at least once a week, but couldn’t quite keep up during the second half. I also put up the M2Crypto API documentation, which gets a fair bit of traffic.

The most popular 2009 blog post was Multicolumn ListView in Android with about 5,100 page views according to Google Analytics. Most of my posts dealt with Python, however, and I posted several snippets of code that touched projects like Django, CherryPy, boto (Python interface to Amazon Web Services), Fabric and so on.

I continued to maintain Caltroid, the first Caltrain schedule application for Android. I also released three other Android applications: d20 Ability Calculator, 20 Ability Calculator Lite and ChandlerQE for Android. All in all I made about $250 on Android software sales (gross), but since the expenses were less I actually made a slight profit. Hourly pay would counted in pennies, though.

I continued to maintain CaltrainPy and CaltrainJS as well. Other projects that I did maintenance releases on were M2Crypto.

I ported Office Resource Locator to Django.

I created a little encryption library called m2secret in Python.

2009 Donations

Time for my yearly donations to Open Source projects I use. As I mentioned when I launched ChandlerQE for Android I planned to donate 20% of the sales to OSAF. Here are the statistics:

Market           Price  Copies
==============================
Android Market   $1.99  4
Android Market   $0.99  6
SlideMe          $1.99  0
SlideMe          $0.99  3
==============================
TOTAL                   $16.87
20%                      $3.37

Somewhat of a disappointment, but understandable given that nobody seems to be working on Chandler code anymore. But I sent the 20% to OSAF anyway.

I decided to donate to three other projects this year:

Like last year, I would have wanted to donate to Mozilla Thunderbird, Pidgin, Enigmail and Pylons but didn’t see any ways to donate directly to them.

Buildbot Slave on Windows XP

Today I installed Buildbot on Windows XP and even got it to run successfully as build slave. It was both harder and easier than I expected. Or more specifically, installing all the dependencies was harder than I thought, but configuring and running the slave was easier than I thought.

I have run Buildbot server and slaves on Linux based systems without any problems. Once you have setuptools installed, it is just a matter of easy_install buildbot and creating the configuration files. I thought once I had Python and setuptools installed on Windows it would be equally simple. Not so.

First of all I tried to use somewhat nonstandard Python 2.5, for which there was no registry entry. This meant that none of the dependencies that had their own executable installers worked. When I tried the zipped versions, these also failed because apparently my Python was compiled with different version of Visual C++ than the extensions I was trying to install. I got past this by grabbing a standard Python installer and going from there.

Next I tried to install setuptools from the exe installer. That failed because it could not find msvcrt71.dll. I downloaded that and put one in Python\DLLs directory and one if the directory from which I was trying to run the installers. Then I was able to get setuptools installed.

After I modified PATH to include the Python and Python\Scripts directories I tried to use easy_install, but it failed to install any package that contained native code. I also added .py to PATHEXT as described in the Buildbot README.w32 file. I continued by grabbing the exe installers of the dependencies and running them one by one. Note that this might be too long, but I found these from various pages describing how to run Buildbot on Windows: pywin32, zope.interfaces, twisted, pycrypto and pyopenssl. I used the latest stable release of each.

Next step was buildbot itself. I downloaded the sources, unpacked, and run python setup.py install.

For a change, easy_install worked for the next two packages I wanted to use for the actual tests: easy_install nose, easy_install coverage.

buildbot create-slave command worked fine, but when I first tried to start my slave it would just sit there without doing anything nor giving any errors. It turned out I had given wrong machine name as the buildmaster. I would have expected an error when trying to start. Once I fixed that, the buildbot slave was running correctly. (Note that I have not yet tried to run it as service.)

Finally it was just a matter of fixing broken tests and installing any dependencies the code I was testing had (like .NET 3.5). Feels good having some assurance that the changes I make on Linux won’t hose the Windows world.

Secure Password Scheme for Turbogears2 Application with repoze.who and bcrypt

I am working on a little Turbogears2 application, and wanted to use the repoze.who and repoze.what packages that integrate nicely with Turbogears2 and have gathered fair amount of positive feedback. It seems usage is simple, but I was quite disappointed in how password hashes are done by default, especially after I had read Thomas Ptacek’s educational rant about secure password schemes.

It turns out by default the authentication code that gets generated for you if you tell paster quickstart you want authentication is the weak kind of way Thomas warns about. Luckily the correct way he advices is easy to put in by using the py-bcrypt module. If you want pointers, see repoze.who Issue 85, or the py-bcrypt homepage.

Unfortunately py-crypt is kind of annoying to put in dependencies. The pypi entry is named bcrypt, but the download page links to py-bcrypt, so setuptools machinery can not find the right package. This can be worked around relatively easily by adding the py-bcrypt download link to setup.cfg:

[easy_install]
find_links = http://www.pylonshq.com/download/
             http://www.mindrot.org/projects/py-bcrypt/
# ...

and then in setup.py you do something like this:

#...
 
install_requires=[
    "TurboGears2 >= 2.0.3",
    "Catwalk >= 2.0.2",
    "Babel >=0.9.4",
    #can be removed iif use_toscawidgets = False
    "toscawidgets >= 0.9.7.1",
    "zope.sqlalchemy >= 0.4 ",
    "repoze.tm2 >= 1.0a4",
    "repoze.what-quickstart >= 1.0",
]
 
try:
    import bcrypt
except ImportError:
    install_requires.append("py-bcrypt >= 0.1")
 
setup(
    install_requires = install_requires,
 
# ...

Now when you go deploy your TG2 app with easy_install or similar tools, it will download the py-bcrypt package the first time, and won’t bother you if it already exists.

Testing CherryPy 3 Application with Twill and Nose

I’ve been working on a CherryPy application for a few days, and wanted to write some tests. Surprisingly I could not find any tutorials or documentation on how I should test a CherryPy application. Unfortunately I also missed the last section on CherryPy Testing page; why is CherryPy application testing added as an afterthought? Wouldn’t it make more sense to start the testing section on how people can write tests for their CherryPy applications, rather than first explaining how to test CherryPy itself? Of well, at least I learned something new…

Since I got tests working with Twill first I decided to document my experience, and switch to the CherryPy way later if it makes more sense. The CherryPy Essentials book apparently has a section on testing, so reading that would probably clarify a lot of things.

There is a brief tutorial on how to test CherryPy 2 application with twill, but the instructions need some tweaking to work with CherryPy 3.

On Ubuntu 8.04 I first created a virtualenv 1.3.1 without site packages. I am running Python 2.5.2, and I have the following packages installed in the virtualenv: setuptools 0.6c9, CherryPy 3.1.2, twill 0.9 and nose 0.11.1. The additional packages were installed with easy_install.

My directory structure is as follows:

hello.py
tests/
  __init__.py
  test_hello.py

hello.py contents is simply:

import cherrypy
 
class HelloWorld:
    def index(self):
        return "Hello world!"
    index.exposed = True
 
if __name__ == '__main__':
    cherrypy.quickstart(HelloWorld())

Running python hello.py will start the web server and I can see the greeting in my browser at URL http://localhost:8080.

The tests directory has two files. __init__.py is empty. The test_hello.py follows closely the tutorial by Titus, but modified to work with CherryPy 3. The CherryPy 3 Upgrade instructions and CherryPy mod_wsgi instructions showed the way.

from StringIO import StringIO
 
import twill
import cherrypy
 
from hello import HelloWorld
 
class TestHelloWorld:
    def setUp(self):
        # configure cherrypy to be quiet ;)
        cherrypy.config.update({ "environment": "embedded" })
 
        # get WSGI app.
        wsgiApp = cherrypy.tree.mount(HelloWorld())
 
        # initialize
        cherrypy.server.start()
 
        # install the app at localhost:8080 for wsgi_intercept
        twill.add_wsgi_intercept('localhost', 8080, lambda : wsgiApp)
 
        # while we're at it, snarf twill's output.
        self.outp = StringIO()
        twill.set_output(self.outp)
 
    def tearDown(self):
        # remove intercept.
        twill.remove_wsgi_intercept('localhost', 8080)
 
        # shut down the cherrypy server.
        cherrypy.server.stop()
 
    def test_hello(self):
        script = "find 'Hello world!'"
        twill.execute_string(script, initial_url='http://localhost:8080/')

Now you’d expect that this would work by simply running nosetests command. Mysteriously I got import error on twill (and after I removed the line, also import error on cherrypy). I looked at sys.path which showed that I was somehow picking up the older nosetests I had installed into system Python. which nosetests claimed it was finding the virtualenv nosetests. Still, I had to actually give the explicit path to my virtualenv nosetests before the tests would run without import errors.

All in all testing CherryPy applications turned into a longer adventure than I anticipated. I run into a number of unexpected difficulties, but I finally got it working and learned about twill as a bonus. Thanks for the tip, JJ!

Turbogears2 on Dreamhost

It has been almost two years since I tested Turbogears 1 on Dreamhost. Back then it was quite difficult for me to get it running. But some additional personal experience and improvements in Turbogears2 have made it a breeze. I tested with Turbogears 2.0 although I upgraded to 2.1a2 at some point.

First you need to get virtualenv installed, which is pretty simple after you have downloaded and unpacked the source tarball: python2.5 virtualenv.py $HOME. (I wanted it installed in $HOME, but you could use alternative locations as well.) This will install setuptools, but somehow not virtualenv. Then you just do easy_install virtualenv. You will also need PasteDeploy so do: easy_install PasteDeploy.

Next steps might be different for installing a Turbogears2 egg/application, but I used these instructions to install the wiki-20 tutorial in development mode. (To install a properly packaged app you probably just need to do: easy_install app_tarball; paster make-config yourapp production.ini and follow the instructions from FastCGI onwards.)

After that you just follow tg2 automatic installation instructions.

Then use paster quickstart to create a new project template. cd to the created directory, and run python setup.py develop to download any missing dependencies and set things up for debugging and development.

Edit as instructed in the tutorial. Then python setup-app development.ini.

After that it is time to create the production ini: paster make-config Wiki20 production.ini.

Next step is getting this running with FastCGI. Create wiki20.fcgi in the webroot directory:

#!/home/your-username/path/to/tg2env/bin/python
 
from fcgi import WSGIServer # you could also use flup etc.
 
from paste.deploy import loadapp
real_app = loadapp('config:/home/your-username/path/to/production.ini')
 
def myapp(environ, start_response):
    environ['SCRIPT_NAME'] =  # get rid of the .fcgi in urls
    return real_app(environ, start_response)
 
server = WSGIServer(myapp)
server.run()

There are a couple of points of note here:

  • I am using fcgi.py which seems to be slightly more reliable than flup on Dreamhost. You’d better edit fcgi.py so that you won’t show private information to everyone in case of errors.
  • There is a trick to get rid of the .fcgi part from the URLs.

Next we’ll need a .htaccess file:

# Enable Dreamhost stats
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_URI} ^/(stats|failed_auth\.html).*$ [NC]
RewriteRule . - [L]
</IfModule> 
# FastCGI
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^wiki20\.fcgi/ - [L]
RewriteRule ^(.*)$ wiki20.fcgi/$1 [L]
</IfModule>

Now when you go to your site, first time it is going to take a while to load your app, but after that things will be snappy as long as the app stays in memory.

Caltroid 1.3 Supports Small Screens

Not a big update to Caltroid at this time, but I wanted to get out the release because Caltroid now works in 320×240 resolution (QVGA), and there are even devices out with that resolution, like the HTC Tattoo. Of course bigger screens will work as well (I’m looking at you, Verizon Droid). The other minor user visible change is that when you locate the nearest station with GPS, the destination station will stay the same whenever possible.

Under the hood I fixed the spaghetti layout into pretty clean RelativeLayout affair, except for the red strip container which is LinearLayout so I could use layout weight to size the departure and destination stations the same. Apparently this can be a performance hit, but it is probably overall faster than it used to be. (Yeah, bad me, I didn’t measure.)

Figuring out how to make an apk that would work in all SDK versions took some time as well. I ended up creating layout-320x240 and layout-land-320x240 directories to put in the QVGA specific changes needed. I should change to layout-small instead, but for some reason I could not get this to work. I also added android:targetSdkVersion="4" into AndroidManifest.xml and built with 1.6 SDK.

Final change in place is that I exported using the wizard in the Eclipse plugin, which should zipalign the apk and make the app use less memory.

Oh, I should note that Caltroid passed 100 users last month. Thanks to everyone who purchased it, and made it the best rated Caltrain schedule application for Android!