tech.agilitynerd.com

scratching that itch... 

Reducing the Cost of Client Side Analytics

I read Andy McKay's blog post on timing user experience on the Mozilla Webdev blog the other day, and it reminded me of an idea I'd had for measuring client-side timings at work. I had been toying with rolling our own library to capture JavaScript rendering time for our JS-heavy pages (grids with hundreds of rows of data).

Andy's post mentions the boomerang JavaScript library, and while reading its docs I noticed they point out the potential for abuse of, and load on, the URL used to report the timings. For each instrumented page, boomerang can hit the "beacon" URL to report the statistics it collects, so in the worst case you could double your page hits. That said, for specific pages or samples, recording a few statistics shouldn't be too costly on low-volume sites.

One solution is to sample only the pages or users of interest; the sample could be selected on the server, on the client, or both. Another solution is to collect statistics across multiple pages and periodically send them to the beacon URL in batches.

I've been playing with mobile web development for agilitycourses.com lately and will soon let users store the courses they create in their browser's localStorage. That got me thinking that sessionStorage could be used to accumulate analytics across pages and periodically send them to the server. This would reduce the number of hits on the beacon, allowing deployment to a larger sample of clients. Session storage is also cleared when the session ends and, if the data is kept small, the user isn't prompted to approve storing it.
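The batching idea can be modeled outside the browser. This is a minimal Python sketch, not boomerang's API: a plain dict stands in for sessionStorage (which only holds strings, hence the JSON), and a `send` callable stands in for one request to the beacon URL. The class name, storage key, and batch size are all hypothetical.

```python
import json


class AnalyticsBatcher:
    """Buffer analytics records in a string key/value store; send them in batches.

    `store` models sessionStorage (string keys and values) and `send` models a
    single request to the beacon URL. Both are assumptions for illustration.
    """

    KEY = "pending_analytics"  # hypothetical storage key

    def __init__(self, store, send, batch_size=10):
        self.store = store
        self.send = send
        self.batch_size = batch_size

    def pending(self):
        # sessionStorage only holds strings, so the list is kept as JSON.
        return json.loads(self.store.get(self.KEY, "[]"))

    def record(self, entry):
        batch = self.pending()
        batch.append(entry)
        # Persist first so a failed send doesn't lose anything.
        self.store[self.KEY] = json.dumps(batch)
        if len(batch) >= self.batch_size:
            self.send(batch)          # one beacon hit for the whole batch
            del self.store[self.KEY]  # clear only after the send succeeded


# 7 page timings with batch_size=3 cause 2 beacon hits and leave 1 record pending.
store, sent = {}, []
batcher = AnalyticsBatcher(store, sent.append, batch_size=3)
for i in range(7):
    batcher.record({"page": "/page%d" % i, "render_ms": 100 + i})
print(len(sent), batcher.pending())  # 2 [{'page': '/page6', 'render_ms': 106}]
```

Writing the pending list back before sending means that if the send raises, the batch stays in storage and goes out with a later page.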

Many modern browsers support session storage, and because of our browser support policy at work, only browsers with that support would be relevant for my purposes.

The other problem the boomerang docs discuss is abuse (accidental or malicious) of the beacon. One solution would be to piggyback analytics reporting on application form post payloads. This is trickier to implement, and it couples analytics reporting to the application itself.

To try to solve this somewhat generally: the client-side JS library could add a hidden field to any, some, or specific forms, into which it writes the analytics data collected so far. If it hooked the form's submit callback, it could tell when the form was submitted and clear the session storage.

Server-side middleware could detect the hidden analytics field in the form and extract and store the data. It could also remove the field before passing the request data along to the app server.
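A minimal sketch of that server-side step as framework-neutral WSGI middleware (in Django the same logic would more naturally read `request.POST`). The field name `_analytics` and the `save` callback are hypothetical; it handles only URL-encoded form posts and assumes UTF-8 bodies.

```python
import io
from urllib.parse import parse_qsl, urlencode


class StripAnalyticsMiddleware:
    """Pull a hidden analytics field out of form posts before the app sees them.

    FIELD and the `save` callback (where extracted data goes) are hypothetical.
    Only application/x-www-form-urlencoded bodies are handled; UTF-8 is assumed.
    """

    FIELD = "_analytics"

    def __init__(self, app, save):
        self.app = app
        self.save = save

    def __call__(self, environ, start_response):
        content_type = environ.get("CONTENT_TYPE", "")
        if (environ.get("REQUEST_METHOD") == "POST"
                and content_type.startswith("application/x-www-form-urlencoded")):
            length = int(environ.get("CONTENT_LENGTH") or 0)
            body = environ["wsgi.input"].read(length).decode("utf-8")
            kept = []
            for name, value in parse_qsl(body, keep_blank_values=True):
                if name == self.FIELD:
                    self.save(value)  # e.g. queue the JSON for storage
                else:
                    kept.append((name, value))
            # Hand the app a rebuilt body without the analytics field.
            new_body = urlencode(kept).encode("utf-8")
            environ["wsgi.input"] = io.BytesIO(new_body)
            environ["CONTENT_LENGTH"] = str(len(new_body))
        return self.app(environ, start_response)
```

Re-encoding the remaining fields may change how they are escaped (`+` versus `%20` for spaces, for example), but the decoded values the application sees are unchanged.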

All in all, that's a fair amount of twiddling just to avoid exposing a recording URL to the outside world.

Of course, if the beacon URL required an authenticated session, abusers would need a valid session to post to it.

I don't know if I'll have time to play with the sessionStorage idea, but I think it could be a worthwhile extension to boomerang or other client-side analytics capture libraries.

Filed under  //   analytics   boomerang   javascript   sessionstorage   web development  


YellowGrass - Web Based Issue Tracking

I was doing some reading on mobl and saw that they are using a free web based service called YellowGrass for issue tracking. It has some nice features and seems easy to use. Everything is tag based. I think I'll try to use it for tracking enhancements to agilitycourses.com.

Filed under  //   development   issue-tracking  


Django Shrink The Web django-stw 0.2.0 Released

Shrink The Web has announced a new API for free users that uses their new preview verification feature. This required changes to my django-stw package.

The changes (lifted from the CHANGELOG.txt):

Changes to the shrinkthewebimage template tag:

  • The shrinkthewebimage template tag is NOT backward compatible with version 0.0.1. The alt argument is no longer accepted.
  • The shrinkthewebimage template tag is now intended for use by free accounts, it adds the required preview feature. It can also be used by PRO account users wanting the preview functionality.
  • The shrinkthewebimage template tag now accepts PRO key-value arguments in the same manner as the stwimage tag. This functionality is shown in the example template but may not yet be fully implemented by the STW web service.

Changes to the stwimage template tag:

  • The stwimage tag can now only be used for PRO features.

Common changes:

  • Template tags now raise exceptions in their constructors instead of in their render functions, so configuration errors are visible during development.
  • django-stw defines a 'lang' key for the SHRINK_THE_WEB dictionary that can be passed along as a default to the preview tag. Alternatively, a 'lang' keyword can be supplied in each template tag invocation. django-stw defaults it to 'en'. This functionality is not yet implemented by the STW web service.

The v0.2.0 package is available on PyPI, as a source download on GitHub, or via git clone.

Filed under  //   django   shrink the web  


Python dict.get's Default Value is Always Evaluated

This is a gotcha I ran across in some production code that is obvious in retrospect. I was profiling the code to find places where we were calling "an_expensive_database_function" and came across code like this:

def doit(*args, **kwargs):
    value = kwargs.get('key', an_expensive_database_function())

The original author probably assumed that if 'key' was present in the kwargs dictionary, an_expensive_database_function wouldn't be called, i.e. that it would be short-circuited the way Boolean expressions are. But get is an ordinary method call, so its arguments are always evaluated before it runs. In this case, even when the value was already present in the kwargs dictionary under 'key', the database function was still called.

Here is a "look before you leap" solution:

def doit(*args, **kwargs):
    value = kwargs.get('key')
    if value is None: 
        # assuming default value None isn't a valid value
        value = an_expensive_database_function()

Here is the "easier to ask forgiveness than permission" solution:

def doit(*args, **kwargs):
    try:
        value = kwargs['key']
    except KeyError:
        value = an_expensive_database_function()
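The gotcha is easy to confirm by swapping in a function that counts its calls. `an_expensive_database_function` below is a hypothetical stand-in, and the last statement shows a third option, a conditional expression with an `in` test, which also skips the call and still works when None is a legitimate value.

```python
calls = []


def an_expensive_database_function():
    """Hypothetical stand-in: records each call instead of querying a database."""
    calls.append("called")
    return "from the database"


kwargs = {"key": "already supplied"}

# The default argument is evaluated before get() runs, even though 'key' exists.
value = kwargs.get("key", an_expensive_database_function())
print(value, len(calls))  # already supplied 1

# A conditional expression only evaluates the branch it takes.
value = kwargs["key"] if "key" in kwargs else an_expensive_database_function()
print(value, len(calls))  # already supplied 1
```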

Filed under  //   python  


Blosxom Plugin for Generating Facebook Comment xids

I've been using Blosxom to power my dog agility blog for over 6 years. In the past year or so I've enabled Facebook comments in addition to my site's own comment plugin. I ran into a problem with Facebook's comments: if a user enters a comment on a page whose URL has any additional URL parameters, the comment is associated only with the page as accessed with those parameters; others hitting the page w/o parameters won't see the comment.

This behavior is documented by Facebook for when the "xid" attribute isn't set in the fb:comments HTML element. I didn't think I'd encounter this situation since my blog post URLs don't contain any parameters. However, when people link to one of my articles within Facebook, Facebook appends various parameters to the base URL.

The solution is to specify an xid attribute in the fb:comments element containing the URL encoded URL of the page (Facebook's default xid). This causes existing comments to show up and causes comments created when the page is loaded with URL parameters to use the same encoded URL.

So I created a simple Blosxom plugin to perform the encoding so the encoded URL can be placed in the story.html template:



# Blosxom Plugin: urlencode -*- perl -*-
# Author: Steve Schwarz <http://agilitynerd.com/>

# 2010-NOV-22    0.1 initial version.

package urlencode;
# puts the urlencoded string of the URL for this page
# into $urlencode::url without any params
# use this in the fb:comments xid to give the same xid
# even when query params are provided
# -------------------
use CGI qw/:standard/;
use URI::Escape;

sub start {
    return 1;
}

$url = '';

sub story {
    my ($pkg, $path, $filename, $story_ref, $title_ref, $body_ref) = @_;
    $urlencode::url = uri_escape("$blosxom::url/${blosxom::path_info}");
    return 1;
}
1;

Then it is used in the story template:



<fb:comments width="550" numposts="5" xid="$urlencode::url"></fb:comments>

Now my readers won't have to worry that their comments won't show up.
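For comparison, here is the same normalization sketched in Python (the function name is hypothetical, and this is not Facebook's code): drop the query string and fragment, then percent-encode the whole URL so every variant of a page maps to one xid.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def comment_xid(page_url):
    """Percent-encode the page URL with its query string and fragment removed."""
    scheme, netloc, path, _query, _fragment = urlsplit(page_url)
    base = urlunsplit((scheme, netloc, path, "", ""))
    # safe="" makes quote() encode '/' and ':' too, leaving only letters,
    # digits and '_.-~' unescaped.
    return quote(base, safe="")


print(comment_xid("http://agilitynerd.com/blog/post.html?fbid=123#comments"))
# http%3A%2F%2Fagilitynerd.com%2Fblog%2Fpost.html
```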

Filed under  //   blosxom   comments   facebook   perl  


Obtain Short URLs and QR-Codes for Django Apps

Lately I've been interested in improving the interaction of my agilitycourses website for mobile users. One such improvement is to add QR Codes (aka 2D barcodes) representing the page URLs to the printed representations of pages served as PDFs.

I found that developers have reverse engineered the "API" of the goo.gl URL shortening website. In my brief testing it is very fast. What makes the service extra useful is that adding ".qr" to a shortened URL returns a PNG image of the QR Code for that URL. That made it perfect for providing both short text and QR Code representations of URLs in my printed documents.

I threw together a few functions and put them in a module to make it easy to shorten a long URL, obtain the QR Code PNG and store it using Django's Storage functionality:

import json
import os
import urllib


def googl_shorten_url(long_url):
    """
    Returns goo.gl shortened url for the provided long_url.
    Code taken from: http://djangosnippets.org/snippets/2220/

    Parameters:

    - `long_url`: the url to supply to goo.gl to be shortened.
    """
    params = urllib.urlencode({'security_token': None, 'url': long_url})
    f = urllib.urlopen('http://goo.gl/api/shorten', params)
    return json.loads(f.read())['short_url']


def googl_qrcode(googl_url):
    """
    Return file containing qr code image file for the given goo.gl url.

    Parameters:

    - `googl_url`: url from which to obtain the qr code.
    """
    return urllib.urlopen(googl_url + '.qr')


def get_url_qr_code_image(long_url, storage, storage_image_file_path=''):
    """
    Return goo.gl shortened url and storage name of qr code corresponding to
    the shortened url for the supplied full url. Contacts goo.gl to shorten
    the supplied long url then downloads and stores the qr code image file
    in the storage instance using the file path and the shortened url name
    as the storage name.

    Parameters:

    - `long_url`: the url to shorten.
    - `storage`: a Django storage instance into which to store the qr code
    image.
    - `storage_image_file_path`: file system path to prepend to shortened
    url. This path must exist prior to calling this function.
    """
    try:
        googl_url = googl_shorten_url(long_url)
        qr_file_name = googl_url.split('/')[-1] + '.qr'
        qr_code_name = os.path.join(storage_image_file_path, qr_file_name)
        if not storage.exists(qr_code_name):
            qr_buffer = storage.open(qr_code_name, 'wb')
            qr_buffer.write(googl_qrcode(googl_url).read())
            qr_buffer.close()
    except:
        googl_url = None
        qr_code_name = None
    return googl_url, qr_code_name

Yes, it has a nasty bare try/except. For my uses this is optional functionality, so I never want a failure to stop the main functionality of the views that use it. Add exception handling appropriate to your needs.

The main entry point is get_url_qr_code_image(). Here is an example of its use (assuming you save the code in googl.py):

>>> import googl
>>> from django.core.files.storage import default_storage

>>> short_url, qr_code_storage_name = googl.get_url_qr_code_image('http://google.com', default_storage)
>>> short_url
u'http://goo.gl/mR2d'
>>> qr_code_storage_name
u'mR2d.qr'
>>> default_storage.path(qr_code_storage_name)
u'/home/dev/agilitycourses/static/mR2d.qr'
>>> default_storage.url(qr_code_storage_name)
u'mR2d.qr'
>>> 
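If you want something a little safer than a bare except, one option is to catch `Exception` (so Ctrl-C and SystemExit still propagate) and to pass the two network calls in as functions, which makes the flow testable without contacting goo.gl. A minimal sketch under those assumptions: `shorten_and_store_qr`, `DictStorage`, and the fake callables are hypothetical, and `DictStorage` mimics only the `exists()`/`save()` part of Django's storage interface (a real Django storage would want the bytes wrapped in `ContentFile`).

```python
import os


class DictStorage:
    """Hypothetical in-memory stand-in for the exists()/save() part of a storage."""

    def __init__(self):
        self.files = {}

    def exists(self, name):
        return name in self.files

    def save(self, name, data):
        self.files[name] = data
        return name


def shorten_and_store_qr(long_url, storage, shorten, fetch_qr, path=""):
    """Return (short_url, storage_name), or (None, None) if anything fails.

    `shorten(long_url)` returns a short URL; `fetch_qr(short_url)` returns the
    QR code image bytes. Both are injected so they can be faked in tests.
    """
    try:
        short_url = shorten(long_url)
        name = os.path.join(path, short_url.rstrip("/").split("/")[-1] + ".qr")
        if not storage.exists(name):
            name = storage.save(name, fetch_qr(short_url))
        return short_url, name
    except Exception:
        # Optional feature: a failure must never break the calling view.
        return None, None


storage = DictStorage()
result = shorten_and_store_qr(
    "http://google.com", storage,
    shorten=lambda url: "http://short.example/abc",
    fetch_qr=lambda short: b"\x89PNG fake image bytes")
print(result)  # ('http://short.example/abc', 'abc.qr')
```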

Hope you find this useful.

Filed under  //   django   goo.gl   python   qr-code  


Adding pyrsvg to a virtualenv created with --no-site-packages

I set up my development and deployment environments on Ubuntu using virtualenv with the --no-site-packages option to isolate them from packages in the system installation. My application uses pyrsvg, which is installed by default as a system package, so I had to symlink the shared libraries it installs (within the gtk-2.0 directory) into my virtualenv.

Here are the links I created (workon and cdsitepackages are virtualenvwrapper shell commands):

$ workon project
$ cdsitepackages
$ ln -s /var/lib/python-support/python2.6/gtk-2.0/rsvg.so .
$ ln -s /var/lib/python-support/python2.6/gtk-2.0/gobject .
$ ln -s /var/lib/python-support/python2.6/gtk-2.0/glib .
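The same links can be scripted, which helps when rebuilding the virtualenv on the deployment server. A minimal sketch (the function name is hypothetical) that skips links that already exist and fails loudly if a system module is missing; the site-packages path is whatever `cdsitepackages` takes you to, and `/path/to/virtualenv` in the example call is a placeholder.

```python
import os


def link_system_modules(system_dir, site_packages, names):
    """Symlink each name in `names` from `system_dir` into `site_packages`.

    Existing entries in site_packages are left alone; a missing source raises
    OSError. Returns the list of links actually created.
    """
    created = []
    for name in names:
        src = os.path.join(system_dir, name)
        dst = os.path.join(site_packages, name)
        if not os.path.exists(src):
            raise OSError("system module not found: %s" % src)
        if os.path.lexists(dst):
            continue  # already linked (or a real package is installed there)
        os.symlink(src, dst)
        created.append(dst)
    return created


# Example call mirroring the shell commands above (placeholder virtualenv path):
# link_system_modules('/var/lib/python-support/python2.6/gtk-2.0',
#                     '/path/to/virtualenv/lib/python2.6/site-packages',
#                     ['rsvg.so', 'gobject', 'glib'])
```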

Filed under  //   python   rsvg   virtualenv  


Mobile Web Site Redirects in Django

For the mobile version of agilitycourses.com I wanted to follow the approach Google appears to be using on some of its sites:

  • If the user views agilitycourses.com from a desktop browser they should see the standard/desktop version of the site.
  • If the user views agilitycourses.com from a mobile browser they should be redirected to a mobile domain (m.agilitycourses.com).
  • The mobile version of the website includes a link to the standard version.
  • If the mobile user chooses the standard website they should "stick" on that site and not be redirected to the mobile site.
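Those four rules boil down to a small decision function. This is a framework-neutral sketch, not minidetector's code: the user-agent markers are a deliberately tiny hypothetical list (real detectors check far more), and the `full` query parameter and `prefer_full_site` session key are hypothetical names for the "standard version" link and the sticky choice.

```python
DESKTOP_HOST = "agilitycourses.com"
MOBILE_HOST = "m.agilitycourses.com"
# Hypothetical, deliberately tiny marker list; minidetector checks far more.
MOBILE_MARKERS = ("iphone", "android", "blackberry", "opera mini", "windows phone")


def is_mobile(user_agent):
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in MOBILE_MARKERS)


def mobile_redirect(host, path, user_agent, session, query):
    """Return the URL to redirect to, or None to serve the request as-is.

    `session` is the user's dict-like session and `query` the request's
    query parameters.
    """
    if "full" in query:
        # The mobile site's "standard version" link: remember the choice.
        session["prefer_full_site"] = True
    if host != DESKTOP_HOST or session.get("prefer_full_site"):
        return None  # already on the mobile site, or the user opted out
    if is_mobile(user_agent):
        return "http://%s%s" % (MOBILE_HOST, path)
    return None  # desktop browsers stay on the standard site
```

In Django this could be called from middleware's `process_request` with `request.get_host()`, `request.path`, `request.META.get('HTTP_USER_AGENT')`, `request.session` and `request.GET`, returning an `HttpResponseRedirect` whenever it produces a URL.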

I wanted to run two different websites that share templates, with the templates and CSS changing for the mobile site. That meant I'd need to set one or more variables on the request that the templates could use to generate the appropriate HTML. The simplest mobile device detector I found was minidetector, so I used that initially. I later found that Chris Drackett's fork has a number of useful enhancements and switched to it.

But minidetector didn't provide the ability to redirect to another site. I found Scott Newman's article on using multiple templates, which had a section on performing the redirect and storing the user's selection in the session. So I forked Chris' minidetector and modified it to include the redirect and session storage. At the same time I decided to store all the minidetector variables in the session and add them to the request via middleware, so the raw request wouldn't have to be parsed each time. My fork is available here with details on the new configuration options.

I'm using two domains so I can track analytics for the mobile and non-mobile sites separately and allow users to bookmark the desired site's pages. I use Google Analytics (via django-google-analytics) and Awstats for analytics.

Since I'm using two separate domains and sharing everything else, I'm using a setup similar to the one described by Dustin Davis. I have a settings.py file and a mobile_settings.py that overrides only the settings I need to change:

from settings import *
SITE_ID = 2
CACHE_MIDDLEWARE_KEY_PREFIX = "m.ac-"

I use a different memcached key prefix so the cached pages for the mobile site don't clash with those for the desktop site.

I set up m.agilitycourses.com on my server using the same Gunicorn setup I used for agilitycourses.com, the only changes being the --bind address/port and the name of the mobile settings file:

#!/bin/sh

GUNICORN=/home/user/virtualenvs/myapp/bin/gunicorn_django
ROOT=/home/user/source/myapp
PID=/var/run/myapp.pid

if [ -f $PID ] 
    then rm $PID 
fi

cd $ROOT
exec $GUNICORN --bind 127.0.0.1:8001 -c $ROOT/gunicorn.conf.py --pid=$PID $ROOT/mobile_settings.py

If my templates or content start to diverge more significantly between the mobile and desktop sites, I may set TEMPLATE_DIRS differently in the mobile_settings file. Or I could move to Dustin's approach and create a new application containing the urls.py and views.py specific to my mobile deployment. Diverging further than that would probably call for refactoring the common functionality into its own application, which could be imported into separate code branches for each domain.
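If I go the TEMPLATE_DIRS route, mobile_settings.py might grow something like the config fragment below. It's a sketch: `PROJECT_ROOT` is assumed to be defined in settings.py and `templates_mobile` is a hypothetical directory name. Because Django's filesystem template loader searches TEMPLATE_DIRS in order, a template in the mobile directory overrides the shared one of the same name, and everything else falls through to the shared templates.

```python
# mobile_settings.py (sketch; PROJECT_ROOT and templates_mobile are assumptions)
import os

from settings import *  # shared desktop settings

SITE_ID = 2
CACHE_MIDDLEWARE_KEY_PREFIX = "m.ac-"

# Search the mobile-only templates first, then the shared ones.
TEMPLATE_DIRS = (os.path.join(PROJECT_ROOT, "templates_mobile"),) + tuple(TEMPLATE_DIRS)
```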

Filed under  //   django   gunicorn   minidetector   mobile  


Debug Site for Website Redirects By User-Agent String

I'm adding an "m" subdomain to agilitycourses.com to provide a better mobile browsing experience. I'm using the browser's User-Agent string in Django middleware (currently minidetector) to detect whether the client is mobile and redirect it to the mobile site. Since some folks will likely be redirected when they shouldn't be, or not redirected when they should, I wanted an easy way for them to tell me when that happens. For that I'd need to know their User-Agent string.

A little googling turned up a nice single-purpose website: www.whatismyreferrer.com/
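Mobile detectors such as minidetector work from the User-Agent request header, so that's the string worth collecting. If you'd rather not rely on a third-party site, a tiny WSGI app (a hypothetical helper, standard library only) can echo it back for users to paste into a bug report:

```python
def echo_user_agent(environ, start_response):
    """Minimal WSGI app that shows visitors the User-Agent header they sent."""
    user_agent = environ.get("HTTP_USER_AGENT", "(no User-Agent header sent)")
    body = ("Your browser identifies itself as:\n\n%s\n" % user_agent).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

For a quick local check, `wsgiref.simple_server.make_server("127.0.0.1", 8000, echo_user_agent).serve_forever()` serves it until interrupted; in production it could sit behind the same Gunicorn setup as the main site.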

Filed under  //   django   mobile   referrer   web development  


My Favorite ORM and Python Anti-Patterns

At work I was looking at improving the performance of one of our slower web pages. It can be rewarding to find a little piece of code that can be easily optimized. This time there were several functions that added 10+ seconds to the page load in the worst case. It wasn't a problem for most clients, but when clients who are related to many other clients hit the page, they experienced terrible performance. Here's pseudocode for the combination of anti-patterns that caused the problem:

# Projects have users, and users belong to organizations
# (a project can contain users from multiple organizations)
activeOrganizationProjectUsers = [
    x for x in project.users
    if x.active and x.organization == organization]

if activeOrganizationProjectUsers:
    # do something *NOT* using activeOrganizationProjectUsers
    pass

There are two main problems with this code:

  1. It ignores the fact that the project, users, and organization are backed by an ORM
  2. The list comprehension is being used to find all matching elements when only a single element is needed.

Ignoring the ORM

The code above wouldn't be too bad if these were just lists of objects in memory. But because the objects are instantiated by an ORM, a number of database queries will be issued. In this particular case (without eager loading from user to the organization table) the following queries were executed:

  1. Join project to user and get all users for the project's id
  2. For each user load their organization (one by one) if the user is active

So in the case where there were hundreds of users on a project there were hundreds of queries executed and hundreds of User and Organization instances were instantiated. Depending on the size of the objects (and the ORM's behavior) it can take "real time" to fetch and instantiate all these large objects.

This code base has this kind of code sprinkled throughout it. At one time during its development, the developers were encouraged to treat ORM-backed objects as though they were Plain Old Python Objects (POPOs). Developers also wouldn't necessarily see the performance degradation when working with small data sets. This is one of the reasons I like to tail the database log (or use django-debug-toolbar if I'm using Django) to watch the queries go by.

Using List Comprehensions When a Single Value is Needed

To make this situation worse, the activeOrganizationProjectUsers list wasn't actually used. This is a combination of a Python anti-pattern and the ORM anti-pattern. What was required was to determine if a single active organization user existed.

I believe the original developer(s) arrived at the list comprehension through a combination of ignorance and syntactic sugar. They didn't want to write a new function to do the query and put it in the User class, so they used the existing class's API. The syntactic sugar was the list comprehension, which gathered every matching value when only one was needed. If this weren't a (potentially) expensive ORM-backed operation, the original code could have been:

activeOrganizationProjectUsers = False
for x in project.users:
    if x.active and x.organization == organization:
        activeOrganizationProjectUsers = True
        break

if activeOrganizationProjectUsers:
    # do something
    pass

But this solution could still query all possible user/organization combinations. The other question would be: which set is larger, the organization's users or the project's users? It is likely that looping over the organization's users looking for active ones would be more efficient anyway.
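On plain in-memory objects the same early exit reads more naturally with `any()` over a generator expression, which stops at the first match, whereas a list comprehension always walks the whole list. The sketch below uses a hypothetical `User` dataclass (not an ORM model) and records which users get examined; with ORM-backed objects the caveat above still applies, since each inspected user may still trigger a query.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical plain stand-in for an ORM-backed user."""
    name: str
    active: bool
    organization: str


def has_active_member(users, organization, examined):
    """True if any user is active and in `organization`; stops at the first match."""
    def matches(user):
        examined.append(user.name)  # record each user inspected
        return user.active and user.organization == organization
    return any(matches(u) for u in users)


users = [User("ann", True, "acme"), User("bob", True, "other"), User("cy", False, "acme")]
seen = []
print(has_active_member(users, "acme", seen), seen)  # True ['ann']
```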

Remember the Underlying Representation

When performance matters, remembering that the objects are ORM backed is important. In this case a single query was all that was required (SQLObject pseudo syntax):

activeOrganizationProjectUsers = Users.selectBy(
    project=project,
    active=True,
    organization=organization).count() > 0

If abstracting away the ORM's methods is important, this new query could be added to the appropriate class as a method. In my case, switching to a query cut the page load time by two orders of magnitude.
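To see what "a single query" means at the SQL level, here is a sketch using the standard library's sqlite3 and a hypothetical two-table schema (the real tables aren't shown). It uses `EXISTS` rather than `count() > 0`; both are one round trip, but `EXISTS` lets the database stop at the first matching row instead of counting them all.

```python
import sqlite3

# Hypothetical minimal schema; the real tables aren't shown in the post.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, active INTEGER, organization_id INTEGER);
CREATE TABLE project_users (project_id INTEGER, user_id INTEGER);
"""


def has_active_org_user(conn, project_id, organization_id):
    """One query: does the project have an active user from the organization?"""
    (found,) = conn.execute(
        """
        SELECT EXISTS (
            SELECT 1
            FROM project_users pu
            JOIN users u ON u.id = pu.user_id
            WHERE pu.project_id = ? AND u.active = 1 AND u.organization_id = ?
        )
        """,
        (project_id, organization_id),
    ).fetchone()
    return bool(found)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [(1, 1, 10), (2, 0, 20), (3, 1, 20)])
conn.executemany("INSERT INTO project_users VALUES (?, ?)", [(100, 1), (100, 2)])
print(has_active_org_user(conn, 100, 10))  # True: user 1 is active and in org 10
print(has_active_org_user(conn, 100, 20))  # False: user 2 is inactive, user 3 isn't on the project
```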

Filed under  //   anti-pattern   development   orm   python  
