tech.agilitynerd.com

scratching that itch... 
Filed under

python

 

Python dict.get's Default Value is Always Evaluated

This is a gotcha I ran across in some production code that is obvious in retrospect. I was profiling the code to find places where we were calling "an_expensive_database_function" and came across code like this:

def doit(*args, **kwargs):
    value = kwargs.get('key', an_expensive_database_function())

The original author probably assumed that if 'key' was present in the kwargs dictionary, an_expensive_database_function wouldn't be called; that the call would be short-circuited in the same manner as Boolean expressions. But get is an ordinary method, so Python evaluates all of its arguments before the call is made. So even when 'key' was already present in the kwargs dictionary, the database function was still called and its result thrown away.
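To see the behavior in isolation, here is a minimal sketch. The stand-in an_expensive_database_function below is hypothetical; it only records that it was called so the eager evaluation becomes visible.

```python
calls = []


def an_expensive_database_function():
    # Hypothetical stand-in for the real function: record every call
    calls.append(1)
    return 'from database'


def doit(*args, **kwargs):
    # The default argument is evaluated before get() ever runs
    return kwargs.get('key', an_expensive_database_function())


value = doit(key='supplied')
# value is 'supplied', yet the expensive function has still run once
print(value, len(calls))
```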

Here is a "look before you leap" solution:

def doit(*args, **kwargs):
    value = kwargs.get('key')
    if value is None:
        # assumes None is never a valid value for 'key'
        value = an_expensive_database_function()

Here is the "easier to ask forgiveness than permission" solution:

def doit(*args, **kwargs):
    try:
        value = kwargs['key']
    except KeyError:
        value = an_expensive_database_function()
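If None can be a legitimate value, one alternative is a private sentinel object combined with a callable that is only invoked on a miss. This is a sketch, not code from the production system; get_or_compute and _MISSING are hypothetical names.

```python
_MISSING = object()  # unique marker that no caller can pass by accident


def get_or_compute(mapping, key, compute):
    """Return mapping[key], calling compute() only when key is absent."""
    value = mapping.get(key, _MISSING)
    if value is _MISSING:
        value = compute()
    return value


computed = []


def expensive():
    # Hypothetical expensive call; records that it ran
    computed.append(1)
    return 'computed'


present = get_or_compute({'key': None}, 'key', expensive)  # None is kept
absent = get_or_compute({}, 'key', expensive)              # falls back
```

Passing the function itself (expensive, not expensive()) is what defers the call.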

Filed under  //   python  


Obtain Short URLs and QR-Codes for Django Apps

Lately I've been interested in improving the interaction of my agilitycourses website for mobile users. One such improvement is to add QR Codes (aka 2D barcodes) representing the page URLs to the printed representations of pages served as PDFs.

I found that developers have reverse engineered the "api" of the goo.gl URL shortening web site. In my brief testing it is very fast. What makes that service extra useful is that adding ".qr" to a shortened URL returns a PNG image of the QR Code for that URL. That made it perfect for providing both short text and QR Code URL representations for my printed documents.

I threw together a few functions and put them in a module to make it easy to shorten a long URL, obtain the QR Code PNG and store it using Django's Storage functionality:

import os
import urllib
from django.utils import simplejson


def googl_shorten_url(long_url):
    """
    Returns goo.gl shortened url for the provided long_url.
    Code taken from: http://djangosnippets.org/snippets/2220/

    Parameters:

    - `long_url`: the url to supply to goo.gl to be shortened.
    """
    params = urllib.urlencode({'security_token': None, 'url': long_url})
    f = urllib.urlopen('http://goo.gl/api/shorten', params)
    return simplejson.loads(f.read())['short_url']


def googl_qrcode(googl_url):
    """
    Return file containing qr code image file for the given goo.gl url.

    Parameters:

    - `googl_url`: url from which to obtain the qr code.
    """
    return urllib.urlopen(googl_url + '.qr')


def get_url_qr_code_image(long_url, storage, storage_image_file_path=''):
    """
    Return goo.gl shortened url and storage name of qr code corresponding to
    the shortened url for the supplied full url. Contacts goo.gl to shorten
    the supplied long url then downloads and stores the qr code image file
    in the storage instance using the file path and the shortened url name
    as the storage name.

    Parameters:

    - `long_url`: the url to shorten.
    - `storage`: a Django storage instance into which to store the qr code
    image.
    - `storage_image_file_path`: file system path to prepend to shortened
    url. This path must exist prior to calling this function.
    """
    try:
        googl_url = googl_shorten_url(long_url)
        qr_file_name = googl_url.split('/')[-1] + '.qr'
        qr_code_name = os.path.join(storage_image_file_path, qr_file_name)
        if not storage.exists(qr_code_name):
            qr_buffer = storage.open(qr_code_name, 'wb')
            qr_buffer.write(googl_qrcode(googl_url).read())
            qr_buffer.close()
    except:
        googl_url = None
        qr_code_name = None
    return googl_url, qr_code_name

Yes, it has a nasty bare try/except. For my uses this is optional functionality so I never want a failure to stop the main functionality of the views that use it. Add exception handling appropriate for your needs.
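As one possible starting point for narrower handling, here is a sketch that swallows only the failures this kind of code is likely to hit and logs them. The helper name call_or_none and the choice of exception types are assumptions, not part of the module above.

```python
import logging

logger = logging.getLogger(__name__)


def call_or_none(func, *args, **kwargs):
    """Return func(*args, **kwargs), or None if an expected error occurs."""
    try:
        return func(*args, **kwargs)
    except (IOError, ValueError, KeyError) as exc:
        # IOError: network or storage failure; ValueError: malformed JSON;
        # KeyError: a response without the expected key
        logger.warning("optional step failed: %r", exc)
        return None
```

Programming errors such as TypeError or NameError still propagate, so they are not hidden the way a bare except hides them.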

The main entry point is get_url_qr_code_image(). Here is an example of its use (assuming you save the code in googl.py):

>>> import googl
>>> from django.core.files.storage import default_storage

>>> short_url, qr_code_storage_name = googl.get_url_qr_code_image('http://google.com', default_storage)
>>> short_url
u'http://goo.gl/mR2d'
>>> qr_code_storage_name
u'mR2d.qr'
>>> default_storage.path(qr_code_storage_name)
u'/home/dev/agilitycourses/static/mR2d.qr'
>>> default_storage.url(qr_code_storage_name)
u'mR2d.qr'
>>> 

Hope you find this useful.

Filed under  //   django   goo.gl   python   qr-code  


Adding pyrsvg to a virtualenv created with --no-site-packages

I set up my development and deployment environments on Ubuntu using virtualenv with the --no-site-packages option to isolate them from packages in the system installation. My application uses pyrsvg, which is installed by default as a system package. Consequently I had to link the shared libraries it installs (within gtk) into my virtualenv.

Here are the links I created (workon and cdsitepackages are virtualenvwrapper shell aliases):

$ workon project
$ cdsitepackages
$ ln -s /var/lib/python-support/python2.6/gtk-2.0/rsvg.so .
$ ln -s /var/lib/python-support/python2.6/gtk-2.0/gobject .
$ ln -s /var/lib/python-support/python2.6/gtk-2.0/glib .

Filed under  //   python   rsvg   virtualenv  


My Favorite ORM and Python Anti-Patterns

At work I was looking at improving the performance of one of our slower web pages. It can be rewarding to find a little piece of code that can be easily optimized. This time there were several functions that were adding 10+ seconds to the page in the worst case. It wasn't a problem for most clients, but when clients who are related to many other clients hit the page they'd experience terrible performance. Here's pseudo code for the combination of anti-patterns that caused the problem:

# Projects have users and users are in different organizations 
# (project can contain multiple organization's users)
activeOrganizationProjectUsers = [
    x for x in project.users
    if x.active and x.organization == organization
]

if activeOrganizationProjectUsers:
    # do something *NOT* using activeOrganizationProjectUsers
    pass

There are two main problems with this code:

  1. It ignores the fact that project, users, and organization are backed by an ORM.
  2. The list comprehension is being used to find all matching elements when only a single element is needed.

Ignoring the ORM

The code above wouldn't be too bad if these were just lists of objects in memory. But because these objects are instantiated by an ORM, a number of database queries will be issued. In this particular case (without eager loading across user to the organization table) the following queries were executed:

  1. Join project to user and get all users for the project's id
  2. For each user load their organization (one by one) if the user is active

So in the case where there were hundreds of users on a project there were hundreds of queries executed and hundreds of User and Organization instances were instantiated. Depending on the size of the objects (and the ORM's behavior) it can take "real time" to fetch and instantiate all these large objects.
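The query pattern can be simulated without a database. The sketch below is not the real ORM: QueryLog, LazyUser, and project_users are hypothetical stand-ins that only count the statements a lazy-loading ORM would issue for the list comprehension.

```python
class QueryLog(object):
    """Records each simulated SQL statement."""
    def __init__(self):
        self.queries = []

    def run(self, sql):
        self.queries.append(sql)


class LazyUser(object):
    """Hypothetical ORM-style user whose organization is loaded on access."""
    def __init__(self, log, active, organization_id):
        self._log = log
        self.active = active
        self._organization_id = organization_id

    @property
    def organization(self):
        # Every attribute access costs one more query
        self._log.run("SELECT * FROM organization WHERE id = %d"
                      % self._organization_id)
        return self._organization_id


def project_users(log, rows):
    # One query fetches every user row for the project
    log.run("SELECT * FROM user WHERE project_id = 1")
    return [LazyUser(log, active, org_id) for active, org_id in rows]


log = QueryLog()
rows = [(i % 2 == 0, 7) for i in range(100)]  # 100 users, half of them active
matches = [u for u in project_users(log, rows)
           if u.active and u.organization == 7]
print(len(log.queries))  # 51: one for the users plus one per active user
```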

This code base has this kind of code sprinkled throughout. At one time during its development the developers were encouraged to treat ORM backed objects as though they were Plain Old Python Objects (POPOs). A developer wouldn't necessarily see the performance degradation when using small data sets either. This is one of the reasons why I like to tail the database log (or use django-debug-toolbar if I'm using Django) to see the queries go by.

Using List Comprehensions When a Single Value is Needed

To make this situation worse, the activeOrganizationProjectUsers list wasn't actually used. This is a combination of a Python anti-pattern and the ORM anti-pattern. What was required was to determine if a single active organization user existed.

I believe the original developer(s) arrived at the list comprehension through a combination of ignorance and a taste for syntactic sugar. They didn't want to write a new function to do the query and put it in the User class, so they used the existing class's API. The syntactic sugar was using the list comprehension to get more values than the one that was needed. If this weren't a (potentially) expensive ORM backed operation the original code could have been:

activeOrganizationProjectUsers = False
for x in project.users:
    if x.active and x.organization == organization:
        activeOrganizationProjectUsers = True
        break

if activeOrganizationProjectUsers:
    # do something
    pass

But this solution could still query all possible user/organization combinations. The other question would be: which set is larger, the organization's users or the project's users? It is likely that looping over the organization's users looking for active ones would be more efficient anyway.
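For plain in-memory objects, the same early exit can be written with the built-in any() and a generator expression. In this sketch the User namedtuple and the walk() generator are hypothetical; walk() exists only to show how many users are examined before any() stops.

```python
from collections import namedtuple

User = namedtuple('User', 'name active organization')


def has_active_org_user(users, organization):
    # any() stops consuming the generator at the first match
    return any(u.active and u.organization == organization for u in users)


examined = []


def walk(users):
    # Record which users were actually looked at
    for user in users:
        examined.append(user.name)
        yield user


users = [User('ann', False, 'acme'),
         User('bob', True, 'acme'),
         User('cat', True, 'acme')]
found = has_active_org_user(walk(users), 'acme')
```

Here 'cat' is never examined because 'bob' already satisfies the condition.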

Remember the Underlying Representation

When performance matters, it is important to remember that the objects are ORM backed. In this case a single query was all that was required (SQLObject pseudo syntax):

activeOrganizationProjectUsers = \
    Users.selectBy(project=project,
                   active=True,
                   organization=organization).count() > 0

If abstracting out the ORM's methods is important this new function could be added to the appropriate class as a method. In my case making a change to use a query resulted in cutting the page load time by two orders of magnitude.
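The same idea can be sketched with the standard library sqlite3 module. The users table and its columns below are an assumed, simplified schema for illustration, not the application's real one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        project_id INTEGER,
        organization_id INTEGER,
        active INTEGER
    );
    INSERT INTO users (project_id, organization_id, active) VALUES
        (1, 10, 0), (1, 10, 1), (1, 20, 1), (2, 10, 1);
""")


def active_org_user_exists(conn, project_id, organization_id):
    """Answer the yes/no question with one query and no object loading."""
    row = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM users"
        " WHERE project_id = ? AND organization_id = ? AND active = 1)",
        (project_id, organization_id),
    ).fetchone()
    return bool(row[0])
```

EXISTS lets the database stop at the first matching row, which is all the original code needed to know.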

Filed under  //   anti-pattern   development   orm   python  


Confidently Refactoring Django URLs, Views, and Templates

Googility.com is my first Django website and under the covers the oldest code looked like it. I had originally written it with the sole intent of allowing people to enter dog agility businesses and websites into a database that I could use to create a Dog Agility Google Custom Search Engine. The primary mistake I made was making the "project" (in Django speak) effectively equivalent to the primary application. In other words I didn't divide the major features of the site into standalone applications (which would allow them to be more easily reused, extended and tested).

As I continued to work on it I learned more about organizing Django projects. When I added the periodical search to the website I created it as a standalone application. I recently split out my django-shrinktheweb application from the main code base.

The Custom Search Engine (CSE) functionality is a worthwhile application that I'm planning on releasing as its own reusable application. I had already created an application directory called "cse" into which I had placed my models, views, urls, and tests specific to the CSE functionality. But I wanted to make the following changes:

  • Move CSE templates into a cse template subdirectory
  • Name the templates to match the views that use them
  • Name the urls in the urls.py prefixed with the application name ("cse_")
  • Convert all reverse() calls in the views and url template tags to use the named urls

Those are enough changes that I was concerned that I might miss something that would fail either in the view code or in rendering of the templates.

The Django test client makes it easy to test the forward and reverse url matching, calling the view and rendering the template. It is kind of a coarse grained test but the changes I was making were perfect for this tool. Given a urls.py:

urlpatterns = patterns('cse.views',
                    url(r'^site/view/(?P<id>\d+)/$', 'view', name='cse_view'),
)

and a view:

def view(request, id, template='cse/view.html'):
    """Display an end user read only view of the site information"""
    site = get_object_or_404(Annotation, pk=id)
    return render_to_response(template,
                          {'site': site,
                           'labels': get_labels_for(site, cap=None),
                           },
                          context_instance=RequestContext(request))

I then wrote a test class to create the required test instances and tests for each url to verify that the url can be found by name (via reverse()), the url maps to a view, the view invokes the desired template(s), and the {% url %} calls within the template can all be resolved:

from django.test import TestCase
from django.test.client import Client
from django.conf import settings
from django.core.urlresolvers import reverse
from cse.models import Label, Annotation

class ViewsTestCase(TestCase):

    def setUp(self):
        self.client = Client()
        self.ROOT_URLCONF = settings.ROOT_URLCONF
        # can provide a custom urls.py for testing so the tests can be run when
        # the application is incorporated into another project
        # settings.ROOT_URLCONF = 'cse.tests.cse_test_urls'
        # override the template context processors if there are special ones in place
        # that either you want to test or want to avoid
        self.TEMPLATE_CONTEXT_PROCESSORS = settings.TEMPLATE_CONTEXT_PROCESSORS
        settings.TEMPLATE_CONTEXT_PROCESSORS = ()
        # Create some instances on which we can invoke views
        self.label = Label(name='name', description='description')
        self.label.save()
        self.annotation = Annotation(comment='Site Name', original_url='http://example.com/')
        self.annotation.save()
        self.annotation.labels.add(self.label)
        self.annotation.save()

    def tearDown(self):
        # put settings back so the next tests aren't affected
        settings.ROOT_URLCONF = self.ROOT_URLCONF
        settings.TEMPLATE_CONTEXT_PROCESSORS = self.TEMPLATE_CONTEXT_PROCESSORS


    def test_view(self):
        response = self.client.get(reverse('cse_view', kwargs={"id":self.annotation.id}))
        self.assertEqual(200, response.status_code)
        self.assertTemplateUsed(response, 'cse/view.html')

The normal unittest asserts are available in the tests. I'm using one of the special asserts provided by Django's TestCase to verify that the template I expected was used. All the templates used (due to template inheritance) are collected by the test client and can also be verified.

I used these tests in a TDD-ish manner: I wrote the test for a view, ran the tests, and kept resolving errors in the templates as I made the changes in my bullet list. It made a tedious job simple and gave me good confidence that I'd found all the renamed urls, views, and templates.

Filed under  //   django   googility   python   tdd   testing  


Initial Release of django-stw

I have been using the free website thumbnail service from Shrink The Web on my dog agility search website Googility since I launched it. It is quick and easy to use and it adds a lot to the look of the pages.

I had created a simple Django template tag for inserting the little snippet of HTML needed by their service.

Recently they asked me to add support for their advanced features to my template tag. I used this opportunity to convert my template tag to a Django application. This mostly makes it a lot easier to install, but it also let me bundle tests and an example template with the template tag.

I kept the existing shrinkthewebimage template tag and added a new tag called stwimage to enable the new features.

I'm hosting the example page included in the package here so you can see how the template tags work.

I've hosted the project source on github and uploaded the initial release to the CheeseShop for easy installation.

 

Filed under  //   django   github   googility   pypi   python   shrink the web   web development  


Embedding JSON Within Generated HTML

Ran into an interesting problem at work this past week that had a simple and pleasing resolution. We have an in-house developed JavaScript grid on some of our pages, and when users entered certain text strings we'd generate invalid JSON payloads that would give the user an error page. If they entered strings that looked like an HTML entity, e.g. &#13 (which, with the addition of a trailing ;, is the non-visible carriage return character), the text wasn't displayed in the widget. To further complicate things, some of the content displayed in the grid is HTML, which is inserted into the grid as is and can contain escaped HTML characters.

The grid gets its content as a JSON payload from within a hidden div in the HTML, which is generated via a template mechanism. Here's a portion of the template, where <%= and %> stringify the value of the Python variable(s)/code they surround:

<div style="display:none;" id="grid-init-args-<%= count %>">
    <textarea>
  <!-- this is the JSON payload loaded via the grid JavaScript -->
      <%= 
           [ columnsIndex,
            indexColumns,
            columns,
            rowBuffer,
            contractComponentCount,
            contractId,
            projectId,
            row.contractComponentID,
            row.changeOrderID,
            component.changeOrderType,
            footerRows,
            formulas,
            "false",
            rf.test] %>
    </textarea>
</div>

This approach has a number of problems:

  1. By using the template mechanism to create the JSON payload, this template relied on the similarity between the string representation of Python objects and JSON. After some testing I found the following scenarios:
     • If a string contained a single quote character, the string representation was a double-quoted string around the text and the single quote: a valid JSON string.
     • If the string contained a double quote character, the string representation was a single-quoted string around the text and the double quote: an invalid JSON string.
     • If the string contained both a single and a double quote, the string representation was a single-quoted string containing a backslash-escaped single quote and the double quote: also an invalid JSON string.
     Depending on the browser (of course), the JSON string would fail to parse correctly when the double quote was encountered within the single-quoted string.
  2. The JSON payload had to be HTML encoded (converting <, >, ", and &) since it was parsed by the browser as HTML.
  3. The HTML encoding would encode or double encode HTML to be inserted directly into the grid's DOM.
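The quoting scenarios in the first item are easy to reproduce. This sketch uses the standard library json module as a stand-in for simplejson, and is_valid_json is a hypothetical helper added for the comparison.

```python
import json


def is_valid_json(text):
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


samples = ["it's", 'say "hi"', 'it\'s "both"']
for sample in samples:
    as_repr = repr(sample)        # what the template's stringifying produced
    as_json = json.dumps(sample)  # always a double-quoted JSON string
    print(as_repr, is_valid_json(as_repr), as_json, is_valid_json(as_json))
```

Only the first sample's repr() happens to be valid JSON, while every json.dumps() result parses.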

The variation in single/double quoting was an easy fix: I changed to simplejson.dumps(), which correctly double quotes keys and values in dicts and escapes embedded double quotes (single quotes don't need to be escaped). I didn't time it, but with the C extension it may be faster than the template engine for our larger datasets.

I played around with (not) encoding various portions of the payload and then it hit me that I should change the grid to get its payload from a non HTML element so that only HTML destined for insertion into the DOM would be HTML encoded (which is as you'd expect for normal HTML handling). I started changing the payload to be stored in JavaScript generated in the template but didn't like the impact the change would have on all the existing templates. So I started Googling and found Ben Nadel's blog post on using script tags as data containers.

So here's my solution:

<div style="display:none;" id="grid-init-args-<%= count %>">
    <script type="application/json">
      <%= simplejson.dumps(
           [ columnsIndex,
            indexColumns,
            columns,
            rowBuffer,
            contractComponentCount,
            contractId,
            projectId,
            row.contractComponentID,
            row.changeOrderID,
            component.changeOrderType,
            footerRows,
            formulas,
            "false",
            rf.test]) %>
    </script>
</div>

There were two changes:

  1. Used simplejson.dumps to correctly double quote and escape double quotes within the variables in the payload.
  2. Changed the textarea to a script element.

Once the payload lived in a script tag within the hidden div, the HTML parser no longer parsed its content, so the only parts of the JSON payload that needed HTML encoding were the HTML elements being inserted into the DOM created by the grid.

This change also meant I was able to delete the unnecessary HTML encoding of non-HTML JSON payload data. Got to love solutions that involve deleting code.

Ultimately, we'll convert to loading the JSON payload as a separate AJAX request from the page to the server, but for now this simplifies the markup and handles all types of user input and HTML encoded characters correctly.
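One caveat with a script element as a data container: the HTML parser still ends the element at the first "</script" it finds, so a payload string containing that sequence would cut the JSON short. A minimal sketch of one way to guard against it, using the standard library json module in place of simplejson; json_for_script is a hypothetical helper, not part of the fix described above.

```python
import json


def json_for_script(data):
    """Serialize data for embedding inside a script element."""
    # In JSON output "<" can only occur inside strings, so replacing it
    # with the \u003c escape leaves the decoded value unchanged
    return json.dumps(data).replace('<', '\\u003c')


payload = ['</script><b>bold</b>', {'count': 1}]
embedded = json_for_script(payload)
```

The grid's JavaScript needs no change, since a JSON parser decodes \u003c back to "<".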

Filed under  //   html   javascript   json   python   web development  


Django Shrink The Web Template Tag Updated

I recently updated my Django template tag for simplifying the use of Shrink The Web images. They recently announced a CDN based distribution of images and they took the opportunity to modify their API.

The updated template tag is on django snippets.

The STW folks have asked me to extend my template tag with support for their PRO features. With luck I'll make that available sometime this weekend.

Filed under  //   django   python   shrink the web   web development  


Using django-sitemap with django-tagging

I was adding django-sitemap to googility.com yesterday and found that Tags don't implement get_absolute_url(). That makes sense, since the site developer would want to decide how to expose them in the URL space.

It is also arguable that links to pages displaying the tag view already exist in the page for models that are already in the sitemap so they don't need to be put in the sitemap explicitly. For example, a page for an Article might be at /article/django-11-release and that page would contain the links to pages linked with the tags for that article e.g. /tag/django/ and /tag/python/

But I figured having the tag pages indexed by Google would be useful. It also allows a different priority to be specified for the pages. So I made a little class that derives from GenericSitemap that allows the url and suffix for the Tag name to be specified:



from django.contrib.sitemaps import GenericSitemap


class SlugSitemap(GenericSitemap):
    """Use for objects that don't implement
     get_absolute_url but have a slug field used in 
     creating their url"""
    def __init__(self, info_dict, priority=None, changefreq=None):
        GenericSitemap.__init__(self, info_dict, 
                                  priority=priority, 
                                  changefreq=changefreq)
        self.url = info_dict.get('url', '/')
        self.slugfield = info_dict['slugfield']
        self.suffix = info_dict.get('suffix', '')

    def location(self, obj):
        return "%s%s%s" % (self.url, 
                             getattr(obj, self.slugfield), 
                             self.suffix)


Here's how I use it:



from tagging.models import Tag

sitemaps = {
    'tag_detail': SlugSitemap({'queryset':Tag.objects,
                               'url':'/tag/',
                               'slugfield':'name',
                               'suffix':'/'},
                              changefreq='monthly',
                              priority='0.5'),
}


The urls for tags are at /tag/slugname/, where /tag/ is prepended to tag.name and / is appended to the end.

This class can be used to create sitemap entries for any url parameterized on a single field of an instance returned by the QuerySet.
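The string building done by location() can be tried outside Django. In this sketch slug_location is a hypothetical standalone function; unlike the class above it also percent-encodes the field value, which is an added assumption that matters when the field (such as a tag name) can contain spaces.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2


def slug_location(url, slug, suffix=''):
    """Build a sitemap location from a url prefix, a field value, and a suffix."""
    return "%s%s%s" % (url, quote(slug), suffix)


plain = slug_location('/tag/', 'django', '/')
spaced = slug_location('/tag/', 'dog agility', '/')
```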

Filed under  //   django   django-sitemap   django-tagging   python   web development  
