tech.agilitynerd.com

scratching that itch... 
Filed under

django

 

Confidently Refactoring Django URLs, Views, and Templates

Googility.com is my first Django website and under the covers the oldest code looked like it. I had originally written it with the sole intent of allowing people to enter dog agility businesses and websites into a database that I could use to create a Dog Agility Google Custom Search Engine. The primary mistake I made was making the "project" (in Django speak) effectively equivalent to the primary application. In other words I didn't divide the major features of the site into standalone applications (which would allow them to be more easily reused, extended and tested).

As I continued to work on it I learned more about organizing Django projects. When I added the periodical search to the website I created it as a standalone application. I recently split out my django-shrinktheweb application from the main code base.

The Custom Search Engine (CSE) functionality is a worthwhile application that I'm planning on releasing as its own reusable application. I had already created an application directory called "cse" into which I had placed my models, views, urls, and tests specific to the CSE functionality. But I wanted to make the following changes:

  • Move CSE templates into a cse template subdirectory
  • Name the templates to match the views that use them
  • Name the urls in the urls.py prefixed with the application name ("cse_")
  • Convert all reverse() calls in the views and url template tags to use the named urls

Those are enough changes that I was concerned that I might miss something that would fail either in the view code or in rendering of the templates.

The Django test client makes it easy to test the forward and reverse url matching, calling the view and rendering the template. It is kind of a coarse grained test but the changes I was making were perfect for this tool. Given a urls.py:

urlpatterns = patterns('cse.views',
                    url(r'^site/view/(?P<id>\d+)/$', 'view', name='cse_view'),
)

and a view:

def view(request, id, template='cse/view.html'):
    """Display an end user read only view of the site information"""
    site = get_object_or_404(Annotation, pk=id)
    return render_to_response(template,
                          {'site': site,
                           'labels': get_labels_for(site, cap=None),
                           },
                          context_instance=RequestContext(request))

I then wrote a test class to create the required test instances and tests for each url to verify that the url can be found by name (via reverse()), the url maps to a view, the view invokes the desired template(s), and the {% url %} calls within the template can all be resolved:

from django.test import TestCase
from django.test.client import Client
from django.conf import settings
from django.core.urlresolvers import reverse
from cse.models import Label, Annotation

class ViewsTestCase(TestCase):

    def setUp(self):
        self.client = Client()
        self.ROOT_URLCONF = settings.ROOT_URLCONF
        # can provide a custom urls.py for testing so the tests can be run when
        # the application is incorporated into another project
        # settings.ROOT_URLCONF = 'cse.tests.cse_test_urls'
        # override the template context processors if there are special ones in place
        # that either you want to test or want to avoid
        self.TEMPLATE_CONTEXT_PROCESSORS = settings.TEMPLATE_CONTEXT_PROCESSORS
        settings.TEMPLATE_CONTEXT_PROCESSORS = ()
        # Create some instances on which we can invoke views
        self.label = Label(name='name', description='description')
        self.label.save()
        self.annotation = Annotation(comment='Site Name', original_url='http://example.com/')
        self.annotation.save()
        self.annotation.labels.add(self.label)
        self.annotation.save()

    def tearDown(self):
        # put settings back so the next tests aren't affected
        settings.ROOT_URLCONF = self.ROOT_URLCONF
        settings.TEMPLATE_CONTEXT_PROCESSORS = self.TEMPLATE_CONTEXT_PROCESSORS


    def test_view(self):
        response = self.client.get(reverse('cse_view', kwargs={"id":self.annotation.id}))
        self.assertEqual(200, response.status_code)
        self.assertTemplateUsed(response, 'cse/view.html')

The normal unittest asserts are available in the tests. I'm using one of the special asserts provided by the Django test Client to verify that the template I expected was used. All the templates used (due to template inheritance) are collected by the client and can also be verified.
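The forward and reverse matching that this test exercises can be illustrated without Django. The following is only a rough sketch using the standard library's re module, not how Django's resolver is actually implemented; the helper names resolve_kwargs and reverse_cse_view are made up for illustration.

```python
import re

# Same regex as the 'cse_view' entry in the urls.py above.
CSE_VIEW_PATTERN = re.compile(r'^site/view/(?P<id>\d+)/$')


def resolve_kwargs(path):
    """Forward match: return the captured keyword arguments, or None."""
    # The pattern has no leading slash, so strip it from the path first.
    match = CSE_VIEW_PATTERN.match(path.lstrip('/'))
    return match.groupdict() if match else None


def reverse_cse_view(id):
    """Reverse match: build the path for an id and check it still resolves."""
    candidate = 'site/view/%s/' % id
    if CSE_VIEW_PATTERN.match(candidate) is None:
        raise ValueError('no url matches id %r' % (id,))
    return '/' + candidate


path = reverse_cse_view(42)    # '/site/view/42/'
kwargs = resolve_kwargs(path)  # {'id': '42'}
```

Note that the captured id comes back as the string '42', not an integer, which is the form in which the view receives it.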

I used these tests in a TDD-ish manner: I wrote the test for a view, ran the tests, and kept resolving errors in the templates as I made the changes in my bullet list. It made a tedious job simple and gave me good confidence that I'd found all the renamed urls, views, and templates.
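The save-and-restore of settings done by hand in setUp() and tearDown() above can also be written with the standard library's mock.patch.object plus addCleanup. This is a minimal sketch under stated assumptions: fake_settings is a hypothetical stand-in for django.conf.settings so the example runs without Django, and the setting values are invented. Newer Django versions also ship an override_settings helper aimed at the same problem.

```python
import types
import unittest
from unittest import mock

# Hypothetical stand-in for django.conf.settings (values are made up).
fake_settings = types.SimpleNamespace(
    ROOT_URLCONF='project.urls',
    TEMPLATE_CONTEXT_PROCESSORS=('project.context.extra',),
)


class SettingsOverrideTestCase(unittest.TestCase):

    def setUp(self):
        # Replace the setting for the duration of each test.
        patcher = mock.patch.object(
            fake_settings, 'TEMPLATE_CONTEXT_PROCESSORS', ())
        patcher.start()
        # Cleanups run even if a later setUp step raises, unlike tearDown.
        self.addCleanup(patcher.stop)

    def test_processors_are_disabled(self):
        self.assertEqual((), fake_settings.TEMPLATE_CONTEXT_PROCESSORS)


def run_sketch():
    """Run the test case above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        SettingsOverrideTestCase)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

After run_sketch() returns, fake_settings.TEMPLATE_CONTEXT_PROCESSORS holds its original value again, so later tests are not affected.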

Filed under  //   django   googility   python   tdd   testing  


Haystack Search Result Ordering and Pre-Rendering Results

I use Haystack and the Python Whoosh project to provide search over ~3400 articles in my Googility.com database. I had originally implemented the search in the "simplest way that works". I was making some other enhancements to Googility and noticed the search result page had two undesirable behaviors:

  1. The ordering of results was basically random for all matching articles. For the domain of magazine article search having a bias toward the most recent publications would be more desirable.
  2. Looking at the django-debug-toolbar output each element in the search results was hitting the database twice (once for the Article instance and again for its corresponding Periodical). So a single result page was making as many as 60 database selects.

Haystack provides mechanisms to help with both of these issues.

Imposing an Order on the SearchQuerySet

Haystack models search using an API based on Django's QuerySet. The only thing to remember is it performs its queries over the Haystack SearchIndex subclass(es) you create instead of over the Django ORM. So you define a SearchIndex subclass that contains the data from the application's model over which you'd like to search. You can also define additional fields that can be used to modify the results of the query. Here is my magazine Article search index:

from haystack.sites import site
from haystack import indexes
from periodicals.models import Article

class ArticleIndex(indexes.SearchIndex):
    text = indexes.CharField(document=True, use_template=True)
    pub_date = indexes.DateTimeField(model_attr='issue__pub_date')

site.register(Article, ArticleIndex)

The text field contains the "document" over which the search engine (Whoosh) will actually perform the search. I'm using the template feature that allows me to use Django templates to format the data presented to the search engine.

I added the pub_date field to the index to allow the matching search results to be ordered by the pub_date field. The 'issue__pub_date' syntax mirrors the Django QuerySet syntax and means extract the "pub_date" attribute of the Article's "issue" attribute (it joins Article to Publication and gets the Publication's published date).

Then the urls.py is modified to change the SearchQuerySet passed to the default Haystack search view to order by the ArticleIndex's pub_date attribute:

<snip>
from haystack.views import SearchView
from haystack.query import SearchQuerySet
# query results with most recent publication date first
sqs = SearchQuerySet().order_by('-pub_date')
urlpatterns = patterns('',
                       url(r'^search/',
                           SearchView(
                               load_all=False,
                               searchqueryset=sqs,
                               ),
                           name='haystack_search',
                           ),
<snip>
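
The effect of order_by('-pub_date') on the matching results is roughly that of a descending sort on the date. The plain Python sketch below shows only that ordering, not how Haystack or Whoosh perform it; the article titles and dates are made up, and most_recent_first is a hypothetical helper.

```python
from datetime import date
from operator import itemgetter

# Made-up search matches; on the real site these come from the Whoosh index.
matches = [
    {'title': 'Teaching contacts', 'pub_date': date(2007, 5, 1)},
    {'title': 'Weave pole entries', 'pub_date': date(2009, 11, 1)},
    {'title': 'Handling front crosses', 'pub_date': date(2008, 2, 1)},
]


def most_recent_first(results):
    """Return results sorted by pub_date, newest first (like '-pub_date')."""
    return sorted(results, key=itemgetter('pub_date'), reverse=True)


ordered_titles = [m['title'] for m in most_recent_first(matches)]
```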

Pre-Rendering Result HTML

Since I have only a few thousand records I decided to follow the Haystack Best Practices for Not Hitting the Database. This solution trades space in the Whoosh index files for speed: the HTML that will be displayed when each article matches is generated at indexing time and stored alongside the data used by Whoosh to match articles to search keywords. The changes were pretty simple. In the ArticleIndex:

from haystack.sites import site
from haystack import indexes
from periodicals.models import Article

class ArticleIndex(indexes.SearchIndex):
    text = indexes.CharField(document=True, use_template=True)
    pub_date = indexes.DateTimeField(model_attr='issue__pub_date')
    # pregenerate the search result HTML for an Article
    # this avoids any database hits when results are processed
    # at the cost of storing all the data in the Haystack index
    result_text = indexes.CharField(indexed=False, use_template=True)

site.register(Article, ArticleIndex)

The use_template keyword requires you to create a Django template file that is used during index creation to build the HTML that will be displayed. The only peculiarity I found was figuring out where the template should live. On my system it was at templates/search/indexes/periodicals/article_result_text.txt. The periodicals/article_result_text part comes from the application, model, and field names. The search/indexes prefix is not derived from a url or view; as far as I can tell it is simply Haystack's fixed convention for these data templates: search/indexes/<app_label>/<model_name>_<field_name>.txt.

The final change is the template used to display the search results. In order to not hit the database the object list generated by the haystack SearchView is placed into the context used by the template and only the result_text attribute should be accessed:

{% if page.object_list %}
<div class="search-results-title">Results <b>{{page.start_index}}</b>  - <b>{{page.end_index}}</b> for <b>{{query}}</b></div>
    <div class="search-results-list">
    {% for result in page.object_list %}
      {{result.result_text|safe}}
    {% endfor %}
    <div class="pagination">
      <span class="step-links">
        {% if page.has_previous %}
            previous
        {% endif %}
        <span class="current">
            Page {{ page.number }} of {{ page.paginator.num_pages }}
        </span>
        {% if page.has_next %}
            next
        {% endif %}
      </span>
    </div>
</div>
{% else %}
<h2>No matching articles found.</h2>
{% endif %}

The actual result is placed in the template via {{result.result_text|safe}}. The safe filter is required since the HTML doesn't need to be escaped again: it was escaped by Django when it was placed into the SearchIndex.
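
The pre-rendering trade-off can be sketched in plain Python: escape and build the HTML once at index time, then output the stored string untouched at query time. This is an illustration of the idea only, not Haystack's implementation; the article records, the HTML layout, and the function names are all assumptions.

```python
import html

# Hypothetical article records standing in for rows loaded from the database.
articles = [
    {'title': 'Jumps & Tunnels', 'periodical': 'Agility Monthly', 'year': 2009},
    {'title': 'Startline stays', 'periodical': 'Agility Monthly', 'year': 2008},
]


def build_index_entry(article):
    """Index time: escape the data and store the HTML shown for a match."""
    result_text = '<div class="result"><b>%s</b> (%s, %d)</div>' % (
        html.escape(article['title']),
        html.escape(article['periodical']),
        article['year'],
    )
    return {'text': article['title'], 'result_text': result_text}


def render_results(entries):
    """Query time: emit the stored HTML as-is, with no lookups or escaping."""
    return '\n'.join(entry['result_text'] for entry in entries)


index = [build_index_entry(a) for a in articles]
page_html = render_results(index)
```

Escaping the stored HTML a second time at display time would turn &amp; into &amp;amp; and show the markup as text, which is the situation the safe filter avoids.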

So now my search results are in reverse chronological order and they render using only 3 database queries and at least 10x faster than before.

Filed under  //   django   haystack   search   whoosh  


Improving Google Ads and Google Search Descriptions

I was looking at the Google search results for my Googility web site and noticed that the descriptions shown underneath the title often contained text from my navigation links instead of content from the body of the page.

I did some searching and found the Google Webmaster blog post about description meta tags. Since almost all of the pages on Googility are generated by fewer than a dozen Django templates I edited the templates and inserted meta tags and filled the description in with data from each database entry. This avoids boilerplate information that would be ignored by Google and improves the descriptions shown to Google searchers. Some of my pages have already been reindexed.
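
Filling a description meta tag from a database entry might look something like the sketch below. It is written as a plain Python helper rather than the template edits I actually made; meta_description_tag is a hypothetical name, the sample text is invented, and the 155 character limit is an assumption, not a documented Google rule.

```python
import html


def meta_description_tag(text, max_length=155):
    """Build a <meta name="description"> tag from an entry's own text."""
    # Collapse whitespace so multi-line database text becomes one line.
    summary = ' '.join(text.split())
    if len(summary) > max_length:
        # Cut at the last word boundary that fits, then mark the truncation.
        summary = summary[:max_length].rsplit(' ', 1)[0] + '...'
    # Escape &, <, > and quotes so the text is safe inside the attribute.
    return '<meta name="description" content="%s">' % html.escape(
        summary, quote=True)


tag = meta_description_tag(
    'Dog agility  classes & "run-throughs"\nin Springfield.')
```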

Yahoo and some other search sites honor a robots-nocontent class on any page elements they should ignore for their index. Unfortunately, Google doesn't follow this standard. So I might end up making that edit to the templates also. Looking at my site's log files it appears the Yahoo spider is hitting my site more frequently than Google's and the Yahoo index is more up to date. Looking at my analytics reports though, Google refers far more readers to my site than Yahoo...

I also noticed that the ads served on pages containing mostly links appeared to be using words in my navigation or other boilerplate instead of the few lines of valuable content. More searching to the rescue: I found this Google Adsense article on section targeting. Once again the dozen or so templates were easy to edit to add in these HTML comment tags. Checking back a couple of days later showed improvements in the ads being generated for those pages. I'll keep an eye on my Adsense click rate to see if there is any increase in ad clicks.
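
For illustration, wrapping a block of page content in the section targeting comments could be sketched as below. The comment strings are the ones I recall from the Adsense section targeting article and should be checked against Google's current documentation; ad_section is a hypothetical helper, and on the site itself the comments are simply typed into the templates.

```python
# Comment markers as recalled from the Adsense section targeting article.
AD_SECTION_START = '<!-- google_ad_section_start -->'
AD_SECTION_IGNORE = '<!-- google_ad_section_start(weight=ignore) -->'
AD_SECTION_END = '<!-- google_ad_section_end -->'


def ad_section(content_html, ignore=False):
    """Wrap HTML so it is emphasized (or ignored) when ads are matched."""
    start = AD_SECTION_IGNORE if ignore else AD_SECTION_START
    return '%s\n%s\n%s' % (start, content_html, AD_SECTION_END)


body = ad_section('<p>Agility club listing and description</p>')
nav = ad_section('<ul><li>Home</li><li>Search</li></ul>', ignore=True)
```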

So a couple of simple edits made noticeable improvements. Not bad for a couple of hours of investigation and implementation.

Filed under  //   adsense   django   google   search   web development  


Initial Release of django-stw

I have been using the free website thumbnail service from Shrink The Web on my dog agility search website Googility since I launched it. It is quick and easy to use and it adds a lot to the look of the pages.

I had created a simple Django template tag for inserting the little snippet of HTML needed by their service.

Recently they asked me to add support for their advanced features to my template tag. I used this opportunity to convert my template tag to a Django application. This mostly makes it a lot easier to install but it also let me bundle tests and an example template with the template tag.

I kept the existing shrinkthewebimage template tag and added a new tag called stwimage to enable the new features.

I'm hosting the example page included in the package here so you can see how the template tags work.

I've hosted the project source on github and uploaded the initial release to the CheeseShop for easy installation.

 

Filed under  //   django   github   googility   pypi   python   shrink the web   web development  


Django Shrink The Web Template Tag Updated

I recently updated my Django template tag for simplifying the use of Shrink The Web images. They recently announced a CDN based distribution of images and they took the opportunity to modify their API.

The updated template tag is on django snippets.

The STW folks have asked me to extend my template tag with support for their PRO features. With luck I'll make that available sometime this weekend.

Filed under  //   django   python   shrink the web   web development  


Using django-sitemap with django-tagging

I was adding django-sitemap to googility.com yesterday and found that Tags don't implement get_absolute_url(), which makes sense since the site developer would want to decide how to expose them in the URL space.

It is also arguable that links to pages displaying the tag view already exist in the page for models that are already in the sitemap so they don't need to be put in the sitemap explicitly. For example, a page for an Article might be at /article/django-11-release and that page would contain the links to pages linked with the tags for that article, e.g. /tag/django/ and /tag/python/.

But I figured having the tag pages indexed by Google would be useful. It also allows a different priority to be specified for the pages. So I made a little class that derives from GenericSitemap that allows the url and suffix for the Tag name to be specified:



from django.contrib.sitemaps import GenericSitemap

class SlugSitemap(GenericSitemap):
    """Use for objects that don't implement
     get_absolute_url but have a slug field used in 
     creating their url"""
    def __init__(self, info_dict, priority=None, changefreq=None):
        GenericSitemap.__init__(self, info_dict, 
                                  priority=priority, 
                                  changefreq=changefreq)
        self.url = info_dict.get('url', '/')
        self.slugfield = info_dict['slugfield']
        self.suffix = info_dict.get('suffix', '')

    def location(self, obj):
        return "%s%s%s" % (self.url, 
                             getattr(obj, self.slugfield), 
                             self.suffix)


Here's how I use it:



from tagging.models import Tag

sitemaps = {
    'tag_detail': SlugSitemap({'queryset':Tag.objects,
                               'url':'/tag/',
                               'slugfield':'name',
                               'suffix':'/'},
                              changefreq='monthly',
                              priority='0.5'),
}


The urls for tags are at /tag/slugname/ where /tag/ is prepended to tag.name and / is appended to the end.

This class can be used to create sitemap entries for any url parameterized on a single field of an instance returned by the QuerySet.
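
The url building done by location() can be tried out without Django. The sketch below mirrors the same logic in a standalone function; build_location and the namedtuple stand-ins for Tag and Article instances are hypothetical and exist only for this example.

```python
from collections import namedtuple

# Hypothetical stand-ins for model instances returned by a QuerySet.
Tag = namedtuple('Tag', 'name')
Article = namedtuple('Article', 'slug')


def build_location(obj, info_dict):
    """Mirror SlugSitemap.location(): url + field value + suffix."""
    return '%s%s%s' % (info_dict.get('url', '/'),
                       getattr(obj, info_dict['slugfield']),
                       info_dict.get('suffix', ''))


tag_url = build_location(
    Tag(name='django'),
    {'url': '/tag/', 'slugfield': 'name', 'suffix': '/'})
article_url = build_location(
    Article(slug='django-11-release'),
    {'url': '/article/', 'slugfield': 'slug'})
```

Here tag_url is '/tag/django/' and article_url is '/article/django-11-release'. Only the slugfield key is required; url defaults to '/' and suffix to the empty string.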

Filed under  //   django   django-sitemap   django-tagging   python   web development  


reCAPTCHA in Django

I first read about reCAPTCHA in this article in Wired magazine last year.


reCAPTCHA provides a free CAPTCHA web service that pairs together two words from OCR scanned books. One of the words is known and the other couldn't be recognized. The user types in both words not knowing which is unknown to the system. As reCAPTCHA collects the responses for the unknown word they get human verified character recognition. So the millions of users of the system are clearing up millions of unrecognized words. It is a very clever human "cloud computing" system using only seconds of human effort for each use of the system.

I'm using a FIGLet based ASCII CAPTCHA on my websites since it was easy to integrate into the Blosxom writeback plugin. But I wanted to give reCAPTCHA a try while converting my Googility site to Django. John DeRosa made my job trivial by writing up the steps with a clear example.

So I followed his directions which involved installing the recaptcha-client python library on my dev and production systems and obtaining a free public/private license key from the reCAPTCHA site. Then I updated my Django view and template files for the one form that needed CAPTCHA protection. It was dead simple and working within minutes. The only minor addition I'd make to John's article is of course you need to pass the captcha_error variable from your view to the template: return render_to_response('edit.html', {'form': form, 'captcha_error':captcha_error}).

So give reCAPTCHA a try for your next project. It was so easy to do I might even convert my Blosxom blogs to use it via Lars Engel's recaptcha plugin.

Filed under  //   django   recaptcha  
