tech.agilitynerd.com

scratching that itch... 
Filed under

development

 

YellowGrass - Web Based Issue Tracking

I was doing some reading on mobl and saw that they are using a free web-based service called YellowGrass for issue tracking. It has some nice features and seems easy to use. Everything is tag-based. I think I'll try to use it for tracking enhancements to agilitycourses.com.

Filed under  //   development   issue-tracking  


My Favorite ORM and Python Anti-Patterns

At work I was looking at improving the performance of one of our slower web pages. It can be rewarding to find a little piece of code that can be easily optimized. This time there were several functions that were adding 10+ seconds to the page load in the worst case. It wasn't a problem for most clients, but clients who are related to many other clients would experience terrible performance when they hit the page. Here's pseudo code for the combination of anti-patterns that caused the problem:

# Projects have users, and users belong to different organizations
# (a project can contain users from multiple organizations)
activeOrganizationProjectUsers = [
    x for x in project.users
    if x.active and x.organization == organization
]

if activeOrganizationProjectUsers:
    # do something *NOT* using activeOrganizationProjectUsers
    pass

There are two main problems with this code:

  1. It ignores the fact that the project, users, and organization are backed by an ORM.
  2. The list comprehension is being used to find all matching elements when only a single element is needed.

Ignoring the ORM

The code above wouldn't be too bad if these were just lists of objects in memory. But because these objects are instantiated by an ORM, a number of database queries will be issued. In this particular case (without eager loading across user to the organization table) the following queries were executed:

  1. Join project to user and get all users for the project's id
  2. For each user load their organization (one by one) if the user is active

So in the case where there were hundreds of users on a project there were hundreds of queries executed and hundreds of User and Organization instances were instantiated. Depending on the size of the objects (and the ORM's behavior) it can take "real time" to fetch and instantiate all these large objects.
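
The query pattern described above can be reproduced in miniature. This is only a sketch using the standard library's sqlite3 module, not the ORM from the original code base. The table layout, the sample data, and the run and active_org_users_lazily helpers are all assumptions made up for the illustration. The query_log list stands in for tailing the database log.

```python
import sqlite3

# Hypothetical schema: the real application's tables are not shown in the post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        active INTEGER,
        organization_id INTEGER,
        project_id INTEGER
    );
    INSERT INTO organizations (id, name) VALUES (1, 'org-one'), (2, 'org-two');
""")
# 200 users on project 1; every other user is active, split across two organizations.
conn.executemany(
    "INSERT INTO users (active, organization_id, project_id) VALUES (?, ?, 1)",
    [(i % 2, 1 + (i % 4) // 2) for i in range(200)],
)

query_log = []  # stands in for tailing the database log


def run(sql, params=()):
    """Execute a query and record it, so the number of round trips is visible."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()


def active_org_users_lazily(project_id, organization_id):
    """Mimic lazy loading: one query for the users, then one per active user."""
    users = run(
        "SELECT id, active, organization_id FROM users WHERE project_id = ?",
        (project_id,),
    )
    matches = []
    for user_id, active, org_id in users:
        if active:
            # An ORM would load the related organization here, one row at a time.
            (org,) = run("SELECT id, name FROM organizations WHERE id = ?", (org_id,))
            if org[0] == organization_id:
                matches.append(user_id)
    return matches


matches = active_org_users_lazily(project_id=1, organization_id=2)
print(len(matches), "matching users,", len(query_log), "queries issued")
```

With this sample data the log ends up holding 101 queries (one for the users plus one per active user), all to build a list that is only ever tested for truthiness.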

This code base has this kind of code sprinkled throughout it. At one time during its development the developers were encouraged to treat ORM backed objects as though they were Plain Old Python Objects (POPOs). A developer wouldn't necessarily see the performance degradation when using small data sets either. This is one of the reasons why I like to tail the database log (or use django-debug-toolbar if I'm using Django) to see the queries go by.

Using List Comprehensions When a Single Value is Needed

To make this situation worse, the activeOrganizationProjectUsers list wasn't actually used. This is a combination of a Python anti-pattern and the ORM anti-pattern. What was required was to determine if a single active organization user existed.

I believe the original developer(s) used the list comprehension solution out of a combination of ignorance and a fondness for syntactic sugar. They didn't want to write a new function to do the query and put it in the User class, so they used the existing class's API. The syntactic sugar was using the list comprehension to get more values than the one that was needed. If this weren't a (potentially) expensive ORM backed operation the original code could have been:

activeOrganizationProjectUsers = False
for x in project.users:
    if x.active and x.organization == organization:
        activeOrganizationProjectUsers = True
        break

if activeOrganizationProjectUsers:
    # do something
    pass

But this solution could still query all possible user/organization combinations when there is no match, or when the match comes late in the list. The other question would be: which set is larger, the organization's users or the project's users? It is likely that looping over the organization's users looking for active ones would be more efficient anyway.
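
For plain in-memory objects, the early-exit loop above can also be written with the built-in any() and a generator expression, which stops at the first match instead of building a list. A minimal sketch follows. The User dataclass, the sample users, and the tracked helper are hypothetical stand-ins, not the classes from the real code base.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical in-memory stand-in for the ORM backed user object.
    name: str
    active: bool
    organization: str


project_users = [
    User("ann", active=False, organization="acme"),
    User("bob", active=True, organization="acme"),
    User("cat", active=True, organization="globex"),
]


def has_active_org_user(users, organization):
    """Return True as soon as one active user in the organization is found."""
    return any(u.active and u.organization == organization for u in users)


visited = []


def tracked(users):
    """Yield users one by one, recording which ones were actually examined."""
    for u in users:
        visited.append(u.name)
        yield u


found = has_active_org_user(tracked(project_users), "acme")
print(found, visited)
```

Here the third user is never examined because any() returns at the first match. That only saves work in Python though. If project_users were an ORM relation, the rows could still be fetched from the database before the loop even starts.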

Remember the Underlying Representation

When performance matters, it is important to remember that the objects are ORM backed. In this case a single query was all that was required (SQLObject pseudo syntax):

activeOrganizationProjectUsers = \
    Users.selectBy(project=project,
                   active=True,
                   organization=organization).count() > 0

If abstracting out the ORM's methods is important, this query could be added to the appropriate class as a method. In my case, changing the code to use a single query cut the page load time by two orders of magnitude.
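
The same idea can be sketched without an ORM. The example below uses the standard library's sqlite3 module and a hypothetical users table (the real schema is not shown here), and wraps the check in a has_active_org_user function of my own naming. It uses SQL's EXISTS rather than a count, on the assumption that only a yes/no answer is needed; that is a swap from the count() > 0 form above, not what the original fix used.

```python
import sqlite3

# Hypothetical table layout, loosely mirroring the selectBy() keywords above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"
    " active INTEGER,"
    " organization_id INTEGER,"
    " project_id INTEGER)"
)
conn.executemany(
    "INSERT INTO users (active, organization_id, project_id) VALUES (?, ?, ?)",
    [
        (1, 10, 1),  # active user from organization 10 on project 1
        (0, 20, 1),  # inactive user from organization 20 on project 1
        (1, 20, 2),  # active user from organization 20 on project 2
    ],
)


def has_active_org_user(conn, project_id, organization_id):
    """Ask the database a yes/no question with a single query."""
    row = conn.execute(
        "SELECT EXISTS ("
        " SELECT 1 FROM users"
        " WHERE project_id = ? AND active = 1 AND organization_id = ?)",
        (project_id, organization_id),
    ).fetchone()
    return bool(row[0])


print(has_active_org_user(conn, project_id=1, organization_id=10))
print(has_active_org_user(conn, project_id=1, organization_id=20))
```

Either form is a single round trip and instantiates no User or Organization objects, which is where the savings come from. With EXISTS the database is also free to stop at the first matching row rather than counting them all.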

Filed under  //   anti-pattern   development   orm   python  


How Safe is Your Personal Information in the Hands of Website Developers?

I was going through the webserver statistics for this site to see if any new sites had linked to any of my articles (it is always nice to see that what I have to say is useful to someone). Anyway, I ran across someone who had come to my site through a Google query (I won't mention what the query was for reasons you'll soon see). I ran the same query on Google to see what else came up since it was a rather unique query. Another Google link was for a site that looked like it had raw data - not your usual HTML pages.

When I went to the site I found what looked like a website developer's development directory wide open to the internet. There were at least three companies' websites sitting in subdirectories. The file referred to in the Google result page was a backup of an SQL database dump file. Not just any database file: a backup of all the customer information for one site's shopping cart database. It included names, addresses, email addresses, and phone numbers! (I didn't poke around to see if it had any more sensitive data.)

I was able to figure out the original data owner's domain name from some info in the header of the file. So I just sent them an email letting them know that their customer information is posted for all to see on someone else's website. It will be interesting to see if they respond. I hope it is just their website developer who has a test server running and accidentally left this SQL dump in a publicly accessible area of their webserver. I'd hate to think this data was stolen from the real website and being used for spamming purposes.

As a software developer I've read numerous cautionary tales of accidental (and malicious) data theft occurring when real customer data is used in test systems. I just never imagined I'd stumble across such an egregious privacy violation. So this experience makes me wonder about all the online systems into which we type our personal information. All it takes is one careless developer (not even a malicious one) to expose our private information to a much wider audience...

Filed under  //   development   privacy  
