Basie Blog

A Lightweight Software Development Portal in Django

Handling Anonymous Users Is Proving Difficult


Nine days ago, Eran sat down to clean up Basie’s REST API. As of yesterday, that work was still halted pending a solution to what seems like a simple problem: how to handle users who haven’t logged in. This post is my attempt to explain what we’re currently doing, why we think it’s broken, what I think we should be doing instead, and why we aren’t finished yet. Since I don’t have my hands in the code, I’m probably going to get a few things wrong—please check the comments below for corrections and additions from more knowledgeable members of the team.

Authentication is the process of establishing who someone is. In Basie, we rely on people logging in, i.e., providing a recognized user ID and an associated password. (In future we hope to also support certificate-based authentication, OpenID, and other mechanisms, but that doesn’t change this story.)

If someone hasn’t (yet) logged in, we call them an anonymous user. Typically, anonymous users can see and do less than authenticated users: for example, anonymous users are usually not allowed to edit wiki pages. Representing who can do what is called authorization; enforcing those rules is called access control; the things users might be able to do (like editing wiki pages) are called capabilities or privileges.

Many different authorization and access control schemes are in use today. One of the best-known is called role-based access control, or RBAC, which is designed to make management and auditing easy. Instead of associating privileges directly with users, RBAC introduces an intermediate concept called a role. A role is a set of privileges: for example, the “viewer” role might contain all the “read” privileges in the system but none of the “write” privileges, while the “superuser” role would contain every privilege there is.

A system like Basie can implement RBAC using four tables: USERS, PROJECTS, MEMBERSHIPS, and ROLES. The USERS table stores information about known users:

USERS
id        name               (other columns)
aturing   Alan Turing        …other information…
ghopper   Grace Hopper       …other information…
jvn       John von Neumann   …other information…

The PROJECTS table stores information about projects (I’ll explain what the “default role” field is for in a moment):

PROJECTS
id              default role   (other columns)
antigravity     empty          …other information…
teleportation   viewer         …other information…

The MEMBERSHIPS table shows what roles users have in projects:

MEMBERSHIPS
user_id   project_id      role_id
aturing   antigravity     viewer
ghopper   antigravity     developer
ghopper   teleportation   developer

Note that the MEMBERSHIPS table is not “complete”: some users may not have an explicit role in a particular project. In that case, the system uses the “default role” associated with the project. Comparing the tables seen so far, for example, we can see that John von Neumann would have the “empty” role in the antigravity project, and both he and Alan Turing would have the “viewer” role in the teleportation project.
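Here is a minimal sketch of that fallback rule in Python, assuming the two tables have already been loaded into dictionaries (the names `PROJECT_DEFAULT_ROLE`, `MEMBERSHIPS`, and `effective_role` are made up for illustration, not taken from Basie's code):

```python
# Hypothetical in-memory copies of the PROJECTS and MEMBERSHIPS tables above;
# a real system would read these from the database.
PROJECT_DEFAULT_ROLE = {
    "antigravity": "empty",
    "teleportation": "viewer",
}

MEMBERSHIPS = {
    # (user_id, project_id) -> role_id
    ("aturing", "antigravity"): "viewer",
    ("ghopper", "antigravity"): "developer",
    ("ghopper", "teleportation"): "developer",
}


def effective_role(user_id, project_id):
    """Return the user's explicit role in a project, else the project's default role."""
    default = PROJECT_DEFAULT_ROLE[project_id]
    return MEMBERSHIPS.get((user_id, project_id), default)
```

With this data, `effective_role("jvn", "antigravity")` gives `"empty"` and `effective_role("aturing", "teleportation")` gives `"viewer"`, matching the reading of the tables above. An unknown project raises `KeyError` here, which seems preferable to silently granting some role.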

The final table, ROLES, simply records what privileges each role contains:

ROLES
role        privilege
viewer      WIKI_READ
viewer      EMAIL_READ
viewer      …other “read” privileges…
developer   WIKI_READ
developer   WIKI_WRITE
developer   …other “read” and “write” privileges…

Note that the “empty” role doesn’t show up here, so that when the database is asked, “What privileges does the ‘empty’ role have?” the answer is the empty set.
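To make the whole lookup concrete, here is a sketch of the four tables in SQLite (via Python's built-in `sqlite3` module) plus a single query that picks the explicit role if there is one, falls back on the project's default role otherwise, and returns that role's privileges. The lower-case table and column names and the handful of privilege rows are assumptions for illustration, not Basie's actual schema:

```python
import sqlite3

# Hypothetical SQLite rendering of the four tables; Basie's real schema may differ.
SCHEMA = """
CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE projects (id TEXT PRIMARY KEY, default_role TEXT NOT NULL);
CREATE TABLE memberships (user_id TEXT, project_id TEXT, role_id TEXT,
                          PRIMARY KEY (user_id, project_id));
CREATE TABLE roles (role TEXT, privilege TEXT);
"""

# Resolve the role (explicit membership, else the project default), then
# collect every privilege row for that role.
PRIVILEGES_QUERY = """
SELECT r.privilege
FROM projects p
LEFT JOIN memberships m ON m.project_id = p.id AND m.user_id = ?
JOIN roles r ON r.role = COALESCE(m.role_id, p.default_role)
WHERE p.id = ?
"""


def build_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO users VALUES (?, ?)", [
        ("aturing", "Alan Turing"), ("ghopper", "Grace Hopper"),
        ("jvn", "John von Neumann")])
    db.executemany("INSERT INTO projects VALUES (?, ?)", [
        ("antigravity", "empty"), ("teleportation", "viewer")])
    db.executemany("INSERT INTO memberships VALUES (?, ?, ?)", [
        ("aturing", "antigravity", "viewer"),
        ("ghopper", "antigravity", "developer"),
        ("ghopper", "teleportation", "developer")])
    # An illustrative subset of privileges; there are deliberately no rows for "empty".
    db.executemany("INSERT INTO roles VALUES (?, ?)", [
        ("viewer", "WIKI_READ"), ("viewer", "EMAIL_READ"),
        ("developer", "WIKI_READ"), ("developer", "WIKI_WRITE")])
    return db


def privileges(db, user_id, project_id):
    """Return the set of privileges user_id holds in project_id."""
    rows = db.execute(PRIVILEGES_QUERY, (user_id, project_id)).fetchall()
    return {privilege for (privilege,) in rows}
```

Because the "empty" role has no rows in `roles`, `privileges(db, "jvn", "antigravity")` returns an empty set without any special-case code: the join simply finds nothing.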

RBAC is widely used because it is simple to implement, simple to administer, and simple to test. The implementation has two parts: figuring out what privileges someone has, and enforcing them. Figuring out someone's privileges is easy: just look up the role associated with a (user, project) pair, using the project’s default role if there’s no explicit entry, then see if the privilege needed is in the set associated with that role. In Python, we can encapsulate this check in a decorator function that we can then apply to all the methods that need privileges:

@require_permission("WIKI_READ")
def show_a_wiki_page(request, *args):
    ...  # body of the view; the decorator finds request here, as the view's first argument

where request is the HTTP request object created by Django that holds information about the current transaction (including the user ID and project ID), and “WIKI_READ” is the privilege to be checked. Administration is simple too: a typical system only has a handful of roles (in three years of continuous use, we’ve only needed half a dozen in DrProject), and you can tell at a glance who’s able to do what. Testing is also straightforward, or at least far more straightforward than it was with Trac, where every user could have a slightly different set of privileges.
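One way such a decorator could be written is sketched below. The privilege name is the only argument the decorator takes, because the request object doesn't exist yet when the decorator is applied; it is picked up from the view's first argument each time the view is called. `PRIVILEGES`, `fake_request`, and the local `PermissionDenied` class are stand-ins so the sketch runs without Django or a database:

```python
import functools
from types import SimpleNamespace


class PermissionDenied(Exception):
    """Stand-in for django.core.exceptions.PermissionDenied, so the sketch runs without Django."""


# Hypothetical precomputed privileges per (user_id, project_id); a real
# version would run the role lookup against the database instead.
PRIVILEGES = {
    ("ghopper", "antigravity"): {"WIKI_READ", "WIKI_WRITE"},
    ("aturing", "antigravity"): {"WIKI_READ", "EMAIL_READ"},
}


def require_permission(privilege):
    """Decorator factory: only run the wrapped view if the user holds `privilege`."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(request, *args, **kwargs):
            # The check happens per call, when request actually exists.
            held = PRIVILEGES.get((request.user.id, request.project.id), set())
            if privilege not in held:
                raise PermissionDenied(privilege)
            return view(request, *args, **kwargs)
        return wrapper
    return decorator


@require_permission("WIKI_READ")
def show_a_wiki_page(request, page_name):
    return f"contents of {page_name}"


def fake_request(user_id, project_id):
    """Test stand-in for a Django request carrying user and project objects."""
    return SimpleNamespace(user=SimpleNamespace(id=user_id),
                           project=SimpleNamespace(id=project_id))
```

Here `show_a_wiki_page(fake_request("ghopper", "antigravity"), "Home")` returns `"contents of Home"`, while a user with no entry is refused. In real Django, raising `django.core.exceptions.PermissionDenied` from a view produces a 403 response. Note that Django's `AnonymousUser` has an `id` of `None`, so this uniform check would deny it everything: that is exactly the gap the rest of this post is about.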

But life is never that easy. The problem we face is that Django doesn’t handle the anonymous user the same way it handles authenticated users. If the user associated with a transaction has been authenticated (i.e., if the HTTP request contains a valid session cookie), Django’s middleware looks up the user in the USERS table, creates an object of the class User to store the information it finds, and stores that object as request.user. If there isn’t a valid cookie, though, Django creates an instance of an entirely different class called AnonymousUser and stores that as request.user instead, without ever checking the database. User and AnonymousUser both have Boolean methods called is_anonymous and is_authenticated; for each class, one always returns True and the other always returns False. Permission checks are then supposed to do something like this:

def can_read_wiki(request):
    if request.user.is_anonymous():
        return True  # hard-coded policy: anonymous users may read wiki pages
    # look_up_privileges stands for the RBAC lookup described above
    privilege_set = look_up_privileges(request.user.id, request.project.id)
    return "WIKI_READ" in privilege_set

Three things are wrong with handling anonymous users this way:

  1. If an administrator wants to change what an anonymous user can do with respect to a particular project, she can’t just modify the database using the admin tools provided for managing regular users—she has to actually modify the source. Policy changes shouldn’t require recoding.
  2. Special cases and extra code paths greatly increase the chances of bugs, and bugs in security-related code greatly increase the chances that something bad will happen to people because they chose to use your software.
  3. The more things developers are required to remember when writing or maintaining code, the greater the odds that they’ll make a mistake. I think the odds are pretty high that sooner or later, one of the students working on Basie will forget to handle the anonymous user in a permissions check, or will handle it incorrectly. Pre-release testing should catch this, but why rely on that if we don’t have to?

Why does Django work this way? It might be a misguided attempt to optimize performance: if you expect that most visitors to your site won’t authenticate, handling their permissions in code saves a couple of database lookups. It also avoids storing an entry in the USERS table that doesn’t correspond to an actual user, and (more importantly) saves Django from requiring special-case code in its administrative tools to prevent anyone from deleting that entry (which would break the whole system).
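The admin-side special case is small, though. Assuming the anonymous user is stored as an ordinary row with a reserved id (the id `"anonymous"` and the `delete_user` helper below are hypothetical), the guard amounts to a single comparison:

```python
# Hypothetical id of the USERS row that stands for everyone who hasn't logged in.
ANONYMOUS_USER_ID = "anonymous"


class ProtectedAccountError(Exception):
    """Raised when someone tries to delete the reserved anonymous account."""


def delete_user(users, user_id):
    """Remove user_id from the users mapping, refusing to drop the anonymous row."""
    if user_id == ANONYMOUS_USER_ID:
        raise ProtectedAccountError("the anonymous account cannot be deleted")
    del users[user_id]
```

In Django's admin, the same test could plausibly live in an override of `ModelAdmin.has_delete_permission(request, obj=None)` that returns `False` when `obj` is the anonymous account.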

All things considered, I strongly prefer uniform RBAC plus one extra test in the admin interface over special-case tests for the anonymous user scattered through the whole application. The problem now is how to get there from here. We can’t get Django to create an instance of our User class instead of using its own AnonymousUser class (and no, I’m not willing to consider patching Django itself—it would be a never-ending maintenance headache). I think our best bet is to check the request object created by Django’s middleware as early in the processing cycle as we can to see if request.user.is_anonymous() is True, and if so, replace request.user with an instance of our own User class. If need be, we can add is_anonymous and is_authenticated methods to that class to keep Django happy, but not use them anywhere in our own code.
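A rough sketch of that swap as a piece of middleware follows. It uses the callable, `get_response`-style middleware protocol of current Django releases (Django of this era used a `process_request(self, request)` hook for the same job) and must be listed after `django.contrib.auth`'s AuthenticationMiddleware, which is what sets `request.user`. `load_anonymous_account` is hypothetical, and `SimpleNamespace` merely stands in for an instance of our own User class so the sketch runs on its own:

```python
from types import SimpleNamespace


def _is_anonymous(user):
    # Django of this era exposed is_anonymous() as a method; current releases
    # make it a property. Accept either form.
    flag = user.is_anonymous
    return flag() if callable(flag) else flag


def load_anonymous_account():
    """Hypothetical: return our own USERS row for visitors who haven't logged in.

    A real version would fetch an instance of our User class from the database.
    """
    # is_anonymous/is_authenticated are kept only so Django's own checks
    # (login_required and friends) still treat the visitor as logged out.
    return SimpleNamespace(id="anonymous", name="Anonymous",
                           is_anonymous=True, is_authenticated=False)


class ReplaceAnonymousUserMiddleware:
    """Middleware callable that wraps the next handler in the chain."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if _is_anonymous(request.user):
            # Swap Django's AnonymousUser for a real row so every permission
            # check can go through the same RBAC lookup.
            request.user = load_anonymous_account()
        return self.get_response(request)
```

Authenticated users pass through untouched; everyone else reaches our views as the `"anonymous"` row, whose privileges an administrator can then change through the database like any other user's.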

If it really were that simple, though, the team would have done it by now. What am I missing? What part of this problem have I forgotten or not understood?

Written by gvwilson

June 24th, 2009 at 10:03 am


One Response to 'Handling Anonymous Users Is Proving Difficult'


  1. I’ve just walked into this and came across this post while trying to find a solution. I want to log events on my site that are mostly created by registered users, but can in some instances be created by anonymous users. An AnonymousUser() object doesn’t have the same properties as a User object, so an attempt to save it as a ForeignKey fails. My workaround is going to be a user called anonymous that handles this situation. This has the advantage of letting me change the anonymous user’s rights as well, and doesn’t involve tinkering with Django code. It is less than satisfactory, though, and is another case for not using contrib.auth.

    Simon Greenwood

    5 Aug 09 at 1:03 pm
