Basie Blog

A Lightweight Software Development Portal in Django

How to pretty-print the database schema

  • . bin/activate
  • cd <path-to-basie>
  • svn checkout http://django-command-extensions.googlecode.com/svn/trunk django-extensions
  • cd django-extensions
  • python setup.py install
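  • add 'django_extensions' to the INSTALLED_APPS tuple in <path-to-basie>/basie/settings.py
  • python manage.py graph_models -a -g > schema.dot (run from the directory that contains manage.py; -a includes all applications, -g groups models by application)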
  • sudo apt-get install graphviz
  • dot -o schema.png -Tpng schema.dot

The tricky part is that we do not know how to set the resolution of schema.png to 300 dpi from the command line. The only relevant option I could find was “dot -o schema.png -Gsize='10,10' -Tpng schema.dot”, where -Gsize='x,y' specifies the size, in inches, of the box that encloses the final graph.
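One option that may help: Graphviz also has a `dpi` graph attribute, which can be set from the command line with `-G` in the same way as `size`, and which sets the pixels per inch for bitmap formats such as PNG. A minimal sketch, assuming `dot` is on the PATH; the helper names `dot_command` and `render_schema` are hypothetical, not part of Basie or django-extensions:

```python
import subprocess

def dot_command(src="schema.dot", out="schema.png", dpi=300):
    """Build the Graphviz command line (hypothetical helper)."""
    # -Tpng selects PNG output; -Gdpi sets the graph-level 'dpi' attribute,
    # which Graphviz uses as pixels per inch for bitmap output.
    return ["dot", "-Tpng", "-Gdpi=%d" % dpi, "-o", out, src]

def render_schema(src="schema.dot", out="schema.png", dpi=300):
    # check=True raises CalledProcessError if dot exits with an error.
    subprocess.run(dot_command(src, out, dpi), check=True)
```

Calling `render_schema()` is equivalent to typing `dot -Tpng -Gdpi=300 -o schema.png schema.dot` in the shell.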

Written by florian

June 30th, 2009 at 3:23 pm

Posted in Uncategorized

Database Migration

Ian Lienert posted a table last night comparing the schemas of DrProject and Basie databases.  We discussed it this morning, and it looks like translating DrP databases to Basie isn’t going to hurt as much as we feared.  Here are some of the high points.

Tables we just don’t care about:

  • Enum (stored enumerations and their values)
  • SystemProperty and _InstanceSetting (Django does this for us)
  • Session and SessionAttribute (Django does this for us)
  • Preference (now part of the UserProfile)
  • Attachment (we’re not supporting attachments in 0.5)
  • Harvester, DashboardMetric, DashboardValue, and Retriever (our Dashboard is simpler)
  • UnconfirmedEmail (we don’t require users to confirm email addresses in 0.5)
  • DeletedMessage (never really understood why this was separate from Message)
  • Tag (no tagging in 0.5)
  • RoleCapability (Django handles this for us)
  • Milestone and MilestoneChange (no milestones in 0.5)
  • TicketChange (DrProject inherited this sort-of backward diff from Trac, where it was a premature space optimization)

Tables that translate more or less directly:

  • Project
  • User (Basie has User and UserProfile, since Django uses the latter as an auxiliary table for extensions to the former)
  • Role (becomes Group)
  • Membership (Basie’s is more complicated, but the core concepts are the same)
  • Email (becomes UserEmail)
  • WikiPage
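While writing the conversion tool, the two lists above could be kept in one lookup table so that every DrProject table gets an explicit decision. This is only a sketch; `SKIPPED`, `DIRECT`, and `plan_for` are hypothetical names, not part of the actual tool:

```python
# DrProject tables we just don't care about (first list above).
SKIPPED = {
    "Enum", "SystemProperty", "_InstanceSetting", "Session",
    "SessionAttribute", "Preference", "Attachment", "Harvester",
    "DashboardMetric", "DashboardValue", "Retriever", "UnconfirmedEmail",
    "DeletedMessage", "Tag", "RoleCapability", "Milestone",
    "MilestoneChange", "TicketChange",
}

# DrProject table -> Basie table(s) it translates into more or less directly.
DIRECT = {
    "Project": ["Project"],
    "User": ["User", "UserProfile"],
    "Role": ["Group"],
    "Membership": ["Membership"],
    "Email": ["UserEmail"],
    "WikiPage": ["WikiPage"],
}

def plan_for(table):
    """Return ('skip', []), ('direct', targets) or ('review', [])."""
    if table in SKIPPED:
        return ("skip", [])
    if table in DIRECT:
        return ("direct", DIRECT[table])
    # Everything else (e.g. Message, Ticket) needs a closer look.
    return ("review", [])
```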

Things that require some thought:

  • Message: the obvious translation is to MailMessage, but (a) Basie doesn’t store headers in the same way, so we may have to re-parse stored emails to populate the Basie database, and (b) DrProject has that damn DeletedMessage table, so we have to decide how much of the history of individual mail messages to bring forward.
  • Ticket: Basie’s ticketing system doesn’t provide type, milestone, priority, or comments; we will probably stuff these into the text of the ticket to avoid losing information completely.
  • CachedChangeset and CachedChange: these make up DrProject’s copy of information taken from the Subversion repository.  We haven’t yet decided how much of this to cache in Basie, or where to store it.
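A sketch of two helpers the conversion tool might need for the first two items: re-parsing a stored raw email with Python's standard `email` package, and folding the ticket fields Basie lacks into the ticket text. The function names, the choice of headers, and the layout of the folded text are all assumptions for illustration:

```python
from email import message_from_string

def parse_stored_message(raw_text):
    """Re-parse a stored raw email and pull out a few headers (sketch)."""
    msg = message_from_string(raw_text)
    # Header lookups are case-insensitive and return None if a header is absent.
    return {
        "from": msg["From"],
        "to": msg["To"],
        "subject": msg["Subject"],
        "date": msg["Date"],
        "message_id": msg["Message-ID"],
    }

def fold_ticket_extras(description, type_=None, milestone=None,
                       priority=None, comments=()):
    """Append DrProject-only ticket fields to the Basie ticket text (sketch)."""
    lines = [description.rstrip()]
    extras = [("Type", type_), ("Milestone", milestone), ("Priority", priority)]
    kept = ["%s: %s" % (label, value) for label, value in extras if value]
    if kept or comments:
        lines.append("")
        lines.append("-- Imported from DrProject --")
        lines.extend(kept)
        for i, comment in enumerate(comments, 1):
            lines.append("Comment %d: %s" % (i, comment))
    return "\n".join(lines)
```

Tickets with nothing extra come through unchanged, so only tickets that would otherwise lose information get the imported block.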

So as I said at the outset, it doesn’t look like it’s going to hurt as much as it might.  Stay tuned to see…

Written by gvwilson

June 30th, 2009 at 1:27 pm

Posted in Uncategorized

We’ve Been Busy

Via ReviewBoard, a plot of who has posted patches, and when, since January:

Written by gvwilson

June 25th, 2009 at 11:21 am

Posted in Uncategorized

Handling Anonymous Users Is Proving Difficult

Nine days ago, Eran sat down to clean up Basie’s REST API. As of yesterday, that work was still halted pending a solution to what seems like a simple problem: how to handle users who haven’t logged in. This post is my attempt to explain what we’re currently doing, why we think it’s broken, what I think we should be doing instead, and why we aren’t finished yet. Since I don’t have my hands in the code, I’m probably going to get a few things wrong—please check the comments below for corrections and additions from more knowledgeable members of the team.

Authentication is the process of establishing who someone is. In Basie, we rely on people logging in, i.e., providing a recognized user ID and an associated password. (In future we hope to also support certificate-based authentication, OpenID, and other mechanisms, but that doesn’t change this story.)

If someone hasn’t (yet) logged in, we call them an anonymous user. Typically, anonymous users can see and do less than authenticated users: for example, anonymous users are usually not allowed to edit wiki pages. Representing who can do what is called authorization; enforcing those rules is called access control; the things users might be able to do (like editing wiki pages) are called capabilities or privileges.

Many different authorization and access control schemes are in use today. One of the best-known is called role-based access control, or RBAC, which is designed to make management and auditing easy. Instead of associating privileges directly with users, RBAC introduces an intermediate concept called a role. A role is a set of privileges: for example, the “viewer” role might contain all the “read” privileges in the system but none of the “write” privileges, while the “superuser” role would contain every privilege there is.

A system like Basie can implement RBAC using four tables: USERS, PROJECTS, MEMBERSHIPS, and ROLES. The USERS table stores information about known users:

USERS
id         name
aturing    Alan Turing          …other information…
ghopper    Grace Hopper         …other information…
jvn        John von Neumann     …other information…

The PROJECTS table stores information about projects (I’ll explain what the “default role” field is for in a moment):

PROJECTS
id              default role
antigravity     empty           …other information…
teleportation   viewer          …other information…

The MEMBERSHIPS table shows what roles users have in projects:

MEMBERSHIPS
user_id    project_id       role_id
aturing    antigravity      viewer
ghopper    antigravity      developer
ghopper    teleportation    developer

Note that the MEMBERSHIPS table is not “complete”: some users may not have an explicit role in a particular project. In that case, the system uses the “default role” associated with the project. Comparing the tables seen so far, for example, we can see that John von Neumann would have the “empty” role in the antigravity project, and both he and Alan Turing would have the “viewer” role in the teleportation project.

The final table, ROLES, simply records what privileges each role contains:

ROLES
role        privilege
viewer      WIKI_READ
viewer      EMAIL_READ
viewer      …other “read” privileges…
developer   WIKI_READ
developer   WIKI_WRITE
developer   …other “read” and “write” privileges…

Note that the “empty” role doesn’t show up here, so that when the database is asked, “What privileges does the ‘empty’ role have?” the answer is the empty set.

RBAC is widely used because it is simple to implement, simple to administer, and simple to test. The implementation has two parts: figuring out what privileges someone has, and enforcing them. Figuring out is easy: just look up the role associated with a (user, project) pair, using the project’s default role if there’s no explicit entry, then see if the privilege needed is in the set associated with that role. In Python, we can encapsulate this check in a decorator function that we can then apply to all the methods that need privileges:

@require_permission("WIKI_READ")
def show_a_wiki_page(request, *args):
    ...  # body of method

where request is the HTTP request object created by Django that holds information about the current transaction (including the user ID and project ID), and “WIKI_READ” is the privilege to be checked. Administration is simple too: a typical system only has a handful of roles (in three years of continuous use, we’ve only needed half a dozen in DrProject), and you can tell at a glance who’s able to do what. Testing is straightforward too, or at least, a lot more straightforward than it was with Trac, in which every user could have a slightly different set of privileges.
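A minimal, self-contained sketch of both halves (the lookup and the decorator), assuming the PROJECTS, MEMBERSHIPS, and ROLES tables above are held in plain dictionaries rather than the database, and that `request` carries `user.id` and `project.id` as described. `privileges_for` and `PermissionDenied` are illustrative names, not Basie's actual code (Django has its own `PermissionDenied` in `django.core.exceptions`):

```python
from functools import wraps

# In-memory stand-ins for the tables above (only the rows shown there).
DEFAULT_ROLE = {"antigravity": "empty", "teleportation": "viewer"}
MEMBERSHIPS = {
    ("aturing", "antigravity"): "viewer",
    ("ghopper", "antigravity"): "developer",
    ("ghopper", "teleportation"): "developer",
}
ROLES = {
    "viewer": {"WIKI_READ", "EMAIL_READ"},
    "developer": {"WIKI_READ", "WIKI_WRITE"},
    # "empty" is deliberately absent, so it maps to no privileges.
}

class PermissionDenied(Exception):
    pass

def privileges_for(user_id, project_id):
    # Explicit membership if there is one, else the project's default role.
    role = MEMBERSHIPS.get((user_id, project_id), DEFAULT_ROLE[project_id])
    return ROLES.get(role, set())

def require_permission(privilege):
    """Decorator: only call the view if the request's user holds `privilege`."""
    def decorator(view):
        @wraps(view)
        def wrapper(request, *args, **kwargs):
            if privilege not in privileges_for(request.user.id, request.project.id):
                raise PermissionDenied(privilege)
            return view(request, *args, **kwargs)
        return wrapper
    return decorator

@require_permission("WIKI_READ")
def show_a_wiki_page(request, page_name):
    return "contents of " + page_name
```

With these tables, John von Neumann can read wiki pages in teleportation (default role "viewer") but not in antigravity (default role "empty").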

But life is never that easy. The problem we face is that Django doesn’t handle the anonymous user the same way it handles authenticated users. If the user associated with a transaction has been authenticated (i.e., if the HTTP request contains a valid cookie), Django’s middleware looks up the user in the USERS table, creates an object of the class User to store the information it finds, and puts that object in request. If there isn’t a valid cookie, though, Django creates an instance of an entirely different class called AnonymousUser and sticks that in request without ever checking the database. User and AnonymousUser both have Boolean methods called is_anonymous and is_authenticated; for each class, one always returns True and the other always returns False. Permission checks are then supposed to do something like this:

def can_read_wiki(request):
    if request.user.is_anonymous():
        return True  # special case: anonymous users can read wiki pages
    # lookup_privileges is a placeholder for the MEMBERSHIPS/ROLES query
    privilege_set = lookup_privileges(request.user.id, request.project.id)
    return "WIKI_READ" in privilege_set

Three things are wrong with handling anonymous users this way:

  1. If an administrator wants to change what an anonymous user can do with respect to a particular project, she can’t just modify the database using the admin tools provided for managing regular users—she has to actually modify the source. Policy changes shouldn’t require recoding.
  2. Special cases and extra code paths in code greatly increase the chances of bugs; bugs in security-related code greatly increase the chances that something bad will happen to people because they chose to use your software.
  3. The more things developers are required to remember when writing or maintaining code, the greater the odds that they’ll make a mistake. I think the odds are pretty high that sooner or later, one of the students working on Basie will forget to handle the anonymous user in a permissions check, or will handle it incorrectly. Pre-release testing should catch this, but why rely on that if we don’t have to?

Why does Django work this way? It might be a misguided attempt to optimize performance: if you expect that most visitors to your site won’t authenticate, handling their permissions in code saves a couple of database lookups. It also avoids storing an entry in the USERS table that doesn’t correspond to an actual user, and (more importantly) saves Django from requiring special-case code in its administrative tools to prevent anyone from deleting that entry (which would break the whole system).

All things considered, I strongly prefer uniform RBAC plus one extra test in the admin interface over special-case tests for the anonymous user scattered through the whole application. The problem now is how to get there from here. We can’t get Django to create an instance of our User class instead of using its own AnonymousUser class (and no, I’m not willing to consider patching Django itself—it would be a never-ending maintenance headache). I think our best bet is to check the request object created by Django’s middleware as early in the processing cycle as we can to see if request.user.is_anonymous() is True, and if so, replace request.user with an instance of our own User class. If need be, we can add is_anonymous and is_authenticated methods to that class to keep Django happy, but not use them anywhere in our own code.
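A minimal sketch of the replacement step proposed above, written against the old-style `process_request` middleware hook and assuming the class is listed after Django's `AuthenticationMiddleware` (which is what sets `request.user`). `AnonymousBasieUser` is a hypothetical stand-in for an instance of our own User class backed by a reserved "anonymous" row in USERS; because it has an ordinary `id`, the same RBAC lookup used for everyone else applies to it, and the project's default role decides what it may do:

```python
def _is_anonymous(user):
    # Older Django: is_anonymous is a method; later releases made it a property.
    flag = user.is_anonymous
    return flag() if callable(flag) else flag

class AnonymousBasieUser(object):
    """Hypothetical user object for visitors who haven't logged in."""
    id = "anonymous"  # assumed reserved row in USERS

    # Present only to keep Django happy; our own code never calls these.
    def is_anonymous(self):
        return True

    def is_authenticated(self):
        return False

class ReplaceAnonymousUserMiddleware(object):
    """Swap Django's AnonymousUser for our own user object (sketch)."""
    def process_request(self, request):
        user = getattr(request, "user", None)
        if user is not None and _is_anonymous(user):
            request.user = AnonymousBasieUser()
        return None  # None tells Django to carry on processing the request
```

Whether Django's own views (login, the admin) tolerate the substituted object is exactly the part that would still need testing.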

If it really was that simple, though, the team would have done it by now. What am I missing? What part of this problem have I forgotten or not understood?

Written by gvwilson

June 24th, 2009 at 10:03 am

Posted in Uncategorized

UX Feedback

Yesterday Greg brought Ryan Feeley into our lab to give us some feedback on the user interfaces of Basie and MarkUs. Ryan, who hadn’t seen either site before, acted as an educated new user: he critiqued some of the existing interfaces and suggested ways to improve their usability. Watching how a user experience (UX) expert thinks through an interface was invaluable, so the notes I took might be of some use to future developers:

  • The “Home” link is confusing because it takes you to the list of projects, whereas somebody would expect to be directed to the home page of the currently-selected project. Perhaps we should remove “Home”? Or maybe we can rename “Home” to “Projects”?
  • It is confusing to be able to view tickets/wiki/source even when you are not logged in. (You cannot edit any of them without logging in, of course; Eran is working on authentication, so we are addressing this issue.) At the very least, it would be helpful to show a message like “Please log in to add/view wiki.”
  • In the ticket filters section, the field Number is very vague: is it the ticket ID or the issue #? Also, what does the default “-----” in the drop-down boxes mean? We should probably replace it with something more specific, like “all” or “none”. In particular, the Owner field could replace the “My Tickets” button if we add “me” at the top of the owners list.
  • In the same section, maybe we can change “Hide columns” to “Show columns” because it’s more positive :-)
  • “Events” could probably be renamed to “Activity Log” or “Events Log” to make its purpose clearer.
  • In the event filters, we should be using datepickers instead of YYYY-MM-DD text fields.
  • We should add a way to filter events by type of content, e.g. “Show me only events pertaining to wikis, tickets and mail”.
  • The title of each page, which is shown on a browser tab, should show the currently-selected project’s name first and then any other info.
  • We should have sortable columns for every table we display, and we should somehow keep the sorting when navigating by using the browser’s Back and Forward buttons.
  • It’s a good idea to display a tooltip containing the first line of the subject of a mail/ticket/wiki etc. when the mouse hovers over a cell in the Subject column of the corresponding tables that show mail/tickets/wikis etc.
  • Maybe we can use truncation for Date columns like the one used on the Mac? In the same vein, maybe we could use truncation of emails like the one in Google Groups.
  • Our pagination widget should show how many pages there are in total; Ryan showed us an option he thought was cool. Pagination is not necessary when there is only one page of items to be displayed.
  • Our “Back to list” links should use chevrons. And our breadcrumbs are awesome! And we shouldn’t use conjunctions to start a sentence :-)
  • “Old Revisions” in wikis should be renamed to “Page History” or something similar.
  • We should keep title case consistent across our buttons and any label that we display.
  • The dashboard should not display negative axes. Blake joked that maybe it is useful for removing tickets and wikis.
  • There is a font resizing issue in the dashboard; we probably forgot to use ems as we do everywhere else.
  • The ticket filter is generally too advanced, and we should simplify it.
  • We should be aware that tools like Balsamiq and iPlotz make designing user interfaces easier.
  • I left the most important direction for improvement last: we should have an instance of Basie populated with real data available at any time. Tickets named asdf and users named jgfjkhdf simply provide no room for useful UX feedback because they distract the user very easily.
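Two of the pagination notes (show the total number of pages; hide the pager when everything fits on one page) come down to a little arithmetic. A hypothetical helper, not Basie's actual template code; Django's own `Paginator` class already exposes the page count as `num_pages`:

```python
def page_summary(total_items, per_page, current_page):
    """Return a pager label, or None when everything fits on one page."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    # Ceiling division; an empty list still counts as one (empty) page.
    num_pages = max(1, -(-total_items // per_page))
    if num_pages == 1:
        return None  # hide the pagination widget entirely
    return "Page %d of %d" % (current_page, num_pages)
```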

That’s all for now, I hope this was helpful. Thanks for the feedback Ryan!

Written by florian

June 23rd, 2009 at 11:00 am

Posted in Uncategorized

Where We Are on the First Day of Summer

Despite haranguing my students to blog regularly, I’ve been pretty lax about writing myself. Oh well — here’s an update on where we are:

  1. Phyliss Lee has finished her six weeks with us, and is leaving for an internship in Japan next weekend. She did great work, particularly on the status dashboard—we hope she has a great time, and look forward to having her back in twelve months.
  2. Zuzel Vera Pacheco has integrated a pure-Python search engine called Whoosh into Basie; all the basic features are now working, and eight patches implementing advanced features are under review.
  3. The ticket, wiki, and repository views have all been updated by Bill, Florian, and Eran, who have also created our first installers.  Eran is now trying to make the anonymous user object supplied by Django look more like a real user so that we don’t need special cases in our access control logic for people who haven’t logged in yet.
  4. Ian Lienert has started working as a volunteer on a conversion tool using SQLAlchemy and Django’s ORM to migrate old DrProject databases to Basie.  Tomorrow will be his first day sitting in with the team.
  5. We’re meeting with other summer students every Thursday morning at 9:00 am in Bahen 5256 for demos—if you’re around, you’re welcome to join us.

Written by gvwilson

June 21st, 2009 at 2:53 pm

Posted in Uncategorized

Fast Apache WSGI Configuration - for local network

If you would like to deploy your Django site behind an Apache server, one way to do it is to point the Apache configuration file for your site at a WSGI script that loads your Django application with the appropriate settings.

Setting up an apache2 server to run with your WSGI script just for your local network can easily be done by following these steps:

  1. sudo apt-get install apache2 libapache2-mod-wsgi
  2. get root permissions (sudo) and create or edit /etc/apache2/sites-available/yoursite to look like this:
    <VirtualHost *:80>
    ServerAdmin webmaster@localhost
    ServerName yoursite
    DocumentRoot /path/to/your/site/
    WSGIScriptAlias / /path/to/your/wsgi/script/example.wsgi
    <Directory /path/to/your/site>
    Order deny,allow
    Allow from all
    </Directory>
    WSGIDaemonProcess yoursite user=ownername processes=1 threads=10
    WSGIProcessGroup yoursite
    </VirtualHost>
  3. change directory to /etc/apache2/sites-available
  4. Run sudo a2ensite yoursite
  5. Restart the Apache2 server: sudo /etc/init.d/apache2 restart
  6. You can now reach the site from the server itself at http://localhost/, or from another machine by using the server's hostname or IP address.

Tada!
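If the page doesn't come up, it can help to confirm that mod_wsgi is loading the script at all before bringing Django into the picture. A minimal sketch of a throwaway example.wsgi: mod_wsgi calls the module-level callable named `application`. Once this works, the script can be replaced by one that sets `DJANGO_SETTINGS_MODULE` and hands requests to Django (`django.core.handlers.wsgi.WSGIHandler()` in Django releases of this era, `django.core.wsgi.get_wsgi_application()` in later ones):

```python
# example.wsgi: a throwaway test application (sketch, not the Basie script).
def application(environ, start_response):
    body = b"mod_wsgi is working\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```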

Some helpful links:

http://docs.djangoproject.com/en/dev/howto/deployment/modwsgi/

http://stackoverflow.com/questions/36806/setup-django-with-wsgi-and-apache

Written by henig

June 9th, 2009 at 10:59 am

Posted in Deploy

The Dashboard Revived

So I’ve spent the past week fiddling with the dashboard and I think it looks pretty snazzy now.

The first image is the dashboard upon entering a project. The second image shows what happens when users click on ‘details’ for each graph. Things to note:

  • Navigation bar above the main content area. There is still some debate about what ‘Home’ should be: I think it should be the project home, while Florian says it should be the project list. Any input on the matter would be helpful.
  • Only the 10 most recent events are displayed, with a link to the full list on the events page. The events page has not been modified yet, but I am in the process of redesigning it.
  • When the user clicks on ‘details’, the detailed graph with dates opens up, just as it used to, and pushes the events down.
  • This is the screen that users will see if they are logged in. However, what should users see if they are NOT logged in (i.e. an unknown user)? Should this information still be available if the project is public? I’m going to assume that there will be public projects that anyone can see as well as private projects that only registered members can see. So, in the case of private projects, users not registered to a project shouldn’t even see it in the ‘Home’ project list. For public projects, non-registered users should still see activity and the dashboard.
  • Another question is whether the login and user preferences links are now in the right place. Personally I think they’re fine where they are, on top of the search bar, but that can be debated.
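The visibility rule assumed in the bullets above can be stated in a few lines. A sketch only: the `public` flag and the `members` set are assumptions about how projects might be modelled, not existing Basie fields:

```python
from collections import namedtuple

# Hypothetical project record for illustration.
Project = namedtuple("Project", "name public members")

def visible_projects(projects, user_id=None):
    """Projects to list on 'Home' for user_id (None means not logged in)."""
    # Public projects are visible to everyone; private ones only to members.
    return [p for p in projects
            if p.public or (user_id is not None and user_id in p.members)]
```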

There are some other areas of concern now, though. The following is what the ‘Home’ page looks like.

Because the navigation and search bars are project-specific, the ‘Home’ page now looks very bare and lonely. And if nobody is logged in, even more is taken away from the page.

Clearly, several new issues have surfaced. Any and all input is welcome. Thanks!

Written by pongers

June 8th, 2009 at 3:48 pm

Posted in Design, Development

Batch creation of users and projects

Today we started examining the way we add new users and new projects to Basie, and we realized that they can only be added one at a time. DrProject allowed batch creation of users and projects, and since this is essential to administrators, Basie is going to have to support it as well.

In DrProject we could create a bunch of users by uploading a file containing user names and their respective memberships in projects. This file was parsed and its contents were encapsulated in a request to the server, which modified the corresponding membership files.

The same procedure was used for batch creation of projects, but it was not perfect, because an SVN repository had to be created for each of them. Creating a repository for one project took about 1 s, so when the server tried to do that for the roughly 30 students in a class, Apache timed out the request before it could complete.

Timeouts sometimes occurred with batch creation of users as well, which was very dangerous because they led to partial failures and bugs that were difficult to discover. Taking these past experiences into account, the approach we currently have in mind is to use AJAX for batch creation of users and projects: for each student/project in the uploaded file, a separate AJAX request will be sent to the server, and the client’s result page will be filled in one line at a time with “User created” or something similar. Because each request does only a small amount of work, it should finish well within Apache’s timeout, so batch creation can be supported without the old failures.
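On the server side, this approach means splitting the upload into independent jobs and giving each job its own small handler, so one failure cannot take the rest down with it. A sketch under assumptions: the `username,project,role` file format, the function names, and the in-memory `existing_users` set are all hypothetical; in Basie the per-job handler would presumably be a Django view that creates the User and Membership rows:

```python
import csv
import io

def parse_batch_file(text):
    """Split an uploaded 'username,project,role' file into one job per line."""
    jobs = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), 1):
        if not row or row[0].strip().startswith("#"):
            continue  # skip blank lines and comment lines
        if len(row) != 3:
            raise ValueError("line %d: expected username,project,role" % line_no)
        username, project, role = (field.strip() for field in row)
        jobs.append({"username": username, "project": project, "role": role})
    return jobs

def handle_one_job(job, existing_users):
    """What a single AJAX request would do on the server (sketch)."""
    # Report a status string instead of raising, so the page can show it.
    if job["username"] in existing_users:
        return "%s: already exists" % job["username"]
    existing_users.add(job["username"])
    return "%s: user created" % job["username"]
```

The client would send one request per job and append each returned status line to the results page as it arrives.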

Any comments or objections to this idea are most welcome.

Written by florian

June 8th, 2009 at 10:39 am

Posted in Uncategorized

Basie Dashboard - Go!

So this week I’ve started working on the Basie Dashboard, essentially picking up where Heather left off. I don’t know much about the flot library or JavaScript, though, so manipulating the graphs is going to be tough. I have added a recent events table and a list of quick links for users. I think these are very important, especially for getting a quick overview of the most recent things that have happened on the project.

Here are some things that I think could make the graphs useful to users:

  • hide/show ability -> this way users can choose which graphs they want to see. I think that having the overall graph visible at all times is useful. That way, only the more detailed graph, which is the one that takes up the most space, needs to be toggled.
  • I want to have some way of changing the graph style to either lines or bars for each user. I know people like to see the totals broken down by individual users in different colours (personally I like line graphs). I’m just not sure how manageable that is, especially if the project has a lot of developers on it.

As far as the ‘quick links’ go, I added it to the dashboard, because if this is going to be the screen that users see once they sign in, then it is important to be able to navigate away from there. That being said, we still don’t have an easy navigational system for other parts of Basie. At the moment, users have to click ‘back’ or on the breadcrumbs we added to the top of the pages. I think adding a ‘quick links’ widget to each page would be best. 

So far the dashboard is also project specific and not user specific. In the future I would like to have the dashboard user specific, which will display data and news relevant to that user across all the projects they’re in. This may be a v3.0 feature.

Written by pongers

June 5th, 2009 at 1:14 pm

Posted in Design