Basie Blog

A Lightweight Software Development Portal in Django

WikiApp or OurOwnWikiApp ???

After writing a (very) simple wiki app for Django, I’m ready to dive into the
code of Django WikiApp, a simple wiki with revision and markup support, so that
I have a better idea of how ours should look.

Let’s start from the core of the app, models.py:
This app includes two models: Article, which is a wiki page, and ChangeSet,
which represents an older version of an Article. If we implement a generic
history mechanism that can be plugged into each app in the project, there is no
need for the one supplied here.

So I have already reached the point where I think we shouldn’t use WikiApp in
our project, because I believe that adding its ideas and fixes to our own app
is far better than carefully ripping out its revision mechanism without
breaking the rest of the code.

Now, some interesting fields that WikiApp includes are:
-summary: A fixed-length string that holds a brief summary of the page,
although I don’t understand why we would want to see a summary of a wiki page.

-markup: This is a good one! Why support just one wiki markup when we can
support several and keep the option to add more? We should definitely add this
one to our app (even if we support just one at first, at least we will have the
foundations ready). This field takes a tuple of markup choices as an argument;
the tuple has a default value and can be changed by setting
settings.WIKI_MARKUP_CHOICES to a different value.
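
A minimal sketch of the configurable choices described above, assuming a settings object that may or may not define WIKI_MARKUP_CHOICES. The choice codes and labels are placeholders, not WikiApp’s actual defaults, and get_markup_choices is a hypothetical helper.

```python
from types import SimpleNamespace

# Hypothetical defaults; WikiApp's real choice codes may differ.
DEFAULT_MARKUP_CHOICES = (
    ("creole", "WikiCreole"),
    ("markdown", "Markdown"),
)


def get_markup_choices(settings):
    """Return the project's markup choices, falling back to the defaults."""
    return getattr(settings, "WIKI_MARKUP_CHOICES", DEFAULT_MARKUP_CHOICES)


# Stand-ins for a settings module: one overrides the setting, one does not.
custom = SimpleNamespace(WIKI_MARKUP_CHOICES=(("creole", "WikiCreole"),))
untouched = SimpleNamespace()
```

In a Django model, a tuple like this would typically be passed as the choices argument of the markup field.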

-creator_ip: Because our project will require authentication, every user who
edits a wiki page will be known to and authorized by the system, so there is no
need to trace IP addresses unless a lock system requires it.

-last-update: Will be supported by our generic history app.

-content_type, object_id and group: These fields allow the ForeignKey
relationship to point to any other model instead of just one. I’m not sure if
we need this in our app, so let’s leave it for later.

-tags: I don’t see any use for this field, and I also don’t see why importing
tagging at the beginning of the file doesn’t fail, unless it is supposed to
refer to the django-tagging app (not mentioned in the dependencies file).

Next, let’s take a quick look at forms.py so that we can move on to views.py,
which will probably make a big difference:
There is an ArticleForm with a clean_title function that raises a
ValidationError if the page title is not a WikiWord. In our app we decided to
accept any title from the user and automatically slugify it uniquely for URL
purposes, so we won’t need this one.
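
A rough sketch of what “slugify it uniquely” could look like, assuming the set of slugs already in use is available to the caller. Both function names and the -2, -3 suffix scheme are assumptions made for illustration; Django also ships its own slugify filter, which is not used here.

```python
import re
import unicodedata


def slugify(title):
    """Lowercase the title and replace runs of non-alphanumerics with '-'."""
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    # Fall back to a fixed slug when nothing usable is left in the title.
    return slug or "page"


def unique_slug(title, existing_slugs):
    """Append -2, -3, ... until the slug is not already taken."""
    base = slugify(title)
    candidate = base
    suffix = 2
    while candidate in existing_slugs:
        candidate = "%s-%d" % (base, suffix)
        suffix += 1
    return candidate
```

With this, two pages titled “Hello World” would end up at hello-world and hello-world-2.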

The next function is an overridden clean function that takes the new form
data and is responsible for inserting cleaned data into the form’s cleaned_data
dictionary in a way that is generic for each type of field. I really like the
implementation, and I’m looking forward to diving into the views.py file so
that I can see the whole picture better.

The last function is an overridden save function that creates a new revision.
As we said before, if we create our own generic history app we won’t need that
part.

Having gathered all that information, we can take a look at views.py. At the
very beginning of that file there is a WIKI_LOCK_DURATION that is read from
settings or falls back to a default value. How could I miss that part!! I
should definitely add it to our app, so that there is a soft editing lock on an
article (remember: tell the author that the txt file says the duration is
measured in seconds, while views.py says minutes…).
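
A minimal sketch of a soft editing lock, assuming the duration is given in seconds and that locks can live in memory. The SoftLock class, its method names and the 15-minute default are all hypothetical; this is not WikiApp’s implementation.

```python
import time

WIKI_LOCK_DURATION = 15 * 60  # seconds; hypothetical default


class SoftLock:
    """Remember who started editing which article, and when."""

    def __init__(self, duration=WIKI_LOCK_DURATION, clock=time.time):
        self.duration = duration
        self.clock = clock
        self._locks = {}  # article slug -> (username, time acquired)

    def holder(self, slug):
        """Return the user holding an unexpired lock on slug, or None."""
        entry = self._locks.get(slug)
        if entry is None:
            return None
        user, acquired = entry
        if self.clock() - acquired >= self.duration:
            del self._locks[slug]  # the lock has expired
            return None
        return user

    def acquire(self, slug, user):
        """Take or refresh the lock; return False if someone else holds it."""
        current = self.holder(slug)
        if current is not None and current != user:
            return False
        self._locks[slug] = (user, self.clock())
        return True
```

Because the lock is soft, a view could use a False result to show a warning (“alice is editing this page”) rather than to refuse the edit outright.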

Now, after a quick look at the code, I see that it includes permissions support
(covered by our a3c model), RSS feed support (a good feature, but in our project
it needs to be generic rather than per app), article history (ignored for the
same reason as before), article search (will depend on the requirements of our
future search application), and the rest is the generic list/view/edit methods.

So I think I can wrap up my thoughts at this point and conclude that we cannot
use WikiApp for our purposes (although the code is of very high quality and
really well organized) because it mixes many features together into one whole
(great) application, whereas our project needs generic, distinct apps that each
implement one feature so that code is not repeated in every app. Separation
between applications such as authentication, history, search and feeds is very
important in a medium- or large-scale project. Some ideas that I will take from
this project, like the edit page lock and the multi-markup support, will
obviously help me build a solid foundation for a better wiki application that I
can plug into Basie.

Written by henig

November 9th, 2008 at 2:50 am

Posted in Uncategorized

Helpful Errors

Have you ever used a program, only to have it suddenly explode with a message like:

   Error: unable to create directory.  You lose.

How have you figured out which directory it couldn’t create?
Reading the source? Running strace? Giving up?
Yeah, all those options suck. And it’s all because some developer, somewhere, wanted to save about eight keystrokes:

   printf("Error: unable to create directory.");

versus:

   printf("Error: unable to create directory %s.", dir);

So, moral of the story:
Whenever you fail (err, that is, whenever you are writing code to deal with failures…), think “when someone sees this error, what information will they need to solve it?” And then make sure that you include that information.
For example, if you are raising a 404, instead of leaving no message:

   raise Http404

or a message with only a general error:

   raise Http404("Widget not found")

include as much information about the failure as is reasonable:

   raise Http404("Widget %s not found in bucket %s" % (widget_name, bucket_name))

Written by wolever

November 7th, 2008 at 1:22 am

Posted in Uncategorized

Error handling in Basie

The error handling in Basie is implemented through the errors app. Its principal settings are:

  • LOG_FILE: path of the file that will contain the logs.
  • LOGGING_LEVEL: CRITICAL, ERROR, WARNING, INFO or DEBUG.

The logger lives in the errors module, and you can use it as in the following example:

from errors import logger
 
logger.warning("This is a warning.")

The errors app defines a base class for exceptions named AppError. When an exception derived from that class is raised and the DEBUG setting is False, the errors app logs its message and shows an error page. To customize that process, the AppError class has the following attributes:

  • status_code: http status code of the error page.
  • appname: name of the application that defines that exception (the message in the log will be “appname: message”).
  • level: logging level corresponding to that exception.

When the DEBUG setting is True, or when an exception not derived from AppError is raised, the exception is logged at the ERROR level.
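
A rough sketch of how an application might derive from AppError, using only the three attributes listed above. The default attribute values, the TicketNotFound subclass and the two helper functions are assumptions for illustration; the real errors app may differ.

```python
import logging

logger = logging.getLogger("errors")


class AppError(Exception):
    """Base class; subclasses override the three attributes below."""
    status_code = 500      # assumed default
    appname = "basie"      # assumed default
    level = logging.ERROR  # assumed default


class TicketNotFound(AppError):  # hypothetical subclass
    status_code = 404
    appname = "tickets"
    level = logging.WARNING


def format_app_error(exc):
    """Build the "appname: message" line that goes into the log."""
    return "%s: %s" % (exc.appname, exc)


def log_app_error(exc):
    """Log exc at the level it declares; return the status code to serve."""
    logger.log(exc.level, format_app_error(exc))
    return exc.status_code
```

An application would then raise something like TicketNotFound("ticket 42 not found in project alpha") and leave the logging and the error page to the errors app.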

Written by zuzelvp

November 3rd, 2008 at 3:21 pm

Posted in Development

WikiCreole

Use Wiki creole:

How does it work?

Wiki creole’s parser is given a raw wiki string and parses it, using regular expressions, into a DocNode object (creole’s class for a document tree). The HTML is generated from that document later, according to individual needs, by specifying emit functions for the document nodes.

  • Create a new parsed DocNode document object by passing the (unicode) raw text to the Parser class and calling its parse method.
  • Now that we have a document tree parsed from the raw wiki string, we can convert it to HTML recursively using the _emit() functions of the HtmlEmitter class.
  • When the previous step ends, the HTML output is returned and can be encoded as needed.

How can we use creole.py?

    Create the HtmlEmitter class:
    This is the HTML-generating class. It includes all the %s_emit methods (where %s is node.kind), each of which emits the interior of a DocNode (its content and children) as HTML.

    For example:
        def strong_emit(self, node):
            return u'<b>%s</b>' % self.emit_children(node)

        def bullet_list_emit(self, node):
            return u'<ul>\n%s</ul>\n' % self.emit_children(node)
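
To show how the %s_emit naming convention ties together, here is a self-contained toy version of the dispatch. The DocNode class below is a simplified stand-in written for this sketch, not creole.py’s own class, and the text, list_item and default emitters are assumptions.

```python
import html


class DocNode:
    """Toy stand-in for a document tree node (kind, content, children)."""

    def __init__(self, kind, content="", children=None):
        self.kind = kind
        self.content = content
        self.children = children or []


class HtmlEmitter:
    def __init__(self, root):
        self.root = root

    def text_emit(self, node):
        # Leaf node: escape the raw text so it is safe inside HTML.
        return html.escape(node.content)

    def strong_emit(self, node):
        return "<b>%s</b>" % self.emit_children(node)

    def bullet_list_emit(self, node):
        return "<ul>\n%s</ul>\n" % self.emit_children(node)

    def list_item_emit(self, node):
        return "<li>%s</li>\n" % self.emit_children(node)

    def default_emit(self, node):
        raise TypeError("no emitter for node kind %r" % node.kind)

    def emit_children(self, node):
        return "".join(self.emit_node(child) for child in node.children)

    def emit_node(self, node):
        # Dispatch on node.kind: "strong" -> strong_emit, and so on.
        emit = getattr(self, "%s_emit" % node.kind, self.default_emit)
        return emit(node)

    def emit(self):
        return self.emit_node(self.root)
```

Supporting a new element is then a matter of adding one more method named after its kind.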

    How can we configure it to fit our needs?

    Changes can be introduced in two different phases of the process: before or after the creation of the DocNode tree. An element’s content is saved in the content field of its DocNode object, which means that if you need to parse that content, it has to be done after the document is created, inside the HtmlEmitter class (you can add element-specific fields to a node, but not to the DocNode constructor).

    Two aspects need to be considered:

      1) Parsing the content of a specific element (such as link targets) from the DocNode tree (i.e., self.content):
      - Create a Rules class (different from the creole.py Rules class) that includes the additional regular expression parsing rules.
      - Compile all the new rules with re.compile (in the initialization phase of the HtmlEmitter class) to get the internal byte-code representation of the patterns, and use them inside the _emit functions where needed.
      2) If you would like to change the wiki syntax (for example, to add macros), you will need to edit the creole.py source code:
      - First edit the Rules class to fit your needs by editing the regular expressions that match the relevant elements.
      - Then edit the element’s replace function (_%s_repl % element), save the regular expression arguments (by group name), and create the new DocNode object corresponding to the element.

    Written by henig

    October 25th, 2008 at 3:12 pm

    Posted in Uncategorized

    Error Handling in Django

    When something invokes Django, a handler (an instance of some subclass of the BaseHandler class) is created. When the handler is instantiated, a couple of things happen immediately:

    • The handler imports Django’s custom exception classes.
    • The handler calls its own load_middleware method, which loads all the middleware classes it finds listed in the MIDDLEWARE_CLASSES setting and introspects them.

    Middleware

    Each middleware component is a single Python class that defines one or more of the following methods: process_request, process_view, process_response and process_exception. The last one is called when a view raises an exception.

    Middleware’s process_exception method

    process_exception(self, request, exception)

    • request is an HttpRequest object.
    • exception is an Exception object raised by the view function.

    process_exception() should return either None or an HttpResponse object. If it returns an HttpResponse object, the response will be returned to the browser. Otherwise, default exception handling kicks in.

    Default exception handling

    The default exception handling comes in several layers:

    • If the exception was Http404 and the DEBUG setting is True, the handler executes the view technical_404_response, passing the HttpRequest and the exception as arguments. This view displays information about the patterns the URL resolver tried to match against.
    • If DEBUG is False and the exception was Http404, the handler calls the URL resolver’s resolve404 method; this method looks at the URL configuration to determine which view has been specified for handling 404 errors. This defaults to page_not_found, but can be overridden in the URL configuration by assigning a value to the variable handler404.
    • If the exception was PermissionDenied, the handler returns a response of type HttpResponseForbidden.
    • For any other type of exception, if the DEBUG setting is True, the handler executes the view technical_500_response, passing the HttpRequest and exception information as arguments. This view provides detailed information about the exception, including the traceback, local variables at each level of the stack, a detailed representation of the HttpRequest object and a listing of all non-sensitive settings.
    • If DEBUG is False, the handler calls the URL resolver’s resolve500 method, which works in mostly the same way as resolve404; the default view in this case is server_error, and can be overridden in the URL configuration by assigning a value to the variable handler500.
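
The layers above boil down to a small decision table. The sketch below only returns the name of the view or response class that would be chosen; it does not call Django. The two exception classes are local stand-ins for Django’s own, and choose_error_view is a hypothetical helper.

```python
class Http404(Exception):
    """Stand-in for Django's Http404."""


class PermissionDenied(Exception):
    """Stand-in for Django's PermissionDenied."""


def choose_error_view(exc, debug, handler404="page_not_found",
                      handler500="server_error"):
    """Return the name of the view or response the layers above select."""
    if isinstance(exc, Http404):
        return "technical_404_response" if debug else handler404
    if isinstance(exc, PermissionDenied):
        return "HttpResponseForbidden"
    # Any other exception is treated as an internal server error.
    return "technical_500_response" if debug else handler500
```

The handler404 and handler500 parameters mirror the variables that a URL configuration can override.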

    Additionally, for any exception other than Http404 or Python’s built-in SystemExit, the handler fires the dispatcher signal got_request_exception and constructs a description of the exception, which is mailed to each person listed in the ADMINS setting of the Django settings file before returning.

    Signal got_request_exception

    Django includes a signal dispatcher which lets decoupled applications get notified when actions occur elsewhere in the framework. In particular, the built-in signal got_request_exception is sent whenever Django encounters an exception while processing an incoming HTTP request. The arguments sent with this signal are sender (the handler) and request.

    To receive that signal, you need to register a receiver function that gets called when the signal is sent. For example:

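    import sys
    from django.core.signals import got_request_exception

    def my_callback(sender, **kwargs):
        # The request being processed is passed as a keyword argument.
        request = kwargs["request"]
        # Type, value and traceback of the exception currently being handled.
        exc_info = sys.exc_info()
        print("We got an exception")

    # Register the receiver so it is called whenever the signal is sent.
    got_request_exception.connect(my_callback)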

    Error reporting via e-mail

    When DEBUG is False, Django will e-mail the users listed in the ADMINS setting whenever your code raises an unhandled exception that results in an internal server error (HTTP status code 500). This gives the administrators immediate notification of any errors. The admins will get a description of the error, a complete Python traceback, and details about the HTTP request that caused the error.

    404 errors

    Django can also be configured to e-mail errors about broken links (404 “page not found” errors). Django sends e-mails about 404 errors when:

    • DEBUG is False.
    • SEND_BROKEN_LINK_EMAILS is True.
    • Your MIDDLEWARE_CLASSES setting includes CommonMiddleware (which it does by default).

    If those conditions are met, Django will e-mail the users listed in the MANAGERS setting whenever your code raises a 404 and the request has a referer.

    Handling errors not raised by a view

    Not all errors need to be raised by a view function or by something that happens inside it. For example, all Django applications can register their own actions with manage.py. For that case Django defines a CommandError exception that is a subclass of Python’s built-in Exception. If any action raises an exception derived from CommandError, it will be handled by a try statement that writes the exception to the standard error (sys.stderr).

    Written by zuzelvp

    October 24th, 2008 at 2:30 pm

    Posted in Django

    Basie Search

    Currently we’re looking at adding a search capability to Basie. The approach we’re going to take is to create a separate application that contains the logic required to perform a search. The application itself should be as self-contained as possible, meaning that any other application that wants a search capability should need only minimal interaction with it. Of course, we can’t make it fully independent, because at the lowest level the search will be performed on the models which are defined in the applications that come with the Basie project. Think of the email application. In order to perform a search for emails, the search application needs metadata about how the email data is stored in the database. Which fields should be indexed/queried? Obviously the email application needs to provide some information to the search application; however, it should only be metadata upon which the search application can do its work.

    One possible solution to this first design question is to have the models explicitly provide a list of fields which should be indexed and queried against by the search application. This approach is taken by the django-search app, originally derived from the search api branch in the official Django repository.

    The next problem is how we provide an interface for the caller to query Basie through the search app, along with the closely related question of how to represent the search results. The email application and the ticketing application, which are the closest to being finished, currently provide many models which could be queried against. The way the query is done against different models is always unique but shares some similarities. For example, allowing a user to use logic operators on different fields will be the same regardless of the model. However, how does the caller know what the fields are? In the case of SOAP web services, this metadata is provided through a WSDL document, which is just an XML file that describes the properties of objects. Amazon’s AWS example can be seen here. From my understanding of REST, it’s possible to return a JSON file that contains some metadata about which fields are available for a search component (i.e. model) and what their data types are. This way, the client could work with the models dynamically (i.e. using reflection to pick the correct data types for its query application and then using those data types when doing the queries).

    The previous problem is related to how we present information to the user about which components (models) are available for search. If we add a new application, and let the search app know which models and fields are available for searching and indexing, the search application could generate this JSON file automatically, so there’s no additional overhead.
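
A minimal sketch of the idea, assuming each application registers its searchable models with the search app and that the metadata is a plain mapping from field name to a data-type label. The registry, both function names and the example models and fields are hypothetical.

```python
import json

# model name -> {field name: data-type label}
SEARCH_REGISTRY = {}


def register_searchable(model_name, fields):
    """Record which fields of a model may be indexed and queried."""
    SEARCH_REGISTRY[model_name] = dict(fields)


def search_metadata():
    """Serialize the registry so a client can discover searchable fields."""
    return json.dumps(SEARCH_REGISTRY, sort_keys=True)


# Each application would make a call like these for its own models.
register_searchable("email.Message", {"subject": "string", "sent": "datetime"})
register_searchable("tickets.Ticket", {"title": "string", "number": "integer"})
```

Because the JSON is generated from the registry, adding a new application costs one registration call and nothing else.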

    The next obvious question is how we return the actual results to the user. Should we just return the URL to the resource (the way DrProject/Trac does it), or should we return data in the same format as the REST API for those applications? Obviously doing the latter is better, and it will force us to be careful about the design of the REST API as well as keep the RFC consistent across applications. This part is very tricky, but I’m hoping that the use of django.core.serializers will be consistent across applications, so that the REST API for search returns data in the same format as when a client asks for a specific resource. Then, on the web browser side, it’s just a matter of presenting results in a generic way (i.e. showing only the fields that can be queried against in the model). As for linking, I’m really fuzzy here. It would obviously be nice to have a link to a resource that was found. The way a specific resource is accessed through a URL can differ (i.e. how does the search app know that if it finds a Project “A”, it’s located at http://foo.bar/project/A? Should we follow strict URL conventions, and should our model names be closely tied to the URLs? Or should the model have a field in its Meta class that says what the URL bread crumb for that model is?).

    Written by kosta

    October 12th, 2008 at 11:56 pm

    Posted in Uncategorized

    Comparing Wiki Engines

    In my search for the ideal Django wiki I have come up with what seem to be three serious contenders: Diamanda, Sphene Community Tools and Django WikiApp. Apart from these, there is a slew of “hey, cool! I can make a wiki!” type apps, which I have ignored (these include projects with fewer than 10 or 15 revisions in the repository, projects with very little documentation and so on).

    First, Diamanda. All over there are promises of a wiki… But try as I might, I can’t actually seem to find it. So, unfortunately, this contender is out.

    Second, there is Django WikiApp. It is the least mature of the three projects, with only 166 revisions in source control, but the code seems to be high quality.

    Finally, Sphene Community Tools. They seem to be the most mature of the three, even including a small example application in their repository. It seems to have all the features we need… But the code feels less than ideal and the wiki requires the entire Community Tools framework, which could be a pain.

    So, in short, my vote goes to taking Django WikiApp, spending a day trying to integrate it (especially with A3C) and seeing how things go.

    Written by wolever

    October 9th, 2008 at 2:18 pm

    Posted in Uncategorized

    Incrementing Numbers Through Custom Fields

    The Problem: We want our ticket system and other applications to have IDs or numbers starting at 1 for each project, so users are not confused when their tickets go from 100 to 105 to 200 (or some other sequence that looks illogical to a user looking at a single project).

    Proposed Solution: Create a new custom field which works like AutoField as closely as possible; however, rather than incrementing across all rows in a table sequentially, it numbers them relative to another column in the table. For example, in the ticket system, if you had 3 rows with project id 1 and 3 rows with project id 2, the rows with project id 1 would get number fields of 1, 2 and 3 (in the order they were created), and the rows with project id 2 would likewise get number fields of 1, 2 and 3, despite being in the same table.

    Implementation: To implement this custom field I propose hooking into the post_save signal of the model the field is in. The signal triggers a function which runs a custom update query right after the model is saved; the query sets the field’s value from the maximum value of the field in the database (with respect to the column the field is incrementing against), or to 1 if there are no such rows in the table. This should avoid any threading problems that could arise from another ticket being made at the same time and getting the same value, because the logic for finding the maximum value is in the update query and is done on the DBMS side.

    To avoid the potential problem of a lock or other event happening between when the model is inserted and when the update query is run, I propose putting “@transaction.commit_on_success” above the view function responsible for creating a new ticket or other object with this custom field. This should tell Django to use a single transaction for both the insert and the update query, so that if the function errors out, for example because of a lock or error between the insert and the update, the transaction will not be committed to the database.
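
To make the insert-then-update idea concrete, here is a sketch that uses the standard library’s sqlite3 module instead of Django, so it runs on its own. The table layout and the create_ticket helper are assumptions; the point is only that the per-project maximum is computed inside the UPDATE, and that both statements share one transaction.

```python
import sqlite3


def create_ticket(conn, project_id, title):
    """Insert a ticket, then number it relative to its project."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO ticket (project_id, title) VALUES (?, ?)",
            (project_id, title),
        )
        ticket_id = cur.lastrowid
        # The maximum is computed inside the UPDATE, on the database side.
        conn.execute(
            "UPDATE ticket SET number = ("
            "  SELECT COALESCE(MAX(number), 0) + 1"
            "  FROM ticket WHERE project_id = ?"
            ") WHERE id = ?",
            (project_id, ticket_id),
        )
    row = conn.execute(
        "SELECT number FROM ticket WHERE id = ?", (ticket_id,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticket ("
    " id INTEGER PRIMARY KEY,"
    " project_id INTEGER NOT NULL,"
    " title TEXT NOT NULL,"
    " number INTEGER)"
)
# Tickets for two projects, created in interleaved order.
numbers = [
    create_ticket(conn, project_id, "ticket")
    for project_id in (1, 2, 1, 1, 2, 2)
]
```

Each project gets its own 1, 2, 3 sequence even though all rows live in the same table. Not every DBMS accepts a subquery on the table being updated, so the query may need adjusting per backend.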

    Advantages of this method:

    • It’s simple for developers: Developers will just need to use this field in their model and possibly place @transaction.commit_on_success above the view function that creates a new object, and the custom field should take care of the rest.
    • It does not need any complicated locking.
    • It does not need another table.
    • It is thread safe (as far as I can tell).
    • It works for any application and model: A developer would just need to import the custom field and use it.
    • It is DBMS independent and requires no SQL queries or lines to be made to check for or do something for different DBMSs.
    • An implementation of it is currently working in the ticket branch.

    Disadvantages of this method:

    • To know what the value of the custom field is, the model has to be saved first. However, this is also how AutoField works (you do not know the id of a row until it is saved in the database).
    • Some DBMSs may not support Django’s transaction system and might default to autocommit mode, which could lead to a ticket or object being left with a null value for the field if a lock or error happens between the insert and update queries. (This might be fixable with a few custom SQL lines written for the DBMSs that do not support the transaction system.)

    Written by Dan

    October 2nd, 2008 at 5:25 pm

    Posted in Design

    Starting Design of User (Self-)Administration

    Administer Multiple Users

    1. Admin is logged in (i.e., a user is logged in, and is a portal administrator).
    2. Admin selects “users” from “admin” menu => system displays table showing key properties of all existing unlocked user accounts.

      Locked  ID        Real Name     Email  Project  Role  Forwarding
              gvwilson  Greg Wilson          Alpha
                                             Beta
              bwinton   Blake Winton         Alpha

    3. Admin changes some or all properties of users and clicks “submit” => system updates properties as requested and redisplays mass-editing form.

    Notes

    1. Need to add some way to remove a membership (a column marked “Delete” with an ‘X’ button?).
    2. Need to add some way to add a membership—this is a common operation.
    3. No way in the bulk editing interface to add or delete email addresses—those operations are done through “Edit User Information” below.
    4. Changes should be highlighted in some way so that Admin can easily see what’s going to change when “Submit” is selected.

    Create New Users

    1. Admin selects “users” from “admin” menu => system displays same mass-editing form as above.
    2. Admin clicks “add” button at bottom of table; system appends new blank row to table with editable text fields for “ID”, “Real Name”, and “Email”, and pulldowns for “Project” (populated with available project names) and “Role”.
    3. Admin fills in values and clicks “submit” => system redisplays table with new user information.

    Notes

    1. Admin might add several users before clicking “submit” to do batch creation.
    2. Either all creations succeed or none do (transactional). If failure, table is redisplayed with all the information Admin provided in place and errors highlighted.

    Edit User Information

    1. User is logged in and selects “Preferences”, or Admin is logged in and selects a user ID link from the users mass-editing display shown above => system displays preferences page for that user.
    2. User/Admin sees the following tables:

      Real Name:

      Email                    Forwarding  Delete
      gvwilson@cs.toronto.edu              X
      gvwilson@third-bit.com               X

      Project  Role  Forwarding
      Alpha
      Beta

      Project  Commits  Emails  Wiki Edits  Tickets Created (Open)  Tickets Created (Total)  Tickets Owned (Open)  Tickets Owned (Total)
      Alpha    5/35     17/44   0/3         6/12                    27/90                    8/17                  34/110
      Beta     5/35     17/44   0/3         6/12                    27/90                    8/17                  34/110
               10/70    34/88   0/6         12/24                   54/180                   16/34                 68/220

    3. User/Admin updates values and clicks “submit” => system modifies settings and redisplays tables.

    Notes

    1. Need to add some way for users to add more email addresses (“Add” button on first table).
    2. Need to add some way for users to enrol in projects and remove themselves from projects (“Add” and “Delete” buttons on second table).
    3. Either all changes succeed or none do (transactional). If failure, table is redisplayed with all the information provided in place and errors highlighted.
    4. The third table (relative/absolute status in all projects) should be implemented as a module so that it can be used on the “launchpad” page users see when they first log in.

    Written by gvwilson

    September 28th, 2008 at 11:25 am

    Posted in Design

    Managing Test Data

    The problem: many module tests depend on a common set of test objects (for example, instances of ‘User’ are required both for testing tickets and testing wiki pages). Django does not seem to have a good way to do this.

    The “Django” way of loading test data is with things called “test fixtures”. Basically they are JSON/YAML/XML files which contain descriptions of the instances which are needed at test time. For example:

        ...
        {
            "pk": 3,
            "model": "auth.user",
            "fields": {
                "username": "pending_confirmation_user",
                "first_name": "",
                "last_name": "",
                "is_active": true,
                "is_superuser": true,
                "is_staff": true,
                "last_login": "2008-08-29 03:42:19",
                "groups": [],
                "user_permissions": [],
                "password": "sha1$bb725$20279401313e22f629f9dbbf7fd54338ca51c50f",
                "email": "",
                "date_joined": "2008-08-29 03:42:19"
            }
        }
        ...

    But there are a few problems with this scheme:

    • There is no way to define a set of “template” data. This means that every field must be filled in for every instance, instead of only providing the fields relevant to the test. For example, above, the only important field is “username” (“pending_confirmation_user”); every other field is just overhead.
    • Fixtures are very unforgiving. Leave out a field? Add in an extra field? Reference a foreign key which doesn’t exist? Don’t include a primary key? Boom. Everything breaks. Now, granted, this is a good quality to have when you’re loading production data… But for test data it’s way overboard — only the data relevant to the test at hand should be required.
    • There is no way to share test data. When I’m working on testing tickets, I’ve got to create my own “User” data — which means I need to know all of the fields of “User”. This is a problem for two reasons: first, it means lots of duplication. Second, it means that any schema change anywhere will break every fixture which uses the classes which have been changed — even if the change is trivial and unrelated (for example, ticketing tests will fail if the “password” field is removed from “User”).
    • The opaque integer primary key must be explicitly specified. I’ve mentioned it before, but this really sucks. Manually specifying opaque integer keys gives me the creeps and encourages opaque code: “u = User.objects.get(id=42)”.

    So what should we do about this? Here is my proposal:

    • Storing data in fixture-like files is good. Keep that. It’s nice to separate the data from the code, and if the fixtures are written in a sane markup language (YAML?) they can be very easy to read.
    • Some sort of inheritance should be possible. For example, a “template” User can be defined in one place, then other instances of User can inherit from that template, only changing the relevant fields:

          model.User:
              __is_template__ = True
              name = Basic User
              last_login = 2008-08-29 03:42:19
              locked = False
              password = sha1$bb725$20279401313e22f629f9dbbf7fd54338ca51c50f
              email = foo@example.com
              date_joined = 2008-08-29 03:42:19
              ...
      
          model.User:
              __template_name__ = Basic User
              name = Locked user
              locked = True

    • It should be possible for other tests to inherit the template. For example, when it comes time to test the ticketing system, I can just call on the template when I need a ‘generic’ User:

          model.User:
              __template_name__ = Basic User
              name = Ticket Creator

    Written by wolever

    September 25th, 2008 at 4:40 pm

    Posted in Design, testing