Case-insensitive ordering with Django and PostgreSQL

20 November 2007

When the Django Gigs site first went live we noticed the ordering of developers by name was not right. Those starting with an uppercase letter were coming before those starting with a lowercase letter.

PostgreSQL and the locale

PostgreSQL has a locale setting which is configured when the cluster is created. Among other things, this affects the ordering of results when you use the SQL order by clause.

The local on my server was set to “C” which means it uses byte-level comparisons, rather than following more complex rules for a given culture. Although this is apparently good for performance, it means order by will be case sensitive – e.g. “Zebra” comes before “apple”.

Depending on how your system is set up, you may have locales such as en_GB. The locale can’t easily be changed in PostgreSQL because indexes and other data depends on it. To change locale, you need to start a new cluster and move databases to it.

Django and case-sensitivity

Django provides the order_by() function on QuerySets, but does not have an option for case insensitive ordering. Instead this is left to your database configuration.

When using SQL directly, you can sort case-insensitively using the PostgreSQL lower() function.

e.g.

select * from developer order by lower(name)

One way to do this in Django is to use extra to call the lower() function, creating a virtual column which you can then order by.

e.g.

Developer.objects.all().extra(
select={'lower_name': 'lower(name)'}).order_by('lower_name')

Using SQL functions could tie you to a particular database, though in this case the lower() function is standard and should work with most databases. Some other databases do case-insensitive comparisons so wouldn’t need it.

Filed under: Django — Scott @ 8:38 pm

9 Comments »

  1. Ayup: Postgres doesn’t support collations. It claims to, but in truth, as far as I know, what it does is simply to use the OS-level LC_COLLATE variable (or a delegate thereof like LC_ALL) to its sorting functions and consider it a done deal. So it’s impossible to have per-database, or per-table, or, heavens forbid, per-column collations, as in, you know… other RDBMS.

    This is easily the biggest wart in this otherwise wonderful RDBMS and I WISH they’d stop pretending they support collations. For all intents and purposes, they don’t.

    Comment by Sundance — 21 November 2007 @ 10:52 am

  2. Thanks for the hint. It is quite useful!

    Comment by DjangoGuy — 13 November 2008 @ 5:02 pm

  3. Very nice. A great simple solution.

    Comment by Chris — 16 July 2009 @ 9:36 am

  4. Great solution, super helpful. Thanks for posting it.

    It might be helpful also to know that Wikipedia moved the page you linked to about different SQL dialects. The new page is:
    http://en.wikibooks.org/wiki/SQL_Dialects_Reference/Functions_and_expressions/String_functions

    It looks like the syntax is ok unless for some reason we want to switch to Virtuoso =)

    Comment by Dave Naffziger — 27 October 2009 @ 9:04 pm

  5. Yes, unless just released a new version with collation support, what users are requesting to be able to control (override) this:

    ‘AAA’ false in collations where sort order is like aaa AAA bbb BBB … (dictionary sort order)
    ‘AAA’ true in collations where sort order is like AAA … ZZZ aaa … ZZZ (binary)

    It is VERY important to be able to override ANY operator to a given collation. In my experience, most databases do allow this.

    for example:

    ‘AAA’ < 'aaa' collate SQL_Latin1_General_CP1_CS_AS
    CS = case sensitive
    AS = accent sensitive

    'AAA' < 'aaa' collate SQL_Latin1_General_CP1_CI_AS
    CI = case insensitive
    AS = accent sensitive

    or Oracle's nls_sort() for example.

    Also, these databases can be told what collation to use at various levels including per db/table/statement so on.

    I find this VITALLY important. So postgreSQL, I suggest very strongly that you do work on this.
    Depending on LC_XXX is okay for default, but you must be able to override as far down as to the:

    = = != operators
    sort by clauses
    like operations
    so on…

    Check out Oracle, SQL Server, MySql, so on… they all have it.

    Cheers,
    Ron Lentjes
    LC CLS

    Comment by LC CLS Ron — 29 November 2010 @ 6:11 am

  6. Ok, you’ll have to read . between . the . lines. Most < (less than) and > (greater than) are missing.

    Comment by LC CLS Ron — 29 November 2010 @ 6:15 am

  7. Great Solution, Is it works with foreignkey values?
    Forexample:

    Class A(models.Model):
    name = models.CharField(max_length = 50)
    …..

    Class B(models.Model):
    a = models.ForeignKey(A)
    …..

    Now i want to order the all “b” objects based on “a__name”. Is this is possible?
    Can any one help me.

    Comment by ram — 14 March 2011 @ 8:18 am

  8. Thanks a lot !!
    Love this solution.

    Comment by MMusashi — 18 April 2011 @ 8:27 am

  9. Depending on your table size/need for efficiency, another method is to create a ‘sort’ field on your table. Example:

    class FoodItem(models.Model)
      	name = models.CharField(max_length=255)
      	sort_name = models.CharField(max_length=255, blank=True)
    	
    	def save(self):
    		if self.name:
            	        self.name = self.name.strip()
    			self.sort_name = self.name.lower()
                    else:
    			self.sort_name = ''
    			
    		super(FoodItem, self).save()
    		

    Then the ordering would be:

    l = FoodItem.objects.all().order_by(‘sort_name’)

    Comment by raman — 4 February 2012 @ 7:59 pm

RSS feed for comments on this post.

Leave a comment