UK Postcodes to Latitude/Longitude

27 August 2010

For location-based stuff, it’s useful to take a UK postcode and get the lat/long to plot it on a map (such as Google Maps).

The Google Maps API only has postcodes down to sector-level (e.g. NW1 4), so its results are approximate.

Ordnance Survey released the Code Point dataset free as part of OpenData. Among other things, it includes full postcodes with grid references.

The grid references are OSGB36, rather than lat/long. Converting them is more difficult than you’d think. Here’s the solution I used, based on what I cobbled together from forum posts and the like.

Get the data

Download the Code Point data. It’s a series of CSV files, one for each postcode area.

You might want to concatenate them in to one file. Or if you only want London, postcodes, try the e, ec, n, nw, se, sw, w, wc files.

Convert OSGB36 to WGS84 Lat/Lng

The grid references need to be transformed to latitude/longitude on the WGS84 system. There are a few ways to do this, but I used PROJ4‘s cs2cs program.

The command is:

cs2cs -f '%.7f' +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894 +units=m +no_defs +to +proj=latlong +ellps=WGS84 +towgs84=0,0,0 +nodefs

There is a PROJ wrapper for Python (pyproj), but I wasn’t smart enough to figure out how to do the options, so instead I spawned the cs2cs program from a Python script.

Here’s the script. It’s not great, but it does the job.

#!/usr/bin/python

import sys
import csv
import subprocess
import re

def main(argv=None):
    if argv is None:
        argv = sys.argv

    if len(argv) != 3:
        print """\nUsage: %s input_file.csv output_file.csv\n""" % argv[0]
        return 1
    
    input = open(argv[1], 'r')
    reader = csv.reader(input)

    output = open(argv[2], 'w')
    output.write('flatpostcode,postcode,easting,northing,latitude,longitude\n')


    input_fields = []
    for index, row in enumerate(reader):
        postcode, easting, northing = row[0], row[10], row[11]
        input_fields.append((postcode, easting, northing))
        if index % 1000 == 0:
            process(input_fields, output)
            input_fields = []
            print 'processed', index
    process(input_fields, output)
    print 'done'

def process(input_fields, output):
    args = ['cs2cs', '-f', '%.7f', '+proj=tmerc', '+lat_0=49', '+lon_0=-2', '+k=0.9996012717', '+x_0=400000', '+y_0=-100000', '+ellps=airy', '+towgs84=446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894', '+units=m', '+no_defs', '+to', '+proj=latlong', '+ellps=WGS84', '+towgs84=0,0,0', '+nodefs']
    cs2cs = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    data = cs2cs.communicate('\n'.join(['%s %s' % (input[1], input[2]) for input in input_fields]))[0]
    for index, line in enumerate(data.split('\n')):
        if line:
            postcode, easting, northing = input_fields[index]
            data_parts = re.split('\s+', line)
            output.write('%s,%s,%s,%s,%s,%s\n' % (postcode.replace(' ', ''), format_postcode(postcode), easting, northing, data_parts[1], data_parts[0]))

def format_postcode(postcode):
    postcode = postcode.replace(' ', '')
    return '%s %s' % (postcode[0:-3], postcode[-3:])

if __name__ == '__main__':
    main()

Run it with your Code Point csv file as input. The output contains the postcode, OS grid refs, latitude and longitude.

Postgresql

Bonus: if you’re using Postgresql, here’s how to get the csv data in to your database.

Create a table matching the csv file layout:

create temporary table postcode (
flatpostcode varchar(8), 
postcode varchar(8), 
easting varchar(10), 
northing varchar(10), 
latitude numeric(10,8), 
longitude numeric(10,8)
);

Load it in:

copy postcode from '/path/to/your/output.csv' with delimiter ',' csv header;

Then copy the fields you want to your real table.

I hope this saves you some time and bafflement. It would be nice to just give you the postcode + lat/long data, but redistributing it is against Ordnance Survey’s terms of use.

Filed under: Data — Scott @ 6:23 pm