No ESRI, no problem. Manipulate shapefiles with the Python library gdal

Although we still use ESRI products such as ArcGIS for geographic analysis, more and more we find ourselves turning to open source software like PostGIS and QGIS. In many cases, though, even PostGIS and QGIS are more software than necessary for a given task and we prefer to skip the ‘middleman’ in favor of the stripped down tools available in the Geospatial Data Abstraction Library (GDAL) which can be used to read and write an incredible number of spatial formats and to manipulate geographies (calculating area, converting types, extracting feature details).

For more detail on using GDAL in Python take a look at this great cookbook. Some of the code below comes from the cookbook and we even left in one of their comments for clarification.

The challenge/example:

We already have a Python function that computes the network distance between two points using a popular mapping API. We want to apply this function to two existing shapefiles (origin, destination). Unfortunately, they have projections and, as a result, we can’t just plug in the X and Y coordinates (the mapping API requires lat/long). We need to project the points and extract the coordinates.

We could use traditional GIS methods and the GUI in ArcGIS or QGIS. We could also use ArcGIS’ Python library arcpy (in fact, this is how we did it previously) or python with QGIS. With GDAL, however, we don’t need ArcGIS or QGIS and we can extract the coordinates very quickly with very little code.

1. Install GDAL and import the libraries

Depending our your system details you can use easy_install GDAL or, if you’re on Windows, you can use the binaries available here.

You will need to import the modules ogr (for vector manipulations) and osr (to handle spatial references).

from osgeo import ogr, osr

2. Read in the shapefile as a layer

driver = ogr.GetDriverByName('ESRI Shapefile')
dataSource = driver.Open("myshapefile.shp", 0) #0 means read-only, 1 means writeable
layer = dataSource.GetLayer()

3. Specify the input and output spatial references

You can grab the input spatial references from the layer directly and you can specify the output using the EPSG. To find an appropriate EPSG you can use spatialreference.org. In this case I want to use the unprojected coordinate system WGS84 (EPSG=4326):

sourceprj = layer.GetSpatialRef()
targetprj = osr.SpatialReference()
targetprj.ImportFromEPSG(4326)
transform = osr.CoordinateTransformation(sourceprj, targetprj)
from osgeo import ogr, osr

4. Grab features, project and extract coordinates

Now that you have the layer and the projection details set up you’re ready to iterate through the features, project and extract the coordinates — all done with osgeo. If you’re more comfortable with a for loop you can use this code:

results = []
for feature in layer:
    pt = feature.GetGeometryRef()
    pt.Transform(transform)
    results.append(pt.GetPoint()[0:2]) 

And if you prefer using list comprehensions this code works well

def doProjection(feature):
    pt = feature.GetGeometryRef()
    pt.Transform(transform)
    return pt.GetPoint()[0:2]

results = [doProjection(feature) for feature in layer]

5. Create functions and timing

Using the code below we timed reading in the shapefile, cycling through the points, projecting to WGS84 and extracting the coordinates on a shapefile of 180 points and it took 0.05 seconds — pretty good!

t1 = time.clock()
results = returnCoordSet("myshape.shp")
print time.clock()-t1

def returnCoordSet(inputlayer):
    from osgeo import ogr, osr
    driver = ogr.GetDriverByName('ESRI Shapefile')
    dataSource = driver.Open(inputlayer, 0) #0 means read-only, 1 means writeable
    layer = dataSource.GetLayer()
    source = layer.GetSpatialRef()
    target = osr.SpatialReference()
    target.ImportFromEPSG(4326)
    transform = osr.CoordinateTransformation(source, target)
    return [doProjection(feature, transform) for feature in layer]


def doProjection(feature, transform):
    pt = feature.GetGeometryRef()	 	
    pt.Transform(transform)
    return [pt.GetPoint()[i] for i in [1,0]]

3 responses

    • Not really. Python and R store the geographic layers very differently and none of the methods or approaches are cross-platform.

  1. Im trying to follow your approach to convert a shape file for which the lonlat is in WGS84 and the vertical attribute in NAVD88 to a new shape file where all is in WSG84. Do you have an indication how I define the transformation for the vertical as well?

Leave a Reply to David Cancel reply

Your email address will not be published. Required fields are marked *