Install Postgres/PostGIS and get started with spatial SQL

PostgreSql is already the second most loved database according to the 2018 StackOverflow survey and it’s the fastest growing in popularity . It’s billed as the “world’s most advanced open source relational database”, and with its comprehensive adherence to SQL standards compliance, it’s easy to see why users love it. But when you add the spatial SQL support that comes with the PostGIS extension, there truly is nothing else out there like it (at least not for free).

“But I already use GIS why would I bother with a database.”

For those not already convinced that they need a database to do GIS work, here are my top 4 reasons to start using one:

  1. Reproducibility – rather than clicking buttons, you can completely script an analysis with just SQL. Need to redo the analysis and tweak a few things? Update the code and run it again. This can also help automate repetitive tasks and skips laborious menus and forms.
  2. Freedom – speaking of buttons, what if you need to accomplish something across several different layers that isn’t supported by a GUI option? With SQL you get freedom to stitch together the analysis you need.
  3. Less limitations – But what if your data doesn’t fit into the 2GB limit of a shapefile, or the  the process is too slow or never finishes in your GIS? Modern databases like Postgres are designed to deal with large datasets, and process millions of records much faster than a traditional GIS, or even scripts with Python or R.
  4. Organization – Have you ever opened your own GIS project, and started crying when you look at the mess of temporary layers that were created in the pursuit of that one perfect map? I can often run a set of several steps with either a single SQL query or SQL script. This saves me time, helps keep me organized and makes it easier to re-run tasks. Layers are also easily organized in schemas or distinct databases, and every table can store metadata (comments) about itself and every column it contains.

If you’re still not convinced, check out this great article all about PostGIS:

View story at Medium.com

So, whether you need local spatial processing for data science or GIS, or you plan to support spatial queries in an application backend, PostgreSql/PostGIS is a worthy tool. In this post, I’ll cover the basics of getting PostgreSql with the PostGIS extension installed on both Mac and Windows for development purposes (in production, we use Ubuntu Linux, which is another post), and look at ways you can also manage your database, import spatial data and see the results of a spatial query. From here on, I will also refer to this combination just as “Postgres”.

Mac Installation

The official documentation for Postgres lists 2 options for Windows installation. For MacOS, it’s six. That can be a barrier for users just getting started with Postgres. The official PostGIS documentation summarizes the options better; if you want a dead simple way to get started, use Postgres.app. For more advanced users, use Homebrew.

It used to be the case that Homebrew was the better option if you needed support for other PostgreSql extensions, like PgRouting or Mapbox Vector Tiles (MVT) support. However, Postgres.app has been adding support for other extensions, and now comes with support for PostGIS, PLV8 and MVT already built in. If you want PgRouting, you can use Homebrew to install the dependencies it needs, and build it from source.

To install Postgres.app, it’s as simple as downloading and moving the application to your Applications folder. The 1-2-3 installation steps on the Postgres.app docs couldn’t be easier to follow:

https://postgresapp.com/#installing-postgresapp

Make sure to follow step #3, as you will need the aforementioned command line tools to import shapefiles and geojson into PostGIS. By default, Postgres.app will startup automatically at login, and you’ll be able to see this if you click the elephant icon, and select “Open Postgres”:

For Homebrew users I’m going to assume you know what you’re doing, but here’s a nice rundown of what to do:

http://exponential.io/blog/2015/02/21/install-postgresql-on-mac-os-x-via-brew/

Make sure to follow all the instructions, including the database creation step. This will ensure you have the same setup as the Postgres.app users. Homebrew users will also have to manually install the PostGIS extension:

$ brew install postgis

Windows Installation

For Windows, the choice seems more clear about which software package to use for installation, as there are only two choices, and only one of those is supported with complete official installers from PostGIS. So we recommend the Enterprise DB installer, which you can download for here:

https://www.enterprisedb.com/downloads/postgres-postgresql-downloads

Take the latest version of Postgres available (11 as of 3/2019). Run the installer and accept all the defaults. Make sure to enter a password that you can remember or is stored appropriately. After the installation completes, you will be prompted about whether you would like to install any additional software with “Stack Driver”. Select the only database in the list and hit next.

Open the Categories -> Spatial Extensions, and check the box for the PostGIS bundle. You can check if the version available here matches the most recent release of PostGIS here:

https://postgis.net/windows_downloads/

If the version is recent enough, you can just install the version from Stack Driver, otherwise you’ll need to download the installer directly from the PostGIS page. Accept all the defaults, but select the option to create a “spatial database” and make the name of the database the same as your username. This will ensure that the instructions below can be followed by both the Mac and Windows users.

The initial database you create with this method will already have the PostGIS extension enabled, but in the future, when you create a new database, you will need to manually enable the extension, which is covered in the next section on psql.

Additionally, to make the Postgres/Postgis command line tools available globally, you can run a script that is included with the install. You can add the script to your system-wide shell startup script (not covered here, see the reference for more), but for now, when you open a new command prompt, execute the following:

C:\>"C:\Program Files\PostgreSQL\11\pg_env.bat"

Note that the quotes are very important here, because “Program Files” has a space in it, the command won’t execute correctly without the quotes. You will need to execute this command every time a new terminal is opened unless you add it to a system-wide shell startup script.

PSQL

Once you have Postgres installed (with either method), you will now also have the Postgres command line client installed, which is called “psql”.  Mac users will need to activate the PostGIS extension in the sample database that was created (and Windows users will need to know how to do this for additional databases). Psql is not necessarily the easiest way to interact with your databases, but it can do everything you need. In the next step, we’ll install PgAdmin and look at other programs, but for now, it’s good to know that psql at least exists.

For Mac users run:

$ psql

from the Terminal.  You can do this because running just “psql” uses standard defaults, and is the same as running:

$ psql --port 5432 --host localhost --username=username

If you’re using Postgres.app, you can also double click on the database, and you should automatically connect to the Postgres. If you did not double click on the Postgres.app database, after starting “psql”, you’ll then need to connect to the “database”. This can often be confusing for new users, who think they already have connected to the database. But in reality, we’ve connected to postgres, within which we can create individual databases. For instance, each project you work on might warrant its own database. Run these commands to connect to the sample database:

# use \l to list the available databases
=> \l 
=> \c yourusername

For Windows users, search for the “SQL Shell (psql)” program (it was also installed by the EnterpriseDb installer), and accept some of the defaults, but supply your username as the database name, and enter the password you created for the postgres user.

Now, that both Windows and Mac users are in the same place, we can execute the final command we need:

=> create extension postgis;
=> \q

Note for reference, these are what the psql commands mean:

  • “\l” is the command to list the available databases
  • “\c” is used to connect to the database
  • “\q” is the command to quit
  • For a complete reference to commands used in psql, see this great psql guide

Loading Spatial Data

The next step is to actually add or “import” some tables with spatial data. We’ll use two layers, a state polygon layer from the US Census, and a point file of wind turbine locations from the USGS. With these layers we can do a classic “point in poly” query, finding all points within a state polygon. To start, download the “cb_2017_us_state_20m.zip” state shapefile from:

https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html

”Unzip the file, and then we’ll use shp2pgsql, a command line utility that’s already been installed along with Postgis, regardless of the method you followed. You can find the documentation about this command line tool in the official Postgis documentation:

https://postgis.net/docs/manual-2.5/using_postgis_dbmanagement.html#shp2pgsql_usage

The command we’ll need for the state shapefile on the Mac is:

$ cd cb_2017_us_state_20m
$shp2pgsql -I -s 4269 ./cb_2017_us_state_20m.shp public.us_state_20 | psql -U postgres -d yourname -p 5432

For Windows, it’s slightly different

C:/> cd cb_2017_us_state_20m
C:/>shp2pgsql -I -s 4269 ./cb_2017_us_state_20m.shp public.us_state_20 | "C:\Program Files\PostgreSQL\11\bin\psql.exe" -U postgres -d yourname -p 5432 --password

Note that regardless of how the output is displayed, this command does prompt you to enter your password, so enter it and hit enter when prompted. Also note that the “-I” flag is used to automatically create the spatial index we’ll need to properly query this layer (see the postgis link above for all of the available flags).

Projections

How did we know to include the SRID of 4269 (the -s flag from above)? If you don’t know what the SRID is, or how to find it, you might want to review this:

https://en.wikipedia.org/wiki/Spatial_reference_system

To quickly convert from the .prj (projection file) of the shapefile to the necessary SRID, you can use this website:

http://prj2epsg.org/search

Or import the file into your favorite GIS (we use QGIS) and look at the projection info.

Convert the Wind Turbines File

The next file we need is the locations of wind turbines which are available from here:

https://eerscmap.usgs.gov/uswtdb/data/

Even though there is a shapefile available, make sure to download the GeoJSON file, since this will force us to explore another tool that is indispensable for working with geographic data. “ogr2ogr” is a command line tool that’s not developed by Postgis, but is installed as a dependency. It’s part of GDAL (Geospatial Data Abstraction Library), which is used by many geospatial software projects, and can also be used directly on the command line. To import the point file on Macs do:

$ ogr2ogr -f "PostgreSQL" PG:"dbname=yourusername user=yourusername port=5432" "uswtdb_v2_2_20191004.geojson" -nln us_wind_turbine

for Windows, you should be able to use the same exact command, but it appears the current version of the EDB Postgres installer comes with a version of ogr2ogr that lacks Postgres support! So, you can either continue by importing the shapefile instead:

C:/> cd uswtdbSHP
C:/>shp2pgsql -I -s 4269 ./uswtdb_v2_2_20191004.shp public.us_wind_turbine | "C:\Program Files\PostgreSQL\11\bin\psql.exe" -U postgres -d yourusername -p 5432 --password

or install a different version of ogr2ogr from this well known and loved package:

https://trac.osgeo.org/osgeo4w/

Note that while the above links should continue to work, the file names of the geojson and shp files may change, so look at the names of the files that get downloaded and update the above “uswtdb_****” file name to match when using these commands.

Administering Your Database (and visualizing our tables!)

Now that we’ve enabled PostGIS in our sample database and imported some tables, let’s install and open PgAdmin, which is as close to an official database administration tool as you can get for Postgres (there is no official one, but knowing how to at least use PgAdmin can come in handy as it’s probably the most popular tool, albeit not the most loved). Windows users will already have this installed. Mac users will need to head over to the download page:

https://www.pgadmin.org/download/

It installs as a regular program, but when you open it, the current version opens in your default web browser (yes, yuck, we know, we’ll look at alternatives later).  The official instructions for adding a server in PgAdmin are here:

https://www.pgadmin.org/docs/pgadmin4/dev/connecting.html

But they are a little vague. Windows users will already have their local server added, but the instructions are useful to both users. To start, you will need to create a new connection or “server”. If you click into the “Server Dialog” section of the instructions, you’ll see this:

You’ll want to enter a name for this connection, I typically like the format:

{{host}}{{Postgres version}}_{{database name}}

So in this case I would use localhost10_chrismarx. Next, keep following the instructions and click the “Connection tab” . Here you will want to add localhost as the host name, your username for username, and I prefer to never click the save password dialog. Typically passwords saved in this manner are not secure.

Then click Save. Opening up the new connection you’ve added, you should be able to see this. Note that what we’re interested in are the tables we’ve added, which by default, will show up several levels down:

 

Maps (finally!) of your table data

If you open a new query tab in PgAdmin ( by clicking Tools > Query Tool), and enter and run (click the lightning bolt button or press F5) the following query:

select * from us_state_20;

and then look in the “Data Output” pane, and horizontally scroll to last column named “geom”, and click the “eye” icon in the column header (which activates the spatial viewer), you should be able to see your states table with all the US states! (If everything was done correctly…)

 

We can do the same for points:

select * from us_wind_turbine;

Or even combine the states and points


select geom from us_state_20
union all
select * from (
  --NOTE the field might be named just "geom" if you added a shapefile rather than geojson
  select wkb_geometry from us_wind_turbine limit 200
)a;

 

Spatial SQL (finally!)

But we’re here to do spatial SQL! Now we can run the point in polygon query, finding out which states have the most wind turbines. You can use this query:

select s.stusps, count(*) as turbine_cnt
  from us_state_20 s
  join
  us_wind_turbine w
  on st_contains(s.geom, w.wkb_geometry)
  group by stusps
  order by turbine_cnt desc

Which yields the following results:

State | Turbine Count
TX       13396
CA         8676
IA           4445
OK         3865
KS          2951

For visualizing the results as a choropleth, we can use QGIS, which gives us this:

 

Use QGIS to visualize PostGIS data (and make the map above)

Getting started with QGIS is another blog post, but as far as working with Postgres is concerned, the setup is very similar to any DB admin tool. In fact, QGIS has a “DB Manager” tool for working with databases. However, lamentably, the one thing you can’t do from that particular window is actually add a DB connection 🙁

So, to get started and add a DB connection (at least as of QGIS 3.6), the easy way is to use the “Browser” tab, which makes sense, since this is the panel where you can explore sources of data to add to your project. To add a connection, right click on the “PostGIS” entry:

That will open the window for adding a new PostGIS connection:

This should look familiar after having to add a connection in PGAdmin, and you’ll need the same information. I also tend to check the “Also list tables with no geometry” option, since sometimes I need to look at or join information from tables that don’t store geometry data.  The next step is to open the “DB Manager”, which you can do from the main menu:

If you open up the PostGIS item in the Providers list, you will see the same table info that you could see in PGAdmin. You can visualize the entire table by right clicking the table and selecting “Add to Canvas”. But the real fun is loading the results of a custom query. Click the “Sql Window” button, which is the table & wrench button (second from the left). To make sure that the new SQL window is opened with a connection to the correct database, make sure you database or one of the entities within your database is highlighted in the “Providers” panel.

Now you can enter the query we used earlier to find the states with the most wind turbines, with a few modifications to ensure we can actually visualize the results.

select b.stusps, b.gid, coalesce(turbine_cnt,0), s.geom from (
  select s.stusps, s.gid, count(*) as turbine_cnt
  from us_state_20 s
  join
  us_wind_turbine w
  on st_contains(s.geom, w.wkb_geometry)
  group by stusps, s.gid
  order by turbine_cnt desc
 )b 
 right join us_state_20 s on b.gid = s.gid 

The extra outer query takes the records we got for the counts of wind turbines in each state, and joins it back with the state layer in order to also include the “geom” column in the results of our query. This is the column that actually stores the geometry associated with each record. It also uses a “right join” so that we get all the states in the final query (since we still want to see the states that have no wind turbines). For these states, we also want to return a 0 rather than a null (so we can visualize it), so we use the “Coalesce” operator to either return the “turbine_cnt” or a 0 (only when that column’s value is null).

To load the results of that query, first click “Execute” button to see/verify the tabular results, then click the “Load as new layer” checkbox to show the new layer dialog, enter a layer name, and then click the “Load” button.

Now you should see your state layer. Right click on the layer, in the “Layers” panel, select “Properties”, and then from the “Symbology” section you can apply the “Graduated” classification. To see more about this process, check out the QGIS manual :

https://docs.qgis.org/3.4/en/docs/training_manual/vector_classification/classification.html?highlight=graduated

 

Wrap Up

Phew, that was a lot of work just to run one query, but now that you’ve got the basics setup, you can unleash all of the spatial SQL power on your big datasets.

Lastly, I’ll put in a plug for DBeaver (the successor to SQuirreL-SQL), a multi-platform database IDE. Given that it can be used to administer just about any database (Postgres, MYSQL, Oracle, etc.), it’s a great tool to have in your toolbox. It’s also open source, and has 11k+ stars on GitHub. It can be downloaded from here:

Download DBeaver Community

I use it for my SQL development, and it has a spatial data viewer for doing quick visualizations:

 

One response

Leave a Reply

Your email address will not be published. Required fields are marked *