Creating beautiful demographic maps in R with the tidycensus and tmap packages

The tidycensus and tmap R packages make an incredible duo for working with and visualizing US Census data. The tidycensus package, authored by Kyle Walker, streamlines geographic and tabular data downloads while the tmap package, written by Martijn Tennekes, vastly simplifies creating maps with multiple layers, accepts many different spatial object types and makes it easy to add scale bars, north arrows and other cartographic details.

Note that this post is designed to allow you to skip ahead to the tmap section if mapping is what you’re interested in.

This post is is intended to mimic a real-world task that requires both data collection and data visualization of geographic data and is broken into in two parts. Part 1 focuses on the collection of Census data using tidycensus. Referring to the data in Part 1, Part 2 outlines the map-making process using tmap. If you’d prefer to skip to Part 2 you can download the Census data here.

This post assumes a basic understanding of data manipulation with dplyr and a basic understanding of working with spatial objects using the sf package. If you need a refresher on working with spatial data in R we recommend the following:

We are currently using version 0.8.1 of tidycensus and version 2.1-1 of tmap. Below are all of the packages we will be using in this post.

library(ggplot2)      # For plotting
library(tidycensus)   # For downloading Census data
library(tmap)         # For creating tmap
library(tmaptools)    # For reading and processing spatial data related to tmap
library(dplyr)        # For data wrangling
library(sf)           # For reading, writing and working with spatial objects

Part 1: Using tidycensus to collect US Census data

Before we begin mapping we’ll grab some data from the US Census using the tidycensus package. Similar to other packages that access the Census and American Community Survey (ACS) you’ll need to acquire an API key. Follow this link to get your Census API key.

As a reminder, if you’d prefer to skip to Part 2 downloading the Census data here will allow you to follow along without completing Part 1.

Download Census data

Health Insurance Coverage Status by Sex and Age

In this post we’ll explore county-level health insurance coverage status in the US, Table B27001 from ACS, for 2012 and 2016. The tidycensus package has a convenience function, get_acs(), for allowing the user to download data either as a table or as an sf object with geometry. For downloading sf objects use the argument geometry = TRUE (FALSE is the default).

In order to download data for the full US leave the argument for state equal to NULL. Also since we’ll be mapping our data we’ll use the argument shift_geo = TRUE. This will adjust the positions of Alaska and Hawaii and is very useful for creating thematic maps of the full 50 United States in one view.

Download data for years 2012 and 2016. Notice that we’re grabbing the spatial data for 2016 but not for 2012.

dat12 <- get_acs("county", table = "B27001", year = 2012, 
  output = "tidy", state = NULL, geometry = FALSE) %>%
  rename(`2012` = estimate) %>%
  select(-NAME, -moe) 

dat16 <- get_acs("county", table = "B27001", year = 2016, 
  output = "tidy", state = NULL, geometry = TRUE, shift_geo = TRUE) %>%
  rename(`2016` = estimate) %>%
  select(-moe)

As an aside you can access just the spatial data with “shifted” geography using the following:

# County data
data("county_laea", package = "tidycensus")

# State data
data("state_laea", package = "tidycensus")

 

Process Census data

For much of this post we’re exploring changes in health insurance coverage between 2012 and 2016 by county in the US. In particular we’re looking at the uninsured young adult population ages 18-34. Our downloaded tables are broken into population by sex, age and health insurance status so we’ll need to do some data wrangling to calculate our final estimates.

Join the 2012 and 2016 data

Use a left_join to merge the tables to keep all records from the 2016 data and only those from the 2012 data that match. Note that we are making the assumption that the county boundaries have not changed from 2012 to 2016 which will mostly be the case.

The joined table, dat, will be used for the tabular data processing so we can safely drop the geometry with the function st_geometry() from the sf package. Once we’ve completed the tabular data processing, later in Part 1, we will join the tabular data back with the geometry.

dat <- left_join(dat16, dat12, by = c("GEOID", "variable"))
st_geometry(dat) <- NULL # This drops the geometry and leaves a table
head(dat)
## # A tibble: 6 x 5
##   GEOID NAME                    variable    `2016`   `2012`
##   <chr> <chr>                   <chr>        <dbl>    <dbl>
## 1 01001 Autauga County, Alabama B27001_001 54387   53571   
## 2 01001 Autauga County, Alabama B27001_002 26392   25794   
## 3 01001 Autauga County, Alabama B27001_003  2003    2213   
## 4 01001 Autauga County, Alabama B27001_004  1982    2209   
## 5 01001 Autauga County, Alabama B27001_005    21.0     4.00
## 6 01001 Autauga County, Alabama B27001_006  5119    5174

Assign health insurance categories

Although we want to calculate the percentage of population ages 18-34 without health insurance you’ll notice from the snapshot below that neither “total population ages 18-34” nor “total population without health insurance 18-34” are actual variables. Instead we need to tally males and females and age categories 18-24 and 25-34. We only show a portion of the table below, you can find the full table at Factfinder.

In the table above you can see that the male population ages 18-24, Males 18 to 24 years, is the 9th variable from the top. This corresponds with variable == "B27001_009"in our table. Similarly Males 18 to 24 years with no health insurance coverage is the 11th variable from the top and corresponds with variable == "B27001_011" in our table. The lines for the female population follow a similar pattern.

So what we need in the end:

    • Total male population would be: B27001_009 + B27001_012
    • Total female population would be B27001_037 + B27001_040 (not shown in table above)
    • Total non-insured male pop: B27001_011 + B27001_014
    • Total non-insured female pop: B27001_039 + B27001_042

Pull out the variables needed to calculate the total population ages 18 to 34 and the population without health insurance ages 18-34. We’ll assign these as pop1834 and pop1834ni respectively. Remove all other variables.

We use the dplyr function case_when() instead of a series of if/then statements. Essentially this function allows us to say “if the variable value is B27001_009, B27001_012, B27001_037 or B27001_040 then assign the value of “pop1834” and so on.

dat <- mutate(dat,
  cat = case_when(
    variable %in% paste0("B27001_0",
      c("09","12","37","40")) ~ "pop1834",
    variable %in% paste0("B27001_0",
      c("11","14","39","42")) ~ "pop1834ni")) %>%
  filter(!is.na(cat))

Summarize the data by county-year-category

In order to calculate our estimates we’ll stack the data, separated by year, using tidyr::gather then group based on GEOIDNAMEyear and our new cat field so that we have counts for total population and population without insurance for each county for each year. Then we use spread() to push the total population and population without insurance into their own variables. The final table should include a county-year summation for variables pop1834 and pop1834ni.

# Create long version
dat <- tidyr::gather(dat, year, estimate, c(`2012`, `2016`))

# Group the data by our new categories and sum
dat <- group_by(dat, GEOID, NAME, year, cat) %>%
  summarize(estimate = sum(estimate)) %>%
  ungroup() %>%
  tidyr::spread(cat, estimate) 
head(dat)
## # A tibble: 6 x 5
##   GEOID NAME                    year  pop1834 pop1834ni
##   <chr> <chr>                   <chr>   <dbl>     <dbl>
## 1 01001 Autauga County, Alabama 2012    10966      2348
## 2 01001 Autauga County, Alabama 2016    11246      1968
## 3 01003 Baldwin County, Alabama 2012    34353     11012
## 4 01003 Baldwin County, Alabama 2016    37178      9639
## 5 01005 Barbour County, Alabama 2012     5040      1918
## 6 01005 Barbour County, Alabama 2016     4996      1504

Calculate the final estimates

For our final numbers we want % of uninsured population ages 18-34 for both 2012 and 2016. In addition we’ll calculate the difference in % uninsured between years (i.e. 2016 - 2012).

dat <- mutate(dat, est = (pop1834ni/pop1834) * 100) %>%
  select(-c(pop1834, pop1834ni)) %>%
  tidyr::spread(year, est) %>%
  mutate(diff = `2016`-`2012`)
head(dat)
## # A tibble: 6 x 5
##   GEOID NAME                    `2012` `2016`   diff
##   <chr> <chr>                    <dbl>  <dbl>  <dbl>
## 1 01001 Autauga County, Alabama   21.4   17.5 - 3.91
## 2 01003 Baldwin County, Alabama   32.1   25.9 - 6.13
## 3 01005 Barbour County, Alabama   38.1   30.1 - 7.95
## 4 01007 Bibb County, Alabama      33.0   16.7 -16.3 
## 5 01009 Blount County, Alabama    27.3   20.4 - 6.88
## 6 01011 Bullock County, Alabama   20.4   34.5  14.1

Initial visualization of the health insurance data using ggplot

Distributions by year

Take a quick look at the data using ggplot. Compare the county distributions as well as the median % uninsured by year using facet_wrap.

datlong <- select(dat, -diff) %>%
  tidyr::gather(year, estimate, c(`2012`, `2016`)) %>%
  group_by(year) %>%
  mutate(med = round(median(estimate, na.rm = TRUE), 1))
ggplot(datlong, aes(estimate)) +
  geom_histogram(fill = "firebrick2", 
    color = "white", bins = 60) +
  xlab("Uninsured adults ages 18-34 by county (%)") +
  theme(plot.title = element_text(hjust = 0.5)) +
  facet_wrap(~year, ncol = 1) +
  geom_vline(aes(xintercept = med,
    group = year), lty = "dashed") +
  geom_text(aes(label = paste("Median = ", med), x = med, y = 55))

Counties with greatest change (+/-) in % uninsured

We’re curious to know which counties experienced the largest increase or decrease in the % of uninsured. Use the function dplyr::top_n to get the first and last 10 records from the diff field.

d10 <- top_n(dat, 10, diff) %>%
  mutate(type = "Insured population decreased",
    difftemp = diff)

i10 <- top_n(dat, -10, diff) %>%
  mutate(type = "Insured population increased",
    difftemp = abs(diff))

id10 <- bind_rows(list(i10, d10)) %>%
  arrange(desc(difftemp))
ggplot(id10) + 
  geom_col(aes(x = forcats::fct_reorder(NAME, difftemp), 
    y = difftemp, fill = type)) +
  coord_flip() +
    scale_fill_manual(values = c("firebrick2", "cyan4")) +
  theme(plot.title = element_text(hjust = 0.5),
    legend.position = "bottom",
    legend.title = element_blank()) +
  ggtitle("Counties with the greatest change (+/-) in
    insured population, ages 18-34, 2012-2016") +
  ylab("Difference in % insured (2016 - 2012)") +
  xlab("")

Create a final geographic file for use with tmap

You may remember that our initial file, dat16, included the actual geography while the file we just created is tabular-only. In order to map the variables we just created we can join dat with the spatial object dat16. Because of the way tidycensus formats the data (stacked) we’ll remove all variables except for one and join the objects by GEOID and NAME.

# dat16 is our original geographic object and dat is the tabular data
shp <- dat16 %>%
  filter(variable == "B27001_001") # much faster than using distinct()
  select(GEOID, NAME) %>%
  left_join(dat, by = c("GEOID", "NAME")) %>%
  arrange(GEOID) %>%
  rename(uninsured_2012 = `2012`,
    uninsured_2016 = `2016`,
    uninsured_diff = diff)


# Remove the Aleutians West from shp for display purposes.
# NOTE: this isn't necessary since I'm using the shift_geo
# argument in the get_acs function. However if you're not
# using shift_geo or joining to a different spatial layer
# for the full US you may want to consider removing this 
# record for display purposes.
shp <- filter(shp, GEOID != "02016")

Part 2: Creating beautiful maps with tmap

One of the best features of tmap is how easy it is to create maps. The tmap syntax is modeled after ggplot2 whereby the initial command specifies the shape object and data input (tm_shape()) and is followed by the map layer (e.g., tm_polygons() or tm_lines()). Layers can be stacked similar to ggplot2 using the + symbol. In addition, attribute elements can be added to the map and maps can be faceted similar to using facets in ggplot2.

Moving forward we encourage you to explore the tmap documentation as this package is extremely function-rich and we are only skimming the surface of what’s available in this powerful package.

If you skipped Part 1 (otherwise you can skip this step)

Download the Census data here and create shp for use with tmap.

# Create shp file
shp <- st_read("path_to_downloaded_file/acs_2012_2016_county_us_B27001.shp",
               stringsAsFactors = FALSE) %>%
  rename(uninsured_2012 = un_2012,
         uninsured_2016 = un_2016,
         uninsured_diff = unnsrd_) %>%
  mutate(STFIPS = stringr::str_sub(GEOID, 1, 2))

Super easy mapping

With tmap there is not a lot of data preparation that needs to happen before mapping. With very little code you can create a simple map.

A) The easiest possible map, just the geography

The “shifted” geography we downloaded from tidycensus is a huge help. The shift_geo argument creates subsets of the continental US, Alaska and Hawaii. The author of tmap has a helpful write-up here with guidance for displaying subsets.

# Define the shape and the layer elements
tm_shape(shp) +
  tm_polygons()

B) Add a variable to your map

Below is a map of the 2012 data using all of the tmap defaults – easy but not all that pretty! The default classification does not work well with this data and in some places it’s difficult to see the counties due to the heavy gray outlines.

tm_shape(shp) +
  tm_polygons("uninsured_2012")

C) Change the shapes

The package provides a lot of flexibility in terms of different approaches to mapping your data. Here we use bubbles in place of polygons and note that no additional data processing is required.

tm_shape(shp) +
  tm_bubbles("uninsured_2012")

D) Include multiple layers

Here we demonstrate including more than one layer. We initially use sf code to create a simple table showing the location of the Empire State Building and add this to our map of counties.

# Create a simple geographic object with one point
dat <- data.frame(c("Empire State Building"), 
  lat = c(40.748595), 
  long = c(-73.985718))
sites <- sf::st_as_sf(dat, coords = c("long", "lat"), 
  crs = 4326, 
  agr = "identity")

tm_shape(shp) +
  tm_polygons() +
  tm_shape(sites) +
  tm_dots(size = 2)

Projecting data on-the-fly

Probably one of the nicest features of tmap is the ability to project data on-the-fly. There’s no need to re-project or transform your data prior to mapping. Simply use the projection argument in the tm_shape function to assign one of the following:

  • coordinate reference system (CRS), examples include:
    • 2163: U.S. National Atlas Equal Area
    • 3857: Mercator
  • proj4string, examples include:
    • +proj=lcc +lat_1=41.03333333333333 +lat_2=40.66666666666666 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs: NAD83/New York Long Island
    • +proj=utm +zone=12 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs: NAD83/UTM zone 12N
  • shortcut/name of projection, examples include:
    • eck4: Eckert IV
    • wintri: Winkel-Tripel

Here’s the Winkel-Tripel projection. Interesting!

tm_shape(shp, projection = "wintri") +
  tm_polygons()

Simplify polygons or lines

The simplify_shape function is incredibly useful, particularly if you work with detailed linework or polygons which can be cumbersome to work with and time consuming to map. The function is from the tmaptools package and uses the argument fact, the simplification factor, to adjust the number of coordinates in the given spatial object. If you do a lot of simplication you can also review the st_simplify() function in sf and, for the greatest control, the functions in the package rmapshaper.

The default geographies downloaded using tidycensus are the Cartographic Boundary Files. For counties the resolution level is 500k. Although much more simplified than the original Tiger Line files, these simplified versions are still fairly large. We could simplify these even more by running the following code:

# Simplify shapes with tmap function
shpsimp <- simplify_shape(shp, fact = 0.05)

Working with colors and cuts

Built-in colors and cuts

The tmap package makes it very easy to color and classify our data using the style and palette arguments.

  • The style argument refers to the classification method that should be used to “bin” the data.
  • The palette argument refers to the color palette you wish to assign to the data. The type of palette (i.e. sequentialqualitativeand diverging) is automatically determined by the data but can be easily overwritten.

NOTE: The author wrote a useful Shiny app for choosing palettes from RColorBrewer. This can be accessed by typing tmaptools::palette_explorer() into your console. To run this you’ll need to make sure you have the shiny and shinyjs packages installed. You can also run display.brewer.all() in your console to view the Color Brewer palettes.

A few of the style options include:

  • quantile
  • jenks
  • pretty
  • equal
  • sd

A few of the palette options include:

  • BuPu
  • OrRd
  • PuBuGn
  • YlOrRd

Below is an example map showing the BuPu color scheme with quantile classification.

var <- "uninsured_2012"

tm_shape(shp, projection = 2163) +
  tm_polygons(var,
    style = "quantile",
    palette = "BuPu") +
  tm_legend(legend.position = c("left", "bottom"))

User-defined classification

For controlling the cut points in our data we’ll drop the style argument and use breaks. Notice too in this example that we changed the color of the county outlines and added a little transparency so that they’re not as overwhelming.

cuts <- c(0, 10, 20, 30, 40, 100)

tm_shape(shp, projection = 2163) +
  tm_polygons(var,
    breaks = cuts,
    palette = "BuPu", 
    border.col = "white", 
    border.alpha = 0.5) +
  tm_legend(legend.position = c("left", "bottom"))

Additional color options

Example 1: Apply type of palette instead of palette scheme

If you don’t know exactly which color scheme to use but you’d like to apply a sequential palette you can use palette = "seq". This will apply colors from the first sequential set of colors in the RColorBrewer color schemes.

tm_shape(shp, projection = 2163) +
  tm_polygons(var,
    breaks = cuts,
    palette = "seq", 
    border.col = "white", 
    border.alpha = 0.5) +
  tm_legend(legend.position = c("left", "bottom"))

Example 2: Reverse the color scheme

We can also reverse the color schemes with a simple -. Here we reverse the BuPu color scheme – which doesn’t make any sense in this context!

tm_shape(shp, projection = 2163) +
  tm_polygons(var,
    breaks = cuts,
    palette = "-BuPu", 
    border.col = "white", 
    border.alpha = 0.5) +
  tm_legend(legend.position = c("left", "bottom"))

Example 3: Choose custom colors

If you want to assign colors outside of RColorBrewer simply create a vector of HEX codes and apply them to the palette argument.

mycols <- c("#f0f4c3", "#dce775", "#cddc39", "#afb42b", "#827717")

tm_shape(shp, projection = 2163) +
  tm_polygons(var,
    breaks = cuts,
    palette = mycols, 
    border.col = "white", 
    border.alpha = 0.5) +
  tm_legend(legend.position = c("left", "bottom"))

Customizing layout features and adding attributes to the map

Many, many options exist for customizing your map. We’ll highlight the ones that we end up using the most.

Add titles to the map

Add a main map title and a legend title.

  • The main title is controlled by the title argument in tm_layout.
  • The legend title is controlled by the titleargument in the layer (in this example tm_polygons).
mymap <- tm_shape(shp, projection = 2163) +
  tm_polygons(var,
    breaks = cuts,
    palette = "BuPu", 
    border.col = "white", 
    border.alpha = 0.5,
    title = "Uninsured (%)") +
  tm_legend(legend.position = c("left", "bottom")) +
  tm_layout(title = "Uninsured adults ages 18-34 by county, 2012",
    title.size = 1.1,
    title.position = c("center", "top"))

mymap

Increase the map margins

The default value for the inner margins (margins inside the frame) is a little small (0.02). Make room for adding other elements to the map. The order of inner.margins inputs is bottom, left, top and right. Values can be between 0 and 1.

mymap + tm_layout(inner.margins = c(0.06, 0.10, 0.10, 0.08))

Add a scalebar and north arrow: the defaults

The default location for both the scalebar and north arrow is the bottom-right corner. Not so pretty.

mymap + 
  tm_scale_bar() + 
  tm_compass()

Add a scalebar and north arrow: customized

We want the scalebar to show units in miles, not kilometers. To do this we’ll need to add the unit argument to the tm_shape function (not the tm_compass function).

# Add unit argument to tm_shape
tm_shape(shp, projection = 2163, unit = "mi")

# Customize scale bar and north arrow
mymap + tm_scale_bar(color.dark = "gray60",
    position = c("right", "bottom")) + 
  tm_compass(type = "4star", size = 2.5, fontsize = 0.5,
    color.dark = "gray60", text.color = "gray60",
    position = c("left", "top"))

Adding additional layers to the map

We want to add state boundaries to the map. We can do this using 1 of 2 methods:

  1. Read in another spatial object and add it as a layer.
  2. Use the function aggregate_map from library(tmaptools) to aggregate our county layer by state FIPS code and add it as a layer.

Below is an example of the second option.

# Create a state FIPS field 
shp <- mutate(shp, STFIPS = stringr::str_sub(GEOID, 1, 2))

# Aggregate the county layer by state 
states <- shp %>%
  aggregate_map(by = "STFIPS")
# Add the new state layer
mymap + tm_shape(states) +
  tm_borders(col = "black")

Voila!

Adding multiple maps to a page

There are a several options for adding multiple maps to the same page. The method you decide to use depends on the format of your data and what it is you’re trying to accomplish.

Using tmap_arrange

The function tmap_arrange allows users to add objects to the page and separately control the look and order of the objects. For example if we want the maps to have different colors for the state boundaries we might create separate map objects and use the tmap_arrange function to plot them.

First write a function that includes all of our previously described elements but allows us to tweak the features that we want.

createMap <- function(.data, varname, statecol, maptitle){
  
  tm_shape(.data, projection = 2163, unit = "mi") +
  tm_polygons(varname,
    breaks = cuts,
    palette = "BuPu", 
    border.col = "white", 
    border.alpha = 0.5,
    title = "Uninsured (%)") +
  tm_legend(legend.position = c("left", "bottom")) +
  tm_layout(title = maptitle,
    title.size = 1.1,
    title.position = c("center", "top"),
    inner.margins = c(0.06, 0.10, 0.10, 0.08),
    frame = FALSE) +
  tm_scale_bar(color.dark = "gray60",
    position = c("right", "bottom")) + 
  tm_compass(type = "4star", size = 2.5, fontsize = 0.5,
    color.dark = "gray60", text.color = "gray60",
    position = c("left", "top")) +
  tm_shape(states) +
  tm_borders(col = statecol)
  
}
m1 <- createMap(shp, 
  varname = "uninsured_2012", 
  statecol = "green", 
  maptitle = "Here is title 1")

m2 <- createMap(shp, 
  varname = "uninsured_2016", 
  statecol = "yellow",
  maptitle = "Here is title 2")

tmap_arrange(m1, m2, ncol = 1)

Using the aesthetic argument with “wide” data

Another way to access the facet functionality is to name the variables in the aesthetic argument of the layer. This works if your data is “wide”, that is if you have separate columns for variables, similar to what we have in our shp object.

Below we identify the variables for mapping. Since we’ll be mapping 2 variables we can also assign 2 map titles.

var2 <- c("uninsured_2012", "uninsured_2016")
title2 <- c("Uninsured adults ages 18-34 by county, 2012", 
  "Uninsured adults ages 18-34 by county, 2016")

createMap(shp, 
  varname = var2, 
  statecol = "black", 
  maptitle = title2)

Using facets with “long” data

Faceting in tmap is very similar to faceting in ggplot, that is, small multiples will be created based on a given variable.

Below we use the tm_facet function and the by argument. In order for this to work with our data we’ll need to create a “long” version. Use the gather function from tidyr. As you can see below we now have 2 records for each county, the 2012 and 2016 estimates.

shplong <- select(shp, GEOID, NAME, uninsured_2012, uninsured_2016) %>%
  tidyr::gather(year, estimate, c(uninsured_2012, uninsured_2016)) %>%
  arrange(GEOID)

glimpse(shplong)
## Observations: 6,282
## Variables: 5
## $ GEOID    <chr> "01001", "01001", "01003", "01003", "01005", "01005",...
## $ NAME     <chr> "Autauga County, Alabama", "Autauga County, Alabama",...
## $ year     <chr> "uninsured_2012", "uninsured_2016", "uninsured_2012",...
## $ estimate <dbl> 21.41164, 17.49956, 32.05542, 25.92662, 38.05556, 30....
## $ geometry <sf_geometry [m]> MULTIPOLYGON (((1269841 -13..., MULTIPOLY...
mymap <- tm_shape(shplong, projection = 2163) +
  tm_polygons("estimate",
    breaks = cuts,
    palette = "BuPu", border.col = "white", 
    border.alpha = 0.3,
    title = "Uninsured (%)") +
  tm_facets(by = "year", free.coords = TRUE, ncol = 1) +
  tm_shape(states) +
    tm_borders(col = "black")

Taking a deeper dive with leaflet

If you remember the plot we made in this section you’ll notice that Texas had the most counties in each category (greatest change (+/-) in insured population, 2012-2016):

  • Insured population increased: 4 counties in Texas were in the top 10
  • Insured population decreased: 3 counties in Texas were in the top 10

Perhaps this isn’t surprising given that Texas has the most counties in the country…by far. There are 254 counties in TX followed by 159 counties in GA. In any case let’s take a closer look at Texas to see how the other counties stack up.

Prepare the Texas data

Filter to Texas and calculate several fields that will be used in the leaflet map. Instead of a chloropleth map we’ll use the function tm_bubbles to map our data.

# Filter to Texas and create some additional
# fields for mapping. Since we'll be using
tx <- filter(shp, STFIPS == "48") %>%
  mutate(NAME = stringr::str_remove(NAME, ", Texas"),
    `Insured Adults Ages 18-34, 2012-2016` = case_when(
      uninsured_diff < 0  ~ "Insured population increased",
      uninsured_diff > 0  ~ "Insured population decreased",
      uninsured_diff == 0 ~ "Insured population stayed the same"),
    diff2 = round(abs(uninsured_diff), 1),
    popup = ifelse(uninsured_diff != 0,
      paste0(`Insured Adults Ages 18-34, 2012-2016`, " by ", diff2, "%"), 
      `Insured Adults Ages 18-34, 2012-2016`),
    diffrad = as.numeric(cut(diff2, c(0, 5, 10, 20, 30, 45), 
      right = FALSE)))

# Remove some unnecessary fields
tx <- select(tx, -c(uninsured_2012, uninsured_2016, uninsured_diff, STFIPS))

Map the Texas data using tmap::tmap_leaflet()

The leaflet function in tmap is super easy. Create a tmap as you normally would and send it to the leaflet widget using the function tmap_leaflet.

# Basemap
carto <- "https://cartodb-basemaps-{s}.global.ssl.fastly.net/light_all/{z}/{x}/{y}{r}.png"

# Create a "normal" tmap except we'll add  
# the basemap and the `popup.vars` argument.
# The symbol size of the bubbles will be
# based on the data so use our calculated
# field `diffrad` which will apply sizes
# 1 through 5. Sizes can be further adjusted
# using the `scale` argument.
mymap <- tm_basemap(carto) +  
  tm_shape(tx) +
  tm_borders(col = "azure2") +
  tm_bubbles("diffrad", 
    col = "Insured Adults Ages 18-34, 2012-2016", 
    border.col = "white", 
    scale = 1.5,
    style = "fixed",
    palette = c("coral2", "aquamarine3", "gray"),
    popup.vars = c("County: " = "NAME", "Change: " = "popup"))
  
tmap_leaflet(mymap)


Working with rasters

The syntax for adding raster elements in tmap is similar to vectors. Below we construct a map using global land cover data provided by the tmap package. The land cover data is a raster brick which contains several layers (land cover, tree cover and elevation).

Map the land cover layer and use the argument bbox to assign the bounding box of the US counties to that of the raster. Keep in mind that our shp object has the shifted geography so we’ll need to create another version without Alaska or Hawaii.

# Load the raster data
data(land)

# Remove AK and HI
shp_rmAKHI <- filter(shp, !STFIPS %in% c("02", "15"))

# Create a color palette for land cover
# This was taken directly from the tmap documentation
pal20 <- c("#003200", "#3C9600", "#006E00", "#556E19", "#00C800", 
  "#8CBE8C", "#467864", "#B4E664", "#9BC832", "#EBFF64", "#F06432", 
  "#9132E6", "#E664E6", "#9B82E6", "#B4FEF0", "#646464", "#C8C8C8", 
  "#FF0000", "#FFFFFF", "#5ADCDC")

# Map the raster data and assign the bounding box of
# the county layer. Add the county layer on top.
tm_shape(land, bbox = shp_rmAKHI) +
  tm_raster("cover", palette = pal20, alpha = 0.8) +
  tm_shape(shp_rmAKHI) + 
  tm_borders(alpha = 0.4, col = "black") +
  tm_layout(inner.margins = c(0.06, 0.10, 0.10, 0.08)) +
  tm_layout(legend.position = c("left","bottom"))

Summary: tmap is awesome!

The tmap package has streamlined the process such that pretty map-making in R can be achieved with ease! In addition to that tmap includes some very useful functions for aggregating, clipping and simplifying data. The list is long and definitely worth checking out 🙂

7 responses

  1. This is arguably my favorite mapping post. Curious, however, is their advantage to the mutate step for the `cat` variable in `id10` [factor(1:length(NAME)…]? Seems like this results in a lot of wizardry to `geom_rect` and `scales_etc` for getting positing correct versus the `forcats::fct_recorder` approach. Something like: `ggplot(id10) + geom_col(x = fct_reorder(NAME, difftemp), y = difftemp, fill = type))` but not sure if there are drawbacks?

    • Thanks Jason for your comment – this is a great suggestion! Using forcats::fct_reorder() is indeed a better approach. The code has been updated 🙂

      Thanks!
      Hollie

  2. Hi Hollie!

    Great blog!

    In your example, would it have been possible to label each US state with two letters (shown within the shape of each state), e.g. for on the map. I tried to use tm_text(“abbreviation_state”, size = 1, col = “black”) but did not succeed.

    Thanks!

    • Hi Max –

      Thanks for your comment! The state layer we’re using for the map in the blog post does not have a field for state abbreviations so you’ll need to add one. Here’s some code to get you started:

      1. Download the table of Census states: https://www.census.gov/geo/maps-data/data/cbf/cbf_state.html

      2. In R read in the downloaded shapefile

      stabbr <- st_read("path_to_file/cb_2016_us_state_500k.shp", stringsAsFactors = FALSE) %>%
      select(GEOID, STUSPS) %>%
      st_set_geometry(NULL)

      3. Find the code where we create the state shapefile and do the following:

      # This is already in the blog post
      states <- shp %>%
      aggregate_map(by = "STFIPS")

      # Run this code which will join the `stabbr` table to the `states` shp
      states <- mutate(states, GEOID = unique(shp$STFIPS)) %>%
      left_join(stabbr, by = "GEOID")

      4. Now you should be able to use the `tm_text` function to add the state abbreviations

      mymap + tm_shape(states) +
      tm_borders(col = "black") +
      tm_text("STUSPS", size = 0.7)

  3. Hi Hollie! I need your help! After I write the shp file and try to run it through tm_shape, I get this error:

    Error in get_projection(mshp_raw, output = “crs”) :
    shp is neither a sf, sp, nor a raster object

    This is for a separate project so I cannot simply use the downloadable shp file. Do you know how to make the shp dataframe we wrote into a .shp file?

    Thanks!

    • We’d need to know more about your process. Can you put it in a stackoverflow post and send the link. Your process should be something like: x < - st_read("my_shape.shp"); tm_shape(x) + tm_polygons()

  4. Really great, thanks–tmap is awesome! I’ve been really frustrated with my previous experience mapping SF objects with other R options. ggplot2 works well for SP objects, but I have struggled getting custom mapping for SF objects.

Leave a Reply

Your email address will not be published. Required fields are marked *