Lakes small multiples

I’ve been inspired by a new type of small multiples map I’ve seen, where multipolygons are extracted from their spatial locations and reassembled in a grid while maintaining their relative sizes. If you unsure what I mean, check out this one as an example. So I thought I’d try applying it to some lakes data in New Zealand.

1 Libraries and import data

The Department of Conservation’s Freshwater Ecosystems of New Zealand (FENZ) geodatabase provided the lake boundary polygons. While one could also source these from the LINZ topo map layers, the advantage of the FENZ layer is that it can be linked to other datasets, such as modelled water quality.

The Ministry for the Environment provides modelled lake water quality data for 3,813 lakes in New Zealand that are larger than 1 hectare. This data can be joined using the FENZ lake ID. The dataset is generated by a statistical learning model (specifically, a random forest model) trained on actual monitoring data and using a number of environmental predictors to extrapolate out to unmeasured lakes.

Show the code

library(tidyverse)
library(sf)
library(gridExtra)

lakes <- st_read("data/fenz_v1_lakes/FENZ_v1_Lakes.shp", quiet = TRUE) |>
  janitor::clean_names() |> 
  drop_na(name)

wq <- read_csv("data/mfe-lake-water-quality-modelled-2016-2020-CSV/lake-water-quality-modelled-2016-2020.csv") |>
  janitor::clean_names() |> 
  filter(indicator == "TLI3")

lakes <- lakes |> 
  left_join(wq, by = c("lid" = "lfenzid"))

2 What are the largest lakes in New Zealand?

Sorting the lakes dataset by size (surface area) and selecting the largest ones is a fairly straightforward task. Interestingly, there is a discrepancy between my data and this Wikipedia page listing the sizes of lakes in NZ. Wikipedia lists it as 141 km², whereas my data shows it as 152 km². Since the Wikipedia article does not list its sources, the origin of this discrepancy remains unclear!¹

I’ve have also selected Trophic Level Index (TLI) as an indicator of water quality - more on that below.

Show the code

lakes_20 <- lakes |> 
  top_n(n = 20, wt = shape_area)

mapview::mapview(lakes_20)

Show the code

lakes_20 |> 
  st_drop_geometry() |> 
  select(name, shape_area, value) |> 
  arrange(desc(shape_area)) |> 
  mutate(shape_area = shape_area / 1e6) |> 
  rename(Name = name,
         `Lake area (km2)` = shape_area,
         TLI3 = value) |> 
  knitr::kable(digits = 1)

Name	Lake area (km2)	TLI3
Lake Taupo (Taupomoana)	613.3	2.6
Lake Te Anau	344.2	1.6
Lake Wakatipu	299.0	1.5
Lake Wanaka	204.3	1.7
Lake Ellesmere (Te Waihora)	197.7	6.0
Lake Pukaki	172.8	1.7
Lake Hawea	152.0	1.4
Lake Manapouri	142.3	1.7
Lake Tekapo	96.6	1.6
Lake Rotorua	80.6	4.2
Lake Wairarapa	77.4	4.6
Lake Benmore	75.9	1.9
Lake Hauroko	72.2	2.6
Lake Ohau	59.3	1.7
Lake Waikaremoana	50.0	2.8
Lake Poteriteri	44.4	3.0
Lake Tarawera	41.2	3.1
Lake Brunner (Moana)	40.6	2.9
Lake Coleridge	36.9	2.1
Lake Waikare	34.4	6.3

There’s an interesting pattern regarding lake size and water quality. None of the lakes with very low values of TLI3 (e.g <2) are small. This could just be an artifact of the modelled data, perhaps a relationship that the model has learned incorrectly due to a strong bias in the training data. It would be better to using monitoring data here to confirm this relationship.

Show the code

lakes |> 
  st_drop_geometry() |> 
  select(name, shape_area, value) |> 
  mutate(shape_area = shape_area / 1e6) |> 
  ggplot(aes(value, shape_area)) +
  geom_point() +
  geom_smooth() +
  labs(x = "TLI3 score", y = "Lake area (km2)") +
  theme_minimal()

3 Plot the 20 largest lakes

Now, let’s get down to plotting the largest lakes, and colour them by their respective TLI3 score. LAWA provides a good fact sheet on what this metric means. In short:

The Trophic Level Index (TLI) is a method of characterising the ecological health of lakes based on the amount of nutrients and algae growing in them. It provides an integrated measure of water quality that can be tracked over time and that can be used to estimate biological productivity. It is presented as a score, the lower the score the better the condition of the lake.

Here’s what the scores correspond to in ecological terms:

TLI Score	Description
0–2	Microtrophic: The lake is very clean with very low levels of nutrients and algae. The lake can have snow or glacial sources.
>2–3	Oligotrophic: The lake is clear and blue, with low levels of nutrients and algae.
>3–4	Mesotrophic: The lake has moderate levels of nutrients and algae.
>4–5	Eutrophic: The lake is murky, with high amounts of nutrients and algae.
>5	Supertrophic: The lake has very high amounts of phosphorus and nitrogen, and can be overly fertile and often associated with poor water clarity. Excessive algae growth can occur. Suitability for recreational purposes is often poor.

Show the code

# Calculate centroids
lake_polygons <- lakes_20 %>%
  mutate(centroid = st_centroid(st_geometry(.)))

# Sort the lake polygons by shape_area in descending order
lake_polygons <- lake_polygons %>%
  arrange(desc(shape_area))

# Compute bounding boxes and find the maximum width and height
bboxes <- lapply(st_geometry(lake_polygons), function(x) as.numeric(st_bbox(x)))
max_width <- max(sapply(bboxes, function(x) abs(x[1] - x[3])))
max_height <- max(sapply(bboxes, function(x) abs(x[2] - x[4])))

# Set padding to ensure polygons are not cut off
padding <- 3500  # Adjust this value based on your specific spatial scale

# Calculate colors based on value
lake_polygons$color <- viridis::viridis(
  n = nrow(lake_polygons), 
  option = "D", 
  direction = 1
)[cut(lake_polygons$value, breaks = nrow(lake_polygons), labels = FALSE)]

# Function to make plots
graph <- function(x) {
  centroid <- st_coordinates(lake_polygons$centroid[x])
  ggplot(lake_polygons[x, ]) +
    geom_sf(fill = lake_polygons$color[x], colour = NA) +
    coord_sf(xlim = c(centroid[1] - max_width/2 - padding, centroid[1] + max_width/2 + padding), 
             ylim = c(centroid[2] - max_height/2 - padding, centroid[2] + max_height/2 + padding), 
             expand = FALSE) +
    theme_void() +
    ggtitle(str_wrap(lake_polygons$name[x], width = 20)) +
    theme(plot.title = element_text(hjust = 0.5, size = 10, vjust = 1),
          legend.position = "none")
}

# Create a list of plots
plot_list <- lapply(1:nrow(lake_polygons), FUN = graph)

# Combine plots and titles
combined_list <- lapply(plot_list, function(p) {
  title <- ggplot_build(p)$layout$panel_params[[1]]$title
  gridExtra::arrangeGrob(p, top = grid::textGrob(
    label = title,
    gp = grid::gpar(fontsize = 10),
    hjust = 0.5,
    vjust = 1
  ))
})

# Create the overall title and subtitle grobs
overall_title <- grid::textGrob("The 20 Largest Lakes of New Zealand", gp = grid::gpar(fontsize = 16, fontface = "bold"))
subtitle <- grid::textGrob("Coloured by Trophic Level Index", gp = grid::gpar(fontsize = 12))

# Create a common legend plot to extract the legend
legend_plot <- ggplot(lake_polygons) +
  geom_sf(aes(fill = value), colour = NA) +
  scale_fill_viridis_c(name = "TLI3", option = "D", direction = 1) +
  theme_void() +
  theme(
    legend.position = "bottom",
    legend.title = element_text(size = 10, hjust = 0.5),
    legend.title.position = "top",
    legend.text = element_text(size = 8),
    legend.box.margin = margin(0, 0, 0, 0)
  )

# Extract the legend
legend_grob <- cowplot::get_plot_component(legend_plot, 'guide-box-bottom', return_all = TRUE)

# Arrange the plots together with the overall title and subtitle
final_plot <- gridExtra::grid.arrange(
  gridExtra::arrangeGrob(overall_title, subtitle, ncol = 1, heights = c(0.08, 0.05)),
  do.call(gridExtra::arrangeGrob, c(combined_list, ncol = 5)),
  legend_grob,
  nrow = 3,
  heights = c(0.1, 0.8, 0.1)
)

4 Plot the 50 largest lakes

Bonus just for the blog readers… here’s the plot for the 50 largest lakes in New Zealand!

5 Next steps

Plot the smallest 20 lakes
Make an estuaries small multiples map
Figure out how to shrink the rows for the smaller lakes

Footnotes

Lakes380 lists it as 151.77 km², Fish and Game as 124 km², and Queenstown Lakes District Council as 149 km².↩︎