Exploding geometries with GeoPandas

Applied to the Taiwan and China scenario

By Carlos A. Toruño P.

July 29, 2023

In my previous blog post, I showed how to program a choropleth map generator using Streamlit. While doing this, I encountered an issue I had never experienced before. More specifically, how to explode geometries. As you might know, there is a huge international debate about Taiwan being a sovereign country. As a result, many “official” datasets containing world boundaries would usually include the island of Taiwan as part of China. The big question that relates to this blog post is, how can we split the boundaries between these two territories? Therefore, in this blog post, I will be explaining to you how I solved this issue using the GeoPandas library in Python.

A little bit of context

Since the approval of the resolution 2758(XXVI) by the United Nations back in 1971, Taiwan has been excluded from many international circles. Taiwan does not have a seat in the United Nations since their representatives were expelled after the approval of this resolution. At the same time, if you search for Taiwan in the list of countries and economies in the World Bank Data portal, you won’t find it.

To be honest, I have been reading a lot about this issue and I have found it to be very interesting. For example, the UN resolution states that “(…) the representatives of the Government of the People’s Republic of China are the only lawful representatives of China”, not that Taiwan is not a country. Which is a very important highlight if you want to use this resolution in favor of the “One-China policy”. This has opened a very interesting diplomatic strategy for the Government of Taipei in trying to highlight the difference between China and Taiwan. If you would like to read about this in more depth, I would suggest you to read this article by prof. Chien-Huei Wu.

This approach is even mmore interesting when you take a look at the data from public opinion polls. According to the National Chengchi University, less than 20% of the population living in the island considered themselves Taiwanese back in 1992, while more than 70% considered themselves chinese or both. Thirty years later, almost 65% of the population considered themselves taiwanese and not chinese (check this article by the BBC for more interesting data).

Now, as I stated in my previous blog post, I intended to use the World Bank Official Boundaries Data, and I still do. So, let’s take a look at the data, let’s subset China and see how it looks in a map.

import geopandas as gpd

# Loading World Bank Official Boundaries
raw_boundaries  = (gpd.read_file("WB_countries_Admin0.geojson"))

# Subsetting China -> Code "CHN"
china    = raw_boundaries.loc[raw_boundaries["WB_A3"] == "CHN"]

# Plotting the geometries
china.plot().set_axis_off()

png

I hope your geography knowledge is outstanding, but in case you are a little bit lost…

Do you see that small island in the southeast? That’s Taiwan! And it should not be displayed in this map because we were ONLY subsetting the geometries that belong to country code CHN, in other words, the People’s Republic of China. And given that there is no listed country code for Taiwan in the documentation, we can assume that these data follow the “One China” policy. Sad, I know. So, how do we solve this?

Possible solutions

The most logical solution that many of you would propose is to use a subnational boundaries or a ADM-1 level data. Sadly (but not surprisingly), the World Bank does not supply an official licensed ADM-1 data for subnational boundaries. Then, why don’t we use any other data source like our beloved geoBoundaries? As it happens, I need an official source for the map generator app, preferrably from the United Nations or the World Bank. Don’t ask too many questions.

Some of you will suggest me to solve this issue using ArcGIS or QGIS. However, and let’s be honest, I refuse to do this without coding. So, I just asked Google about how could I split a geometry and I quickly found something called “exploding”. Which, is a terrible name to be honest.

But, to be quite academic, exploding a geometry means to break a feature into its individual parts. This means that, if you have a multipolygon, you would end up with multiple individual and differentiable polygons. But if you have a polygon, you would end up with multiple individual and differentiable lines, and so on. The most important part, is that each of these splitted parts would be a different feature by itself. The Geopandas library comes with aa native API function that allows us to do this.

Using GeoPandas to explode geometries

Given that country boundaries are multipolygon features, if we were to explode China (the geometry, not the country), we should end up with a geopandas data frame containing the continental part of the country and all its islands, which are quite numerous. Let’s see what happens.

First, let’s take a look at our current subset for China:

china[["TYPE", "WB_A3", "NAME_EN", "geometry"]]

	TYPE	WB_A3	NAME_EN	geometry
8	Country	CHN	People's Republic of China	MULTIPOLYGON (((110.68507 20.15331, 110.67791 ...

As we can observe, we currently have a single-row geo-dataframe and its geometry states that it is a multipolygon. That means that if we explode this subset, we should en up with a geo data frame listing all of its polygons. Let’s see if that’s the case:

## Exploding China's geometry
china_ex = (china[["TYPE", "WB_A3", "CONTINENT", 
                   "REGION_UN", "SUBREGION", "REGION_WB", 
                   "NAME_EN", "WB_NAME", "WB_REGION", 
                   "geometry"]]
            .explode(index_parts = True)
            .reset_index(drop = True))

## How does the new geo-data-frame look like?
china_ex

	TYPE	WB_A3	CONTINENT	REGION_UN	SUBREGION	REGION_WB	NAME_EN	WB_NAME	WB_REGION	geometry
0	Country	CHN	Asia	Asia	Eastern Asia	East Asia & Pacific	People's Republic of China	China	EAP	POLYGON ((110.68507 20.15331, 110.67791 20.163...
1	Country	CHN	Asia	Asia	Eastern Asia	East Asia & Pacific	People's Republic of China	China	EAP	POLYGON ((110.44264 20.66352, 110.43702 20.667...
2	Country	CHN	Asia	Asia	Eastern Asia	East Asia & Pacific	People's Republic of China	China	EAP	POLYGON ((110.60955 20.89728, 110.62452 20.915...
3	Country	CHN	Asia	Asia	Eastern Asia	East Asia & Pacific	People's Republic of China	China	EAP	POLYGON ((109.10524 21.02448, 109.11256 21.027...
4	Country	CHN	Asia	Asia	Eastern Asia	East Asia & Pacific	People's Republic of China	China	EAP	POLYGON ((110.38982 21.09589, 110.37135 21.080...
...	...	...	...	...	...	...	...	...	...	...
66	Country	CHN	Asia	Asia	Eastern Asia	East Asia & Pacific	People's Republic of China	China	EAP	POLYGON ((122.74195 39.23607, 122.71860 39.241...
67	Country	CHN	Asia	Asia	Eastern Asia	East Asia & Pacific	People's Republic of China	China	EAP	POLYGON ((122.68442 39.26219, 122.66326 39.272...
68	Country	CHN	Asia	Asia	Eastern Asia	East Asia & Pacific	People's Republic of China	China	EAP	POLYGON ((121.35255 39.48241, 121.34059 39.481...
69	Country	CHN	Asia	Asia	Eastern Asia	East Asia & Pacific	People's Republic of China	China	EAP	POLYGON ((123.02996 39.50788, 123.04005 39.529...
70	Country	CHN	Asia	Asia	Eastern Asia	East Asia & Pacific	People's Republic of China	China	EAP	POLYGON ((123.47141 53.51901, 123.43167 53.535...

71 rows × 10 columns

china_ex.plot(column="geometry").set_axis_off()

png

As we can see, we now have a geo data frame with 71 rows. However, there is no way for us to know which feature is what. If we plot the data and color different features with different colors, we are only able to distinguish the three larger ones: continental China, the island of Taiwan, and the island of Hainan, respectively. Therefore, we need to highlight the second larger polygon. We can do this by estimating the area and subsetting

# Create a new column to store the area values
china_ex["area_m2"] = (china_ex
                       .to_crs("ESRI:54003")
                       .geometry
                       .area)

## Sorting values
china_exsort = china_ex.sort_values(by = "area_m2", 
                                    ascending = False)

## Which row index represents Taiwan?
china_exsort.iloc[1:2]

	TYPE	WB_A3	CONTINENT	REGION_UN	SUBREGION	REGION_WB	NAME_EN	WB_NAME	WB_REGION	geometry	area_m2
31	Country	CHN	Asia	Asia	Eastern Asia	East Asia & Pacific	People's Republic of China	China	EAP	POLYGON ((121.63600 25.22281, 121.59791 25.271...	4.153658e+10

As we can observe, Taiwan is the feature with row index 31 in the data frame. We could confirm this visually by plotting this feature alone. Again, I’m assuming that you have outstanding geography notions my dear padawan. But if you don’t, here you have the mapped feature that we got and a google maps image of how should Taiwan look like . If they look alike, probably we are doing things right.

china_exsort.iloc[1:2].plot().set_axis_off()

We have been succesfully been able to identify Taiwan, what do we do now? Well, we basically need to rejoin all the other islands back to China, remove the China/Taiwan feature from our boundaries data, and add the exploded data with China and Taiwan as separate features. Sounds complex, right? Well, it’s actually quite straightforward. Let’s go step by step.

First, we need to manually input Taiwan’s country data as follows:

# Manually inputting Taiwan's info
china_ex.at[31, 'WB_A3']   = "TWN"
china_ex.at[31, 'NAME_EN'] = "Taiwan"
china_ex.at[31, 'WB_NAME'] = "Taiwan"

Once we have inputed that data, we can dissolve the geometries to form a simplified data frame. If you are wondering what do I mean by dissolving, it’s bassically the opposite of exploding. Still not great naming skills, I know. We want to aggregate (or dissolve) all those islands to the same feature as continental China. Except for Taiwan, of course.

# Disolving exploded geopandas for China
china_taiwan = (china_ex
                .dissolve(by      = "WB_A3",
                          aggfunc = "first")).reset_index()

china_taiwan

	WB_A3	geometry	TYPE	CONTINENT	REGION_UN	SUBREGION	REGION_WB	NAME_EN	WB_NAME	WB_REGION	area_m2
0	CHN	MULTIPOLYGON (((110.86988 19.99702, 110.89015 ...	Country	Asia	Asia	Eastern Asia	East Asia & Pacific	People's Republic of China	China	EAP	3.727520e+10
1	TWN	POLYGON ((121.63600 25.22281, 121.59791 25.271...	Country	Asia	Asia	Eastern Asia	East Asia & Pacific	Taiwan	Taiwan	EAP	4.153658e+10

Once that we have China and Taiwan as different features, we proceed to append this new data to our world boundaries. Don’t forget to drop the previous China feature!!!

import pandas as pd

# Appending china+taiwan geopandas to raw_boundaries
raw_boundaries = (pd.concat([raw_boundaries.loc[raw_boundaries["WB_A3"] != "CHN"],
                             china_taiwan],
                             ignore_index = True))

This way, we have been able to split China and Taiwan as separate features without having to change our source data and just with a few lines of code.

Posted on:: July 29, 2023

Length:: 8 minute read, 1686 words

Tags:: GIS China Taiwan

See Also:: Simplification Algorithms in R; Programming Web Apps with Streamlit; Geocoding and visualizing address information with R