The maps you create in this course portray the surface of the earth in two dimensions. But, as you know, the world is actually a three-dimensional globe. So we have to use a method called a map projection to render it as a flat surface.

Map projections can’t be 100% accurate. Each projection distorts the surface of the Earth in some way, while retaining some useful property. For instance,

  • The equal-area projections (like “Lambert Cylindrical Equal Area”, or “Africa Albers Equal Area Conic”) preserve area. This is a good choice, if you’d like to calculate the area of a country or city, for example.
  • The equidistant projections (like “Azimuthal Equidistant projection”) preserve distance. This would be a good choice for calculating flight distance.


webp


We use a coordinate reference system (CRS) to show how the projected points correspond to real locations on Earth. In this tutorial, you’ll learn more about coordinate reference systems, along with how to use them in GeoPandas.

import geopandas as gpd
import pandas as pd

Setting the CRS

When we create a GeoDataFrame from a shapefile, the CRS is already imported for us.

# Load a GeoDataFrame containing regions in Ghana
regions = gpd.read_file("../input/geospatial-learn-course-data/ghana/ghana/Regions/Map_of_Regions_in_Ghana.shp")
print(regions.crs)
epsg:32630

How do you interpret that?

Coordinate reference systems are referenced by European Petroleum Survey Group (EPSG) codes.

This GeoDataFrame uses EPSG 32630, which is more commonly called the “Mercator” projection. This projection preserves angles (making it useful for sea navigation) and slightly distorts area.

However, when creating a GeoDataFrame from a CSV file, we have to set the CRS. EPSG 4326 corresponds to coordinates in latitude and longitude.

# Create a DataFrame with health facilities in Ghana
facilities_df = pd.read_csv("../input/geospatial-learn-course-data/ghana/ghana/health_facilities.csv")

# Convert the DataFrame to a GeoDataFrame
facilities = gpd.GeoDataFrame(facilities_df, geometry=gpd.points_from_xy(facilities_df.Longitude, facilities_df.Latitude))

# Set the coordinate reference system (CRS) to EPSG 4326
facilities.crs = {'init': 'epsg:4326'}

# View the first five rows of the GeoDataFrame
facilities.head()
  Region District FacilityName Type Town Ownership Latitude Longitude geometry
0 Ashanti Offinso North A.M.E Zion Clinic Clinic Afrancho CHAG 7.40801 -1.96317 POINT (-1.96317 7.40801)
1 Ashanti Bekwai Municipal Abenkyiman Clinic Clinic Anwiankwanta Private 6.46312 -1.58592 POINT (-1.58592 6.46312)
2 Ashanti Adansi North Aboabo Health Centre Health Centre Aboabo No 2 Government 6.22393 -1.34982 POINT (-1.34982 6.22393)
3 Ashanti Afigya-Kwabre Aboabogya Health Centre Health Centre Aboabogya Government 6.84177 -1.61098 POINT (-1.61098 6.84177)
4 Ashanti Kwabre Aboaso Health Centre Health Centre Aboaso Government 6.84177 -1.61098 POINT (-1.61098 6.84177)

In the code cell above, to create a GeoDataFrame from a CSV file, we needed to use both Pandas and GeoPandas:

  • We begin by creating a DataFrame containing columns with latitude and longitude coordinates.
  • To convert it to a GeoDataFrame, we use gpd.GeoDataFrame().
  • The gpd.points_from_xy() function creates Point objects from the latitude and longitude columns.

Re-projecting

Re-projecting refers to the process of changing the CRS. This is done in GeoPandas with the to_crs() method.

When plotting multiple GeoDataFrames, it’s important that they all use the same CRS. In the code cell below, we change the CRS of the facilities GeoDataFrame to match the CRS of regions before plotting it.

# Create a map
ax = regions.plot(figsize=(8,8), color='whitesmoke', linestyle=':', edgecolor='black')
facilities.to_crs(epsg=32630).plot(markersize=1, ax=ax)


webp


The to_crs() method modifies only the “geometry” column: all other columns are left as-is.

# The "Latitude" and "Longitude" columns are unchanged
facilities.to_crs(epsg=32630).head()
  Region District FacilityName Type Town Ownership Latitude Longitude geometry
0 Ashanti Offinso North A.M.E Zion Clinic Clinic Afrancho CHAG 7.40801 -1.96317 POINT (614422.662 818986.851)
1 Ashanti Bekwai Municipal Abenkyiman Clinic Clinic Anwiankwanta Private 6.46312 -1.58592 POINT (656373.863 714616.547)
2 Ashanti Adansi North Aboabo Health Centre Health Centre Aboabo No 2 Government 6.22393 -1.34982 POINT (682573.395 688243.477)
3 Ashanti Afigya-Kwabre Aboabogya Health Centre Health Centre Aboabogya Government 6.84177 -1.61098 POINT (653484.490 756478.812)
4 Ashanti Kwabre Aboaso Health Centre Health Centre Aboaso Government 6.84177 -1.61098 POINT (653484.490 756478.812)

In case the EPSG code is not available in GeoPandas, we can change the CRS with what’s known as the “proj4 string” of the CRS. For instance, the proj4 string to convert to latitude/longitude coordinates is as follows:

+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs
# Change the CRS to EPSG 4326
regions.to_crs("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs").head()
  Region geometry
0 Ashanti POLYGON ((-1.30985 7.62302, -1.30786 7.62198, …
1 Brong Ahafo POLYGON ((-2.54567 8.76089, -2.54473 8.76071, …
2 Central POLYGON ((-2.06723 6.29473, -2.06658 6.29420, …
3 Eastern POLYGON ((-0.21751 7.21009, -0.21747 7.20993, …
4 Greater Accra POLYGON ((0.23456 6.10986, 0.23484 6.10974, 0….

Attributes of geometric objects

As you learned in the first tutorial, for an arbitrary GeoDataFrame, the type in the “geometry” column depends on what we are trying to show: for instance, we might use:

  • A Point for the epicenter of an earthquake.
  • A LineString for a street.
  • A Polygon to show country boundaries.

All three types of geometric objects have built-in attributes that you can use to quickly analyze the dataset. For instance, you can get the x- and y-coordinates of a Point from the $x$ and $y$ attributes, respectively.

# Get the x-coordinate of each point
facilities.geometry.head().x
0   -1.96317
1   -1.58592
2   -1.34982
3   -1.61098
4   -1.61098
dtype: float64

And, you can get the length of a LineString from the length attribute.

Or, you can get the area of a Polygon from the area attribute.

# Calculate the area (in square meters) of each polygon in the GeoDataFrame 
regions.loc[:, "AREA"] = regions.geometry.area / 10**6

print("Area of Ghana: {} square kilometers".format(regions.AREA.sum()))
print("CRS:", regions.crs)
regions.head()
Area of Ghana: 239584.5760055668 square kilometers
CRS: epsg:32630
  Region geometry AREA
0 Ashanti POLYGON ((686446.075 842986.894, 686666.193 84… 24379.017777
1 Brong Ahafo POLYGON ((549970.457 968447.094, 550073.003 96… 40098.168231
2 Central POLYGON ((603176.584 695877.238, 603248.424 69… 9665.626760
3 Eastern POLYGON ((807307.254 797910.553, 807311.908 79… 18987.625847
4 Greater Accra POLYGON ((858081.638 676424.913, 858113.115 67… 3706.511145

In the code cell above, since the CRS of the regions GeoDataFrame is set to EPSG 32630 (a “Mercator” projection), the area calculation is slightly less accurate than if we had used an equal-area projection like “Africa Albers Equal Area Conic”.

But this yields the area of Ghana as approximately 239585 square kilometers, which is not too far off from the correct answer.

Example

You are a bird conservation expert and want to understand migration patterns of purple martins. In your research, you discover that these birds typically spend the summer breeding season in the eastern United States, and then migrate to South America for the winter. But since this bird is under threat of endangerment, you’d like to take a closer look at the locations that these birds are more likely to visit.


webp


There are several protected areas in South America, which operate under special regulations to ensure that species that migrate (or live) there have the best opportunity to thrive. You’d like to know if purple martins tend to visit these areas. To answer this question, you’ll use some recently collected data that tracks the year-round location of eleven different birds.

Before you get started, run the code cell below to set everything up.

import pandas as pd
import geopandas as gpd

from shapely.geometry import LineString

Run the next code cell (without changes) to load the GPS data into a pandas DataFrame birds_df.

# Load the data and print the first 5 rows
birds_df = pd.read_csv("../input/geospatial-learn-course-data/purple_martin.csv", parse_dates=['timestamp'])
print("There are {} different birds in the dataset.".format(birds_df["tag-local-identifier"].nunique()))
birds_df.head()

There are 11 birds in the dataset, where each bird is identified by a unique value in the “tag-local-identifier” column. Each bird has several measurements, collected at different times of the year.

Use the next code cell to create a GeoDataFrame birds.

  • Birds should have all of the columns from birds_df, along with a “geometry” column that contains Point objects with (longitude, latitude) locations.
  • Set the CRS of birds to {‘init’: ‘epsg:4326’}.
# Create the GeoDataFrame
birds = gpd.GeoDataFrame(birds_df, geometry=gpd.points_from_xy(birds_df["location-long"], birds_df["location-lat"]))

# Set the CRS to {'init': 'epsg:4326'}
birds.crs = {'init' :'epsg:4326'}

Plot the Data

Next, we load in the ‘naturalearth_lowres’ dataset from GeoPandas, and set americas to a GeoDataFrame containing the boundaries of all countries in the Americas (both North and South America). Run the next code cell without changes.

# Load a GeoDataFrame with country boundaries in North/South America, print the first 5 rows
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
americas = world.loc[world['continent'].isin(['North America', 'South America'])]
americas.head()
  pop_est continent name iso_a3 gdp_md_est geometry
3 35623680 North America Canada CAN 1674000.0 MULTIPOLYGON (((-122.84000 49.00000, -122.9742…
4 326625791 North America United States of America USA 18560000.0 MULTIPOLYGON (((-122.84000 49.00000, -120.0000…
9 44293293 South America Argentina ARG 879400.0 MULTIPOLYGON (((-68.63401 -52.63637, -68.25000…
10 17789267 South America Chile CHL 436100.0 MULTIPOLYGON (((-68.63401 -52.63637, -68.63335…
16 10646714 North America Haiti HTI 19340.0 POLYGON ((-71.71236 19.71446, -71.62487 19.169…

Use the next code cell to create a single plot that shows both: (1) the country boundaries in the americas GeoDataFrame, and (2) all of the points in the birds_gdf GeoDataFrame.

Don’t worry about any special styling here; just create a preliminary plot, as a quick sanity check that all of the data was loaded properly. In particular, you don’t have to worry about color-coding the points to differentiate between birds, and you don’t have to differentiate starting points from ending points. We’ll do that in the next part of the exercise.

# Your code here
ax = americas.plot(figsize=(10,10), color='white', linestyle=':', edgecolor='gray')
birds.plot(ax=ax, markersize=10)


webp


Where does each bird start and end its journey? (Part 1)

Now, we’re ready to look more closely at each bird’s path. Run the next code cell to create two GeoDataFrames:

  • path_gdf contains LineString objects that show the path of each bird. It uses the LineString() method to create a LineString object from a list of Point objects.
  • start_gdf contains the starting points for each bird.
# GeoDataFrame showing path for each bird
path_df = birds.groupby("tag-local-identifier")['geometry'].apply(list).apply(lambda x: LineString(x)).reset_index()
path_gdf = gpd.GeoDataFrame(path_df, geometry=path_df.geometry)
path_gdf.crs = {'init' :'epsg:4326'}

# GeoDataFrame showing starting point for each bird
start_df = birds.groupby("tag-local-identifier")['geometry'].apply(list).apply(lambda x: x[0]).reset_index()
start_gdf = gpd.GeoDataFrame(start_df, geometry=start_df.geometry)
start_gdf.crs = {'init' :'epsg:4326'}

# Show first five rows of GeoDataFrame
start_gdf.head()
  tag-local-identifier geometry
0 30048 POINT (-90.12992 20.73242)
1 30054 POINT (-93.60861 46.50563)
2 30198 POINT (-80.31036 25.92545)
3 30263 POINT (-76.78146 42.99209)
4 30275 POINT (-76.78213 42.99207)

Use the next code cell to create a GeoDataFrame end_gdf containing the final location of each bird.

The format should be identical to that of start_gdf, with two columns (“tag-local-identifier” and “geometry”), where the “geometry” column contains Point objects. Set the CRS of end_gdf to {‘init’: ‘epsg:4326’}.

end_df = birds.groupby("tag-local-identifier")['geometry'].apply(list).apply(lambda x: x[-1]).reset_index()
end_gdf = gpd.GeoDataFrame(end_df, geometry=end_df.geometry)
end_gdf.crs = {'init': 'epsg:4326'}

Where does each bird start and end its journey? (Part 2)

Use the GeoDataFrames from the question above (path_gdf, start_gdf, and end_gdf) to visualize the paths of all birds on a single map. You may also want to use the americas GeoDataFrame.

ax = americas.plot(figsize=(10, 10), color='white', linestyle=':', edgecolor='gray')

start_gdf.plot(ax=ax, color='red',  markersize=30)
path_gdf.plot(ax=ax, cmap='tab20b', linestyle='-', linewidth=1, zorder=1)
end_gdf.plot(ax=ax, color='black', markersize=30)


webp


Where are the protected areas in South America? (Part 1)

It looks like all of the birds end up somewhere in South America. But are they going to protected areas?

In the next code cell, you’ll create a GeoDataFrame protected_areas containing the locations of all of the protected areas in South America. The corresponding shapefile is located at filepath protected_filepath.

# Path of the shapefile to load
protected_filepath = "../input/geospatial-learn-course-data/SAPA_Aug2019-shapefile/SAPA_Aug2019-shapefile/SAPA_Aug2019-shapefile-polygons.shp"

# Your code here
protected_areas = gpd.read_file(protected_filepath)

Where are the protected areas in South America? (Part 2)

Create a plot that uses the protected_areas GeoDataFrame to show the locations of the protected areas in South America. (You’ll notice that some protected areas are on land, while others are in marine waters.)

# Country boundaries in South America
south_america = americas.loc[americas['continent']=='South America']

# Your code here: plot protected areas in South America
# Plot protected areas in South America
ax = south_america.plot(figsize=(10,10), color='white', edgecolor='gray')
protected_areas.plot(ax=ax, alpha=0.4)

What percentage of South America is protected?

You’re interested in determining what percentage of South America is protected, so that you know how much of South America is suitable for the birds.

As a first step, you calculate the total area of all protected lands in South America (not including marine area). To do this, you use the “REP_AREA” and “REP_M_AREA” columns, which contain the total area and total marine area, respectively, in square kilometers.

Run the code cell below without changes.

P_Area = sum(protected_areas['REP_AREA']-protected_areas['REP_M_AREA'])
print("South America has {} square kilometers of protected areas.".format(P_Area))

Then, to finish the calculation, you’ll use the south_america GeoDataFrame.

south_america.head()
  pop_est continent name iso_a3 gdp_md_est geometry
9 44293293 South America Argentina ARG 879400.0 MULTIPOLYGON (((-68.63401 -52.63637, -68.25000…
10 17789267 South America Chile CHL 436100.0 MULTIPOLYGON (((-68.63401 -52.63637, -68.63335…
20 2931 South America Falkland Is. FLK 281.8 POLYGON ((-61.20000 -51.85000, -60.00000 -51.2…
28 3360148 South America Uruguay URY 73250.0 POLYGON ((-57.62513 -30.21629, -56.97603 -30.1…
29 207353391 South America Brazil BRA 3081000.0 POLYGON ((-53.37366 -33.76838, -53.65054 -33.2…

Calculate the total area of South America by following these steps:

  • Calculate the area of each country using the area attribute of each polygon (with EPSG 3035 as the CRS), and add up the results. The calculated area will be in units of square meters.
  • Convert your answer to have units of square kilometeters.
# Your code here: Calculate the total area of South America (in square kilometers)
totalArea = sum(south_america.geometry.to_crs(epsg=3035).area) / 10**6

Run the code cell below to calculate the percentage of South America that is protected.

# What percentage of South America is protected?
percentage_protected = P_Area/totalArea
print('Approximately {}% of South America is protected.'.format(round(percentage_protected*100, 2)))
Approximately 30.39% of South America is protected.

Where are the birds in South America?

So, are the birds in protected areas?

Create a plot that shows for all birds, all of the locations where they were discovered in South America. Also plot the locations of all protected areas in South America.

To exclude protected areas that are purely marine areas (with no land component), you can use the “MARINE” column (and plot only the rows in protected_areas[protected_areas[‘MARINE’]!=’2’], instead of every row in the protected_areas GeoDataFrame).

ax = south_america.plot(figsize=(10,10), color='white', edgecolor='gray')
protected_areas[protected_areas['MARINE']!='2'].plot(ax=ax, alpha=0.4, zorder=1)
birds[birds.geometry.y < 0].plot(ax=ax, color='red', alpha=0.6, markersize=10, zorder=2)


webp