Study of Annual Migration Patterns of the Baltimore Oriole¶
Background on the Baltimore Oriole (Icterus galbula)¶
The Baltimore Oriole (Icterus galbula) is a colorful migratory bird found throughout the Eastern/Central part of North and Central America. The male has bright orange and black plummage which makes it easy to spot, with the female plummage being only slightly less brilliant. Like many species, the Baltimore Oriole spends the winters in the warmer climates of Central American, and migrates north during the summer (Cornell Lab of Ornithology). According to the Audubon Society, Orioles prefer open woods, riverside groves, elms, and shade trees, and generally prefer to nest on the edges of woodlands. The Audubon Society also notes that this species still has a healthy population, but "surveys show a decline in recent decades." In particular, Dutch elm disease during the mid 20th century may have contributed to a loss of habitat for Orioles. Climate change is another factor threatening Orioles. The Audubon Society lists fire weather, spring heat waves, and urbanization as leading climate change related factors that are negatively impacting Oriole populations.
Migration Mapping¶
Knowing the month-by-month range of the Balitmore Oriole's migration path can be a useful tool for conservation efforts. Since the Oriole is a migratory species, it relies on a wide range of habitats across Central and North America. Knowing when and where this species will be can aid conservation efforts. While this analysis focuses on the 2023 migration range, mapping the range over multiple years or decades can be an important way of assessing the health of the Oriole population.
Migration Data from GBIF¶
To map the Oriole's annual migration, I will be using occurrence data from the Global Biodiversity Information Facility (GBIF). Occurrence data is largely collected using eBird and iNaturalist, as well as several other reputable citizen science data sources (GBIF Data Processing).
Ecoregion Data¶
I will be plotting occurrence (sightings of Orioles) data within ecoregions to visualize the Oriole migration on a map. For the ecoregion data, I will be using data published in 2017 by RESOLVE and OneEarth.
# Important: If you haven't already, you need to install pygbif using the command below
# Run this command in the terminal to install: pip install pygbif
# Import packages
import os # For file paths, data management, interacting with the operating system
import pathlib # Object oriented file path management
import time # Time how long operations take in Python
import zipfile # Deal with zip files
from getpass import getpass # Security login for downloading GBIF data
from glob import glob # Search for specific file extensions
# Libraries for data analysis
import pandas as pd # Data analysis for tabular data
import geopandas as gpd # Data analysis for geospatial data
import pygbif.occurrences as occ # Get occurrence data
import pygbif.species as species # Get species data
import requests # Download data
# Libraries for Dynamic mapping
import cartopy.crs as ccrs #Coordinate reference systems
import panel as pn #customize the dynamic map
import hvplot.pandas #make the holoview plot with geo-dataframe
import calendar # Change month numbers to month names
Create data directory in user's home directory¶
# Define data directory in user's home directory
data_dir = os.path.join(
# Home directory
pathlib.Path.home(),
# Earth analytics data directory
'earth-analytics',
'data',
# Project directory
'baltimore-oriole-migration',
)
print(data_dir)
# Make the directory using the path defined above
os.makedirs(data_dir, exist_ok=True)
C:\Users\warno\earth-analytics\data\baltimore-oriole-migration
Download Data from GBIF¶
# GBIF LOGIN CODE CHUNK
# Code credit: Elsa Culler
# This code allows us to download data in Python from gbif.org
# This code ASKS for your credentials for gbif.org
# and saves it for the rest of the session.
# NEVER put your credentials into your code
# GBIF needs a username, password, and email
# All 3 need to match the account
reset = False
# Request and store username
if (not ('GBIF_USER' in os.environ)) or reset:
os.environ['GBIF_USER'] = input('GBIF username:')
# Securely request and store password
if (not ('GBIF_PWD' in os.environ)) or reset:
os.environ['GBIF_PWD'] = getpass('GBIF password:')
# Request and store account email address
if (not ('GBIF_EMAIL' in os.environ)) or reset:
os.environ['GBIF_EMAIL'] = input('GBIF email:')
Search for the Baltimore Oriole (Icterus galbula)¶
# Search for the Baltimore Oriole (Icterus galbula)
backbone = species.name_backbone(name='Icterus galbula')
# Save the species key
species_key = backbone["speciesKey"]
backbone
{'usageKey': 2484340,
'scientificName': 'Icterus galbula (Linnaeus, 1758)',
'canonicalName': 'Icterus galbula',
'rank': 'SPECIES',
'status': 'ACCEPTED',
'confidence': 99,
'matchType': 'EXACT',
'kingdom': 'Animalia',
'phylum': 'Chordata',
'order': 'Passeriformes',
'family': 'Icteridae',
'genus': 'Icterus',
'species': 'Icterus galbula',
'kingdomKey': 1,
'phylumKey': 44,
'classKey': 212,
'orderKey': 729,
'familyKey': 6176,
'genusKey': 2484282,
'speciesKey': 2484340,
'class': 'Aves'}
Submit download request to GBIF and download occurrence data for year of interest¶
gbif_pattern = os.path.join(data_dir, '*.csv')
if not glob(gbif_pattern):
# Only submit one request
if not 'GBIF_DOWNLOAD_KEY' in os.environ:
# Submit query to GBIF
gbif_query = occ.download([
f"speciesKey = { species_key }",
"hasCoordinate = True",
"year = 2024",
])
os.environ['GBIF_DOWNLOAD_KEY'] = gbif_query[0]
# Wait for download to build
download_key = os.environ['GBIF_DOWNLOAD_KEY']
wait = occ.download_meta(download_key)['status']
while not wait=='SUCCEEDED':
wait = occ.download_meta(download_key)['status']
time.sleep(5)
# Download GBIF data
download_info = occ.download_get(
os.environ['GBIF_DOWNLOAD_KEY'],
path=data_dir)
# Unzip GBIF data
with zipfile.ZipFile(download_info['path']) as download_zip:
download_zip.extractall(path=data_dir)
# Find extracted .csv file path and take 1st result
gbif_path = glob(gbif_pattern)[0]
Read in the downloaded data from GBIF¶
with open(gbif_path) as f:
for i in range(2):
print(f.readline().strip())
gbifID datasetKey occurrenceID kingdom phylum class order family genus species infraspecificEpithet taxonRank scientificName verbatimScientificName verbatimScientificNameAuthorship countryCode locality stateProvince occurrenceStatus individualCount publishingOrgKey decimalLatitude decimalLongitude coordinateUncertaintyInMeters coordinatePrecision elevation elevationAccuracy depth depthAccuracy eventDate day month year taxonKey speciesKey basisOfRecord institutionCode collectionCode catalogNumber recordNumber identifiedBy dateIdentified license rightsHolder recordedBy typeStatus establishmentMeans lastInterpreted mediaType issue 4923513378 36f15a36-6b45-442e-9c31-cd633423aee0 b7ea4555-d6e0-49d9-adaa-ba3f5cad4d85 Animalia Chordata Aves Passeriformes Icteridae Icterus Icterus galbula galbula SUBSPECIES Icterus galbula galbula (Linnaeus, 1758) Icterus galbula galbula US McCormick Place Illinois PRESENT 7b8aff00-a9f8-11d8-944b-b8a03c50a862 41.852812 -87.611721 2454.0 2023-05-16 16 5 2023 5229901 2484340 PRESERVED_SPECIMEN FMNH Birds 530909 4623183 CC0_1_0 The Field Museum of Natural History D. E. Willard 2025-10-07T19:21:28.597Z COORDINATE_ROUNDED;INSTITUTION_MATCH_FUZZY;COLLECTION_MATCH_FUZZY
Load the GBIF Data¶
# Load the GBIF data, this time defining the delimiter (tabular) represents as '\t'
# We also set the index to gbifID, and select the columns of interest (lat, long, and month)
gbif_df = pd.read_csv(
gbif_path,
delimiter='\t',
index_col='gbifID',
usecols=['gbifID','decimalLongitude','decimalLatitude','month']
)
gbif_df.head()
| decimalLatitude | decimalLongitude | month | |
|---|---|---|---|
| gbifID | |||
| 4923513378 | 41.852812 | -87.611721 | 5.0 |
| 4444338249 | 29.757400 | -95.392000 | 4.0 |
| 4158425811 | 41.913727 | -82.509377 | 5.0 |
| 4059974834 | 56.351650 | 12.815890 | 3.0 |
| 5139450263 | 56.349510 | 12.807660 | 1.0 |
Converting the dataframe to a geodataframe¶
# Converting the dataframe to a geodataframe
# Using geopandas library
gbif_gdf = (
gpd.GeoDataFrame(
gbif_df,
geometry=gpd.points_from_xy(
gbif_df.decimalLongitude,
gbif_df.decimalLatitude),
crs="EPSG:4326") # Manually set the crs to match the ecoregions data
# Select the desired columns
[['month','geometry']]
)
gbif_gdf
| month | geometry | |
|---|---|---|
| gbifID | ||
| 4923513378 | 5.0 | POINT (-87.61172 41.85281) |
| 4444338249 | 4.0 | POINT (-95.392 29.7574) |
| 4158425811 | 5.0 | POINT (-82.50938 41.91373) |
| 4059974834 | 3.0 | POINT (12.81589 56.35165) |
| 5139450263 | 1.0 | POINT (12.80766 56.34951) |
| ... | ... | ... |
| 4173218595 | 5.0 | POINT (-83.4423 44.2588) |
| 4173220499 | 5.0 | POINT (-83.1897 41.6275) |
| 4173225614 | 5.0 | POINT (-83.4423 44.2588) |
| 4173224607 | 5.0 | POINT (-83.4423 44.2588) |
| 4444336253 | 4.0 | POINT (-95.6255 30.2757) |
517251 rows × 2 columns
Set up the ecoregion boundary URL¶
# Ecoregions data from GeographyRealm, RESOLVE, and OneEarth:
# GeographyRealm: https://www.geographyrealm.com/terrestrial-ecoregions-gis-data/
# OneEarth: https://www.oneearth.org/announcing-the-release-of-ecoregion-snapshots/
eco_url = (
"https://storage.googleapis.com/teow2016/Ecoregions2017.zip")
# Set up a path to save the data to your machine
eco_dir = os.path.join(data_dir, 'resolve_ecoregions')
# Make ecoregions directory
os.makedirs(eco_dir, exist_ok=True)
# Join ecoregions shapefile path
eco_path = os.path.join(eco_dir, 'ecoregions.shp')
# Only download once
if not os.path.exists(eco_path):
my_gdf = gpd.read_file(eco_url)
my_gdf.to_file(eco_path)
Open the downloaded ecoregion boundaries¶
# Opening the download and renaming some of the column names
eco_gdf = (
gpd.read_file(eco_path)
[['OBJECTID','ECO_NAME','SHAPE_AREA','geometry']]
.rename(columns={
'OBJECTID': 'ecoregion_id',
'ECO_NAME': 'name',
'SHAPE_AREA': 'area'
})
.set_index('ecoregion_id')
)
# Plot the ecoregions quickly to check download
eco_gdf.plot()
eco_gdf.head()
| name | area | geometry | |
|---|---|---|---|
| ecoregion_id | |||
| 1.0 | Adelie Land tundra | 0.038948 | MULTIPOLYGON (((158.7141 -69.60657, 158.71264 ... |
| 2.0 | Admiralty Islands lowland rain forests | 0.170599 | MULTIPOLYGON (((147.28819 -2.57589, 147.2715 -... |
| 3.0 | Aegean and Western Turkey sclerophyllous and m... | 13.844952 | MULTIPOLYGON (((26.88659 35.32161, 26.88297 35... |
| 4.0 | Afghan Mountains semi-desert | 1.355536 | MULTIPOLYGON (((65.48655 34.71401, 65.52872 34... |
| 5.0 | Ahklun and Kilbuck Upland Tundra | 8.196573 | MULTIPOLYGON (((-160.26404 58.64097, -160.2673... |
Join the ecoregion and GBIF data¶
To identify the ecoregion for each observation, we need to join the eco_gdf and gbif_gdf together. The joined data is a new variable called gbif_ecoregion_gdf
gbif_ecoregion_gdf = (
eco_gdf
# Match the CRS of the GBIF data and the ecoregions
.to_crs(gbif_gdf.crs)
# Find ecoregion for each observation
.sjoin(
gbif_gdf,
how='inner',
predicate='contains')
)
gbif_ecoregion_gdf
| name | area | geometry | gbifID | month | |
|---|---|---|---|---|---|
| ecoregion_id | |||||
| 13.0 | Alberta-British Columbia foothills forests | 17.133639 | MULTIPOLYGON (((-119.53979 55.81661, -119.5443... | 5395228674 | 6.0 |
| 13.0 | Alberta-British Columbia foothills forests | 17.133639 | MULTIPOLYGON (((-119.53979 55.81661, -119.5443... | 4843921873 | 6.0 |
| 13.0 | Alberta-British Columbia foothills forests | 17.133639 | MULTIPOLYGON (((-119.53979 55.81661, -119.5443... | 4845489589 | 6.0 |
| 13.0 | Alberta-British Columbia foothills forests | 17.133639 | MULTIPOLYGON (((-119.53979 55.81661, -119.5443... | 4844073404 | 6.0 |
| 13.0 | Alberta-British Columbia foothills forests | 17.133639 | MULTIPOLYGON (((-119.53979 55.81661, -119.5443... | 4620230968 | 7.0 |
| ... | ... | ... | ... | ... | ... |
| 839.0 | Northern Rockies conifer forests | 35.905513 | POLYGON ((-119.99977 54.53117, -119.8914 54.45... | 5317118606 | 8.0 |
| 839.0 | Northern Rockies conifer forests | 35.905513 | POLYGON ((-119.99977 54.53117, -119.8914 54.45... | 5491326849 | 8.0 |
| 839.0 | Northern Rockies conifer forests | 35.905513 | POLYGON ((-119.99977 54.53117, -119.8914 54.45... | 4779690761 | 8.0 |
| 839.0 | Northern Rockies conifer forests | 35.905513 | POLYGON ((-119.99977 54.53117, -119.8914 54.45... | 5315148847 | 5.0 |
| 839.0 | Northern Rockies conifer forests | 35.905513 | POLYGON ((-119.99977 54.53117, -119.8914 54.45... | 4804308520 | 6.0 |
505945 rows × 5 columns
Count observations in each ecoregion each month¶
# Count the observations in each ecoregion each month
occurrence_df = (
gbif_ecoregion_gdf
# Reset index
.reset_index()
# For each ecoregion, for each month...
.groupby(['ecoregion_id','month'])
# ...count the number of occurrences
# creating a new column 'occurrences' that counts all the gbif IDs in a group
.agg(
occurrences=('gbifID', 'count'),
area=('area','first'))
)
Normalize by area¶
This step gives us a new column called "density" which shows the number of occurrences divided by the ecoregion area. Not all ecoregions are the same size, so this step accounts for this. The density column gives us a count of the number of occurences for a given unit area.
#Normalize by area
occurrence_df['density'] = (
occurrence_df.occurrences
/ occurrence_df.area
)
# Get rid of rare observations (possible misidentification?)
occurrence_df = occurrence_df[occurrence_df.occurrences>1]
Take the mean by ecoregion¶
Gives us the mean number of occurrences in an ecoregion.
# Take the mean by ecoregion
mean_occurrences_by_ecoregion = (
occurrence_df
.groupby('ecoregion_id')
.mean()
)
mean_occurrences_by_ecoregion
| occurrences | area | density | |
|---|---|---|---|
| ecoregion_id | |||
| 13.0 | 23.250000 | 17.133639 | 1.356980 |
| 17.0 | 1820.714286 | 7.958751 | 228.768855 |
| 23.0 | 8.166667 | 3.346216 | 2.440568 |
| 33.0 | 1304.750000 | 16.637804 | 78.420808 |
| 34.0 | 1145.875000 | 18.674884 | 61.359151 |
| ... | ... | ... | ... |
| 796.0 | 144.500000 | 49.311356 | 2.930359 |
| 809.0 | 27.142857 | 4.295824 | 6.318429 |
| 810.0 | 37.888889 | 5.968650 | 6.347984 |
| 833.0 | 4.400000 | 0.610793 | 7.203744 |
| 839.0 | 32.000000 | 35.905513 | 0.891228 |
123 rows × 3 columns
Take the mean by month¶
Gives us the mean number of occurrences in a given month.
# Take the mean by month
mean_occurrences_by_month = (
occurrence_df
.groupby('month')
.mean()
)
mean_occurrences_by_month
| occurrences | area | density | |
|---|---|---|---|
| month | |||
| 1.0 | 186.096774 | 7.103442 | 119.720622 |
| 2.0 | 167.935484 | 5.971543 | 114.267011 |
| 3.0 | 147.676923 | 5.911391 | 119.012923 |
| 4.0 | 339.815789 | 9.274769 | 57.863214 |
| 5.0 | 4020.181818 | 17.520827 | 324.009395 |
| 6.0 | 1846.558140 | 24.536460 | 149.367654 |
| 7.0 | 731.162162 | 22.870137 | 60.125886 |
| 8.0 | 656.714286 | 19.313669 | 60.287617 |
| 9.0 | 214.259740 | 13.187489 | 46.109531 |
| 10.0 | 149.589041 | 8.409373 | 63.454302 |
| 11.0 | 126.456140 | 8.570266 | 64.232304 |
| 12.0 | 139.112676 | 7.586825 | 63.587655 |
Normalize by space and time for sampling effort¶
The goal of this step is to remove systematic differences such as some months have more sampling efforts because more birdwatchers are outside, and some regions are more heavily surveyed.
The goal of the new column, "norm_occurrences" is to give an occurrence count that is normalized for space and time differences.
# Normalize by space and time for sampling effort
occurrence_df['norm_occurrences'] = (
occurrence_df[['density']]
/ mean_occurrences_by_ecoregion[['density']]
/ mean_occurrences_by_month[['density']]
)
occurrence_df
| occurrences | area | density | norm_occurrences | ||
|---|---|---|---|---|---|
| ecoregion_id | month | ||||
| 13.0 | 5.0 | 46 | 17.133639 | 2.684777 | 0.006106 |
| 6.0 | 33 | 17.133639 | 1.926036 | 0.009502 | |
| 7.0 | 9 | 17.133639 | 0.525282 | 0.006438 | |
| 8.0 | 5 | 17.133639 | 0.291824 | 0.003567 | |
| 17.0 | 4.0 | 56 | 7.958751 | 7.036280 | 0.000532 |
| ... | ... | ... | ... | ... | ... |
| 833.0 | 12.0 | 3 | 0.610793 | 4.911644 | 0.010722 |
| 839.0 | 5.0 | 12 | 35.905513 | 0.334211 | 0.001157 |
| 6.0 | 79 | 35.905513 | 2.200219 | 0.016528 | |
| 7.0 | 13 | 35.905513 | 0.362061 | 0.006757 | |
| 8.0 | 24 | 35.905513 | 0.668421 | 0.012440 |
738 rows × 4 columns
occurrence_df.norm_occurrences.plot.hist()
<Axes: ylabel='Frequency'>
Figure 1: This histogram plot shows that there is a higher frequency of normalized ecoregions with low occurrence values, and fewer ecoregions with higher occurrence values.
# Plotting a scatter with regular "occurrences" by month
occurrence_df.reset_index().plot.scatter(
x='month', y='occurrences', c='ecoregion_id', logy=True
)
<Axes: xlabel='month', ylabel='occurrences'>
Figure 2: This plot shows a fairly even spread of occurences across the months, suggesting that the data is not heavily influenced by temporal differences in bird observations. However, there are still a higher number during the summer months.
# Plotting a scatter with normalized occurrences by month
occurrence_df.reset_index().plot.scatter(
x='month', y='norm_occurrences', c='ecoregion_id', logy=True
)
<Axes: xlabel='month', ylabel='norm_occurrences'>
Figure 3: Comparing this plot to the un-normalized occurrences, this data has a more clear horizonal cluster along the top of the graph (higher values). There is also a slight dip in the number of occurrences during the summer months, suggesting that the normalized version may be overcompensating for higher occurrences during the summer found in the raw data.
Plotting geographically¶
Next, we will prepare to plot the data geographically. This is a more effective way of interpreting the data.
# Simplify the geometry to speed up processing
eco_gdf.geometry = eco_gdf.simplify(.1, preserve_topology=False)
# Change the CRS to Mercator for mapping
eco_gdf = eco_gdf.to_crs(ccrs.Mercator())
# Check that the plot runs in a reasonable amount of time
eco_gdf.hvplot(geo=True, crs=ccrs.Mercator())
Create an interactive plot of the migration data¶
Note: The plot will not display in the HTML version of this notebook.
# Join the occurrences with the plotting GeoDataFrame
occurrence_gdf = eco_gdf.join(occurrence_df[['norm_occurrences']])
# Remove any empty geometry rows. These can cause errors if they persist in the data.
occurrence_gdf = occurrence_gdf[~occurrence_gdf.geometry.is_empty]
# Get the plot bounds so they don't change with the slider
# xmin, ymin, xmax, ymax = gbif_gdf.to_crs(ccrs.Mercator()).total_bounds
# This code above will plot the entire range of the data (even other continents/unusual sightings)
# I am just interested in the South-North American data, so I set the bounds manually.
xmin = -14821585.49397232
xmax = -4000000.0
ymax = 8826427.537488604
ymin = -1000000.0
# List month names on the slider
month_widget = pn.widgets.DiscreteSlider(
options={
calendar.month_name[month_num]: month_num
for month_num in range(1,13)
}
)
# Plot occurrence by ecoregion and month
migration_plot = (
occurrence_gdf
.hvplot(
c='norm_occurrences',
groupby='month',
# Use background tiles
geo=True, crs=ccrs.Mercator(), tiles='CartoLight',
title="Baltimore Oriole Migration, 2024",
xlim=(xmin, xmax), ylim=(ymin, ymax),
frame_height=500,
widgets={'month': month_widget},
widget_location='bottom'
)
)
# Save the plot
migration_plot.save('baltimore-oriole-migration.html', embed=True)
# Display the plot
migration_plot
WARNING:W-1005 (FIXED_SIZING_MODE): 'fixed' sizing mode requires width and height to be set: figure(id='p204203', ...)
BokehModel(combine_events=True, render_bundle={'docs_json': {'cd833894-6a91-45d1-ab89-a745994ab6d8': {'version…
Results and Analysis: According to the interative plot above, the Baltimore Oriole spends the coldest winter months (December/January) in warmer climates with ecoregions with dense observations in South and Central America. Surprisingly, there are also less dense ecoregions with Orioles along the East Coast of North America, extending into Canada during the coldest months. This suggests that while Orioles may prefer the warmer climates of the South, they can still tolorate the colder winter weather along the East Coast. There are also observations of Orioles on the Carribean islands in most months except for summer, suggesting that Orioles are very capable of migrating across fairly large water bodies.
During the summer months, Orioles are almost solely observed in the U.S. and Canada, and are almost completely absent from their winter habitats in Central and South America. Additionally, the Orioles range appears to be limited by the geographic barrier of the Rocky Mountains. This might be explained by the Oriole's habitat preferences which don't align well with the dense forests and steep terrain of the Rockies. However, the Oriole is not affraid to venture far north, with ecoregions with very dense summer observations found in the Northern U.S. and Canada.
These results align closely with maps and information provided by the Cornell Lab of Ornithology and the Audubon Society.