I have officially released ebird-haskell: a set of libraries and tools for working with eBird data in Haskell. Specifically, there are three components:
- ebird-api: A library that provides a complete description of the public eBird API as a servant API type. It also provides types for the litany of values that the eBird API communicates in, and convenient instances and functions for operating on those types.
- ebird-client: A library that provides functions for querying any endpoint of the eBird API, based on the description in the ebird-api library.
- ebird-cli: An executable command-line utility that can query any endpoint of the eBird API and pretty-print the response data.
This post serves as an announcement of these tools (a “call for users”, if you will) and an informal tutorial to help birders turned Haskell programmers or Haskell programmers turned birders get started.
What is eBird?
eBird is a massive collection of ornithological science projects developed by the Cornell Lab of Ornithology. The eBird application is a mobile and web application that allows birders to easily contribute their past and present observations to eBird’s database. Using the huge amount of data [1] that eBird collects and maintains, scientists are able to draw conclusions that inform and improve environmental conservation efforts across the globe. eBird is a great example of a citizen science project.
Accessing eBird data
eBird data is not only made available to scientists. Anyone can run simple queries against the latest data through their public web API (as we will see in this post), or create an eBird account to download the bulk data from their website.
A majority of the endpoints on the public eBird API require an API key, which can be obtained by requesting one here. My request was granted in under an hour.
Getting started with ebird-cli
ebird-cli is essentially a direct command-line interface to the eBird API. It can query every endpoint of the eBird API, and supports every query parameter for each endpoint. In this section, I’ll explain how to install ebird-cli and use it to retrieve data from the eBird API.
Installation
I have not yet gone through the trouble of properly packaging and distributing ebird-cli [2], so for now it must be installed with cabal, which itself can be installed using ghcup.
Once you have the prerequisites installed, you can install ebird-cli with the following commands (the first is only necessary if you have a new cabal installation or your package index is very out of date):
cabal update
cabal install ebird-cli
By default, the executable will be placed in $HOME/.cabal/bin, so make sure that directory is on your $PATH. For more information on using cabal in this manner, see the user’s guide.
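As a rough sketch of the $PATH check described above, the snippet below tests whether cabal's default install directory is already on $PATH and, if not, adds it for the current session. The helper name on_path is made up for this illustration, and the directory is the default named in this post; a customized cabal configuration may install elsewhere.

```shell
# Default directory where cabal places executables (as stated above).
cabal_bin="$HOME/.cabal/bin"

# Hypothetical helper: succeeds if the given directory is an entry in $PATH.
on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if on_path "$cabal_bin"; then
  echo "$cabal_bin is already on PATH"
else
  # Affects the current shell session only.
  export PATH="$cabal_bin:$PATH"
  echo "added $cabal_bin to PATH for this session"
fi
```

To make the change permanent, the export line would go in your shell's startup file.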
Usage
To ensure your installation is working as expected and begin familiarizing yourself with the tool, try the --help flag:
ebird-cli --help
At the time of writing, this command yields output that looks something like:
ebird-cli - Go birding on your command line!
Usage: ebird-cli [-k|--api-key API_KEY] COMMAND
  Query the official eBird API
Available options:
  -k,--api-key API_KEY     Specify an eBird API key
  -h,--help                Show this help text
Observation commands:
  observations             Get recent observations within a region
  ...
Product commands:
  recent-checklists        Get recent checklists within a region
  ...
Hotspot commands:
  region-hotspots          Get a list of hotspots in one or more regions
  ...
Taxonomy commands:
  taxonomy                 Get any version of the eBird taxonomy
  ...
Region commands:
  region-info              Get information about a region
  ...
As the output above suggests, there are five sections of subcommands. Each section roughly corresponds to a section of the eBird API as described by their documentation. Additionally, each subcommand has its own --help flag that outputs specific usage information for that command. For example, ebird-cli observations --help yields output like the following:
Usage: ebird-cli observations --region REGION_CODE [--back N]
                              [--taxonomy-categories CATEGORIES]
                              [--only-hotspots] [--include-provisional]
                              [--max-results N] [--extra-regions REGION_CODE]
                              [--spp-locale LOCALE]
  Get recent observations within a region
Available options:
  --region REGION_CODE     Specify the regions to fetch observations from (e.g.
                           "US-WY,US-CO,US-ID" or "US-CA-037")
  --back N                 Only fetch observations submitted within the last N
                           days (1 - 30, default: 14)
  --taxonomy-categories CATEGORIES
                           Specify a list of one or more taxonomy categories to
                           include observations of (e.g. "issf" or "hybrid")
                           (default: all categories)
  --only-hotspots          Only include observations from hotspots
  --include-provisional    Include observations which have not yet been reviewed
  --max-results N          Specify the max number of observations to include (1
                           to 10000, default: all)
  --extra-regions REGION_CODE
                           Up to 10 extra regions to fetch observations from
  --spp-locale LOCALE      Specify a locale to use for common names
  -h,--help                Show this help text
A simple example
Let’s use this information to come up with an ebird-cli invocation that will output the 5 most recent observations available from Deschutes County, Oregon. To do this, we need to determine the eBird “region code” corresponding to Deschutes County in Oregon.
Region codes
Region codes are custom values used by eBird to identify geographic regions of varying specificity. The region code corresponding to the entire world is the value “world”. The world region is segmented into countries, for example “US” for the United States or “NL” for The Netherlands. Two region codes separated by a comma form a new region code that identifies the combination of their geographic regions. For example, “US,NL” identifies a geographic region that includes the United States and The Netherlands.
Countries are further segmented into states. For example, Oregon’s region code is US-OR, and Alberta, Canada’s is CA-AB. Finally, states are segmented into counties. For example, Albany County, Wyoming’s region code is US-WY-001 [3].
Sometimes we need to specify whether we expect the eBird API to give us country, state, or county regions. To do this, we must specify the region type we are interested in. Country regions are simply referred to by the “country” region type, state regions are referred to by the “subnational1” region type, and county regions are referred to by the “subnational2” region type.
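The mapping from code shape to region type can be sketched as a small shell function. This is only an illustration of the pattern described above: region_type is a made-up helper, not part of ebird-cli, and it looks only at the shape of the code without checking that the region actually exists. The labels "world" and "combination" are this sketch's own, since the text above names region types only for countries, states, and counties.

```shell
# Hypothetical helper: print the region type implied by the shape of a code.
region_type() {
  case "$1" in
    world)  echo "world" ;;         # the special whole-world code
    *,*)    echo "combination" ;;   # comma-separated list of region codes
    *-*-*)  echo "subnational2" ;;  # country-state-county, e.g. US-OR-017
    *-*)    echo "subnational1" ;;  # country-state, e.g. US-OR
    ?*)     echo "country" ;;       # a bare country code, e.g. US
    *)      echo "invalid"; return 1 ;;  # empty input
  esac
}

region_type US          # prints "country"
region_type US-OR       # prints "subnational1"
region_type US-OR-017   # prints "subnational2"
region_type US,NL       # prints "combination"
```

The comma case is checked before the hyphen cases so that a list such as "US-WY,US-CO" is reported as a combination rather than being misread as a single county code.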
Given the above, we know that the region code of Deschutes County, Oregon should be a subnational2 region code that looks something like US-OR-XYZ where XYZ is the county number of Deschutes County. Since it’s not obvious what the county number of Deschutes County is, we can use the subregions command to list all the subnational2 subregions of US-OR, i.e. all the counties in Oregon:
ebird-cli subregions -k YOUR_API_KEY_HERE --region US-OR --region-type subnational2
To avoid repeatedly pasting your API key for the -k flag, write your API key in a file located at ~/.ebird/key.txt. The key will be automatically read from this file by ebird-cli for any command that requires it.
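One way to create that file is sketched below. The function name save_ebird_key is invented for this example. The restrictive permissions are a precaution of this sketch, not something ebird-cli is known to require. The key is written without a trailing newline, since whether ebird-cli tolerates trailing whitespace in the file is not stated here.

```shell
# Hypothetical helper: store an eBird API key in ~/.ebird/key.txt.
# Usage: save_ebird_key YOUR_API_KEY_HERE
save_ebird_key() {
  mkdir -p "$HOME/.ebird"
  # Write the key exactly as given, with no trailing newline.
  printf '%s' "$1" > "$HOME/.ebird/key.txt"
  # The key is a secret, so make the file readable by the current user only.
  chmod 600 "$HOME/.ebird/key.txt"
}
```

After running save_ebird_key once with your real key, commands like the subregions query above should no longer need the -k flag.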
The output of the above command includes the following entry:
{
"code": "US-OR-017",
"name": "Deschutes"
}
Knowing that the region code for Deschutes County in Oregon is US-OR-017, we can build the command that will get us the 5 most recent observations in that region:
ebird-cli observations --region US-OR-017 --max-results 5
At the time of writing, this command yields the following output:
[
{
"comName": "Ring-necked Duck",
"howMany": 2,
"lat": 44.069084,
"lng": -121.163923,
"locId": "L26617295",
"locName": "Stenkamp",
"locationPrivate": true,
"obsDt": "2023-11-19 08:06",
"obsReviewed": false,
"obsValid": true,
"sciName": "Aythya collaris",
"speciesCode": "rinduc",
"subId": "S154767518"
},
{
"comName": "Bufflehead",
"howMany": 1,
"lat": 44.069084,
"lng": -121.163923,
"locId": "L26617295",
"locName": "Stenkamp",
"locationPrivate": true,
"obsDt": "2023-11-19 08:06",
"obsReviewed": false,
"obsValid": true,
"sciName": "Bucephala albeola",
"speciesCode": "buffle",
"subId": "S154767518"
},
{
"comName": "California Scrub-Jay",
"howMany": 2,
"lat": 44.069084,
"lng": -121.163923,
"locId": "L26617295",
"locName": "Stenkamp",
"locationPrivate": true,
"obsDt": "2023-11-19 08:06",
"obsReviewed": false,
"obsValid": true,
"sciName": "Aphelocoma californica",
"speciesCode": "cowscj1",
"subId": "S154767518"
},
{
"comName": "American Robin",
"howMany": 4,
"lat": 44.069084,
"lng": -121.163923,
"locId": "L26617295",
"locName": "Stenkamp",
"locationPrivate": true,
"obsDt": "2023-11-19 08:06",
"obsReviewed": false,
"obsValid": true,
"sciName": "Turdus migratorius",
"speciesCode": "amerob",
"subId": "S154767518"
},
{
"comName": "Townsend's Solitaire",
"howMany": 1,
"lat": 44.069084,
"lng": -121.163923,
"locId": "L26617295",
"locName": "Stenkamp",
"locationPrivate": true,
"obsDt": "2023-11-19 08:06",
"obsReviewed": false,
"obsValid": true,
"sciName": "Myadestes townsendi",
"speciesCode": "towsol",
"subId": "S154767518"
}
]
All of these observations happen to have been submitted on the same checklist, which is why their locations and times are all identical. While many of these fields are self-explanatory, the documentation for the Observation type in ebird-api may make it clearer what some of these fields are.
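A quick way to confirm that a batch of observations comes from a single checklist is to count the distinct subId values in the output. The sketch below does this with standard text tools. The helper name count_checklists is made up, and it assumes the output is pretty-printed with quoted "subId" fields as shown above, and that ebird-cli writes its results to standard output. It also relies on grep -o, which GNU and BSD grep support but POSIX does not require.

```shell
# Hypothetical helper: read ebird-cli JSON output on stdin and print the
# number of distinct checklist IDs ("subId" values) it contains.
count_checklists() {
  grep -o '"subId": *"[^"]*"' | sort -u | wc -l | tr -d ' '
}
```

For example, `ebird-cli observations --region US-OR-017 --max-results 5 | count_checklists` would print 1 for the sample output above, since all five entries share the subId "S154767518".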
Footnotes
1. Over 100 million observations are contributed to eBird annually. At the time of writing, the complete eBird data set is over 137GiB in size.
2. If you would appreciate a distribution of ebird-cli on your preferred package manager, please open an issue so I can be aware of the demand.
3. County region codes appear to be numbered in alphabetical order of the counties within the state. However, I haven’t found anything in the eBird documentation that confirms this convention.