GEDI - The Force Awakens in Open Science#

In the realm of open science, contributions from the Global Ecosystem Dynamics Investigation (GEDI) are significant, supporting a wide range of applications, from improving weather forecasting and forest management to monitoring glaciers and snowpacks. By providing detailed 3D structural information, GEDI helps scientists address key challenges in carbon cycling and biodiversity. The mission’s data is also invaluable for generating more accurate digital elevation models, which are essential for various environmental and scientific studies. Overall, the derived data products in total are ~300TB in size

The GEDI mission is a pioneering project by NASA that uses advanced laser technology to map the 3D structure of Earth’s forests. Launched in 2018 and mounted on the International Space Station (ISS), GEDI’s instrument faced earth and powered on its advanced lidar system. Firing laser pules at the Earth’s surface, GEDI captured and collected the returning lidar waveforms, recording the amount of laser energy reflected by different surfaces, thus allowing scientists to measure forest canopy height, surface elevation and quantify the overall canopy density.

To provide a sense of how much data GEDI collected during it’s 24 month mission, each laser on the lidar system fired off 242 times each second, bouncing off the surface with a 25-meter diameter footprint. Around 16 billion laser pulses per year were continuously collected and an estimated 10 billion cloud-free observations were produced.

Existing pan-tropical biomass maps use laser data acquired nearly 15 years ago and were based on less than 5 million laser observations in total. GEDI collects 6 million laser observations every day. So over the tropics, we’ve already collected about two orders of magnitude more data than what was ‘state-of-the-art’ before.

– Ralph Dubayah, GEDI Principal Investigator and professor of geographical sciences at the UMD.

GEDI in the Decentralized Web#

Building on GEDI’s significant contributions to open science, our team has been onboarding this extensive dataset onto the InterPlanetary File System (IPFS) and the Filecoin network. IPFS is a peer-to-peer network that allows for the decentralized storage and sharing of data, making it more resilient and accessible. The Filecoin network, on the other hand, is a decentralized storage network that incentivizes users to provide storage space, ensuring data is stored reliably and securely. These technologies are crucial for open science as they promote data accessibility, transparency, and collaboration. By leveraging these DWeb systems, we aim to enhance the accessibility and resilience of GEDI’s data, ensuring collaboration across the scientific community is open and accessible.

The remainder of this post we will guide you through the process of retrieving GEDI data stored on IPFS via our python library, ipfs-stac. We’ll cover the steps needed to retrieve the data from IPFS, tools and techniques leveraging the STAC spec, and some practical examples for exploration and visualization.

Let’s dive in!

Preqrequisites#

Install the IPFS desktop app or Kubo CLI client as this will will allow you to start up a IPFS local node on your machine.
Python that’s version 3.10.x or higher as to install our dependencies such as ipfs-stac and Jupyter Notebook.

To get started, save the libraries we’ll be using as a text file named gedi-blog-requirements.txt. I also recommend creating a virtual environment by running the following commands in the terminal.

python -m venv .venv
source .venv/bin/activate
pip install -r gedi-blog-requirements.txt

Next, start up Jupyter Notebook session by with the following command.

jupyter notebook

and create a new notebook by clicking on the New button and selecting Python 3.

Preparing our Notebook#

Let’s start by importing the necessary libraries and methods we’ll be using in this notebook.

[1]:

from typing import Dict
import h5py
from io import BytesIO
import geopandas as gpd
from ipfs_stac import client
import json
import folium
from folium.plugins import MiniMap
from pystac import Item

import pandas as pd
import tabulate
from IPython.display import HTML, display
from datetime import datetime
import numpy as np

import warnings

warnings.simplefilter(action="ignore", category=FutureWarning)

### CONSTANTS ###
# Example usage:
SELECTED_VARIABLES = {
    "lat_lowestmode": [],
    "lon_lowestmode": [],
    "delta_time": [],
    "sensitivity": [],
    "surface_flag": [],
    "agbd": [],
    "predict_stratum": [],
}


### FUNCTIONS ###

# Function to check if a layer is already added to the map
def is_layer_added(map_obj: folium.Map, layer_name: str) -> bool:
    """
    Function to check if a layer is already added to the map

    Args:
        map_obj (folium.Map): Map object containing a set of layers
        layer_name (str): Name of the layer to identify if it is already added to the map

    Returns:
        bool: True if the layer is already added to the map, False otherwise
    """
    found = False
    for feat, vals in map_obj._children.items():
        if "geo_json_" in feat and vals.layer_name == layer_name:
            print(f"{layer_name} has already been added to the map")
            found = True
            break
    return found


def copy_map(map_obj: folium.Map) -> folium.Map:
    """
    Create a copy of a map object without the MiniMap and LayerControl plugins

    Args:
        map_obj (folium.Map): Map object to be copied

    Returns:
        folium.Map: Copy of the map object without the MiniMap and LayerControl plugins
    """
    new_map = folium.Map(location=map_obj.location)
    new_map.options = map_obj.options
    for feat, vals in map_obj._children.items():
        if isinstance(vals, folium.map.LayerControl):
            continue
        if isinstance(vals, MiniMap):
            continue
        else:
            new_map.add_child(vals)
    return new_map


def display_table(data: list, headers: list) -> None:
    """
    Display a table in HTML format

    Args:
        data (list): List of data to be displayed in the table
        headers (list): List of headers for the table
    """
    print("\n")
    print("Table 1. Attributes and discription from `METADATA` group")
    headers = ["attribute", "description"]
    display(HTML(tabulate.tabulate(data, headers, tablefmt="html")))


def print_statistics(df: pd.DataFrame) -> None:
    """
    Prints the start time, end time, total period, and total shots from the DataFrame.

    Parameters:
    df (pd.DataFrame): The DataFrame containing the delta_time column.
    """
    # The reference datetime to calculate the min and max datetime.
    reference_datetime = datetime(2018, 1, 1, 0, 0, 0)

    # Calculate min, max, and range datetime
    min_dt = reference_datetime + pd.to_timedelta(df.delta_time.min(), unit="s")
    max_dt = reference_datetime + pd.to_timedelta(df.delta_time.max(), unit="s")
    range_dt = max_dt - min_dt

    # Find the total distance between the first and last observation
    first_p = df.iloc[0].to_dict()
    last_p = df.iloc[-1].to_dict()
    tmp_line = gpd.GeoSeries(
        [first_p["geometry"], last_p["geometry"]], crs="epsg:4326"
    ).to_crs(epsg=3857)
    distance_traveled = tmp_line.distance(tmp_line.shift(1))[1]
    # Convert distance traveled to miles
    in_miles = distance_traveled / 1609.34

    # Prepare the print string
    print_s = f"""Time of the first observation: {min_dt}
Time of the last observation: {max_dt}
Total period (elapsed time): {range_dt:}
Total number of Observations: {len(df.index):,}
The beam traveled a total of {distance_traveled:,.0f} meters ({in_miles:,.2f} miles)"""

    # Print the statistics
    print(print_s)


def extract_gedi_data(granule_item: Item, selected_beam: str, selected_variables: Dict[str, list] = SELECTED_VARIABLES) -> pd.DataFrame:
    """
    Extracts data from a granule data for a selected beam group and a set of selected variables.

    Args:
        granule_item (_type_): Granule item from the STAC catalog
        selected_beam (str): The name of the beam group to be extracted from the granule data
        selected_variables (Dict[str, list]): A dictionary containing the selected variables to be extracted from the beam group

    Returns:
        pd.DataFrame: DataFrame containing the extracted data
    """
    print(f"Reading data from {selected_beam}")
    with h5py.File(BytesIO(granule_item.data), "r") as hf:
        beam = hf.get(selected_beam)

        # Filter out keys where the corresponding value is an instance of h5py.Group
        filtered_keys = [
            key for key in beam.keys() if not isinstance(beam[key], h5py.Group)
        ]

        print(
            f"Feel free to explore by modifying the `SELECTED_VARIABLES` dictionary above by adding any of the following:\n    - {(chr(10)+'    -').join(filtered_keys)}"
        )
        print(
            "\nNOTE: The remainder of this notebook relies on the variables `lat_lowestmode`, `lon_lowestmode`, and `delta_time`. They should always be included. \n"
        )

        # Loop over the variables and extract data
        for var in selected_variables.keys():
            data = beam.get(var)[:]

            # Accounting for the predict_stratum variable which needs to be recast as a string
            if var == "predict_stratum":
                data = data.astype(str)
            selected_variables[var].extend(data.tolist())

        # Add beam number as a new column
        n = len(selected_variables["lat_lowestmode"])  # number of shots in the beam group
        selected_variables["beam"] = np.repeat(str(selected_beam), n).tolist()

    # Create DataFrame from the variables dictionary
    l4a_beam_df = pd.DataFrame(selected_variables)

    # Only convert columns which are of type `object` to string.
    for col in l4a_beam_df.columns:
        if l4a_beam_df[col].dtype == "object":
            l4a_beam_df[col] = l4a_beam_df[col].astype(str)

    return l4a_beam_df

Searching a STAC Catalog#

The ipfs-stac library leverages the Easier STAC API for content discoverablity and IPFS for content retrieval.

The Easier STAC API, built on the STAC spec, is our resource for organizing geospatial metadata. It allows us to search for geospatial content and utilize the underlying metadata to understand where content can be retrieved from, or in our case what content we want. 😉

Content in IPFS and the Filecoin network are identified not by where they are located, but by the content itself through content addressing via a content identifier (CID). A CID is a unique identifier based on the content itself, allowing us to retrieve the content from IPFS. If you want to learn more, We go into detail in a previous post. For clarity, retrieving content from IPFS is synonymous to downloading content from a centralized source.

Creating our client object#

We’ll be exploring the GEDI L4A collection, a data product that’s been processed and converted to footprint estimates of above ground biomass density. As mentioned before, we’ll need the CIDs for the data that we’re interested in.

To do this, we’ll first create a client object that will allow us to interact with the STAC API.

[2]:

easier_client = client.Web3(
    local_gateway="localhost",
    gateway_port="8080",
    api_port="5001",
    stac_endpoint="https://stac.easierdata.info",
)

1. STAC Collections#

The easier_client object is our main interface to the Easier STAC API and IPFS. Let’s start by seeing what collections are available at the endpoint that was passed into the stac_endpoint property.

[3]:

easier_client.collections

[3]:

['landsat-c2l1', 'GEDI_L4A_AGB_Density_V2_1_2056.v2.1']

The collections property returns a list of the collection id’s available to us. A collection id is a unique identifier that so that a collection can be distinctly identified and referenced.

2. STAC Items#

Just like the Landsat 9 imagery, we’ve also configured and prepared the STAC metadata for a sample subset of GEDI data, or what they are commonly referred to as granules.

You can think of a granule as a manageable unit of data representing specific geographic areas or time periods. These granules contain detailed 3D structural information about Earth’s surface and vegetation like canopy height and density.

Let’s see how many granules are in the collection id GEDI_L4A_AGB_Density_V2_1_2056.v2.1.

[4]:

collection_id = "GEDI_L4A_AGB_Density_V2_1_2056.v2.1"
items = easier_client.searchSTAC(collections=[collection_id])
print(f"The {collection_id} collection has {len(items)} items")

The GEDI_L4A_AGB_Density_V2_1_2056.v2.1 collection has 956 items

Use Case: Monitoring Forest Fragmentation#

The Harvard Forest is a long-term ecological research (LTER) site located in Petersham, Massachusetts. It is one of the most studied forests in North America, providing valuable insights into forest dynamics and ecosystem processes.

How can GEDI be used to study Harvard Forest ecosystems? Monitoring the impact of forest fragmentation and edge effects on forest ecosystems is one application where GEDI could enhance on-going research.

Part 1 - Initial data gathering stage#

1. Fetching the Harvard Forest boundary from IPFS#

Let’s set the stage for one of the many tasks, data gathering, and how decentralized technology like IPFS can be leveraged to support open science research.

A colleague of mine just uploaded the geojson of the Harvard Forest boundary to IPFS and shared the CID bafkreib5tmwa7qb2qnm2zqgklsnesjlt4w7uxwbqvbqz7se54t7kxceuu4 with me.

With the easier_client object, we can use the getFromCID method to fetch content from IPFS and save it to the variable geojson_cid.

[5]:

geojson_cid = "bafkreib5tmwa7qb2qnm2zqgklsnesjlt4w7uxwbqvbqz7se54t7kxceuu4"
geojson_result = easier_client.getFromCID(geojson_cid)
print(
    f"To view the geojson, visit the following link: http://{geojson_cid}.ipfs.localhost:{easier_client.gateway_port}"
)

✅  Fetching bafkreib5tmwa7qb2qnm2zqgklsnesjlt4w7uxwbqvbqz7se54t7kxceuu4 - 0.01/0.01 MB
To view the geojson, visit the following link: http://bafkreib5tmwa7qb2qnm2zqgklsnesjlt4w7uxwbqvbqz7se54t7kxceuu4.ipfs.localhost:8081

2. Visualizing the Harvard Forest boundary#

Next, we’ll convert geojson_cid into a proper json object so we can plot it on a Folium map.

[6]:

geojson = json.loads(geojson_result)

# Add the first layer manually using GeoJson
harvard_forests = folium.GeoJson(
    data=geojson,
    name="Harvard Forest Boundary",
    style_function=lambda feature: {
        "fillColor": "lightgreen",
        "color": "black",
        "weight": 5,
        "fillOpacity": 0.5,
        "dashArray": "10, 10",
    },
)

Create our map object, set the properties, add the layer and finally view the map!

[7]:

# Fit the map to the bounds of the GeoJSON layer
geojson_gpd = gpd.GeoDataFrame.from_features(geojson["features"], crs="epsg:4326")
bounds = geojson_gpd.total_bounds
center = geojson_gpd.representative_point()

# Create the map object and center on the extent of the GeoJSON layer
m = folium.Map([center.y, center.x], scrollWheelZoom=False, zoom_start=12)

# m.fit_bounds([[bounds[1], bounds[0]], [bounds[3], bounds[2]]])
m.control_scale = True

# Add the layer to the map and display it
harvard_forests.add_to(m)
folium.LayerControl(collapsed=False).add_to(m)
m

[7]:

Make this Notebook Trusted to load map: File -> Trust Notebook

3. Using STAC to query geospatial metadata#

Next, we need to query the GEDI granules that intersect with the Harvard Forest boundary.

Using the searchSTAC method, we can pass in the geojson object to perform a spatial intersection on the collection id of interest, GEDI_L4A_AGB_Density_V2_1_2056.v2.1.

[8]:

# Query content from STAC server and process it
granules = easier_client.searchSTAC(
    intersects=geojson["features"][0]["geometry"], collections=[collection_id]
)
print(
    f"Of the {len(items)} granules on {easier_client.stac_endpoint}, {len(granules)} were identified intersecting the geojson\n"
)

Of the 956 granules on https://stac.easierdata.info, 29 were identified intersecting the geojson

For now, let’s create a layer from our resulting STAC search. We will be referencing it later on.

[9]:

# Convert the granules to a GeoJSON object
granules_dict = [granule.to_dict() for granule in granules]
granules_geojson = {"type": "FeatureCollection", "features": granules_dict}

# Add as a layer to the map
granules_layer = folium.GeoJson(
    data=granules_geojson,
    name="Granule Search Results",
    style_function=lambda feature: {
        "fillColor": "blue",  # Orange fill color
        "color": "black",  # Orange border color
        "weight": 5,  # Border width
        "opacity": 0.1,  # Border opacity
        "fillOpacity": 0.01,  # Fill opacity
    },
    tooltip=folium.GeoJsonTooltip(fields=["datetime"], aliases=["Date"]),
    zoom_start=1,
)

Part 2 - Downloading content from IPFS#

Now that we’ve identified a subset of granules for our area of interest, let’s fetch one from IPFS and take a look at the data.

1. Fetching a granule from IPFS#

We’ll start by grabbing the CID from the first granule in the search results and then announce a request for the content to IPFS.

[ ]:

# granule_of_interest = easier_client.getFromCID(granules[0].assets["data"].cid)
granule_assets = easier_client.getAssetNames(granules[0])
print(f"{granules[0].id} contains the following assets:\n{[g for g in granule_assets]}")

GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2023031173011_O23431_02_T07911_02_003_01_V002.h5 contains the following assets:
['SHA-256 file', 'gov/protected/gedi/GEDI_L4A_AGB_Density_V2_1/data/GEDI04_A_2023031173011_O23431_02_T07911_02_003_01_V002', 'metadata', 'opendap']

The CID for the granule can be found in the last asset listed so let’s grab that and fetch the granule from IPFS.

NOTE: The granule is ~275 mb so it may take some time, depending on your connection and the number of peers online that contain the data.

[ ]:

granule_item = easier_client.getAssetFromItem(
    item=granules[0], asset_name=granule_assets[1], fetch_data=True
)

✅  Fetching bafybeihq35zikextntdqdguwtfdtcrwqjzrfs2zwuzwmemcefv44xpig7e - 275.10/275.10 MB

2. Inspecting the fetched content#

When content is fetched from IPFS, it’s stored in the local node on your machine. The next time we request this particular CID, it’ll be retrieved directly from our machine instead of requesting the content from IPFS network.

The variable granule_item is an Asset object in ipfs-stac. The Asset object contains contains a few properties that describe it, one of which is the data property that contains the content itself, in byte form.

Let’s explore the content in more detail.

[ ]:

print(f"This content is of the type, `{type(granule_item.data)}`\n")
print(
    f"Content with the cid, `{granule_item.cid}` has a file signature of `{granule_item.data[:5]}`\n"
)

This content is of the type, `<class 'bytes'>`

Content with the cid, `bafybeihq35zikextntdqdguwtfdtcrwqjzrfs2zwuzwmemcefv44xpig7e` has a file signature of `b'\x89HDF\r'`

TIP 👇#

File signatures, often referred to as magic numbers, is one way for identifying file types. We can take a look at the file header by reading the first few bytes of a file and reveal a signature that can be helpful in identifying or determining the format and type of the content.

We can see, the granule file we fetched from IPFS is of the type HDF5. With that confirmed, Let’s take a look at the metadata to see what kind of information is stored in the file by using the h5py library to interact with the HDF5 file and read the metadata group.

[ ]:

subset_df = pd.DataFrame()
with h5py.File(BytesIO(granule_item.data), "r") as hf:
    # read the METADATA group
    metadata = hf["METADATA/DatasetIdentification"]
    # store attributes and descriptions in an arra
    data = []
    for attr in metadata.attrs.keys():
        data.append([attr, metadata.attrs[attr]])
    display_table(data, ["Attribute", "Description"])



Table 1. Attributes and discription from `METADATA` group

attribute	description
PGEVersion	003
VersionID	01
abstract	The GEDI L4A standard data product contains predictions of aboveground biomass density within each laser footprint.
characterSet	utf8
creationDate	2023-04-24T13:21:42.580653Z
credit	The software that generates the L4A product was implemented at the Department of Geographical Sciences at the University of Maryland (UMD), in collaboration with the GEDI Science Data Processing System at the NASA Goddard Space Flight Center (GSFC) in Greenbelt, Maryland and the Institute at Brown for Environment and Society at Brown University.
fileName	GEDI04_A_2023031173011_O23431_02_T07911_02_003_01_V002.h5
gedi_l4a_githash	8145a32755ee6f4f58a2315ee244450f846d33d9
language	eng
originatorOrganizationName	GSFC GEDI-SDPS > GEDI Science Data Processing System and University of Maryland
purpose	The purpose of the L4A dataset is to provide an estimate of aboveground biomass density from each GEDI waveform.
shortName	GEDI_L4A
spatialRepresentationType	along-track
status	onGoing
topicCategory	geoscientificInformation
uuid	f75dec80-5f77-45f0-8de1-2e6279f7a774

3. Exploring the granule metadata#

Onboard GEDI are three lasers producing a total of 8 beams, each instantaneously sampling a 25 meter footprint at the surface of the Earth. Let’s plot the data for one of the beams, BEAM0101, to get a sense of the spatial distribution.

We’ll start by referencing the group for BEAM0101 and grabbing observation data for a few of the variables.

[ ]:

selected_beam = "BEAM0101"

l4a_beam_df = extract_gedi_data(granule_item, selected_beam)
print(l4a_beam_df.head())

Reading data from BEAM0101
Feel free to explore by modifying the `SELECTED_VARIABLES` dictionary above by adding any of the following:
    - agbd
    -agbd_pi_lower
    -agbd_pi_upper
    -agbd_se
    -agbd_t
    -agbd_t_se
    -algorithm_run_flag
    -beam
    -channel
    -degrade_flag
    -delta_time
    -elev_lowestmode
    -l2_quality_flag
    -l4_quality_flag
    -lat_lowestmode
    -lon_lowestmode
    -master_frac
    -master_int
    -predict_stratum
    -predictor_limit_flag
    -response_limit_flag
    -selected_algorithm
    -selected_mode
    -selected_mode_flag
    -sensitivity
    -shot_number
    -solar_elevation
    -surface_flag
    -xvar

NOTE: The remainder of this notebook relies on the variables `lat_lowestmode`, `lon_lowestmode`, and `delta_time`. They should always be included.

   lat_lowestmode  lon_lowestmode    delta_time  sensitivity  surface_flag  \
0       -0.005583     -114.766895  1.604228e+08     0.878229             1
1       -0.005152     -114.766608  1.604228e+08     0.849464             1
2       -0.004721     -114.766322  1.604228e+08     0.859624             1
3       -0.004290     -114.766036  1.604228e+08     0.858320             1
4       -0.003859     -114.765750  1.604228e+08     0.880380             1

     agbd predict_stratum      beam
0 -9999.0                  BEAM0101
1 -9999.0                  BEAM0101
2 -9999.0                  BEAM0101
3 -9999.0                  BEAM0101
4 -9999.0                  BEAM0101

Part 3. Data wrangling and visualization#

Now that we have selected the data and organized into a data structure we can easily manipulate, we can start to prepare our final map containing the following content:

The path of BEAM0101.
The total footprint of all 8 beams as to get a sense of the granule across-track width.
Our area of interest, the Harvard Forest boundary.
Observations from BEAM0101 that intersect with the Harvard Forest boundary.

1. Creating our feature layers#

In order to plot the path of BEAM0101 as a line, we’ll need to convert the observations which are represented as points. There are various ways to approach this using libraries such as shapely , GDAL, Cartopy, Geemap and geopandas. For our approach, we’ll mainly be using geopandas due to it’s flexible data structure manipulation but mostly because we’re already starting with a pandas dataframe.

Once we have our geodataframe, we can convert that into a Geojson LineString by converting the lon_lowestmode and lat_lowestmode columns into a list of coordinate pairs, and adding it to the coordinates property of the Geojson object. The last thing is to create our Folium layer by passing in the Geojson data.

[15]:

# Create a geodataframe from the beam observations
beam_observations = gpd.GeoDataFrame(
    data=l4a_beam_df,
    geometry=gpd.points_from_xy(l4a_beam_df.lon_lowestmode, l4a_beam_df.lat_lowestmode),
    crs="epsg:4326",
)

# Create a geojson line feature for the beam observations
geojson_single_beam = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "GEDI L4A - BEAM0110",
            "properties": {"id": "BEAM0110"},
            "geometry": {
                "type": "LineString",
                "coordinates": beam_observations[
                    ["lon_lowestmode", "lat_lowestmode"]
                ].values.tolist(),
            },
        }
    ],
}

beam_0110_layer = folium.GeoJson(
    data=geojson_single_beam,
    name="BEAM0110",
    style_function=lambda feature: {
        "color": "indigo",  # Orange border color
        "weight": 5,  # Border width
        "opacity": 0.5,  # Border opacity
    },
    tooltip=folium.GeoJsonTooltip(fields=["id"], aliases=["Beam"]),
)

Last but not least, we’ll create our footprint layer using the granule_layer that we created earlier from our STAC search results. Since we are only interested in the first result, we’ll grab the geometry from the data property to create the layer.

[16]:

# create layer of the granule extent
granule_extent = folium.GeoJson(
    data=granules_layer.data["features"][0],
    name="Granule Extent",
    style_function=lambda feature: {
        "fillColor": "blue",  # Orange fill color
        "color": "black",  # Orange border color
        "weight": 5,  # Border width
        "opacity": 0.5,  # Border opacity
        "fillOpacity": 0.1,  # Fill opacity
    },
)

2. Selecting observations near the Harvard Forest boundary#

Next, let’s identify the observations that are within 5000 meters of Harvard Forest and plot them on our map.

We can easily identify the observations by creating a buffer around the boundary and checking if the observations are within that buffer. Any observations that are within the buffer will be added to a new geodataframe that we’ll use to plot on the map.

[17]:

# Create a buffer around the center point of the harvard forest boundary
# harvard_forests_buffer = gpd.GeoSeries(center).to_crs(epsg=3857).buffer(5000)

# Create a buffer around the harvard forest boundary
# First convert the center point to a GeoSeries and reproject to a projected coordinate system (e.g., EPSG:3857)
buff_distance = 5000  # meters
harvard_forests_buffer = geojson_gpd.to_crs(epsg=3857).buffer(buff_distance)


# Reproject to a projected coordinate system (e.g., EPSG:3857)
# Select the beam observations that intersect with the buffered boundary
beam_observations_proj = beam_observations.to_crs(epsg=3857)
selected_beam_observations = beam_observations_proj[
    beam_observations_proj.intersects(harvard_forests_buffer.geometry.values[0])
]

# Convert the results back to geographic system (EPSG:4326) for plotting with Folium
harvard_forests_buffer = harvard_forests_buffer.to_crs("epsg:4326")
selected_beam_observations = selected_beam_observations.to_crs("epsg:4326")

# Create buffer layer
buffer_layer = folium.GeoJson(
    data=harvard_forests_buffer,
    name=f"{buff_distance}m buffer",
    style_function=lambda feature: {
        "color": "red",  # border color
        "weight": 5,  # Border width
        "opacity": 0.3,  # Border opacity
        "fillOpacity": 0.09,  # Fill opacity
    },
)

# Create layer of the selected beam observations
selected_observations_layer = folium.GeoJson(
    data=selected_beam_observations,
    name="Selected Observations",
)

3. Putting it all together#

All that’s left is to add the layers to the map and display it!

[18]:

# New map to display just this granule
m2 = folium.Map([center.y, center.x], scrollWheelZoom=False, zoom_start=13)
m2.control_scale = True

# Add layers to the map
harvard_forests.add_to(m2)
granule_extent.add_to(m2)
beam_0110_layer.add_to(m2)
selected_observations_layer.add_to(m2)
buffer_layer.add_to(m2)

folium.LayerControl(collapsed=False).add_to(m2)
m2 = copy_map(m2)

# Add a inset (inset map)
minimap = MiniMap(toggle_display=True)
m2.add_child(minimap)
folium.LayerControl(collapsed=False).add_to(m2)
m2

[18]:

Make this Notebook Trusted to load map: File -> Trust Notebook

With our final map, we can see the path of BEAM0101, the total footprint of all 8 beams within the granule track, the Harvard Forest boundary, and the observations that are within 5000 meters of the boundary. This visualization helps provide a sense of the spatial distribution but also the density of the observations near the boundary.

Though, keep in mind that we’re only looking at a subset of the data. Zoom out to really get a sense of the number of observations that were collected from this single granule!

To put this into perspective, run the following cell to view some statistics pertaining to the to this single granule that we’ve been working with:

[19]:

print("Statistics for the entire beam:\n")
print_statistics(beam_observations)


print("\nStatistics for observations near Harvard Forest:\n")
print_statistics(selected_beam_observations)

Statistics for the entire beam:

Time of the first observation: 2023-01-31 17:53:04.859189
Time of the last observation: 2023-01-31 18:16:14.850337
Total period (elapsed time): 0:23:09.991148
Total number of Observations: 167,231
The beam traveled a total of 11,535,979 meters (7,168.14 miles)

Statistics for observations near Harvard Forest:

Time of the first observation: 2023-01-31 18:08:22.387697
Time of the last observation: 2023-01-31 18:08:23.643886
Total period (elapsed time): 0:00:01.256189
Total number of Observations: 151
The beam traveled a total of 11,730 meters (7.29 miles)

Wrapping up#

In this Post, we’ve explored how we can use the ipfs-stac library to query GEDI metadata from the Easier STAC Catalog and acquiring the CID of a granule. With the CID, we were able to not only fetch the granule from IPFS but also explore the dataset in more detail and visualizing the data on a map.

An area our team has been discussing is the potential of extracting a subset of data from an HDF5 file, thus reducing the amount of data that needs to be transferred by only fetching the data that you need.

Bonus Content#

Want to learn more about the GEDI mission and dive deeper into the various use cases and applications of the data? Check out our take on the GEDI Tutorials that was created by the ORNL DAAC team. We’ve revised the tutorials to utilize the ipfs-stac library to access and fetch the GEDI data from IPFS.

Tracking with our vision of the DWeb and data democratization in the open science community, as more users check out out the tutorials and access the content, the easier it will be for others to access the data as well!

GEDI - The Force Awakens in Open Science#

GEDI in the Decentralized Web#

Preqrequisites#

Preparing our Notebook#

Searching a STAC Catalog#

Creating our client object#

1. STAC Collections#

2. STAC Items#

Use Case: Monitoring Forest Fragmentation#

Part 1 - Initial data gathering stage#

1. Fetching the Harvard Forest boundary from IPFS#

2. Visualizing the Harvard Forest boundary#

3. Using STAC to query geospatial metadata#

Part 2 - Downloading content from IPFS#

1. Fetching a granule from IPFS#

2. Inspecting the fetched content#

TIP 👇#

3. Exploring the granule metadata#

Part 3. Data wrangling and visualization#

1. Creating our feature layers#

2. Selecting observations near the Harvard Forest boundary#

3. Putting it all together#

4. Share your work on IPFS#

Wrapping up#

Bonus Content#