Preface
One of the pivotal projects the EASIER Data Initiative has produced is ipfs-stac. The Python library demonstrates the feasibility of onboarding and interfacing with geospatial data on IPFS. It enables developers and researchers to leverage STAC APIs enriched with Filecoin and IPFS metadata to seamlessly fetch, pin, and explore data in a familiar manner. In a fast-moving ecosystem, updates, breaking changes, and new infrastructure features will continue to emerge. The team has made it a responsibility to keep up with these changes so that our projects remain flexible. This notebook explores the many new features and changes in version 0.2 of ipfs-stac.
Changes Summary
- When fetching content, the file size is now human-readable (progress is tracked in megabytes).
- New search functionality via the searchSTAC method added to the Web3 client class.
  - Returns a collection of items.
  - A user can now pass in many of the query parameter options to search a STAC catalog.
- Added parameters for content uploads to IPFS:
  - By default, CIDv1 identifiers are created.
  - Added an option to select whether to pin content to your IPFS node.
  - Added an option to add a Mutable File System (MFS) reference to the content on upload.
  - Added an option to provide a filename for uploaded content. If a user uploads a file, the filename is extracted automatically; you can override it by passing a value to this parameter.
- Optimized the functions that start and stop the IPFS daemon.
- Assets are no longer fetched by default.
- Added the getAssetNames method to retrieve the asset names from a collection or item.
- New Web3 class property that automatically grabs all the collection IDs from the STAC endpoint when instantiated.
- pinned_list returns
Environment Setup
1 - Install the IPFS Kubo CLI (if you haven't already). This will allow you to run an IPFS node on your local machine.
2 - Set up a Jupyter Notebook environment. A convenient method for achieving this is by utilizing the Jupyter integration in Visual Studio Code.
3 - Run pip install ipfs-stac to install the latest version of the library.
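Before running the notebook, it can help to confirm that both prerequisites are in place. The following is a minimal sketch using only the standard library. It assumes the Kubo executable is named ipfs and that the package imports as ipfs_stac (the name used in the next cell); the function name check_environment is hypothetical.

```python
import importlib.util
import shutil


def check_environment(ipfs_binary: str = "ipfs", package: str = "ipfs_stac") -> dict:
    """Report whether the IPFS CLI and the ipfs-stac package appear to be installed."""
    return {
        # shutil.which returns the executable's full path if it is on PATH, else None
        "ipfs_cli_on_path": shutil.which(ipfs_binary) is not None,
        # find_spec returns None when a top-level package cannot be found
        "package_importable": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    for check, ok in check_environment().items():
        print(f"{check}: {'OK' if ok else 'missing'}")
```

Neither check starts a node; it only confirms that the tools are discoverable from the current Python environment.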
[2]:
from ipfs_stac.client import Web3, Asset
### Initialize client
easier = Web3(local_gateway="localhost", gateway_port=8081, stac_endpoint="https://stac.easierdata.info")
Attributes added to the Web3 class
A couple of attributes have been added to the Web3 class to support a deeper understanding of its current configuration and high-level exploration of the STAC endpoint:
- Added client attribute - an instance of a pystac-client catalog client to support a variety of additional API functionality.
- Added collections attribute - a list of unique collection identifiers to enable the discovery of additional collection metadata.
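The idea behind the collections attribute can be approximated with plain Python. This sketch is not the ipfs-stac implementation; the helper unique_collection_ids is hypothetical and only assumes objects exposing an id attribute, such as the pystac Collection objects yielded by pystac_client.Client.get_collections().

```python
from types import SimpleNamespace
from typing import Iterable, List


def unique_collection_ids(collections: Iterable) -> List[str]:
    """Return collection ids in first-seen order, skipping duplicates."""
    seen = set()
    ids = []
    for collection in collections:
        # Each object only needs an `id` attribute, as pystac Collection objects have
        if collection.id not in seen:
            seen.add(collection.id)
            ids.append(collection.id)
    return ids


# Stand-in objects so the sketch runs offline; with pystac-client you could pass
# Client.open("https://stac.easierdata.info").get_collections() (requires network access).
demo = [
    SimpleNamespace(id="landsat-c2l1"),
    SimpleNamespace(id="landsat-c2l1"),
    SimpleNamespace(id="GEDI_L4A_AGB_Density_V2_1_2056.v2.1"),
]
print(unique_collection_ids(demo))
```

Preserving first-seen order keeps the list aligned with the order in which the endpoint reports collections.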
Added methods to start and stop IPFS daemon
- Added startDaemon method to the Web3 class - attempts to start the IPFS daemon on the device running the Python program. After the program has finished running, the daemon will be shut down. If the daemon is already running, it will not be tagged.
- Added shutdown_process method to the Web3 class - shuts down the IPFS daemon on the device running the Python program.
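The start/stop behaviour described above can be sketched with the subprocess module. This is an illustration, not the ipfs-stac code: the class name DaemonManager is hypothetical, and unlike the library it does not try to detect a daemon started outside Python. It simply stops only the process it launched itself, and registers that shutdown with atexit so it also happens when the program ends.

```python
import atexit
import subprocess
import sys
from typing import Optional, Sequence


class DaemonManager:
    """Start a background process and stop it again, but only if we started it."""

    def __init__(self, command: Sequence[str] = ("ipfs", "daemon")):
        self.command = list(command)
        self.process: Optional[subprocess.Popen] = None

    def is_running(self) -> bool:
        # poll() returns None while the child process is still alive
        return self.process is not None and self.process.poll() is None

    def start(self) -> None:
        if self.is_running():
            return  # already started by this manager; nothing to do
        self.process = subprocess.Popen(
            self.command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # Make sure the process we launched is cleaned up when Python exits
        atexit.register(self.stop)

    def stop(self, timeout: float = 10.0) -> None:
        if not self.is_running():
            return
        self.process.terminate()
        try:
            self.process.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            self.process.kill()  # escalate if the process ignores the terminate signal
            self.process.wait()


if __name__ == "__main__":
    # A sleeping Python process stands in for `ipfs daemon` so this runs without Kubo
    manager = DaemonManager([sys.executable, "-c", "import time; time.sleep(30)"])
    manager.start()
    print("running:", manager.is_running())
    manager.stop()
    print("running:", manager.is_running())
```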
[3]:
easier.startDaemon()
print(f"Collections: {easier.collections} \n")
easier.client
Collections: ['landsat-c2l1', 'GEDI_L4A_AGB_Density_V2_1_2056.v2.1']
[3]:
- type "Catalog"
- id "stac-fastapi"
- stac_version "1.0.0"
- description "stac-fastapi"
links[] 11 items
0
- rel "self"
- href "https://stac.easierdata.info"
- type "application/json"
1
- rel "root"
- href "https://stac.easierdata.info/api/v1/pgstac/"
- type "application/json"
- title "stac-fastapi"
2
- rel "data"
- href "https://stac.easierdata.info/api/v1/pgstac/collections"
- type "application/json"
3
- rel "conformance"
- href "https://stac.easierdata.info/api/v1/pgstac/conformance"
- type "application/json"
- title "STAC/OGC conformance classes implemented by this server"
4
- rel "search"
- href "https://stac.easierdata.info/api/v1/pgstac/search"
- type "application/geo+json"
- title "STAC search"
- method "GET"
5
- rel "search"
- href "https://stac.easierdata.info/api/v1/pgstac/search"
- type "application/geo+json"
- title "STAC search"
- method "POST"
6
- rel "http://www.opengis.net/def/rel/ogc/1.0/queryables"
- href "https://stac.easierdata.info/api/v1/pgstac/queryables"
- type "application/schema+json"
- title "Queryables"
- method "GET"
7
- rel "child"
- href "https://stac.easierdata.info/api/v1/pgstac/collections/landsat-c2l1"
- type "application/json"
- title "Landsat Collection 2 Level-1 Product"
8
- rel "child"
- href "https://stac.easierdata.info/api/v1/pgstac/collections/GEDI_L4A_AGB_Density_V2_1_2056.v2.1"
- type "application/json"
- title "GEDI L4A Footprint Level Aboveground Biomass Density, Version 2.1"
9
- rel "service-desc"
- href "https://stac.easierdata.info/api/v1/pgstac/api"
- type "application/vnd.oai.openapi+json;version=3.0"
- title "OpenAPI service description"
10
- rel "service-doc"
- href "https://stac.easierdata.info/api/v1/pgstac/api.html"
- type "text/html"
- title "OpenAPI service documentation"
conformsTo[] 18 items
- 0 "https://api.stacspec.org/v1.0.0-rc.2/item-search#filter"
- 1 "https://api.stacspec.org/v1.0.0/collections/extensions/transaction"
- 2 "http://www.opengis.net/spec/cql2/1.0/conf/basic-cql2"
- 3 "http://www.opengis.net/spec/ogcapi-features-1/1.0/conf/geojson"
- 4 "https://api.stacspec.org/v1.0.0/item-search#query"
- 5 "http://www.opengis.net/spec/ogcapi-features-3/1.0/conf/filter"
- 6 "http://www.opengis.net/spec/cql2/1.0/conf/cql2-text"
- 7 "http://www.opengis.net/spec/ogcapi-features-1/1.0/conf/core"
- 8 "https://api.stacspec.org/v1.0.0/ogcapi-features"
- 9 "https://api.stacspec.org/v1.0.0/item-search#fields"
- 10 "https://api.stacspec.org/v1.0.0/ogcapi-features/extensions/transaction"
- 11 "https://api.stacspec.org/v1.0.0/core"
- 12 "https://api.stacspec.org/v1.0.0/item-search"
- 13 "http://www.opengis.net/spec/ogcapi-features-1/1.0/conf/oas30"
- 14 "http://www.opengis.net/spec/cql2/1.0/conf/cql2-json"
- 15 "http://www.opengis.net/spec/ogcapi-features-3/1.0/conf/features-filter"
- 16 "https://api.stacspec.org/v1.0.0/collections"
- 17 "https://api.stacspec.org/v1.0.0/item-search#sort"
- title "stac-fastapi"
You can also retrieve a list of Collection objects through the new getCollections() method.
Let's explore the STAC Collection for GEDI.
[5]:
# Grab the list index containing the GEDI data
gedi_index = easier.collections.index('GEDI_L4A_AGB_Density_V2_1_2056.v2.1')
# Grab the GEDI collection via the list index
easier.getCollections()[gedi_index]
[5]:
- type "Collection"
- id "GEDI_L4A_AGB_Density_V2_1_2056.v2.1"
- stac_version "1.0.0"
- description "This dataset contains Global Ecosystem Dynamics Investigation (GEDI) Level 4A (L4A) Version 2 predictions of the aboveground biomass density (AGBD; in Mg/ha) and estimates of the prediction standard error within each sampled geolocated laser footprint. In this version, the granules are in sub-orbits. The algorithm setting group selection used for GEDI02_A Version 2 has been modified for Evergreen Broadleaf Trees in South America to reduce false positive errors resulting from the selection of waveform modes above ground elevation as the lowest mode. The footprints are located within the global latitude band observed by the International Space Station (ISS), nominally 51.6 degrees N and S and reported for the period 2019-04-18 to 2023-03-16. The GEDI instrument consists of three lasers producing a total of eight beam ground transects, which instantaneously sample eight ~25 m footprints spaced approximately every 60 m along-track. The GEDI beam transects are spaced approximately 600 m apart on the Earth's surface in the cross-track direction, for an across-track width of ~4.2 km. Footprint AGBD was derived from parametric models that relate simulated GEDI Level 2A (L2A) waveform relative height (RH) metrics to field plot estimates of AGBD. Height metrics from simulated waveforms associated with field estimates of AGBD from multiple regions and plant functional types (PFTs) were compiled to generate a calibration dataset for models representing the combinations of world regions and PFTs (i.e., deciduous broadleaf trees, evergreen broadleaf trees, evergreen needleleaf trees, deciduous needleleaf trees, and the combination of grasslands, shrubs, and woodlands). For each of the eight beams, additional data are reported with the AGBD estimates, including the associated uncertainty metrics, quality flags, model inputs, and other information about the GEDI L2A waveform for this selected algorithm setting group. 
Model inputs include the scaled and transformed GEDI L2A RH metrics, footprint geolocation variables and land cover input data including PFTs and the world region identifiers. Additional model outputs include the AGBD predictions for each of the six GEDI L2A algorithm setting groups with AGBD in natural and transformed units and associated prediction uncertainty for each GEDI L2A algorithm setting group. Providing these ancillary data products will allow users to evaluate and select alternative algorithm setting groups. Also provided are outputs of parameters and variables from the L4A models used to generate AGBD predictions that are required as input to the GEDI04_B algorithm to generate 1-km gridded products."
links[] 7 items
0
- rel "items"
- href "https://stac.easierdata.info/api/v1/pgstac/collections/GEDI_L4A_AGB_Density_V2_1_2056.v2.1/items"
- type "application/geo+json"
1
- rel "parent"
- href "https://stac.easierdata.info/api/v1/pgstac/"
- type "application/json"
2
- rel "root"
- href "https://stac.easierdata.info"
- type "application/json"
- title "stac-fastapi"
3
- rel "self"
- href "https://stac.easierdata.info/api/v1/pgstac/collections/GEDI_L4A_AGB_Density_V2_1_2056.v2.1"
- type "application/json"
4
- rel "about"
- href "https://cmr.earthdata.nasa.gov/search/concepts/C2237824918-ORNL_CLOUD.html"
- type "text/html"
- title "HTML metadata for collection"
5
- rel "via"
- href "https://cmr.earthdata.nasa.gov/search/concepts/C2237824918-ORNL_CLOUD.json"
- type "application/json"
- title "CMR JSON metadata for collection"
6
- rel "http://www.opengis.net/def/rel/ogc/1.0/queryables"
- href "https://stac.easierdata.info/api/v1/pgstac/collections/GEDI_L4A_AGB_Density_V2_1_2056.v2.1/queryables"
- type "application/schema+json"
- title "Queryables"
- title "GEDI L4A Footprint Level Aboveground Biomass Density, Version 2.1"
extent
spatial
bbox[] 1 items
0[] 4 items
- 0 -180.0
- 1 -53.0
- 2 180.0
- 3 54.0
temporal
interval[] 1 items
0[] 2 items
- 0 "2019-04-17T23:00:00Z"
- 1 "2023-03-16T21:33:25Z"
- license "not-provided"
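Looking up the GEDI collection by list position assumes that easier.collections and easier.getCollections() return collections in the same order. A dictionary keyed by collection id sidesteps that assumption. The sketch below uses stand-in objects and a hypothetical helper, collections_by_id; with the real client you would pass easier.getCollections(), assuming its elements expose an id attribute as pystac Collection objects do.

```python
from types import SimpleNamespace
from typing import Dict, Iterable


def collections_by_id(collections: Iterable) -> Dict[str, object]:
    """Index collection objects by their `id` attribute."""
    return {collection.id: collection for collection in collections}


# Stand-ins for the objects returned by easier.getCollections()
demo = [
    SimpleNamespace(id="landsat-c2l1", title="Landsat Collection 2 Level-1 Product"),
    SimpleNamespace(
        id="GEDI_L4A_AGB_Density_V2_1_2056.v2.1",
        title="GEDI L4A Footprint Level Aboveground Biomass Density, Version 2.1",
    ),
]
lookup = collections_by_id(demo)
print(lookup["GEDI_L4A_AGB_Density_V2_1_2056.v2.1"].title)
```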
Enhancements to search
The team has added methods to the Web3 class that support searching and exploring a STAC catalog that may not be entirely managed by the user. These additions make it easier to query and index unfamiliar assets:
- Added searchSTAC method to the Web3 class - searches a STAC catalog through the pystac-client attribute, so you can use exactly the same parameters.
- Added getAssetNames method to the Web3 class - returns a list of asset names given a collection or item.
searchSTAC
With the searchSTAC method, we can define our search parameters more precisely. Using the STAC API Query extension, we can express filter logic on item properties and pass it to the query parameter.
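To make the query semantics concrete, here is a local sketch of how a query dictionary like query_params in the next cell is read: every operator listed under a property must hold, so the conditions combine with a logical AND. The server evaluates this for you during the search; the function matches_query is hypothetical, covers only a subset of the Query extension's operators, and treats a missing property as a non-match.

```python
import operator
from typing import Any, Dict

# Subset of the STAC API Query extension operators
OPERATORS = {
    "eq": operator.eq,
    "neq": operator.ne,
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
    "in": lambda value, options: value in options,
}


def matches_query(properties: Dict[str, Any], query: Dict[str, Dict[str, Any]]) -> bool:
    """Return True if an item's properties satisfy every condition in a query dict."""
    for field, conditions in query.items():
        if field not in properties:
            return False  # in this sketch, a missing property never matches
        for op_name, expected in conditions.items():
            if not OPERATORS[op_name](properties[field], expected):
                return False
    return True


query_params = {"eo:cloud_cover": {"gte": 0, "lte": 20}}
print(matches_query({"eo:cloud_cover": 12.5}, query_params))  # True
print(matches_query({"eo:cloud_cover": 47.0}, query_params))  # False
```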
The following is an example of using the searchSTAC method:
[8]:
# Select the index representing the Landsat collection id
landsat_index = easier.collections.index('landsat-c2l1')
# Query parameters: cloud cover between 0% and 20%
query_params = {"eo:cloud_cover": {"gte": 0, "lte": 20}}
# Search the entire Landsat collection
search_items = easier.searchSTAC(collections=easier.collections[landsat_index])
print(f"Total scenes for {easier.collections[landsat_index]}: {len(search_items)}.")
# Search the Landsat collection with filter logic
search_items = easier.searchSTAC(
    collections=easier.collections[landsat_index], query=query_params
)
print(f"Total scenes with 0% to 20% cloud coverage: {len(search_items)}")
Total scenes for landsat-c2l1: 465.
Total scenes with 0% to 20% cloud coverage: 70
Let’s refine our search with some spatial features by identifying the items that intersect the mid-Atlantic states.
For this example, we will pass in the geometry from a GeoJSON object.
[9]:
geojson_feature = {
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {},
"geometry": {
"coordinates": [
[
[
-73.34480105418365,
45.04588475346222
],
[
-79.11384967787046,
43.52056357721321
],
[
-80.47118272957593,
42.28803694032902
],
[
-80.5365358352876,
40.79229977072049
],
[
-82.49364337671977,
38.39703841681529
],
[
-81.93909548527188,
37.5876942513829
],
[
-83.66567377966864,
36.58712944146795
],
[
-75.852486772675,
36.563723138993936
],
[
-75.01397348543532,
38.467247825352786
],
[
-73.86651757284588,
40.753014073703696
],
[
-73.34480105418365,
45.04588475346222
]
]
],
"type": "Polygon"
}
}
],
"bbox": [
-90.22893757804734,
35.65674455842897,
-76.26898766968725,
43.58558355601974,
],
}
search_items = easier.searchSTAC(
    collections="landsat-c2l1",
    query=query_params,
    intersects=geojson_feature["features"][0]["geometry"],
)
print(
    f"Total scenes with 0% to 20% cloud coverage AND also intersect our area of interest: {len(search_items)}"
)
Total scenes with 0% to 20% cloud coverage AND also intersect our area of interest: 2
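The search above only uses the polygon geometry; the bbox member of geojson_feature is never passed in. It also does not appear to match the polygon: the polygon's coordinates span roughly 83.7°W to 73.3°W and 36.6°N to 45.0°N, while the bbox starts at about 90.2°W. The hypothetical helper below recomputes a bounding box directly from a GeoJSON Polygon's coordinates, which is a quick way to check such values.

```python
from typing import Tuple


def polygon_bbox(polygon: dict) -> Tuple[float, float, float, float]:
    """Compute (min_lon, min_lat, max_lon, max_lat) for a GeoJSON Polygon."""
    # Polygon coordinates are a list of linear rings, each a list of [lon, lat] positions
    positions = [pos for ring in polygon["coordinates"] for pos in ring]
    lons = [pos[0] for pos in positions]
    lats = [pos[1] for pos in positions]
    return (min(lons), min(lats), max(lons), max(lats))


square = {"type": "Polygon", "coordinates": [[[0, 0], [2, 0], [2, 1], [0, 1], [0, 0]]]}
print(polygon_bbox(square))  # (0, 0, 2, 1)
```

If searchSTAC forwards its keyword arguments to pystac-client as described above, a box computed this way could be passed as bbox= for a coarser, rectangular search; that forwarding is an assumption based on the description, not something verified here.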
getAssetNames
So far we've taken advantage of the underlying item metadata and spatial components to select items, but what can we actually download?
To get an idea, we can use the getAssetNames method to get a list of the asset IDs. Knowing the asset IDs allows us to drill down into the asset metadata and pull out the reference links to download the data.
What kind of assets can we retrieve from our search results?
[10]:
easier.getAssetNames(search_items)
[10]:
['ANG.txt',
'MTL.json',
'MTL.txt',
'MTL.xml',
'SAA',
'SZA',
'VAA',
'VZA',
'blue',
'cirrus',
'coastal',
'green',
'index',
'lwir11',
'lwir12',
'nir08',
'pan',
'qa_pixel',
'qa_radsat',
'red',
'reduced_resolution_browse',
'swir16',
'swir22',
'thumbnail']
Eagle-eyed readers may have noticed that search_items contained more than one item. ipfs-stac eases the process of pulling out asset names from a collection of items. It supports getting the asset names from CollectionClient, ItemCollection, and Item objects. For iterable objects, it returns the unique asset names across all items.
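The de-duplication step can be sketched in a few lines. This is not the library's getAssetNames; the helper unique_asset_names is hypothetical and handles only a single Item or an iterable of Items (such as an ItemCollection), relying on pystac Items storing assets in a dict keyed by asset name. It does not cover CollectionClient objects, whose assets attribute refers to collection-level assets rather than item assets.

```python
from types import SimpleNamespace
from typing import Iterable, List, Union


def unique_asset_names(items: Union[object, Iterable]) -> List[str]:
    """Collect the sorted, de-duplicated asset keys from one item or many."""
    # A single item exposes `assets` directly; otherwise treat the input as iterable
    if hasattr(items, "assets"):
        items = [items]
    names = set()
    for item in items:
        names.update(item.assets.keys())
    return sorted(names)


# Stand-in items; real pystac Items map asset names to Asset objects
items = [
    SimpleNamespace(assets={"red": None, "green": None, "MTL.json": None}),
    SimpleNamespace(assets={"red": None, "blue": None}),
]
print(unique_asset_names(items))  # ['MTL.json', 'blue', 'green', 'red']
```

Python's default string sort places uppercase names before lowercase ones, which is the same ordering seen in the asset list above.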
Refactored data fetching
Two critical changes have been made to ipfs-stac, which affect the results of the pinned_list method and what happens when an instance of an Asset is created:
- The pinned_list method:
  - You can now specify which type of pinned content to list with the pin_type argument.
  - It now has a names argument (boolean), which dictates whether to include the link names associated with each CID. You can think of link names as labels, such as filenames, making it much easier to identify content with human-readable names.
- The data associated with an Asset object is no longer fetched by default. To retrieve the data, call the fetch method and then access it through the data attribute.
pinned_list
This method fetches pinned CIDs from the configured node. It now takes two arguments:
- pin_type (optional string): the type of pinned CIDs to list: direct, indirect, recursive, or all. Defaults to recursive (it previously defaulted to all).
- names (optional boolean): whether to include pin/link names alongside the CIDs in the output JSON. Defaults to False.
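At the node level, listing pins corresponds to Kubo's pin/ls RPC endpoint. The sketch below only builds the request URL and validates pin_type against the four types listed above. It assumes Kubo's default RPC address 127.0.0.1:5001 (the notebook's gateway_port=8081 is assumed to refer to the separate HTTP gateway) and a Kubo release recent enough to accept the names option. The function pin_ls_url is hypothetical and is not how ipfs-stac is necessarily implemented.

```python
from urllib.parse import urlencode

VALID_PIN_TYPES = {"direct", "indirect", "recursive", "all"}


def pin_ls_url(api_base: str = "http://127.0.0.1:5001",
               pin_type: str = "recursive", names: bool = False) -> str:
    """Build a Kubo RPC URL for listing pins of a given type."""
    if pin_type not in VALID_PIN_TYPES:
        raise ValueError(f"pin_type must be one of {sorted(VALID_PIN_TYPES)}, got {pin_type!r}")
    params = {"type": pin_type, "names": str(names).lower()}
    return f"{api_base}/api/v0/pin/ls?{urlencode(params)}"


print(pin_ls_url())
print(pin_ls_url(pin_type="indirect", names=True))
# Kubo's RPC API only accepts POST, e.g. (requires a running node, so not executed here):
# urllib.request.urlopen(urllib.request.Request(pin_ls_url(), method="POST"))
```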
[11]:
## Usage of updated pinned_list method
recursive_pins = easier.pinned_list()
indirect_pins = easier.pinned_list(pin_type="indirect", names=False)
print(f"Recursive pins: {len(recursive_pins)}")
print(f"Indirect pins: {len(indirect_pins)}")
Recursive pins: 297
Indirect pins: 18900
[ ]:
## Fetching data for an asset
demo_asset = easier.getAssetFromItem(search_items[0], asset_name="SAA")
print(f"Before: {demo_asset.data}")
demo_asset.fetch()
print(f"After: {len(demo_asset.data)}")
[ ]:
# Alternatively, force the data to be fetched up front with the fetch_data argument of getAssetFromItem
demo_asset = easier.getAssetFromItem(search_items[0], asset_name="SAA", fetch_data=True)
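The switch from eager to lazy fetching can be illustrated with a small stand-alone class. LazyAsset and its loader callback are hypothetical names: in ipfs-stac the real Asset.fetch() retrieves content over IPFS, whereas here the loader is injected so the sketch runs offline. The sketch also caches the bytes after the first call, which may or may not match the library's behaviour.

```python
from typing import Callable, Optional


class LazyAsset:
    """Hold a reference to content and only download it when fetch() is called."""

    def __init__(self, cid: str, loader: Callable[[str], bytes]):
        self.cid = cid
        self._loader = loader
        self.data: Optional[bytes] = None  # nothing downloaded yet

    def fetch(self) -> bytes:
        # Download on the first call only; later calls reuse the cached bytes
        if self.data is None:
            self.data = self._loader(self.cid)
        return self.data


calls = []


def fake_loader(cid: str) -> bytes:
    """Offline stand-in for an IPFS download; records each request."""
    calls.append(cid)
    return b"example bytes for " + cid.encode()


asset = LazyAsset("bafy-example", fake_loader)  # placeholder, not a real CID
print(asset.data)  # None
asset.fetch()
asset.fetch()
print(len(asset.data), len(calls))
```

Deferring the download keeps item and asset exploration cheap; you only pay for the transfer when the bytes are actually needed.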
Added ability to write/upload to IPFS Mutable File System
The IPFS Mutable File System (MFS) is a powerful feature for organizing content on your node with familiar directory paths and filenames, even though the underlying data is content-addressed.
- The uploadToIPFS method has been updated to support writing to an MFS path.
- The Asset class now has an addToMFS method, which supports writing to a specific directory with the option of specifying a file name.
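For readers curious about what happens at the node level, the helpers below build Kubo RPC requests that correspond to the options described in this post: CIDv1 and pinning on add, and linking existing content into an MFS directory. It is an assumption that ipfs-stac uses these particular endpoints. The helper names are hypothetical, 127.0.0.1:5001 is Kubo's default RPC address, every RPC call must be sent as an HTTP POST, and the add endpoint additionally expects the file as multipart form data, which is not shown.

```python
from urllib.parse import urlencode

API = "http://127.0.0.1:5001/api/v0"  # assumed default Kubo RPC address


def add_url(cid_version: int = 1, pin: bool = True) -> str:
    """URL for /add; the file itself must be sent as multipart form data (not shown)."""
    return f"{API}/add?{urlencode({'cid-version': cid_version, 'pin': str(pin).lower()})}"


def mfs_mkdir_url(path: str) -> str:
    """URL that creates an MFS directory, including any missing parents."""
    return f"{API}/files/mkdir?{urlencode({'arg': path, 'parents': 'true'})}"


def mfs_cp_url(cid: str, dest_path: str) -> str:
    """URL that links existing content (by CID) into MFS at dest_path."""
    # files/cp takes two positional `arg` values: the source, then the destination
    return f"{API}/files/cp?{urlencode([('arg', f'/ipfs/{cid}'), ('arg', dest_path)])}"


print(add_url())
print(mfs_mkdir_url("/images"))
print(mfs_cp_url("bafyexample", "/images/example.tiff"))  # placeholder, not a real CID
```

Copying into MFS does not duplicate the data; it adds a named reference in the node's local file tree that points at the existing content.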
[ ]:
## Example usage
easier.uploadToIPFS(file_path="./image.tiff", file_name="example.tiff", pin_content=True, mfs_path="images")
demo_asset.addToMFS(filename="blog_post")
[5]:
# Finally, shut down the IPFS daemon (this happens automatically if the startDaemon method was used)
easier.shutdown_process()
All in all, the team has produced new features that streamline interfacing with STAC catalogs enriched with IPFS metadata. These changes are a big step toward showcasing the capabilities of decentralized infrastructure when combined with geospatial data. Stay tuned for more posts that highlight these changes in action. For more technical details, keep an eye on the GitHub repository.