Preface

One of the pivotal projects the EASIER Data Initiative has produced is ipfs-stac. This Python library demonstrates the feasibility of onboarding and interfacing with geospatial data on IPFS. It enables developers and researchers to leverage STAC APIs enriched with Filecoin and IPFS metadata to seamlessly fetch, pin, and explore data in a familiar manner. In an ever-changing ecosystem, new advancements, updates, breaking changes, and infrastructure features will continue to emerge. The team has made it a responsibility to keep pace with these changes, which requires our projects to remain flexible. This notebook explores the many new features and changes in version 0.2 of ipfs-stac.

Changes Summary

  1. When fetching content, the file size is now human-readable (progress is tracked in megabytes)

  2. New search functionality: the searchSTAC method has been added to the Web3 client class and returns a collection of items

    1. Users can now pass in many of the standard query parameters to search a STAC catalog

  3. Added parameters for content uploads to IPFS:

    1. By default, content is now assigned CIDv1 identifiers

    2. Added an option to choose whether to pin content to your IPFS node

    3. Added an option to add a Mutable File System (MFS) reference to the content on upload

    4. Added an option to provide a filename for uploaded content

      1. When a file is uploaded, its filename is extracted automatically. You can override it by passing a value to this parameter.

  4. Optimized the functions that start and stop the IPFS daemon

  5. Assets are no longer fetched by default

  6. Added a getAssetNames function to retrieve the asset names from a collection or item

  7. Added a Web3 class property that automatically retrieves all collection IDs from the STAC endpoint when the client is instantiated

  8. pinned_list returns recursive pins by default and accepts new pin_type and names arguments
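The move to human-readable progress can be illustrated with a small standalone sketch. It assumes 1 MB = 1,048,576 bytes and uses hypothetical helper names (human_readable_mb, track_progress); ipfs-stac's own formatting and divisor may differ.

```python
from typing import Iterable, Iterator


def human_readable_mb(num_bytes: int) -> str:
    """Format a raw byte count as megabytes with two decimals (1 MB = 1024 * 1024 bytes here)."""
    return f"{num_bytes / (1024 * 1024):.2f} MB"


def track_progress(chunks: Iterable[bytes], total_bytes: int) -> Iterator[str]:
    """Yield a 'received / total' progress line after each downloaded chunk."""
    received = 0
    for chunk in chunks:
        received += len(chunk)
        yield f"{human_readable_mb(received)} / {human_readable_mb(total_bytes)}"


if __name__ == "__main__":
    # Two simulated 512 KiB chunks standing in for a streamed IPFS download.
    fake_chunks = [b"\x00" * 524_288, b"\x00" * 524_288]
    for line in track_progress(fake_chunks, total_bytes=1_048_576):
        print(line)
```

Reporting cumulative megabytes rather than raw byte counts keeps progress output short and readable for multi-gigabyte geospatial assets.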

Environment Setup

1 - Install the IPFS Kubo CLI (if you haven’t already). This allows you to run an IPFS node on your local machine.

2 - Set up a Jupyter Notebook environment. A convenient way to do this is with the Jupyter integration in Visual Studio Code.

3 - Run pip install ipfs-stac to install the latest version of the library.
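Before running the cells below, it can help to confirm that both prerequisites are in place. This is a minimal preflight sketch using only the standard library; the helper names are hypothetical (not part of ipfs-stac), and it assumes the Kubo binary is named ipfs and is on your PATH.

```python
import importlib.util
import shutil
import subprocess
from typing import Optional


def kubo_version() -> Optional[str]:
    """Return the output of `ipfs --version`, or None if the Kubo CLI is not on PATH."""
    ipfs_path = shutil.which("ipfs")
    if ipfs_path is None:
        return None
    result = subprocess.run([ipfs_path, "--version"], capture_output=True, text=True, check=False)
    return result.stdout.strip() or None


def ipfs_stac_installed() -> bool:
    """Check whether the ipfs_stac package can be imported, without actually importing it."""
    return importlib.util.find_spec("ipfs_stac") is not None


if __name__ == "__main__":
    print("Kubo CLI:", kubo_version() or "not found (install Kubo first)")
    print("ipfs-stac installed:", ipfs_stac_installed())
```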

[2]:
from ipfs_stac.client import Web3, Asset

### Initialize client
easier = Web3(local_gateway="localhost", gateway_port=8081, stac_endpoint="https://stac.easierdata.info")
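The local_gateway and gateway_port arguments point the client at your node's HTTP gateway. Gateways conventionally serve content at paths of the form http://<host>:<port>/ipfs/<cid>; the sketch below builds such a URL with a hypothetical gateway_url helper, and how ipfs-stac composes its URLs internally is an assumption here. Note that Kubo's gateway listens on port 8080 by default, so gateway_port=8081 assumes your node's Addresses.Gateway setting was changed to match.

```python
from typing import Optional


def gateway_url(host: str, port: int, cid: str, path: Optional[str] = None) -> str:
    """Build a path-style gateway URL of the form http://host:port/ipfs/<cid>[/path]."""
    url = f"http://{host}:{port}/ipfs/{cid}"
    if path:
        url += "/" + path.lstrip("/")
    return url


if __name__ == "__main__":
    # "bafyexamplecid" is a placeholder, not a real CID.
    print(gateway_url("localhost", 8081, "bafyexamplecid", "B01.TIF"))
```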

Attributes added to the Web3 class

Two attributes have been added to the Web3 class to give a clearer picture of its current configuration and to support high-level exploration of the STAC endpoint:

  1. Added client attribute: an instance of a PySTAC catalog client that supports a variety of additional API functionality.

  2. Added collections attribute: a list of unique collection identifiers that enables discovery of additional collection metadata.
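The collections attribute amounts to listing the id of every collection the STAC API advertises. Per the STAC API specification, GET <stac_endpoint>/collections returns a JSON object with a collections array; the sketch below extracts unique ids from such a body. The collection_ids helper is hypothetical, and ipfs-stac may obtain the list through its pystac client instead.

```python
import json
from typing import Any, Dict, List


def collection_ids(collections_response: Dict[str, Any]) -> List[str]:
    """Return unique collection ids, in first-seen order, from a STAC API /collections body."""
    ids = [c["id"] for c in collections_response.get("collections", []) if "id" in c]
    return list(dict.fromkeys(ids))  # dict keys drop duplicates but keep insertion order


if __name__ == "__main__":
    # In practice this body would come from GET <stac_endpoint>/collections;
    # a trimmed, hand-written example is parsed here so the sketch runs offline.
    body = json.loads("""
    {"collections": [
        {"id": "landsat-c2l1", "type": "Collection"},
        {"id": "GEDI_L4A_AGB_Density_V2_1_2056.v2.1", "type": "Collection"}
    ], "links": []}
    """)
    print(collection_ids(body))
```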

Added methods to start and stop the IPFS daemon

  1. Added startDaemon method to the Web3 class: attempts to start the IPFS daemon on the machine running the Python program. When the program finishes running, the daemon is shut down. If the daemon was already running, it is not tagged for shutdown.

  2. Added shutdown_process method to the Web3 class: shuts down the IPFS daemon on the machine running the Python program.
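A rough sketch of the start/stop lifecycle described above, using only subprocess and atexit. The DaemonManager class and its already_running flag are hypothetical; a real check would probe the node's RPC API (Kubo listens on port 5001 by default), and ipfs-stac's own implementation may differ. With Kubo installed the command would be ["ipfs", "daemon"]; the usage block substitutes a sleeping Python process so the sketch runs anywhere.

```python
import atexit
import subprocess
import sys
from typing import List, Optional


class DaemonManager:
    """Hypothetical helper: start a long-running process and stop it when Python exits."""

    def __init__(self, command: List[str]) -> None:
        self.command = command
        self.process: Optional[subprocess.Popen] = None  # set only if *we* started it

    def start(self, already_running: bool = False) -> bool:
        """Start the daemon unless one is already running; return True if we started it."""
        if already_running:
            # An externally started daemon is not tracked, so shutdown() leaves it alone.
            return False
        self.process = subprocess.Popen(
            self.command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        atexit.register(self.shutdown)  # stop the daemon when the interpreter exits
        return True

    def shutdown(self) -> None:
        """Terminate the daemon we started (no-op if we never started one or it already exited)."""
        if self.process is None or self.process.poll() is not None:
            return
        self.process.terminate()
        try:
            self.process.wait(timeout=10)
        except subprocess.TimeoutExpired:
            self.process.kill()
            self.process.wait()


if __name__ == "__main__":
    manager = DaemonManager([sys.executable, "-c", "import time; time.sleep(60)"])
    print("started:", manager.start())
    manager.shutdown()
    print("exit code:", manager.process.returncode)
```

Only tracking a process the manager itself spawned is what keeps a pre-existing daemon running after the notebook exits.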

[3]:
easier.startDaemon()

print(f"Collections: {easier.collections} \n")

easier.client

Collections: ['landsat-c2l1', 'GEDI_L4A_AGB_Density_V2_1_2056.v2.1']


You can also retrieve a list of Collection objects through the new getCollections() method.

Let’s explore the STAC Collection for GEDI.

[5]:
# Grab the list index containing the GEDI data
gedi_index = easier.collections.index('GEDI_L4A_AGB_Density_V2_1_2056.v2.1')

# Grab the GEDI collection via the list index
easier.getCollections()[gedi_index]


Refactored data fetching

Three notable changes have been made to ipfs-stac, affecting the results of the pinned_list method and what happens when an instance of an Asset is created:

  1. The pinned_list method now has a pin_type argument, which lets you specify which type of pinned content to list.

  2. The pinned_list method now has a names argument (boolean), which controls whether link names associated with each CID are included. You can think of a link name as a label, such as a filename, which makes content much easier to identify.

  3. The data associated with an Asset object is no longer fetched by default. To retrieve the data, call the fetch method and then access it through the data attribute.
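The shift to on-demand fetching is a lazy-loading pattern: creating the object is cheap, and the download happens only when fetch is called (or immediately when fetch_data=True is passed). Below is a minimal, self-contained sketch of that pattern; LazyAsset and its injected fetcher function are hypothetical, and caching the bytes after the first fetch is a design choice of this sketch, not a documented behavior of ipfs-stac's Asset class.

```python
from typing import Callable, Optional


class LazyAsset:
    """Hypothetical stand-in for an asset whose bytes are only downloaded on demand."""

    def __init__(self, name: str, cid: str, fetcher: Callable[[str], bytes],
                 fetch_data: bool = False) -> None:
        self.name = name
        self.cid = cid
        self._fetcher = fetcher  # e.g. a function that reads /ipfs/<cid> from a gateway
        self.data: Optional[bytes] = None  # stays None until fetch() runs
        if fetch_data:
            self.fetch()

    def fetch(self) -> bytes:
        """Download the content once and cache it on the data attribute."""
        if self.data is None:
            self.data = self._fetcher(self.cid)
        return self.data


if __name__ == "__main__":
    calls = []

    def fake_fetcher(cid: str) -> bytes:
        calls.append(cid)
        return b"GeoTIFF bytes for " + cid.encode()

    asset = LazyAsset("SAA", "bafyexamplecid", fake_fetcher)
    print("Before:", asset.data)
    asset.fetch()
    asset.fetch()  # second call reuses the cached bytes
    print("After:", len(asset.data), "bytes; fetcher calls:", len(calls))
```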

pinned_list

This method fetches pinned CIDs from the configured node. It now accepts two arguments:

  1. pin_type - (optional string): The type of pinned CIDs to list: direct, indirect, recursive, or all. Defaults to recursive (it previously defaulted to all).

  2. names - (optional boolean): Whether to include pin/link names alongside the CIDs in the output JSON. Defaults to False.
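Listing pins by type maps naturally onto Kubo's RPC pin/ls endpoint, whose type argument accepts the same four values. The sketch below assumes the client talks to the RPC API at its default address (127.0.0.1:5001) and that a names=true response includes a Name field per CID, as in recent Kubo releases; pin_ls_url and parse_pin_ls are hypothetical helpers, not ipfs-stac functions.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import urlencode

VALID_PIN_TYPES = {"direct", "indirect", "recursive", "all"}


def pin_ls_url(pin_type: str = "recursive", names: bool = False,
               api: str = "http://127.0.0.1:5001") -> str:
    """Build a Kubo RPC pin/ls URL (the RPC API expects it to be sent as a POST)."""
    if pin_type not in VALID_PIN_TYPES:
        raise ValueError(f"pin_type must be one of {sorted(VALID_PIN_TYPES)}, got {pin_type!r}")
    query = urlencode({"type": pin_type, "names": str(names).lower()})
    return f"{api}/api/v0/pin/ls?{query}"


def parse_pin_ls(body: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Flatten a pin/ls response into (cid, pin type, name) tuples; name is '' when absent."""
    return [
        (cid, info.get("Type", ""), info.get("Name", ""))
        for cid, info in body.get("Keys", {}).items()
    ]


if __name__ == "__main__":
    print(pin_ls_url("indirect", names=False))
    # Hand-written response in the assumed names=true shape (placeholder CID).
    sample = {"Keys": {"bafyexamplecid": {"Type": "recursive", "Name": "image.tiff"}}}
    print(parse_pin_ls(sample))
```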

[11]:
## Usage of updated pinned_list method
recursive_pins = easier.pinned_list()

indirect_pins = easier.pinned_list(pin_type="indirect", names=False)

print(f"Recursive pins: {len(recursive_pins)}")
print(f"Indirect pins: {len(indirect_pins)}")

Recursive pins: 297
Indirect pins: 18900
[ ]:
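## Fetching data for an asset
# search_items is assumed to hold the items returned by an earlier easier.searchSTAC(...) query (that cell is not shown here)
demo_asset = easier.getAssetFromItem(search_items[0], asset_name="SAA")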
print(f"Before: {demo_asset.data}")

demo_asset.fetch()

print(f"After: {len(demo_asset.data)}")

[ ]:
# Alternatively, you can force data to be fetched through the fetch_data argument of getAssetFromItem
demo_asset = easier.getAssetFromItem(search_items[0], asset_name="SAA", fetch_data=True)
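Asset names such as "SAA" are simply the keys of the assets mapping in a STAC Item (Collections may carry an assets mapping too), which is what the new getAssetNames function surfaces. The sketch below reads those keys from a raw item dictionary; asset_names is a hypothetical helper, and with a pystac Item object the same keys are available via item.assets.

```python
from typing import Any, Dict, List


def asset_names(stac_object: Dict[str, Any]) -> List[str]:
    """Return the keys of the 'assets' mapping of a STAC Item or Collection dictionary."""
    return list(stac_object.get("assets", {}).keys())


if __name__ == "__main__":
    # Trimmed, hand-written Item; real items carry many more fields and assets.
    item = {
        "type": "Feature",
        "id": "example-item",
        "assets": {
            "SAA": {"href": "https://example.com/SAA.TIF"},
            "thumbnail": {"href": "https://example.com/thumb.jpg"},
        },
    }
    print(asset_names(item))  # names usable as asset_name="..." in getAssetFromItem
```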

Added ability to write/upload to the IPFS Mutable File System

The IPFS Mutable File System (MFS) is a powerful feature for organizing data: it exposes content stored on your IPFS node through familiar, file-system-style directories and paths that can be updated over time.

  1. The uploadToIPFS method has been updated to support writing to an MFS path

  2. The Asset class now has an addToMFS method which supports writing to a specific directory with the option of specifying a file name.
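Placing existing content into MFS can be done with Kubo's files API: create the target directory, then copy /ipfs/<cid> to the desired path. The sketch below only builds those RPC URLs (both must be sent as POST requests); the helper names, and the assumption that addToMFS works this way, are illustrative rather than taken from ipfs-stac's source.

```python
import posixpath
from typing import List
from urllib.parse import urlencode


def mfs_destination(directory: str, filename: str) -> str:
    """Join an MFS directory and a file name into an absolute MFS path."""
    return posixpath.join("/", directory.strip("/"), filename)


def mfs_copy_requests(cid: str, directory: str, filename: str,
                      api: str = "http://127.0.0.1:5001") -> List[str]:
    """Kubo RPC calls (sent as POST) that make existing content visible at an MFS path."""
    dest = mfs_destination(directory, filename)
    # files/mkdir with parents=true creates the directory (and any parents) if missing.
    mkdir = urlencode([("arg", posixpath.dirname(dest)), ("parents", "true")])
    # files/cp takes the source and destination as two repeated "arg" parameters.
    copy = urlencode([("arg", f"/ipfs/{cid}"), ("arg", dest)])
    return [f"{api}/api/v0/files/mkdir?{mkdir}", f"{api}/api/v0/files/cp?{copy}"]


if __name__ == "__main__":
    for url in mfs_copy_requests("bafyexamplecid", "images", "example.tiff"):
        print(url)
```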

[ ]:
## Example usage
easier.uploadToIPFS(file_path="./image.tiff", file_name="example.tiff", pin_content=True, mfs_path="images")

demo_asset.addToMFS(filename="blog_post")
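The upload options from the changes summary (CIDv1 by default, optional pinning, an optional MFS path, and a file name taken from the path unless overridden) map onto query parameters of Kubo's RPC add endpoint. The sketch below works under that assumption: build_add_request is hypothetical, its pin default is this sketch's choice, the multipart file upload itself is omitted, and ipfs-stac may pass these options differently. The to-files parameter also requires a reasonably recent Kubo release.

```python
import os
import posixpath
from typing import Optional, Tuple
from urllib.parse import urlencode


def build_add_request(file_path: str, file_name: Optional[str] = None, pin_content: bool = True,
                      mfs_path: Optional[str] = None,
                      api: str = "http://127.0.0.1:5001") -> Tuple[str, str]:
    """Return (name used for the upload, Kubo RPC add URL) for a local file."""
    # The file name defaults to the basename of the path; an explicit file_name overrides it.
    name = file_name or os.path.basename(file_path)
    params = {"cid-version": "1", "pin": str(pin_content).lower()}
    if mfs_path:
        # Ask Kubo to also link the new content into MFS at /<mfs_path>/<name>.
        params["to-files"] = posixpath.join("/", mfs_path.strip("/"), name)
    return name, f"{api}/api/v0/add?{urlencode(params)}"


if __name__ == "__main__":
    # Mirrors the notebook call above: uploadToIPFS(file_path="./image.tiff", ...)
    print(build_add_request("./image.tiff", file_name="example.tiff",
                            pin_content=True, mfs_path="images"))
```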

[5]:
# Finally, shut down the IPFS daemon (this happens automatically at exit if the startDaemon method was used)
easier.shutdown_process()

All in all, the team has produced new features that streamline interfacing with STAC catalogs enriched with IPFS metadata. These changes are a major step toward showcasing the capabilities of decentralized infrastructure when combined with geospatial data. Stay tuned for more posts that highlight these changes in action. For more technical details, keep an eye on the GitHub repository.