From Non-Fungible to Fungible and Composable Data Tokens

Within this Data Economy, the data-oriented blockchain substrate handles access control. It is built into Ocean Protocol, whose v1.0 shipped in July 2019. (Network stats. Code.) Here’s the architecture of that substrate.

The second key part of the Data Economy, its lifeblood, is data tokens. The rest of this post builds on the previously introduced non-fungible data tokens to round out the other key data tokens: fungible and composable data tokens.

  1. A driving use case for these tokens is to improve data custody. The aim is to get bank-grade security over access control by leveraging crypto hardware wallets, multisig paradigms, and the like. Part 1 gave details.
  2. Non-fungible data tokens (NFDTs) will be able to flow through existing crypto infrastructure for existing NFT use cases: NFT marketplaces, loans, and more.
  3. Non-fungible data tokens will serve as a foundation for fungible and composable data tokens, as we will see later in this essay.

Here’s the main idea: use a non-fungible token (NFT — ERC721) to wrap Ocean-style data access controls. Part 1 gave details.
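To make the idea concrete, here is a minimal Python sketch (not Ocean’s actual contracts; all class names, addresses, and DIDs are hypothetical) of an ERC721-style registry where each unique token ID carries a data-access grant, and only the current holder can redeem it:

```python
# Hypothetical sketch: an ERC721-style registry where each token ID
# carries a data-access grant (dataset DID + access rights).
class NFDTRegistry:
    def __init__(self):
        self.owner_of = {}   # token_id -> owner address
        self.grant_of = {}   # token_id -> access grant
        self.next_id = 0

    def mint(self, to, dataset_did, rights):
        token_id = self.next_id
        self.next_id += 1
        self.owner_of[token_id] = to
        self.grant_of[token_id] = {"did": dataset_did, "rights": rights}
        return token_id

    def transfer(self, token_id, frm, to):
        assert self.owner_of[token_id] == frm, "not the owner"
        self.owner_of[token_id] = to

    def can_access(self, token_id, who):
        # Only the current holder may redeem the access grant.
        return self.owner_of.get(token_id) == who

registry = NFDTRegistry()
tid = registry.mint("0xAlice", "did:op:1234", "download")
registry.transfer(tid, "0xAlice", "0xBob")
print(registry.can_access(tid, "0xBob"))    # True
print(registry.can_access(tid, "0xAlice"))  # False
```

Because the access right travels with the token, selling or lending the NFT is selling or lending the data access itself.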

From a legal standpoint, data is intellectual property, either as copyright or a trade secret. Then data access control is a legal contract giving the holder the right to access the data.

Every non-fungible data token is its own unique snowflake. However, there are cases where you might want identical data access tokens.

Here are some use cases for identical data access tokens:

  1. Leverage more infrastructure. This is probably the biggest use case for fungible data tokens. Most crypto wallets and exchanges work with fungible (ERC20) tokens but not non-fungible (ERC721) ones. The same goes for other infrastructure, from token bridges to DeFi loans.
  2. Limited edition datasets. You want to use artificial scarcity to help price discovery, just as it does for NYC taxi medallions. Imagine 1000 “limited edition” data tokens, each providing access to a valuable data feed. These tokens could be bought and sold on order-book based exchanges to provide price discovery. (Related: limited-edition digital art like our past ascribe work.)
  3. Popularity-based pricing. Suppose you want the price of a dataset to rise as token popularity (number of buyers) rises, and fall as popularity falls, according to a pre-set schedule. There is no bound on the supply of data tokens.

We can implement a fungible data token by simply wrapping an ERC721 data token with an ERC20 token, i.e. wrap a non-fungible with a fungible token.
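As a rough illustration of the wrapping idea (all names are hypothetical, not a real contract), here is an ERC20-style fungible token in Python where every unit is interchangeable and redeemable for access to the same underlying dataset:

```python
# Hypothetical sketch: an ERC20-style fungible wrapper. Every unit is
# identical and redeemable for access to the same underlying dataset
# (which could itself sit behind a locked ERC721 data token).
class FungibleDataToken:
    def __init__(self, dataset_did, supply=None):
        self.dataset_did = dataset_did
        self.balances = {}
        self.supply = 0
        self.cap = supply   # None => open-ended supply

    def mint(self, to, amount):
        if self.cap is not None:
            assert self.supply + amount <= self.cap, "cap exceeded"
        self.balances[to] = self.balances.get(to, 0) + amount
        self.supply += amount

    def transfer(self, frm, to, amount):
        assert self.balances.get(frm, 0) >= amount, "insufficient balance"
        self.balances[frm] -= amount
        self.balances[to] = self.balances.get(to, 0) + amount

    def can_access(self, who):
        # Holding at least one unit grants access.
        return self.balances.get(who, 0) >= 1

fdt = FungibleDataToken("did:op:feed", supply=1000)  # e.g. 1000 editions
fdt.mint("0xAlice", 3)
fdt.transfer("0xAlice", "0xBob", 1)
```

Passing `supply=None` gives the open-ended variant of case (1); a fixed cap gives the limited-edition variant of case (2).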

Case (1) above would use any ERC20 wrapper. For example, it could be a simple ERC20 token contract with a fixed or open-ended supply.

Case (2) would use an ERC20 with a fixed supply of 1000 editions; or, perhaps simpler still, it could stay with plain NFTs.

Case (3) above would use an ERC20 that includes a bonding curve, exactly as Billy Rennekamp showed in his “Re-Fungible Token” article.

  • Each time a person wants to buy a fungible data token (an ERC20 token to access a given dataset), that token is minted (moving right along the supply axis of the bonding curve). The price of each new buy follows the shape of the curve; typically the curve is monotonically increasing, so the price rises with each new purchase.
  • Each time someone sells their fungible data token, the token is burned (going left on the bonding curve). The price of the token goes down after each sell.
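The mint-on-buy, burn-on-sell mechanics above can be sketched with a simple linear bonding curve, price(supply) = base + slope × supply. This is a toy model (parameters and names are illustrative, not from any real contract):

```python
# Hypothetical sketch: a linear bonding curve where
# price(supply) = base + slope * supply. Buying mints a token at the
# current price (moving right on the curve); selling burns one
# (moving left), so the price falls after each sell.
class BondingCurveDataToken:
    def __init__(self, base=1.0, slope=0.1):
        self.base, self.slope = base, slope
        self.supply = 0

    def price(self):
        return self.base + self.slope * self.supply

    def buy(self):
        paid = self.price()
        self.supply += 1        # mint one token
        return paid

    def sell(self):
        self.supply -= 1        # burn one token
        return self.price()     # proceeds at the new, lower point

token = BondingCurveDataToken()
print(token.buy())   # first buyer pays the base price
print(token.buy())   # second buyer pays more: popularity rose
print(token.sell())  # seller exits; the curve point moves back left
```

With a monotonically increasing curve like this, early buyers pay less than late buyers, which is exactly the popularity-based pricing of case (3).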

Here are some reasons that we might want to hierarchically organize (compose) data tokens:

  1. Streaming data. Consider a stream of data, where every interval of ten minutes there’s another chunk of data. You want to package and sell the last 24 hours worth of data as a single token.
  2. Many data sources. Consider 100 data streams from 100 unique Internet-of-Things (IoT) devices. You want to package and sell one interval’s worth of 100 data chunks.
  3. Data baskets. Consider: you’re an asset manager with ninja data science skills. You’ve assembled an awesome group of 1000 datasets that each have individual (but small) value. You want to sell this group of data as a single asset to others who are growing their asset base and want to diversify by holding data assets. These data assets could be static data or dynamic (streaming) data services.
  4. Data indexes. Consider a future with thousands of potentially investable data streams. You track the top 100 and make it easy for others to invest in them as a single asset, similar to today’s index funds.
  5. Data frames. You have a huge dataset, but you only want to give access to a subset of that dataset to others. You’ve specified the subset using a Pandas dataframe (Pandas is a mainstay of Python data science tooling).
  6. Priced compute pipelines. Consider: You’re a data scientist and have put together a series of steps — a compute pipeline — to train a private AI model. You want to make that pipeline available to others. It needs to operate on private data, i.e. you can’t see the training data or intermediate results. How do you easily pay for the total set of services across the pipeline?
  7. Annotating metadata, including reputation or quality. You’re a data marketplace that wants to give more information about a dataset: what reputation your users have given it, what its quality is according to your marketplace’s quality measures, or whether it is input training data vs. output data. You want this metadata to be on-chain, but you don’t have access control over the metadata field.

ERC998 is a standard for composable tokens. Each item in the basket can be ERC20, ERC721, or an ERC998.

We can implement a composable data token (CDT) by simply collecting together any existing fungible (ERC20) or non-fungible (ERC721) data tokens into an ERC998 token. We can build larger hierarchies by collecting together ERC998s in addition to ERC20 and ERC721 data tokens.

Top-down or bottom-up. ERC998 allows top-down composition, where the holder of the root-node ERC998 controls the rest of the tree. It also allows bottom-up composition, where a token can “attach itself” to other tokens. For example, metadata could be attached to a raw blob of data.
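The top-down case can be sketched as a plain tree in Python (an illustrative model only; real ERC998 contracts track ownership on-chain, and the names here are hypothetical):

```python
# Hypothetical sketch of ERC998-style top-down composition: a
# composable data token (CDT) owns child tokens, which may themselves
# be fungible, non-fungible, or composable. Transferring the root
# transfers control of the whole tree.
class ComposableDataToken:
    def __init__(self, name, owner):
        self.name, self.owner = name, owner
        self.children = []   # ERC20 / ERC721 / ERC998 children

    def attach(self, child):
        self.children.append(child)

    def transfer(self, to):
        self.owner = to      # the whole subtree follows the root

    def flatten(self):
        # Collect all leaf tokens reachable from this root.
        leaves = []
        for c in self.children:
            if isinstance(c, ComposableDataToken):
                leaves.extend(c.flatten())
            else:
                leaves.append(c)
        return leaves

hour = ComposableDataToken("hour-0", "0xAlice")
for i in range(6):                # six 10-minute chunks in one hour
    hour.attach(f"chunk-{i}")     # strings stand in for NFDTs
day = ComposableDataToken("day", "0xAlice")
day.attach(hour)
day.transfer("0xBob")             # Bob now controls every chunk
print(len(day.flatten()))         # 6
```

Selling the root token sells access to everything beneath it, which is what makes baskets, indexes, and streaming bundles tradeable as single assets.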

Note that while ERC998 incurs relatively high gas fees on Ethereum mainnet, this matters less in an Ocean Protocol context because Ocean runs on a network with lower gas costs.

Here’s how each use case above is handled.

(1) Streaming data. The chunk of each time interval is a non-fungible data token (NFDT). Then a composable data token (CDT) collects together 24 hours’ worth of NFDTs.

(2) Many data sources. Each independent data source is an NFDT. A CDT collects them together.

(3) Data baskets. An ERC998-based CDT collects together sub-tokens (NFDTs, FDTs, and smaller CDTs) into an asset of value. This might also use other financial basket protocols like Set Protocol or Melon Protocol.

(4) Data indexes. One way to implement this is like (3), using top-down ERC998. An owner of the top-level ERC998 token would own all the sub-tokens in the basket (think Set Protocol). Alternatively, one could create a new data index token and attach it to each of the data tokens, bottom-up style (ERC998). To make it tradeable, the bottom-up token could be attached to collateral (think Uma Protocol), or put into a prediction market (e.g. Gnosis).

(5) Data frames. A lower-level data token holds the whole dataset. Then a higher-level CDT holds permissions to just the subset, where the specific subset is described in a Pandas dataframe stored in the higher-level CDT metadata.
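Here is a small illustration of that split (the data, DIDs, and subset-spec format are all made up for the sketch; a real implementation would store a serialized Pandas expression in the token metadata):

```python
# Hypothetical sketch: the lower-level token holds the full dataset;
# the higher-level CDT stores only a subset spec (rows and columns,
# dataframe-style) in its metadata, so access resolves to the subset.
full_dataset = {
    "temperature": [20.1, 20.4, 21.0, 22.3],
    "humidity":    [0.41, 0.39, 0.40, 0.38],
    "pressure":    [1012, 1013, 1011, 1010],
}

cdt_metadata = {
    "parent_token": "did:op:weather-full",   # lower-level data token
    "columns": ["temperature", "humidity"],  # dataframe-style subset
    "rows": slice(0, 2),
}

def resolve_subset(dataset, meta):
    """Apply the CDT's subset spec to the parent dataset."""
    return {col: dataset[col][meta["rows"]] for col in meta["columns"]}

subset = resolve_subset(full_dataset, cdt_metadata)
print(subset)  # {'temperature': [20.1, 20.4], 'humidity': [0.41, 0.39]}
```

The buyer of the higher-level CDT never holds the full dataset; the access layer applies the stored spec at resolution time.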

(6) Compute pipelines. The pipeline might look like: input raw training data X/y → clean the data → store cleaned data → build model → store model → input raw test data X_test → run predictions → store predicted result y_test. This is an interleaving of data service → compute service → data service → etc. It could be executed as an Ocean Service Execution Agreement (SEA) where Ocean orchestrates the steps. Here’s the extra-interesting part: each compute & data service is itself tokenized. Then there’s an ERC998 composable token that holds each of those tokens, along with metadata about how to connect them (e.g. SEA style).
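A toy model of that pipeline-as-composable-token idea (service names and prices are invented for illustration, not Ocean SEA syntax):

```python
# Hypothetical sketch: a compute pipeline as an ordered list of
# tokenized services, alternating data and compute, held together by
# one composable token. Buying that token pays for the whole set of
# services across the pipeline in a single purchase.
pipeline = [
    {"kind": "data",    "name": "raw training data X/y", "price": 10},
    {"kind": "compute", "name": "clean the data",        "price": 2},
    {"kind": "data",    "name": "cleaned data",          "price": 0},
    {"kind": "compute", "name": "build model",           "price": 5},
    {"kind": "data",    "name": "trained model",         "price": 0},
]

def total_price(steps):
    # One payment covers every service token in the pipeline.
    return sum(step["price"] for step in steps)

print(total_price(pipeline))  # 17
```

The composable token’s metadata would also record how the steps connect (SEA style), so an orchestrator can execute them in order without the buyer pricing each service separately.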

(7) Annotating metadata, e.g. reputation or quality. Your marketplace would do this by creating a new token holding the extra metadata, then attaching that new token to the existing dataset token using the ERC998 bottom-up approach. Alternatively, you could use a “tagging” standard.

The following image summarizes the relation among data tokens.

Published at Mon, 25 Nov 2019 21:14:13 +0000
