What kinds of data work well with blockchain?

Steven Pu • 5 min read • Nov 30, 2019

Preamble

Blockchain technology is often touted as guaranteeing the veracity, provenance, and immutability of data in a wide variety of applications. But is this true? Or more precisely, is this true for all kinds of data? A similar question was posed at one of Taraxa's meetups, and here we elaborate on our thoughts on the subject.

What does "work well" mean?

Data handling and processing can be judged by many metrics, but blockchain's first application, Bitcoin, has cemented a few interesting characteristics in the popular psyche. We describe three here that are particularly relevant to the discussion.

  • Veracity is the truthfulness of the data, or how accurately it portrays the objective reality.
  • Provenance is the source and history of a piece of data. It tells you where, by what, or by whom the data was generated and how it has been handled since generation.
  • Immutability is the un-changeability of the data, or that the data cannot be tampered with once generated.

Can blockchain technology guarantee these properties for all kinds of data?

What kinds of data are there (in the context of blockchain)?

In the context of blockchain, we divide data into three categories.

  • Generated on-chain: this type of data is generated entirely within a specific blockchain network. Because it is generated entirely on the blockchain, it is fully known to and can be completely validated by every node connected to the blockchain. If the network is permission-less, then anyone in the world can know and validate the data. Examples are cryptocurrency account balance, balance transfers, and smart contract persistent state.
  • Generated off-chain, known to many: this type of data comes from the off-chain world, can be known by a great number of entities, and is usually publicly available. Examples include the weather, the outcome of a national election, and stock prices of listed companies.
  • Generated off-chain, known to a few: this type of data is also from the off-chain world but is either privately held data or data that's only known to a few entities. Examples include your phone calls, insurance claims from a car accident, and data from a temperature sensor on a factory's air compressor.

[Figure: Data categorized by origin]

How much of each type of data is there? In the very unscientifically drawn illustration above, on-chain data is the smallest category, followed by off-chain data known to many, and finally most of the world's data is generated off-chain and known only to a few.

What can blockchain guarantee for each kind of data?

Now let's put the two parts together and see which properties blockchain can guarantee for each kind of data.

For data that's generated on-chain, we have the trifecta.

  • Veracity is guaranteed through consensus
  • Provenance is guaranteed through cryptographic signatures
  • Immutability is guaranteed through full state replication across nodes
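
To make the replication point concrete, here is a minimal sketch of why a tampered copy stands out when every node holds the full state. It is a toy model under stated assumptions, not a consensus protocol: the helper names `state_digest` and `find_divergent_nodes`, the four-node network, and the simple "most common digest wins" rule are all hypothetical choices made for illustration.

```python
import hashlib
from collections import Counter


def state_digest(state: dict) -> str:
    # Hash a deterministic rendering of the state (keys in sorted order).
    rendered = ";".join(f"{key}={state[key]}" for key in sorted(state))
    return hashlib.sha256(rendered.encode("utf-8")).hexdigest()


def find_divergent_nodes(replicas: dict) -> list:
    """Return the names of nodes whose state differs from the most common state."""
    digests = {node: state_digest(state) for node, state in replicas.items()}
    # ASSUMPTION: the most common digest is treated as the agreed state.
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(node for node, d in digests.items() if d != majority_digest)


honest_state = {"alice": 10, "bob": 5}
replicas = {
    "node-a": dict(honest_state),
    "node-b": dict(honest_state),
    "node-c": dict(honest_state),
    "node-d": {"alice": 10, "bob": 500},  # one node rewrites its local copy
}

divergent = find_divergent_nodes(replicas)
print(divergent)
```

Because every honest node holds the same state, rewriting one copy changes nothing for the rest of the network; the altered replica simply no longer matches.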

But since this type of data only really exists for cryptocurrencies or other types of purely on-chain assets, the scope of such guarantees is highly limited. These strong guarantees are precisely what makes blockchain technology so alluring, but as we move on to other kinds of data, these guarantees become weaker.

For data that's generated off-chain but known to many, we have weaker guarantees.

  • Veracity is not guaranteed through a consensus algorithm, but through a carefully crafted game whereby players are incentivized to expose each other for telling lies. For example, if the highest stock price for company X on the NYSE was $1, but I provided a data point of $1.20, then others could come forward and challenge my claim. If enough challenges accumulated, I would lose my deposit (I am punished) to the challengers (they are rewarded). Such games are common among oracles such as Chainlink or off-chain computation solutions such as Truebit.
  • Provenance can only be guaranteed if the data-generating entity has a well-known public key. It is typically desirable for any piece of data to come from many sources, so that they can compete in the game to establish which version of reality is true.
  • Immutability is guaranteed through full state replication just like on-chain data.
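
The challenge game described above can be sketched as a toy model. This is an illustration only and does not reflect how Chainlink or Truebit are actually implemented: the `Claim` class, the `challenge_threshold` parameter, the deposit size, and the equal split of a forfeited deposit are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """A data point whose reporter stakes a deposit on it (hypothetical model)."""
    reporter: str
    value: float
    deposit: float
    challengers: list = field(default_factory=list)

    def challenge(self, challenger: str) -> None:
        # Each distinct party other than the reporter may challenge once.
        if challenger != self.reporter and challenger not in self.challengers:
            self.challengers.append(challenger)

    def settle(self, challenge_threshold: int) -> dict:
        """Return payouts; the reporter forfeits the deposit if enough challenges accumulate."""
        if len(self.challengers) >= challenge_threshold:
            # ASSUMPTION: the forfeited deposit is split equally among challengers.
            share = self.deposit / len(self.challengers)
            payouts = {name: share for name in self.challengers}
            payouts[self.reporter] = 0.0
            return payouts
        return {self.reporter: self.deposit}


# The reporter claims a high of $1.20 when the actual high was $1.
claim = Claim(reporter="me", value=1.2, deposit=90.0)
for name in ("alice", "bob", "carol"):
    claim.challenge(name)

payouts = claim.settle(challenge_threshold=3)
print(payouts)
```

The sketch also shows why this mechanism needs widely known data: the outcome is only convincing if enough independent parties are able to check the claim and come forward.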

For data that's generated off-chain but known only to a few, the guarantees are weaker still. Keep in mind that this kind of data accounts for most of the data in the world.

  • Veracity cannot be guaranteed. Because so few entities are aware of the data, there are not enough players to make the outcome of a game convincing.
  • Provenance can only be guaranteed if the data generating entity has a well-known public key, e.g., a sensor gateway.
  • Immutability is guaranteed through full state replication just like on-chain data.

For most of the world's data, blockchain loses the ability to guarantee the most attractive property of all - veracity. This may seem to be bad news but keep in mind that the other two properties, provenance and immutability, are still very powerful.

Working with IoT

[Figure: IoT data is mostly generated off-chain and known to few]

Almost all IoT-generated data fall squarely into the third category (they're generated off-chain and are almost never made publicly available). This means that blockchain can only guarantee provenance and immutability, but it also means the addressable market is extremely large. At Taraxa, we are building and deploying many solutions to address these large-scale problems.

As the world becomes more connected (through devices) and automated, we're increasingly reliant upon device-generated data as the basis for business transactions. Sensors provide data on usage patterns, quality of service, contractual adherence, and much more. Without basic trust in device-generated data, business models become frictional or outright impossible, leading to massive added operational and opportunity costs.

Blockchain gives devices identities through cryptographic keys, which lets them prove where data came from (provenance), and it provides immutability, which lets them prove the data wasn't tampered with after generation. Together, these properties create a foundation of trust that enables innovative business models.
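
As a rough sketch of these two properties, the example below has a device authenticate a sensor reading with its key and then anchors the reading's digest in a ledger. Several things here are assumptions and not Taraxa's implementation: the function names, the reading format, and the `ledger` list standing in for an on-chain record. In particular, HMAC with a shared secret is used as a stand-in for a public-key signature (such as Ed25519), because the Python standard library does not include public-key signatures.

```python
import hashlib
import hmac
import json

# ASSUMPTION: HMAC with a per-device secret stands in for a public-key
# signature; a real device would sign with a private key instead.
DEVICE_KEY = b"hypothetical-device-secret"


def canonical_bytes(reading: dict) -> bytes:
    # Serialize deterministically so the same reading always yields the same bytes.
    return json.dumps(reading, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_reading(reading: dict, key: bytes) -> str:
    # Provenance: a tag only the key holder can produce for this reading.
    return hmac.new(key, canonical_bytes(reading), hashlib.sha256).hexdigest()


def digest(reading: dict) -> str:
    # Immutability: a fingerprint that can be anchored on a ledger.
    return hashlib.sha256(canonical_bytes(reading)).hexdigest()


def verify(reading: dict, tag: str, key: bytes, anchored_digest: str) -> dict:
    return {
        "provenance": hmac.compare_digest(sign_reading(reading, key), tag),
        "untampered": hmac.compare_digest(digest(reading), anchored_digest),
    }


reading = {"device_id": "compressor-7", "temperature_c": 71.5, "timestamp": 1575072000}
tag = sign_reading(reading, DEVICE_KEY)
ledger = [digest(reading)]  # hypothetical stand-in for an on-chain record

tampered = dict(reading, temperature_c=45.0)
ok = verify(reading, tag, DEVICE_KEY, ledger[0])
bad = verify(tampered, tag, DEVICE_KEY, ledger[0])
print(ok, bad)
```

Note that neither check says anything about whether 71.5 was the true temperature. The checks only show which device produced the reading and that it has not changed since it was recorded, which matches the guarantees available for this category of data.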

Taraxa is committed to helping IoT devices become trusted entities through blockchain technology, and to doing so at scale. We are working hard to deploy practical solutions to business pain points today.

Stay tuned.