Fluence

Overview

Symbol: N/A
Concept

Fluence will allow users to store, query, manage and monetize encrypted structured data, securing it with the power of blockchain technologies. Our goal is to become the go-to database for decentralized Internet applications.

Team Connections

EVGENY PONOMAREV is an Organizing Committee Member at the CodeFest Developers Conference.

EVGENY PONOMAREV

CEO

Entrepreneur and product manager, highly passionate about blockchain technology and artificial intelligence. Worked on product at 2GIS (20M monthly audience). In charge of the project management track at the CodeFest developers conference (1500+ attendees per year). Mined his first Bitcoin in 2011 on a CPU.

DMITRY KURINSKIY

CTO

Software engineer and technical leader, building complex engineering projects since 2004. Keen functional programming advocate. Prior to Fluence he was CTO / lead engineer at several technology startups.

ALEXANDER DEMIDKO

Technical Advisor

Experienced distributed systems engineer with a deep understanding of computer science and game theory, and a recent interest in machine learning. At Metamarkets he built a petabyte-scale analytics platform capable of processing hundreds of billions of events per day.

MICHAEL EGOROV

Advisor

Michael is the CTO and co-founder of Nucypher, a company providing an encryption layer for popular Big Data frameworks. After obtaining a PhD in physics from Swinburne University, he built ZeroDB, an open source end-to-end encrypted database.

Website: view

Blog: view

Whitepaper: view

Whitepaper

Abstract

Fluence is a decentralized database based on a p2p network of independent peers. It provides the features of traditional databases, such as querying and filtering data, with the addition of total encryption and flexible data access management. Fluence is relatively fast, scalable, fault tolerant and censorship resistant by design.

Fluence organizes nodes into clusters, each responsible for a particular Dataset. Each cluster keeps its own blockchain to reach consensus on all operations and rewards. In addition, a set of nodes called Arbiters watches over each cluster to verify its operations. The storage economy is implemented with storage contracts, which all parties sign and must comply with. Node rewards are defined by the contract conditions and granted after the node provides a proof of retrievability every time tick. For a node, there is no way to spend rewards other than by withdrawal.

The project implements end-to-end encryption for both the NoSQL database and its B-Tree indices, allowing range queries and search by key. Each request is processed by a set of nodes that have to reach consensus on the result. Additionally, we use proxy re-encryption technology to let encrypted data be shared with other parties without exposing encryption keys.

Network Overview

Fluence relies on IPFS's libp2p for networking. It is based on the S/Kademlia approach: every node and every resource (a Dataset in our case) has an ID from the same ID space. We place a Dataset on the nodes whose IDs are closest to the Dataset ID, which gives a more uniform distribution and faster allocation.
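
As an illustration of this placement rule, here is a minimal sketch of selecting the nodes closest to a Dataset ID under the XOR metric used by Kademlia-style DHTs. The 160-bit SHA-1 IDs, the function names and the cluster size of seven are assumptions made for the example, not the actual Fluence implementation.

    import hashlib

    def make_id(seed: bytes) -> int:
        # Derive a 160-bit ID in the shared ID space (an assumed choice).
        return int.from_bytes(hashlib.sha1(seed).digest(), "big")

    def closest_nodes(dataset_id: int, node_ids: list[int], k: int = 7) -> list[int]:
        # XOR distance: a smaller value means "closer" in the ID space.
        return sorted(node_ids, key=lambda nid: nid ^ dataset_id)[:k]

    # Usage: pick a cluster of 7 nodes for a Dataset.
    nodes = [make_id(f"node-{i}".encode()) for i in range(100)]
    cluster = closest_nodes(make_id(b"my-dataset"), nodes, k=7)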

The keyspace is the only shared state in the whole Fluence Network, so there is no need to keep open connections or continually exchange data between nodes. However, each Dataset forms its own subnetwork, which becomes fully connected while it operates. This subnetwork consists of Cluster Nodes, which store Dataset replicas and perform operations on them, and Arbiter Nodes, whose role is to verify the Cluster's state.

Node Overview

Each node in the Fluence Network plays several roles: it participates in some Clusters, storing and managing the corresponding Datasets and Blockchains, and it arbitrates for other Clusters, helping them keep their Blockchains valid and consistent. Each node must be ready to accept incoming connections from Clients and to initiate outgoing communication.

In general, a node consists of three parts: the Network Layer (described in the previous section), the Dataset Management System, and the Blockchain.

DSMS Overview

A Dataset consists of one or more indexed columns and unindexed rows. All data is encrypted on the client side, so the node knows nothing about its contents.

Rows are stored as raw byte arrays in a key-value store. The structure of a row's contents and the number of columns are hidden from the node.

For ordered indices, a B-Tree data structure is used. We reuse the ZeroDB approach to search through the encrypted index. The Client can also apply Order Preserving Encryption to prepare indices; in this case queries can be answered without extra round trips, as a node can compare encrypted values and select rows by ranges of indexed values.
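
To make the order-preserving case concrete, here is a minimal sketch under a deliberately toy assumption: ope_encrypt stands in for a real order-preserving encryption scheme and only illustrates that ciphertext order matches plaintext order, which is all the node relies on when answering a range query.

    def ope_encrypt(value: int, a: int = 12345, b: int = 678) -> int:
        # Toy stand-in for OPE: a strictly increasing secret mapping.
        return a * value + b

    def range_query(index: list[tuple[int, bytes]], low_ct: int, high_ct: int) -> list[bytes]:
        # The node compares ciphertexts only; plaintext values never leave the Client.
        return [row_key for ct, row_key in index if low_ct <= ct <= high_ct]

    # The Client builds the encrypted index and asks for the plaintext range [10, 20].
    index = sorted((ope_encrypt(v), f"row-{v}".encode()) for v in [5, 12, 17, 25])
    hits = range_query(index, ope_encrypt(10), ope_encrypt(20))  # rows 12 and 17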

A Dataset is fully replicated among all Cluster Nodes. To enable replication and recovery, all write operations on a Dataset are saved to a journal.

After a Dataset state change, Cluster Nodes must reach consensus on the new state's hash and store a new block in the Cluster Blockchain. This block contains proofs that, after performing the given writes, all nodes hold Dataset files with the same hash.
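
A minimal sketch of this agreement step, assuming the Dataset state can be reduced to a deterministic digest over its key-value contents (the hashing scheme here is an illustrative assumption):

    import hashlib

    def dataset_hash(rows: dict[bytes, bytes]) -> str:
        # Deterministic digest over the sorted key/value pairs of the replica.
        h = hashlib.sha256()
        for key in sorted(rows):
            h.update(key)
            h.update(rows[key])
        return h.hexdigest()

    def agree_on_state(replica_hashes: list[str]) -> bool:
        # A new block is sealed only if every Cluster Node reports the same hash.
        return len(set(replica_hashes)) == 1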

Blockchain Overview

The Cluster Blockchain is used to safely store Dataset meta information: the contract, functional token transactions, proxy re-encryption keys for data sharing, and witnesses of proofs of retrievability. Blocks are built and added by Cluster Nodes and validated by Arbiters.

Each node checks a block's contents for validity. Consider a Proof of Retrievability block: every time tick (1 hour by default), all Cluster Nodes must prove that they are alive and hold a copy of the latest Dataset to query over. After each tick, one or more Arbiters ask each node for a part of the data and the corresponding salted hash, proving that the node still has the data. Each Arbiter then adds the result of its check to the block and signs it with its private key.
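
A minimal sketch of such a salted-hash challenge, assuming the Dataset is split into chunks and the Arbiter cross-checks the returned chunk and digest (the chunk selection and message format are assumptions):

    import hashlib, os

    def challenge() -> bytes:
        # The Arbiter sends a fresh random salt for this time tick.
        return os.urandom(16)

    def respond(chunks: list[bytes], index: int, salt: bytes) -> tuple[bytes, str]:
        # The node returns the requested chunk together with its salted hash.
        chunk = chunks[index]
        return chunk, hashlib.sha256(salt + chunk).hexdigest()

    def verify(chunk: bytes, salt: bytes, digest: str) -> bool:
        # The Arbiter checks consistency; comparing the same chunk index across
        # Cluster Nodes gives extra assurance that the replicas agree.
        return hashlib.sha256(salt + chunk).hexdigest() == digest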

Each block carries the signatures of all Cluster Nodes and some Arbiter signatures. It contains a link to the previous block's hash and exposes its own hash as the block ID, so at any moment in time it is hard to fake. This is how Fluence builds cheap and fast internal sources of trust.
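
The hash linking can be sketched as follows; the field names and the JSON encoding are assumptions made for the example:

    import hashlib, json

    def seal_block(prev_hash: str, payload: dict, signatures: list[str]) -> dict:
        body = {"prev": prev_hash, "payload": payload, "signatures": signatures}
        # The block ID is the hash of the block's own contents, which also commits
        # to the previous block's hash.
        block_id = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        return {**body, "id": block_id}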

Client API Overview

To work with Fluence, a Client installs the Fluence service. It performs all low-level operations and offers a simple unencrypted API as a local service.

For each Dataset, the Client API manages key pairs in a private key store. The Client should back this store up in a safe, offline place.

The Client API provides a simple JSON, MongoDB-like gateway to all of the Client's data. You can think of each Dataset as a MongoDB collection.

If the data grows too large, the Client API handles Transparent Sharding: a new Dataset is allocated on another Cluster in the Fluence Network, and the Client can operate on both just as if they were located in a single place.
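
A hypothetical local-service session could look like the sketch below; the class and method names (FluenceClient, insert, find) are illustrative assumptions, not the published API.

    # Hypothetical local Client API gateway; the real service would encrypt data,
    # talk to the Cluster and handle sharding underneath these calls.
    class FluenceClient:
        def __init__(self):
            self.datasets: dict[str, list[dict]] = {}

        def insert(self, dataset: str, doc: dict) -> None:
            self.datasets.setdefault(dataset, []).append(doc)

        def find(self, dataset: str, query: dict) -> list[dict]:
            return [d for d in self.datasets.get(dataset, [])
                    if all(d.get(k) == v for k, v in query.items())]

    client = FluenceClient()
    client.insert("users", {"name": "alice", "age": 30})
    print(client.find("users", {"name": "alice"}))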

Functional Token I/O

To fuel the Fluence Network, we introduce the Fluence Functional Token (FFT), an internal token that handles transactions between Clients and node owners. This token is not tradable and not transferable; it is issued and burned by the Fluence Gateway in exchange for Tradable Tokens (FLU).

FFT aims to provide fast and cheap transactions within a Dataset Cluster. When FFT is issued on the external blockchain (via the Ethereum contract of our Gateway), the issuance is tracked by the corresponding Cluster, and the incoming transaction is placed on the Cluster Blockchain.

To store a Dataset or perform any operation on it, the Client is required to pay with FFT tokens. During payment, tokens are transferred to the node's account on the Cluster Blockchain.

When a node owner wants to withdraw FFT and receive tradable tokens, they create a burn transaction, which is seen and verified by the Gateway, and tradable tokens are issued to their Ethereum address.
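
The issue / pay / burn life cycle can be sketched as a simple in-cluster ledger; the names and integer token amounts are assumptions made for the example:

    class ClusterLedger:
        def __init__(self):
            self.balances: dict[str, int] = {}

        def issue(self, client: str, amount: int) -> None:
            # Mirrors an FLU-to-FFT deposit observed on the external Gateway contract.
            self.balances[client] = self.balances.get(client, 0) + amount

        def pay(self, client: str, node: str, amount: int) -> None:
            # The Client pays a node for storage or an operation.
            assert self.balances.get(client, 0) >= amount, "insufficient FFT"
            self.balances[client] -= amount
            self.balances[node] = self.balances.get(node, 0) + amount

        def burn(self, node: str, amount: int) -> None:
            # Withdrawal: FFT is burned here; the Gateway then issues FLU on Ethereum.
            assert self.balances.get(node, 0) >= amount, "insufficient FFT"
            self.balances[node] -= amount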

Contract

A storage contract is an agreement between a Client and a Cluster, and it is stored on the Cluster Blockchain. The contract is signed by all parties, including Arbiters, at the beginning of the collaboration; it specifies a time period, data allocation size, read price, and so on.

An example contract object (a code sketch follows the list below):
● Contract ID (public key)
● Client ID (public key)
● Valid until (default: one month)
● Nodes, Arbiters
● Replication size
● Node requirements
● Allocation in GB
● Max response time
● Finance conditions
● Price/GB/month
● Gas price
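
A sketch of the listed fields as a data structure; the exact field names, types and defaults are assumptions made for illustration:

    from dataclasses import dataclass, field

    @dataclass
    class StorageContract:
        contract_id: str                      # public key
        client_id: str                        # public key
        valid_until: int                      # unix timestamp; default period is one month
        nodes: list[str] = field(default_factory=list)
        arbiters: list[str] = field(default_factory=list)
        node_requirements: dict = field(default_factory=dict)
        replication_size: int = 7
        allocation_gb: int = 1
        max_response_ms: int = 1000
        price_per_gb_month: int = 0           # finance conditions, in FFT
        gas_price: int = 0                    # in FFT per unit of operation complexity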

The Contract is funded by a deposit from the Client and stays active while it has funds. The Client can terminate the Contract at any time. If there are enough tokens on the Contract when its period ends, it is prolonged automatically.

Nodes receive rewards both for storage and for performing operations on data. To receive rewards for storing, every hour each node in the Cluster must provide a proof of retrievability by putting a proof hash on the Cluster Blockchain and getting verification from Arbiters. Nodes actually receive the funds for a Contract only when its period ends. If a node is offline for too long during the Contract period, it is punished either by not receiving tokens for the hours when it was offline or, in the worst case, by not receiving tokens for the Contract at all and being removed from the Contract in favor of other, more stable nodes. This decision is made by Arbiters and is regulated by the SLA.

Another way to earn tokens is by performing operations on data. For each operation, its complexity is evaluated using a formula known to all parties. Each "write" operation must be executed and signed by all nodes of the Cluster, so all nodes are rewarded. For read operations, the Client can choose between stronger data integrity guarantees (more signatures confirming that different nodes performed the same query and got the same result) and speed (down to just one signature).
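
The complexity formula itself is not given in this summary; the sketch below assumes a simple gas model (fee = complexity × gas price) and shows how the integrity-versus-speed choice could map to the number of signatures a Client requires.

    def operation_fee(complexity: int, gas_price: int) -> int:
        # Assumed pricing model: every party can recompute fee = complexity * gas_price.
        return complexity * gas_price

    def required_signatures(cluster_size: int, prefer_integrity: bool) -> int:
        # Writes always involve every node; for reads the Client trades integrity for speed.
        return cluster_size if prefer_integrity else 1

    fee = operation_fee(complexity=40, gas_price=2)                     # 80 FFT for this query
    quorum = required_signatures(cluster_size=7, prefer_integrity=True) # all 7 signatures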

Sharing Data Overview

In addition to enabling storage of structured sensitive data, the Fluence Network also lets Clients share their data with known third parties and get rewarded for it.

Dataset Sharing is organized with a Sharing Contract, which is placed on the Cluster Blockchain. By default, a Sharing Contract sets a zero-profit, zero-loss policy: the same gas price for operations as in the main contract, with no additions. The receiver must fund the Sharing Contract before querying it, and then each query is billed just as a Client's request would be.

However, the Client can set an increased operation price in the Sharing Contract. In this case, on every read, the payment is divided between the nodes and the Client.

Proxy Re-Encryption Key

All data in a Dataset is encrypted with the Client's private key, so sharing the Contract alone is not enough to access the data. Along with the Sharing Contract, the Client provides a Proxy Re-Encryption Key, which is derived from the Client's private key and the receiver's public key.

When the receiver asks the Cluster Nodes for data, the data is re-encrypted using the proxy key. The receiver can then decrypt the data with their private key. No data is disclosed to third parties.

To provide fine-grained access control, the Client can encrypt the data with a tree of private keys, even with a dedicated key for every row. In this case, at the price of more complex key management, the Client can share exactly the required amount of data.
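
A minimal sketch of such per-row key derivation, assuming a hash-based KDF (a production system would use a proper KDF such as HKDF, and would wrap the derived keys with the proxy re-encryption scheme described above):

    import hashlib

    def row_key(master_key: bytes, row_id: bytes) -> bytes:
        # Derive a dedicated key for one row from the Client's master key.
        return hashlib.sha256(master_key + b"|" + row_id).digest()

    # Sharing a single row then amounts to issuing a re-encryption key for that
    # row's key only, while the rest of the Dataset stays inaccessible.
    key_for_row_42 = row_key(b"client-master-key", b"row-42")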

Security Concerns

The described system has a number of attack vectors. Some of them are typical for all distributed systems; some are database and storage specific. We describe only a part of the attacks here; others require further investigation.

A malicious node has multiple ways to attack the system's security. Because we do not use PoW, there is no work a node must perform to generate a block. A node can speed up its clock, pretend that the next time tick for a new block has arrived, and start issuing a message with a new block.

Fluence has a two-level protection architecture against such attacks. First, nodes in the same Cluster should decline this node's messages, because they want to receive the reward for the whole Contract and avoid being punished. However, a Cluster is limited in size (usually about seven nodes) and could collude to cheat. The Cluster could then act like a single malicious node, not checking each other's proofs of retrievability and speeding up time for blocks.

If that happens, the Arbiters, whose number is much greater than the Cluster's, throw the malicious nodes out of the Cluster and choose substitutes from among themselves. The chance of successful cheating is determined by the number of Arbiters per Cluster. Because Arbiters only store the Cluster Blockchain, not the data, it is easy to scale their number without harming the network.

There is no motivation for Arbiters to cheat, because most of the network would notice and they would lose their reward for block verification.

Regulation Activity

Regulators may try to block or isolate nodes that store unwanted content. Due to the limited cluster size, it is possible, for example, to ban nodes by IP in a particular country. However, the Client has all the tools to run recovery mode for the Cluster and substitute the banned Cluster nodes with Arbiters. Because the number of Arbiters is rather large, the regulator would have to ban new nodes every time.

Denial of Remove

A node may ignore "remove data" requests from the Client and keep the data on its drive. This behavior brings no benefit, since the data is encrypted. If such a node tries to participate in requests and responses in the Cluster, it will be ignored by the other nodes, because it has a different database version and cannot produce correct responses.

A node may also keep a re-encryption key that the owner has asked to revoke. However, this brings no benefit to a data buyer, because the node's responses will be ignored by the Cluster consensus.

A data buyer may try to set up their own node to get into the Cluster and recover access to the data. The probability of this is minimal, because to be accepted by the Cluster a node would need an ID close to the Cluster's in terms of Kademlia distance.