Filecoin

VIDEO

Image

OverView

Symbol N/A
Concept

A MASSIVE AMOUNT OF STORAGE SITS UNUSED IN DATA CENTERS AND HARD DRIVES AROUND THE WORLD.

Website

view

White Papper

view

Social Media Ststistics.

In Crypto, community matters
Image

Facebook

Image

Twitter

Joined July 2014

Members

Whitepaper

Abstract

The internet is in the middle of a revolution: centralized proprietary services are being replaced with decentralized open ones; trusted parties replaced with verifiable computation; brittle location addresses replaced with resilient content addresses; inefficient monolithic services replaced with peer-to-peer algorithmic markets. Bitcoin, Ethereum, and other blockchain networks have proven the utility of decentralized transaction ledgers. These public ledgers process sophisticated smart contract applications and transact crypto-assets worth tens of billions of dollars. These systems are the first instances of internetwide Open Services, where participants form a decentralized network providing useful services for pay, with no central management or trusted parties. IPFS has proven the utility of content-addressing by decentralizing the web itself, serving billions of files used across a global peer-to-peer network. It liberates data from silos, survives network partitions, works offline, routes around censorship, and gives permanence to digital information.

Filecoin is a decentralized storage network that turns cloud storage into an algorithmic market. The market runs on a blockchain with a native protocol token (also called “Filecoin”), which miners earn by providing storage to clients. Conversely, clients spend Filecoin hiring miners to store or distribute data. As with Bitcoin, Filecoin miners compete to mine blocks with sizable rewards, but Filecoin mining power is proportional to active storage, which directly provides a useful service to clients (unlike Bitcoin mining, whose usefulness is limited to maintaining blockchain consensus). This creates a powerful incentive for miners to amass as much storage as they can, and rent it out to clients. The protocol weaves these amassed resources into a self-healing storage network that anybody in the world can rely on. The network achieves robustness by replicating and dispersing content, while automatically detecting and repairing replica failures. Clients can select replication parameters to protect against different threat models. The protocol’s cloud storage network also provides security, as content is encrypted end-to-end at the client, while storage providers do not have access to decryption keys. Filecoin works as an incentive layer on top of IPFS [1], which can provide storage infrastructure for any data. It is especially useful for decentralizing data, building and running distributed applications, and implementing smart contracts.

This work: (a) Introduces the Filecoin Network, gives an overview of the protocol, and walks through several components in detail.
(b) Formalizes decentralized storage network (DSN) schemes and their properties, then constructs Filecoin as a DSN.
(c) Introduces a novel class of proof-of-storage schemes called proof-of-replication, which allows proving that any replica of data is stored in physically independent storage. (d) Introduces a novel useful-work consensus based on sequential proofs-of-replication and storage as a measure of power.
(e) Formalizes verifiable markets and constructs two markets, a Storage Market and a Retrieval Market, which govern how data is written to and read from Filecoin, respectively.
(f) Discusses use cases, connections to other systems, and how to use the protocol.
Note: Filecoin is a work in progress. Active research is under way, and new versions of this paper will appear at https://filecoin.io. For comments and suggestions, contact us at research@filecoin.io.

1 Introduction

Filecoin is a protocol token whose blockchain runs on a novel proof, called Proof-of-Spacetime, where blocks are created by miners that are storing data. Filecoin protocol provides a data storage and retrieval service via a network of independent storage providers that does not rely on a single coordinator, where: (1) clients pay to store and retrieve data, (2) Storage Miners earn tokens by offering storage (3) Retrieval Miners earn tokens by serving data.

1.1 Elementary Components
The Filecoin protocol builds upon four novel components. 1. Decentralized Storage Network (DSN): We provide an abstraction for network of independent storage providers to offer storage and retrieval services (in Section 2). Later, we present the Filecoin protocol as an incentivized, auditable and verifiable DSN construction (in Section 4).

2. Novel Proofs-of-Storage: We present two novel Proofs-of-Storage (in Section 3): (1) Proof-ofReplication allows storage providers to prove that data has been replicated to its own uniquely dedicated physical storage. Enforcing unique physical copies enables a verifier to check that a prover is not deduplicating multiple copies of the data into the same storage space; (2) Proof-of-Spacetime allows storage providers to prove they have stored some data throughout a specified amount of time.

3. Verifiable Markets: We model storage requests and retrieval requests as orders in two decentralized verifiable markets operated by the Filecoin network (in Section 5). Verifiable markets ensure that payments are performed when a service has been correctly provided. We present the Storage Market and the Retrieval Market where miners and clients can respectively submit storage and retrieval orders.

4. Useful Proof-of-Work: We show how to construct a useful Proof-of-Work based on Proof-ofSpacetime that can be used in consensus protocols. Miners do not need to spend wasteful computation to mine blocks, but instead must store data in the network.

1.2 Protocol Overview

• The Filecoin protocol is a Decentralized Storage Network construction built on a blockchain and with a native token. Clients spend tokens for storing and retrieving data and miners earn tokens by storing and serving data.

• The Filecoin DSN handle storage and retrieval requests respectively via two verifiable markets: the Storage Market and the Retrieval Market. Clients and miners set the prices for the services requested and offered and submit their orders to the markets.

• The markets are operated by the Filecoin network which employs Proof-of-Spacetime and Proof-ofReplication to guarantee that miners have correctly stored the data they committed to store.

• Finally, miners can participate in the creations of new blocks for the underlining blockchain. The influence of a miner over the next block is proportional to the amount of their storage currently in use in the network.

A sketch of the Filecoin protocol, using nomenclature defined later within the paper, is shown in Figure 1 accompanied with an illustration in Figure 2.

1.3 Paper organization

The remainder of this paper is organized as follows. We present our definition of and requirements for a theoretical DSNscheme in Section 2. In Section 3 we motivate, define, and present our Proof-of-Replication and Proof-of-Spacetime protocols, used within Filecoin to cryptographically verify that data is continuously

stored in accordance with deals made. Section 4 describes the concrete instantiation of the Filecoin DSN, describing data structures, protocols, and the interactions between participants. Section 5 defines and describes the concept of Verifiable Markets, as well as their implementations, the Storage Market and Retrieval Market. Section 6 motivates and describes the use of the Proof-of-Spacetime protocol for demonstrating and evaluating a miner’s contribution to the network, which is necessary to extend the blockchain and assign the block reward. Section 7 provides a brief description of Smart Contracts within the Filecoin We conclude with a discussion of future work in Section 8

2 Definition of a Decentralized Storage Network

We introduce the notion of a Decentralized Storage Network (DSN) scheme. DSNs aggregate storage offered by multiple independent storage providers and self-coordinate to provide data storage and data retrieval to clients. Coordination is decentralized and does not require trusted parties: the secure operation of theses systems is achieved through protocols that coordinate and verify operations carried out by individual parties. DSNs can employ different strategies for coordination, including Byzantine Agreement, gossip protocols, or CRDTs, depending on the requirements of the system. Later, in Section 4, we provide a construction for the Filecoin DSN.

Definition 2.1. A DSN scheme Π is a tuple of protocols run by storage providers and clients:
(Put, Get, Manage)
• Put(data) → key: Clients execute the Put protocol to store data under a unique identifier key.
• Get(key) → data: Clients execute the Get protocol to retrieve data that is currently stored using key.
• Manage(): The network of participants coordinates via the Manage protocol to: control the available storage, audit the service offered by providers and repair possible faults. The Manage protocol is run by storage providers often in conjunction with clients or a network of auditors1 .

A DSN scheme Π must guarantee data integrity and retrievability as well as tolerate management and storage faults defined in the following sections.

2.1 Fault tolerance
2.1.1 Management faults
We define management faults to be byzantine faults caused by participants in the Manage protocol. A DSN scheme relies on the fault tolerance of its underlining Manage protocol. Violations on the faults tolerance assumptions for management faults can compromise liveness and safety of the system. For example, consider a DSN scheme Π, where the Manage protocol requires Byzantine Agreement (BA) to audit storage providers. In such protocol, the network receives proofs of storage from storage providers and runs BA to agree on the validity of these proofs. If the BA tolerates up to f faults out of n total nodes, then our DSN can tolerate f < n/2 faulty nodes. On violations of these assumptions, audits can be compromised.

2.1.2 Storage faults We define storage faults to be byzantine faults that prevent clients from retrieving the data: i.e. Storage Miners lose their pieces, Retrieval Miners stop serving pieces. A successful Put execution is (f, m)-tolerant if it results in its input data being stored in m independent storage providers (out of n total) and it can tolerate up to f byzantine providers. The parameters f and m depend on protocol implementation; protocol designers can fix f and m or leave the choice to the user, extending Put(data) into Put(data, f, m). A Get execution on stored data is successful if there are fewer than f faulty storage providers.

For example, consider a simple scheme, where the Put protocol is designed such that each storage provider stores all of the data. In this scheme m = n and f = m − 1. Is it always f = m − 1? No, some schemes can be designed using erasure coding, where each storage providers store a special portion of the data, such that x out of m storage providers are required to retrieve the data; in this case f = m − x.

2.2 Properties
We describe the two required properties for a DSN scheme and then present additional properties required by the Filecoin DSN. In the case where the Manage protocol relies on a blockchain, we consider the miners as auditors, since they verify and coordinate storage providers 2.2.1 Data Integrity
This property requires that no bounded adversary A can convince clients to accept altered or falsified data at the end of a Get execution. Definition 2.2. A DSN scheme Π provides data integrity if: for any successful Put execution for some data d under key k, there is no computationally-bounded adversary A that can convince a client to accept d 0 , for d 0 6= d at the end of a Get execution for identifier k.

2.2.2 Retrievability
This property captures the requirement that, given our fault-tolerance assumptions of Π, if some data has been successfully stored in Π and storage providers continue to follow the protocol, then clients can eventually retrieve the data. Definition 2.3. A DSN scheme Π provides retrievability if: for any successful Put execution for data under key, there exists a successful Get execution for key for which a client retrieves data.

2.2.3 Other Properties
DSNs can provide other properties specific to their application. We define three key properties required by the Filecoin DSN: public verifiability, auditability, and incentive-compatibility. Definition 2.4. A DSN scheme Π is publicly verifiable if: for each successful Put, the network of storage providers can generate a proof that the data is currently being stored. The Proof-of-Storage must convince any efficient verifier, which only knows key and does not have access to data. Definition 2.5. A DSN scheme Π is auditable, if it generates a verifiable trace of operation that can be checked in the future to confirm storage was indeed stored for the right duration of time. Definition 2.6. A DSN scheme Π is incentive-compatible, if: storage providers are rewarded for successfully offering storage and retrieval service, or penalized for misbehaving, such that the storage providers’ dominant strategy is to store data.

3 Proof-of-Replication and Proof-of-Spacetime

In the Filecoin protocol, storage providers must convince their clients that they stored the data they were paid to store; in practice, storage providers will generate Proofs-of-Storage (PoS) that the blockchain network (or the clients themselves) verifies.

In this section we motivate, present and outline implementations for the Proof-of-Replication (PoRep) and Proof-of-Spacetime (PoSt) schemes used in Filecoin.

3.1 Motivation
Proofs-of-Storage (PoS) schemes such as Provable Data Possession (PDP) [2] and Proof-of-Retrievability (PoR) [3, 4] schemes allow a user (i.e. the verifier V) who outsources data D to a server (i.e. the prover P) to repeatedly check if the server is still storing D. The user can verify the integrity of the data outsourced to a server in a very efficient way, more efficiently than downloading the data. The server generates probabilistic proofs of possession by sampling a random set of blocks and transmits a small constant amount of data in a challenge/response protocol with the user.

PDP and PoR schemes only guarantee that a prover had possession of some data at the time of the challenge/response. In Filecoin, we need stronger guarantees to prevent three types of attacks that malicious miners could exploit to get rewarded for storage they are not providing: Sybil attack, outsourcing attacks, generation attacks.

• Sybil Attacks: Malicious miners could pretend to store (and get paid for) more copies than the ones physically stored by creating multiple Sybil identities, but storing the data only once.
• Outsourcing Attacks: Malicious miners could commit to store more data than the amount they can physically store, relying on quickly fetching data from other storage providers.
• Generation Attacks: Malicious miners could claim to be storing a large amount of data which they are instead efficiently generating on-demand using a small program. If the program is smaller than the purportedly stored data, this inflates the malicious miner’s likelihood of winning a block reward in Filecoin, which is proportional to the miner’s storage currently in use.

3.2 Proof-of-Replication
Proof-of-Replication (PoRep) is a novel Proof-of-Storage which allows a server (i.e. the prover P) to convince a user (i.e. the verifier V) that some data D has been replicated to its own uniquely dedicated physical storage. Our scheme is an interactive protocol, where the prover P: (a) commits to store n distinct replicas (physically independent copies) of some data D, and then (b) convinces the verifier V, that P is indeed storing each of the replicas via a challenge/response protocol. To the best of our knowledge, PoRep improves on PoR and PDP schemes, preventing Sybil Attacks, Outsourcing Attacks, and Generation Attacks. Note. For a formal definition, a description of its properties, and an in-depth study of Proof-of-Replication, we refer the reader to [5].

Definition 3.1. (Proof-of-Replication) A PoRep scheme enables an efficient prover P to convince a verifier V that P is storing a replica R, a physical independent copy of some data D, unique to P. A PoRep protocol is characterized by a tuple of polynomial-time algorithms: (Setup, Prove, Verify)

• PoRep.Setup(1λ , D) → R, SP , SV , where SP and SV are scheme-specific setup variables for P and V, λ is a security parameter. PoRep.Setup is used to generate a replica R, and give P and V the necessary information to run PoRep.Prove and PoRep.Verify. Some schemes may require the prover or interaction with a third party to compute PoRep.Setup. 10 • PoRep.Prove(SP , R, c) → π c , where c is a random challenge issued by a verifier V, and π c is a proof that a prover has access to R a specific replica of D. PoRep.Prove is run by P to produce a π c for V. • PoRep.Verify(SV , c, πc ) → {0, 1}, which checks whether a proof is correct. PoRep.Verify is run by V and convinces V whether P has been storing R. 3.3 Proof-of-Spacetime Proof-of-Storage schemes allow a user to check if a storage provider is storing the outsourced data at the time of the challenge. How can we use PoS schemes to prove that some data was being stored throughout a period of time? A natural answer to this question is to require the user to repeatedly (e.g. every minute) send challenges to the storage provider. However, the communication complexity required in each interaction can be the bottleneck in systems such as Filecoin, where storage providers are required to submit their proofs to the blockchain network. To address this question, we introduce a new proof, Proof-of-Spacetime, where a verifier can check if a prover is storing her/his outsourced data for a range of time. The intuition is to require the prover to (1) generate sequential Proofs-of-Storage (in our case Proof-of-Replication), as a way to determine time (2) recursively compose the executions to generate a short proof. Definition 3.2. (Proof-of-Spacetime) A PoSt scheme enables an efficient prover P to convince a verifier V that P is storing some data D for some time t. A PoSt is characterized by a tuple of polynomial-time algorithms: (Setup, Prove, Verify) • PoSt.Setup(1λ , D) → SP , SV , where SP and SV are scheme-specific setup variables for P and V, λ is a security parameter. PoSt.Setup is used to give P and V the necessary information to run PoSt.Prove and PoSt.Verify. Some schemes may require the prover or interaction with a third party to compute PoSt.Setup. • PoSt.Prove(SP , D, c, t) → π c , where c is a random challenge issued by a verifier V, and π c is a proof that a prover has access to D for some time t. PoSt.Prove is run by P to produce a π c for V. • PoSt.Verify(SV , c, t, πc ) → {0, 1}, which checks whether a proof is correct. PoSt.Verify is run by V and convinces V whether P has been storing D for some time t.

3.4 Practical PoRep and PoSt
We are interested in practical PoRep and PoSt constructions that can be deployed in existing systems and do not rely on trusted parties or hardware. We give a construction for PoRep (see Seal-based Proof-of-Replication in [5]) that requires a very slow sequential computation Seal to be performed during Setup to generate a replica. The protocol sketches for PoRep and PoSt are presented in Figure 4 and the underlying mechanism of the proving step in PoSt is illustrated in Figure 3.

3.4.1 Cryptographic building blocks
Collision-resistant hashing. We use a collision resistant hash function CRH : {0, 1} ∗ → {0, 1} O(λ) . We also use a collision resistant hash function MerkleCRH, which divides a string in multiple parts, construct a binary tree and recursively apply CRH and outputs the root. zk-SNARKs. Our practical implementations of PoRep and PoSt rely on zero-knowledge Succinct Noninteractive ARguments of Knowledge (zk-SNARKs) [6, 7, 8]. Because zk-SNARKs are succinct, proofs are very short and easy to verify. More formally, let L be an NP language and C be a decision circuit for L. A trusted party conducts a one-time setup phase that results in two public keys: a proving key pk and a verification key vk. The proving key pk enables any (untrusted) prover to generate a proof π attesting that 11 x ∈ L for an instance x of her choice. The non-interactive proof π is both zero-knowledge and proof-ofknowledge. Anyone can use the verification key vk to verify the proof π; in particular zk-SNARK proofs are publicly verifiable: anyone can verify π, without interacting with the prover that generated π. The proof π has constant size and can be verified in time that is linear in |x|. A zk-SNARK for circuit satisfiability is a triple of polynomial-time algorithms (KeyGen, Prove, Verify) • KeyGen(1λ , C) → (pk, vk). On input security parameter λ and a circuit C, KeyGen probabilistically samples pk and vk. Both keys are published as public parameters and can be used to prove/verify membership in LC . • Prove(pk, x, w) → π. On input pk and input x and witness for the NP-statement w, the prover Prove outputs a non-interactive proof π for the statement x ∈ LC . • Verify(vk, x, π) → {0, 1}. On input vk, an input x, and a proof π, the verifier Verify outputs 1 if x ∈ LC . We refer the interested reader to [6, 7, 8] for formal presentation and implementation of zk-SNARK systems. Generally these systems require the KeyGen operation to be run by a trusted party; novel work on Scalable Computational Integrity and Privacy (SCIP) systems [9] shows a promising direction to avoid this initial step, hence the above trust assumption.

3.4.2 Seal operation
The role of the Seal operation is to (1) force replicas to be physically independent copies by requiring provers to store a pseudo-random permutation of D unique to their public key, such that committing to store n replicas results in dedicating disk space for n independent replicas (hence n times the storage size of a replica) and (2) to force the generation of the replica during PoRep.Setup to take substantially longer than the time expected for responding to a challenge. For a more formal definition of the Seal operation see [5]. The above operation can be realized with Sealτ AES−256, and τ such that Sealτ AES−256 takes 10-100x longer than the honest challenge-prove-verify sequence. Note that it is important to choose τ such that running Sealτ BC is distinguishably more expensive than running Prove with random access to R

References

[1] Juan Benet. IPFS - Content Addressed, Versioned, P2P File System. 2014.
[2] Giuseppe Ateniese, Randal Burns, Reza Curtmola, Joseph Herring, Lea Kissner, Zachary Peterson, and Dawn Song. Provable data possession at untrusted stores. In Proceedings of the 14th ACM conference on Computer and communications security, pages 598–609. Acm, 2007.
[3] Ari Juels and Burton S Kaliski Jr. Pors: Proofs of retrievability for large files. In Proceedings of the
14th ACM conference on Computer and communications security, pages 584–597. Acm, 2007.
[4] Hovav Shacham and Brent Waters. Compact proofs of retrievability. In International Conference on the Theory and Application of Cryptology and Information Security, pages 90–107. Springer, 2008.
[5] Protocol Labs. Technical Report: Proof-of-Replication. 2017.
[6] Rosario Gennaro, Craig Gentry, Bryan Parno, and Mariana Raykova. Quadratic span programs and succinct nizks without pcps. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 626–645. Springer, 2013.
[7] Nir Bitansky, Alessandro Chiesa, and Yuval Ishai. Succinct non-interactive arguments via linear interactive proofs. Springer, 2013.
[8] Eli Ben-Sasson, Alessandro Chiesa, Daniel Genkin, Eran Tromer, and Madars Virza. Snarks for c: Verifying program executions succinctly and in zero knowledge. In Advances in Cryptology–CRYPTO 2013, pages 90–108. Springer, 2013.
[9] Eli Ben-Sasson, Iddo Bentov, Alessandro Chiesa, Ariel Gabizon, Daniel Genkin, Matan Hamilis, Evgenya Pergament, Michael Riabzev, Mark Silberstein, Eran Tromer, et al. Computational integrity with a public random string from quasi-linear pcps. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 551–579. Springer, 2017. [10] Henning Pagnia and Felix C G¨artner. On the impossibility of fair exchange without a trusted third party. Technical report, Technical Report TUD-BS-1999-02, Darmstadt University of Technology, Department of Computer Science, Darmstadt, Germany, 1999.
[11] Joseph Poon and Thaddeus Dryja. The bitcoin lightning network: Scalable off-chain instant payments. 2015.
[12] Andrew Miller, Iddo Bentov, Ranjit Kumaresan, and Patrick McCorry. Sprites: Payment channels that go faster than lightning. arXiv preprint arXiv:1702.05812, 2017.
[13] Protocol Labs. Technical Report: Power Fault Tolerance. 2017.
[14] Protocol Labs. Technical Report: Expected Consensus. 2017.
[15] Iddo Bentov, Charles Lee, Alex Mizrahi, and Meni Rosenfeld. Proof of activity: Extending bitcoin’s proof of work via proof of stake [extended abstract] y. ACM SIGMETRICS Performance Evaluation Review, 42(3):34–37, 2014.
[16] Iddo Bentov, Rafael Pass, and Elaine Shi. Snow white: Provably secure proofs of stake. 2016.
[17] Silvio Micali. Algorand: The efficient and democratic ledger. arXiv preprint arXiv:1607.01341, 2016.
[18] Vitalik Buterin. Ethereum , April 2014. URL https://ethereum.org/.
[19] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system, 2008.
[20] Eli Ben Sasson, Alessandro Chiesa, Christina Garman, Matthew Green, Ian Miers, Eran Tromer, and Madars Virza. Zerocash: Decentralized anonymous payments from bitcoin. In Security and Privacy (SP), 2014 IEEE Symposium on, pages 459–474. IEEE, 2014.