stacks-puppet-node/blockstack_client/backend/drivers/_skel.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Blockstack
~~~~~
copyright: (c) 2014-2015 by Halfmoon Labs, Inc.
copyright: (c) 2016-2017 by Blockstack.org
This file is part of Blockstack.
Blockstack is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
Blockstack is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with Blockstack. If not, see <http://www.gnu.org/licenses/>.
"""
"""
Overview
========
This is a skeleton no-op driver, meant for tutorial purposes.
It will be dynamically imported by blockstack.
At the driver level, Blockstack expects a key/value store. If a user
does a `put`, then the data stored should be readable by any other
user that does a `get` on the same key. Blockstack itself chooses what
the keys are; they are *not* derived from the data.
To see what is expected, consider the following example:
Suppose Alice and Bob both use a Blockstack-powered blogging application.
When Alice writes a new blog post, the blogging application asks
Blockstack to save it. The app gives the blogpost the application-chosen name
"alice_2017-05-30-15:05:30", and passes both the name and data into
Blockstack. Blockstack calls into its storage drivers and saves the
data to each underlying storage service.
When Bob goes to read Alice's blog, his client discovers that the
new blog post is called "alice_2017-05-30-15:05:30". His client
then asks Blockstack to load up the blogpost's contents. Bob's
storage drivers use the name "alice_2017-05-30-15:05:30" to look
up and fetch the blog data from each service.
Later, Alice decides to delete "alice_2017-05-30-15:05:30". When
Bob goes to read Alice's blog, his client again fetches the blogpost
titled "alice_2017-05-30-15:05:30". Since Alice has removed the
data from her storage providers, none of Bob's drivers return the
blogpost data.
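The exchange above can be sketched with a toy in-memory store standing in
for a storage service. This is only an illustration of the expected
key/value semantics: `ToyService` and its method names are made up for this
example and are not part of Blockstack.

```python
# Toy stand-in for a storage service shared by Alice's and Bob's drivers.
# ToyService is hypothetical; it only illustrates the expected semantics.
class ToyService(object):
    def __init__(self):
        self._data = {}

    def put(self, name, data):
        # Keys are chosen by the application, not derived from the data.
        self._data[name] = data
        return True

    def get(self, name):
        # Return None when nothing is stored under this name.
        return self._data.get(name)

    def delete(self, name):
        self._data.pop(name, None)
        return True


service = ToyService()
post_name = "alice_2017-05-30-15:05:30"

# Alice saves a blog post under the application-chosen name.
service.put(post_name, "Hello from Alice")

# Bob reads it back using the same name.
bob_first_read = service.get(post_name)

# Alice deletes the post; Bob's next read finds nothing.
service.delete(post_name)
bob_second_read = service.get(post_name)
```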
Background
==========
Blockstack storage drivers are responsible for implementing
a get/put/delete interface for two logical types of I/O:
mutable data, and immutable data.
Mutable data is data that does NOT touch the underlying blockchain.
Instead, mutable data is signed by a private key derived from
the keypair listed in the user's zone file. Most user data
(profiles, application data stores) follows the mutable data
I/O model, since mutable I/O can happen as fast as the storage
service allows.
Immutable data is data that touches the underlying blockchain.
Each 'put' and 'delete' corresponds to an on-chain transaction
(specifically, a NAME_UPDATE transaction that modifies the user's
zone file). Similarly, each 'get' corresponds to a previously-sent
transaction. Immutable data is appropriate for storing data that
will only be written once, where freshness, integrity, and consistency
are more important than I/O performance (examples include storing
PGP keys, software releases, and certificates).
In practice, most storage drivers can implement the mutable I/O
path and immutable I/O path the same way; the only difference
between the two will be the interfaces. For example, the `disk`
driver simply stores everything to disk, immutable or mutable.
Replication Strategy
====================
Replication in Blockstack is best-effort. On a given `put`, some data may
be successfully replicated to some storage providers, and some data may not.
Blockstack automatically masks any inconsistencies that get introduced
(see Responsibilities below).
Blockstack uses three configuration fields in its config file to
determine how to replicate data.
* blockstack-client.storage_drivers. This is the list of storage drivers
  to use to both read and write data. All of these drivers will be attempted
  on any `get` or `put`. A `get` or `put` is attempted on each driver in the
  order they are listed (but this may change in the future).
* blockstack-client.storage_drivers_required_write. This is the list of
  storage drivers that must successfully `put` data in order for a write
  to succeed. If even one of them fails, the entire write fails.
* blockstack-client.storage_drivers_local. This is the list of drivers that
  keep their data invisible to other clients. For example, the `disk` driver
  is listed here by default since writes to disk are invisible to other clients.
In order for `put` to work on mutable data, there must be at least one driver listed in
blockstack-client.storage_drivers_required_write that is NOT listed in
blockstack-client.storage_drivers_local.
There are no long-term plans for creating more sophisticated replication strategies. This
is because more sophisticated strategies can be implemented as "meta drivers" that load
existing drivers as modules, and forward `get` and `put` requests to them according to the
desired strategy.
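The write policy described in this section can be sketched as follows. This
is not Blockstack's replication code: each driver is modeled as a plain
callable, and the function names `check_write_config` and `replicate_put`
are hypothetical.

```python
# Hypothetical sketch of the write policy described above.
# Each driver is modeled as a callable taking (key, data) and returning True/False.
def check_write_config(required_write, local):
    # At least one required-write driver must be visible to other clients.
    return any(name not in local for name in required_write)


def replicate_put(key, data, drivers, storage_drivers, required_write):
    # Try every configured driver in the listed order and record the outcome.
    results = {}
    for name in storage_drivers:
        try:
            results[name] = bool(drivers[name](key, data))
        except Exception:
            results[name] = False

    # The write succeeds only if every required driver succeeded.
    ok = all(results.get(name, False) for name in required_write)
    return ok, results


drivers = {
    "disk": lambda key, data: True,
    "s3": lambda key, data: True,
    "dropbox": lambda key, data: False,   # simulate a failed replica
}

# A failure in an optional driver does not fail the write...
ok_optional_failure, _ = replicate_put(
    "k", "v", drivers, ["disk", "s3", "dropbox"], ["disk", "s3"])

# ...but a failure in a required driver does.
ok_required_failure, _ = replicate_put(
    "k", "v", drivers, ["disk", "s3", "dropbox"], ["disk", "dropbox"])
```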
Access Strategy
===============
It is up to the storage drivers to not only store the data given
to them, but also to store any metadata required to later translate
the app-given name back into the data that was previously stored.
Moreover, once data is stored in Blockstack, *any* user with the
data's name should be able to read it.
Some storage systems make this easy. For example, the `disk` and `s3`
drivers achieve this simply by storing the data under the name given
by the application. Using the example in the Overview section, the
blogpost data for "alice_2017-05-30-15:05:30" can simply be stored as
a file or object with the name "alice_2017-05-30-15:05:30".
This is less easy for storage systems like Dropbox, where the storage
system creates its own URI for each piece of data stored. In these cases,
the driver must build and maintain an index over all of the data stored,
so it can later translate the app-given name (e.g. "alice_2017-05-30-15:05:30")
back into the service-given URI (e.g. "https://www.dropbox.com/s/pa4lugfa8yiuoio/profile.json?dl=1")
on `get`.
For indexing, driver developers are encouraged to use the following methods
from `common.py` to build a co-located index:
* `get_indexed_data()`: loads data from the storage by translating an
  app-given name into a service-specific URI.
* `put_indexed_data()`: stores data with a given name into the storage
  system, and inserts an entry for it in an index alongside the data.
* `delete_indexed_data()`: removes data with a given name from the storage
  system, and updates the co-located index to remove its name-to-URI link.
* `index_setup()`: instantiates an index (callable from the driver's
  `storage_init()` method).
* `driver_config()`: sets up callbacks to be used by the indexer code
  for loading and storing both data and pages of the index.
Please see the docstrings for each of these methods in the `common.py` file.
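The name-to-URI translation can be pictured with a toy index. This sketch
does not use the `common.py` helpers or mirror their signatures;
`ToyIndexedStore` and the `toy://` URIs are invented purely for illustration.

```python
import itertools

# Hypothetical model of a service that picks its own URI for each stored
# object, plus a name-to-URI index kept alongside the data.
class ToyIndexedStore(object):
    def __init__(self):
        self._objects = {}                  # service URI -> data
        self._index = {}                    # app-given name -> service URI
        self._counter = itertools.count(1)

    def _service_put(self, data):
        # The service, not the driver, chooses the URI.
        uri = "toy://objects/{}".format(next(self._counter))
        self._objects[uri] = data
        return uri

    def put_indexed(self, name, data):
        # Drop the object that an earlier put stored under this name, if any.
        old_uri = self._index.get(name)
        if old_uri is not None:
            self._objects.pop(old_uri, None)
        self._index[name] = self._service_put(data)
        return True

    def get_indexed(self, name):
        # Translate the app-given name into the service URI, then fetch.
        uri = self._index.get(name)
        if uri is None:
            return None
        return self._objects.get(uri)

    def delete_indexed(self, name):
        # Remove both the index entry and the object it points to.
        uri = self._index.pop(name, None)
        if uri is not None:
            self._objects.pop(uri, None)
        return True


store = ToyIndexedStore()
store.put_indexed("alice_2017-05-30-15:05:30", "post body")
```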
Responsibilities
================
Blockstack handles a lot of higher-level storage responsibilities on its
own, so the driver implementer can focus on interfacing with the storage
provider and/or creating the desired replication strategy. The responsibilities
are divided as follows:
* Consistency. Blockstack takes care of writing immutable data
  hashes to the zone file, and takes care of maintaining consistency info
  for mutable data. Specifically:
    * Blockstack guarantees per-key monotonic read consistency
      for mutable data (i.e. a `get` on a key returns the same or newer data as
      the previous `get` on the same key, but does not guarantee that the `get` returns
      the same data written by the last `put` on it).
    * A correct driver must guarantee per-key read-your-writes
      consistency (i.e. a `put` followed by a `get` on the same key
      should return the last-`put` data to the local client).
    * It is acceptable to rely on the storage system to enforce consistency.
      For example, most cloud storage providers claim to offer per-key sequential
      consistency already (i.e. a `put` followed by a `get` on the same key returns the
      data stored by the `put` to all clients). However, the driver must mask
      weak consistency by the storage provider if the provider cannot offer per-key
      read-your-writes consistency.
* Authenticity. Blockstack signs all data before giving it to
  the driver. The driver does not need to implement separate
  authenticity checks.
* Integrity. Similarly, Blockstack ensures that the data hasn't
  been tampered with. No action is required by the driver.
* Data Confidentiality. Blockstack encrypts data before giving it to
  the driver, and decrypts it after it loads it. However, Blockstack
  does not guarantee that all the data it writes will be encrypted
  (i.e. the user or application may specify that it is "public" data).
  If this is unacceptable, then the driver may take its own additional
  steps to ensure data confidentiality.
* Behavioral Confidentiality. Blockstack does NOT take any action to
  hide network-visible access patterns. Without assistance from the driver,
  someone watching the network can do timing analysis on the packets
  Blockstack sends and receives, and deduce things like the user's network
  location and the application being used. If behavior confidentiality is
  required, then the driver must take additional steps to implement it.
* Optimizations. Things like write-batching, caching, write-deferrals, and
  so on are handled by Blockstack. The driver should operate synchronously on
  both gets and puts. Specifically, the driver should NOT attempt to cache
  reads, and the driver should NOT return from a put until the data is guaranteed
  to be durable.
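One way a driver might mask a weakly consistent provider, without caching
reads, is to hold the `put` open until its own read sees the new data. The
sketch below is an assumption about how that could look, not Blockstack
code; `put_until_visible`, `LaggyProvider`, and the retry parameters are
hypothetical.

```python
import time

def put_until_visible(provider_put, provider_get, key, data, attempts=5, delay=0.01):
    # Write, then poll the provider until a read returns what was written.
    if not provider_put(key, data):
        return False
    for _ in range(attempts):
        if provider_get(key) == data:
            return True
        time.sleep(delay)
    # The write never became visible; report failure instead of guessing.
    return False


# Simulated provider whose reads lag behind a write by `lag` read requests.
class LaggyProvider(object):
    def __init__(self, lag):
        self._visible = {}
        self._pending = {}
        self._lag = lag

    def put(self, key, data):
        self._pending[key] = [data, self._lag]
        return True

    def get(self, key):
        if key in self._pending:
            data, remaining = self._pending[key]
            if remaining == 0:
                # The pending write finally becomes readable.
                self._visible[key] = data
                del self._pending[key]
            else:
                self._pending[key][1] = remaining - 1
        return self._visible.get(key)


provider = LaggyProvider(lag=2)
visible = put_until_visible(provider.put, provider.get, "k", "v")
```

With this approach the caller only sees a successful `put` once a following
`get` through the same driver would return the new data.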
"""
# You're free to do pretty much anything you want
# in terms of imports, but you should save any stateful
# initialization for the `storage_init()` method below.
import os
import logging
from common import *
from ConfigParser import SafeConfigParser
log = get_logger("blockstack-storage-drivers-skel")
log.setLevel( logging.DEBUG if DEBUG else logging.INFO )
def storage_init(conf, **kwargs):
"""
This method initializes the storage driver.
It may be called multiple times, so if you need idempotency,
you'll need to implement it yourself.
kwargs can include:
* index (True/False): whether or not to instantiate a storage index. This is useful
  for systems like Dropbox where you cannot construct a URL to a piece of data, given
  the data name (i.e. Dropbox has to do it for you). If you are making a driver for
  such a storage system, you should honor this flag by calling `driver_config()` to make
  a driver configuration structure for the index, and then call `index_setup()` to create
  the index (both are defined in `common.py`).
* force_index (True/False): If True, then the driver should call `index_setup()`
  even if the index already exists. THIS SHOULD ERASE THE EXISTING INDEX. If this flag
  is given, then this is the desired effect.
Return True on successful initialization
Return False on error.
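As an illustration of stashing driver-specific settings in the config file,
the sketch below reads one section for the driver and tolerates repeated
calls. The section name `skel`, the `storage_dir` option, the default path,
and the module-level `STATE` dict are all assumptions made for this example.

```python
import os
import tempfile

try:
    from configparser import ConfigParser                        # Python 3
except ImportError:
    from ConfigParser import SafeConfigParser as ConfigParser     # Python 2

# Hypothetical driver state; repeated initialization simply overwrites it.
STATE = {"initialized": False, "storage_dir": None}

def example_storage_init(conf, **kwargs):
    config_path = conf['path']
    storage_dir = "/tmp/skel-driver"    # assumed default for this sketch
    if os.path.exists(config_path):
        parser = ConfigParser()
        try:
            parser.read(config_path)
        except Exception:
            return False
        # 'skel' and 'storage_dir' are made-up names for this sketch.
        if parser.has_section('skel') and parser.has_option('skel', 'storage_dir'):
            storage_dir = parser.get('skel', 'storage_dir')
    STATE["storage_dir"] = storage_dir
    STATE["initialized"] = True
    return True


# Usage: write a throwaway config file, then initialize twice.
writer = ConfigParser()
writer.add_section('skel')
writer.set('skel', 'storage_dir', '/var/lib/skel')
with tempfile.NamedTemporaryFile('w', suffix='.ini', delete=False) as f:
    writer.write(f)
    example_path = f.name

first = example_storage_init({'path': example_path})
second = example_storage_init({'path': example_path})
os.unlink(example_path)
```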
"""
    # path to the CLI's configuration file (where you can stash driver-specific configuration)
    config_path = conf['path']
    if os.path.exists( config_path ):
        parser = SafeConfigParser()
        try:
            parser.read(config_path)
        except Exception as e:
            log.exception(e)
            return False

        # TODO load config here

    # TODO do initialization here

    # example of driver_config() and index_setup():
    #
    # dvconf = driver_config(
    #     "name of your driver",
    #     "path to the config file (i.e. conf['path'])",
    #     callable to load a chunk of data via this driver (takes driver config and chunk ID as arguments and returns the data),
    #     callable to store a chunk of data via this driver (takes the driver config, chunk ID, and chunk data and returns the URL),
    #     callable to delete a chunk of data via this driver (takes the driver config and chunk ID and returns True/False),
    #     driver_info={a dict of driver-specific information, like API keys},
    #     index_stem="the prefix for all index-related metadata, like '/blockstack/index' or similar",
    #     compress=True/False
    # )
    #
    # index_setup(dvconf, force=force_index)

    return True
def handles_url( url ):
"""
Does this storage driver handle this kind of URL?
It is okay if other drivers say that they can handle it.
This is used by the storage system to quickly filter out
drivers that don't handle this type of URL.
A common strategy is simply to check if the scheme
matches what your driver does. Another common strategy
is to check if the URL matches a particular regex.
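Both strategies can be sketched as follows. The `skel` scheme and the
`example.com` host pattern are placeholders chosen for this example; a real
driver would substitute whatever its storage service actually uses.

```python
import re

try:
    from urllib.parse import urlparse    # Python 3
except ImportError:
    from urlparse import urlparse        # Python 2

# 'skel' is a made-up scheme, and the host pattern is only an example.
EXAMPLE_SCHEME = "skel"
EXAMPLE_URL_RE = re.compile(r"^https://[a-z0-9-]+[.]example[.]com/blockstack/.+$")

def handles_url_by_scheme(url):
    # Strategy 1: accept any URL whose scheme belongs to this driver.
    return urlparse(url).scheme == EXAMPLE_SCHEME

def handles_url_by_regex(url):
    # Strategy 2: accept only URLs that match a known pattern.
    return EXAMPLE_URL_RE.match(url) is not None
```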
"""
return False
def make_mutable_url( data_id ):
"""
This method creates a URL, given an (opaque) data ID string.
The data ID string will be printable, but it is not guaranteed to
be globally unique. It is opaque--do not assume anything about its
structure.
The URL does not need to contain the data ID or even be specific to it.
It just needs to contain enough information that it can be used by
get_mutable_handler() below.
This method may be called more than once per data_id, and may be called
independently of get_mutable_handler() below (which consumes this URL).
Returns a string
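One possible approach, shown here only as a sketch, is to embed the escaped
data ID in the URL so the get handler can recover it. The `skel://store/`
prefix and both function names are hypothetical; a driver is free to build
its URLs differently.

```python
try:
    from urllib.parse import quote, unquote    # Python 3
except ImportError:
    from urllib import quote, unquote          # Python 2

# 'skel://store/' is a made-up URL prefix for this sketch.
URL_PREFIX = "skel://store/"

def example_make_mutable_url(data_id):
    # Escape every reserved character, including '/', so that any printable
    # data ID maps to a single path component.
    return URL_PREFIX + quote(data_id, safe="")

def example_data_id_from_url(url):
    # Inverse mapping, for use by the matching get handler.
    if not url.startswith(URL_PREFIX):
        return None
    return unquote(url[len(URL_PREFIX):])
```

Because the mapping is deterministic, calling it more than once for the same
data ID yields the same URL.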
"""
return None
def get_immutable_handler( data_hash, **kw ):
"""
Given a cryptographic hash of some data, go and fetch it.
This is used by the immutable data API, whereby users can
add and remove data hashes in their zone file (hence the term
"immutable"). The method that puts data for this method
to fetch is put_immutable_handler(), described below.
Drivers are encouraged but not required to implement this method.
A common strategy is to treat the data_hash like the data_id in
make_mutable_url().
**kw contains hints from Blockstack about the nature of the request.
Including:
* fqu (string): the fully-qualified username (i.e. the blockchain ID)
Returns the data on success. It must hash to data_hash (sha256)
Returns None on error. Does not raise an exception.
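The hash requirement can be checked inside the driver before returning. In
this sketch, `fetch` is a hypothetical callable standing in for the storage
service, and the hash is assumed to be a hex-encoded sha256 digest (the
encoding is an assumption of this example).

```python
import hashlib

def sha256_hex(data):
    # Hash bytes; encode text as UTF-8 first.
    if not isinstance(data, bytes):
        data = data.encode("utf-8")
    return hashlib.sha256(data).hexdigest()

def example_get_immutable(data_hash, fetch):
    # 'fetch' is a hypothetical callable that returns the stored data or None.
    try:
        data = fetch(data_hash)
    except Exception:
        return None
    if data is None or sha256_hex(data) != data_hash:
        # Missing or corrupted data is reported as None, never raised.
        return None
    return data
```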
"""
return None
def get_mutable_handler( url, **kw ):
"""
Given the URL to some data generated by an earlier call to
make_mutable_url(), go and fetch the data.
**kw contains hints from Blockstack about the nature of the request.
Including:
* fqu (string): the fully-qualified username (i.e. the blockchain ID)
Drivers are encouraged but not required to implement this method.
Returns the data on success. The driver is not expected to e.g. verify
its authenticity (Blockstack will take care of this).
Return None on error. Does not raise an exception.
"""
return None
def put_immutable_handler( data_hash, data_txt, txid, **kw ):
"""
Store data that was written by the immutable data API.
That is, the user updated their zone file and added a data
hash to it. This method is given the data's hash (sha256),
the data itself (as a string), and the transaction ID in the underlying
blockchain (i.e. as "proof-of-payment").
The driver should store the data in such a way that a
subsequent call to get_immutable_handler() with the same
data hash returns the given data here.
**kw contains hints from Blockstack about the nature of the request.
Including:
* fqu (string): the fully-qualified username (i.e. the blockchain ID)
* zonefile (True/False): whether or not this is a zone file hash
Drivers are encouraged but not required to implement this method.
Read-only data sources like HTTP servers would not implement this
method, for example.
Returns True on successful storage
Returns False on failure. Does not raise an exception
"""
return False
def put_mutable_handler( data_id, data_txt, **kw ):
"""
Store (signed) data to this storage provider. The only requirement
is that a call to make_mutable_url(data_id) must generate a URL that
can be fed into get_mutable_handler() to get the data back. That is,
the overall flow will be:
    # store data
    rc = put_mutable_handler( data_id, data_txt, **kw )
    if not rc:
        # error path...
    # ... some time later ...
    # get the data back
    data_url = make_mutable_url( data_id )
    assert data_url
    data_txt_2 = get_mutable_handler( data_url, **kw )
    if data_txt_2 is None:
        # error path...
    assert data_txt == data_txt_2
The data_txt argument is the data itself (as a string).
**kw contains hints from the Blockstack implementation.
Including:
* fqu (string): the fully-qualified username (i.e. the blockchain ID)
* zonefile (True/False): whether or not this is a zone file being stored
* profile (True/False): whether or not this is a profile being stored
Returns True on successful store
Returns False on error. Does not raise an exception
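A minimal disk-backed sketch of the round trip described above might look
like this. It is not the actual `disk` driver: the `sketch_*` names and the
temporary `STORAGE_DIR` are assumptions, and the `fsync` call is one way to
avoid returning before the data is durable.

```python
import os
import tempfile

try:
    from urllib.parse import quote    # Python 3
except ImportError:
    from urllib import quote          # Python 2

# Hypothetical storage root; a real driver would take it from its config.
STORAGE_DIR = tempfile.mkdtemp(prefix="skel-driver-")

def _path_for(data_id):
    # Escape the opaque data ID so it is usable as a single file name.
    return os.path.join(STORAGE_DIR, quote(data_id, safe=""))

def sketch_make_mutable_url(data_id):
    return "file://" + _path_for(data_id)

def sketch_put_mutable(data_id, data_txt, **kw):
    try:
        with open(_path_for(data_id), "w") as f:
            f.write(data_txt)
            # Do not report success until the data has reached the disk.
            f.flush()
            os.fsync(f.fileno())
        return True
    except Exception:
        return False

def sketch_get_mutable(url, **kw):
    if not url.startswith("file://"):
        return None
    try:
        with open(url[len("file://"):], "r") as f:
            return f.read()
    except Exception:
        # Errors are reported as None rather than raised.
        return None
```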
"""
return False
def delete_immutable_handler( data_hash, txid, tombstone, **kw ):
"""
Delete immutable data. Called when the user removed a datum's hash
from their zone file, and the driver must now go and remove the data
from the storage provider.
The driver is given the hash of the data (data_hash) and the underlying
blockchain transaction ID (txid).
The tombstone argument is used to prove to the driver that
the request to delete data corresponds to an earlier request to store data.
It is the signature over the string
"delete:{}{}".format(data_hash, txid). The user's data private key is
used to generate the signature. Most driver implementations
can ignore this, but some storage systems with weak consistency
guarantees may find it useful in order to NACK outstanding
writes.
You can use blockstack_client.storage.parse_data_tombstone() to parse a tombstone.
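To illustrate how a driver for a weakly consistent store might use this, the
sketch below rebuilds the signed string and uses it to refuse writes that
arrive after their delete. It does not verify the signature or parse a real
tombstone; `OutstandingWrites` and its methods are hypothetical.

```python
def immutable_delete_payload(data_hash, txid):
    # String covered by the signature for immutable deletes, per the text above.
    return "delete:{}{}".format(data_hash, txid)


class OutstandingWrites(object):
    # Hypothetical bookkeeping for a driver whose provider applies writes late.
    def __init__(self):
        self._queued = {}        # data_hash -> data not yet acknowledged
        self._deleted = set()    # payloads of deletes already seen

    def queue_write(self, data_hash, txid, data):
        if immutable_delete_payload(data_hash, txid) in self._deleted:
            # NACK: the matching delete was already observed.
            return False
        self._queued[data_hash] = data
        return True

    def record_delete(self, data_hash, txid):
        # Remember the delete and drop any write still waiting in the queue.
        self._deleted.add(immutable_delete_payload(data_hash, txid))
        self._queued.pop(data_hash, None)
        return True

    def is_queued(self, data_hash):
        return data_hash in self._queued
```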
**kw are hints from Blockstack to the driver.
Including:
* fqu (string): the fully-qualified username (i.e. the blockchain ID)
Returns True on successful deletion
Returns False on failure. Does not raise an exception.
"""
return False
def delete_mutable_handler( data_id, tombstone, **kw ):
"""
Delete mutable data. Called when user requested that some data
stored earlier with put_mutable_handler() be deleted.
The tombstone argument is used to prove to the driver and
underlying storage system that the
request to delete the data corresponds to an earlier request
to store it. It is the signature over the string
"delete:{}".format(data_id). Most driver implementations can
ignore this; it's meant for use with storage systems with
weak consistency guarantees.
You can use blockstack_client.storage.parse_data_tombstone() to parse a tombstone.
**kw are hints from Blockstack to the driver.
Including:
* fqu (string): the fully-qualified username (i.e. the blockchain ID)
Returns True on successful deletion
Returns False on failure. Does not raise an exception.
"""
return False
if __name__ == "__main__":
    """
    Unit tests would go here.
    """
    pass