diff --git a/sip/sip-000-stacks-improvement-proposal-process.md b/sip/sip-000-stacks-improvement-proposal-process.md index baf193f9c..987c9ca6c 100644 --- a/sip/sip-000-stacks-improvement-proposal-process.md +++ b/sip/sip-000-stacks-improvement-proposal-process.md @@ -1,848 +1,5 @@ # SIP-000 Stacks Improvement Proposal Process -# Preamble +This document formerly contained SIP-000 before the Stacks 2.0 mainnet launched.

-Title: Stacks Improvement Proposal Process
-
-Author: Ken Liao, Jude Nelson
-
-Status: Draft
-
-Consideration: Governance
-
-Type: Meta
-
-Created: 2020-06-23
-
-License: BSD-2-Clause
-
-Sign-off:
-
-# Abstract
-
-A Stacks Improvement Proposal (SIP) is a design document that provides
-information to the greater Stacks ecosystem's participants concerning the design
-of the Stacks blockchain and its ongoing operation. Each SIP shall provide a
-clear and concise description of features, processes, and/or standards for the
-Stacks blockchain and its operators to adopt, with sufficient details provided
-such that a reasonable practitioner may use the document to create an
-independent but compatible implementation of the proposed improvement.
-
-SIPs are the canonical medium by which new features are proposed and described,
-and by which input from the Stacks ecosystem participants is collected. The SIP
-Ratification Process is also described in this document, and provides the means
-by which SIPs may be proposed, vetted, edited, accepted, rejected, implemented,
-and finally incorporated into the Stacks blockchain's design, governance, and
-operational procedures. The set of SIPs that have been ratified shall
-sufficiently describe the design, governance, and operationalization of the
-Stacks blockchain, as well as the means by which future changes to its official
-design, implementation, operation, and governance may be incorporated.
- -# License and Copyright - -This SIP is made available under the terms of the BSD-2-Clause license, -available at https://opensource.org/licenses/BSD-2-Clause. This SIP’s copyright -is held by the Stacks Open Internet Foundation. - -# Specification - -Each SIP shall adhere to the same general formatting and shall be ratified -through the processes described by this document. - -## Introduction - -Blockchains are unique among distributed systems in that they also -happen to encode a social contract. By running a blockchain node, a user -implicitly agrees to be bound to the social contract's terms embedded within the -blockchain's software. These social contracts are elaborate constructions that -contain not only technical terms (e.g. "a block may be at most 1MB"), but also -economic terms (e.g. "only 21 million tokens may exist") and social terms (e.g. -"no money can leave this account" or "this transaction type was supported -before, but will now be ignored by the system") which the user agrees to uphold -by running a blockchain node. - -It stands to reason that the Stacks blockchain is made of more than just -software; it is also made of the people who run it. As such, the act of -developing and managing the Stacks blockchain network includes the act of -helping its people coordinate and agree on what the blockchain is and what it -should do. To this end, this document proposes a process by which the Stacks -blockchain's users can conduct themselves to be the stewards of the blockchain -network in perpetuity. - -The goals of this process are to ensure that anyone may submit a SIP in good -faith, that each SIP will receive fair and speedy good-faith consideration by -other people with the relevant expertise, and that any discussions and -decision-making on each SIP's ratification shall happen in public. To achieve -these ends, this document proposes a standard way of presenting a Stacks -Improvement Proposal (SIP), and a standard way of ratifying one. 
- -Each SIP document contains all of the information needed to propose a -non-trivial change to the way in which the Stacks blockchain operates. This -includes both technical considerations, as well as operational and governance -considerations. This document proposes a formal document structure based on both -request-for-comments (RFC) practices in the Internet Engineering Task Force -(IETF), as well as existing blockchain networks. - -SIPs must be ratified in order to be incorporated into the definition of what -the Stacks blockchain is, what it does, and how it operates. This document -proposes a ratification process based on existing governance processes from -existing open source projects (including Python, Bitcoin, Ethereum, and Zcash), -and makes provisions for creating and staffing various roles that people must -take on to carry out ratification (e.g. committees, editors, working groups and -so on). - -This document uses the word “users” to refer specifically to people who -participate in the greater Stacks ecosystem. This includes, but is not limited -to, people who mine blocks, people who contribute code, people who run nodes, -people who develop applications that rely on the Stacks blockchain, people who -use such applications, people involved in the project governance, and people -involved in operating software deployments. - -## SIP Format - -All SIPs shall be formatted as markdown files. Each section shall be -annotated as a 2nd-level header (e.g. `##`). Subsections may be added with -lower-level headers. - -Each SIP shall contain the following sections, in the given order: - -- _Preamble_. This section shall provide fields useful for categorizing the SIP. - The required fields in all cases shall be: - - _SIP Number_. Each SIP receives a unique number once it has been accepted - for consideration for ratification (see below). This number is assigned to - a SIP; its author does not provide it. - - _Title_. 
A concise description of the SIP, no more than 20 words long.
- - _Author_. A list of names and email addresses of the SIP's author(s).
- - _Consideration_. What class of SIP this is (see below).
- - _Type_. The SIP track for consideration (see below).
- - _Status_. This SIP's point in the SIP workflow (see below).
- - _Created_. The ISO 8601 date when this SIP was created.
- - _License_. The content license for the SIP (see below for permitted
- licenses).
- - _Sign-off_. The list of relevant persons and their titles who have worked to
- ratify the SIP. This field is not filled in entirely until ratification,
- but is incrementally filled in as the SIP progresses through the ratification
- process.
-- Additional SIP fields, which are sometimes required, include:
- - _Layer_. The logical layer of the Stacks blockchain affected. Must be one
- of the following:
- - _Consensus (soft fork)_. For backwards-compatible proposals for
- transaction-processing.
- - _Consensus (hard fork)_. For backwards-incompatible proposals for
- transaction-processing.
- - _Peer Services_. For proposals to the peer-to-peer network protocol
- stack.
- - _API/RPC_. For proposals to the Stacks blockchain's official
- programmatic interfaces.
- - _Traits_. For proposals for new standardized Clarity trait definitions.
- - _Applications_. For proposals for standardized application protocols
- that interface with the Stacks blockchain.
- - _Discussions-To_. A mailing list where ongoing discussion of the SIP takes
- place.
- - _Comments-Summary_. The comments summary tone.
- - _Comments-URI_. A link to the Stacks blockchain wiki for comments.
- - _License-Code_. Abbreviation for code under a different license than the SIP
- proposal.
- - _Post-History_. Dates of posting the SIP to the Stacks mailing list, or a
- link to a thread with the mailing list.
- - _Requires_. A list of SIPs that must be implemented prior to this SIP.
- - _Replaces_. A list of SIPs that this SIP replaces.
- - _Superseded-By_. A list of SIPs that replace this SIP.
-
-- _Abstract_. This section shall provide a high-level summary of the proposed
- improvement. It shall not exceed 5000 words.
-- _Copyright_. This section shall provide the copyright license that governs the
- use of the SIP content. It must be one of the approved set of licenses (see
-below).
-- _Introduction_. This section shall provide a high-level summary of the
- problem(s) that this SIP proposes to solve, as well as a high-level
-description of how the proposal solves them. This section shall emphasize its
-novel contributions, and briefly describe how they address the problem(s). Any
-motivational arguments and example problems and solutions belong in this
-section.
-- _Specification_. This section shall provide the detailed technical
- specification. It may include code snippets, diagrams, performance
-evaluations, and other supplemental data to justify particular design decisions.
-However, a copy of all external supplemental data (such as links to research
-papers) must be included with the SIP, and must be made available under an
-approved copyright license.
-- _Related Work_. This section shall summarize alternative solutions that address
- the same or similar problems, and briefly describe why they are not adequate
-solutions. This section may reference alternative solutions in other blockchain
-projects, in research papers from academia and industry, other open-source
-projects, and so on. This section must be accompanied by a bibliography of
-sufficient detail such that someone reading the SIP can find and evaluate the
-related works.
-- _Backwards Compatibility_. This section shall address any
- backwards-incompatibility concerns that may arise with the implementation of
-this SIP, as well as describe (or reference) technical mitigations for breaking
-changes. This section may be left blank for non-technical SIPs.
-- _Activation_.
This section shall describe the timeline, falsifiable criteria,
- and process for activating the SIP once it is ratified. This applies to both
-technical and non-technical SIPs. This section is used to unambiguously
-determine whether or not the SIP has been accepted by the Stacks users once it
-has been submitted for ratification (see below).
-- _Reference Implementations_. This section shall include references
- to one or more production-quality implementations of the SIP, if applicable.
-This section is only informative — the SIP ratification process is independent
-of any engineering processes (or other processes) that would be followed to
-produce implementations. If a particular implementation process is desired,
-then a detailed description of the process must be included in the Activation
-section. This section may be updated after a SIP is ratified in order to
-include an up-to-date listing of any implementations or embodiments of the SIP.
-
-Additional sections may be included as appropriate.
-
-### Supplemental Materials
-
-A SIP may include any supplemental materials as
-appropriate (within reason), but all materials must have an open format
-unencumbered by legal restrictions. For example, a LibreOffice `.odp`
-slide-deck file may be submitted as supplementary material, but not a Keynote
-`.key` file.
-
-When submitting the SIP, supplementary materials must be present within the same
-directory, and must be named as `SIP-XXXX-YYY.ext`, where:
-
-- `XXXX` is the SIP number,
-- `YYY` is the serial number of the file, starting with 1,
-- `.ext` is the file extension.
-
-## SIP Types
-
-The types of SIPs are as follows:
-
-- _Consensus_. This SIP type means that all Stacks blockchain implementations
- would need to adopt this SIP to remain compatible with one another. If this is
-the SIP type, then the SIP preamble must have the Layer field set to either
-_Consensus (soft fork)_ or _Consensus (hard fork)_.
-- _Standard_.
This SIP type means that the proposed change affects one or more - implementations, but does not affect network consensus. If this is the SIP -type, then the SIP preamble must have the Layer field set to indicate which -aspect(s) of the Stacks blockchain are affected by the proposal. -- _Operation_. This SIP type means that the proposal concerns the operation of the - Stacks blockchain -- in particular, it concerns node operators and miners. -The difference between this SIP type and the Standard type is that this type -does not change any existing protocols. -- _Meta_. This SIP type means that the proposal concerns the SIP ratification - process. Such a SIP is a proposal to change the way SIPs are handled. -- _Informational_. This is a SIP type that provides useful information, but does - not require any action to be taken on the part of any user. - -New types of SIPs may be created with the ratification of a Meta-type SIP under -the governance consideration (see below). SIP types may not be removed. - -## SIP Considerations - -A SIP's consideration determines the particular steps needed to ratify the SIP -and incorporate it into the Stacks blockchain. Different SIP considerations have -different criteria for ratification. A SIP can have more than one consideration, -since a SIP may need to be vetted by different users with different domains of -expertise. - - -- _Technical_. The SIP is technical in nature, and must be vetted by users with - the relevant technical expertise. -- _Economic_. The SIP concerns the blockchain's token economics. This not only - includes the STX token, but also any on-chain tokens created within smart -contracts. SIPs that are concerned with fundraising methods, grants, bounties, -and so on also belong in this SIP track. -- _Governance_. The SIP concerns the governance of the Stacks blockchain, - including the SIP process. 
This includes amendments to the SIP Ratification -Process, as well as structural considerations such as the creation (or removal) -of various committees, editorial bodies, and formally recognized special -interest groups. In addition, governance SIPs may propose changes to the way by -which committee members are selected. -- _Ethics_. This SIP concerns the behaviors of office-holders in the SIP - Ratification Process that can affect its widespread adoption. Such SIPs -describe what behaviors shall be deemed acceptable, and which behaviors shall be -considered harmful to this end (including any remediation or loss of privileges -that misbehavior may entail). SIPs that propose formalizations of ethics like -codes of conduct, procedures for conflict resolution, criteria for involvement -in governance, and so on would belong in this SIP consideration. -- _Diversity_. This SIP concerns proposals to grow the set of users, with an - emphasis on including users who are traditionally not involved with -open-source software projects. SIPs that are concerned with evangelism, -advertising, outreach, and so on must have this consideration. - -Each SIP consideration shall have a dedicated Advisory Board that ultimately -vets SIPs under their consideration for possible ratification in a timely -fashion (see below). New considerations may be created via the ratification of -a Meta-type SIP under the governance consideration. - -## SIP Workflow - -As a SIP is considered for ratification, it passes through multiple statuses as -determined by one or more committees (see next section). A SIP may have exactly -one of the following statuses at any given time: - -- _Draft_. The SIP is still being prepared for formal submission. It does not yet - have a SIP number. -- _Accepted_. The SIP text is sufficiently complete that it constitutes a - well-formed SIP, and is of sufficient quality that it may be considered for -ratification. 
A SIP receives a SIP number when it is moved into the Accepted
-state by SIP Editors.
-- _Recommended_. The people responsible for vetting the SIPs under the
- consideration(s) in which they have expertise have agreed that this SIP should
-be implemented. A SIP must be Accepted before it can be Recommended.
-- _Activation-In-Progress_. The SIP has been tentatively approved by the Steering
- Committee for ratification. However, not all of the criteria for ratification
-have been met according to the SIP’s Activation section. For example, the
-Activation section might require miners to vote on activating the SIPs’
-implementations, which would occur after the SIP has been transferred into
-Activation-In-Progress status but before it is transferred to Ratified status.
-- _Ratified._ The SIP has been activated according to the procedures described in
- its Activation section. Once ratified, a SIP remains ratified in perpetuity,
-but a subsequent SIP may supersede it. If the SIP is a Consensus-type SIP,
-then all Stacks blockchain implementations must implement it. A SIP must be
-Recommended before it can be Ratified. Moving a SIP into this state may be done
-retroactively, once the SIP has been activated according to the terms in its
-Activation section.
-- _Rejected_. The SIP does not meet at least one of the criteria for ratification
- in its current form. A SIP can become Rejected from any state, except
-Ratified. If a SIP is moved to the Rejected state, then it may be re-submitted
-as a Draft.
-- _Obsolete_. The SIP is deprecated, but its candidacy for ratification has not
- been officially withdrawn (e.g. it may warrant further discussion). An
-Obsolete SIP may not be ratified, and will ultimately be Withdrawn.
-- _Replaced_. The SIP has been superseded by a different SIP. Its preamble must
- have a Superseded-By field. A Replaced SIP may not be ratified, nor may it be
-re-submitted as a Draft-status SIP.
It must be transitioned to a Withdrawn
-state once the SIP(s) that replace it have been processed.
-- _Withdrawn_. The SIP's authors have ceased working on the SIP. A Withdrawn SIP
- may not be ratified, and may not be re-submitted as a Draft. It must be
-re-assigned a SIP number if taken up again.
-
-
-The act of ratifying a SIP is the act of transitioning it to the Ratified status
--- that is, moving it from Draft to Accepted, from Accepted to Recommended, from
-Recommended to Activation-In-Progress, and from Activation-In-Progress to
-Ratified, all without the SIP being transitioned to Rejected, Obsolete,
-Replaced, or Withdrawn status. A SIP's current status is recorded in its Status
-field in its preamble.
-
-## SIP Committees
-
-The act of deciding the status of a SIP is handled by a set of designated
-committees. These committees are composed of users who dedicate their time and
-expertise to curate the blockchain, ratifying SIPs on behalf of the rest of the
-ecosystem’s users.
-
-There are three types of committee:
-
-- _Steering Committee (SC)_. The roles of the SC are to select Recommended-status
- SIPs to be activated, to determine whether or not a SIP has been activated and
-thus ratified, and to formally recognize Consideration Advisory Boards (see
-below).
-- _Consideration Advisory Boards_. The roles of the Consideration Advisory Boards
- are to provide expert feedback on SIPs that have been moved to Accepted status
-in a timely manner, and to transition SIPs to Recommended status if they meet
-the Board's consideration criteria, and Rejected status otherwise.
-- _SIP Editors_. The role of the SIP Editors is to identify SIPs in the Draft
- status that can be transitioned to Accepted status. A SIP Editor must be able
-to vet a SIP to ensure that it is well-formed, that it follows the ratification
-workflow faithfully, and that it does not overlap with any already-Accepted SIPs
-or SIPs that have since become Recommended or Ratified.
- -Any user may serve on a committee. However, all Stacks committee members must -abide by the SIP Code of Conduct and must have a history of adhering to it. -Failure to adhere to the Code of Conduct shall be grounds for immediate removal -from a committee, and a prohibition against serving on any future committees. - -### Compensation - -Compensation for carrying out committee duties is outside of the scope of this -document. This document does not create a provision for compensation for -committee participation, but it does not forbid it either. - -### Steering Committee Duties - -The Steering Committee's overarching duty is to oversee the evolution of the -Stacks blockchain’s design, operation, and governance, in a way that is -technically sound and feasible, according to the rules and procedures described -in this document. The SC shall be guided by and held accountable by the greater -community of users, and shall make all decisions with the advice of the relevant -Consideration Advisory Boards. - -The SC’s role is that of a steward. The SC shall select SIPs for ratification -based on how well they serve the greater good of the Stacks users. Given the -nature of blockchains, the SC's particular responsibilities pertaining to -upgrading the blockchain network are meant to ensure that upgrades happen in a -backwards-compatible fashion if at all possible. While this means that more -radical SIPs may be rejected or may spend a long amount of time in Recommended -status, it also minimizes the chances of an upgrade leading to widespread -disruption (the minimization of which itself serves the greater good). - -#### Membership - -The initial Steering Committee shall be comprised of at least three members: -two from the Stacks Open Internet Foundation, and one -from the greater Stacks blockchain community (independent of the Stacks -Foundation). 
-
-A provisional Steering Committee will be appointed by the Stacks Open Internet Foundation Board
-before the launch of the Stacks blockchain’s mainnet (see the "Activation" section).
-Once this SIP activates, the Stacks Open Internet Foundation shall select its
-representatives in a manner of their choosing within 90 days after activation.
-The committee may be expanded later to include more seats. Once this SIP
-activates, the provisional SC will consult with the community to
-ratify a SIP that implements a voting procedure whereby
-Stacks community members can select the individual who will serve on the
-community SC seat.
-
-#### Qualifications
-
-Members of this committee must have deep domain expertise
-pertinent to blockchain development, and must have excellent written
-communication skills. It is highly recommended that members should have authored
-at least one ratified technical-consideration SIP before joining this committee.
-
-#### Responsibilities
-
-The Steering Committee shall be responsible for the following
-tasks.
-
-##### Recognizing Consideration Advisory Boards
-
-The members of the Steering Committee
-must bear in mind that they are not infallible, and that they do not know everything
-there is to know about what is best for the broader user community. To the
-greatest extent practical, the SC shall create and foster the development of
-Consideration Advisory Boards in order to make informed decisions on subjects
-in which they may not be experts.
-
-Any group of users can form an unofficial working group to help provide feedback
-on SIPs, but the SC shall have the power to recognize such groups formally as a
-Consideration Advisory Board via at least a two-thirds majority vote. The SC
-shall simultaneously recognize one of its members to serve as the interim
-chairperson while the Advisory Board forms.
An SC member cannot normally serve on
-a Consideration Advisory Board concurrently with serving on the SC, unless
-granted a limited exception by a unanimous vote by the SC (e.g. in order to
-address the Board’s business while a suitable chairperson is found). Formally
-recognizing Consideration Advisory Boards shall occur in Public Meetings (see
-below) no more than once per quarter.
-
-Once recognized, Consideration Advisory Boards may not be dissolved or
-dismissed, unless there are no Accepted or Recommended SIPs that request their
-consideration. If this is the case, then the SC may vote to rescind recognition
-of a Consideration Advisory Board with a two-thirds majority at one of its
-Public Meetings.
-
-In order to identify users who would form a Consideration Advisory Board, users
-should organize into an unofficial working group and submit a SIP to petition
-that the SC recognize the working group as a Consideration Advisory Board. This
-petition must take the form of a Meta-type SIP, and may be used to select the
-initial chairperson and define the Board's domain(s) of expertise, bylaws,
-membership, meeting procedures, communication channels, and so on, independent
-of the SC. The SC would only be able to ratify or reject the SIP.
-
-The SC shall maintain a public index of all Consideration Advisory Boards that
-are active, including contact information for the Board and a summary of what
-kinds of expertise the Board can offer. This index is meant to be used by SIP
-authors to help route their SIPs towards the appropriate reviewers before being
-taken up by the SC.
-
-##### Voting on Technical SIPs
-
-The Steering Committee shall select Recommended SIPs
-for ratification by moving them to Activation-In-Progress status. All
-technical-consideration SIPs shall require an 80% vote. If it is a
-Consensus-type SIP for a hard fork, then a unanimous vote shall be required.
If
-a SIP is voted on and is not moved to Activation-In-Progress, then it shall be
-moved to Rejected status, and the SC shall provide a detailed explanation as to
-why they made their decision (see below).
-
-##### Voting on Non-technical SIPs
-
-Not all SIPs are technical in nature. All
-non-technical SIPs shall require only a two-thirds majority vote to transition
-to Activation-In-Progress status. The SC members must provide a public
-explanation for the way they voted as supplementary materials with the ratified
-non-technical SIP (see below). If the SC votes on moving a non-technical SIP to
-Activation-In-Progress status, but the motion does not receive the requisite
-number of votes, then the SIP shall be transferred to Rejected status, and the
-SC shall provide a detailed explanation as to why they made their decision (see
-below).
-
-##### Overseeing SIP Activation and Ratification
-
-Once a SIP is in Activation-In-Progress status,
-the SC shall be responsible for overseeing the procedures and criteria in the
-SIP’s Activation section. The Activation section of a SIP can be thought of as
-an “instruction manual” and/or “checklist” for the SC to follow to determine if
-the SIP has been accepted by the Stacks users. The SC shall strictly adhere to
-the process set forth in the Activation section. If the procedure and/or
-criteria of the Activation section cannot be met, then the SC may transfer the
-SIP to Rejected status and ask the authors to re-submit the SIP with an updated
-Activation section.
-
-Once all criteria have been unambiguously met and all activation procedures
-have been followed, the SC shall transition the SIP to Ratified status. The SC
-shall keep a log and provide a record of the steps they took in following a
-SIP’s Activation section once the SIP is in Activation-In-Progress status, and
-publish them alongside the Ratified SIP as supplemental material.
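The voting thresholds described in the two sections above (80% for technical SIPs, unanimity for consensus hard forks, and a two-thirds majority for non-technical SIPs) can be sketched as follows. This is a non-normative illustration; the function name and its encoding of SIP kinds are assumptions made for this sketch only, not part of the SIP process:

```python
def votes_needed(members, technical, hard_fork=False):
    """Minimum yes-votes for the SC to move a SIP to Activation-In-Progress.

    Illustrative only: unanimity for consensus hard forks, an 80% vote for
    other technical-consideration SIPs, and a two-thirds majority for
    non-technical SIPs. Integer arithmetic is used to take exact ceilings.
    """
    if hard_fork:
        return members                 # unanimous
    if technical:
        return -(-4 * members // 5)    # ceil(0.8 * members)
    return -(-2 * members // 3)        # ceil(members * 2/3)
```

For example, on a five-member SC, an ordinary technical SIP would need four yes-votes, while a consensus hard fork would need all five.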
-
-Due to the hands-on nature of the Activation section, the SC may deem it
-appropriate to reject a SIP solely on the quality of its Activation section.
-Reasonable grounds for rejection include, but are not limited to, ambiguous
-instructions, insufficiently-informative activation criteria, too much work on
-the SC members’ parts, the lack of a prescribed activation timeout, and so on.
-
-Before the Stacks mainnet launches, the SC shall ratify a SIP that, when
-activated according to the procedures outlined in its Activation section, will
-allow Stacks blockchain miners to signal their preferences for the activation of
-particular SIPs within the blocks that they mine. This will enable the greater
-Stacks community of users to have the final say as to which SIPs activate and
-become ratified.
-
-##### Feedback on Recommended SIPs
-
-The Steering Committee shall give a full, fair,
-public, and timely evaluation to each SIP transitioned to Recommended status by
-Consideration Advisory Boards. A SIP shall only be considered by the SC if the
-Consideration Advisory Board chairpeople for each of the SIP's considerations
-have signed off on the SIP (by indicating as such on the SIP's preamble).
-
-The SC may transition a SIP to Rejected status if it disagrees with the
-Consideration Advisory Boards' recommendation. The SC may transition a SIP to
-Obsolete status if it finds that the SIP no longer addresses a relevant concern.
-It may transition the SIP to a Replaced status if it considers a similar,
-alternative SIP that is more likely to succeed. In all cases, the SC shall
-ensure that a SIP does not remain in Recommended status for an unreasonable
-amount of time.
-
-The SC shall maintain a public record of all feedback provided for each SIP it
-reviews.
-
-If a SIP is moved to Rejected, Obsolete, or Replaced status, the SIP authors may
-appeal the process by re-submitting it in Draft status once the feedback has
-been addressed.
The appealed SIP must cite the SC’s feedback as supplemental -material, so that SIP Editors and Consideration Advisory Boards are able to -verify that the feedback has, in fact, been addressed. - -##### Public Meetings - -The Steering Committee shall hold and record regular public -meetings at least once per month. The SC may decide the items of business for -these meetings at its sole discretion, but it shall prioritize business -pertaining to the ratification of SIPs, the recognition of Consideration -Advisory Boards, and the needs of all outstanding committees. That said, any -user may join these meetings as an observer, and the SC shall make a good-faith -effort to address public comments from observers as time permits. - -The SC shall appoint up to two dedicated moderators from the user community for -its engineering meetings, who shall vet questions and commentary from observers -in advance (possibly well before the meeting begins). If there is more than one -moderator, then the moderators may take turns. In addition, the SC shall appoint -a dedicated note-taker to record the minutes of the meetings. All of these -appointees shall be eligible to receive a fixed, regular bounty for their work. - -### Consideration Advisory Board Duties - -There is an Advisory Board for each SIP consideration, with a designated -chairperson responsible for maintaining copies of all discussion and feedback on -the SIPs under consideration. - -#### Membership - -All Consideration Advisory Boards begin their life as unofficial -working groups of users who wish to review inbound SIPs according to their -collective expertise. If they wish to be recognized as an official -Consideration Advisory Board, they shall submit a SIP to the Steering Committee -per the procedure described in the Steering Committee’s duties. Each -Consideration Advisory Board shall be formally created by the SC with a -designated member serving as its first interim chairperson. 
After this, the
-Consideration Advisory Board may adopt its own bylaws for selecting members and
-chairpeople. However, members should have domain expertise relevant to the
-consideration.
-
-Members shall serve on their respective Consideration Advisory Boards so long as
-they are in good standing with the SIP Code of Conduct and in accordance with
-the individual Board’s bylaws. A user may serve on at most three Consideration
-Advisory Boards concurrently.
-
-#### Qualifications
-
-Each Consideration Advisory Board member shall have sufficient
-domain expertise to provide the Steering Committee with feedback pertaining to a
-SIP's consideration. Members shall possess excellent written communication
-skills.
-
-#### Responsibilities
-
-Each Consideration Advisory Board shall be responsible for the
-following.
-
-##### Chairperson
-
-Each Consideration Advisory Board shall appoint a chairperson, who
-shall serve as the point of contact between the rest of the Board and the
-Steering Committee. If the chairperson becomes unresponsive, the SC may ask the
-Board to appoint a new chairperson (alternatively, the Board may appoint a new
-chairperson on its own and inform the SC). The chairperson shall be responsible
-for maintaining the Board’s public list of members’ names and contact
-information as a supplementary document to the SIP that the SC ratified to
-recognize the Board.
-
-##### Consideration Track
-
-Each Consideration Advisory Board shall provide a clear and
-concise description of what expertise it can offer, so that SIP authors may
-solicit it with confidence that it will be helpful. The chairperson shall make
-this description available to the Steering Committee and to the SIP Editors, so
-that both committees can help SIP authors ensure that they receive the most
-appropriate feedback.
-
-The description shall be provided and updated by the chairperson to the SC so
-that the SC can provide a public index of all considerations a SIP may possess.
-
-##### Feedback to SIP Authors
-
-Each Consideration Advisory Board shall provide a full,
-fair, public, and timely evaluation of any Accepted-status SIP that lists the
-Board's consideration in its preamble. The Board may decide to move each SIP to
-a Recommended status or a Rejected status based on whether or not the Board
-believes that the SIP is feasible, practical, and beneficial to the greater
-Stacks ecosystem.
-
-Any feedback created shall be made public. It is the responsibility of the Board
-to store and publish all feedback for the SIPs it reviews. It shall forward
-copies of this feedback to the SIP authors.
-
-##### Consultation with the Steering Committee
-
-The Steering Committee may need to
-follow up with the Consideration Advisory Board in order to clarify its position
-or solicit its advice on a particular SIP. For example, the SC may determine
-that a Recommended SIP needs to be considered by one or more additional Boards
-that have not yet been consulted by the SIP authors.
-
-The Board shall respond to the SC's request for advice in a timely manner, and
-shall prioritize feedback on SIPs that are under consideration for ratification.
-
-### SIP Editor Duties
-
-By far the largest committee in the SIP process is the SIP Editor Committee.
-The SIP Editors are responsible for maintaining the "inbound funnel" for SIPs
-from the greater Stacks community. SIP Editors ensure that all inbound SIPs are
-well-formed, relevant, and do not duplicate prior work (including rejected
-SIPs).
-
-#### Membership
-
-Anyone may become a SIP Editor by recommendation from an existing SIP
-Editor, subject to the “Recruitment” section below.
-
-#### Qualifications
-
-A SIP Editor must demonstrate proficiency in the SIP process and
-formatting requirements. A candidate SIP Editor must demonstrate to an existing
-SIP Editor that they can independently vet SIPs.
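Much of a SIP Editor's vetting is mechanical. As a hypothetical illustration (not official tooling), a missing-field check over the preamble format used in this document might look like the following; the `missing_preamble_fields` helper and its rules are assumptions for the sketch:

```python
# Hypothetical sketch: check a SIP preamble for required fields.
# The field names follow the preamble format shown in this document;
# the function itself is illustrative, not part of the SIP process.
REQUIRED_FIELDS = ("Title", "Author", "Status", "Type", "Created", "License")

def missing_preamble_fields(preamble: str) -> list:
    """Return the required preamble fields that are absent."""
    present = {line.split(":", 1)[0].strip()
               for line in preamble.splitlines() if ":" in line}
    return [field for field in REQUIRED_FIELDS if field not in present]
```

A check like this could run before an Editor reads the SIP in depth; the substantive criteria in this document (originality, appropriateness of type and consideration) still require human judgment.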
-
-#### Responsibilities
-
-SIP Editors are concerned with shepherding SIPs from Draft
-status to Accepted status, and with mentoring community members who want to get
-involved with the SIP processes (as applicable).
-
-##### Getting Users Started
-
-SIP Editors should be open and welcoming towards
-enthusiastic users who want to help improve the greater Stacks ecosystem. As
-such, SIP Editors should encourage users to submit SIPs if they have good ideas
-that may be worth implementing.
-
-In addition, SIP Editors should respond to public requests for help from
-community members who want to submit a SIP. They may point them towards this
-document, or towards other supplemental documents and tools to help them get
-started.
-
-##### Feedback
-
-When a SIP is submitted in Draft status, a SIP Editor who takes the
-SIP into consideration should provide fair and full feedback on how to make the
-SIP ready for its transition to Accepted status.
-
-To do this, the SIP Editor should:
-
-- Verify that the SIP is well-formed according to the criteria in this document
-- Verify that the SIP has not been proposed before
-- Verify as best they can that the SIP is original work
-- Verify that the SIP is appropriate for its type and consideration
-- Recommend additional Considerations if appropriate
-- Ensure that the text is clear, concise, and grammatically correct English
-- Ensure that there are appropriate avenues for discussion of the SIP listed in
-  the preamble.
-
-The SIP Editor does not need to provide public feedback to the SIP authors, but
-should add their name(s) to the Signed-off field in the SIP preamble once the
-SIP is ready to be Accepted.
-
-##### Acceptance
-
-Once a SIP is moved to Accepted, the SIP Editor shall assign it the
-smallest positive number not currently used to identify any other SIP.
Once that
-number is known, the SIP Editor shall set the SIP's status to Accepted, set the
-number, and commit the SIP to the SIP repository in order to make it visible to
-other SIP Editors and to the Consideration Advisory Boards.
-
-##### Recruitment
-
-Each SIP Editor must list their name and contact information in an
-easy-to-find location in the SIP repository, as well as a list of each SIP Editor
-they recommended. In so doing, the SIP Editors shall curate an “invite tree”
-that shows which Editors recommended which other Editors.
-
-A SIP Editor may recommend another user to be a SIP Editor no more than once per
-month, and only if they have faithfully moved at least one SIP to Accepted
-status in the last quarter. If a SIP Editor does not participate in editing a
-SIP for a full year and a day, then they may be removed from the SIP Editor
-list. The SC may remove a SIP Editor (and some or all of the users he or she
-recommended) if they find that the SIP Editor has violated the SIP Code of
-Conduct.
-
-Newly-Accepted SIPs, new SIP Editor recruitment, and SIP Editor retirement shall
-be submitted as pull requests by SIP Editors to the SIP repository.
-
-## SIP Workflow
-
-The lifecycle of a SIP is summarized in the flow-chart below:
-
-```
- ------------------
- | Draft | <-------------------------. Revise and resubmit
- ------------------ |
- | --------------------
- Submit to SIP Editor -------------> | Rejected |
- | --------------------
- | ^
- V |
- ------------------ |
- | Accepted | -------------------------/ | /--------------------------------.
- ------------------ | |
- | -------------------- |
- Review by Consideration ----------> | Rejected | |
- Advisory Board(s) -------------------- |
- | ^ |
- V | |
- ------------------------- | |
- | Recommended | -----------------/ | /------------------------------->|
- ------------------------- | |
- | -------------------- |
- Vote by the Steering -----------> | Rejected | |
- Committee for activation -------------------- |
- | ^ |
- V | |
- -------------------------- | |
- | Activation-in-Progress | -----------------/ | /------------------------------->|
- -------------------------- | |
- | --------------------- |
- All activation ------------------> | Rejected | |
- criteria are met | --------------------- ------------------ |
- | |----------------------------------> | Obsolete | |
- V | --------------------- ------------------ |
- ------------------ *---> | Replaced | --------------->|<-----------*
- | Ratified | --------------------- |
- ------------------ V
- -------------------
- | Withdrawn |
- -------------------
-```
-
-When a SIP is transitioned to Rejected, it is not deleted, but is preserved in
-the SIP repository so that it can be referenced as related or prior work by
-other SIPs. Once a SIP is Rejected, it may be re-submitted as a Draft at a later
-date. SIP Editors may decide how often to re-consider rejected SIPs as an
-anti-spam measure, but the Steering Committee and Consideration Advisory Boards
-may opt to independently re-consider rejected SIPs at their own discretion.
-
-## Public Venues for Conducting Business
-
-The canonical set of SIPs in all states shall be recorded in the same medium as
-the canonical copy of this SIP. Right now, this is in the Github repository
-https://github.com/stacksorg/sips, but may be changed before this SIP is
-ratified. New SIPs, edits to SIPs, comments on SIPs, and so on shall be
-conducted through Github's facilities for the time being.
-
-In addition, individual committees may set up and use public mailing lists for
-conducting business. The Stacks Open Internet Foundation shall provide a means
-for doing so. Any discussions on the mailing lists that lead to non-trivial
-contributions to SIPs should be referenced by these SIPs as supplemental
-material.
-
-### Github-specific Considerations
-
-All SIPs and all SIP edits (including status updates) shall be submitted as
-pull requests. The SC, or one or more
-individuals or entities appointed by the SC, shall be responsible for merging
-pull requests to the main branch.
-
-## SIP Copyright & Licensing
-
-Each SIP must identify at least one acceptable license in its preamble. Source
-code in the SIP can be licensed differently than the text. SIPs whose reference
-implementation(s) touch existing reference implementation(s) must use the same
-license as the existing implementation(s) in order to be considered. Below is a
-list of recommended licenses.
-
-- BSD-2-Clause: OSI-approved BSD 2-clause license
-- BSD-3-Clause: OSI-approved BSD 3-clause license
-- CC0-1.0: Creative Commons CC0 1.0 Universal
-- GNU-All-Permissive: GNU All-Permissive License
-- GPL-2.0+: GNU General Public License (GPL), version 2 or newer
-- LGPL-2.1+: GNU Lesser General Public License (LGPL), version 2.1 or newer
-
-# Related Work
-
-The governance process proposed in this SIP is inspired by the Python PEP
-process [1], the Bitcoin BIP2 process [2], the Ethereum Improvement Proposal
-process [3], the Zcash governance process [4], and the Debian GNU/Linux
-distribution governance process [5].
This SIP describes a governance process
-where top-level decision-making power is vested in a committee of elected
-representatives, which distinguishes it from Debian (which has a single elected
-project leader), Python (which has a benevolent dictator for life), and Bitcoin
-and Zcash (which vest all decision ratification power solely in the blockchain
-miners). The reason for a top-level steering committee is to ensure that
-decision-making power is not vested in a single individual, but also to ensure
-that the individuals responsible for decisions are accountable to the community
-that elects them (as opposed to only those who have the means to participate
-in mining). This SIP differs from Ethereum's governance
-process in that the top-level decision-making body (the "Core Devs" in Ethereum,
-and the Steering Committee in Stacks) is not only technically proficient enough to evaluate
-SIPs, but also held accountable through an official governance
-process.
-
-[1] https://www.python.org/dev/peps/pep-0001/
-
-[2] https://github.com/bitcoin/bips/blob/master/bip-0002.mediawiki
-
-[3] https://eips.ethereum.org/
-
-[4] https://www.zfnd.org/governance/
-
-[5] https://debian-handbook.info/browse/stable/sect.debian-internals.html
-
-# Activation
-
-This SIP activates once the following tasks have been carried out:
-
-- The provisional Steering Committee must be appointed by the Stacks Open Internet
-  Foundation Board.
-- Mailing lists for the initial committees must be created.
-- The initial Consideration Advisory Boards must be formed, if there is interest
-  in doing so before this SIP activates.
-- A public, online SIP repository must be created to hold all non-Draft SIPs, their edit
-  histories, and their feedback.
-- A directory of Consideration Advisory Boards must be established (e.g. within
-  the SIP repository).
-- A SIP Code of Conduct should be added as a supplemental document.
-- The Stacks blockchain mainnet must launch.
- -# Reference Implementation - -Not applicable. - -# Frequently Asked Questions - -NOTE: this section will be expanded as necessary before ratification +This SIP is now located in the [stacksgov/sips repository](https://github.com/stacksgov/sips/blob/main/sips/sip-000/sip-000-stacks-improvement-proposal-process.md) as part of the [Stacks Community Governance organization](https://github.com/stacksgov). diff --git a/sip/sip-001-burn-election.md b/sip/sip-001-burn-election.md index 7d022e2f8..19a4aca21 100644 --- a/sip/sip-001-burn-election.md +++ b/sip/sip-001-burn-election.md @@ -1,1024 +1,5 @@ -# SIP 001 Burn Election +# SIP-001 Burn Election -## Preamble +This document formerly contained SIP-001 before the Stacks 2.0 mainnet launched. -Title: Burn Election - -Author: Jude Nelson , Aaron Blankstein - -Status: Draft - -Type: Standard - -Created: 1/1/2019 - -License: BSD 2-Clause - -## Abstract - -This proposal describes a mechanism for single-leader election using -_proof-of-burn_ (PoB). Proof of burn is a mechanism for bootstrapping a new -blockchain on top of an existing blockchain by rendering the tokens unspendable -(i.e. "burning" them). - -Proof of burn is concerned with deciding which Stacks block miner (called _leader_ in this text) is elected for -producing the next block, as well as deciding how to resolve -conflicting transaction histories. The protocol assigns a score to each leader -based on the fraction of tokens it burned, which is used to -(1) probabilistically select the next leader proportional to its normalized score -and to (2) rank conflicting transaction histories by their total number of epochs -to decide which one is the canonical transaction history. - -## Introduction - -Blockstack's first-generation blockchain operates in such a way that each transaction is -in 1-to-1 correspondence with a Bitcoin transaction. 
The reason for doing this -is to ensure that the difficulty of reorganizing Blockstack's blockchain is just -as hard as reorganizing Bitcoin's blockchain -- a [lesson -learned](https://www.usenix.org/node/196209) from when the system was originally -built on Namecoin. - -This SIP describes the proof-of-burn consensus algorithm in -Blockstack's second-generation blockchain (the _Stacks -blockchain_). The Stacks blockchain makes the following improvements over the -first-generation blockchain: - -### 1. High validation throughput - -The number of Stacks transactions processed is decoupled from the -transaction processing rate of the underlying _burn chain_ (Bitcoin). Before, each Stacks -transaction was coupled to a single Bitcoin transaction. In the Stacks -blockchain, an _entire block_ of Blockstack transactions corresponds to a -Bitcoin transaction. This significantly improves cost/byte ratio for processing -Blockstack transactions, thereby effectively increasing its throughput. - -### 2. Low-latency block inclusion - -Users of the first version of the Stacks blockchain encounter high latencies for -on-chain transactions -- in particular, they must wait for an equivalent -transaction on the burn chain to be included in a block. This can take -minutes to hours. - -The Stacks blockchain adopts a _block streaming_ model whereby each leader can -adaptively select and package transactions into their block as they arrive in the -mempool. This ensures users learn when a transaction is included in a block -on the order of _seconds_. - -### 3. An open leadership set - -The Stacks blockchain uses proof of burn to decide who appends the next -block. The protocol (described in this SIP) ensures that anyone can become -a leader, and no coordination amongst leaders is required to produce a block. 
-
-This preserves the open-leadership property from the existing blockchain (where
-Blockstack blockchain miners were also Bitcoin miners), but realizes it through
-an entirely different mechanism that enables the properties listed here.
-
-### 4. Participation without mining hardware
-
-Producing a block in the Stacks blockchain takes negligible energy on top of
-the burn blockchain. Would-be miners append blocks by _burning_ an existing cryptocurrency
-by rendering it unspendable. The rate at which the cryptocurrency is destroyed
-is what drives block production in the Stacks blockchain. As such, anyone who
-can acquire the burn cryptocurrency (e.g. Bitcoin) can participate in mining,
-_even if they can only afford a minimal amount_.
-
-### 5. Fair mining pools
-
-Related to the point above, it is difficult to participate in block mining in
-a proof-of-work blockchain. This is because a would-be miner needs to
-lock up a huge initial amount of capital in dedicated mining hardware,
-and a miner receives few or no rewards for blocks that are not incorporated
-into the main chain. Joining a mining pool is a risky alternative,
-because the pool operator can simply abscond with the block reward or
-dole out block rewards in an unfair way.
-
-The Stacks blockchain addresses this problem by providing
-a provably-fair way to mine blocks in a pool. To
-implement mining pools, users aggregate their individually-small burns
-to produce a large burn, which in turn gives them all a
-non-negligible chance to mine a block. The leader election protocol
-is aware of these burns, and rewards users proportional to their
-contributions _without the need for a pool operator_. This both
-lowers the barrier to entry in participating in mining, and removes
-the risk of operating in traditional mining pools.
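The proportional reward split described above can be sketched with made-up numbers. The burn amounts, reward units, and `split_reward` helper below are all illustrative, not protocol constants:

```python
# Illustrative only: made-up burn amounts and reward units for one
# fair mining pool. Each contributor's share of the block reward is
# pro rata to their share of the pool's aggregate burn.
POOL_BURNS = {"alice": 5, "bob": 3, "carol": 2}  # burned units (made up)
BLOCK_REWARD = 1000                              # reward units (made up)

def split_reward(burns, reward):
    """Split a block reward proportionally to each user's burn."""
    total = sum(burns.values())
    return {user: reward * amount // total for user, amount in burns.items()}

# alice contributed half of the pool's total burn, so she receives
# half of the reward
payouts = split_reward(POOL_BURNS, BLOCK_REWARD)
```

Integer division leaves any rounding dust unassigned in this sketch; a real protocol would have to specify where such remainders go.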
- -In addition to helping lots of small burners mine blocks, fair mining pools -are also used to give different block leaders a way to hedge their bets on their -chain tips: they can burn some cryptocurrency to competing chain tips and -receive some of the reward if their preferred chain tip loses. This is -important because it gives frequent leaders a way to reduce the variance of -their block rewards. - -### 6. Ability to migrate to a separate burn chain in the future - -A key lesson learned in the design of the first-generation Stacks blockchain is that -the network must be portable, in order to survive systemic failures such as peer -network collapse, 51% attacks, and merge miners dominating the hash power. -The proof of burn mining system will preserve this feature, -so the underlying burn chain can be "swapped out" at a later date if need be and -ultimately be replaced by a dedicated set of Stacks leaders. - -### Assumptions - -Given the design goals, the Stacks leader election protocol makes the following -assumptions: - -* Deep forks in the burn chain are exponentially rarer as a function - of their length. - -* Deep forks in the burn chain occur for reasons unrelated to the - Stacks protocol execution. That is, miners do not attempt to - manipulate the execution of the Stacks protocol by reorganizing the - burn chain (but to be clear, burn chain miners may participate in the Stacks -chain as well). - -* Burn chain miners do not censor all Stacks transactions (i.e. liveness is - possible), but may censor some of them. In particular, Stacks transactions -on the burn chain will be mined if they pay a sufficiently high transaction -fee. - -* At least 2/3 of the Stacks leader candidates, measured by burn - weight, are correct and honest. 
If there is a _selfish mining_ coalition,
-then we assume that 3/4 of the Stacks leader candidates are honest (measured
-again by burn weight) and that honestly-produced Stacks blocks propagate to the honest
-coalition at least as quickly as burn blocks (i.e. all honest peers receive the
-latest honest Stacks block data within one epoch of it being produced).
-
-## Protocol overview
-
-Like existing blockchains, the Stacks blockchain encodes a cryptocurrency and
-rules for spending its tokens. Like existing cryptocurrencies, the Stacks
-blockchain introduces new tokens into circulation each time a new block is
-produced (a _block reward_). This encourages peers to participate in gathering transactions and
-creating new blocks. Peers that do so are called the _leaders_ of the blocks that
-they produce (analogous to "miners" in existing cryptocurrencies).
-
-Blocks are made of one or more _transactions_, which encode valid state
-transitions in each correct peer. Users create and broadcast transactions to the peer
-network in order to (among other things) spend the tokens they own. The current
-leader packages transactions into a single block during its epoch -- in this
-way, a block represents all transactions processed during one epoch in the
-Stacks chain.
-
-Like existing cryptocurrencies, users compete with one another for _space_
-in the underlying blockchain's peers for storing their transactions. This competition is
-realized through transaction fees -- users include an extra Stacks token payment in
-their transactions to encourage leaders to incorporate their transactions
-first. Leaders receive the transaction fees of the transactions they
-package into their blocks _in addition_ to the tokens minted by producing them.
-
-Blocks are produced in the Stacks blockchain in cadence with the underlying burn
-chain. Each time the burn chain network produces a block, at most one Stacks block
-will be produced.
In doing so, the burn chain acts as a decentralized
-rate-limiter for creating Stacks blocks, thereby preventing DDoS attacks on its
-peer network. Each block discovery on the burn chain triggers a new _epoch_ in the Stacks
-blockchain, whereby a new leader is elected to produce the next Stacks block.
-
-With the exception of a designated _genesis block_, each block in the Stacks
-blockchain has exactly one "parent" block. This parent relationship is a
-partial ordering of blocks, where concurrent blocks (and their descendants)
-are _forks_.
-
-If the leader produces a block, it must have an already-accepted block as its
-parent. A block is "accepted" if it has been successfully
-processed by all correct peers and exists in at least one Stacks blockchain fork.
-The genesis block is accepted on all forks.
-
-Unlike most existing blockchains, Stacks blocks are not produced atomically.
-Instead, when a leader is elected, the leader may dynamically package
-transactions into a sequence of _microblocks_ as they are received from users.
-Logically speaking, the leader produces one block; it just does not need to
-commit to all the data it will broadcast when its tenure begins. This strategy
-was first described in the [Bitcoin-NG](https://www.usenix.org/node/194907) system,
-and is used in the Stacks blockchain with some modifications -- in particular, a
-leader may commit to _some_ transactions that _must_ be broadcast during its
-tenure, and may opportunistically stream additional transactions in microblocks.
-
-### Novel properties enabled by Proof of Burn
-
-Each Stacks block is anchored to the burn chain by
-way of a cryptographic hash. That is, the burn chain's canonical transaction
-history contains the hashes of all Stacks blocks ever produced -- even ones that
-were not incorporated into any fork of the Stacks blockchain. Moreover, extra
-metadata about the block, such as parent/child linkages,
-are written to the burn chain.
This gives the
-Stacks blockchain three properties that existing blockchains do not possess:
-
-* **Global knowledge of time** -- Stacks blockchain peers each perceive the passage of time
-  consistently by measuring the growth of the underlying burn chain. In
-particular, all correct Stacks peers that have the same view of the burn chain can
-determine the same total order of Stacks blocks and leader epochs. Existing blockchains do not
-have this, and their peers do not necessarily agree on the times when blocks were produced.
-
-* **Global knowledge of blocks** -- Each correct Stacks peer with the same
-  view of the burn chain will also have the same view of the set of Stacks
-blocks that *exist*. Existing blockchains do not have this, but instead
-rely on a well-connected peer network to gossip all blocks.
-
-* **Global knowledge of cumulative work** -- Each correct Stacks peer with the same view
-  of the burn chain will know how much cumulative cryptocurrency was destroyed
-and how long each competing fork is. Existing
-blockchains do not have this -- a private fork can coexist with all public
-forks and be released at its creators' discretion (often with harmful effects
-on the peer network).
-
-The Stacks blockchain leverages these properties to implement three key features:
-
-* **Mitigate block-withholding attacks**: Like all single-leader blockchains,
-  the Stacks blockchain allows the existence of multiple blockchain forks.
-These can arise whenever a leader is selected but does not produce a block, or
-produces a block that is concurrent with another block. The
-design of the Stacks blockchain leverages the fact that all _attempts_ to produce
-a block are known to all leaders in advance in order to detect and mitigate
-block-withholding attacks, including selfish mining. It does not prevent these
-attacks, but it makes them easier to detect and offers peers more tools to deal
-with them than are available in existing systems.
-
-* **Ancillary proofs enhance chain quality**: In the Stacks blockchain, peers can enhance
-  their preferred fork's chain quality by _contributing_ burnt tokens to their
-preferred chain tip. This in turn helps ensure chain liveness -- small-time
-participants (e.g. typical users) can help honest leaders that commit to
-the "best" chain tip get elected, and punish dishonest
-leaders that withhold blocks or build off of other chain tips.
-Users leverage this property to construct _fair mining pools_, where users can
-collectively generate a proof to select a chain tip to build off of and receive a proportional share of the
-block reward without needing to rely on any trusted middlemen to do so.
-
-* **Ancillary proofs to hedge bets**: Because anyone can produce a proof of burn
-in favor of any chain tip, leaders can hedge their bets on their preferred chain tips by distributing
-their proofs across _all_ competing chain tips. Both fair mining pools and generating
-proofs over a distribution of chain tips are possible only because all peers have
-knowledge of all existing chain tips and the proofs behind them.
-
-## Leader Election
-
-The Stacks blockchain makes progress by selecting successive leaders to produce
-blocks. It does so by having would-be leaders submit their candidacies
-by burning an existing cryptocurrency.
-A leader is selected to produce a block based on two things:
-
-* the amount of cryptocurrency burned and energy expended relative to the other candidates
-* an unbiased source of randomness
-
-A new leader is selected whenever the burn chain produces a new block -- the
-arrival of a burn chain block triggers a leader election, and terminates the
-current leader's tenure.
-
-The basic structure for leader election through proof of burn is that
-for some Stacks block _N_, the leader is selected via some function of
-that leader's total cryptocurrency burnt in a previous block _N'_ on the
-underlying burn chain.
In such a system, if a candidate _Alice_ wishes to be a leader of a
-Stacks block, she issues a burn transaction in the underlying burn
-chain which destroys some cryptocurrency.
-The network then uses cryptographic sortition to choose a
-leader in a verifiably random process, weighted by the sums of the burn amounts.
-The block in which this burn transaction is
-broadcast is known as the "election block" for Stacks block _N_.
-
-Anyone can submit their candidacy as a leader by issuing a burn transaction on
-the underlying burn chain, and have a non-zero chance of being selected by the
-network as the leader of a future block.
-
-### Committing to a chain tip
-
-The existence of multiple chain tips is a direct consequence of the
-single-leader design of the Stacks blockchain.
-Because anyone can become a leader, this means that even misbehaving
-leaders can be selected. If a leader crashes before it can propagate
-its block data, or if it produces an invalid block, then no block will
-be appended to the leader's selected chain tip during its epoch. Also, if a
-leader is operating off of stale data, then the leader _may_ produce a
-block whose parent is not the latest block on the "best" fork, in which case
-the "best" fork does not grow during its epoch. These
-kinds of failures must be tolerated by the Stacks blockchain.
-
-A consequence of tolerating these failures is that the Stacks blockchain
-may have multiple competing forks, one of which is considered the canonical fork
-with the "best" chain tip. However, a well-designed
-blockchain encourages leaders to identify the "best" fork and append blocks to
-it by requiring them to _irrevocably commit_ to the
-chain tip they will build on for their epoch.
-This commitment must be tied to an expenditure of some
-non-free resource, like energy, storage, bandwidth, or (in this blockchain's case) an
-existing cryptocurrency.
The intuition is that if the leader does _not_ build on
-the "best" fork, then it loses the resource it committed.
-
-Committing to a _chain tip_, but not necessarily new data,
-is used to encourage both safety and liveness in other blockchains
-today. For example, in Bitcoin, miners search for the inverse of a hash that
-contains the current chain tip. If they fail to win that block, they
-have wasted that energy. As they must attempt deeper and deeper forks
-to initiate double-spend attacks, producing a competing fork becomes an exponentially-increasing energy
-expenditure. The only way for the leader to recoup their losses is for the
-fork they work on to be considered by the rest of the network as the "best" fork
-(i.e. the one where the tokens they minted are spendable). While this does
-not _guarantee_ liveness or safety, penalizing leaders that do not append blocks to the
-"best" chain tip while rewarding leaders that do so provides a strong economic
-incentive for leaders to build and append new blocks to the "best" fork
-(liveness) and to _not_ attempt to build an alternative fork that reverts
-previously-committed blocks (safety).
-
-It is important that the Stacks blockchain offers the same encouragement.
-In particular, the ability for leaders to intentionally orphan blocks in order
-to initiate double-spend attacks at a profit is an undesirable safety violation,
-and leaders that do so must be penalized. This property is enforced by
-making a leader announce their chain tip commitment _before they know if their
-blocks are included_ -- they can only receive Stacks tokens if the block for
-which they submitted a proof of burn is accepted into the "best" fork.
-
-### Election Protocol
-
-To encourage safety and liveness when appending to the blockchain, the leader
-election protocol requires leaders to burn cryptocurrency and spend energy before they know
-whether or not they will be selected.
To achieve this, the protocol for electing a leader -runs in three steps. Each leader candidate submits two transactions to the burn chain -- one to register -their public key used for the election, and one to commit to their token burn and chain tip. -Once these transactions confirm, a leader is selected and the leader can -append and propagate block data. - -Block selection is driven by a _verifiable random function_ (VRF). Leaders submit transactions to -register their VRF proving keys, and later attempt to append a block by generating a -VRF proof over their preferred chain tip's _seed_ -- an unbiased random string -the leader learns after their tip's proof is committed. The resulting VRF proof is used to -select the next block through cryptographic sortition, as well as the next seed. - -The protocol is designed such that a leader can observe _only_ the burn-chain -data and determine the set of all Stacks blockchain forks that can plausibly -exist. The on-burn-chain data gives all peers enough data to identify all plausible -chain tips, and to reconstruct the proposed block parent relationships and -block VRF seeds. The on-burn-chain data does _not_ indicate whether or not a block or a seed is -valid, however. - -#### Step 1: Register key - -In the first step of the protocol, each leader candidate registers itself for a -future election by sending a _key transaction_. In this transaction, the leader -commits to the public proving key that will be used by the leader candidate to -generate the next seed for the chain tip they will build off of. - -The key transactions must be sufficiently confirmed on the burn chain -before the leader can commit to a chain tip in the next step. For example, the -leader may need to wait for 10 epochs before it can begin committing to a chain -tip. The exact number will be protocol-defined. - -The key transaction can be used at any time to commit to a chain tip, once -confirmed. 
This is because the selection of the next block cannot be determined -in advance. However, a key can only be used once. - -#### Step 2: Burn & Commit - -Once a leader's key transaction is confirmed, the leader will be a candidate for election -for a subsequent burn block in which it must send a _commitment transaction_. -This transaction burns the leader's cryptocurrency (proof of burn) -and registers the leader's preferred chain tip and new VRF seed -for selection in the cryptographic sortition. - -This transaction commits to the following information: - -* the amount of cryptocurrency burned to produce the block -* the chain tip that the block will be appended to -* the proving key that will have been used to generate the block's seed -* the new VRF seed if this leader is chosen -* a digest of all transaction data that the leader _promises_ to include in their block (see - "Operation as a leader"). - -The seed value is the cryptographic hash of the chain tip's seed (which is available on the burn chain) -and this block's VRF proof generated with the leader's proving key. The VRF proof -itself is stored in the Stacks block header off-chain, but its hash -- the seed -for the next sortition -- is committed to on-chain. - -The burn chain block that contains the candidates' commitment transaction -serves as the election block for the leader's block (i.e. _N_), and is used to -determine which block commitment "wins." - -#### Step 3: Sortition - -In each election block, there is one election across all candidate leaders (across -all chain tips). The next block is determined with the following algorithm: - -```python -# inputs: -# * BLOCK_HEADER -- the burn chain block header, which contains the PoW nonce -# -# * BURNS -- a mapping from public keys to proof of burn scores and block hashes, -# generated from the valid set of commit & burn transaction pairs. -# -# * PROOFS -- a mapping from public keys to their verified VRF proofs from -# their election transactions. 
The domains of BURNS and PROOFS
-# are identical.
-#
-# * SEED -- the seed from the previous winning leader
-#
-# outputs:
-# * PUBKEY -- the winning leader public key
-# * BLOCK_HASH -- the winning block hash
-# * NEW_SEED -- the new public seed
-
-# NOTE: hash() is an abstract cryptographic hash over the concatenation of
-# its inputs, and num() interprets a hash as an unsigned integer.
-
-def make_distribution(BURNS, BLOCK_HEADER):
-    DISTRIBUTION = []
-    BURN_OFFSET = 0
-    BURN_ORDER = dict([(hash(PUBKEY + BLOCK_HEADER.nonce),
-                        (PUBKEY, BURN_AMOUNT, BLOCK_HASH))
-                       for (PUBKEY, (BURN_AMOUNT, BLOCK_HASH)) in BURNS.items()])
-    for (_, (PUBKEY, BURN_AMOUNT, BLOCK_HASH)) in sorted(BURN_ORDER.items()):
-        DISTRIBUTION.append((BURN_OFFSET, PUBKEY, BLOCK_HASH))
-        BURN_OFFSET += BURN_AMOUNT
-    return DISTRIBUTION
-
-def select_block(SEED, BURNS, PROOFS, BURN_BLOCK_HEADER):
-    if len(BURNS) == 0:
-        return (None, None, hash(BURN_BLOCK_HEADER.nonce + SEED))
-
-    DISTRIBUTION = make_distribution(BURNS, BURN_BLOCK_HEADER)
-    TOTAL_BURNS = sum(BURN_AMOUNT for (BURN_AMOUNT, _) in BURNS.values())
-    SEED_NORM = num(hash(SEED + BURN_BLOCK_HEADER.nonce)) % TOTAL_BURNS
-    # the winner is the candidate whose burn interval contains SEED_NORM --
-    # i.e. the last entry whose starting offset is at most SEED_NORM
-    WINNER = DISTRIBUTION[0]
-    for (BURN_OFFSET, PUBKEY, BLOCK_HASH) in DISTRIBUTION:
-        if BURN_OFFSET <= SEED_NORM:
-            WINNER = (BURN_OFFSET, PUBKEY, BLOCK_HASH)
-        else:
-            break
-    (_, PUBKEY, BLOCK_HASH) = WINNER
-    return (PUBKEY, BLOCK_HASH, hash(PROOFS[PUBKEY]))
-```
-
-Only one leader will win an election. It is not guaranteed that the block the
-leader produces is valid or builds off of the best Stacks fork. However,
-once a leader is elected, all peers will know enough information about the
-leader's decisions that the block data can be submitted and relayed by any other
-peer in the network. Crucially, the winner of the sortition will be apparent to
-any peer without each candidate needing to submit their blocks beforehand.
-
-The distribution is sampled using the _previous VRF seed_ and the _current block
-PoW solution_.
This ensures that no one -- not even the burn chain miner -- knows -which public key in the proof of burn score distribution will be selected with the PoW seed. - -Leaders can make their burn chain transactions and -construct their blocks however they want. So long as the burn chain transactions -and block are broadcast in the right order, the leader has a chance of winning -the election. This enables the implementation of many different leaders, -such as high-security leaders where all private keys are kept on air-gapped -computers and signed blocks and transactions are generated offline. - -#### On the use of a VRF - -When generating the chain tip commitment transaction, a correct leader will need to obtain the -previous election's _seed_ to produce its proof output. This seed, which is -an unbiased public random value known to all peers (i.e. the hash of the -previous leader's VRF proof), is inputted to each leader candidate's VRF using the private -key it committed to in its registration transaction. The new seed for the next election is -generated from the winning leader's VRF output when run on the parent block's seed -(which itself is an unbiased random value). The VRF proof attests that only the -leader's private key could have generated the output value, and that the value -was deterministically generated from the key. - -The use of a VRF ensures that leader election happens in an unbiased way. -Since the input seed is an unbiased random value that is not known to -leaders before they commit to their public keys, the leaders cannot bias the outcome of the election -by adaptively selecting proving keys. -Since the output value of the VRF is determined only from the previous seed and is -pseudo-random, and since the leader already -committed to the key used to generate it, the leader cannot bias the new -seed value once they learn the current seed. 
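To make the seed chaining concrete, here is a minimal Python sketch in which HMAC-SHA256 stands in for a real VRF. This is an assumption for illustration only: a production system would use a publicly verifiable construction such as ECVRF, and `mock_vrf_prove` and the key values are hypothetical.

```python
import hashlib
import hmac

def mock_vrf_prove(private_key: bytes, parent_seed: bytes) -> bytes:
    # Stand-in for a real VRF proof: deterministic in (key, seed) and
    # unpredictable without the key. Unlike a real VRF, HMAC output is
    # not verifiable by third parties against the leader's *public* key.
    return hmac.new(private_key, parent_seed, hashlib.sha256).digest()

def next_seed(private_key: bytes, parent_seed: bytes) -> bytes:
    # The next election's seed is the hash of the winning leader's
    # VRF proof over the parent block's seed.
    return hashlib.sha256(mock_vrf_prove(private_key, parent_seed)).digest()

genesis = b"\x00" * 32
s1 = next_seed(b"leader-1-key", genesis)
s2 = next_seed(b"leader-2-key", s1)
assert s1 == next_seed(b"leader-1-key", genesis)  # deterministic per key
assert s1 != next_seed(b"leader-2-key", genesis)  # key-dependent, unknowable ex ante
```

Because each leader committed to its proving key before learning the seed, the chain of seeds stays unbiased even though each link is deterministic.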
- -Because there is one election per burn chain block, there is one valid seed per -epoch (and it may be a seed from a non-canonical fork's chain tip). However as -long as the winning leader produces a valid block, a new, unbiased seed will be -generated. - -In the event that an election does not occur in an epoch, or the leader -does not produce a valid block, the next seed will be -generated from the hash of the current seed and the epoch's burn chain block header -hash. The reason this is reasonably safe in practice is because the resulting -seed is still unpredictable and impractical (but not infeasible) to bias. This is because the burn chain miners are -racing each other to find a hash collision using a random nonce, and miners who -want to attempt to bias the seed by continuing to search for nonces that both -bias the seed favorably and solve the burn chain block risk losing the mining race against -miners who do not. For example, a burn chain miner would need to wait an -expected two epochs to produce two nonces and have a choice between two seeds. -At the same time, it is unlikely that there will be epochs -without a valid block being produced, because (1) attempting to produce a block -is costly and (2) users can easily form burning pools to advance the -state of the Stacks chain even if the "usual" leaders go offline. - -As an added security measure, the distribution into which the previous epoch's -VRF seed will index will be randomly structured using the VRF seed and the PoW -nonce. This dissuades PoW miners from omitting or including burn transactions -in order to influence where the VRF seed will index into the weight -distribution. Since the PoW miner is not expected to be able -to generate more than one PoW nonce per epoch, the burn chain miners won't know -in advance which leader will be elected. 
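A minimal sketch of this fallback rule, assuming SHA-256 as the hash function and simple concatenation of the two inputs (the exact hash and input ordering are protocol-defined):

```python
import hashlib

def fallback_seed(current_seed: bytes, burn_header_hash: bytes) -> bytes:
    # If no leader was elected this epoch, or the winner's block is invalid,
    # chain the seed forward from the burn chain block header instead.
    return hashlib.sha256(current_seed + burn_header_hash).digest()

# A burn chain miner gets only one header hash per epoch it solves, so it
# obtains at most one candidate fallback seed per block it mines.
seed = fallback_seed(b"\x00" * 32, hashlib.sha256(b"burn-block-header").digest())
assert len(seed) == 32
```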
-
-## Operation as a leader
-
-The Stacks blockchain uses a hybrid approach for generating block data: it can
-"batch" transactions and it can "stream" them. Batched transactions are
-anchored to the commitment transaction, meaning that the leader issues a _leading
-commitment_ to these transactions. The leader can only receive the block reward
-if _all_ the transactions committed to in the commitment transaction
-are propagated during its tenure. The
-downside of batching transactions, however, is that it significantly increases latency
-for the user -- the user will not know that their committed transactions have been
-accepted until the _next_ epoch begins.
-
-In addition to sending batched transaction data, a Stacks leader can "stream" a
-block over the course of its tenure by selecting transactions from the mempool
-as they arrive and packaging them into _microblocks_. These microblocks
-contain small batches of transactions, which are organized into a hash chain to
-encode the order in which they were processed. If a leader produces
-microblocks, then the new chain tip the next leader builds off of will be the
-_last_ microblock the new leader has seen.
-
-The advantage of the streaming approach is that a user's transaction can be
-included in a block _during_ the current epoch, reducing latency.
-However, unlike the batch model, the streaming approach implements a _trailing commitment_ scheme.
-When the next leader's tenure begins, it must select either one of the current leader's
-microblocks as the chain tip (it can select any of them), or the current
-leader's on-chain transaction batch. In doing so, an epoch change triggers a
-"micro-fork" where the last few microblocks of the current leader may be orphaned,
-and the transactions they contain remain in the mempool. The Stacks protocol
-incentivizes leaders to build off of the last microblock they have seen (see
-below).
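The microblock hash chain described above can be sketched as follows. This is a simplified illustration under assumed field names; real microblock headers would carry additional fields (e.g. a sequence number and the leader's signature):

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Microblock:
    prev_hash: bytes   # header hash of the previous microblock (or the anchored block)
    tx_data: bytes     # small batch of serialized transactions

    def header_hash(self) -> bytes:
        tx_root = hashlib.sha256(self.tx_data).digest()
        return hashlib.sha256(self.prev_hash + tx_root).digest()

def stream_microblocks(anchor_hash: bytes, batches: list) -> list:
    # Package arriving transactions into a hash chain that commits to their order.
    chain, tip = [], anchor_hash
    for txs in batches:
        mb = Microblock(tip, txs)
        chain.append(mb)
        tip = mb.header_hash()
    return chain

def verify_stream(anchor_hash: bytes, chain: list) -> bool:
    # A peer replays the chain to check that every microblock extends the
    # previous one; a mismatch indicates reordering or a micro-fork.
    tip = anchor_hash
    for mb in chain:
        if mb.prev_hash != tip:
            return False
        tip = mb.header_hash()
    return True

anchor = hashlib.sha256(b"anchored-block").digest()
chain = stream_microblocks(anchor, [b"tx1|tx2", b"tx3"])
assert verify_stream(anchor, chain)
assert not verify_stream(anchor, list(reversed(chain)))
```

The next leader picks some microblock header hash in this chain as its preferred chain tip; everything after that hash is orphaned back to the mempool.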
-
-The user chooses which commitment scheme a leader should apply for her
-transactions. A transaction can be tagged as "batch only," "stream only," or
-"try both." An informed user selects which scheme based on whether or not they
-value low latency more than the associated risks.
-
-To commit to a chain tip, each correct leader candidate first selects the transactions they will
-commit to include in their blocks as a batch, constructs a Merkle tree from them, and
-then commits the Merkle tree root of the batch
-and their preferred chain tip (encoded as the hash of the last leader's
-microblock header) within the commitment transaction in the election protocol.
-Once the transactions are appended to the burn chain, the leaders execute
-the third round of the election protocol, and the
-sortition algorithm will be run to select which of the candidate leaders will be
-able to append to the Stacks blockchain. Once selected, the new leader broadcasts their
-transaction batch and then proceeds to stream microblocks.
-
-### Building off the latest block
-
-Like existing blockchains, the leader can select any prior block as its preferred
-chain tip. In the Stacks blockchain, this allows leaders to tolerate block loss by building
-off of the latest-built ancestor block's parent.
-
-To encourage leaders to propagate their batched transactions if they are selected, a
-commitment to a block on the burn chain is only considered valid if the peer
-network has (1) the transaction batch, and (2) the microblocks the leader sent
-up to the next leader's chain tip commitment on the same fork. A leader will not receive any compensation
-from their block if any block data is missing -- they eventually must propagate the block data
-in order for their rewards to materialize (even though this enables selfish
-mining; see below).
-
-The streaming approach requires some additional incentives to
-encourage leaders to build off of the latest known chain tip (i.e.
the latest
-microblock sent by the last leader). In particular, the streaming model introduces
-the following two safety risks that are not present in the batching approach:
-
-* A leader who gets elected twice in a row can adaptively orphan its previous
-  microblocks by building off of its first tenure's chain tip, thereby
-double-spending transactions the user may believe are already included.
-
-* A leader can be bribed during their tenure to omit transactions that are
-  candidates for streaming. The price of this bribe is much smaller than the
-cost to bribe a leader to not send a block, since the leader only stands to lose
-the transaction fees for the targeted transaction and all subsequently-mined
-transactions instead of the entire block reward. Similarly, a leader can
-be bribed to mine off of an earlier microblock chain tip than the last one it has seen
-for less than the cost of the block reward.
-
-To help discourage both self-orphaning and "micro-bribes" to double-spend or
-omit specific transactions or trigger longer-than-necessary micro-forks, leaders are
-rewarded only 40% of their transaction fees in their block reward (including
-those that were batched). They receive
-60% of the previous leader's transaction fees. This result was shown in the
-Bitcoin-NG paper to be necessary to ensure that honest behavior is the most
-profitable behavior in the streaming model.
-
-The presence of a batching approach is meant to raise the stakes for a briber.
-Users who are worried that the next leader could orphan their transactions if
-they were in a microblock would instead submit their transactions to be batched.
-Then, if a leader selects them into its tenure's batch, the leader would
-forfeit the entire block reward if even one of the batched transactions was
-missing. This significantly increases the bribe cost to leaders, at the penalty
-of higher latency to users.
However, for users who need to send
-transactions under these circumstances, the wait would be worth it.
-
-Users are encouraged to use the batching model for "high-value" transactions and
-use the streaming model for "low-value" transactions. In both cases, the use
-of a high transaction fee makes their transactions more likely to be included in
-the next batch or streamed first, which additionally raises the bribe price for
-omitting transactions.
-
-### Leader volume limits
-
-A leader propagates blocks irrespective of the underlying burn chain's capacity.
-This poses a DDoS vulnerability to the network: a high-transaction-volume
-leader may swamp the peer network with so many
-transactions and microblocks that the rest of the nodes cannot keep up. When the next
-epoch begins and a new leader is chosen, it would likely orphan many of the high-volume
-leader's microblocks simply because its view of the
-chain tip is far behind the high-volume leader's view. This hurts the
-network, because it increases the confirmation time of transactions
-and may invalidate previously-confirmed transactions.
-
-To mitigate this, the Stacks chain places a limit on the volume of
-data a leader can send during its epoch (this places a _de facto_ limit
-on the number of transactions in a Stacks block). This cap is enforced
-by the consensus rules. If a leader exceeds this cap, the block is invalid.
-
-### Batch transaction latency
-
-The fact that leaders execute a leading commitment to batched transactions means that
-it takes at least one epoch for a user to know if their transaction was
-incorporated into the Stacks blockchain. To get around this, leaders are
-encouraged to supply a public API endpoint that allows a user to query
-whether or not their transaction is included in the batch (i.e. the leader's
-service would supply a Merkle path to it).
A user can use a set of leader
-services to deduce which block(s) included their transaction, and calculate the
-probability that their transaction will be accepted in the next epoch.
-Leaders can announce their API endpoints via the [Blockstack Naming
-Service](https://docs.blockstack.org/core/naming/introduction.html).
-
-The specification for this transaction confirmation API service is the subject
-of a future SIP. Users who need low-latency confirmations today and are willing
-to risk micro-forks and intentional orphaning can submit their transactions for
-streaming.
-
-## Burning pools
-
-Proof-of-burn mining is not only concerned with electing leaders, but also with
-enhancing chain quality. For this reason, the Stacks chain not
-only rewards leaders who build on the "best" fork, but also each peer who
-supported the "best" fork by burning cryptocurrency in support of the winning leader.
-The leader that commits to the winning chain tip and the peers who also burn for
-that leader collectively share in the block's reward, proportional to how much
-each one burned.
-
-### Encouraging honest leaders
-
-The reason for allowing users to support leader candidates at all is to help
-maintain the chain's liveness in the presence of leaders who follow the
-protocol correctly, but not honestly. These include leaders who delay
-the propagation of blocks and leaders who refuse to mine certain transactions.
-By giving users a very low barrier to entry to becoming a leader, and by giving
-other users a way to help a known-good leader candidate get selected, the Stacks blockchain
-gives users a first-class stake in deciding which transactions to process,
-and incentivizes them to maintain chain liveness in the face of bad
-leaders. In other words, leaders stand to make more money with
-the consent of the users.
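The pro-rata sharing of a winning block's reward among the leader and its supporting burners can be sketched as follows (a simplified model with integer rounding; the names and amounts are illustrative):

```python
def split_block_reward(reward: int, burns: dict) -> dict:
    # Each participant's share is proportional to the amount it burned in
    # support of the winning block commitment. Integer division means a few
    # units of dust may go undistributed; a real protocol defines the rounding.
    total = sum(burns.values())
    return {who: (reward * amt) // total for who, amt in burns.items()}

shares = split_block_reward(1000, {"leader": 600, "alice": 300, "bob": 100})
assert shares == {"leader": 600, "alice": 300, "bob": 100}
```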
- -Users support their preferred leader by submitting a burn transaction that contains a -proof of burn and references its leader candidate's chain tip commitment. These user-submitted -burns count towards the leader's total score for the election, thereby increasing the chance -that they will be selected (i.e. users submit their transactions alongside the -leader's block commitment). Users who submit proofs for a leader that wins the election -will receive some Stacks tokens alongside the leader (but users whose leaders -are not elected receive no reward). Users are rewarded alongside leaders by -granting them a share of the block's coinbase. - -Allowing users to vote in support of leaders they prefer gives users and leaders -an incentive to cooperate. Leaders can woo users to submit proofs for them by committing -to honest behavior, and users can help prevent dishonest (but more profitable) -leaders from getting elected. Moreover, leaders cannot defraud users who submit -proofs in their support, since users are rewarded by the election protocol itself. - -### Fair mining - -Because all peers see the same sequence of burns in the Stacks blockchain, users -can easily set up distributed mining pools where each user receives a fair share -of the block rewards for all blocks the pool produces. The selection of a -leader within the pool is arbitrary -- as long as _some_ user issues a key -transaction and a commitment transaction, the _other_ users in the pool can -throw their proofs of burn behind a chain tip. Since users who submitted proofs for the winning -block are rewarded by the protocol, there is no need for a pool operator to -distribute rewards. Since all users have global visibility into all outstanding -proofs, there is no need for a pool operator to direct users to work on a -particular block -- users can see for themselves which block(s) are available by -inspecting the on-chain state. 
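Because every burn is recorded on the burn chain, any peer can recompute each candidate's chance of winning the next sortition. A sketch, assuming each candidate's weight is its own commitment burn plus its users' support burns (the names are illustrative):

```python
def sortition_weights(commits: dict, support: dict) -> dict:
    # commits: leader -> leader's own burn amount
    # support: leader -> sum of user-submitted support burns for that leader
    return {who: commits[who] + support.get(who, 0) for who in commits}

def win_probabilities(weights: dict) -> dict:
    # Sortition selects a winner with probability proportional to total burn.
    total = sum(weights.values())
    return {who: w / total for who, w in weights.items()}

w = sortition_weights({"A": 40, "B": 30}, {"A": 10, "B": 20})
p = win_probabilities(w)
assert w == {"A": 50, "B": 50}
assert abs(p["A"] - 0.5) < 1e-12
```

Since this computation uses only on-chain data, pool members can independently agree on each candidate's odds without trusting an operator.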
-
-Users only need to have a way to query what's going into a block when one of the pool
-members issues a commitment transaction. This can be done easily for batched
-transactions -- the transaction sender can prove that their transaction is
-included by submitting a Merkle path from the root to their transaction. For
-streamed transactions, leaders have a variety of options for promising users
-that they will stream a transaction, but these techniques are beyond the scope of this SIP.
-
-### Minimizing reward variance
-
-Leaders compete to elect the next block by burning more cryptocurrency and/or
-spending more energy. However, if they lose the election, they lose the cryptocurrency they burned.
-This makes for a "high variance" pay-out proposition that puts leaders in a
-position where they need to maintain a comfortable cryptocurrency buffer to
-stay solvent.
-
-To reduce the need for such a buffer, leaders can hedge their bets by
-generating proofs of burn in support of _all_ plausible competing chain tips.
-Leaders have the option of submitting proofs in support of a
-_distribution_ of competing chain tips at a lower cost than committing to many
-different chain tips as leaders. This gives them the ability to receive some
-reward no matter who wins. This also reduces the barrier to
-entry for becoming a leader in the first place.
-
-### Leader support mechanism
-
-There are a couple of important considerations for the mechanism by which peers
-submit proofs for their preferred chain tips.
-
-* Users and runner-up leaders are rewarded strictly fewer tokens
-for committing to a chain tip that does not get selected. This is
-important because leaders and users are indistinguishable
-on-chain.
Leaders should not be able to increase their expected reward by sock-puppeting, -and neither leaders nor users should get an out-sized reward for voting for -invalid blocks or blocks that will never be appended to the canonical fork. - -* It must be cheaper for a leader to submit a single expensive commitment than it is - to submit a cheap commitment and a lot of user-submitted proofs. This is -important because it should not be possible for a leader to profit more from -adaptively increasing their proof submissions in response to other leaders'. - -The first property is enforced by the reward distribution rules (see below), -whereby a proof commitment only receives a reward if its block successfully extended the -"canonical" fork. The second property is given "for free" because the underlying burn chain -assesses each participant a burn chain transaction fee. Users and leaders incur an ever-increasing -cost of trying to adaptively out-vote other leaders by submitting more and more -transactions. Further, peers who want to support a leader candidate must send their burn transactions _in the -same burn chain block_ as the commitment transaction. This limits the degree to -which peers can adaptively out-bid each other to include their -commitments. - -## Reward distribution - -New Stacks tokens come into existence on a fork in an epoch where a leader is -selected, and are granted to the leader if the leader produces a valid block. -However, the Stacks blockchain pools all tokens created and all transaction fees received and -does not distribute them until a large number of epochs (a _lockup period_) has -passed. The tokens cannot be spent until the period passes. - -### Sharing the rewards among winners - -Block rewards (coinbases and transaction fees) are not granted immediately, -but are delayed for a lock-up period. 
Once the lock-up period passes,
-the exact reward distribution is as follows:
-
-* Coinbases: The coinbase (newly-minted tokens) for a block is rewarded to the leader who
-  mined the block, as well as to all individuals who submitted proofs-of-burn in
-support of it. Each participant (leaders and supporting users) receives a
-portion of the coinbase proportional to the fraction of total tokens destroyed.
-
-* Batched transactions: The transaction fees for batched transactions are
-  distributed exclusively to the leader who produced the block, provided that
-the block has enough transactions.
-
-   To discourage mining empty blocks, an anchored block must be _F_% "full" for the
-   leader to receive its transaction fees. A block's "fullness" is measured by
-   how much transaction-computing capacity the block has consumed (see SIP 006).
-   Failure to mine a block that is at least _F_% full will be penalized:
-   if the miner does not fill the block to at least _F_% capacity, then the
-   miner will receive _P * M_ STX instead of the transaction fees, where:
-
-   * _0 < M_ is the minimum allowable transaction fee rate,
-   * _0 < P < F_ is the fraction of the block that the miner was able to fill.
-
-   Note that _P * M_ is strictly less than the lowest possible sum of the
-   transaction fees of any _F_%-full block.
-
-   This is in the service of implementing the fee auction strategy described in [1].
-   However, unlike in [1], no transaction fee smoothing will take place -- the
-   leader receives all of the anchored block's transaction fees.
-
-* Streamed transactions: the transaction fees for streamed transactions are
-  distributed according to a 60/40 split -- the leader that validated the
-transactions is awarded 40% of the transaction fees, and the leader that builds
-on top of them is awarded 60%.
This ensures that leaders are rewarded for
-processing and validating transactions correctly _while also_ incentivizing the
-subsequent leader to include them in their block, instead of orphaning them.
-
-## Recovery from data loss
-
-Stacks block data can get lost after a leader commits to it. However, the burn
-chain will record the chain tip, the batched transactions' hash, and the leader's public
-key. This means that all existing forks will be
-known to the Stacks peers that share the same view of the burn chain (including
-forks made of invalid blocks, and forks that include blocks whose data was lost
-forever).
-
-What this means is that regardless of how the leader operates, its
-chain tip commitment strategy needs a way to orphan a fork of any length. In
-correct operation, the network recovers from data loss by building an
-alternative fork that will eventually become the "best" fork,
-ensuring that the system continues to make progress.
-Even in the absence of malice, the need for this family of strategies
-follows directly from a single-leader model, where a peer can crash before
-producing a block or fail to propagate a block during its tenure.
-
-However, there is a downside to this approach: it enables **selfish mining.** A
-minority coalition of leaders can statistically gain more Stacks tokens than they are due from
-their burns by attempting to build a hidden fork of blocks, and releasing it
-once the honest majority comes within one block height difference of the hidden
-fork. This orphans the majority fork, causing them to lose their Stacks tokens
-and re-build on top of the minority fork.
-
-### Selfish mining mitigation strategies
-
-Fortunately, all peers in the Stacks blockchain have global knowledge of state,
-time, and block-commit transactions.
Intuitively, this gives the Stacks blockchain some novel tools -for dealing with selfish leaders: - -* Since all nodes know about all blocks that have been committed, a selfish leader coalition - cannot hide its attack forks. The honest leader coalition can see -the attack coming, and evidence of the attack will be preserved in the burn -chain for subsequent analysis. This property allows honest leaders -to prepare for and mitigate a pending attack by burning more -cryptocurrency, thereby reducing the fraction -of votes the selfish leaders wield below the point where selfish mining is profitable (subject to network -conditions). - -* Since all nodes have global knowledge of the passage of time, honest leaders - can agree on a total ordering of all chain tip commits. In certain kinds of -selfish mining attacks, this gives honest leaders the ability to identify and reject an attack fork -with over 50% confidence. In particular, honest leaders who have been online long -enough to measure the expected block propagation time would _not_ build on top of -a chain tip whose last _A > 1_ blocks arrived late, even if that chain tip -represents the "best" fork, since this would be the expected behavior of a selfish miner. - -* Since all nodes know about all block commitment transactions, the long tail of small-time participants -(i.e. users who support leaders) can collectively throw their resources behind -known-honest leaders' transactions. This increases the chance that honest leaders will -be elected, thereby increasing the fraction of honest voting power and making it -harder for a selfish leader to get elected. - -* All Stacks nodes relay all blocks that correspond to on-chain commitments, -even if they suspect that they came from the attacker. If an honest leader finds two chain tips of equal -length, it selects at random which chain tip to build off of. 
This ensures that
-the fraction of honest voting that builds on top of the attack fork versus the honest fork
-is statistically capped at 50% when they are the same length.
-
-None of these points _prevent_ selfish mining, but they give honest users and
-honest leaders the tools to make selfish mining more difficult to pull off than in
-PoW chains. Depending on user activity, they also make economically-motivated
-leaders less likely to participate in a selfish miner cartel -- doing so always produces evidence,
-which honest leaders and users can act on to reduce or eliminate
-their expected rewards.
-
-Nevertheless, these arguments are only intuitions at this time. A more
-rigorous analysis is needed to see exactly how these points affect the
-profitability of selfish mining. Because the Stacks blockchain represents a
-clean-slate blockchain design, we have an opportunity to consider the past
-several years of research into attacks and defenses against block-hiding
-attacks. This section will be updated as our understanding evolves.
-
-## Fork Selection
-
-Fork selection in the Stacks blockchain requires a metric to determine which
-chain, between two candidates, is the "best" chain. For Stacks, **the fork with
-the most blocks is the best fork.** That is, the Stacks blockchain measures the
-quality of block _N_'s fork by the total number of _blocks_ which _confirm_
-block _N_.
-
-Using chain length as the fork choice rule makes it time-consuming for alternative forks to
-overtake the "canonical" fork, no matter how many burn tokens the alternative-fork miners have at their disposal.
-In order to carry out a deep fork of _K_ blocks, the majority coalition of participants needs to spend
-at least _K_ epochs working on the new fork. We consider this acceptable
-because it also has the effect of keeping the chain history relatively stable,
-and makes it so every participant can observe (and prepare for) any upcoming
-forks that would overtake the canonical history.
However, a minority -coalition of dishonest leaders can create short-lived forks by continuously -building forks (i.e. in order to selfishly mine), driving up the confirmation -time for transactions in the honest fork. - -This fork choice rule implies a time-based transaction security measurement. A -transaction _K_ blocks in the past will take at least _K_ epochs to reverse. -The expected cost of doing so can be calculated given the total amount of burned -tokens put into producing blocks, and the expected fraction of the -totals controlled by the attacker. Note that the attacker is only guaranteed to -reverse a transaction _K_ blocks back if they consistently control over 50% of the total -amount of tokens burned. - -## Implementation - -The Stacks blockchain leader election protocol will be written in Rust. - -## Bitcoin Wire Formats - -The election process described in this SIP will be implemented for the Stacks blockchain -on top of the Bitcoin blockchain. There are three associated operations, with the following -wire formats: - -### Leader Block Commit - -Leader block commits require at least two Bitcoin outputs. The first output is an `OP_RETURN` -with the following data: - -``` - 0 2 3 35 67 71 73 77 79 80 - |------|--|-------------|---------------|------|------|-----|-----|-----| - magic op block hash new seed parent parent key key burn parent - block txoff block txoff modulus -``` - -Where `op = [` and: - -* `block_hash` is the header block hash of the Stacks anchored block. -* `new_seed` is the next value for the VRF seed -* `parent_block` is the burn block height of this block's parent. -* `parent_txoff` is the vtxindex for this block's parent's block commit. -* `key_block` is the burn block height of the miner's VRF key registration -* `key_txoff` is the vtxindex for this miner's VRF key registration -* `burn_parent_modulus` is the burn block height at which this leader block commit - was created modulo `BURN_COMMITMENT_WINDOW` (=6). 
That is, if the block commit is - included in the intended burn block then this value should be equal to: - `(commit_burn_height - 1) % 6`. This field is used to link burn commitments from - the same miner together even if a commitment was included in a late burn block. - -The second output is the burn commitment. It must send funds to the canonical burn address. - -The first input of this Bitcoin operation must have the same address as the second output -of the VRF key registration. - -### Leader VRF Key Registrations - -Leader VRF key registrations require at least two Bitcoin outputs. The first output is an `OP_RETURN` -with the following data: - -``` - 0 2 3 23 55 80 - |------|--|---------------|-----------------------|---------------------------| - magic op consensus hash proving public key memo -``` - -Where `op = ^` and: - -* `consensus_hash` is the current consensus hash for the burnchain state of the Stacks blockchain -* `proving_public_key` is the 32-byte public key used in the miner's VRF proof -* `memo` is a field for including a miner memo - -The second output is the address that must be used as an input in any of the miner's block commits. - -### User Support Burns - -User support burns require at least two Bitcoin outputs. 
The first output is an `OP_RETURN` -with the following data: - -``` - 0 2 3 22 54 74 78 80 - |------|--|---------------|-----------------------|------------------|--------|---------| - magic op consensus hash proving public key block hash 160 key blk key - (truncated by 1) vtxindex -``` - -Where `op = _` and: - -* `consensus_hash` is the current consensus hash for the burnchain state of the Stacks blockchain -* `proving_public_key` is the 32-byte public key used in the miner's VRF proof -* `block_hash_160` is the hash_160 of the Stacks anchored block -* `key_blk` is the burn block height of the VRF key used in the miner's VRF proof -* `key_vtxindex` is the vtxindex of the VRF key used in the miner's VRF proof - -The second output is the burn commitment. It must send funds to the canonical burn address. - -## References - -[1] Basu, Easley, O'Hara, and Sirer. [Towards a Functional Market for Cryptocurrencies.](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3318327) - -## Appendix - -### Definitions - -**Burn chain**: the blockchain whose cryptocurrency is destroyed in burn-mining. - -**Burn transaction**: a transaction on the burn chain that a Stacks miner issues -in order to become a candidate for producing a future block. The transaction -includes the chain tip to append to, and the proof that cryptocurrency was -destroyed. - -**Chain tip**: the location in the blockchain where a new block can be appended. -Every valid block is a valid chain tip, but only one chain tip will correspond -to the canonical transaction history in the blockchain. Miners are encouraged -to append to the canonical transaction history's chain tip when possible. - -**Cryptographic sortition** is the act of selecting the next leader to -produce a block on a blockchain in an unbiased way. The Stacks blockchain uses -a _verifiable random function_ to carry this out. - -**Election block**: the block in the burn chain at which point a leader is -chosen. 
Each Stacks block corresponds to exactly one election block on the burn -chain. - -**Epoch**: a discrete configuration of the leader and leader candidate state in -the Stacks blockchain. A new epoch begins when a leader is chosen, or the -leader's tenure expires (these are often, but not always, the same event). - -**Fork**: one of a set of divergent transaction histories, one of which is -considered by the blockchain network to be the canonical history -with the "best" chain tip. - -**Fork choice rule**: the programmatic rules for deciding how to rank forks to select -the canonical transaction history. All correct peers that process the same transactions -with the same fork choice rule will agree on the same fork ranks. - -**Leader**: the principal selected to produce the next block in the Stacks -blockchain. The principal is called the block's leader. - -**Reorg** (full: _reorganization_): the act of a blockchain network switching -from one fork to another fork as its collective choice of the canonical -transaction history. From the perspective of an external observer, -such as a wallet, the blockchain appears to have reorganized its transactions. +This SIP is now located in the [stacksgov/sips repository](https://github.com/stacksgov/sips/blob/main/sips/sip-001/sip-001-burn-election.md) as part of the [Stacks Community Governance organization](https://github.com/stacksgov). diff --git a/sip/sip-002-smart-contract-language.md b/sip/sip-002-smart-contract-language.md index 601b0d7d5..26503048b 100644 --- a/sip/sip-002-smart-contract-language.md +++ b/sip/sip-002-smart-contract-language.md @@ -1,747 +1,5 @@ -# Abstract +# SIP-002 Smart Contract Language -In order to support applications which require validation of some -pieces of their logic, we present a smart contracting language for use -with the Stacks blockchain. 
This smart contracting language can be
-used on the Stacks blockchain to support programmatic control over
-digital assets within the Stacks blockchain (e.g., BNS names, Stacks
-tokens, etc.).
+This document formerly contained SIP-002 before the Stacks 2.0 mainnet launched.
-
-While application-chains may use any smart-contract language that they
-like, this smart contracting language's VM will be a part of
-blockstack-core, and, as such, any blockstack-core node will be able to
-validate application chains using this smart contracting language with
-a simple configuration change.
-
-This smart contracting language permits static analysis of any legal
-smart contract to determine runtime costs. This smart contracting
-language is not only Turing-incomplete (a requirement for such static
-analysis to be guaranteed successful), but readily permits other kinds
-of proofs to be made about the code as well.
-
-# Design
-
-A smart contract is composed of two parts:
-
-1. A data-space, which is a set of tables of data which only the
-   smart contract may modify
-2. A set of functions which operate within the data-space of the
-   smart contract, though they may call public functions from other smart
-   contracts.
-
-Users call smart contracts' public functions by broadcasting a
-transaction on the blockchain which invokes the public function.
-
-This smart contracting language differs from most other smart
-contracting languages in two important ways:
-
-1. The language _is not_ intended to be compiled. The LISP language
-   described in this document is the specification for correctness.
-2. The language _is not_ Turing complete. This allows us to guarantee
-   that static analysis of programs to determine properties like
-   runtime cost and data usage can complete successfully.
-
-## Specifying Contracts
-
-A smart contract definition is specified in a LISP language with the
-following limitations:
-
-1. Recursion is illegal and there is no `lambda` function.
-2.
Looping may only be performed via `map`, `filter`, or `fold`
-3. The only atomic types are booleans, integers, fixed length
-   buffers, and principals
-4. There is additional support for lists of the atomic types, however
-   the only variable length lists in the language appear as function
-   inputs (i.e., there is no support for list operations like append
-   or join).
-5. Variables may only be created via `let` binding and there
-   is no support for mutating functions like `set`.
-6. Defining constants and functions is allowed for simplifying
-   code, using the `define-private` statement. However, these are purely
-   syntactic. If a definition cannot be inlined, the contract will be
-   rejected as illegal. These definitions are also _private_, in that
-   functions defined this way may only be called by other functions
-   defined in the given smart contract.
-7. Functions specified via `define-public` statements are _public_
-   functions.
-8. Functions specified via `define-read-only` statements are _public_
-   functions and perform _no_ state mutations. Any attempts to
-   modify contract state by these functions or functions called by
-   these functions will result in an error.
-
-Public functions return a Response type result. If the function returns
-an `ok` type, then the function call is considered valid, and any changes
-made to the blockchain state will be materialized. If the function
-returns an `err` type, it will be considered invalid, and will have _no
-effect_ on the smart contract's state. So if function `foo.A` calls
-`bar.B`, and `bar.B` returns an `ok`, but `foo.A` returns an `err`, no
-effects from calling `foo.A` materialize --- including effects from
-`bar.B`. If, however, `bar.B` returns an `err` and `foo.A` returns an `ok`,
-there may be some database effects which are materialized from
-`foo.A`, but _no_ effects from calling `bar.B` will materialize.
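-
-The nested commit/abort behavior above can be modeled with a small
-sketch (illustrative Python, not part of the SIP): each function call
-runs against a savepoint, and an `err` return rolls back every effect
-made inside it, including effects of nested calls that returned `ok`.
-
-```python
-# Minimal model (illustrative only) of the commit/abort semantics
-# described above: every public-function call runs inside a savepoint;
-# returning err discards all writes made within it, nested ones too.
-
-class Ledger:
-    def __init__(self):
-        self.state = {}
-
-    def call(self, fn):
-        """Run fn against current state; keep its writes only on ok."""
-        snapshot = dict(self.state)
-        ok = fn(self)
-        if not ok:
-            self.state = snapshot  # err: roll back everything, nested too
-        return ok
-
-def bar_b(ledger):          # hypothetical inner contract function
-    ledger.state["b"] = 1
-    return True             # bar.B returns ok
-
-def foo_a(ledger):          # hypothetical outer contract function
-    ledger.call(bar_b)      # inner call succeeds...
-    ledger.state["a"] = 1
-    return False            # ...but foo.A returns err
-
-ledger = Ledger()
-ledger.call(foo_a)
-assert ledger.state == {}   # no effects from foo.A *or* bar.B materialize
-```
-
-Running the mirror scenario (inner `err`, outer `ok`) against the same
-model keeps only the outer function's writes, matching the text.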
-
-Unlike functions created by `define-public`, which may only return
-Response types, functions created with `define-read-only` may return
-any type.
-
-## List Operations
-
-* Lists may be multi-dimensional (i.e., lists may contain other lists), however each
-  entry of this list must be of the same type.
-* `filter`, `map`, and `fold` functions may only be called with user-defined functions
-  (i.e., functions defined with `(define-private ...)`, `(define-read-only ...)`, or
-  `(define-public ...)`) or simple native functions (e.g., `+`, `-`, `not`).
-* Functions that return lists of a different size than the input size
-  (e.g., `(append-item ...)`) take a required _constant_ parameter that indicates
-  the maximum output size of the function. This is enforced with a runtime check.
-
-## Inter-Contract Calls
-
-A smart contract may call functions from other smart contracts using a
-`(contract-call?)` function.
-
-This function returns a Response type result --- the return value of the called smart
-contract function. Note that if a called smart contract returns an
-`err` type, it is guaranteed not to alter any smart contract state
-whatsoever. Of course, any transaction fees paid for the execution
-of that function will not be returned.
-
-We distinguish two different types of `contract-call?`:
-
-* Static dispatch: the callee is a known, invariant contract available
-on-chain when the caller contract is being deployed. In this case, the
-callee's principal is provided as the first argument, followed by the name
-of the method and its arguments:
-
-```scheme
-(contract-call?
-  'SC3H92H297DX3YDPFHZGH90G8Z4NPH4VE8E83YWAQ.registrar
-  register-name
-  name-to-register)
-```
-
-This approach should always be preferred when adequate.
-It makes static analysis easier, and eliminates the
-potential for reentrancy bugs when the contracts are
-being published (versus when being used).
-
-* Dynamic dispatch: the callee is passed as an argument, and typed
-as a trait reference.
-
-```scheme
-(define-public (swap (token-a <can-transfer-tokens>)
-                     (amount-a uint)
-                     (owner-a principal)
-                     (token-b <can-transfer-tokens>)
-                     (amount-b uint)
-                     (owner-b principal))
-  (begin
-    (unwrap! (contract-call? token-a transfer-from? owner-a owner-b amount-a))
-    (unwrap! (contract-call? token-b transfer-from? owner-b owner-a amount-b))))
-```
-
-Traits can either be locally defined:
-
-```scheme
-(define-trait can-transfer-tokens (
-  (transfer-from? (principal principal uint) (response uint))))
-```
-
-Or imported from an existing contract:
-
-```scheme
-(use-trait can-transfer-tokens
-  'SC3H92H297DX3YDPFHZGH90G8Z4NPH4VE8E83YWAQ.contract-defining-trait.can-transfer-tokens)
-```
-
-Looking at trait conformance, callee contracts have two different paths.
-They can either be "compatible" with a trait by defining methods
-matching some of the methods defined in a trait, or explicitly declare
-conformance using the `impl-trait` statement:
-
-```scheme
-(impl-trait 'SC3H92H297DX3YDPFHZGH90G8Z4NPH4VE8E83YWAQ.contract-defining-trait.can-transfer-tokens)
-```
-
-Explicit conformance should be preferred when adequate.
-It acts as a safeguard by helping the static analysis system to detect
-deviations in method signatures before contract deployment.
-
-The following limitations are imposed on contract calls:
-
-1. On static dispatches, callee smart contracts _must_ exist at the time of creation.
-2. No cycles may exist in the call graph of a smart contract. This
-   prevents recursion (and re-entrancy bugs). Such structures can
-   be detected with static analysis of the call graph, and will be
-   rejected by the network.
-3. `contract-call?` is for inter-contract calls only. Situations
-   where the caller is also the callee will result in the ongoing
-   transaction being aborted.
-
-## Principals and Owner Verification
-
-The language provides a primitive for checking whether or not the
-smart contract transaction was signed by a particular
-_principal_.
Principals are a specific type in the smart contracting
-language which represents a spending entity (roughly equivalent to a
-Stacks address). The signature itself is not checked by the smart
-contract, but by the VM. A smart contract function can use a globally
-defined variable to obtain the current principal:
-
-```scheme
-tx-sender
-```
-
-The `tx-sender` variable does not change during inter-contract
-calls. This means that if a transaction invokes a function in a given
-smart contract, that function is able to make calls into other smart
-contracts without that variable changing. This enables a wide variety
-of applications, but it comes with some dangers for users of smart
-contracts. However, as mentioned before, the static analysis
-guarantees of our smart contracting language allow clients to know a
-priori which functions a given smart contract will ever call.
-
-Another global variable, `contract-caller`, _does_ change during
-inter-contract calls. In particular, `contract-caller` is the contract
-principal corresponding to the most recent invocation of `contract-call?`.
-In the case of a "top-level" invocation, this variable is equal to `tx-sender`.
-
-Assets in the smart contracting language and blockchain are
-"owned" by objects of the principal type, meaning that any object of
-the principal type may own an asset. For the case of public-key hash
-and multi-signature Stacks addresses, a given principal can operate on
-their assets by issuing a signed transaction on the blockchain. _Smart
-contracts_ may also be principals (represented by the smart
-contract's identifier), however, there is no private key associated
-with the smart contract, and it cannot broadcast a signed transaction
-on the blockchain.
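-
-The behavior of these two variables can be sketched as follows
-(an illustrative Python model of the semantics described above, not
-normative; the principal names are made up): `tx-sender` stays pinned
-to the transaction's originator across the whole call chain, while
-`contract-caller` tracks the immediate caller at each hop.
-
-```python
-# Illustrative model (assumed semantics, per the text above): walk a
-# chain of contract-calls and record what each callee would observe.
-
-def observe_call_chain(originator, chain):
-    """chain: contract principals invoked in order via contract-call?.
-    Returns (tx_sender, contract_caller) as seen by each callee."""
-    views = []
-    contract_caller = originator          # top-level: equals tx-sender
-    for contract in chain:
-        views.append((originator, contract_caller))
-        contract_caller = contract        # next hop's immediate caller
-    return views
-
-views = observe_call_chain("SP_USER", ["contract-a", "contract-b"])
-# contract-a sees (SP_USER, SP_USER); contract-b sees (SP_USER, contract-a)
-assert views == [("SP_USER", "SP_USER"), ("SP_USER", "contract-a")]
-```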
-
-In order to allow smart contracts to operate on assets they own, smart
-contracts may use the special function:
-
-```scheme
-(as-contract (...))
-```
-
-This function will execute the closure (passed as an argument) with
-the `tx-sender` and `contract-caller` set to the _contract's_
-principal, rather than the current sender. It returns the return value
-of the provided closure. A smart contract may use the special variable
-`contract-principal` to refer to its own principal.
-
-For example, a smart contract that implements something like a "token
-faucet" could be implemented as follows:
-
-```scheme
-(define-public (claim-from-faucet)
-  (if (is-none? (map-get claimed-before (tuple (sender tx-sender))))
-      (let ((requester tx-sender)) ;; set a local variable requester = tx-sender
-        (map-insert! claimed-before (tuple (sender requester)) (tuple (claimed true)))
-        (as-contract (stacks-transfer! requester 1)))
-      (err 1)))
-```
-
-Here, the public function `claim-from-faucet`:
-
-1. Checks if the sender has claimed from the faucet before
-2. Assigns the tx sender to a requester variable
-3. Adds an entry to the tracking map
-4. Uses `as-contract` to send 1 microstack
-
-The primitive function `is-contract?` can be used to determine
-whether a given principal corresponds to a smart contract.
-
-## Stacks Transfer Primitives
-
-To interact with Stacks balances, smart contracts may call the
-`(stacks-transfer!)` function. This function will attempt to transfer
-from the current principal to another principal:
-
-```scheme
-(stacks-transfer!
-  to-send-amount
-  recipient-principal)
-```
-
-This function itself _requires_ that the operation have been signed by
-the transferring principal. The `integer` type in our smart contracting
-language is a 16-byte signed integer, which allows it to specify the
-maximum amount of microstacks spendable in a single Stacks transfer.
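-
-A quick arithmetic check of that range claim (illustrative only; the
-total token supply figure below is an assumption for the sake of the
-example, not a number taken from this document):
-
-```python
-# Sanity check (illustrative): a 16-byte signed integer's maximum value
-# vs. an *assumed* supply of ~1.3 billion tokens at 10^6 microstacks
-# per token. The supply figure is a placeholder, not from this SIP.
-
-INT128_MAX = 2**127 - 1
-assumed_supply_stx = 1_320_000_000          # assumption for illustration
-microstacks = assumed_supply_stx * 10**6
-
-# Any plausible single transfer fits with enormous headroom
-# (~1.3e15 microstacks vs. a maximum of ~1.7e38).
-assert microstacks < INT128_MAX
-```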
- -Like any other public smart contract function, this function call -returns an `ok` if the transfer was successful, and `err` otherwise. - -## Data-Space Primitives - -Data within a smart contract's data-space is stored within -`maps`. These stores relate a typed-tuple to another typed-tuple -(almost like a typed key-value store). As opposed to a table data -structure, a map will only associate a given key with exactly one -value. Values in a given mapping are set or fetched using: - -1. `(map-get map-name key-tuple)` - This fetches the value - associated with a given key in the map, or returns `none` if there - is no such value. -2. `(map-set! map-name key-tuple value-tuple)` - This will set the - value of `key-tuple` in the data map -3. `(map-insert! map-name key-tuple value-tuple)` - This will set - the value of `key-tuple` in the data map if and only if an entry - does not already exist. -4. `(map-delete! map-name key-tuple)` - This will delete `key-tuple` - from the data map - -We chose to use data maps as opposed to other data structures for two -reasons: - -1. The simplicity of data maps allows for both a simple implementation -within the VM, and easier reasoning about functions. By inspecting a -given function definition, it is clear which maps will be modified and -even within those maps, which keys are affected by a given invocation. -2. The interface of data maps ensures that the return types of map -operations are _fixed length_, which is a requirement for static -analysis of smart contracts' runtime, costs, and other properties. - -A smart contract defines the data schema of a data map with the -`define-map` call. The `define-map` function may only be called in the -top-level of the smart-contract (similar to `define-private`). This -function accepts a name for the map, and a definition of the structure -of the key and value types. Each of these is a list of `(name, type)` -pairs, and they specify the input and output type of `map-get`. 
-Types are either the values `'principal`, `'integer`, `'bool` or -the output of a call to `(buffer n)`, which defines an n-byte -fixed-length buffer. - -This interface, as described, disallows range-queries and -queries-by-prefix on data maps. Within a smart contract function, -you cannot iterate over an entire map. - -### Record Type Syntax - -To support the use of _named_ fields in keys and values, our language -allows the construction of named tuples using a function `(tuple ...)`, -e.g., - -``` -(define-constant imaginary-number-a (tuple (real 1) (i 2))) -(define-constant imaginary-number-b (tuple (real 2) (i 3))) - -``` - -This allows for creating named tuples on the fly, which is useful for -data maps where the keys and values are themselves named tuples. To -access a named value of a given tuple, the function `(get #name -tuple)` will return that item from the tuple. - -### Time-shifted Evaluations - -The Stacks language supports _historical_ data queries using the -`(at-block)` function: - -``` -(at-block 0x0101010101010101010101010101010101010101010101010101010101010101 - ; returns owner principal of name represented by integer 12013 - ; at the time of block 0x010101... - (map-get name-map 12013)) -``` - -This function evaluates the supplied closure as if evaluated at the end of -the supplied block, returning the resulting value. The supplied -closure _must_ be read-only (is checked by the analysis). - -The supplied block hash must correspond to a known block in the same -fork as the current block, otherwise a runtime error will occur and the -containing transaction will _fail_. Note that if the supplied block -pre-dates any of the data structures being read within the closure (i.e., -the block is before the block that constructed a data map), a runtime -error will occur and the transaction will _fail_. 
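-
-A small sketch of the `(at-block ...)` semantics just described
-(illustrative Python, not normative; the block hashes and map contents
-are made up): queries run against the state snapshot recorded at the
-end of the named block, and unknown block hashes fail the transaction.
-
-```python
-# Illustrative model of (at-block ...): evaluate a read-only query
-# against the state recorded at an earlier block. Unknown block
-# hashes raise, mirroring the runtime failure described above.
-
-snapshots = {}   # block-hash -> materialized map state at end of block
-
-def record_block(block_hash, state):
-    snapshots[block_hash] = dict(state)
-
-def at_block(block_hash, read_only_query):
-    if block_hash not in snapshots:
-        raise RuntimeError("unknown block: transaction fails")
-    return read_only_query(snapshots[block_hash])
-
-name_map = {12013: "alice"}
-record_block("0x01", name_map)
-name_map[12013] = "bob"            # ownership changes in a later block
-record_block("0x02", name_map)
-
-assert at_block("0x01", lambda m: m.get(12013)) == "alice"
-assert at_block("0x02", lambda m: m.get(12013)) == "bob"
-```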
-
-## Library Support and Syntactic Sugar
-
-There are a number of ways that the developer experience can be
-improved through the careful addition of improved syntax. For example,
-the only atomic types supported by the smart contract language
-are integers, buffers, booleans, and principals, so if a developer
-wishes to use a buffer to represent a fixed-length string, we should
-support syntax for representing a buffer literal using something like
-an ASCII string. Such support should also be provided by transaction
-generation libraries, where buffer arguments may be supplied as strings
-which are then automatically converted to buffers. There are many
-possible syntactic improvements and we expect that over the course
-of developing the prototype, we will have a better sense for which
-of those improvements we should support. Any such syntactic changes
-will appear in an eventual language specification, but we believe
-them to be out of scope for this proposal.
-
-# Static Analysis
-
-One of the design goals of our smart contracting language was the
-ability to statically analyze smart contracts to obtain accurate
-upper-bound estimates of transaction costs (i.e., runtime and storage
-requirements) as a function of input lengths. By limiting the types
-supported, the ability to recurse, and the ability to iterate, we
-believe that the language as presented is amenable to such static
-analysis based on initial investigations.
-
-The essential step in demonstrating the possibility of accurate and
-useful analysis of our smart contract definitions is demonstrating
-that any function within the language specification has an output
-length bounded by a constant factor of the input length. If we can
-demonstrate this, then statically computing runtime or space
-requirements involves merely associating each function in the language
-specification with a way to statically determine cost as a function of
-input length.
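-
-That association can be sketched as a table of per-primitive cost
-functions of input length, summed over the primitives a function may
-execute (an illustrative Python toy with made-up cost curves, not the
-SIP's actual metric):
-
-```python
-# Illustrative toy (not the SIP's actual metric): associate each
-# primitive with a cost function of input length, then bound a
-# function's cost by summing the costs of the primitives it uses.
-
-COSTS = {
-    "hash160":  lambda n: 5 + n,   # placeholder cost curves
-    "map-get":  lambda n: 10,      # fixed-size result => constant cost
-    "map-set!": lambda n: 15 + n,
-}
-
-def static_cost_bound(ops):
-    """ops: (primitive, input-length) pairs a function may execute."""
-    return sum(COSTS[name](n) for name, n in ops)
-
-# Same input sizes always yield the same bound, known before execution.
-bound = static_cost_bound([("hash160", 20), ("map-get", 20), ("map-set!", 20)])
-assert bound == 25 + 10 + 35
-```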
- -Notably, the fact that the cost functions produced by static analysis -are functions of _input length_ means the following things: - -1. The cost of a cross-contract call can be "memoized", such - that a static analyzer _does not_ need to recompute any - static analysis on the callee when analyzing a caller. -2. The cost of a given public function on a given input size - _is always the same_, meaning that smart contract developers - do not need to reason about different cases in which a given - function may cost more or less to execute. - -## Bounding Function Output Length - -Importantly, our smart contracting language does not allow the -creation of variable length lists: there are no `list` or -`cons` constructors, and buffer lengths must be statically -defined. Under such requirements (and given that recursion is -illegal), determining the output lengths of functions is rather -directly achievable. To see this, we'll examine trying to compute the -output lengths for the only functions allowed to iterate in the -language: - -``` -outputLen(map f list) := Len(list) * outputLen(f t) -outputLen(filter f list) := Len(list) -outputLen(fold f list s) := Len(s) -``` - -Many functions within the language will output values larger than the -function's input, _however_, these outputs will be bound by -statically inferable constants. For example, the data function -_map-get_ will always return an object whose size is equal -to the specified value type of the map. - -A complete proof for the static runtime analysis of smart contracts -will be included with the implementation of the language. - -# Deploying the Smart Contract - -Smart contracts on the Stacks blockchain will be deployed directly as -source code. The goal of the smart contracting language is that the -code of the contract defines the _ground truth_ about the intended -functionality of the contract. 
While seemingly banal, many systems -chose instead to use a compiler to translate from a friendly -high-level language to a lower-level language deployed on the -blockchain. Such an architecture is needlessly dangerous. A bug in -such a compiler could lead to a bug in a deployed smart contract when -no such bug exists in the original source. This is problematic for -recovery --- a hard fork to "undo" any should-have-been invalid -transactions would be contentious and potentially create a rift in the -community, especially as it will not be easy to deduce which contracts -exactly were affected and for how long. In contrast, bugs in the VM -itself present a more clear case for a hard fork: the smart contract -was defined correctly, as everyone can see directly on the chain, but -illegal transactions were incorrectly marked as valid. - -# Virtual Machine API - -From the perspective of other components of `blockstack-core`, the -smart contracting VM will provide the following interface: - -``` -connect-to-database(db) - -publish-contract( - contract-source-code) - - returns: contract-identifier - -execute-contract( - contract-identifier, - transaction-name, - sender-principal, - transaction-arguments) - - returns: true or false if the transaction executed successfully -``` - -## Invocation and Static Analysis - -When processing a client transaction, a `blockstack-core` node will do -one of two things, depending on whether that transaction is a contract -function invocation, or is attempting to publish a new smart contract. - -### Contract function invocation - -Any transaction which invokes a smart contract will be included in the -blockchain. This is true even for transactions which are -_invalid_. This is because _validating_ an invalid transaction is not -a free operation. The only exceptions to this are transactions which -do not pay more than either a minimum fee or a storage fee -corresponding to the length of the transaction. 
Transactions which do -not pay a storage fee and clear the minimum transaction fee are -dropped from the mempool. - -To process a function invocation, `blockstack-core` does the following: - -1. Get the balance of the sender's account. If it's less than the tx fee, -then `RETURN INVALID`. -2. Otherwise, debit the user's account by the tx fee. -3. Look up the contract by hash. If it does not exist, then `RETURN - INVALID`. -4. Look up the contract's `define-public` function and compare the - tx's arguments against it. If the tx does not call an existing - method, or supplies invalid arguments, then `RETURN INVALID`. -5. Look up the cost to execute the given function, and if it is greater - than the paid tx fee, `RETURN INVALID`. -6. Execute the public function code and commit the effects of running - the code and `RETURN OK` - -### Publish contract - -A transaction which creates a new smart contract must pay a fee which -funds the static analysis required to determine the cost of the new -smart contract's public functions. To process such a transaction, -`blockstack-core` will: - -1. Check the sender's account balance. If zero, then `RETURN INVALID` -2. Check the tx fee against the user's balance. If it's higher, then `RETURN INVALID` -3. Debit the tx fee from the user's balance. -4. Check the syntax, calculating the fee of verifying each code - item. If the cost of checking the next item exceeds the tx fee, or - if the syntax is invalid, then `RETURN INVALID`. -5. Build the AST, and assign a fee for adding each AST item. If the - cost of adding the next item to the tree exceeds the tx fee (or if - the AST gets too big), then `RETURN INVALID`. -6. Walk the AST. Each step in the walk incurs a small fee. Do the - following while the tx fee is higher than the total cost incurred - by walking to the next node in the AST: - a. 
If the next node calls a contract method, then verify that
-      the contract exists and the method arguments match the contract's
-      `define-public` signature. If not, then `RETURN INVALID`.
-   b. Compute the runtime cost of each node in the AST, adding it
-      to the function's cost analysis.
-7. Find all `define-map` calls to find all tables that need to
-   exist. Each step in this incurs a small fee.
-8. Create all the tables if the cost of creating them is smaller than
-   the remaining tx fee. If not, then `RETURN INVALID`.
-9. `RETURN OK`
-
-## Database Requirements and Transaction Accounting
-
-The smart contract VM needs to interact with a database somewhat
-directly: the effects of a `map-insert!` or `map-set!` call are
-realized later in the execution of the same transaction. The database
-will need to support fairly fine-grained rollbacks, as some contract
-calls within a transaction's execution may fail, triggering a
-rollback, while the transaction execution continues and successfully
-completes other database operations.
-
-The database API provided to the smart contract VM, therefore, must be
-capable of quickly responding to `map-get` queries, which are
-essentially key-value _gets_ on the materialized view of the
-operation log. The operation log itself is simply a log of the
-`map-insert!` and `map-set!` calls. In addition to these
-operations, the smart contract VM will be making token transfer calls.
-The database log should track those operations as well.
-
-In order to aid in accounting for the database operations created by a
-given transaction, the underlying database should store, with each
-operation entry, the corresponding transaction identifier. This will
-be expanded in a future SIP to require the database to store enough
-information to reconstruct each block, such that the blocks can be
-relayed to bootstrapping peers.
-
-# Clarity Type System
-
-## Types
-
-The Clarity language uses a strong static type system.
Function arguments
-and database schemas require specified types, and use of types is checked
-during contract launch. The type system does _not_ have a universal
-super type. The type system contains the following types:
-
-* `(tuple (key-name-0 key-type-0) (key-name-1 key-type-1) ...)` -
-  a typed tuple with named fields.
-* `(list max-len entry-type)` - a list of maximum length `max-len`, with
-  entries of type `entry-type`
-* `(response ok-type err-type)` - object used by public functions to commit
-  their changes or abort. May be returned or used by other functions as
-  well, however, only public functions have the commit/abort behavior.
-* `(optional some-type)` - an option type for objects that can either be
-  `(some value)` or `none`
-* `(buff max-len)` := byte buffer of maximum length `max-len`.
-* `principal` := object representing a principal (whether a contract principal
-  or standard principal).
-* `bool` := boolean value (`true` or `false`)
-* `int` := signed 128-bit integer
-* `uint` := unsigned 128-bit integer
-
-## Type Admission
-
-**UnknownType**. The Clarity type system does not allow for specifying
-an "unknown" type, however, in type analysis, unknown types may be
-constructed and used by the analyzer. Such unknown types are used
-_only_ in the admission rules for `response` and `optional` types
-(i.e., the variant types).
-
-Type admission in Clarity follows the following rules:
-
-* Types will only admit objects of the same type, i.e., lists will only
-admit lists, tuples only admit tuples, bools only admit bools.
-* A tuple type `A` admits another tuple type `B` iff they have the exact same
-  key names, and every key type of `A` admits the corresponding key type of `B`.
-* A list type `A` admits another list type `B` iff `A.max-len >= B.max-len` and
-  `A.entry-type` admits `B.entry-type`.
-* A buffer type `A` admits another buffer type `B` iff `A.max-len >= B.max-len`.
-
-* An optional type `A` admits another optional type `B` iff:
-  * `A.some-type` admits `B.some-type` _OR_ `B.some-type` is an unknown type:
-    this is the case if `B` only ever corresponds to `none`
-* A response type `A` admits another response type `B` if one of the following is true:
-  * `A.ok-type` admits `B.ok-type` _AND_ `A.err-type` admits `B.err-type`
-  * `B.ok-type` is unknown _AND_ `A.err-type` admits `B.err-type`
-  * `B.err-type` is unknown _AND_ `A.ok-type` admits `B.ok-type`
-* Principals, bools, ints, and uints only admit objects of the exact same type.
-
-Type admission is used for determining whether an object is a legal argument for
-a function, or for insertion into the database. Type admission is _also_ used
-during type analysis to determine the return types of functions. In particular,
-a function's return type is the least common supertype of each type returned from any
-control path in the function. For example:
-
-```
-(define-private (if-types (input bool))
-  (if input
-      (ok 1)
-      (err false)))
-```
-
-The return type of `if-types` is the least common supertype of `(ok
-1)` and `(err false)` (i.e., the most restrictive type that contains
-all returns). In this case, that type is `(response int bool)`. Because
-Clarity _does not_ have a universal supertype, it may be impossible to
-determine such a type. In these cases, the functions are illegal, and
-will be rejected during type analysis.
-
-# Measuring Transaction Costs for Fee Collection
-
-Our smart contracting language admits static analysis to determine
-many properties of transactions _before_ executing those
-transactions. In particular, it allows for the VM to count the total
-number of runtime operations required, the maximum amount of database
-writes, and the maximum number of calls to any expensive primitive
-functions like database reads or hash computations. Translating that
-information into transaction costs, however, requires more than simply
-counting those operations.
It requires translating the operations into -a single cost metric (something like gas in Ethereum). Then, clients -can set the fee rate for that metric, and pay the corresponding -transaction fee. Notably, unlike Turing-complete smart contracting -languages, any such fees are known _before_ executing the transaction, -such that clients will no longer need to estimate gas fees. They will, -however, still need to estimate fee rates (much like Bitcoin clients -do today). - -Developing such a cost metric is an important task that has -significant consequences. If the metric is a bad one, it could open up -the possibility of denial-of-service attacks against nodes in the -Stacks network. We leave the development of a cost metric to another -Stacks Improvement Proposal, as we believe that such a metric should -be designed by collecting real benchmarking data from something close -to a real system (such measurements will likely be collected through -a combination of hand-crafted benchmarks and fuzzing test suites). - -### Maximum Operation Costs and Object Sizes - -Even with a cost metric, it is a good idea to set maximums for the -cost of an operation, and the size of objects (like -buffers). Developing good values for constants such as maximum number -of database reads or writes per transaction, maximum size of buffers, -maximum number of arguments to a tuple, maximum size of a smart -contract definition, etc. is a process much like developing a -cost metric--- this is something best done in tandem with the -production of a prototype. However, we should note that we do intend -to set such limits. - - -# Example: Simple Naming System - -To demonstrate the expressiveness of this smart contracting language, -let's look at an example smart contract which implements a simple -naming system with just two kinds of transactions: _preorder_ and -_register_. The requirements of the system are as follows: - -1. Names may only be owned by one principal -2. 
A register is only allowed if there is a corresponding preorder
-   with a matching hash
-3. A register transaction must be signed by the same principal who
-   paid for the preorder
-4. A preorder must have paid at least the price of the name. Names
-   are represented as integers, and any name less than 100000 costs
-   1000 microstacks, while all other names cost 100 microstacks.
-5. Preorder hashes are _globally_ unique.
-
-In this simple scheme, names are represented by integers, but in
-practice, a buffer would probably be used.
-
-```scheme
-(define-constant burn-address '1111111111111111111114oLvT2)
-(define-private (price-function name)
-  (if (< name 100000) 1000 100))
-
-(define-map name-map
-  { name: int } { buyer: principal })
-(define-map preorder-map
-  { name-hash: (buff 20) }
-  { buyer: principal, paid: int })
-
-(define-public (preorder
-               (name-hash (buff 20))
-               (name-price int))
-  (if (and (is-ok? (stacks-transfer!
-                     name-price burn-address))
-           (map-insert! preorder-map
-             (tuple (name-hash name-hash))
-             (tuple (paid name-price)
-                    (buyer tx-sender))))
-      (ok 0)
-      (err 1)))
-
-(define-public (register
-               (recipient-principal principal)
-               (name int)
-               (salt int))
-  (let ((preorder-entry
-         (map-get preorder-map
-          (tuple (name-hash (hash160 name salt)))))
-        (name-entry
-         (map-get name-map (tuple (name name)))))
-    (if (and
-         ;; must be preordered
-         (not (is-none? preorder-entry))
-         ;; name shouldn't *already* exist
-         (is-none? name-entry)
-         ;; preorder must have paid enough
-         (<= (price-function name)
-             (default-to 0 (get paid preorder-entry)))
-         ;; preorder must have been the current principal
-         (eq? tx-sender
-              (expects! (get buyer preorder-entry) (err 1)))
-         (map-insert! name-map
-           (tuple (name name))
-           (tuple (buyer recipient-principal))))
-        (ok 0)
-        (err 1))))
-```
-
-
-Note that Blockstack PBC intends to supply a full BNS (Blockstack
-Naming System) smart contract, as well as formal proofs that certain
-desirable properties hold (e.g.
"names are globally unique", "a -revoked name cannot be updated or transferred", "names cost stacks -based on their namespace price function", "only the principal can -reveal a name on registration", etc.). +This SIP is now located in the [stacksgov/sips repository](https://github.com/stacksgov/sips/blob/main/sips/sip-002/sip-002-smart-contract-language.md) as part of the [Stacks Community Governance organization](https://github.com/stacksgov). diff --git a/sip/sip-003-peer-network.md b/sip/sip-003-peer-network.md index 41bacf04a..84ae5dfd2 100644 --- a/sip/sip-003-peer-network.md +++ b/sip/sip-003-peer-network.md @@ -1,1356 +1,5 @@ -# SIP 003 Peer Network +# SIP-003 Peer Network -## Preamble +This document formerly contained SIP-003 before the Stacks 2.0 mainnet launched. -Title: Peer Network - -Author: Jude Nelson - -Status: Draft - -Type: Standard - -Created: 2/27/2018 - -License: BSD 2-Clause - -## Abstract - -This SIP describes the design of the Stacks peer network, used for relaying -blocks, transactions, and routing information. The document describes both the -overall protocol design and rationale, and provides descriptions of each -message's wire format (where applicable). - -## Rationale - -The Stacks blockchain implements a peer-to-peer _reachability network_ in order -to ensure that each Stacks peer has a full copy of all blocks committed to on -the burn chain, and all unconfirmed transactions. A full replica of the chain -state is necessary for user security -- users must be able to determine what -their account states are in order to know that any transactions they send from -them are valid as they are sent. In addition, a full replica of all chain state -is desirable from a reliability perspective -- as long as there exists one -available replica, then it will be possible for new peers to bootstrap -themselves from it and determine the current state of the chain. 
As such, the
-network protocol is designed to help peers build full replicas while remaining
-resilient to disruptions and partitions.
-
-The Stacks peer network is designed with the following design goals in mind:
-
-* **Ease of reimplementation**. The rules for encoding and decoding messages
-  are meant to be as simple as possible to facilitate implementing ancillary software
-that depends on talking to the peer network. Sacrificing a little bit of space
-efficiency is acceptable if it makes encoding and decoding simpler.
-
-* **Unstructured reachability**. The peer network's routing algorithm
-  prioritizes building a _random_ peer graph such that there are many
-_distinct_ paths between any two peers. A random (unstructured) graph is
-preferred to a structured graph (like a DHT) in order to maximize the number of next-hop
-(neighbor) peers that a given peer will consider in its frontier. When choosing neighbors, a peer
-will prefer to maximize the number of _distinct_ autonomous systems represented
-in its frontier in order to help keep as many networks on the Internet as possible connected
-to the Stacks peer network.
-
-## Specification
-
-The following subsections describe the data structures and protocols for the
-Stacks peer network. In particular, this document discusses _only_ the peer
-network message structure and protocols. It does _not_ document the structure
-of Stacks transactions and blocks. These structures are defined in SIP 005.
-
-### Encoding Conventions
-
-This section explains how this document will describe the Stacks messages, and
-explains the conventions used to encode Stacks messages as a sequence of bytes.
-
-All Stacks network messages are composed of _scalars_, _byte buffers_ of fixed
-length, _vectors_ of variable length, and _typed containers_ of variable length.
-
-A scalar is a number represented by 1, 2, 4, or 8 bytes, and is unsigned.
-Scalars requiring 2, 4, and 8 bytes are encoded in network byte order (i.e. big-endian).
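The scalar and vector conventions above can be sketched as follows. This is an illustrative sketch, not code from the SIP; the helper names `encode_u32` and `encode_u64_vec` are made up for this example.

```rust
// Sketch of the encoding conventions: scalars are written in network
// byte order (big-endian), and vectors carry a 4-byte big-endian
// length prefix followed by their encoded items.

fn encode_u32(out: &mut Vec<u8>, x: u32) {
    out.extend_from_slice(&x.to_be_bytes()); // network byte order
}

fn encode_u64_vec(out: &mut Vec<u8>, items: &[u64]) {
    // 4-byte length prefix: a vector holds at most 2^32 - 1 items
    encode_u32(out, items.len() as u32);
    for item in items {
        out.extend_from_slice(&item.to_be_bytes());
    }
}

fn main() {
    let mut out = Vec::new();
    encode_u64_vec(&mut out, &[0xc0c1c2c3c4c5c6c7]);
    // length prefix 00 00 00 01, then the item in big-endian order
    assert_eq!(
        out,
        vec![0x00, 0x00, 0x00, 0x01, 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7]
    );
}
```

Decoding simply reverses these steps, which is what keeps independent reimplementations straightforward.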
-
-Byte buffers have known length and are transmitted as-is.
-
-Vectors are encoded as length-prefixed arrays. The first 4 bytes of a vector
-are a scalar that encodes the vector's length. As such, a vector may not have
-more than 2^32 - 1 items. Vectors are recursively defined in terms of other
-scalars, byte buffers, vectors, and typed containers.
-
-A typed container is encoded as a 1-byte type identifier, followed by zero or
-more encoded structures. Typed containers are used in practice to encode
-type variants, such as types of message payloads or types of transactions.
-Typed containers are recursively defined in terms of other scalars, byte
-buffers, vectors, and typed containers. Unlike a vector, there is no length
-field for a typed container -- the parser will begin consuming the container's
-items immediately following the 1-byte type identifier.
-
-**Example**
-
-Consider the following message definitions:
-
-```
-// a byte buffer
-pub struct SomeBytes([u8; 10]);
-
-pub struct ExampleMessagePayload {
-    pub aa: u16,
-    pub bytes: SomeBytes
-}
-
-// will encode to a typed container
-pub enum PayloadVariant {
-    Foo(ExampleMessagePayload),
-    Bar(u32)
-}
-
-pub const FOO_MESSAGE_TYPE_ID: u8 = 0x00;
-pub const BAR_MESSAGE_TYPE_ID: u8 = 0x01;
-
-// top-level message that will encode to a sequence of bytes
-pub struct ExampleMessage {
-    pub a: u8,
-    pub b: u16,
-    pub c: u32,
-    pub d: u64,
-    pub e: Vec<u64>,
-    pub payload: PayloadVariant,
-    pub payload_list: Vec<PayloadVariant>
-}
-```
-
-Consider the following instantiation of an `ExampleMessage` on a little-endian
-machine (such as an Intel x86):
-
-```
-let msg = ExampleMessage {
-    a: 0x80,
-    b: 0x9091,
-    c: 0xa0a1a2a3,
-    d: 0xb0b1b2b3b4b5b6b7,
-    e: vec![0xc0c1c2c3c4c5c6c7, 0xd0d1d2d3d4d5d6d7, 0xe0e1e2e3e4e5e6e7],
-    payload: PayloadVariant::Foo(
-        ExampleMessagePayload {
-            aa: 0xaabb,
-            bytes: SomeBytes([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
-        }
-    ),
-    payload_list: vec![
-        PayloadVariant::Foo(
-            ExampleMessagePayload {
-                aa: 0xccdd,
-                
bytes: SomeBytes([0xa, 0xb, 0xc, 0xd, 0xe, 0xf, 0x10, 0x11, 0x12, 0x13])
-            }
-        ),
-        PayloadVariant::Bar(0x00112233)
-    ]
-};
-```
-
-This message would serialize to the following bytes. Note that each line represents
-a separate record to improve readability.
-
-```
-80 # msg.a
-90 91 # msg.b in network byte order
-a0 a1 a2 a3 # msg.c in network byte order
-b0 b1 b2 b3 b4 b5 b6 b7 # msg.d in network byte order
-00 00 00 03 # length of msg.e, in network byte order (note that vector lengths are always 4 bytes)
-c0 c1 c2 c3 c4 c5 c6 c7 # msg.e[0] in network byte order
-d0 d1 d2 d3 d4 d5 d6 d7 # msg.e[1] in network byte order
-e0 e1 e2 e3 e4 e5 e6 e7 # msg.e[2] in network byte order
-00 # PayloadVariant::Foo type ID
-aa bb # msg.payload.aa, where msg.payload is PayloadVariant::Foo(ExampleMessagePayload)
-00 01 02 03 04 05 06 07 08 09 # msg.payload.bytes, where msg.payload is PayloadVariant::Foo(ExampleMessagePayload)
-00 00 00 02 # length of msg.payload_list, in network byte order
-00 # PayloadVariant::Foo type ID
-cc dd # msg.payload_list[0].aa, where msg.payload_list[0] is a PayloadVariant::Foo(ExampleMessagePayload)
-0a 0b 0c 0d 0e 0f 10 11 12 13 # msg.payload_list[0].bytes, where msg.payload_list[0] is a PayloadVariant::Foo(ExampleMessagePayload)
-01 # PayloadVariant::Bar type ID
-00 11 22 33 # msg.payload_list[1].0, where msg.payload_list[1] is a PayloadVariant::Bar(u32) in network byte order
-```
-
-### Byte Buffer Types
-
-The following byte buffers are used within Stacks peer messages:
-
-```
-pub struct MessageSignature([u8; 65]);
-```
-
-This is a fixed-length container for storing a recoverable secp256k1
-signature. The first byte is the recovery code; the next 32 bytes are the `r`
-parameter, and the last 32 bytes are the `s` parameter. Because there are up to
-two valid signature values for a secp256k1 curve, only the signature with the _lower_
-value for `s` will be accepted.
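The low-`s` rule above can be checked directly on the big-endian `s` bytes. This is an illustrative sketch, not code from the SIP; it assumes the usual convention that `s` is acceptable iff it is at most half the secp256k1 group order `n`, in which case a lexicographic byte comparison suffices.

```rust
// n/2 for secp256k1, big-endian (the largest acceptable s value).
const HALF_ORDER: [u8; 32] = [
    0x7f, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0x5d, 0x57, 0x6e, 0x73, 0x57, 0xa4, 0x50, 0x1d,
    0xdf, 0xe9, 0x2f, 0x46, 0x68, 0x1b, 0x20, 0xa0,
];

fn is_low_s(s: &[u8; 32]) -> bool {
    // Both values are fixed-width big-endian, so lexicographic
    // comparison agrees with numeric comparison.
    &s[..] <= &HALF_ORDER[..]
}

fn main() {
    // n/2 itself is the largest acceptable s ...
    assert!(is_low_s(&HALF_ORDER));
    // ... while n/2 + 1 must be rejected.
    let mut high = HALF_ORDER;
    high[31] = 0xa1;
    assert!(!is_low_s(&high));
}
```

A signer that produces a high `s` can substitute `n - s` (flipping the recovery code accordingly) to obtain the canonical low-`s` form.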
-
-```
-pub struct PeerAddress([u8; 16]);
-```
-
-This is a fixed-length container for an IPv4 or an IPv6 address.
-
-```
-pub struct Txid([u8; 32]);
-```
-
-This is a container for a transaction ID. Transaction IDs are 32-byte
-cryptographic hashes.
-
-```
-pub struct BurnchainHeaderHash([u8; 32]);
-```
-
-This is a container for the hash of a burn chain block header, encoded as a
-32-byte cryptographic hash.
-
-```
-pub struct BlockHeaderHash([u8; 32]);
-```
-
-This is a container for the hash of a Stacks block or a Stacks microblock header
-hash, encoded as a 32-byte cryptographic hash.
-
-```
-pub struct Secp256k1PublicKey([u8; 33]);
-```
-
-This is a compressed secp256k1 public key, as used in Bitcoin and other
-cryptocurrencies.
-
-```
-pub struct DoubleSha256([u8; 32]);
-```
-
-This is a SHA256 hash applied twice to some data.
-
-```
-pub struct Sha512Trunc256([u8; 32]);
-```
-
-This is a container for a SHA512/256 hash.
-
-```
-pub struct TrieHash([u8; 32]);
-```
-
-This is a container for a MARF merkle hash (see SIP-004).
-
-```
-pub struct UrlString(Vec<u8>);
-```
-
-This is a container for an ASCII string that encodes a URL. It is encoded as
-follows:
-* A 1-byte length prefix
-* The string's bytes, as-is.
-
-### Common Data Structures
-
-This section details common data structures used in multiple messages.
-
-**Neighbor Address**
-
-The network address of a Stacks peer is encoded as follows:
-
-```
-pub struct NeighborAddress {
-    /// The IPv4 or IPv6 address of this peer
-    pub addrbytes: PeerAddress,
-
-    /// The port this peer listens on
-    pub port: u16,
-
-    /// The RIPEMD160-SHA256 hash of the node's public key.
-    /// If this structure is used to advertise knowledge of another peer,
-    /// then this field _may_ be used as a hint to tell the receiver which
-    /// public key to expect when it establishes a connection.
-    pub public_key_hash: Hash160
-}
-```
-
-**Relay Data**
-
-Messages in the network record the order of the peers that relayed them.
This
-information is encoded as follows:
-
-```
-pub struct RelayData {
-    /// The peer that relayed a message
-    pub peer: NeighborAddress,
-
-    /// The sequence number of that message (see the Preamble structure below)
-    pub seq: u32,
-}
-```
-
-### Messages
-
-All Stacks messages have three components:
-
-* A fixed-length **preamble** which describes some metadata about the peer's view of the
-  network.
-
-* A variable-length but bound-sized **relayers** vector which describes the order of peers that
-  relayed a message.
-
-* A variable-length **payload**, which encodes a specific peer message as a
-  typed container.
-
-All Stacks messages are represented as:
-
-```
-pub struct StacksMessage {
-    pub preamble: Preamble,
-    pub relayers: Vec<RelayData>,
-    pub payload: StacksMessageType
-}
-```
-
-The preamble has the following fields. Descriptions of each field are provided
-in-line.
-
-```
-pub struct Preamble {
-    /// A 4-byte scalar to encode the semantic version of this software.
-    /// The only valid value is 0x15000000 (i.e. version 21.0.0.0).
-    pub peer_version: u32,
-
-    /// A 4-byte scalar to encode which network this peer belongs to.
-    /// Valid values are:
-    ///   0x15000000 -- this is "mainnet"
-    ///   0x15000001 -- this is "testnet"
-    pub network_id: u32,
-
-    /// A 4-byte scalar to encode the message sequence number. A peer will
-    /// maintain a sequence number for each neighbor it talks to, and will
-    /// increment it each time it sends a new message (wrapping around if
-    /// necessary).
-    pub seq: u32,
-
-    /// This is the height of the last burn chain block this peer processed.
-    /// If the peer is all caught up, this is the height of the burn chain tip.
-    pub burn_block_height: u64,
-
-    /// This is the burn block hash calculated at the burn_block_height above.
-    /// It uniquely identifies a burn chain block.
-    pub burn_header_hash: BurnchainHeaderHash,
-
-    /// This is the last stable block height -- i.e.
the largest
-    /// block height at which a block can be considered stable in the burn
-    /// chain history. In Bitcoin, this is at least 7 blocks behind
-    /// burn_block_height.
-    pub stable_burn_block_height: u64,
-
-    /// This is the hash of the last stable block's header.
-    pub stable_burn_header_hash: BurnchainHeaderHash,
-
-    /// This is a pointer to additional data that follows the payload.
-    /// This is a reserved field; for now, it should all be 0's.
-    pub additional_data: u32,
-
-    /// This is a signature over the entire message (preamble and payload).
-    /// When generating this value, the signature bytes below must all be 0's.
-    pub signature: MessageSignature,
-
-    /// This is the length of the message payload.
-    pub payload_len: u32,
-}
-```
-
-A payload is a typed container, and may be any of the following enumerated
-types:
-
-```
-pub enum StacksMessageType {
-    Handshake(HandshakeData),
-    HandshakeAccept(HandshakeAcceptData),
-    HandshakeReject,
-    GetNeighbors,
-    Neighbors(NeighborsData),
-    GetBlocksInv(GetBlocksInv),
-    BlocksInv(BlocksInvData),
-    GetPoxInv(GetPoxInv),
-    PoxInv(PoxInvData),
-    BlocksAvailable(BlocksAvailableData),
-    MicroblocksAvailable(MicroblocksAvailableData),
-    Blocks(BlocksData),
-    Microblocks(MicroblocksData),
-    Transaction(StacksTransaction),
-    Nack(NackData),
-    Ping(PingData),
-    Pong(PongData),
-    NatPunchRequest(u32),
-    NatPunchReply(NatPunchData)
-}
-```
-
-### Payloads
-
-**Handshake**
-
-Type identifier: 0
-
-Structure:
-
-```
-pub struct HandshakeData {
-    /// Address of the peer sending the handshake
-    pub addrbytes: PeerAddress,
-    pub port: u16,
-
-    /// Bit field of services this peer offers.
-    /// Supported bits:
-    /// -- SERVICE_RELAY = 0x0001 -- must be set if the node relays messages
-    ///    for other nodes.
-    pub services: u16,
-
-    /// This peer's public key
-    pub node_public_key: Secp256k1PublicKey,
-
-    /// Burn chain block height at which this key will expire
-    pub expire_block_height: u64,
-
-    /// HTTP(S) URL to where this peer's block data can be fetched
-    pub data_url: UrlString
-}
-```
-
-**HandshakeAccept**
-
-Type identifier: 1
-
-Structure:
-
-```
-pub struct HandshakeAcceptData {
-    /// The remote peer's handshake data
-    pub handshake: HandshakeData,
-
-    /// Maximum number of seconds the recipient peer expects this peer
-    /// to wait between sending messages before the recipient will declare
-    /// this peer as dead.
-    pub heartbeat_interval: u32,
-}
-```
-
-**HandshakeReject**
-
-Type identifier: 2
-
-Structure: [empty]
-
-**GetNeighbors**
-
-Type identifier: 3
-
-Structure: [empty]
-
-**Neighbors**
-
-Type identifier: 4
-
-Structure:
-
-```
-pub struct NeighborsData {
-    /// List of neighbor addresses and public key hints.
-    /// This vector will be at most 128 elements long.
-    pub neighbors: Vec<NeighborAddress>
-}
-```
-
-**GetBlocksInv**
-
-Type identifier: 5
-
-Structure:
-
-```
-pub struct GetBlocksInv {
-    /// The consensus hash at the start of the requested reward cycle block range
-    pub consensus_hash: ConsensusHash,
-    /// The number of blocks after this consensus hash, including the block
-    /// that corresponds to this consensus hash.
-    pub num_blocks: u16
-}
-```
-
-Notes:
-
-* Expected reply is a `BlocksInv`.
-* `consensus_hash` must correspond to a (burnchain) block at the start of a PoX reward cycle
-* `num_blocks` cannot be more than the PoX reward cycle length (see SIP-007).
-
-**BlocksInv**
-
-Type identifier: 6
-
-Structure:
-
-```
-pub struct BlocksInvData {
-    /// Number of bits represented in the bit vector below.
-    /// Represents the number of blocks in this inventory.
-    pub bitlen: u16,
-
-    /// A bit vector of which blocks this peer has.
bitvec[i]
-    /// represents the availability of blocks 8*i through 8*i+7, where
-    /// bitvec[i] & 0x01 represents the availability of the (8*i)th block, and
-    /// bitvec[i] & 0x80 represents the availability of the (8*i+7)th block.
-    /// Each bit corresponds to a sortition on the burn chain, and will be set
-    /// if this peer has the winning block data
-    pub block_bitvec: Vec<u8>,
-
-    /// A bit vector for which confirmed microblock streams this peer has.
-    /// The ith bit represents the presence/absence of the ith confirmed
-    /// microblock stream. It is in 1-to-1 correspondence with block_bitvec.
-    pub microblocks_bitvec: Vec<u8>
-}
-```
-
-Notes:
-
-* `BlocksInvData.bitlen` will never exceed 4096
-* `BlocksInvData.block_bitvec` will have length `ceil(BlocksInvData.bitlen / 8)`
-* `BlocksInvData.microblocks_bitvec` will have length `ceil(BlocksInvData.bitlen / 8)`
-
-**GetPoxInv**
-
-Type identifier: 7
-
-Structure:
-
-```
-pub struct GetPoxInv {
-    /// The consensus hash at the _beginning_ of the requested reward cycle range
-    pub consensus_hash: ConsensusHash,
-    /// The number of reward cycles to request (number of bits to expect)
-    pub num_cycles: u16
-}
-```
-
-Notes:
-
-* Expected reply is a `PoxInv`
-* `num_cycles` cannot be more than 4096
-
-**PoxInv**
-
-Type identifier: 8
-
-Structure:
-
-```
-pub struct PoxInvData {
-    /// Number of reward cycles encoded
-    pub bitlen: u16,
-    /// Bit vector representing the remote node's PoX vector.
-    /// A bit will be `1` if the node is certain about the status of the
-    /// reward cycle's PoX anchor block (it either cannot exist, or the
-    /// node has a copy), or `0` if the node is uncertain (i.e. it may exist
-    /// but the node does not have a copy if it does).
-    pub pox_bitvec: Vec<u8>
-}
-```
-
-Notes:
-
-* `bitlen` should be at most `num_cycles` from the corresponding `GetPoxInv`.
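The inventory bit layout described above can be sketched as follows; this is an illustrative helper, not code from the SIP, but it follows the stated convention that `0x01` in byte `i` marks block `8*i` and `0x80` marks block `8*i + 7`.

```rust
// Bit i of the inventory lives in byte i / 8, at bit position i % 8.
fn block_available(bitvec: &[u8], i: usize) -> bool {
    bitvec[i / 8] & (1u8 << (i % 8)) != 0
}

fn main() {
    // 0x81 in byte 0 marks blocks 0 and 7 as available.
    let bitvec = [0x81u8, 0x01];
    assert!(block_available(&bitvec, 0));
    assert!(!block_available(&bitvec, 1));
    assert!(block_available(&bitvec, 7));
    // 0x01 in byte 1 marks block 8 as available.
    assert!(block_available(&bitvec, 8));
    assert!(!block_available(&bitvec, 9));
}
```

The same indexing applies to `microblocks_bitvec` and `pox_bitvec`, since all three use the identical bit-packing convention.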
- -**BlocksAvailable** - -Type identifier: 9 - -Structure: - -``` -pub struct BlocksAvailableData { - /// List of blocks available - pub available: Vec<(ConsensusHash, BurnchainHeaderHash)>, -} -``` - -Notes: - -* Each entry in `available` corresponds to the availability of an anchored - Stacks block from the sender. -* `BlocksAvailableData.available.len()` will never exceed 32. -* Each `ConsensusHash` in `BlocksAvailableData.available` must be the consensus - hash calculated by the sender for the burn chain block identified by -`BurnchainHeaderHash`. - -**MicroblocksAvailable** - -Type identifier: 10 - -Structure: - -``` -// Same as BlocksAvailable -``` - -Notes: - -* Each entry in `available` corresponds to the availability of a confirmed - microblock stream from the sender. -* The same rules and limits apply to the `available` list as in - `BlocksAvailable`. - -**Blocks** - -Type identifier: 11 - -Structure: - -``` -pub struct BlocksData { - /// A list of blocks pushed, paired with the consensus hashes of the - /// burnchain blocks that selected them - pub blocks: Vec<(ConsensusHash, StacksBlock)> -} - -pub struct StacksBlock { - /// Omitted for brevity; see SIP 005 -} -``` - -**Microblocks** - -Type identifier: 12 - -Structure: - -``` -pub struct MicroblocksData { - /// "Index" hash of the StacksBlock that produced these microblocks. - /// This is the hash of both the consensus hash of the burn chain block - /// operations that selected the StacksBlock, as well as the StacksBlock's - /// hash itself. - pub index_anchor_hash: StacksBlockId, - /// A contiguous sequence of microblocks. 
-    pub microblocks: Vec<StacksMicroblock>
-}
-
-pub struct StacksMicroblock {
-    /// Omitted for brevity; see SIP 005
-}
-```
-
-**Transaction**
-
-Type identifier: 13
-
-Structure:
-
-```
-pub struct StacksTransaction {
-    /// Omitted for brevity; see SIP 005
-}
-```
-
-**Nack**
-
-Type identifier: 14
-
-Structure:
-
-```
-pub struct NackData {
-    /// Numeric error code to describe what went wrong
-    pub error_code: u32
-}
-```
-
-**Ping**
-
-Type identifier: 15
-
-Structure:
-
-```
-pub struct PingData {
-    /// Random number
-    pub nonce: u32
-}
-```
-
-**Pong**
-
-Type identifier: 16
-
-Structure:
-
-```
-pub struct PongData {
-    /// Random number
-    pub nonce: u32
-}
-```
-
-Notes:
-* The `nonce` field in a `PongData` should match the `nonce` field sent by the
-  corresponding `Ping`.
-
-**NatPunchRequest**
-
-Type identifier: 17
-
-Structure:
-
-```
-/// a 4-byte nonce unique to this request
-u32
-```
-
-**NatPunchReply**
-
-Type identifier: 18
-
-Structure:
-
-```
-pub struct NatPunchData {
-    /// The public IP address, as reported by the remote peer
-    pub addrbytes: PeerAddress,
-    /// The public port
-    pub port: u16,
-    /// The nonce from the paired NatPunchRequest
-    pub nonce: u32,
-}
-```
-
-
-## Protocol Description
-
-This section describes the algorithms that make up the Stacks peer-to-peer
-network. In these descriptions, there is a distinct **sender peer** and a
-distinct **receiver peer**.
-
-### Network Overview
-
-The Stacks peer network has a dedicated _control plane_ and _data plane_. They
-listen on different ports, use different encodings, and fulfill different roles.
-
-The control-plane is implemented via sending messages using the encoding
-described above. It is concerned with the following tasks:
-* Identifying and connecting with other Stacks peer nodes
-* Crawling the peer graph to discover a diverse set of neighbors
-* Discovering peers' data-plane endpoints
-* Synchronizing block and microblock inventories with other peers.
-
-The data-plane is implemented via HTTP(S), and is concerned with both fetching
-and relaying blocks, microblocks, and transactions.
-
-Each Stacks node implements the control-plane protocol in order to help other
-nodes discover where they can fetch blocks. However, Stacks nodes do _not_ need
-to implement the data plane. They can instead offload some or all of this responsibility to
-other Stacks nodes, Gaia hubs, and vanilla HTTP servers. The reason for this is
-to **preserve compatibility with existing Web infrastructure** like cloud
-storage and CDNs for doing the "heavy lifting" for propagating the blockchain
-state.
-
-### Creating a Control-Plane Message
-
-All control-plane messages start with a `Preamble`. This allows peers to identify other peers
-who (1) have an up-to-date view of the underlying burn chain, and (2) are part
-of the same fork set. In addition, the `Preamble` allows peers to authenticate
-incoming messages and verify that they are not stale.
-
-All control-plane messages are signed with the node's session private key using ECDSA on the
-secp256k1 curve. To sign a `StacksMessage`, a peer uses the following algorithm:
-
-1. Serialize the `payload` to a byte string.
-2. Set the `preamble.payload_len` field to the length of the `payload` byte string.
-3. Set the `preamble.seq` field to be the number of messages sent to
-   this peer so far.
-4. Set the `preamble.signature` field to all 0's.
-5. Serialize the `preamble` to a byte string.
-6. Calculate the SHA512/256 hash over the `preamble` and `payload` byte strings.
-7. Calculate the recoverable secp256k1 signature over the SHA512/256 digest.
-
-### Receiving a Control-Plane Message
-
-Because all control-plane messages start with a fixed-length `Preamble`, a peer receives a
-message by first receiving the `Preamble`'s bytes and decoding it. If the bytes
-decode successfully, the peer _then_ receives the serialized payload, using the
-`payload_len` field in the `Preamble` to determine how much data to read.
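The two-stage receive just described can be sketched as follows. This is an illustrative sketch, not the SIP's actual wire layout: the preamble length and the offset of `payload_len` inside it are placeholders made up for this example.

```rust
use std::io::Read;

const PREAMBLE_LEN: usize = 8; // placeholder fixed preamble size
const MAX_PAYLOAD_LEN: u32 = 32 * 1024 * 1024; // 32 MB cap on payloads

fn receive_message<R: Read>(sock: &mut R) -> Result<(Vec<u8>, Vec<u8>), String> {
    // Stage 1: the preamble is fixed-length, so read it whole.
    let mut preamble = vec![0u8; PREAMBLE_LEN];
    sock.read_exact(&mut preamble).map_err(|e| e.to_string())?;

    // Placeholder: take payload_len from the last 4 preamble bytes,
    // in network byte order, and enforce the size cap before allocating.
    let payload_len = u32::from_be_bytes(preamble[4..8].try_into().unwrap());
    if payload_len > MAX_PAYLOAD_LEN {
        return Err("payload too big".into());
    }

    // Stage 2: read exactly payload_len bytes of payload.
    let mut payload = vec![0u8; payload_len as usize];
    sock.read_exact(&mut payload).map_err(|e| e.to_string())?;
    Ok((preamble, payload))
}

fn main() {
    // Simulate a socket with an in-memory byte stream: an 8-byte
    // preamble whose last 4 bytes say "3 payload bytes follow".
    let wire = [0u8, 0, 0, 0, 0, 0, 0, 3, 0xaa, 0xbb, 0xcc];
    let (preamble, payload) = receive_message(&mut &wire[..]).unwrap();
    assert_eq!(preamble.len(), 8);
    assert_eq!(payload, vec![0xaa, 0xbb, 0xcc]);
}
```

Checking the size cap before allocating the payload buffer is what makes the scheme robust against memory-exhaustion attacks.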
To
-avoid memory exhaustion, **the payload may not be more than 32 megabytes**.
-
-Once the preamble and payload message bytes are loaded, the receiver peer
-verifies the message as follows:
-
-1. Calculate the SHA512/256 hash over the serialized `preamble` (with the
-   `signature` field zeroed out) and the payload bytes.
-2. Extract the recoverable signature from `preamble.signature`.
-3. Verify the signature against the sender peer's public key.
-4. Verify that the `seq` field of the preamble is greater than any
-   previously-seen `seq` value for this peer.
-5. Parse the payload typed container bytes into a `Payload`.
-
-### Error Handling
-
-If anything goes wrong when communicating with a peer, the receiver may reply
-with a `Nack` message with an appropriate error code. Depending on the error,
-the sender should try again, close the socket and re-establish the connection, or
-drop the peer from its neighbor set altogether. In particular, if a peer
-receives an _invalid_ message from a sender, the peer should blacklist the remote
-peer for a time (i.e. ignore any future messages from it).
-
-Different aspects of the control-plane protocol will reply with different error codes to
-convey exactly what went wrong. However, in all cases, if the preamble is
-well-formed but identifies a different network ID, a version field
-with a different major version than the local peer, or different stable
-burn header hash values, then both the sender and receiver peers should blacklist each other.
-
-Because peers process the burn chain up to its chain tip, it is possible for
-peers to temporarily be on different fork sets (i.e. they will have different
-burn header hashes for the given chain tip, but will have the same values for
-the stable burn header hashes at each other's `stable_block_height`'s).
-In this case, both peers should take it as a hint to first check
-that their view of the burn chain is consistent (if they have not done so
-recently).
They may otherwise process and react to each other's messages -without penalty. - -Peers are expected to be both parsimonious and expedient in their communication. -If a remote peer sends too many valid messages too quickly, the peer -may throttle or blacklist the remote peer. If a remote peer -is sending data too slowly, the recipient may terminate the connection in order -to free resources for serving more-active peers. - -### Connecting to a Peer's Control Plane - -Connecting to a peer's control-plane is done in a single round as follows: - -1. The sender peer creates a `Handshake` message with its address, services, - and public key and sends it to the receiver. -2. The receiver replies with a `HandshakeAccept` with its public key and - services. - -On success, the sender adds the receiver to its frontier set. The receiver may -do so as well, but this is not required. - -If the receiver is unable to process the `Handshake`, the receiver should -reply with a `HandshakeReject` and temporarily blacklist the sender for a time. -Different implementations may have different considerations for what constitutes -an invalid `Handshake` request. A `HandshakeReject` response should be used -only to indicate that the sender peer will be blacklisted. If the `Handshake` -request cannot be processed for a _recoverable_ reason, then the receiver should -reply with a `Nack` with the appropriate error code to tell the sender to try -again. - -When executing a handshake, a peer should _not_ include any other peers in the -`relayers` vector except for itself. The `relayers` field will be ignored. - -### Learning the public IP address - -Before the peer can participate in the control plane, it must know its -publicly-routable IP address so it can exchange it with its remote neighbors. 
-This is necessary, since other neighbors-of-neighbors will learn this peer's
-public IP address from its remote neighbors, and thus the peer must have a
-publicly-routable address if they are going to handshake with it.
-
-The peer may take an operator-given public IP address. If no public IP address
-is given, the peer will learn the IP address using the `NatPunchRequest` and
-`NatPunchReply` messages as follows:
-
-1. The peer sends a `NatPunchRequest` to a randomly-chosen initial neighbor it has
-   already handshaked with. It uses a random nonce value.
-2. The remote neighbor replies with a (signed) `NatPunchReply` message, with its
-   `addrbytes` and `port` set to what it believes the public IP is (based on the
-   underlying socket's peer address).
-3. Upon receipt of the `NatPunchReply`, the peer will have confirmed its public
-   IP address, and will send it in all future `HandshakeAccept` messages. It
-   will periodically re-learn its IP address, if it was not given by the
-   operator.
-
-Because the peer's initial neighbors are chosen by the operator as being
-sufficiently trustworthy to supply network information for network walks, it is
-reasonable to assume that they can also be trusted to tell a bootstrapping peer
-its public IP address.
-
-### Checking a Peer's Liveness
-
-A sender peer can check that a peer is still alive by sending it a `Ping`
-message on the control-plane. The receiver should reply with a `Pong` message. Both the sender
-and receiver peers would update their metrics for measuring each other's
-responsiveness, but they do _not_ alter any information about each other's
-public keys and expirations.
-
-Peers will ping each other periodically this way to prove that they are still alive.
-This reduces the likelihood that they will be removed from each other's
-frontiers (see below).
-
-### Exchanging Neighbors
-
-Peers exchange knowledge about their neighbors on the control-plane as follows:
-
-1. 
The sender peer creates a `GetNeighbors` message and sends it to the
-   receiver.
-2. The receiver chooses up to 128 neighbors it knows about and replies to the
-   sender with them as a `Neighbors` message. It provides the hashes of their session public keys (if
-known) as a hint to the sender, which the sender may use to further
-authenticate future neighbors.
-3. The sender sends `Handshake` messages to a subset of the replied neighbors,
-   prioritizing neighbors that are not known to the sender or have not been
-recently contacted.
-4. The various neighbors contacted reply either `HandshakeAccept`,
-   `HandshakeReject`, or `Nack` messages. The sender updates its frontier with
-knowledge gained from the `HandshakeAccept` messages.
-
-On success, the sender peer adds zero or more of the replied peer addresses to
-its frontier set. The receiver and its contacted neighbors do nothing but
-update their metrics for the sender.
-
-If the sender receives an invalid `Neighbors` reply with more than 128
-addresses, the sender should blacklist the receiver.
-
-The sender is under no obligation to trust the public key hashes in the
-`Neighbors` reply. However, if the sender trusts the receiver, then they can
-be used as hints on the expected public keys if the sender subsequently
-attempts to connect with these neighbors. Deciding which nodes to trust
-with replying true neighbor information is a peer-specific configuration option.
-
-The receiver may reply with a `Nack` if it does not wish to divulge its
-neighbors. In that case, the sender should not ask this receiver for neighbors
-again for a time.
-
-### Requesting Blocks on the Data-Plane
-
-Peers exchange blocks in a two-step protocol: the sender first queries the
-receiver for an inventory of the blocks it has via the control-plane,
-and then fetches individual blocks and microblocks on the data-plane.
-
-On the control-plane, the sender builds up a locally-cached inventory of which
-blocks the receiver has.
To do so, the sender and receiver execute a two-phase
-protocol to synchronize the sender's view of the receiver's block inventory.
-First, the sender downloads the receiver's knowledge of PoX reward cycles,
-encoded as a bit vector where a `1` in the _ith_ position means that the
-receiver is certain about the status of the PoX anchor block in the _ith_ reward
-cycle (i.e. it either does not exist, or it does exist and the receiver has a
-copy). It is `0` otherwise -- i.e. it may exist, but the receiver does not have
-a copy.
-
-To synchronize the PoX anchor block knowledge, the sender and receiver do the following:
-
-1. The sender creates a `GetPoxInv` message for the range of PoX reward cycles
-   it wants, and sends it to the receiver.
-2. If the receiver recognizes the consensus hash in the `GetPoxInv` message, it
-   means that the receiver agrees with the sender on all PoX state up to the
-   burn chain block that this consensus hash represents (note that the consensus
-   hash must correspond to a burn chain block at the start of a reward cycle).
-   The receiver replies with a `PoxInv` with its knowledge of all reward cycles
-   at and after the reward cycle identified by that consensus hash.
-3. The sender and receiver continue to execute this protocol until the receiver
-   shares all of its PoX reward cycle knowledge, or it encounters a consensus
-   hash from the sender that it does not recognize. If the latter happens, the
-   receiver shall reply with a `Nack` with the appropriate error code.
-
-Once the sender has downloaded the PoX anchor block knowledge from the receiver,
-it proceeds to fetch an inventory of all block and microblock knowledge from the
-receiver for all PoX reward cycles on which it agrees with the receiver. 
That
-is, it will fetch block and microblock inventory data for all reward cycles in
-which the sender and receiver both have a `1` or both have a `0` in the _ith_
-bit position, starting from the first-ever reward cycle, up to either the
-lowest reward cycle in which they do not agree or the end of the PoX vector,
-whichever comes first.
-They proceed as follows:
-
-1. The sender creates a `GetBlocksInv` message for reward cycle _i_,
-   and sends it to the receiver.
-2. If the receiver has processed the range of blocks represented by the `GetBlocksInv`
-   block range -- i.e. it recognizes the consensus hash in `GetBlocksInv` as
-   the start of a reward cycle -- then the receiver creates a `BlocksInv` message
-   and replies with it. The receiver's inventory bit vectors may be _shorter_ than
-   the requested range if the request refers to blocks at the burn chain tip. The
-   receiver sets the _ith_ bit in the blocks inventory if it has the corresponding
-   block, and sets the _ith_ bit in the microblocks inventory if it has the
-   corresponding _confirmed_ microblock stream.
-3. The sender repeats the process for reward cycle _i+1_, so long as it
-   and the receiver are both certain about the PoX anchor block for reward
-   cycle _i+1_, or both are uncertain. If this is not true, then the sender stops
-   downloading block and microblock inventory from the receiver, and will assume
-   that any blocks in or after this reward cycle are unavailable from the receiver.
-
-The receiver peer may reply with a `PoxInv` or `BlocksInv` with as few
-inventory bits as it wants, but it must reply with at
-least one inventory bit. If the receiver does not do so,
-the sender should terminate the connection to the receiver and refrain from
-contacting it for a time.
-
-While synchronizing the receiver's block inventory, the sender will fetch blocks and microblocks
-on the data-plane once it knows that the receiver has them.
-To do so, the sender and receiver do the following:
-
-1. 
The sender looks up the `data_url` from the receiver's `HandshakeAccept` message
-   and issues an HTTP GET request for each anchored block marked as present in
-   the inventory.
-2. The receiver replies to each HTTP GET request with the anchored blocks.
-3. Once the sender has received a parent and child anchored block, it will ask
-   for the microblock stream confirmed by the _child_. It uses the _index hash_
-   of the child anchored block to do so, which itself authenticates the last
-   hash in the confirmed microblock stream.
-4. The receiver replies to each HTTP GET request with the confirmed microblock
-   streams.
-5. As blocks and microblock streams arrive, the sender processes them to build
-   up its view of the chain.
-
-When the sender receives a block or microblock stream, it validates them against
-the burn chain state. It ensures that the block hashes to a block-commit
-message that won sortition (see SIP-001 and SIP-007), and it ensures that a confirmed
-microblock stream connects a known parent and child anchored block. This means
-that the sender **does not need to trust the receiver** to validate block data
--- it can receive block data from any HTTP endpoint on the web.
-
-The receiver should reply with blocks and confirmed microblock streams if it had
-previously announced their availability in a `BlocksInv` message.
-If the sender receives no data (i.e. an HTTP 404)
-for blocks the receiver claimed to have, or if the sender receives invalid data or
-an incomplete microblock stream, then the sender disconnects from the receiver
-and blacklists it on the control-plane.
-
-The sender may not be contacting a peer node when it fetches blocks and
-microblocks -- the receiver may send the URL to a Gaia hub in its
-`HandshakeAcceptData`'s `data_url` field. 
In doing so, the receiver can direct
-the sender to fetch blocks and microblocks from a well-provisioned,
-always-online network endpoint that is more reliable than the receiver node.
-
-Blocks and microblocks are downloaded incrementally by reward cycle.
-As the sender requests and receives blocks and microblocks for reward cycle _i_,
-it learns the anchor block for reward cycle _i+1_ (if it exists at all), and
-will only then be able to determine the true sequence of consensus hashes for
-reward cycle _i+1_. As nodes do this, their PoX knowledge may change -- i.e.
-they will become certain of the presence of PoX anchor blocks that they had
-previously been uncertain of. As such, nodes periodically re-download each
-other's PoX inventory vectors, and if they have changed -- i.e. the _ith_ bit flipped
-from a `0` to a `1` -- the block and microblock inventory state representing blocks and
-microblocks in or after reward cycle _i_ will be dropped and re-downloaded.
-
-### Announcing New Data
-
-In addition to synchronizing inventories, peers announce to one another
-when a new block or confirmed microblock stream is available. If peer A has
-crawled peer B's inventories, and peer A downloads or is forwarded a block or
-confirmed microblock stream that peer B does not have, then peer A will send a
-`BlocksAvailable` (or `MicroblocksAvailable`) message to peer B to inform it
-that it can fetch the data from peer A's data plane. When peer B receives one
-of these messages, it updates its copy of peer A's inventory and proceeds to
-fetch the blocks and microblocks from peer A. If peer A serves invalid data, or
-returns an HTTP 404, then peer B disconnects from peer A (since this indicates
-that peer A is misbehaving).
-
-Peers do not forward blocks or confirmed microblocks to one another. Instead,
-they only announce that they are available. 
This minimizes the aggregate
-network bandwidth required to propagate a block -- a block is only downloaded
-by the peers that need it.
-
-Unconfirmed microblocks and transactions are always forwarded to other peers in order to
-ensure that the whole peer network quickly has a full copy. This helps maximize
-the number of transactions that can be included in a leader's microblock stream.
-
-### Choosing Neighbors
-
-The core design principle of the Stacks peer network control-plane is to maximize the entropy
-of the peer graph. Doing so helps ensure that the network's connectivity
-avoids depending too much on a small number of popular peers and network edges.
-While this may slow down message propagation relative to more structured peer graphs,
-the _lack of_ structure is the key to making the Stacks peer network
-resilient.
-
-This principle is realized through a randomized neighbor selection algorithm.
-This algorithm curates the peer's outbound connections to other peers; inbound
-connections are handled separately.
-
-The neighbor selection algorithm is designed to address the following
-concerns:
-
-* It helps a peer discover possible "choke points" in the network, and devise
-  alternative paths around them.
-* It helps a peer detect network disruptions (in particular, BGP prefix hijacks) --
-  observed as sets of peers with the same network prefix suddenly not relaying
-  messages, or sets of paths through particular IP blocks no longer being taken.
-* It helps a peer discover the "jurisdictional path" its messages could travel through,
-  which helps a peer route around hostile networks that would delay, block, or
-  track the messages.
-
-To achieve this, the Stacks peer network control-plane is structured as a K-regular random graph,
-where _any_ peer may be chosen as a peer's neighbor. 
The network forms -a _reachability_ network, with the intention of being "maximally difficult" for a -network adversary to disrupt by way of censoring individual nodes and network -hops. A random graph topology is suitable for this, -since the possibility that any peer may be a neighbor means that the only way to -cut off a peer from the network is to ensure it never discovers another honest -peer. - -To choose their neighbors in the peer graph, peers maintain two views of the network: - -* The **frontier** -- the set of peers that have either sent a message to this - peer or have responded to a request at some point in the past. -The size of the frontier is significantly larger than -K. Peer records in the frontier set may expire and may be stale, but are only -evicted when the space is needed. The **fresh frontier set** is the subset of -the frontier set that have been successfully contacted in the past _L_ seconds. - -* The **neighbor set** -- the set of K peers that the peer will announce as its - neighbors when asked. The neighbor set is a randomized subset of the frontier. -Unlike the frontier set, the peer continuously refreshes knowledge of the state -of the neighbor sets' blocks and transactions in order to form a transaction and -block relay network. - -Using these views of the network, the peers execute a link-state routing -protocol whereby each peer determines each of its neighbors' neighbors, -and in doing so, builds up a partial view of the routing graph made up of -recently-visited nodes. Peers execute a route recording protocol whereby each -message is structured to record the _path_ it took -through the graph's nodes. This enables a peer to determine how often other peers -in its frontier, as well as the network links between them, are responsible for -relaying messages. This knowledge, in turn, is used to help the peer seek out -new neighbors and neighbor links to avoid depending on popular peers and -links too heavily. 
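The frontier and neighbor-set bookkeeping described above can be sketched as follows. This is an illustrative Python sketch, not the Rust reference implementation; the `Frontier` class, the address format, and the concrete freshness window `L_SECONDS` are assumptions made for illustration (K = 16 matches the reference implementation's neighbor set size).

```python
import random
import time

K = 16            # neighbor set size, per the reference implementation
L_SECONDS = 3600  # freshness window L; illustrative value only

class Frontier:
    """Illustrative sketch of the frontier / fresh-frontier / neighbor-set views."""

    def __init__(self):
        self.peers = {}         # addr -> last successful contact time (unix seconds)
        self.neighbors = set()  # the K peers this node announces as its neighbors

    def record_contact(self, addr, now=None):
        # Any peer that sent us a message or answered a request joins the frontier.
        self.peers[addr] = now if now is not None else time.time()

    def fresh(self, now=None):
        # The "fresh frontier set": peers contacted within the past L seconds.
        now = now if now is not None else time.time()
        return {a for a, t in self.peers.items() if now - t <= L_SECONDS}

    def refresh_neighbors(self, rng=random):
        # The neighbor set is a randomized subset of the (larger) frontier.
        pool = list(self.peers)
        self.neighbors = set(rng.sample(pool, min(K, len(pool))))
        return self.neighbors

f = Frontier()
for i in range(32):
    f.record_contact(f"10.0.0.{i}:20444", now=1000 + i)
assert len(f.fresh(now=1031)) == 32     # all 32 contacts fall within the window
assert len(f.refresh_neighbors()) == K  # K of the 32 frontier peers are chosen
```

In the actual protocol the neighbor set is not a uniform sample; it is curated by the random graph walk described below. The sketch only illustrates the relationship between the three views.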
-
-**Discovering Other Peers**
-
-To construct a K-regular random graph topology, peers execute a modified Metropolis-Hastings
-random graph walk with delayed acceptance (MHRWDA) [1] to decide which peers belong to
-their neighbor set and to grow their frontiers.
-
-A peer keeps track of which peers are neighbors of which other peers, and in
-doing so, is able to calculate the degree of each peer as the number of that
-peer's neighbors that report the peer in question as a neighbor. Given a currently-visited
-peer _P_, a neighboring peer _N_ is walked to with probability proportional to
-the ratio of their degrees. The exact formula is adapted from Algorithm 2 in
-[1].
-
-Once established, a peer tries to keep its neighbor set stable as long as the
-neighbors are live. It does so by periodically pinging and re-handshaking with
-its K neighbors in order to enforce a minimum time between contacts.
-As it communicates with neighbors, it measures the health of each neighbor as how often
-it responds to a query. A peer will probabilistically evict a neighbor from its
-neighbor set if its response rate drops too low, where the probability of
-eviction is proportional both to the peer's perceived uptime and to the peer's
-recent downtime.
-
-**Curating a Frontier**
-
-In addition to finding neighbors, a peer curates a frontier set to (1) maintain knowledge
-of backup peers to contact in case a significant portion of its neighbors goes
-offline, and (2) to make inferences about the global connectivity of the peer
-graph. A peer can't crawl each and every other peer in the
-frontier set (this would be too expensive), but a peer can infer over time which
-nodes and edges are likely to be online by examining its fresh frontier set.
-
-The frontier set grows whenever new neighbors are discovered, but it is not
-infinitely large. Frontier nodes are stored in a bounded-size hash table on disk. 
A neighbor is
-inserted deterministically into the frontier set by hashing its address with a
-peer-specific secret and the values `0` through `7` in order to identify eight
-slots into which its address can be inserted. If any of the resulting slots are
-empty, the peer is added to the frontier.
-
-As more peers are discovered, it becomes possible that a newly-discovered peer cannot be inserted
-deterministically. This will become more likely than not to happen once the
-frontier set has `8 * sqrt(F)` slots full, where `F` is the maximum size of
-the frontier (due to the birthday paradox). In such cases, a random existing peer in one of the slots is
-chosen for possible eviction, but only if it is offline. The peer will attempt
-to handshake with the existing peer before evicting it, and if it responds with
-a `HandshakeAccept`, the new node is discarded and no eviction takes place.
-
-Insertion and deletion are deterministic (and in deletion's case, predicated on
-a failure to ping) in order to prevent malicious remote peers from filling up
-the frontier set with junk without first acquiring the requisite IP addresses
-and learning the victim's peer-specific secret nonce.
-The handshake-then-evict test is also in place to
-prevent peers with a longer uptime from being easily replaced by short-lived peers.
-
-**Mapping the Peer Network**
-
-The Stacks protocol includes a route recording mechanism for peers to probe network paths.
-This is used to measure how frequently peers and connections are used in the peer
-graph. This information is encoded in the `relayers` vector in each message.
-
-When relaying data, the relaying peer must re-sign the message preamble and update its
-sequence number to match each recipient peer's expectations on what the signature
-and message sequence will be. In addition, the relaying peer appends the
-upstream peer's address and previous sequence number to the
-message's `relayers` vector. 
Because the `relayers` vector grows each time a
-message is forwarded, the peer uses it to determine the message's time-to-live:
-if the `relayers` vector becomes too long, the message is dropped.
-
-A peer that relays messages _must_ include itself at the end of the
-`relayers` vector when it forwards a message.
-If it does not do so, a correct downstream peer can detect this by checking that
-the upstream peer inserted its previously-announced address (i.e. the IP
-address, port, and public key it sent in its `HandshakeData`). If a relaying
-peer does not update the `relayers` vector correctly, a downstream peer should
-close the connection and possibly throttle the peer (blacklisting should not
-be used, since this can happen for benign reasons -- for example, a node on a
-laptop may change IP addresses across a suspend/resume cycle). Nevertheless,
-it is important that the `relayers` vector remains complete in order to detect and resist routing
-disruptions in the Internet.
-
-Not all peers are relaying peers -- only peers that set the `SERVICE_RELAY`
-bit in their handshakes are required to relay messages and list themselves in the `relayers` vector.
-Peers that do not do this may nevertheless _originate_ an unsolicited `BlocksData`,
-`MicroblocksData`, or `Transaction` message. However, the `relayers` vector of
-such a message _must_ be empty. This option is available to protect the privacy
-of the originating peer, since
-(1) network attackers seeking to disrupt the chain could do
-so by attacking block and microblock originators, and
-(2) network attackers seeking to go after Stacks users could do so if they knew
-or controlled the IP address of the victim's peer. The fact that network
-adversaries can be expected to harass originators who advertise their network
-addresses serves to discourage relaying peers from stripping the
-`relayers` vector from messages, lest they become the target of an attack. 
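The `relayers`-vector checks described in this section (the length-based time-to-live, rejection of paths that loop or revisit the local peer, and the requirement that non-relaying peers originate with an empty vector) can be sketched as follows. This is illustrative Python, not the reference implementation; the function name and the concrete `MAX_RELAYERS_LEN` bound are assumptions, since the actual TTL limit is implementation-defined.

```python
MAX_RELAYERS_LEN = 16  # illustrative TTL bound; the real limit is implementation-defined

def should_drop(relayers, my_pubkey_hash, sender_is_relayer):
    """Sketch of the relayers-vector validity checks.

    `relayers` is the ordered list of public key hashes recorded along the
    message's path. Returns True if the message must be dropped.
    """
    # TTL: an over-long relayers vector means the message has traveled too far.
    if len(relayers) > MAX_RELAYERS_LEN:
        return True
    # A message whose recorded path already includes us has looped back.
    if my_pubkey_hash in relayers:
        return True
    # A cycle anywhere in the path is invalid.
    if len(set(relayers)) != len(relayers):
        return True
    # A peer without SERVICE_RELAY must originate with an empty relayers vector.
    if not sender_is_relayer and relayers:
        return True
    return False

assert should_drop(["a", "b", "a"], "me", True)   # cycle in the path
assert should_drop(["a", "me"], "me", True)       # our own address is present
assert should_drop(["a"], "me", False)            # non-relay peer sent a path
assert not should_drop(["a", "b"], "me", True)    # valid path is forwarded
```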
-
-A peer may transition between being a relaying peer and a non-relaying peer by
-closing a connection and re-establishing it with a new handshake. A peer that
-violates the protocol by advertising its `SERVICE_RELAY` bit and not
-updating the `relayers` vector should be blacklisted by downstream
-peers.
-
-A peer must not forward messages with invalid `relayers` vectors. In
-particular, if a peer detects that its address (specifically, its public key
-hash) is present in the `relayers` vector, or if the vector contains a cycle,
-then the message _must_ be dropped. In addition, a peer that receives a message
-from an upstream peer without the `SERVICE_RELAY` bit set that includes a
-`relayers` vector _must_ drop the message.
-
-**Promoting Route Diversity**
-
-The peer network employs two heuristics to help prevent choke points from
-arising:
-
-* Considering the AS-degree: the graph walk algorithm will consider a peer's
-  connectivity to different _autonomous systems_
-  (ASs) when considering adding it to the neighbor set.
-
-* Sending data in rarest-AS-first order: the relay algorithm will probabilistically
-  rank its neighbors in order by how rare their AS is in the fresh frontier set.
-
-When building up its K neighbors, a peer has the opportunity to select neighbors
-based on how popular their ASs are. To do this, the peer crawls N > K neighbors, and then
-randomly disconnects from N - K of them. The probability that a peer will
-be removed is proportional to (1) how popular its AS
-is in the N neighbors, and (2) how unhealthy it is out of the neighbors in the
-same AS. The peer will first select an AS to prune, and then select a neighbor
-within that AS. This helps ensure that a relayed message is likely to be
-forwarded to many different ASs quickly.
-
-To forward messages to as many different ASs as possible, the peer will
-probabilistically prioritize neighbors to receive a forwarded message based on how _rare_
-their AS is in the fresh frontier set. 
This forwarding heuristic is -meant to ensure that a message quickly reaches many different networks in the -Internet. - -The rarest-AS-first heuristic is implemented as follows: - -1. The peer builds a table `N[AS]` that maps its fresh frontier set's ASs to the list of peers - contained within. `len(N[AS])` is the number of fresh frontier peers in `AS`, and - `sum(len(N[AS]))` for all `AS` is `K`. -2. The peer assigns each neighbor a probability of being selected to receive the - message next. The probability depends on `len(N[AS])`, where `AS` is the - autonomous system ID the peer resides in. The probability that a peer is - selected to receive the message is proportional to `1 - (len(N[AS]) + 1) / K`. -3. The peer selects a neighbor according to the distribution, forwards the message to it, and - removes the neighbor from consideration for this message. The peer repeats step 2 until all neighbors have - been sent the message. - -A full empirical evaluation on the effectiveness of these heuristics at encouraging -route diversity will be carried out before this SIP is accepted. - -**Proposal for Miner-Assisted Peer Discovery** - -Stacks miners are already incentivized to maintain good connectivity with one -another and with the peer network in order to ensure that they work on the -canonical fork. As such, a correct miner may, in the future, help the -control-plane network remain connected by broadcasting the root of a Merkle tree of a set -of "reputable" peers that are known by the miner to be well-connected, e.g. by -writing it to its block's coinbase payload. Other -peers in the peer network would include these reputable nodes in their frontiers -by default. - -A peer ultimately makes its own decisions on who its neighbors are, but by -default, a peer selects a miner-recommended peer only if over 75% of the mining power recommends -the peer for a long-ish interval (on the order of weeks). 
The 75% threshold
-follows from selfish mining -- the Stacks blockchain prevents selfish mining as
-long as at least 75% of the hash power is honest. If over 75% of the mining
-power recommends a peer, then the peer has been recommended through an honest
-process and may be presumed "safe" to include in the frontier set.
-
-A recommended peer would not be evicted from the frontier set unless it could
-not be contacted, or unless overridden by a local configuration option.
-
-### Forwarding Data
-
-The Stacks peer network propagates blocks, microblocks, and
-transactions by flooding them. In particular, a peer can send other peers
-an unsolicited `BlocksAvailable`, `MicroblocksAvailable`, `BlocksData`, `MicroblocksData`,
-or `Transaction` message.
-
-If the message has not been seen before by the peer, and the data is valid, then the peer
-forwards it to a subset of its neighbors (excluding the one that sent the data).
-If it has seen the data before, it does not forward
-it. The process for determining whether or not a block or transaction is valid
-is discussed in SIP-005. However, at a high level, the following
-policies hold:
-
-* A `StacksBlock` can only be valid if it corresponds to a block-commit
-  transaction on the burn chain that won sortition. A peer may cache a
-  `StacksBlock` if it determines that it has not yet processed the sortition that
-  makes it valid. A `StacksBlock` is never forwarded by the recipient;
-  instead, the recipient peer sends a `BlocksAvailable` message to its neighbors.
-* A `StacksMicroblock` can only be valid if it corresponds to a valid
-  `StacksBlock` or a previously-accepted `StacksMicroblock`. A peer may cache a
-  `StacksMicroblock` if it determines that a yet-to-arrive `StacksBlock` or
-  `StacksMicroblock` could make it valid in the near-term, but if the
-  `StacksMicroblock`'s parent `StacksBlock` is unknown, the
-  `StacksMicroblock` will _not_ be forwarded. 
-
-* A `Transaction` can only be valid if it encodes a legal state transition on
-  top of the peer's currently-known canonical Stacks blockchain tip.
-  A peer will _neither_ cache _nor_ relay a `Transaction` message if it cannot
-  determine that it is valid.
-
-#### Client Peers
-
-Messages can be forwarded both to outbound connections to other neighbors and to inbound
-connections from clients -- i.e. remote peers that have this peer as a next-hop
-neighbor. Per the above text, outbound neighbors are selected as the
-next message recipients based on how rare their AS is in the frontier.
-
-Inbound peers are handled separately. In particular, a peer does not crawl
-remote inbound connections, nor does it synchronize their peers' block inventories.
-Inbound peers tend to be un-routable peers, such as those running behind NATs on
-private, home networks. However, such peers can still send
-unsolicited blocks, microblocks, and transactions to publicly-routable
-peers, and those publicly-routable peers will need to forward them both to their
-outbound neighbors and to their own inbound peers. To do the latter,
-a peer will selectively forward data to its inbound peers in a way that is
-expected to minimize the number of _duplicate_ messages the other peers will
-receive.
-
-To do this, each peer uses the `relayers` vector in each message
-it receives from an inbound peer to keep track of which peers have forwarded
-the same messages. It will then choose inbound peers to receive a forwarded message
-based on how _infrequently_ the inbound recipient has sent duplicate messages.
-
-The intuition is that if an inbound peer forwards many messages
-that this peer has already seen, then it is likely that the inbound peer is also
-connected to an (unknown) peer that is already able to forward it data. 
-
-That is, if peer B has an inbound connection to peer A, and
-peer A observes that peer B sends it messages that it has already seen recently,
-then peer A can infer that there exists an unknown peer C that is forwarding
-messages to peer B before peer A can do so. Therefore, when selecting inbound
-peers to receive a message, peer A can de-prioritize peer B based on the
-expectation that peer B will be serviced by unknown peer C.
-
-To make these deductions, each peer maintains a short-lived (i.e. 10-minute)
-set of recently-seen message digests, as well as the list of which peers have sent
-each message. Then, when selecting inbound peers to receive a message, the peer
-calculates for each inbound peer a "duplicate rank" equal to the number of times
-it sent an already-seen message. The peer then samples the inbound peers
-proportionally to `1 - duplicate_rank / num_messages_seen`.
-
-It is predicted that there will be more NAT'ed peers than public peers.
-Therefore, when forwarding a new message, a peer will select more inbound peers
-(e.g. 16) than outbound peers (e.g. 8).
-
-## Reference Implementation
-
-Implemented in Rust. The neighbor set size K is set to 16. The frontier set size
-is set to hold 2^24 peers (with evictions becoming likely after insertions once
-it has 32768 entries).
-
-[1] See https://arxiv.org/abs/1204.4140 for details on the MHRWDA algorithm.
-[2] See https://stuff.mit.edu/people/medard/rls.pdf
+This SIP is now located in the [stacksgov/sips repository](https://github.com/stacksgov/sips/blob/main/sips/sip-003/sip-003-peer-network.md) as part of the [Stacks Community Governance organization](https://github.com/stacksgov). 
diff --git a/sip/sip-004-materialized-view.md b/sip/sip-004-materialized-view.md
index d0cd86f71..def065a17 100644
--- a/sip/sip-004-materialized-view.md
+++ b/sip/sip-004-materialized-view.md
@@ -1,750 +1,5 @@
-# SIP 004 Cryptographic Commitment to Materialized Views
+# SIP-004 Cryptographic Commitment to Materialized Views
-## Preamble
+This document formerly contained SIP-004 before the Stacks 2.0 mainnet launched.
-Title: Cryptographic Commitment to Materialized Views
-
-Author: Jude Nelson
-
-Status: Draft
-
-Type: Standard
-
-Created: 7/15/2019
-
-License: BSD 2-Clause
-
-## Abstract
-
-Blockchain peers are replicated state machines, and as such, must maintain a
-materialized view of all of the state the transaction log represents in order to
-validate a subsequent transaction. The Stacks blockchain in particular not only
-maintains a materialized view of the state of every fork, but also requires
-miners to cryptographically commit to that view whenever they mine a block.
-This document describes a **Merklized Adaptive Radix Forest** (MARF), an
-authenticated index data structure for efficiently encoding a
-cryptographic commitment to blockchain state.
-
-The MARF's structure is part of the consensus logic in the Stacks blockchain --
-every Stacks peer must process the MARF the same way. Stacks miners announce
-a cryptographic hash of their chain tip's MARF in the blocks they produce, and in
-doing so, demonstrate to each peer and each light client that they have
-applied the block's transactions to the peer's state correctly.
-
-The MARF represents blockchain state as an authenticated directory. State is
-represented as key/value pairs. The MARF structure gives a peer the ability to
-prove to a light client that a particular key has a particular value, given the
-MARF's cryptographic hash. The proof has _O(log B)_ space for _B_ blocks, and
-takes _O(log B)_ time complexity to produce and verify. 
In addition, it offers
-_O(1)_ expected time and space complexity for inserts and queries.
-The MARF proof allows a light client to determine:
-
-* What the value of a particular key is,
-* How much cumulative energy has been spent to produce the key/value pair,
-* How many confirmations the key/value pair has.
-
-## Rationale
-
-In order to generate a valid transaction, a blockchain client needs to be able
-to query the current state of the blockchain. For example, in Bitcoin, a client
-needs to query its unspent transaction outputs (UTXOs) in order to satisfy their
-spending conditions in a new transaction. As another example, in Ethereum, a
-client needs to query its accounts' current nonces in order to generate a valid
-transaction to spend their tokens.
-
-Whether or not a blockchain's peers are required to commit to the current state
-in the blocks themselves (i.e. as part of the consensus logic) is a
-philosophical decision. We argue that it is highly desirable in Blockstack's
-case, since it affords light clients more security when querying the blockchain
-state than they would otherwise have. This is because a client often queries
-state that was last updated several blocks in the past (i.e. that is
-"confirmed"). If a blockchain peer can prove to
-a client that a particular key in the state has a particular value, and was last
-updated a certain number of blocks in the past, then the client can determine
-whether or not to trust the peer's proof based on factors beyond simply trusting
-the remote peer to be honest. In particular, the client can determine how
-difficult it would be to generate a dishonest proof, in terms of the number of
-blocks that would need to be maliciously crafted and accepted by the network.
-This offers clients some protection against peers that would lie to them -- a
-lying peer would need to spend a large amount of energy (and money) in order to
-do so. 
- -Specific to Blockstack, we envision that many applications will run -their own Stacks-based blockchain peer networks that operate "on top" of the -Stacks blockchain through proof-of-burn. This means that the Blockstack -application ecosystem will have many parallel "app chains" that users may wish -to interact with. While a cautious power user may run validator nodes for each -app chain they are interested in, we expect that most users will not do so, -especially if they are just trying out the application or are casual users. In -order to afford these users better security than simply telling them to find a -trusted validating peer, it is essential that each Stacks peer commits to its -materialized view in each block. - -On top of providing better security to light clients, committing to the materialized -state view in each block has the additional benefit of helping the peer network -detect malfunctioning miners early on. A malfunctioning miner will calculate a -different materialized view using the same transactions, and with overwhelmingly -high probability, will also calculate a different state view hash. This makes -it easy for a blockchain's peers to reject a block produced in this manner -outright, without having to replay its transactions. - -### Design Considerations - -Committing to the materialized view in each block has a non-zero cost in terms -of time and space complexity. Given that Stacks miners use PoW to increase -their chances of winning a block race, the time required to calculate -the materialized view necessarily cuts into the time -required to solve the PoW puzzle -- it is part of the block validation logic. -While this is a cost borne by each miner, the fact that PoW mining is a zero-sum game -means that miners that are able to calculate the materialized view the fastest will have a -better chance of winning a block race than those who do not. 
This means that it -is of paramount importance to keep the materialized view digest calculation as -fast as possible, just as it is of paramount importance to make block -validation as fast and cheap as possible. - -The following considerations have a non-trivial impact on the design of the -MARF: - -**A transaction can read or write any prior state in the same fork.** This -means that the index must support fast random-access reads and fast -random writes. - -**The Stacks blockchain can fork, and a miner can produce a fork at any block -height in the past.** As argued in SIP 001, a Stacks blockchain peer must process -all forks and keep their blocks around. This also means that a peer needs to -calculate and validate the materialized view of each fork, no matter where it -occurs. This is also necessary because a client may request a proof for some -state in any fork -- in order to service such requests, the peer must calculate -the materialized view for all forks. - -**Forks can occur in any order, and blocks can arrive in any order.** As such, -the runtime cost of calculating the materialized view must be _independent_ of the -order in which forks are produced, as well as the order in which their blocks -arrive. This is required in order to avoid denial-of-service vulnerabilities, -whereby a network attacker can control the schedules of both -forks and block arrivals in a bid to force each peer to expend resources -validating the fork. It must be impossible for an attacker to -significantly slow down the peer network by maliciously varying either schedule. -This has non-trivial consequences for the design of the data structures for -encoding materialized views. - -## Specification - -The Stacks peer's materialized view is realized as a flat key/value store. -Transactions encode zero or more creates, inserts, updates, and deletes on this -key/value store. 
As a consequence of needing to support forks from any prior block,
-no data is ever removed; instead, a "delete" on a particular key is encoded
-by replacing the value with a tombstone record. The materialized view is the
-subset of key/value pairs that belong to a particular fork in the blockchain.
-
-The Stacks blockchain separates the concern of maintaining an _authenticated
-index_ over data from storing a copy of the data itself. The blockchain peers
-commit to the digest of the authenticated index, but can store the data however
-they want. The authenticated index is realized as a _Merklized Adaptive Radix
-Forest_ (MARF). The MARF gives Stacks peers the ability to prove that a
-particular key in the materialized view maps to a particular value in a
-particular fork.
-
-A MARF has two principal data structures: a _merklized adaptive radix trie_
-for each block and a _merklized skip-list_ that
-cryptographically links merklized adaptive radix tries in prior blocks to the
-current block.
-
-### Merklized Adaptive Radix Tries (ARTs)
-
-An _adaptive radix trie_ (ART) is a prefix tree where each node's branching
-factor varies with the number of children. In particular, a node's branching
-factor increases according to a schedule (0, 4, 16, 48, 256) as more and more
-children are added. This behavior, combined with the usual sparse trie
-optimizations of _lazy expansion_ and _path compression_, produces a tree-like
-index over a set of key/value pairs that is _shallower_ than a perfectly-balanced
-binary search tree over the same values. Details on the analysis of ARTs can
-be found in [1].
-
-To produce an _index_ over new state introduced in this block, the Stacks peer
-will produce an adaptive radix trie that describes each key/value pair modified.
-
In particular, for each key affected by the block, the Stacks peer will:
-* Calculate the hash of the key to get a fixed-length trie path,
-* Store the new value and this hash into its data store,
-* Insert or update the associated value hash in the block's ART at the trie path,
-* Calculate the new Merkle root of the ART by hashing all modified intermediate
-  nodes along the path.
-
-In doing so, the Stacks peer produces an authenticated index for all key/value
-pairs affected by a block. The leaves of the ART are the hashes of the values,
-and the hashes produced in each intermediate node and root give the peer a
-way to cryptographically prove that a particular value is present in the ART
-(given the root hash and the key).
-
-The Stacks blockchain employs _path compression_ and _lazy expansion_
-to efficiently represent all key/value pairs while minimizing the number of trie
-nodes. That is, if two children share a common prefix, the prefix bytes are
-stored in a single intermediate node instead of being spread across multiple
-intermediate nodes (path compression). In the special case where a path suffix
-uniquely identifies the leaf, the path suffix will be stored alongside the leaf
-instead of as a sequence of intermediate nodes (lazy expansion). As more and more
-key/value pairs are inserted, intermediate nodes and leaves with multi-byte
-paths will be split into more nodes.
-
-**Trie Structure**
-
-A trie is made up of nodes with radix 4, 16, 48, or 256, as well as leaves. In
-the documentation below, these are called `node4`, `node16`, `node48`,
-`node256`, and `leaf` nodes. An empty trie has a single `node256` as its root.
-Child pointers occupy one byte.
-
-**Notation**
-
-The notation `(ab)node256` means "a `node256` who descends from its parent via
-byte 0xab".
-
-The notation `node256[path=abcd]` means "a `node256` that has a shared prefix
-with its children `abcd`".
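The per-key indexing steps listed above can be sketched end-to-end. This is a toy stand-in (hypothetical class and helper names; a flat dict in place of a real ART, and SHA-256 in place of the SHA2-512/256 the consensus rules specify), meant only to show the separation between the raw data store and the index of value hashes:

```python
import hashlib

def key_path(key: bytes) -> bytes:
    # The spec hashes each key to a fixed-length trie path;
    # sha256 stands in here for the SHA2-512/256 used by consensus.
    return hashlib.sha256(key).digest()

class ToyBlockIndex:
    """Toy stand-in for one block's merklized ART (hypothetical)."""
    def __init__(self):
        self.data_store = {}  # hash(key) -> value (off-trie storage)
        self.trie = {}        # path -> hash(value) (flat dict, not a real ART)

    def insert(self, key: bytes, value: bytes) -> bytes:
        path = key_path(key)                              # 1. key -> fixed-length path
        self.data_store[path] = value                     # 2. store the value itself
        self.trie[path] = hashlib.sha256(value).digest()  # 3. index the value hash
        return self.root_hash()                           # 4. recompute the root digest

    def root_hash(self) -> bytes:
        # Stand-in for re-hashing the modified nodes along the insertion path.
        acc = hashlib.sha256()
        for path in sorted(self.trie):
            acc.update(path + self.trie[path])
        return acc.digest()

idx = ToyBlockIndex()
r1 = idx.insert(b"balance/alice", b"100")
r2 = idx.insert(b"balance/bob", b"250")
assert r1 != r2  # any state change yields a new digest
```

The point of the sketch is the division of labor: the peer commits only to the digest of the index, while values themselves live in an ordinary data store keyed by hash.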
-
-**Lazy Expansion**
-
-If a leaf has a non-zero-byte path suffix, and another leaf is inserted that
-shares part of the suffix, the common bytes will be split off of the existing
-leaf to form a `node4`, whose two immediate children are the two leaves. Each
-of the two leaves will store the path bytes that are unique to them. For
-example, consider this trie with a root `node256` and a single leaf, located at
-path `aabbccddeeff00112233` and having value hash `123456`:
-
-```
-node256
-   \
-   (aa)leaf[path=bbccddeeff00112233]=123456
-```
-
-If the peer inserts the value hash `98765` at path `aabbccddeeff99887766`, the
-single leaf's path will be split into a shared prefix and two distinct suffixes,
-as follows:
-
-```
-insert (aabbccddeeff99887766, 98765)
-
-node256     (00)leaf[path=112233]=123456
-   \        /
-   (aa)node4[path=bbccddeeff]
-            \
-            (99)leaf[path=887766]=98765
-```
-
-Now, the trie encodes both `aabbccddeeff00112233=123456` and
-`aabbccddeeff99887766=98765`.
-
-**Node Promotion**
-
-As a node with a small radix gains children, it will eventually need to be
-promoted to a node with a higher radix. A `node4` will become a `node16` when
-it receives its 5th child; a `node16` will become a `node48` when it receives
-its 17th child, and a `node48` will become a `node256` when it receives its 49th
-child. A `node256` will never need to be promoted, because it has slots for
-child pointers with all possible byte values.
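The promotion schedule can be expressed as a small helper; `node_radix` is a hypothetical name for illustration, not part of the implementation:

```python
def node_radix(num_children: int) -> int:
    """Smallest node type (by radix) that can hold the given number of
    children, following the (4, 16, 48, 256) schedule."""
    for radix in (4, 16, 48, 256):
        if num_children <= radix:
            return radix
    raise ValueError("a node can have at most 256 children")

assert node_radix(4) == 4     # still fits in a node4
assert node_radix(5) == 16    # 5th child promotes node4 -> node16
assert node_radix(17) == 48   # 17th child promotes node16 -> node48
assert node_radix(49) == 256  # 49th child promotes node48 -> node256
```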
-
-For example, consider this trie with a `node4` and 4 children:
-
-```
-node256     (00)leaf[path=112233]=123456
-   \        /
-    \      /   (01)leaf[path=445566]=67890
-     \    /   /
-     (aa)node4[path=bbccddeeff]---
-          \       \
-           \      (02)leaf[path=778899]=abcdef
-            \
-            (99)leaf[path=887766]=98765
-```
-
-This trie encodes the following:
-  * `aabbccddeeff00112233=123456`
-  * `aabbccddeeff01445566=67890`
-  * `aabbccddeeff02778899=abcdef`
-  * `aabbccddeeff99887766=98765`
-
-Inserting one more node with a prefix `aabbccddeeff` will promote the
-intermediate `node4` to a `node16`:
-
-```
-insert (aabbccddeeff03aabbcc, 314159)
-
-node256     (00)leaf[path=112233]=123456
-   \        /
-    \      /   (01)leaf[path=445566]=67890
-     \    /   /
-     (aa)node16[path=bbccddeeff]-----(03)leaf[path=aabbcc]=314159
-          \       \
-           \      (02)leaf[path=778899]=abcdef
-            \
-            (99)leaf[path=887766]=98765
-```
-
-The trie now encodes the following:
-  * `aabbccddeeff00112233=123456`
-  * `aabbccddeeff01445566=67890`
-  * `aabbccddeeff03aabbcc=314159`
-  * `aabbccddeeff02778899=abcdef`
-  * `aabbccddeeff99887766=98765`
-
-**Path Compression**
-
-Intermediate nodes, such as the `node16` in the previous example, store path
-prefixes shared by all of their children. If a node is inserted that shares
-some of this prefix, but not all of it, the path is "decompressed" -- a new
-leaf is "spliced" into the compressed path, and attached to a `node4` whose two
-children are the leaf and the existing node (i.e. the `node16` in this case)
-whose shared path now contains the suffix unique to its children, but distinct
-from the newly-spliced leaf.
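The decompression-and-splice operation can be sketched on a toy compressed trie. This is a simplification for illustration only (hex strings instead of bytes, one character consumed per edge as in the `(aa)node[path=...]` notation, no hashing and no radix limits), not the actual MARF code:

```python
def common_prefix(a: str, b: str) -> str:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]

class Node:
    """Toy compressed-trie node: a stored path prefix plus children
    keyed by a single edge character (hypothetical structure)."""
    def __init__(self, path="", value=None):
        self.path = path
        self.value = value
        self.children = {}

def insert(node, path, value):
    shared = common_prefix(node.path, path)
    if len(shared) < len(node.path):
        # Decompress: split this node's stored path and splice in a parent.
        child = Node(node.path[len(shared) + 1:], node.value)
        child.children = node.children
        node.children = {node.path[len(shared)]: child}
        node.path, node.value = shared, None
    rest = path[len(shared):]
    if not rest:
        node.value = value
    elif rest[0] in node.children:
        insert(node.children[rest[0]], rest[1:], value)
    else:
        node.children[rest[0]] = Node(rest[1:], value)  # lazy expansion

def lookup(node, path):
    shared = common_prefix(node.path, path)
    if shared != node.path:
        return None
    rest = path[len(shared):]
    if not rest:
        return node.value
    if rest[0] not in node.children:
        return None
    return lookup(node.children[rest[0]], rest[1:])

root = Node()
insert(root, "aabbccddeeff99887766", "98765")
insert(root, "aabbcc00112233445566", "21878")
# The second insert splits the stored prefix at the shared bytes.
assert root.children["a"].path == "abbcc"
assert lookup(root, "aabbccddeeff99887766") == "98765"
assert lookup(root, "aabbcc00112233445566") == "21878"
```

Note how the second insert turns the long stored path into a short shared prefix with two children, mirroring the splice described above.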
-
-For example, consider this trie with the intermediate `node16` sharing a path
-prefix `bbccddeeff` with its 5 children:
-
-```
-node256     (00)leaf[path=112233]=123456
-   \        /
-    \      /   (01)leaf[path=445566]=67890
-     \    /   /
-     (aa)node16[path=bbccddeeff]-----(03)leaf[path=aabbcc]=314159
-          \       \
-           \      (02)leaf[path=778899]=abcdef
-            \
-            (99)leaf[path=887766]=98765
-```
-
-This trie encodes the following:
-  * `aabbccddeeff00112233=123456`
-  * `aabbccddeeff01445566=67890`
-  * `aabbccddeeff03aabbcc=314159`
-  * `aabbccddeeff02778899=abcdef`
-  * `aabbccddeeff99887766=98765`
-
-If we inserted `(aabbcc00112233445566, 21878)`, the `node16`'s path would be
-decompressed to `eeff`, a leaf with the distinct suffix `112233445566` would be spliced
-in via a `node4`, and the `node4` would have the shared path prefix `bbcc` with
-its now-child `node16` and leaf.
-
-```
-insert (aabbcc00112233445566, 21878)
-
-                  (00)leaf[path=112233445566]=21878
-                 /
-node256         /          (00)leaf[path=112233]=123456
-   \           /           /
-   (aa)node4[path=bbcc]   /   (01)leaf[path=445566]=67890
-        \                /   /
-        (dd)node16[path=eeff]-----(03)leaf[path=aabbcc]=314159
-             \       \
-              \      (02)leaf[path=778899]=abcdef
-               \
-               (99)leaf[path=887766]=98765
-```
-
-The resulting trie now encodes the following:
-  * `aabbcc00112233445566=21878`
-  * `aabbccddeeff00112233=123456`
-  * `aabbccddeeff01445566=67890`
-  * `aabbccddeeff03aabbcc=314159`
-  * `aabbccddeeff02778899=abcdef`
-  * `aabbccddeeff99887766=98765`
-
-### Back-pointers
-
-The materialized view of a fork will hold key/value pairs for data produced by
-applying _all transactions_ in that fork, not just the ones in the last block. As such,
-the index over all key/value pairs in a fork is encoded in the sequence of
-its blocks' merklized ARTs.
-
-To ensure that random reads and writes on a fork's materialized view remain
-fast no matter which block added them, a child pointer in an ART can point to
-either a node in the same ART, or a node with the same path in a prior ART.
For
-example, if the ART at block _N_ has a `node16` whose path is `aabbccddeeff`, and 10
-blocks ago a leaf was inserted at path `aabbccddeeff99887766`, it will
-contain a child pointer to the intermediate node from 10 blocks ago whose path is
-`aabbccddeeff` and who has a child node in slot `0x99`. This information is encoded
-as a _back-pointer_. To see it visually:
-
-```
-At block N
-
-node256     (00)leaf[path=112233]=123456
-   \        /
-    \      /   (01)leaf[path=445566]=67890
-     \    /   /
-     (aa)node16[path=bbccddeeff]-----(03)leaf[path=aabbcc]=314159
-          \       \
-           \      (02)leaf[path=778899]=abcdef
-            \
-                                        |
-                                        |
-                                        |
-At block N-10 - - - - - - - - - - - - - | - - - - - - - - - - -
-                                        |
-node256                                 |   /* back-pointer to block N-10 */
-   \                                    |
-    \                                   |
-     \                                  |
-     (aa)node4[path=bbccddeeff]         |
-              \                         |
-               \                        |
-                \                       |
-                (99)leaf[path=887766]=98765
-```
-
-By maintaining trie child pointers this way, the act of looking up a path to a value in
-a previous block is a matter of following back-pointers to previous tries.
-This back-pointer uses the _block hash_ of the previous block to uniquely identify
-the block. In order to keep the in-memory and on-disk representations of trie nodes succinct,
-the MARF structure uses a locally-defined unsigned 32-bit integer to identify the previous
-block, along with a local mapping of such integers to the respective block header hash.
-
-Back-pointers are calculated in a copy-on-write fashion when calculating the ART
-for the next block. When the root node for the ART at block N+1 is created, all
-of its children are set to back-pointers that point to the immediate children of
-the root of block N's ART. Then, when inserting a key/value pair, the peer
-walks the current ART to the insertion point, but whenever a
-back-pointer is encountered, it copies the node it points to into the current
-ART, and sets all of its non-empty child pointers to back-pointers. The peer
-then continues traversing the ART until the insertion point is found (i.e.
a -node has an unallocated child pointer where the leaf should go), copying -over intermediate nodes lazily. - -For example, consider the act of inserting `aabbccddeeff00112233=123456` into an -ART where a previous ART contains the key/value pair -`aabbccddeeff99887766=98765`: - -``` -At block N - - -node256 (00)leaf[path=112233]=123456 -^ \ / -| \ / -| \ / -| (aa)node4[path=bbccddeeff] -| ^ \ -| | \ -| /* 1. @root. */ | /* 2. @node4. */ \ /* 3. 00 is empty, so insert */ -| /* copy up, &*/ | /* copy up, & */ | -| /* make back-*/ | /* make back- */ | -| /* ptr to aa */ | /* ptr to 99 */ | -| | | -|- At block N-10 -|- - - - - - - - - - | - - - - - - - - - - - - - - - - - - -| | | -node256 | | - \ | | - \ | | - \ | | - (aa)node4[path=bbccddeeff] | - \ | - \ | - \| - (99)leaf[path=887766]=98765 -``` - -In step 1, the `node256` in block _N_ would have a back-pointer to the `node4` in -block _N - 10_ in child slot `0xaa`. While walking path `aabbccddeeff00112233`, -the peer would follow slot `0xaa` to the `node4` in block _N - 10_ and copy it -into block _N_, and would set its child pointer at `0x99` to be a back-pointer -to the `leaf` in block _N - 10_. It would then step to the `node4` it copied, -and walk path bytes `bbccddeeff`. When it reaches child slot `0x00`, the peer -sees that it is unallocated, and attaches the leaf with the unexpanded path -suffix `112233`. The back-pointer to `aabbccddeeff99887766=98765` is thus -preserved in block _N_'s ART. - -**Calculating the Root Hash with Back-pointers** - -For reasons that will be explained in a moment, the hash of a child node that is a -back-pointer is not calculated the usual way when calculating the root hash of -the Merklized ART. Instead of taking the hash of the child node (as would be -done for a child in the same ART), the hash of the _block header_ is used -instead. 
In the above example, the hash of the `leaf` node whose path is -`aabbccddeeff99887766` would be the hash of block _N - 10_'s header, whereas the -hash of the `leaf` node whose path is `aabbccddeeff00112233` would be the hash -of the value hash `123456`. - -The main reason for doing this is to keep block validation time down by a -significant constant factor. The block header hash is always kept in RAM, -but at least one disk seek is required to read the hash of a child in a separate -ART (and it often takes more than one seek). This does not sacrifice the security -of a Merkle proof of `aabbccddeeff99887766=98765`, but it does alter the mechanics -of calculating and verifying it. - -### Merklized Skip-list - -The second principal data structure in a MARF is a Merklized skip-list encoded -from the block header hashes and ART root hashes in each block. The hash of the -root node in the ART for block _N_ is derived not only from the hash of the -root's children, but also from the hashes of the block headers from blocks -`N - 1`, `N - 2`, `N - 4`, `N - 8`, `N - 16`, and so on. This constitutes -a _Merklized skip-list_ over the sequence of ARTs. - -The reason for encoding the root node's hash this way is to make it possible for -peers to create a cryptographic proof that a particular key maps to a particular -value when the value lives in a prior block, and can only be accessed by -following one or more back-pointers. In addition, the Merkle skip-list affords -a client _two_ ways to verify key-value pairs: the client only needs either (1) -a known-good root hash, or (2) the sequence of block headers for the Stacks -chain and its underlying burn chain. Having (2) allows the client to determine -(1), but calculating (2) is expensive for a client doing a small number of -queries. For this reason, both options are supported. 
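The skip-list column heights can be sketched as follows; `skiplist_ancestor_heights` is a hypothetical helper name illustrating which ancestor headers feed block _N_'s ART root hash:

```python
def skiplist_ancestor_heights(n: int) -> list:
    """Heights whose header hashes feed block n's ART root hash:
    n-1, n-2, n-4, n-8, ... (at most 32 ancestors for a 32-bit height)."""
    heights = []
    step = 1
    while step <= n:
        heights.append(n - step)
        step *= 2
    return heights

assert skiplist_ancestor_heights(10) == [9, 8, 6, 2]
```

Because the hops double each time, the number of ancestors hashed into any root is logarithmic in the chain length, which is where the skip-list's _O(log B)_ proof costs come from.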
- -#### Resolving Block Height Queries - -For a variety of reasons, the MARF structure must be able to resolve -queries mapping from block heights (or relative block heights) to -block header hashes and vice-versa --- for example, the Clarity VM -allows contracts to inspect this information. Most applicable to the -MARF, though, is that in order to find the ancestor hashes to include -in the Merklized Skip-list, the data structure must be able to find -the block headers which are 1, 2, 4, 8, 16, ... blocks prior in the -same fork. This could be discovered by walking backwards from the -current block, using the previous block header to step back through -the fork's history. However, such a process would require _O(N)_ steps -(where _N_ is the current block height). But, if a mapping exists for -discovering the block at a given block height, this process would instead -be _O(1)_ (because a node will have at most 32 such ancestors). - -But correctly implementing such a mapping is not trivial: a given -height could resolve to different blocks in different forks. However, -the MARF itself is designed to handle exactly these kinds of -queries. As such, at the beginning of each new block, the MARF inserts -into the block's trie two entries: - -1. This block's block header hash -> this block's height. -2. This block's height -> this block's block header hash. - -This mapping allows the ancestor hash calculation to proceed. - -### MARF Merkle Proofs - -A Merkle proof for a MARF is constructed using a combination of two types of -sub-proofs: _segment proofs_, and _shunt proofs_. A _segment proof_ is a proof -that a node belongs to a particular Merklized ART. It is simply a Merkle tree -proof. A _shunt proof_ is a proof that the ART for block _N_ is exactly _K_ -blocks away from the ART at block _N - K_. It is generated as a Merkle proof -from the Merkle skip-list. 
- -Calculating a MARF Merkle proof is done by first calculating a segment proof for a -sequence of path prefixes, such that all the nodes in a single prefix are in the -same ART. To do so, the node walks from the current block's ART's root node -down to the leaf in question, and each time it encounters a back-pointer, it -generates a segment proof from the _currently-visited_ ART to the intermediate -node whose child is the back-pointer to follow. If a path contains _i_ -back-pointers, then there will be _i+1_ segment proofs. - -Once the peer has calculated each segment proof, it calculates a shunt proof -that shows that the _i+1_th segment was reached by walking back a given number -of blocks from the _i_th segment by following the _i_th segment's back-pointer. -The final shunt proof for the ART that contains the leaf node includes all of -the prior block header hashes that went into producing its root node's hash. -Each shunt proof is a sequence of sequences of block header hashes and ART root -hashes, such that the hash of the next ART root node can be calculated from the -previous sequence. - -For example, consider the following ARTs: - -``` -At block N - - -node256 (00)leaf[path=112233]=123456 - \ / - \ / (01)leaf[path=445566]=67890 - \ / / - (aa)node16[path=bbccddeeff]-----(03)leaf[path=aabbcc]=314159 - \ \ - \ (02)leaf[path=778899]=abcdef - \ - | - | - | -At block N-10 - - - - - - - - - - - - - | - - - - - - - - - - - - - - - - - - - - | -node256 | /* back-pointer to N - 10 */ - \ | - \ | - \ | - (aa)node4[path=bbccddeeff] | - \ | - \ | - \ | - (99)leaf[path=887766]=98765 -``` - -To generate a MARF Merkle proof, the client queries a Stacks peer for a -particular value hash, and then requests the peer generate a proof that the key -and value must have been included in the calculation of the current block's ART -root hash (i.e. the digest of the materialized view of this fork). 
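A sketch of the proof's shape, under the assumption (this sketch's, not necessarily the implementation's) that the walk-back greedily follows the largest skip-list column first, which reproduces the _N_ to _N - 8_ to _N - 10_ walk used in this section's example:

```python
def walk_back_steps(delta: int) -> list:
    """Decompose a walk of `delta` blocks into skip-list hops,
    largest power-of-two column first (e.g. 10 -> [8, 2])."""
    steps = []
    while delta > 0:
        hop = 1 << (delta.bit_length() - 1)  # largest power of two <= delta
        steps.append(hop)
        delta -= hop
    return steps

# Walking back 10 blocks takes two hops: N -> N-8 -> N-10.
assert walk_back_steps(10) == [8, 2]

def num_segment_proofs(num_backptrs: int) -> int:
    """A path that follows i back-pointers yields i + 1 segment proofs."""
    return num_backptrs + 1

assert num_segment_proofs(1) == 2  # the two-segment example in this section
```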
-
-For example, given the key/value pair `aabbccddeeff99887766=98765` and the hash
-of the ART at block _N_, the peer would generate two segment proofs for the
-following paths: `aabbccddeeff` in block _N_, and `aabbccddeeff99887766` in
-block `N - 10`.
-
-```
-At block N
-
-node256
-   \      /* this segment proof would contain the hashes of all other */
-    \     /* children of the root, except for the one at 0xaa.        */
-     \
-     (aa)node16[path=bbccddeeff]
-
-At block N-10 - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
-node256   /* this segment proof would contain two sequences of hashes: */
-   \      /* the hashes for all children of the root besides 0xaa, and */
-    \     /* the hashes of all children of the node4, except 0x99.     */
-     \
-     (aa)node4[path=bbccddeeff]
-              \
-               \
-                \
-                (99)leaf[path=887766]=98765
-```
-
-Then, it would calculate two shunt proofs. The first proof, called the "head shunt proof,"
-supplies the sequence of block hashes for blocks _N - 11, N - 12, N - 14, N - 18, N - 26, ..._ and the
-hash of the children of the root node of the ART for block _N - 10_. This lets the
-client calculate the hash of the root of the ART at block _N - 10_. The second
-shunt proof (and all subsequent shunt proofs, if there are more back-pointers to
-follow) comprises the hashes that "went into" calculating the hashes on the
-skip-list from the next segment proof's root hash.
-
-In detail, the second shunt proof would have two parts:
-
-* the block header hashes for blocks _N - 9_, _N - 12_, _N - 16_, _N - 24_, ...
-* the block header hashes for _N - 1_, _N - 2_, _N - 4_, _N - 16_, _N - 32_, ...
-
-The reason there are two sequences in this shunt proof is because "walking back"
-from block _N_ to block _N - 10_ requires walking first to block _N - 8_ (i.e.
-following the skip-list column for 2 ** 3), and then walking to block _N - 10_
-from _N - 8_ (i.e. following its skip-list column for 2 ** 1). The first segment
-proof (i.e.
with the leaf) lets the client calculate the hash of the children of
-the ART root node in block _N - 10_, which when combined with the first part of
-this shunt proof yields the ART root hash for _N - 8_. Then, the client
-uses the hash of the children of the root node in the ART of block _N_ (calculated from the second segment
-proof), combined with the root hash from block _N - 8_ and with the hashes
-in the second piece of this shunt proof, to calculate the ART root hash for
-block _N_. The proof is valid if this calculated root hash matches the root
-hash for which it requested the proof.
-
-In order to fully verify the MARF Merkle proof, the client would verify that:
-
-* The first segment proof's path's bytes are equal to the hash of the key for
-  which the proof was requested.
-* The first segment proof ends in a leaf node, and the leaf node contains the
-  hash of the value for which the proof was requested.
-* Each segment proof is valid -- the root hash could only be calculated from the
-  deepest intermediate node in the segment.
-* Each subsequent segment proof was generated from a prefix of the path
-  represented by the current segment proof.
-* Each back-pointer at the tail of each segment (except the one that terminates
-  in the leaf -- i.e. the first one) was a number of blocks back that is equal
-  to the number of blocks skipped over in the shunt proof linking it to the next
-  segment.
-* Each block header was included in the fork the client is querying.
-* Each block header was generated from its associated ART root hash.
-* (Optional, but encouraged): The burn chain block headers demonstrate that the
-  correct difficulty rules were followed. This step can be skipped if the
-  client somehow already knows that the hash of block _N_ is valid.
-
-Note that to verify the proof, the client would need to substitute the
-_block header hash_ for each intermediate node at the tail of each segment
-proof.
The block header hash can either be obtained by fetching the block
-headers for both the Stacks chain and burn chain _a priori_ and verifying that
-they are valid, or by fetching them on-the-fly. The second strategy should only
-be used if the root hash the client submits to the peer is known out-of-band to
-be the correct hash.
-
-The security of the proof is similar to SPV proofs in Bitcoin -- the proof is
-valid assuming the client is able to either verify that the final header hash
-represents the true state of the network, or the client is able to fetch the
-true burn chain block header sequence. The client has some assurance that a
-_given_ header sequence is the _true_ header sequence, because the header
-sequence encodes the proof-of-work that went into producing it. A header
-sequence with a large amount of proof-of-work is assumed to be infeasible for an
-attacker to produce -- i.e. only the majority of the burn chain's network hash
-power could have produced the header chain. Regardless of which data the client
-has, the usual security assumptions about confirmation depth apply -- a proof
-that a key maps to a given value is valid only if the transaction that set
-it is unlikely to be reversed by a chain reorg.
-
-### Performance
-
-The time and space complexity of a MARF is as follows:
-
-* **Reads are _O(1)_.** While reads may traverse multiple tries, they are always
-  descending the radix trie, and resolving back-pointers is constant time.
-* **Inserts and updates are _O(1)_.** Inserts have the same complexity
-  as reads, though they require more work by constant factors (in
-  particular, hash recalculations).
-* **Creating a new block is _O(log B)_.** Inserting a block requires
-  including the Merkle skip-list hash in the root node of the new
-  ART. This is _log B_ work, where _B_ is chain length.
-* **Creating a new fork is _O(log B)_.** Forks do not incur any overhead relative
-  to appending a block to a prior chain-tip.
-
-* **Generating a proof is _O(log B)_ for _B_ blocks.** This is the cost of
-  reading a fixed number of nodes, combined with walking the Merkle skip-list.
-* **Verifying a proof is _O(log B)_.** This is the cost of verifying a fixed
-  number of fixed-length segments, and verifying a fixed number of _O(log B)_
-  shunt proof hashes.
-* **Proof size is _O(log B)_.** A proof has a fixed number of segment proofs,
-  where each node has a constant size. It has _O(log B)_ hashes across all of
-  its shunt proofs.
-
-### Consensus Details
-
-The hash function used to generate a path from a key, as well as the hash
-function used to generate a node hash, is SHA2-512/256. This was chosen because
-it is extremely fast on 64-bit architectures, and is immune to length extension
-attacks.
-
-The hash of an intermediate node is the hash over the following data:
-
-* a 1-byte node ID,
-* the sequence of child pointer data (dependent on the type of node),
-* the 1-byte length of the path prefix this node contains,
-* the 0-to-32-byte path prefix.
-
-A single child pointer contains:
-* a 1-byte node ID,
-* a 1-byte path character,
-* the 32-byte block header hash of the pointed-to block.
-
-A `node4`, `node16`, `node48`, and `node256` have arrays of 4,
-16, 48, and 256 child pointers, respectively.
-
-Children are listed in a `node4`, `node16`, and `node48`'s child pointer arrays in the
-order in which they are inserted. While searching for a child in a `node4` or
-`node16` requires a linear scan of the child pointer array, searching a `node48` is done
-by looking up the child's index in its child pointer array using the
-path character byte as an index into the `node48`'s 256-byte child pointer
-index, and then using _that_ index to look up the child pointer. Children are
-inserted into the child pointer array of a `node256` by using the 1-byte
-path character as the index.
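The `node48` lookup scheme described above can be sketched as follows. This is a toy Python stand-in; the sentinel marking an empty index slot is an assumption of this sketch, not a value taken from the consensus rules:

```python
EMPTY = 255  # sentinel for "no child" in the index (sketch's assumption)

class Node48:
    """Toy node48: a 256-entry index maps a path byte to a slot in an
    insertion-ordered child-pointer array, avoiding a linear scan."""
    def __init__(self):
        self.index = [EMPTY] * 256  # path byte -> slot in self.children
        self.children = []          # child pointers, in insertion order

    def insert(self, path_byte: int, child) -> None:
        self.index[path_byte] = len(self.children)
        self.children.append(child)

    def lookup(self, path_byte: int):
        slot = self.index[path_byte]
        return None if slot == EMPTY else self.children[slot]

n = Node48()
n.insert(0x99, "leaf[path=887766]")
assert n.lookup(0x99) == "leaf[path=887766]"
assert n.lookup(0x00) is None
```

The two-step lookup (path byte into the 256-byte index, then that slot into the child array) is what lets a `node48` keep its children densely packed while still answering lookups in constant time.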
-
-The disk pointer stored in a child pointer, as well as the storage mechanism for
-mapping hashes of values (leaves in the MARF) to the values themselves, are both
-unspecified by the consensus rules. Any mechanism or representation is
-permitted.
-
-## Implementation
-
-The implementation is in Rust, and is about 4,400 lines of code. It stores each
-ART in a separate file, where each ART file contains the previous
-block's ART root hash and the locally-defined block identifiers.
-
-The implementation is crash-consistent. It builds up the ART for block _N_ in
-RAM, dumps it to disk, and then `rename(2)`s it into place.
-
-The implementation uses a Sqlite3 database to map values to their hashes. A
-read on a given key will first pass through the ART to find hash(value), and
-then query the Sqlite3 database for the value. Similarly, a write will first
-insert hash(value) and value into the Sqlite3 database, and then map
-hash(key) to hash(value) in the MARF.
-
-## References
-
-[1] https://db.in.tum.de/~leis/papers/ART.pdf
+This SIP is now located in the [stacksgov/sips repository](https://github.com/stacksgov/sips/blob/main/sips/sip-004/sip-004-materialized-view.md) as part of the [Stacks Community Governance organization](https://github.com/stacksgov).
diff --git a/sip/sip-005-blocks-and-transactions.md b/sip/sip-005-blocks-and-transactions.md
index f82cbc137..eda0f3000 100644
--- a/sip/sip-005-blocks-and-transactions.md
+++ b/sip/sip-005-blocks-and-transactions.md
@@ -1,1396 +1,5 @@
-# SIP 005 Blocks, Transactions, and Accounts
+# SIP-005 Blocks, Transactions, and Accounts
-## Preamble
+This document formerly contained SIP-005 before the Stacks 2.0 mainnet launched.
-Title: Blocks, Transactions, and Accounts - -Authors: Jude Nelson , Aaron Blankstein - - -Status: Draft - -Type: Standard - -Created: 7/23/2019 - -License: BSD 2-Clause - -## Abstract - -This SIP describes the structure, validation, and lifecycle for transactions and blocks in -the Stacks blockchain, and describes how each peer maintains a materialized view -of the effects of processing all state-transitions encoded in the blockchain's sequence of -transactions. It presents the account model for the Stacks blockchain, and -describes how accounts authorize and pay for processing transactions on the -network. - -## Rationale - -The Stacks blockchain is a replicated state machine. -A _transaction_ encodes a single state-transition on the -Stacks blockchain. The Stacks blockchain's state evolves by materializing the -effects of a sequence of transactions -- i.e. by applying each transaction's encoded -state-transitions to the blockchain's state. - -Transactions in the Stacks blockchain encode various kinds of state-transitions, -the principal ones being: - -* To instantiate a smart contract (see SIP 002) -* To invoke a public smart contract function -* To transfer STX tokens between accounts -* To punish leaders who fork their microblock streams (see SIP 001) -* To allow leaders to perform limited on-chain signaling - -Processing transactions is not free. Each step in the process of validating and -executing the transaction incurs a non-zero computational cost. To incentivize -peers and leaders to execute transactions, the transaction's computational costs -are paid for by an _account_. - -An _account_ is the logical entity that executes and/or pays for transactions. A transaction's -execution is governed by three accounts, which may or may not be distinct: - -* The **originating account** is the account that creates and sends the - transaction. This is always an account owned by a user. Each transaction is -_authorized_ by its originating account. 
- -* The **paying account** is the account that is billed by the leader for the cost - of validating and executing the transaction. This is also always an account -owned by a user. If not identified in the transaction, the paying account and -the originating account are the same account. - -* The **sending account** is the account that identifies _who_ is - _currently_ executing the transaction. The sending account can - change during the course of transaction execution via the Clarity - function `as-contract`, which executes the provided code block as - the _current contract's_ account. Each transaction's initial sending - account is its originating account -- i.e. the account that - authorizes the transaction. Smart contracts determine the sending - account's principal using the `tx-sender` built-in function. - -This document frames accounts in the Stacks blockchain as the unit of agency for -processing transactions. The tasks -that a transaction carries out are used to inform the decisions on what -data goes into the transaction, as well as the data that goes into a block. -As such, understanding blocks and transactions in the Stacks blockchain first -requires understanding accounts. - -## Accounts - -Transactions in the Stacks blockchain originate from, are paid for by, and -execute under the authority of accounts. An account is fully -described by the following information: - -* **Address**. This is a versioned cryptographic hash that uniquely identifies the - account. The type of account (described below) determines what information is -hashed to derive the address. The address itself contains two or three fields: - * A 1-byte **version**, which indicates whether or not the address - corresponds to a mainnet or testnet account and what kind of hash algorithm -to use to generate its hash. - * A 20-byte **public key hash**, which is calculated using the address - version and the account's owning public key(s). - * A variable-length **name**. 
This is only used in contract accounts, and it
-  identifies the code body that belongs to this account. The name
-  may be up to 128 bytes. Accounts belonging to users do not have this field.
-
-* **Nonce**. This is a Lamport clock, used for ordering the transactions
-  that originate from and are paid for by an account. The nonce ensures that a transaction
-is processed at most once. The nonce counts the number of times
-an account's owner(s) have authorized a transaction (see below).
-The first transaction from an account will have a nonce value equal to 0,
-the second will have a nonce value equal to 1, and so
-on. A valid transaction authorization from this account's owner(s) must include the _next_ nonce
-value of the account; when the transaction is accepted by the peer network, the
-nonce is incremented in the materialized view of this account.
-
-* **Assets**. This is a mapping between all Stacks asset types and the
-  quantities of each type owned by the account. This includes the STX token, as
-well as any other on-chain assets declared by a Clarity smart contract (i.e.
-fungible and non-fungible tokens).
-
-All accounts for all possible addresses are said to exist, but nearly all of
-them are "empty" -- they have a nonce value of 0, and their asset mappings
-contain no entries. The state for an account is lazily materialized once
-the Stacks peer network processes a transaction that _funds_ it.
-That is, the account state is materialized only once a transaction's state-transition inserts
-an entry into an account's assets mapping for some (possibly zero) quantity of some asset.
-Even if the account depletes all asset holdings, it remains materialized.
-Materialized accounts are distinguished from empty accounts in that the former
-are all represented in a leader's commitment to its materialized view of the blockchain state
-(described below).
-
-### Account Types
-
-The Stacks blockchain supports two kinds of accounts:
-
-* **Standard accounts**. 
These are accounts owned by one or more private keys.
-  Only standard accounts can originate and pay for transactions. A transaction originating
-from a standard account is only valid if a threshold of its private keys sign
-it. The address for a standard account is the hash of this threshold value and
-all allowed public keys. Due to the need for backwards compatibility with
-Stacks v1, there are four ways to hash an account's public keys and threshold,
-and they are identical to Bitcoin's pay-to-public-key-hash, multisig
-pay-to-script-hash, pay-to-witness-public-key-hash, and multisig
-pay-to-witness-script-hash hashing algorithms (see appendix).
-
-* **Contract accounts**. These are accounts that are materialized whenever a
-smart contract is instantiated. Each contract is paired with exactly one contract account.
-A contract account cannot authorize or pay for transactions, but may serve as the sending account
-of a currently-executing transaction, via Clarity's `as-contract` function. A
-contract's address's public key hash matches the public key hash of the standard
-account that created it, and each contract account's address contains a name for
-its code body. The name is unique within the set of code bodies instantiated by
-the standard account.
-
-Both kinds of accounts may own on-chain assets. However, the nonce of a
-contract account must always be 0, since it cannot be used to originate or pay
-for a transaction.
-
-### Account Assets
-
-As described in SIP 002, the Stacks blockchain supports on-chain assets as a
-first-class data type -- in particular, _fungible_ and _non-fungible_ assets are
-supported. All assets (besides STX) are scoped to a particular contract, since
-they are created by contracts. Within a contract, asset types are unique.
-Therefore, all asset types are globally addressable via their identifier in the
-contract and their fully-qualified contract names. 
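The lazy-materialization rule described above can be sketched in Python. This is an illustrative model only, not the reference implementation; the `Account` and `AccountState` names and the `SP-EXAMPLE` address are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Account:
    nonce: int = 0                               # Lamport clock for authorizations
    assets: dict = field(default_factory=dict)   # asset identifier -> quantity owned

class AccountState:
    """Hypothetical model: all addresses conceptually have an account,
    but state is only materialized once a transaction funds it."""

    def __init__(self):
        self._materialized = {}                  # address -> Account

    def credit(self, address, asset, amount):
        # Funding an account (even with a zero quantity) materializes it.
        acct = self._materialized.setdefault(address, Account())
        acct.assets[asset] = acct.assets.get(asset, 0) + amount

    def is_materialized(self, address):
        # Once materialized, an account stays materialized even if it
        # later depletes all of its asset holdings.
        return address in self._materialized

state = AccountState()
assert not state.is_materialized("SP-EXAMPLE")   # empty account: not materialized
state.credit("SP-EXAMPLE", "STX", 0)             # zero-quantity funding still materializes
assert state.is_materialized("SP-EXAMPLE")
```

The key point the sketch captures is that materialization is triggered by the insertion of an assets-map entry, not by the quantity being non-zero.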
-
-Regardless of where asset types are declared, a particular instance of an asset
-belongs to exactly one account at all times. Once a contract declares an asset type,
-instances of that asset can be sent to and owned by other accounts.
-
-## Transactions
-
-Transactions are the fundamental unit of execution in the Stacks blockchain.
-Each transaction originates from a standard account, and is retained in
-the Stacks blockchain history for eternity. Transactions are atomic -- they
-either execute completely with respect to other transactions, or not at all.
-Moreover, transactions are processed in the same total order by all Stacks
-nodes.
-
-At its core, a transaction is an authorization statement (see below),
-a snippet of executable Clarity code, and a list of
-_post-conditions_ that must be true before the transaction is accepted. The
-transaction body supplies the Stacks blockchain with this code, as well as all of
-the necessary metadata to describe how the transaction should be executed.
-The various types of Stacks transactions encode different metadata, and
-thus have different validation rules.
-
-All transactions originate from a set of private keys that own a standard
-account, even if it is not yet materialized. The owner(s) of these
-private keys sign the transaction, attach a _transaction fee_ to it, and
-relay it to the Stacks peer network. If the transaction is well-formed,
-then it will be propagated to all reachable Stacks peers.
-Eventually, assuming the transaction remains resident in the peers' memories
-for long enough, a Stacks leader will select the transaction for inclusion
-in the longest fork's next block. Once this happens, the state-transitions
-encoded by the transaction are materialized in the blockchain state replicas in all
-peers.
-
-### Transaction Authorizations
-
-The Stacks blockchain supports two ways to authorize a transaction: a
-_standard_ authorization, and a _sponsored_ authorization. 
The distinction is
-in whether or not the originating account is also the paying account. In a
-transaction with a standard authorization, the origin and paying accounts are
-the same. In a transaction with a sponsored authorization, the origin and
-paying accounts are distinct, and both accounts must sign the transaction for it
-to be valid (first the origin, then the sponsor).
-
-The intended use-case for sponsored authorizations is to enable developers
-and/or infrastructure operators to pay for users to call into their
-smart contracts, even if users do not have the STX to do so. The signing flow
-for sponsored transactions would be to have the user first sign the transaction
-with their origin account with the intent of it being sponsored (i.e. the user
-must explicitly allow a sponsor to sign), and then have the sponsor sign with their paying
-account to pay for the user's transaction fee.
-
-### Transaction Payloads
-
-The key difference between Stacks transaction payloads is what functionality is
-available to them from the Clarity VM (and by extension, what side-effects are
-materializable). The reasons for distinguishing between these types of
-transactions are to make static analysis cheaper for certain common use-cases,
-and to provide greater security for the user(s) that own the account.
-
-#### Type-0: Transferring an Asset
-
-A type-0 transaction may only transfer a single asset from one account to
-another. It may not directly execute Clarity code. A type-0
-transaction can only send STX. It cannot have post-conditions
-(see below).
-
-#### Type-1: Instantiating a Smart Contract
-
-A type-1 transaction has unrestricted access to the Clarity VM,
-and when successfully evaluated, will materialize a new smart contract
-account. Type-1 transactions are meant to instantiate smart contracts, and to
-call into multiple smart contract functions and/or access their state
-atomically. 
- -#### Type-2: Calling an Existing Smart Contract - -A type-2 transaction has restricted access to the Clarity VM. A -type-2 transaction may only contain a single public function call (via -`contract-call?`), and may only supply Clarity `Value`s as its -arguments. These transactions do _not_ materialize a contract account. - -The intended use-case for a type-2 transaction is to invoke an existing public -smart contract function. Because they have such restricted access to the -Clarity VM, they are much cheaper to execute compared to a type-1 transaction. - -#### Type-3: Punishing an Equivocating Stacks Leader - -A type-3 transaction encodes two well-formed, signed, but conflicting -microblock headers. That is, the headers are different, but have the same -sequence number and/or parent block hash. If mined before the block reward -matures, this transaction will cause the offending leader to lose their block reward, -and cause the sender of this transaction to receive a fraction of the lost -coinbase as a reward for catching the bad behavior. -This transaction has no access to the Clarity VM. - -#### Type-4: Coinbase - -A type-4 transaction encodes a 32-byte scratch space for a block leader's own -use, such as signaling for network upgrades or announcing a digest of a set of -available peers. This transaction must be the first transaction in an anchored -block in order for the block to be considered well-formed. This transaction -has no access to the Clarity VM. Only one coinbase transaction may be mined per -epoch. - -### Transaction Post-Conditions - -A key use-case of smart contracts is to allow programmatic control over the -assets in one or more accounts. However, where there is programmatic control, -there are bound to be bugs. In the world of smart contract programming, bugs -(intentional or not) can have severe consequences to the user's well-being. -In particular, bugs can destroy a user's assets and cause them to lose wealth. 
-Transaction post-conditions are a feature meant to limit the damage a bug can -do in terms of destroying a user's assets. - -Post-conditions are intended to be used to force a transaction to abort if the -transaction would cause a principal to send an asset in a way that is not to -the user's liking. For example, a user may append a post-condition saying that -upon successful execution, their account's STX balance should have decreased by no more -than 1 STX (excluding the fee). If this is not the case, then the transaction would abort -and the account would only pay the transaction fee of processing it. -As another example, a user purchasing a BNS name may append a post-condition saying that upon -successful execution, the seller will have sent the BNS name. If it -did not, then the transaction aborts, the account is not billed for the name, -and the selling account receives no payment. - -Each transaction includes a field that describes zero or more post-conditions -that must all be true when the transaction finishes running. Each -post-condition is a quad that encodes the following information: - -* The **principal** that sent the asset. It can be a standard or contract address. -* The **asset name**, i.e. the name of one of the assets in the originating - account's asset map. -* The **comparator**, described below. -* The **literal**, an integer or boolean value used to compare instances of the - asset against via the condition. The type of literal depends on both the - type of asset (fungible or non-fungible) and the comparator. - -The Stacks blockchain supports the following two types of comparators: - -* **Fungible asset changes** -- that is, a question of _how much_ of a - fungible asset was sent by a given account when the transaction ran. - The post-condition can assert that the quantity of tokens increased, - decreased, or stayed the same. 
-* **Non-fungible asset state** -- that is, a question of _whether or not_ an
-  account sent a non-fungible asset when the transaction ran.
-
-In addition, the Stacks blockchain supports an "allow" or "deny" mode for
-evaluating post-conditions: in "allow" mode, other asset transfers not covered
-by the post-conditions are permitted, but in "deny" mode, no other asset
-transfers are permitted besides those named in the post-conditions.
-
-Post-conditions are meant to be added by the user (or by the user's wallet
-software) at the moment they sign with their origin account. Because the
-user defines the post-conditions, the user has the power to protect themselves
-from buggy or malicious smart contracts proactively, so even undiscovered bugs
-cannot steal or destroy their assets if they are guarded with post-conditions.
-Well-designed wallets would provide an intuitive user interface for
-encoding post-conditions, as well as provide a set of recommended mitigations
-based on whether or not the transaction would interact with a known-buggy smart contract.
-
-Post-conditions may only be used in conjunction with contract-call and smart-contract
-instantiation transaction payloads.
-
-#### Post-Condition Limitations
-
-Post-conditions do not consider who _currently owns_ an asset when the
-transaction finishes, nor do they consider the sequence of owners an asset
-had during its execution. They only encode who _sent_ an asset, and how much.
-This information is much cheaper to track, and requires no
-I/O to process (processing time is _O(n)_ in the number of post-conditions).
-Users who want richer post-conditions are encouraged to deploy their own
-proxy contracts for making such queries.
-
-### Transaction Encoding
-
-A transaction includes the following information. Multiple-byte fields are
-encoded as big-endian.
-
-* A 1-byte **version number**, identifying whether or not the transaction is
-  meant as a mainnet or testnet transaction. 
-* A 4-byte **chain ID**, identifying which Stacks chain this transaction is - destined for. -* A **transaction authorization** structure, described below, which encodes the - following information (details are given in a later section): - * The address of the origin account. - * The signature(s) and signature threshold for the origin account. - * The address of the sponsor account, if this is a sponsored transaction. - * The signature(s) and signature threshold for the sponsor account, if given. - * The **fee rate** to pay, denominated in microSTX/compute unit. -* A 1-byte **anchor mode**, identifying how the transaction should be mined. It - takes one of the following values: - * `0x01`: The transaction MUST be included in an anchored block - * `0x02`: The transaction MUST be included in a microblock - * `0x03`: The leader can choose where to include the transaction. -* A 1-byte **post-condition mode**, identifying whether or not post-conditions - must fully cover all transferred assets. It can take the following values: - * `0x01`: This transaction may affect other assets not listed in the - post-conditions. - * `0x02`: This transaction may NOT affect other assets besides those listed - in the post-conditions. -* A length-prefixed list of **post-conditions**, describing properties that must be true of the - originating account's assets once the transaction finishes executing. It is encoded as follows: - * A 4-byte length, indicating the number of post-conditions. - * A list of zero or more post-conditions, whose encoding is described below. -* The **transaction payload**, described below. - -#### Version Number - -The version number identifies whether or not the transaction is a mainnet or -testnet transaction. A mainnet transaction MUST have its highest bit cleared, and a -testnet transaction MUST have the highest bit set (i.e. `version & 0x80` must be -non-zero for testnet, and zero for mainnet). The lower 7 bits are ignored for -now. 
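The version-number rule above reduces to a single bit test; a minimal sketch:

```python
# Sketch of the mainnet/testnet check from the version-number rule:
# `version & 0x80` is zero for mainnet and non-zero for testnet.

def is_testnet(version: int) -> bool:
    return (version & 0x80) != 0

assert not is_testnet(0x00)   # mainnet: high bit cleared
assert is_testnet(0x80)       # testnet: high bit set
# The lower 7 bits are ignored for now:
assert not is_testnet(0x7f)
assert is_testnet(0xff)
```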
-
-#### Chain ID
-
-The chain ID identifies which instance of the Stacks blockchain this transaction
-is destined for. Because the main Stacks blockchain and Stacks app chains
-(described in a future SIP) share the same transaction wire format, this field
-is used to distinguish between each chain's transactions. Transactions for the
-main Stacks blockchain MUST have a chain ID of `0x00000000`.
-
-#### Transaction Authorization
-
-Each transaction contains a transaction authorization structure, which is used
-by the Stacks peer to identify the originating account and sponsor account, to
-determine the maximum fee rate the spending account will pay, and to
-determine whether or not it is allowed to carry out the encoded state-transition.
-It is encoded as follows:
-
-* A 1-byte **authorization type** field that indicates whether or not the
-  transaction has a standard or sponsored authorization.
-  * For standard authorizations, this value MUST be `0x04`.
-  * For sponsored authorizations, this value MUST be `0x05`.
-* One or two **spending conditions**, whose encoding is described below. If the
-  transaction's authorization type byte indicates that it is a standard
-authorization, then there is one spending condition. If it is a sponsored
-authorization, then there are two spending conditions that follow.
-
-_Spending conditions_ are encoded as follows:
-
-* A 1-byte **hash mode** field that indicates how the origin account authorization's public
-  keys and signatures should be used to calculate the account address. Four
-modes are supported, in the service of emulating the four hash modes supported
-in Stacks v1 (which uses Bitcoin hashing routines):
-  * `0x00`: A single public key is used. Hash it like a Bitcoin P2PKH output.
-  * `0x01`: One or more public keys are used. Hash them as a Bitcoin multisig P2SH redeem script.
-  * `0x02`: A single public key is used. Hash it like a Bitcoin P2WPKH-P2SH
-    output.
-  * `0x03`: One or more public keys are used. 
Hash them as a Bitcoin
-    P2WSH-P2SH output.
-* A 20-byte **public key hash**, which is derived from the public key(s) according to the
-  hashing routine identified by the hash mode. The hash mode and public key
-hash uniquely identify the origin account, with the hash mode being used to
-derive the appropriate account version number.
-* An 8-byte **nonce**.
-* An 8-byte **fee rate**.
-* Either a **single-signature spending condition** or a **multisig spending
-  condition**, described below. If the hash mode byte is either `0x00` or
-`0x02`, then a single-signature spending condition follows. Otherwise, a
-multisig spending condition follows.
-
-A _single-signature spending condition_ is encoded as follows:
-
-* A 1-byte **public key encoding** field to indicate whether or not the
-  public key should be compressed before hashing. It will be:
-  * `0x00` for compressed
-  * `0x01` for uncompressed
-* A 65-byte **recoverable ECDSA signature**, which contains a signature
-and metadata for a secp256k1 signature.
-
-A _multisig spending condition_ is encoded as follows:
-
-* A length-prefixed array of **spending authorization fields**, described
-  below.
-* A 2-byte **signature count** indicating the number of signatures that
-  are required for the authorization to be valid.
-
-A _spending authorization field_ is encoded as follows:
-
-* A 1-byte **field ID**, which can be `0x00`, `0x01`, `0x02`, or
-  `0x03`.
-* The **spending field body**, which will be the following,
-  depending on the field ID:
-  * `0x00` or `0x01`: The next 33 bytes are a compressed secp256k1 public key.
-    If the field ID is `0x00`, the key will be loaded as a compressed
-    secp256k1 public key. If it is `0x01`, then the key will be loaded as
-    an uncompressed secp256k1 public key.
-  * `0x02` or `0x03`: The next 65 bytes are a recoverable secp256k1 ECDSA
-    signature. If the field ID is `0x02`, then the recovered public
-    key will be loaded as a compressed public key. If it is `0x03`,
-    then the recovered public key will be loaded as an uncompressed
-    public key.
-
-A _compressed secp256k1 public key_ has the following encoding:
-
-* A 1-byte sign byte, which is either `0x02` for even values of the curve's `y`
-  coordinate, or `0x03` for odd values.
-* A 32-byte `x` curve coordinate.
-
-An _uncompressed secp256k1 public key_ has the following encoding:
-
-* A 1-byte constant `0x04`
-* A 32-byte `x` coordinate
-* A 32-byte `y` coordinate
-
-A _recoverable ECDSA secp256k1 signature_ has the following encoding:
-
-* A 1-byte **recovery ID**, which can have the value `0x00`, `0x01`, `0x02`, or
-  `0x03`.
-* A 32-byte `r` curve coordinate
-* A 32-byte `s` curve coordinate. Of the two possible `s` values that may be
-  calculated from an ECDSA signature on secp256k1, the lower `s` value MUST be
-used.
-
-The number of required signatures and the list of public keys in a spending
-condition structure uniquely identifies a standard account,
-and can be used to generate its address per the following rules:
-
-| Hash mode | Spending Condition | Mainnet version | Hash algorithm |
-| --------- | ------------------ | --------------- | -------------- |
-| `0x00`    | Single-signature   | 22              | Bitcoin P2PKH  |
-| `0x01`    | Multi-signature    | 20              | Bitcoin redeem script P2SH |
-| `0x02`    | Single-signature   | 20              | Bitcoin P2WPKH-P2SH |
-| `0x03`    | Multi-signature    | 20              | Bitcoin P2WSH-P2SH |
-
-The corresponding testnet address versions are:
-* For 22 (`P` in the c32 alphabet), use 26 (`T` in the c32 alphabet)
-* For 20 (`M` in the c32 alphabet), use 21 (`N` in the c32 alphabet).
-
-The hash algorithms are described below briefly, and mirror hash algorithms used
-today in Bitcoin. This is necessary for backwards compatibility with Stacks v1
-accounts, which rely on Bitcoin's scripting language for authorizations. 
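As a quick sanity check, the version-to-character claims above can be verified by indexing into the c32 alphabet. The alphabet string below is the standard Stacks c32 alphabet, assumed here since this section does not restate it:

```python
# The c32 alphabet used by Stacks addresses (assumed; a Crockford-style
# base32 alphabet that omits I, L, O, and U).
C32_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# Mainnet -> testnet address version mapping from the text above.
MAINNET_TO_TESTNET = {22: 26, 20: 21}

for mainnet, testnet in MAINNET_TO_TESTNET.items():
    print(C32_ALPHABET[mainnet], "->", C32_ALPHABET[testnet])
# P -> T
# M -> N
```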
-
-_Hash160_: Takes the SHA256 hash of its input, and then takes the RIPEMD160
-hash of the resulting 32-byte digest.
-
-_Bitcoin P2PKH_: This algorithm takes the ECDSA recoverable signature and
-public key encoding byte from the single-signature spending condition, converts them to
-a public key, and then calculates the Hash160 of the key's byte representation
-(i.e. by serializing the key as a compressed or uncompressed secp256k1 public
-key).
-
-_Bitcoin redeem script P2SH_: This algorithm converts a multisig spending
-condition's public keys and recoverable signatures
-into a Bitcoin BIP16 P2SH redeem script, and calculates the Hash160
-over the redeem script's bytes (as is done in BIP16). It converts the given ECDSA
-recoverable signatures and public key encoding byte values into their respective
-(un)compressed secp256k1 public keys to do so.
-
-_Bitcoin P2WPKH-P2SH_: This algorithm takes the ECDSA recoverable signature and
-public key encoding byte from the single-signature spending condition, converts
-them to a public key, and generates a P2WPKH witness program, P2SH redeem
-script, and finally the Hash160 of the redeem script to get the address's public
-key hash.
-
-_Bitcoin P2WSH-P2SH_: This algorithm takes the ECDSA recoverable signatures and
-public key encoding bytes, as well as any given public keys, and converts them
-into a multisig P2WSH witness program. It then generates a P2SH redeem script
-from the witness program, and obtains the address's public key hash from the
-Hash160 of the redeem script.
-
-The resulting public key hash must match the public key hash given in the
-transaction authorization structure. This is only possible if the ECDSA
-recoverable signatures recover to the correct public keys, which in turn is only
-possible if the corresponding private key(s) signed this transaction.
-
-#### Transaction Post-Conditions
-
-The list of post-conditions is encoded as follows:
-* A 4-byte length prefix
-* Zero or more post-conditions. 
-
-Each post-condition is encoded as follows:
-* A 1-byte **post-condition type ID**
-* A variable-length **post-condition body**
-
-The _post-condition type ID_ can have the following values:
-* `0x00`: A **STX post-condition**, which pertains to the origin account's STX.
-* `0x01`: A **Fungible token post-condition**, which pertains to one of the origin
-account's fungible tokens.
-* `0x02`: A **Non-fungible token post-condition**, which pertains to one of the origin
-account's non-fungible tokens.
-
-A _STX post-condition_ body is encoded as follows:
-* A variable-length **principal**, containing the address of the standard account or contract
-  account
-* A 1-byte **fungible condition code**, described below
-* An 8-byte value encoding the literal number of microSTX
-
-A _Fungible token post-condition_ body is encoded as follows:
-* A variable-length **principal**, containing the address of the standard account or contract
-  account
-* A variable-length **asset info** structure that identifies the token type, described below
-* A 1-byte **fungible condition code**
-* An 8-byte value encoding the literal number of token units
-
-A _Non-fungible token post-condition_ body is encoded as follows:
-* A variable-length **principal**, containing the address of the standard account or contract
-  account
-* A variable-length **asset info** structure that identifies the token type
-* A variable-length **asset name**, which is the Clarity value that names the token instance,
-  serialized according to the Clarity value serialization format.
-* A 1-byte **non-fungible condition code**
-
-A **principal** structure encodes either a standard account address or a
-contract account address.
-* A standard account address is encoded as a 1-byte version number and a 20-byte
-  Hash160
-* A contract account address is encoded as a 1-byte version number, a 20-byte
-  Hash160, a 1-byte name length, and a variable-length name of up to 128
-characters. 
The name characters must form a valid contract name (see below).
-
-An **asset info** structure identifies a token type declared somewhere in an
-earlier-processed Clarity smart contract. It contains the following fields:
-* An **address**, which identifies the standard account that created the
-  contract that declared the token. This is encoded as a 1-byte version,
-  followed by a 20-byte public key hash (i.e. a standard account address).
-* A **contract name**, a length-prefixed Clarity string that encodes the
-  human-readable part of the contract's name.
-* An **asset name**, a length-prefixed Clarity string that encodes the name of
-  the token as declared in the Clarity code.
-
-The _address_ and _contract name_ fields together comprise the smart contract's
-fully-qualified name, and the asset name field identifies the specific token
-declaration within the contract.
-
-The _contract name_ is encoded as follows:
-* A 1-byte length prefix, up to 128
-* A variable-length string of valid ASCII characters (up to 128 bytes). This
-  string must be accepted by the regex `^[a-zA-Z]([a-zA-Z0-9]|[-_])*$`.
-
-The _asset name_ is encoded as follows:
-* A 1-byte length prefix, up to 128
-* A variable-length string of valid ASCII characters (up to 128 bytes). This
-  string must be accepted by the regex `^[a-zA-Z]([a-zA-Z0-9]|[-_!?])*$`.
-
-A **fungible condition code** encodes a statement being made for either STX or
-a fungible token, with respect to the originating account. 
It can take one of the
-following values, with the following meanings regarding the associated token
-units:
-* `0x01`: "The account sent an amount equal to the number of units"
-* `0x02`: "The account sent an amount greater than the number of units"
-* `0x03`: "The account sent an amount greater than or equal to the number of units"
-* `0x04`: "The account sent an amount less than the number of units"
-* `0x05`: "The account sent an amount less than or equal to the number of units"
-
-A **non-fungible condition code** encodes a statement being made about a
-non-fungible token, with respect to whether or not the particular non-fungible
-token is owned by the account. It can take the following values:
-* `0x10`: "The account does NOT own this non-fungible token"
-* `0x11`: "The account owns this non-fungible token"
-
-Post-conditions are defined in terms of which assets the origin account sends or
-does not send during the transaction's execution. To enforce post-conditions,
-the Clarity VM records which assets the origin account sends as the transaction
-is evaluated to produce an "asset map." The asset map is used to evaluate the post-conditions.
-
-#### Transaction Payloads
-
-There are five different types of transaction payloads. Each payload is encoded
-as follows:
-* A 1-byte **payload type ID**, between 0 and 4 inclusive.
-* A variable-length **payload**, of which there are five varieties.
-
-The _payload type ID_ can take any of the following values:
-* `0x00`: the payload that follows is a **token-transfer payload**
-* `0x01`: the payload that follows is a **smart-contract payload**
-* `0x02`: the payload that follows is a **contract-call payload**
-* `0x03`: the payload that follows is a **poison-microblock payload**
-* `0x04`: the payload that follows is a **coinbase payload**. 
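The fungible condition codes defined earlier reduce to simple comparisons between the post-condition's literal and the quantity recorded in the asset map. A minimal sketch, where the function names and the asset-map shape are illustrative rather than taken from the reference node:

```python
# Illustrative evaluation of fungible condition codes against the asset
# map the Clarity VM records during execution. Only the comparator
# semantics (codes 0x01-0x05) come from the text; everything else is a
# hypothetical model.

FUNGIBLE_OPS = {
    0x01: lambda sent, lit: sent == lit,   # sent exactly `lit` units
    0x02: lambda sent, lit: sent > lit,    # sent more than `lit` units
    0x03: lambda sent, lit: sent >= lit,
    0x04: lambda sent, lit: sent < lit,
    0x05: lambda sent, lit: sent <= lit,
}

def check_fungible(asset_map, principal, asset, code, literal):
    # `asset_map` maps (principal, asset) -> units sent while the
    # transaction ran; a principal that sent nothing maps to 0.
    sent = asset_map.get((principal, asset), 0)
    return FUNGIBLE_OPS[code](sent, literal)

# A user asserts their account sent at most 1 STX (1,000,000 microSTX):
asset_map = {("SP-ORIGIN", "STX"): 900_000}
assert check_fungible(asset_map, "SP-ORIGIN", "STX", 0x05, 1_000_000)
```

If any post-condition evaluates to false (or, in "deny" mode, an unlisted transfer occurs), the transaction aborts and the account pays only the fee.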
-
-The _STX token-transfer_ structure is encoded as follows:
-* A **recipient principal** encoded as follows:
-  * A 1-byte type field indicating whether the principal is
-    * `0x05`: a recipient address
-    * `0x06`: a contract recipient
-  * If a simple recipient address, the 1-byte type is followed by a
-    1-byte address version number and a 20-byte hash identifying a standard
-    recipient account.
-  * If a contract recipient address, the 1-byte type is followed by
-    the issuer address of the contract, encoded with a 1-byte address
-    version number and the 20-byte hash that identifies the standard
-    account of the issuer. This is followed by the encoding of the
-    contract name -- encoded as described above.
-* An 8-byte number denominating the number of microSTX to send to the recipient
-  address's account.
-
-Note that if a transaction contains a token-transfer payload, it MUST
-have only a standard authorization field. It cannot be sponsored. The
-recipient principal does not need to be a materialized account -- STX
-may be transferred to an account which has not been used in any prior
-transactions. In the case of a contract principal, the unmaterialized
-contract principal will receive the funds and maintain a balance in
-the STX holdings map. If and when that contract is published, the contract
-will be able to spend those STX via `(as-contract (stx-transfer? ...`
-invocations.
-
-A _smart-contract payload_ is encoded as follows:
-* A **contract name** string, described above, that encodes the human-readable
-  part of the contract's fully-qualified name.
-* A **code body** string that encodes the Clarity smart contract itself. This
-  string is encoded as:
-  * A 4-byte length prefix
-  * Zero or more human-readable ASCII characters -- specifically, those between `0x20` and
-    `0x7e` (inclusive), and the whitespace characters `\n` and `\t`. 
-
-Note that when the smart contract is instantiated, its fully-qualified name will
-be computed from the transaction's origin account address and the given contract
-name. The fully-qualified name must be globally unique -- the transaction will
-not be accepted if its fully-qualified name matches an already-accepted smart
-contract.
-
-A _contract-call payload_ is encoded as follows:
-* A **contract address**, comprised of a 1-byte address version number and a
-  20-byte public key hash of the standard account that created the smart
-contract whose public function is to be called,
-* A length-prefixed **contract name** string, described above, that encodes the
-  human-readable part of the contract's fully-qualified name,
-* A length-prefixed **function name** string, comprised of a 1-byte length and
-  up to 128 characters of valid ASCII text, that identifies the public function
-to call. The characters must match the regex `^[a-zA-Z]([a-zA-Z0-9]|[-_!?])*$`.
-* A length-prefixed list of **function arguments**, encoded as follows:
-  * A 4-byte length prefix, indicating the number of arguments
-  * Zero or more binary strings encoding the arguments as Clarity values.
-    Clarity values are serialized as described in the section
-    [Clarity Value Representation](#clarity-value-representation).
-
-Note that together, the _contract address_ and _contract name_ fields uniquely identify
-the smart contract within the Clarity VM.
-
-A _poison microblock payload_ is encoded as follows:
-* Two Stacks microblock headers, such that either the `prev_block` or `sequence`
-  values are equal. When validated, the ECDSA recoverable `signature` fields of both microblocks
-must recover to the same public key, and it must hash to the leader's parent
-anchored block's public key hash. See the following sections for the exact
-encoding of a Stacks microblock header. 

This transaction type is sent to punish leaders who intentionally equivocate
about the microblocks they package, as described in SIP 001.

A _coinbase payload_ is encoded as follows:
* A 32-byte field called a **coinbase buffer** that the Stacks leader can fill with whatever it wants.

Note that this must be the first transaction in an anchored block in order for the
anchored block to be considered well-formed (see below).

#### Transaction Signing and Verifying

A transaction may have one or two spending conditions. The act of signing
a transaction is the act of generating the signatures for its authorization
structure's spending conditions, and the act of verifying a transaction is the act of (1) verifying
the signatures in each spending condition and (2) verifying that the public key(s)
of each spending condition hash to its address.

Signing a transaction is performed after all other fields in the transaction are
filled in. The high-level algorithm for filling in the signatures in a spending
condition structure is as follows:

0. Set the spending condition address, and optionally, its signature count.
1. Clear the other spending condition fields, using the appropriate algorithm below.
   If this is a sponsored transaction, and the signer is the origin, then set the sponsor spending condition
   to the "signing sentinel" value (see below).
2. Serialize the transaction into a byte sequence, and hash it to form an
   initial `sighash`.
3. Calculate the `presign-sighash` over the `sighash` by hashing the
   `sighash` with the authorization type byte (0x04 or 0x05), fee rate (as an 8-byte big-endian value),
   and nonce (as an 8-byte big-endian value).
4. Calculate the ECDSA signature over the `presign-sighash` by treating this
   hash as the message digest. Note that the signature must be a `libsecp256k1`
   recoverable signature.
5.
Calculate the `postsign-sighash` over the resulting signature and public key
   by hashing the `presign-sighash` hash, the signing key's public key encoding byte, and the
   signature from step 4 to form the next `sighash`. Store the message
   signature and public key encoding byte as a signature auth field.
6. Repeat steps 3-5 for each private key that must sign, using the new `sighash`
   from step 5.

The algorithms for clearing an authorization structure are as follows:
* If this is a single-signature spending condition, then set the fee rate and
  nonce to 0, set the public key encoding byte to `Compressed`, and set the
  signature bytes to 0 (note that the address is _preserved_).
* If this is a multi-signature spending condition, then set the fee rate and
  nonce to 0, and set the vector of authorization fields to the empty vector
  (note that the address and the 2-byte signature count are _preserved_).

While signing a transaction, the implementation keeps a running list of public
keys, public key encoding bytes, and signatures to use to fill in the spending condition once signing
is complete. For single-signature spending conditions, the only data the
signing algorithm needs to return is the public key encoding byte and message signature. For multi-signature
spending conditions, the implementation returns the sequence of public keys and
(public key encoding byte, ECDSA recoverable signature) pairs that make up the condition's authorization fields.
The implementation must take care to preserve the order of public keys and
(encoding-byte, signature) pairs in the multisig spending condition, so that
the verifying algorithm will hash them all in the right order when verifying the
address.

When signing a sponsored transaction, the origin spending condition signatures
are calculated first, and the sponsor spending conditions are calculated second.

When the origin key(s) sign, they set the sponsor spending condition to a
specially-crafted "signing sentinel" structure. This structure is a
single-signature spending condition, with a hash mode equal to 0x00, an
address and signature of all 0's, a fee rate and a nonce equal to 0, and a
public key encoding byte equal to 0x00. This way, the origin commits to the
fact that the transaction is sponsored without having to know anything about the
sponsor's spending conditions.

When sponsoring a transaction, the sponsor uses the same algorithm as above to
calculate its signatures. This way, the sponsor commits to the signature(s) of
the origin when calculating its signatures.

When verifying a transaction, the implementation verifies the sponsor spending
condition (if present), and then the origin spending condition. It effectively
performs the signing algorithm again, but this time, it verifies signatures and
recovers public keys.

0. Extract the public key(s) and signature(s) from the spending condition.
1. Clear the spending condition.
2. Serialize the transaction into a byte sequence, and hash it to form an
   initial `sighash`.
3. Calculate the `presign-sighash` from the `sighash`, authorization type byte,
   fee rate, and nonce.
4. Use the `presign-sighash` and the next (public key encoding byte,
   ECDSA recoverable signature) pair to recover the public key that generated it.
5. Calculate the `postsign-sighash` from the `presign-sighash`, the signature, and the
   public key encoding byte, to form the next `sighash`.
6. Repeat steps 3-5 for each signature, so that all of the public keys are
   recovered.
7. Verify that the sequence of public keys hash to the address, using
   the address's indicated public key hashing algorithm.

When verifying a sponsored transaction, the sponsor's signatures are verified
first. Once verified, the sponsor spending condition is set to the "signing
sentinel" value in order to verify the origin spending condition.
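The sighash chaining used by both the signing and verifying algorithms can be sketched as follows. This is an illustration only: the function names are ours, and SHA-256 is substituted for the SHA-512/256 hash the protocol actually uses, purely to keep the sketch portable:

```python
import hashlib
import struct

def h(data: bytes) -> bytes:
    # Stand-in hash; the protocol uses SHA-512/256 here.
    return hashlib.sha256(data).digest()

def presign_sighash(cur_sighash: bytes, auth_type: int, fee: int, nonce: int) -> bytes:
    # Step 3: hash the current sighash with the authorization type byte
    # (0x04 or 0x05) and the fee rate and nonce as 8-byte big-endian values.
    return h(cur_sighash + bytes([auth_type])
             + struct.pack(">Q", fee) + struct.pack(">Q", nonce))

def postsign_sighash(presign: bytes, key_encoding: int, signature: bytes) -> bytes:
    # Step 5: hash the presign-sighash with the public key encoding byte
    # and the 65-byte recoverable signature to form the next sighash.
    return h(presign + bytes([key_encoding]) + signature)
```

Each signer consumes the previous `sighash` and produces the next one, so every signature commits to all of the signatures made before it.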

## Blocks

Blocks are batches of transactions proposed by a single Stacks leader. The
Stacks leader gathers transactions from the peer network (by means of a
_mempool_), selects the ones they wish to package together into the next block
("mines" them), and then announces the block to the rest of the peer network.

A block is considered valid if (1) it is well-formed, (2) it contains a valid
sequence of transactions -- i.e. each transaction's state-transitions are
permitted, and (3) it follows the rules described in this document below.

Per SIP 001, there are two kinds of blocks: anchored blocks, and streamed
microblocks. An anchored block is comprised of the following two fields:

* A **block header**
* A list of one or more **transactions**, encoded as:
   * A 4-byte length, counting the number of transactions,
   * A coinbase transaction, with an anchor mode of `0x01` or `0x03`,
   * Zero or more additional transactions, all of which must have an anchor mode
     byte set to either `0x01` or `0x03`.

A _block header_ is encoded as follows:
* A 1-byte **version number** to describe how to validate the block.
* The **cumulative work score** for this block's fork, described below.
* An 80-byte **VRF proof** which must match the burn commitment transaction on the burn
  chain (in particular, it must hash to its VRF seed), described below.
* A 32-byte **parent block hash**, which must be the SHA512/256 hash of the last _anchored_ block
  that precedes this block in the fork to which this block is to be appended.
* A 32-byte **parent microblock hash**, which must be the SHA512/256 hash of the last _streamed_
  block that precedes this block in the fork to which this block is to be appended.
* A 2-byte **parent microblock sequence number**, which indicates the sequence
  number of the parent microblock to which this anchored block is attached.
* A 32-byte **transaction Merkle root**, the SHA512/256 root hash of a binary Merkle tree
  calculated over the sequence of transactions in this block (described below).
* A 32-byte **state Merkle root**, the SHA512/256 root hash of a MARF index over the state of
  the blockchain (see SIP-004 for details).
* A 20-byte **microblock public key hash**, the Hash160 of a compressed public key whose private
  key will be used to sign microblocks during the peer's tenure.

The _VRF proof_ field contains the following fields:
* A 32-byte **Gamma** Ristretto point, which itself is a compressed Ed25519
  curve point (see https://ristretto.group).
* A 16-byte **c scalar**, an unsigned integer (encoded big-endian)
* A 32-byte **s scalar**, an unsigned integer mod 2^255 - 19 (big-endian)

The _cumulative work score_ contains the following two fields:
* An 8-byte unsigned integer that encodes the sum of all burnchain tokens
  burned or transferred in this fork of the Stacks blockchain (i.e. by means of
  proof-of-burn or proof-of-transfer, whichever is in effect).
* An 8-byte unsigned integer that encodes the total proof-of-work done in this
  fork of the burn chain.

In-between two consecutive anchored blocks in the same fork there can exist
zero or more Stacks microblocks.

A _microblock_ is comprised of two fields:
* A **microblock header**,
* A length-prefixed list of transactions, all of which have an
  anchor mode set to `0x02` or `0x03`. This is comprised of:
   * A 4-byte length, which counts the number of transactions,
   * Zero or more transactions.

Each _microblock header_ contains the following information:
* A 1-byte **version number** to describe how to validate the block.
* A 2-byte **sequence number** as a hint to describe how to order a set of
  microblocks.
* A 32-byte **parent microblock hash**, which is the SHA512/256 hash of the previous signed microblock
  in this stream.
* A 32-byte **transaction Merkle root**, the SHA512/256 root hash of a binary Merkle tree
  calculated over this block's sequence of transactions.
* A 65-byte **signature** over the block header from the Stacks peer that produced
  it, using the private key whose public key was announced in the anchored
  block. This is a recoverable ECDSA secp256k1 signature, whose recovered
  compressed public key must hash to the same value as the parent anchored block's
  microblock public key hash field.

For both blocks and microblocks, a block's hash is calculated by first
serializing its header to bytes, and then calculating the SHA512/256 hash over those bytes.

### Block Validation

The hash of the anchored block's header is written to the burn chain via a block commitment
transaction, per SIP 001. When a well-formed anchored block is received from the peer
network, the peer must confirm that:

* The block header hashes to a known commitment transaction that won
  cryptographic sortition.
* All transactions are well-formed and have the appropriate anchor byte.
* All transactions, when assembled into a Merkle tree, hash to the given
  transaction Merkle root.
* The first transaction is a coinbase transaction.
* The block version is supported.
* The cumulative work score is equal to the sum of all work
  scores on this fork.
* The block header's VRF proof hashes to the burn commitment transaction's VRF
  seed. Note that this is the VRF seed produced by the burnchain block just before the
  burnchain block that contains the block commitment; it is _not_ the parent
  Stacks block's VRF seed.

If any of the above are false, then there is _no way_ that the block can be
valid, and it is dropped.

Once a block passes this initial test, it is queued up for processing in a
"staging" database.
Blocks remain in this staging database
until there exists a chain tip to which to append
the block (where the chain tip in this case refers both to the parent anchored
block and parent microblock).

An anchored block is _processed_ and either _accepted_ or _rejected_ once its
parent anchored block _and_ its parent microblocks are available.
To accept the anchored block, the peer applies the parent microblock stream's
transactions to the chain state, followed by the anchored block's transactions.
If the resulting state root matches the block's state root, then the block is
valid and the leader is awarded the anchored block's coinbase and 60% of the
microblock stream's transaction fees, released over a maturation period
(per SIP 001). The microblock stream and anchored block are marked as
_accepted_ and will be made available for other peers to download.

Not every anchored block will have a parent microblock stream. Anchored blocks
that do not have parent microblock streams will have their parent microblock
header hashes set to all 0's, and the parent microblock sequence number set to
0.

### Microblock Validation

When a well-formed microblock arrives from the peer network, the peer
first confirms that:

* The parent anchored block is either fully accepted, or is queued up.
* The parent anchored block's leader signed the microblock.

If all these are true, then the microblock is queued up for processing. It will
be processed when its descendant anchored block is ready for processing.

As discussed in SIP 001, a Stacks leader can equivocate while packaging
transactions as microblocks by deliberately creating a microblock stream fork.
This will be evidenced by the discovery of either of the following:

* Two well-formed, signed microblocks with the same parent hash
* Two well-formed, signed microblocks with the same sequence number

If such a discovery is made, the microblock stream is truncated to the last
microblock before the height in the microblock stream of the equivocation, and
this microblock (and any of its predecessor microblocks in the stream) remain
viable chain tips for subsequent leaders to build off of. In the meantime,
anyone can submit a poison-microblock transaction with both signed headers in
order to (1) destroy the equivocating leader's coinbase and fees, and (2) receive
5% of the destroyed tokens as a reward, provided that the poison-microblock
transaction is processed before the block reward becomes spendable by the
equivocating leader.

Because microblocks are released quickly, it is possible that they will not
arrive in order, and may even arrive before their parent microblock. Peers are
expected to cache well-formed microblocks for some time, in order to help ensure that
they are eventually enqueued for processing if they are legitimate.

Valid microblocks in the parent stream may be orphaned by the child anchored block, i.e.
because the leader didn't see them in time to build off of them.
If this happens, then the orphaned microblocks are dropped.

## Block Processing

Block processing is the act of calculating the next materialized view of the
blockchain, using both the anchored block and the parent microblock stream
that connects it to its parent anchored block.
Processing the anchored block entails applying all of the transactions of its ancestor
microblocks, applying all of the anchored transactions,
and verifying that the cryptographic digest of the materialized view encoded
in the anchored block header matches the cryptographic digest calculated by applying
these transactions.
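The processing rule above can be sketched as a toy model. State here is a plain dict and the digest is a hash over its sorted entries, standing in for the real MARF commitment; transaction application is reduced to arbitrary state transitions, and all names are ours:

```python
import hashlib

def state_digest(state: dict) -> str:
    # Stand-in for the MARF root hash: hash the sorted key/value pairs.
    h = hashlib.sha256()
    for key in sorted(state):
        h.update(f"{key}={state[key]};".encode())
    return h.hexdigest()

def process_block(state: dict, microblock_txs, anchored_txs, claimed_root: str) -> bool:
    # Apply the parent microblock stream's transactions first, then the
    # anchored block's transactions, then compare digests.
    for tx in list(microblock_txs) + list(anchored_txs):
        tx(state)
    return state_digest(state) == claimed_root
```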

To begin _processing_ the anchored block and its parent microblock stream,
the peer must first ensure that:

* It has received all microblocks between the parent anchored block and the
  newly-arrived anchored block.
* The microblocks are well-formed.
* The microblocks are contiguous. That is:
   * The first microblock's sequence number is 0, and its parent block hash is
     equal to the parent anchored block's hash.
   * The *i*th microblock's parent block hash is equal to the block hash of the
     *i-1*th microblock.
   * The *i*th microblock has a sequence number that is equal to 1 + the
     sequence number of the *i-1*th microblock.
   * The last microblock's hash and sequence number match the anchored block's
     parent microblock hash and parent microblock sequence number.
   * There are at most 65536 microblocks per epoch.

If all of these are true, then the peer may proceed to process the microblocks' transactions.

To process a microblock stream, the peer will do the following for each
microblock:

1. Verify that each transaction authorization is valid. If not, then reject and
   punish the previous Stacks leader.
2. Verify that each paying account has sufficient STX to pay their transaction
   fees. If not, then reject and punish the previous Stacks leader.
3. For each transaction, grant the previous Stacks leader 40% of the transaction
   fee, and the current leader 60% of the transaction fee. This encourages the
   leader that produced the current anchored block to build on top of as many
   of the parent's microblocks as possible.

If a microblock contains an invalid transaction, then the parent block's leader forfeits their
block reward. The deepest valid microblock remains a valid chain tip to which
subsequent anchored blocks may be attached.

Once the end of the stream is reached, the peer processes the anchored block.
To process the anchored block, the peer will process the state-transitions of
each transaction iteratively.
To do so, it will first:

1. Verify that each transaction authorization is valid. If not, then the block
   and any of its descendant microblocks will be rejected, and the leader
   punished by forfeiting the block reward.
2. Verify that each paying account has sufficient assets to pay their advertised
   fees. If one or more do not, then reject the block and its descendant
   microblocks and punish the leader.
3. Determine the *K*-highest offered _STX fee rate per computation_ from all
   transactions in the parent microblock stream and the anchored block, as
   measured by computational work. Use the *K+1*-highest rate to find the price paid by
   these top-*K* transactions, and debit each spending account by this rate
   multiplied by the amount of computation used by the transaction. All other
   transactions' spending accounts are not debited any fee.

A Stacks epoch has a fixed budget of "compute units" which the leader fills up.
The fee mechanism is designed to encourage leaders to fill up their epochs with
transactions while also encouraging users to bid their honest valuation of the
compute units (see [1] for details). To do so, the Stacks peer measures a block
as *F%* full, where *F* is the fraction of compute units consumed in its epoch.
If the block consumes less than some protocol-defined fraction of
the compute units, the block is considered "under-full."

Leaders who produce under-full blocks are not given the full coinbase, but
instead given a fraction of the coinbase determined by how under-full the block
was (where an empty block receives 0 STX). In addition, the fee rate assessed
to each transaction in the epoch is set to a protocol-defined minimum rate,
equal to the minimum-relay fee rate. This is to encourage leaders to fill
up their epochs with unconfirmed transactions, even if they have low fees.
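The *K+1*-price rule in step 3 can be sketched as follows. This is a toy model: the data shapes and names are ours, and edge cases such as rate ties are glossed over:

```python
def assess_fees(txs, k):
    """txs: list of (account, fee_rate, compute_units) tuples.
    Returns a dict mapping each top-K account to the fee it is debited."""
    ranked = sorted(txs, key=lambda t: t[1], reverse=True)
    if len(ranked) <= k:
        # No (K+1)-th bid exists in this toy model, so nobody is debited.
        return {acct: 0 for acct, _, _ in ranked}
    clearing_rate = ranked[k][1]  # the (K+1)-highest rate (0-indexed)
    # The top-K transactions all pay the clearing rate, scaled by the
    # computation each used; all other transactions pay nothing.
    return {acct: clearing_rate * units for acct, _, units in ranked[:k]}
```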

Stacks leaders exclusively receive all of the transaction fees of the anchored
blocks they produce, as well as 40% of the transaction fees of the microblocks
they produce and 60% of the transaction fees of the parent microblocks they
confirm by building upon them.

Leaders do not receive their block rewards immediately. Instead, the rewards must
mature for 100 Stacks epochs before they become spendable.

### Calculating the Materialized View

Once the microblock stream and anchored block transactions have been validated,
and the peer has determined that each paying account has sufficient funds
to pay their transaction fees, the peer will process the contained Clarity
code to produce the next materialized view of the
blockchain. The peer determines that the previous leader processed them
correctly by calculating a cryptographic digest over the resulting materialized
view of the blockchain state. The digest must match the digest provided in the
anchored block. If not, then the anchored block and its parent microblock
stream are rejected, and the previous leader is punished.

Fundamentally, the materialized view of a fork is a set of sets of
key/value pairs. Each set of key/value pairs is calculated in the service
of _light clients_ who will want to query them. In this capacity,
the Stacks peer tracks the following sets of key/value pairs:

* the mapping between account addresses and their nonces and asset maps
* the mapping between fully-qualified smart contract names and a bundle of
  metadata about them (described below).
* the mapping between fully-qualified smart contract data keys and their
  associated values.

The first set of key/value pairs is the **account state**. The Stacks peer
calculates an index over all accounts in each fork as they are created.

The second set of key/value pairs is the **smart contract context state**.
It maps the _fully-qualified name_ of the smart contract to:

* the transaction ID that created the smart contract (which can be used to
  derive the contract account address and to query its code).

The fully-qualified name of a smart contract is composed of the c32check-encoded
standard account address that created it, followed by an ASCII period `.`, followed
by an ASCII-encoded string chosen by the standard account owner(s) when the contract
is instantiated (subject to the constraints mentioned in the above sections). Note that all
fully-qualified smart contract names are globally unique -- the same standard
account cannot create two smart contracts with the same name.

The third set of key/value pairs is the **smart contract data state**.
It maps the _fully-qualified_ data keys to their values. This stores
all data related to a smart contract: the values associated with data
map keys, the current value of any data variables, and the ownership
of fungible and non-fungible tokens. The construction of these keys and
values is described below.

All sets of key/value pairs are stored in the same MARF index. Keys are
prefixed with the type of state they represent in order to avoid key collisions
with otherwise-identically-named objects.

When a key/value pair is inserted into the MARF, the hash of its key is
calculated using the MARF's cryptographic hash function in order to determine
where to insert the leaf. The hash of the value is inserted as the leaf node,
and the (hash, leaf) pair is inserted into the peer's data store. This ensures
that the peer can query any key/value pair on any fork it knows about in
constant-time complexity.

The text below describes the canonical encoding of key/value pairs that will be
inserted into the MARF.

#### Calculating a Fully-Qualified Object Name

All objects' fully-qualified names start with the type of object they are,
followed by an ASCII period `.`.
This can be the ASCII string "account",
"smart-contract", "data-variable", or "data-map".

Within an object type, c32check-encoded addresses act as "namespaces" for keys in the state. In all
sets of key/value pairs, an ASCII period `.` is used to denote the separation between
the c32check-encoded address and the following name. Note that the c32 alphabet
does _not_ include the ASCII period.

#### Clarity Value Representation

Clarity values are represented through a specific binary encoding. Each value
representation is comprised of a 1-byte type ID, and a variable-length
serialized payload. The payload itself may be composed of additional Clarity
values.

The type IDs indicate the following values:
* 0x00: 128-bit signed integer
* 0x01: 128-bit unsigned integer
* 0x02: buffer
* 0x03: boolean `true`
* 0x04: boolean `false`
* 0x05: standard principal
* 0x06: contract principal
* 0x07: Ok response
* 0x08: Err response
* 0x09: None option
* 0x0a: Some option
* 0x0b: List
* 0x0c: Tuple

The serialized payloads are defined as follows:

**128-bit signed integer**

The following 16 bytes are a big-endian 128-bit signed integer.

**128-bit unsigned integer**

The following 16 bytes are a big-endian 128-bit unsigned integer.

**Buffer**

The following 4 bytes are the buffer length, encoded as a 32-bit unsigned big-endian
integer. The remaining bytes are the buffer data.

**Boolean `true`**

No bytes follow.

**Boolean `false`**

No bytes follow.

**Standard principal**

The next byte is the address version, and the following 20 bytes are the
principal's public key(s)' Hash160.

**Contract Principal**

The next byte is the address version, the following 20 bytes are a Hash160, the
21st byte is the length of the contract name, and the remaining bytes (up to
128, exclusive) encode the name itself.

**Ok Response**

The following bytes encode a Clarity value.

**Err Response**

The following bytes encode a Clarity value.

**None option**

No bytes follow.

**Some option**

The following bytes encode a Clarity value.

**List**

The following 4 bytes are the list length, encoded as a 32-bit unsigned
big-endian integer. The remaining bytes encode the length-given number of
concatenated Clarity values.

**Tuple**

The following 4 bytes are the tuple length, encoded as a 32-bit unsigned
big-endian integer. The remaining bytes are encoded as a concatenation of tuple
items. A tuple item's serialized representation is a
Clarity name (a 1-byte length and up to 128 bytes (exclusive) of valid Clarity
name characters) followed by a Clarity value.

#### Calculating the State of an Account

An account's canonical encoding is a set of key/value pairs that represent the
account's nonce, STX tokens, and assets owned.

The nonce is encoded as follows:

* Key: the string `"vm-account::"`, a c32check-encoded address, and the string
  `"::18"`
* Value: a serialized Clarity `UInt`

Example: `"vm-account::SP2RZRSEQHCFPHSBHJTKNWT86W6VSK51M7BCMY06Q::18"` refers to
the nonce of account `SP2RZRSEQHCFPHSBHJTKNWT86W6VSK51M7BCMY06Q`.

The STX balance is encoded as follows:

* Key: the string `"vm-account::"`, a Principal Address (see below), and the string
  `"::19"`
* Value: a serialized Clarity `UInt`

Example: `"vm-account::SP2RZRSEQHCFPHSBHJTKNWT86W6VSK51M7BCMY06Q::19"` refers to
the STX balance of account `SP2RZRSEQHCFPHSBHJTKNWT86W6VSK51M7BCMY06Q`.

A fungible token balance owned by an account is encoded as follows:

* Key: the string `"vm::"`, the fully-qualified contract identifier, the string `"::2::"`,
  the name of the token as defined in its Clarity contract, the string `"::"`, and the
  Principal Address of the account owning the token (see below).
* Value: a serialized Clarity `UInt`

Example: `"vm::SP13N5TE1FBBGRZD1FCM49QDGN32WAXM2E5F8WT2G.example-contract::2::example-token::SP2RZRSEQHCFPHSBHJTKNWT86W6VSK51M7BCMY06Q"`
refers to the balance of `example-token` -- a fungible token defined in contract `SP13N5TE1FBBGRZD1FCM49QDGN32WAXM2E5F8WT2G.example-contract` --
that is owned by account `SP2RZRSEQHCFPHSBHJTKNWT86W6VSK51M7BCMY06Q`.

A non-fungible token owned by an account is encoded as follows:

* Key: the string `"vm::"`, the fully-qualified contract identifier, the string `"::4::"`,
  the name of the token as defined in its Clarity contract, the string `"::"`,
  and the serialized Clarity value that represents the token.
* Value: a serialized Clarity Principal (either a Standard Principal or a Contract Principal)

Example: `"vm::SP13N5TE1FBBGRZD1FCM49QDGN32WAXM2E5F8WT2G.example-contract::4::example-nft::\x02\x00\x00\x00\x0b\x68\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64"`
refers to the non-fungible token `"hello world"` (which has type `buff` and is
comprised of 11 bytes), defined in Clarity contract `SP13N5TE1FBBGRZD1FCM49QDGN32WAXM2E5F8WT2G.example-contract`
as a non-fungible token of type `example-nft`.

A Principal Address is either a c32check-encoded address in the case of a standard
principal, or, in the case of a contract principal, a c32check-encoded address
followed by an ASCII period `.` and an ASCII-encoded contract name string.

#### Calculating the State of a Smart Contract

Smart contract state includes data variables, data maps, the contract code, and
type metadata. All of this state is represented in the MARF via a layer of indirection.

Contract and type metadata is _not_ committed to by the MARF. The MARF only
binds the contract's fully-qualified name to a "contract commitment" structure,
comprised of the contract's source code hash and the block height at which it
was instantiated.
This contract commitment, in turn, is used to refer to
implementation-defined contract analysis data, including the computed AST, cost
analysis, type information, and so on.

A contract commitment structure is comprised of the SHA512/256 hash of the
contract source code body (taken verbatim from the transaction), and the block
height at which the transaction containing it was mined. The contract
commitment is serialized as follows:

* Bytes 0-63: the ASCII-encoding of the hash
* Bytes 64-71: the ASCII-encoding of the block height, itself as a little-endian
  unsigned 32-bit integer.

Example: The contract commitment of a contract whose code's SHA512/256 hash is
`d8faa525ecb3661e7f88f0bd18b8f6676ec3c96fcd5915cf47d48778da1b7ce0` at block
height 123456 would be `"d8faa525ecb3661e7f88f0bd18b8f6676ec3c96fcd5915cf47d48778da1b7ce040e20100"`.

When processing a new contract, the Stacks node only commits to the serialized
contract commitment structure, and stores its analysis data separately. For
example, the reference implementation uses the contract commitment structure as
a key prefix in a separate key/value store for loading and storing its contract
analysis data.

The MARF commits to the contract by inserting this key/value pair:

* Key: the string `"clarity-contract::"`, followed by the fully-qualified
  contract identifier.
* Value: A serialized `ContractCommitment` structure.

Example: `"clarity-contract::SP13N5TE1FBBGRZD1FCM49QDGN32WAXM2E5F8WT2G.example-contract"`
refers to the contract commitment for the contract
`SP13N5TE1FBBGRZD1FCM49QDGN32WAXM2E5F8WT2G.example-contract`.

### Cryptographic Commitment

The various key/value sets that make up the materialized view of the fork are
each indexed within the same MARF. To validate an anchored block, each Stacks
peer will:

* Load the state of the MARF as of the anchored block's parent anchored block.
* Insert a mapping between this anchored block's height and a sentinel anchor
  hash (see below).
* Insert a mapping between the parent anchored block's height and its "anchor
  hash," derived from both the parent block's hash and the burnchain block that
  selected it (see below).
* Process all transactions in this anchored block's parent microblock stream,
  thereby adding all keys and values described above to the materialized view.
* Process all transactions in the anchored block, thereby adding all keys and
  values described above to the materialized view.
* Insert the rewards from the latest now-matured block (i.e. the
  leader reward for the Stacks block 100 epochs ago in this fork) into the
  leader rewards contract in the Stacks chain boot code. This rewards the leader
  and all users that burned in support of the leader's block.

Once this process is complete, the Stacks peer checks the root hash of its MARF
against the root hash in the anchored block. If they match, then the block is
accepted into the chain state. If they do not match, then the block is invalid.

#### Measuring Block Height

Stacks counts its forks' lengths on a per-fork basis within each fork's MARF.
To do so, a leader always inserts five key/value pairs into the MARF when it
starts processing the next cryptographic commitment: two to map the block's parent's height to
its anchor hash and vice versa, two to map this block's height to a sentinel
anchor hash (and vice versa), and one to represent this block's height.
These are always added before processing any transactions.

The anchored block's _anchor hash_ is the SHA512/256 hash of the anchored block's
header concatenated with the hash of the underlying burn chain block's header.
-
-For example, if an anchored block's header's hash is
-`7f3f0c0d5219f51459578305ed2bbc198588758da85d08024c79c1195d1cd611`, and the
-underlying burn chain's block header hash is
-`e258d248fda94c63753607f7c4494ee0fcbe92f1a76bfdac795c9d84101eb317`, then the
-(little-endian) anchor hash would be
-`7fbeb26cae32d96dbc1329f7e59f821b2c99b0a71943e153c071906ca7205f5f`. In the case
-where Bitcoin is the burn chain, the block's header hash is the double-SHA256 of
-its header, in little-endian byte order (i.e. the 0's are trailing).
-
-When beginning to process the anchored block (and similarly, when a leader
-begins to produce its anchored block), the peer adds the following key/value
-pairs to the MARF, in this order:
-
-* Key: The string
-  `"_MARF_BLOCK_HASH_TO_HEIGHT::af425f228a92ebe4d7741b129bb2c2f4326179f682da305b030250ccea9d4cd5"`
-* Value: the height of the current Stacks block, encoded as a 4-byte
-  little-endian 32-bit integer
-
-The hash `af425f228a92ebe4d7741b129bb2c2f4326179f682da305b030250ccea9d4cd5` is
-the sentinel anchor hash.  It is the SHA512/256 hash of 64 `0x01` bytes --
-equivalent to calculating an anchor hash from a Stacks block header and a burn
-chain block header whose hashes were both `0101010101010101010101010101010101010101010101010101010101010101`.
-
-* Key: The string `"_MARF_BLOCK_HEIGHT_TO_HASH::"`, followed by the ASCII string
-  representation of the block height
-* Value: the 32-byte sentinel anchor hash
-
-Example: The key `"_MARF_BLOCK_HEIGHT_TO_HASH::124"` would map to the sentinel
-anchor hash if the Stacks block being appended was the 124th block in the fork.
-
-* Key: The string `"_MARF_BLOCK_HEIGHT_SELF"`
-* Value: the ASCII representation of the block's height.
-
-Example: The key `"_MARF_BLOCK_HEIGHT_SELF"` would map to the string `"123"` if
-this was the 123rd block in this fork.
-
-* Key: The string `"_MARF_BLOCK_HEIGHT_TO_HASH::"`, followed by the ASCII string
-representation of the anchored block's parent's height.
Note that when
-processing an anchored block, the parent's block hash will be known, so the
-sentinel anchor hash is _not_ used.  The only exception is the boot block (see
-below)
-* Value: The 32-byte anchor hash of the block
-
-Example: The key `"_MARF_BLOCK_HEIGHT_TO_HASH::123"` would map to the anchor
-hash of the 123rd anchored Stacks block.
-
-* Key: The string `"_MARF_BLOCK_HASH_TO_HEIGHT::"`, followed by 64 characters in
-  the ASCII range `[0-9a-f]`.
-* Value: The little-endian 32-bit block height
-
-Example: The key `"_MARF_BLOCK_HASH_TO_HEIGHT::7fbeb26cae32d96dbc1329f7e59f821b2c99b0a71943e153c071906ca7205f5f"`
-would map to the height of the block whose anchor hash was
-`7fbeb26cae32d96dbc1329f7e59f821b2c99b0a71943e153c071906ca7205f5f`.
-
-Using these five key/value pairs, the MARF is able to represent the height of
-a fork terminating in a given block hash, and look up the height of a block in a
-fork, given its anchor hash.
-
-### Processing the Boot Block
-
-The first-ever block in the Stacks v2 chain is the **boot block**.  It contains
-a set of smart contracts and initialization code for setting up miner reward
-maturation, for handling BNS names, for migrating BNS state from Stacks v1, and
-so on.
-
-When processing the boot block, the anchor hash will always be
-`8aeecfa0b9f2ac7818863b1362241e4f32d06b100ae9d1c0fbcc4ed61b91b17a`, which is
-equal to the anchor hash calculated from a Stacks block header hash and a
-burnchain block header hash of all 0's.  The `_MARF_BLOCK_HEIGHT_TO_HASH::0` key
-will be mapped to this ASCII-encoded hash, the key `_MARF_BLOCK_HASH_TO_HEIGHT::8aeecfa0b9f2ac7818863b1362241e4f32d06b100ae9d1c0fbcc4ed61b91b17a`
-will be mapped to `"0"`, and `_MARF_BLOCK_HEIGHT_SELF` will be mapped to `"0"`. After these three keys are inserted, the block is
-processed like a normal Stacks anchored block.  The boot block has no parent,
-and so it will not have height-to-hash mappings for one.
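Taken together, the height/hash mappings described above amount to straightforward key construction. The following is a non-normative Python sketch (height-keyed `_MARF_BLOCK_HEIGHT_TO_HASH` / hash-keyed `_MARF_BLOCK_HASH_TO_HEIGHT`, with the parent's anchor hash assumed known; this is an illustration, not the reference implementation):

```python
import struct

# Sentinel anchor hash given above: SHA512/256 of 64 0x01 bytes.
SENTINEL_ANCHOR_HASH = "af425f228a92ebe4d7741b129bb2c2f4326179f682da305b030250ccea9d4cd5"

def marf_height_mappings(height, parent_anchor_hash_hex):
    """Sketch of the five height/hash key/value pairs inserted while the block
    at `height` is being processed.  Its own anchor hash is not yet known, so
    the sentinel stands in for it; the parent's anchor hash is known."""
    parent_height = height - 1
    return {
        # this block's height <-> the sentinel anchor hash
        "_MARF_BLOCK_HEIGHT_TO_HASH::%d" % height: bytes.fromhex(SENTINEL_ANCHOR_HASH),
        "_MARF_BLOCK_HASH_TO_HEIGHT::" + SENTINEL_ANCHOR_HASH: struct.pack("<I", height),
        # the parent's height <-> its known 32-byte anchor hash
        "_MARF_BLOCK_HEIGHT_TO_HASH::%d" % parent_height: bytes.fromhex(parent_anchor_hash_hex),
        "_MARF_BLOCK_HASH_TO_HEIGHT::" + parent_anchor_hash_hex: struct.pack("<I", parent_height),
        # this block's own height, as an ASCII string
        "_MARF_BLOCK_HEIGHT_SELF": str(height).encode("ascii"),
    }
```

For the 124th block in a fork whose parent's anchor hash is `7fbeb2…5f5f`, this yields exactly the mappings in the examples above.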
- -When processing a subsequent block that builds directly on top of the boot -block, the parent Stacks block header hash should be all 0's. - -### References - -[1] Basu, Easley, O'Hara, Sirer. [Towards a Functional Fee Market for Cryptocurrencies](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3318327) +This SIP is now located in the [stacksgov/sips repository](https://github.com/stacksgov/sips/blob/main/sips/sip-005/sip-005-blocks-and-transactions.md) as part of the [Stacks Community Governance organization](https://github.com/stacksgov). diff --git a/sip/sip-006-runtime-cost-assessment.md b/sip/sip-006-runtime-cost-assessment.md index 0b1db7dcd..019e6173f 100644 --- a/sip/sip-006-runtime-cost-assessment.md +++ b/sip/sip-006-runtime-cost-assessment.md @@ -1,1012 +1,5 @@ -# SIP 006 Clarity Execution Cost Assessment +# SIP-006 Clarity Execution Cost Assessment -## Preamble +This document formerly contained SIP-006 before the Stacks 2.0 mainnet launched. -Title: Clarity Execution Cost Assessment - -Author: Aaron Blankstein , Reed Rosenbluth - -Status: Draft - -Type: Standard - -Created: 10/19/2019 - -License: BSD 2-Clause - -# Abstract - -This document describes the measured costs and asymptotic costs -assessed for the execution of Clarity code. This will not specify the -_constants_ associated with those asymptotic cost functions. Those -constants will necessarily be measured via benchmark harnesses and -regression analyses. Furthermore, the _analysis_ cost associated with -this code will not be covered by this proposal. - -The asymptotic cost functions for Clarity functions are modifiable -via an on-chain voting mechanism. This enables new native functions to -be added to the language over time. - -This document also describes the memory limit imposed during contract -execution, and the memory model for enforcing that limit. - -# Measurements for Execution Cost - -Execution cost of a block of Clarity code is broken into 5 categories: - -1. 
Runtime cost: captures the number of cycles that a single
-   processor would require to process the Clarity block. This is a
-   _unitless_ metric, so it will not correspond directly to cycles,
-   but rather is meant to provide a basis for comparison between
-   different Clarity code blocks.
-2. Data write count: captures the number of independent writes
-   performed on the underlying data store (see SIP-004).
-3. Data read count: captures the number of independent reads
-   performed on the underlying data store.
-4. Data write length: the number of bytes written to the underlying
-   data store.
-5. Data read length: the number of bytes read from the underlying
-   data store.
-
-Importantly, these costs are used to set a _block limit_ for each
-block. When it comes to selecting transactions for inclusion in a
-block, miners are free to make their own choices based on transaction
-fees; however, blocks may not exceed the _block limit_. If they do,
-the block is considered invalid by the network --- none of the block's
-transactions will be materialized and the leader forfeits all rewards
-from the block.
-
-# Static versus Dynamic Cost Assessment
-
-Tracking the execution cost of a contract may be done either dynamically
-or statically. Dynamic cost assessment involves tracking, at the VM level,
-the various metrics as a contract is executed. Static cost assessment is
-performed via analysis of the contract source code, and is inherently
-a more pessimistic accounting of the execution cost: list operations
-are charged according to the _maximum_ size of the list (per the type
-annotations and inferences from the source code) and branching statements
-are charged according to the _maximum_ branch cost (per metric tracked, i.e.,
-if one branch performs 1 write and has a runtime cost of 1, and another
-branch performs 0 writes and has a runtime cost of 2, the whole statement
-will be assessed as having a maximum of 1 write and runtime cost of 2).
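The pessimistic branch accounting in the paragraph above can be sketched as a per-dimension maximum over the tracked cost categories (an illustrative sketch, not the reference implementation):

```python
# The five cost dimensions tracked per block (see the numbered list above).
DIMENSIONS = ("runtime", "write_count", "write_length", "read_count", "read_length")

def static_branch_cost(branch_a, branch_b):
    """Static assessment of a two-way branch: charge the per-dimension
    maximum of the two branches' costs (missing dimensions count as 0)."""
    return {d: max(branch_a.get(d, 0), branch_b.get(d, 0)) for d in DIMENSIONS}
```

With the worked example from the text, one branch with 1 write and runtime 1 and another with 0 writes and runtime 2 are jointly assessed at 1 write and runtime 2.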
-
-# Costs of Common Operations
-
-### Variable Lookup
-
-Looking up variables in Clarity incurs a non-constant cost -- the stack
-depth _and_ the length of the variable name affect this cost. However,
-variable names in Clarity have bounded length -- 128 characters. Therefore,
-the cost assessed for variable lookups may safely be constant with respect
-to name length.
-
-The stack depth affects the lookup cost because the variable must be
-checked for in each context on the stack.
-
-The cost model of Clarity depends on a copy-on-read semantic for
-objects. This allows operations like appends, matches, and
-wrapping/unwrapping to be constant cost, but it requires that variable
-lookups be charged for copies.
-
-Cost Function:
-
-```
-a*X+b*Y+c
-```
-
-where a, b, and c are constants,
-X := stack depth
-Y := variable size
-
-### Function Lookup
-
-Looking up a function in Clarity incurs a constant cost with respect
-to name length (for the same reason as variable lookup). However,
-because functions may only be defined in the top-level contract
-context, stack depth does not affect function lookup.
-
-Cost Function:
-
-```
-a
-```
-
-where a is a constant.
-
-### Name Binding
-
-The cost of binding a name in Clarity -- in either a local or the contract
-context -- is _constant_ with respect to the length of the name:
-
-```
-binding_cost = a
-```
-
-where a is a constant
-
-### Function Application
-
-Function application in Clarity incurs a cost in addition to the
-cost of executing the function's body. This cost is the cost of
-binding the arguments to their passed values, and the cost of
-ensuring that those arguments are of the correct type. Type checks
-and argument binding are _linear_ in the size of the arguments.
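The variable-lookup cost function above (`a*X + b*Y + c`) can be illustrated as a walk down the context stack; the constants and the size measure below are assumptions made purely for illustration:

```python
def variable_lookup_cost(contexts, name, a=1, b=1, c=1):
    """Find `name` in a stack of contexts (innermost first), charging
    a per context searched (stack depth), b per byte of the value copied
    out (copy-on-read), plus a constant c.  a, b, c are illustrative."""
    for depth, ctx in enumerate(contexts, start=1):
        if name in ctx:
            value_size = len(ctx[name])  # stand-in for the Clarity value size
            return a * depth + b * value_size + c
    raise KeyError(name)
```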
- -The cost of applying a function is: - - -``` -(a*X+b) + costEval(body) -``` - -where a and b are constants, -X := the cumulative size of the argument types, -costEval(body) := the cost of executing the body of the function - -### contract-call Transactions - -User-signed transactions for contract-calls are charged for the -application of the function, as well as the loading of the contract -data. This charge is the same as a normal contract-call. _However_, -contract principals that are supplied as trait arguments must be -checked by the runtime system to ensure that they validly implement -the trait. The cost of this check is: - -``` -read_count = 2 -read_length = trait_size + contract_size -runtime_cost = a*(contract_size) + b*(trait_size) + c -``` - -This check needs to read the trait, and then validate that the supplied -contract fulfills that trait by reading the contract in, and checking -the method signatures. This check must be performed for each such -trait parameter. - -### Type Parsing - -Parsing a type in Clarity incurs a linear cost in the size of the -AST describing the type: - -``` -type_parsing_cost(X) = (a*X+b) -``` - -where a, b, are constants, -X := the number of elements in the type description AST - -The type description AST is the tree of Clarity language elements used -for describing the type, e.g.: - -* `(list 1 uint)` - this AST has four elements: `list`, `1`, `uint` - and the parentheses containing them. -* `(response bool int)` - this AST has four elements: `response`, `bool`, `int` - and the parentheses containing them. -* `int` - this AST is just one component. - -### Function Definition - -Defining a function in Clarity incurs an execution cost at the -time of contract publishing (unrelated to any analysis). This -is the cost of _parsing_ the function's signature, which is linear -in the length of the type signatures, and linear in the length of the -function name and argument names. 
-
-```
-binding_cost + sum(a + type_parsing_cost(Y) for Y in ARG_TYPES)
-```
-
-`type_parsing_cost(Y)` := the cost of parsing argument Y
-ARG_TYPES := the function definition's argument type signatures
-and a is a constant associated with the binding of argument types.
-
-### Contract Storage Cost
-
-Storing a contract incurs both a runtime cost as well as storage costs. Both of
-these are _linear_ in the size of the contract AST.
-
-```
-WRITE_LENGTH = a*X+b
-RUNTIME_COST = c*X+d
-```
-
-where a, b, c, and d are constants.
-
-# Initial Native Function Costs
-
-These are the initial values for native function costs; however, these
-can be changed as described below in the [Cost Upgrades](#cost-upgrades)
-section of this document.
-
-## Data, Token, Contract-Calls ##
-
-### Data Lookup Costs
-
-Fetching data from the datastore requires hashing the key to be looked up.
-That cost is linear in the key size:
-
-```
-data_hash_cost(X) = a*X+b
-```
-
-X := size of the key
-
-### Data Fetching Costs
-
-Fetching data from the datastore incurs a runtime cost, in addition to
-any costs associated with MARF accesses (which are simply counted as the
-integer number of times the MARF is accessed). That runtime cost
-is _linear_ in the size of the fetched value (due to parsing).
-
-```
-read_data_cost = a*X+b
-```
-
-X := size of the fetched value.
-
-### Data Writing Costs
-
-Writing data to the datastore incurs a runtime cost, in addition to
-any costs associated with MARF writes (which are simply counted as the
-integer number of times the MARF is written). That runtime cost
-is _linear_ in the size of the written value (due to data serialization).
-
-```
-write_data_cost = a*X+b
-```
-
-X := size of the stored value.
-
-### contract-call
-
-Contract calls incur the cost of a normal function lookup and
-application, plus the cost of loading that contract into memory from
-the data store (which is linear in the size of the called contract).
-
-```
-RUNTIME_COST: (a*Y+b) + func_lookup_apply_eval(X)
-READ_LENGTH: Y
-```
-
-where a and b are constants,
-Y := called contract size
-`func_lookup_apply_eval(X)` := the cost of looking up, applying, and
-evaluating the body of the function
-
-
-Note that contract-calls that use _trait_ definitions for dynamic dispatch
-are _not_ charged at a different cost rate. Instead, there is a cost for
-looking up the trait variable (assessed as a variable lookup), and the cost
-of validating any supplied trait implementors is assessed during a transaction's
-argument validation.
-
-### map-get
-
-```
-RUNTIME_COST: data_hash_cost(X+Y) + read_data_cost(Z)
-READ_LENGTH: Z
-```
-
-X := size of the map's _key_ tuple
-Y := the length of the map's name
-Z := the size of the map's _value_ tuple
-
-
-### contract-map-get
-
-```
-RUNTIME_COST: data_hash_cost(X) + read_data_cost(Z)
-READ_LENGTH: Z
-```
-
-X := size of the map's _key_ tuple
-Z := the size of the map's _value_ tuple
-
-### map-set
-
-```
-RUNTIME_COST: data_hash_cost(X+Y) + write_data_cost(Z)
-WRITE_LENGTH: Z
-```
-
-X := size of the map's _key_ tuple
-Y := the length of the map's name
-Z := the size of the map's _value_ tuple
-
-### map-insert
-
-```
-RUNTIME_COST: data_hash_cost(X+Y) + write_data_cost(Z)
-WRITE_LENGTH: Z
-```
-
-X := size of the map's _key_ tuple
-Y := the length of the map's name
-Z := the size of the map's _value_ tuple
-
-### map-delete
-
-```
-RUNTIME_COST: data_hash_cost(X+Y) + write_data_cost(1)
-WRITE_LENGTH: 1
-```
-
-X := size of the map's _key_ tuple
-Y := the length of the map's name
-
-### var-get
-
-```
-RUNTIME_COST: data_hash_cost(1) + read_data_cost(Y)
-READ_LENGTH: Y
-```
-
-Y := the size of the variable's _value_ type
-
-### var-set
-
-```
-RUNTIME_COST: data_hash_cost(1) + write_data_cost(Y)
-WRITE_LENGTH: Y
-```
-
-Y := the size of the variable's _value_ type
-
-### nft-mint
-
-```
-RUNTIME_COST: data_hash_cost(Y) + write_data_cost(a) + b
-WRITE_LENGTH: a
-```
-
-Y := size of the NFT type
-
-a is a constant: the size of a token owner
-b is a constant cost (for tracking the asset
in the assetmap)
-
-### nft-get-owner
-
-```
-RUNTIME_COST: data_hash_cost(Y) + read_data_cost(a)
-READ_LENGTH: a
-```
-
-Y := size of the NFT type
-
-a is a constant: the size of a token owner
-
-
-### nft-transfer
-
-```
-RUNTIME_COST: data_hash_cost(Y) + write_data_cost(a) + write_data_cost(a) + b
-READ_LENGTH: a
-WRITE_LENGTH: a
-```
-
-Y := size of the NFT type
-
-a is a constant: the size of a token owner
-b is a constant cost (for tracking the asset in the assetmap)
-
-### ft-mint
-
-Minting a token is a constant-time operation that performs a constant
-number of reads and writes (to check the total supply of tokens and
-increment it).
-
-```
-RUNTIME: a
-READ_LENGTH: b
-WRITE_LENGTH: c
-```
-a, b, and c are all constants.
-
-### ft-transfer
-
-Transferring a token is a constant-time operation that performs a constant
-number of reads and writes (to check the token balances).
-
-```
-RUNTIME: a
-READ_LENGTH: b
-WRITE_LENGTH: c
-```
-a, b, and c are all constants.
-
-### ft-get-balance
-
-Getting a token balance is a constant-time operation that performs a
-constant number of reads.
-
-```
-RUNTIME: a
-READ_LENGTH: b
-```
-a and b are constants.
-
-### get-block-info
-
-```
-RUNTIME: a
-READ_LENGTH: b
-```
-
-a and b are constants.
-
-## Control-Flow and Context Manipulation
-
-### let
-
-In addition to the cost of evaluating the body expressions of a `let`,
-a `let` expression incurs a constant cost, plus
-the cost of binding each variable in the new context (similar
-to the cost of function evaluation, without the cost of type checks).
-
-
-```
-a + b * Y + costEval(body) + costEval(bindings)
-```
-
-where a and b are constants,
-Y := the number of let arguments
-costEval(body) := the cost of executing the body of the let
-costEval(bindings) := the cost of evaluating the value of each let binding
-
-### if
-
-```
-a + costEval(condition) + costEval(chosenBranch)
-```
-
-where a is a constant
-costEval(condition) := the cost of evaluating the if condition
-costEval(chosenBranch) := the cost of evaluating the chosen branch
-
-If computed during _static analysis_, the chosen branch cost is the
-`max` of the two possible branches.
-
-### asserts!
-
-```
-a + costEval(condition) + costEval(throwBranch)
-```
-
-where a is a constant
-costEval(condition) := the cost of evaluating the asserts condition
-costEval(throwBranch) := the cost of evaluating the throw branch in
-the event that condition is false
-
-If computed during _static analysis_, the thrown branch cost is always
-included.
-
-## List and Buffer iteration
-### append
-
-The cost of appending an item to a list is the cost of checking the
-type of the added item, plus some fixed cost.
-
-```
-a + b * X
-```
-
-where a and b are constants
-X := the size of the list _entry_ type
-
-### concat
-
-The cost of concatenating two lists or buffers is linear in
-the size of the two sequences:
-
-```
-a + b * (X+Y)
-```
-
-where a and b are constants
-X := the size of the right-hand iterable
-Y := the size of the left-hand iterable
-
-### as-max-len?
-
-The cost of evaluating an `as-max-len?` function is constant (the function
-is performing a constant-time length check)
-
-### map
-
-The cost of mapping a list is the cost of the function lookup,
-and the cost of each iterated function application
-
-```
-a + func_lookup_cost(F) + L * apply_eval_cost(F, i)
-```
-
-where a is a constant,
-`func_lookup_cost(F)` := the cost of looking up the function name F
-`apply_eval_cost(F, i)` := the cost of applying and evaluating the body of F on type i
-`i` := the list _item_ type
-`L` := the list length
-
-If computed during _static analysis_, L is the maximum length of the list
-as specified by its type.
-
-### filter
-
-The cost of filtering a list is the cost of the function lookup,
-and the cost of each iterated filter application
-
-```
-a + func_lookup_cost(F) + L * apply_eval_cost(F, i)
-```
-
-where a is a constant,
-`func_lookup_cost(F)` := the cost of looking up the function name F
-`apply_eval_cost(F, i)` := the cost of applying and evaluating the body of F on type i
-`i` := the list _item_ type
-`L` := the list length
-
-If computed during _static analysis_, L is the maximum length of the list
-as specified by its type.
-
-### fold
-
-
-The cost of folding a list is the cost of the function lookup,
-and the cost of each iterated application
-
-```
-a + func_lookup_cost(F) + (L) * apply_eval_cost(F, i, j)
-```
-
-where a is a constant,
-`func_lookup_cost(F)` := the cost of looking up the function name F
-`apply_eval_cost(F, i, j)` := the cost of applying and evaluating the body of F on types i, j
-`j` := the accumulator type
-`i` := the list _item_ type
-`L` := the list length
-
-If computed during _static analysis_, L is the maximum length of the list
-as specified by its type.
-
-### len
-
-The cost of getting a list length is constant, because Clarity lists
-store their lengths.
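Statically, the iterated-application costs above (`map`, `filter`, and `fold`) all share one shape; a minimal sketch, where each input is an already-computed sub-cost:

```python
def static_iteration_cost(a, func_lookup_cost, apply_eval_cost, max_len):
    """Static cost of map/filter/fold: a constant, plus the function lookup,
    plus one application cost per possible list element, where max_len (L)
    is the maximum length given by the list's type."""
    return a + func_lookup_cost + max_len * apply_eval_cost
```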
-
-### list
-
-The cost of constructing a new list is linear -- Clarity ensures that
-each item in the list is of a matching type.
-
-```
-a*X+b
-```
-
-where a and b are constants,
-X := the total size of all arguments to the list constructor
-
-### tuple
-
-The cost of constructing a new tuple is `O(nlogn)` with respect to the number of
-keys in the tuple (because tuples are represented as BTrees).
-
-```
-a*(X*log(X)) + b
-```
-
-where a and b are constants,
-X := the number of keys in the tuple
-
-### get
-
-Reading from a tuple is `O(nlogn)` with respect to the number of
-keys in the tuple (because tuples are represented as BTrees).
-
-```
-a*(X*log(X)) + b
-```
-
-where a and b are constants,
-X := the number of keys in the tuple
-
-## Option/Response Operations
-
-### match
-
-Match imposes a constant cost for evaluating the match, plus a cost for checking
-that the match-bound name does not _shadow_ a previous variable. The
-total cost of execution is:
-
-```
-a + evalCost(chosenBranch) + cost(lookupVariable)
-```
-
-where a is a constant, and `chosenBranch` is whichever branch
-is chosen by the match. In static analysis, this will be:
-`max(branch1, branch2)`
-
-### is-some, is-none, is-err, is-ok
-
-These check functions all have constant cost.
-
-### unwrap, unwrap-err, unwrap-panic, unwrap-err-panic, try!
-
-These functions all have constant cost.
-
-## Arithmetic and Logic Operations
-
-### Variadic operators
-
-The variadic operators (`+`,`-`,`/`,`*`, `and`, `or`) all have costs linear
-in the _number_ of arguments supplied
-
-```
-(a*X+b)
-```
-
-where X is the number of arguments
-
-### Binary/Unary operators
-
-The binary and unary operators:
-
-```
->
->=
-<
-<=
-mod
-pow
-xor
-not
-to-int
-to-uint
-```
-
-all have constant cost, because their inputs are all of fixed sizes.
-
-### Hashing functions
-
-The hashing functions have linear runtime costs: the larger the value being
-hashed, the longer the hashing function takes.
-
-```
-(a*X+b)
-```
-
-where X is the size of the input.
-
-
-# Memory Model and Limits
-
-Clarity contract execution imposes a maximum memory usage limit for applications.
-For any given Clarity value, the memory usage of that value is counted using
-the _size_ of the Clarity value.
-
-Memory is consumed by the following variable bindings:
-
-* `let` - each value bound in the `let` consumes that amount of memory
-  during the execution of the `let` block.
-* `match` - the bound value in a `match` statement consumes memory during
-  the execution of the `match` branch.
-* function arguments - each bound value consumes memory during the execution
-  of the function. This includes user-defined functions _as well as_ native
-  functions.
-
-Additionally, functions that perform _context changes_ also consume memory,
-though they consume a constant amount:
-
-* `as-contract`
-* `at-block`
-
-## Type signature size
-
-Types in Clarity may be described using type signatures. For example,
-`(tuple (a int) (b int))` describes a tuple with two keys `a` and `b`
-of type `int`. These type descriptions are used by the Clarity analysis
-passes to check the type correctness of Clarity code. Clarity type signatures
-have varying size, e.g., the signature `int` is smaller than the signature for a
-list of integers.
-
-The size of a Clarity value is defined as follows:
-
-```
-type_size(x) :=
-  if x =
-     int => 16
-     uint => 16
-     bool => 1
-     principal => 148
-     (buff y) => 4 + y
-     (some y) => 1 + type_size(y)
-     (ok y) => 1 + type_size(y)
-     (err y) => 1 + type_size(y)
-     (list ...) => 4 + sum(type_size(z) for z in list)
-     (tuple ...) => 1 + 2*(count(entries))
-                    + sum(type_size(z) for each value z in tuple)
-```
-
-## Contract Memory Consumption
-
-Contract execution requires loading the contract's program state in
-memory. That program state counts towards the memory limit when
-executed via a programmatic `contract-call?` or invoked by a
-contract-call transaction.
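The size recurrence above transcribes directly into code. In this sketch, values are modeled as `(kind, payload)` tuples -- a representation chosen here purely for illustration:

```python
def type_size(x):
    """Size of a Clarity value per the recurrence above.
    `x` is ("int",), ("buff", n), ("some", v), ("list", [v, ...]),
    ("tuple", {key: v, ...}), etc."""
    kind = x[0]
    if kind in ("int", "uint"):
        return 16
    if kind == "bool":
        return 1
    if kind == "principal":
        return 148
    if kind == "buff":
        return 4 + x[1]  # x[1] is the buffer length in bytes
    if kind in ("some", "ok", "err"):
        return 1 + type_size(x[1])
    if kind == "list":
        return 4 + sum(type_size(z) for z in x[1])
    if kind == "tuple":
        entries = x[1]  # dict of key -> value
        return 1 + 2 * len(entries) + sum(type_size(z) for z in entries.values())
    raise ValueError("unknown kind: %r" % kind)
```

For example, a list of two ints weighs 4 + 16 + 16 = 36, and a two-key tuple of ints weighs 1 + 2*2 + 16 + 16 = 37.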
- -The memory consumed by a contract is equal to: - -``` -a + b*contract_length + sum(size(x) for each constant x defined in the contract) -``` - -That is, a contract consumes memory which is linear in the contract's -length _plus_ the amount of memory consumed by any constants defined -using `define-constant`. - -## Database Writes - -While data stored in the database itself does _not_ count against the -memory limit, supporting public function abort/commit behavior requires -holding a write log in memory during the processing of a transaction. - -Operations that write data to the data store therefore consume memory -_until the transaction completes_, and the write log is written to the -database. The amount of memory consumed by operations on persisted data -types is defined as: - -* `data-var`: the size of the stored data var's value. -* `map`: the size of stored key + the size of the stored value. -* `nft`: the size of the NFT key -* `ft`: the size of a Clarity uint value. - -# Cost Upgrades - -In order to enable the addition of new native functions to the Clarity -language, there is a mechanism for voting on and changing a function's -_cost-assessment function_ (the asymptotic function that describes its -complexity and is used to calculate its cost). - -New functions can be introduced to Clarity by first being implemented -_in_ Clarity and published in a contract. This is necessary to avoid -a hard fork of the blockchain and ensure the availability of the function -across the network. - -If it is decided that a published function should be a part of the Clarity -language, it can then be re-implemented as a native function in Rust and -included in a future release of Clarity. Nodes running this upgraded VM -would instead use this new native implementation of the function when they -encounter Clarity code that references it via `contract-call?`. 
- -The new native function is likely faster and more efficient than the -prior Clarity implementation, so a new cost-assessment function may need -to be chosen for its evaluation. Until a new cost-assessment function is -agreed upon, the cost of the Clarity implementation will continue to be used. - -## Voting on the Cost of Clarity Functions - -New and more accurate cost-assessment functions can be agreed upon as -follows: - -1. A user formulates a cost-assessment function and publishes it in a -Clarity contract as a `define-read-only` function. -2. The user proposes the cost-assessment function by calling the -`submit-proposal` function in the **Clarity Cost Voting Contract**. -3. Voting on the proposed function ensues via the voting functions in -the **Clarity Cost Voting Contract**, and the function is either -selected as a replacement for the existing cost-assessment function, -or ignored. - -### Publishing a Cost-Assessment Function - -Cost-assessment functions to be included in proposals can be published -in Clarity contracts. They must be written precisely as follows: - -1. The function must be `read-only`. -2. The function must return a tuple with the keys `runtime`, `write_length`, -`write_count`, `read_count`, and `read_length` (the execution cost measurement -[categories](#measurements-for-execution-cost)). -3. The values of the returned tuple must be Clarity expressions representing -the asymptotic cost functions. These expressions must be limited to **arithmetic operations**. -4. Variables used in these expressions must be listed as arguments to the Clarity -function. -5. Constant factors can be set directly in the code. - -For example, suppose we have a function that implements a sorting algorithm: - -```lisp -(define-public (quick-sort (input (list 1024 int))) ... ) -``` - -The cost-assessment function should have _one_ argument, corresponding to the size -of the `input` field, e.g. 
-
-```lisp
-(define-read-only (quick-sort-cost (n uint))
-  {
-    runtime: (* n (log n)),
-    write_length: 0,
-    write_count: 0,
-    read_count: 0,
-    read_length: 0
-  })
-```
-
-Here's another example where the cost function contains constant factors:
-
-```lisp
-(define-read-only (quick-sort-cost (n uint))
-  {
-    runtime: (+ 30 (* 2 n (log n))),
-    write_length: 0,
-    write_count: 0,
-    read_count: 0,
-    read_length: 0
-  })
-```
-
-### Making a Cost-Assessment Function Proposal
-
-Stacks holders can propose cost-assessment functions by calling the
-`submit-proposal` function in the **Clarity Cost Voting Contract**.
-
-```Lisp
-(define-public (submit-proposal
-    (function-contract principal)
-    (function-name (string-ascii 128))
-    (cost-function-contract principal)
-    (cost-function-name (string-ascii 128))
-    ...
-)
-```
-
-Description of `submit-proposal` arguments:
-- `function-contract`: the principal of the contract that defines
-the function for which a cost is being proposed
-- `function-name`: the name of the function for which a cost is being proposed
-- `cost-function-contract`: the principal of the contract that defines
-the cost function being proposed
-- `cost-function-name`: the name of the cost-function being proposed
-
-If submitting a proposal for a native function included in Clarity at boot,
-provide the principal of the boot costs contract for the `function-contract`
-argument, and the name of the corresponding cost function for the `function-name`
-argument.
-
-This function will return a response containing the proposal ID, if successful,
-and an error otherwise.
-
-Usage:
-```Lisp
-(contract-call?
-    .cost-voting-contract
-    "submit-proposal"
-    .function-contract
-    "function-name"
-    .new-cost-function-contract
-    "new-cost-function-name"
-)
-```
-
-### Viewing a Proposal
-
-To view cost-assessment function proposals, you can use the
-`get-proposal` function in the **Clarity Cost Voting Contract**:
-
-```Lisp
-(define-read-only (get-proposal (proposal-id uint)) ... )
-```
-
-This function takes a `proposal-id` and returns a response containing the proposal
-data tuple, if a proposal with the supplied ID exists, or an error code. Proposal
-data tuples contain information about the state of the proposal, and take the
-following form:
-
-```Lisp
-{
-    cost-function-contract: principal,
-    cost-function-name: (string-ascii 128),
-    function-contract: principal,
-    function-name: (string-ascii 128),
-    expiration-block-height: uint
-}
-```
-
-Usage:
-```Lisp
-(contract-call? .cost-voting-contract "get-proposal" 123)
-```
-
-### Voting
-
-#### Stacks Holders
-
-Stacks holders can vote for cost-assessment function proposals by calling the
-**Clarity Cost Voting Contract's** `vote-proposal` function. The `vote-proposal`
-function takes two arguments, `proposal-id` and `amount`. `proposal-id` is the ID
-of the proposal being voted for. `amount` is the amount of STX the voter would like
-to vote *with*. The amount of STX you include is the number of votes you are casting.
-
-Calling the `vote-proposal` function authorizes the contract to transfer STX out
-of the caller's address, and into the address of the contract. The equivalent
-amount of `cost-vote-tokens` will be distributed to the voter, representing the
-amount of STX they have staked in the voting contract.
-
-STX staked for voting can be withdrawn from the voting contract by the voter with
-the `withdraw-votes` function. If staked STX are withdrawn prior to confirmation,
-they will not be counted as votes.
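The stake-then-withdraw accounting described above can be modeled in a few lines. This is a toy sketch of the behavior, not the voting contract's actual implementation; the method names merely mirror the contract functions:

```python
class CostVoteModel:
    """Toy model: STX staked on a proposal counts as votes only while it
    remains staked (withdrawing before confirmation removes the votes)."""

    def __init__(self):
        self.stakes = {}  # (voter, proposal_id) -> amount of STX staked

    def vote_proposal(self, voter, proposal_id, amount):
        # staking `amount` STX casts that many votes for the proposal
        key = (voter, proposal_id)
        self.stakes[key] = self.stakes.get(key, 0) + amount

    def withdraw_votes(self, voter, proposal_id, amount):
        # withdrawing staked STX removes the corresponding votes
        key = (voter, proposal_id)
        if self.stakes.get(key, 0) < amount:
            raise ValueError("not enough staked STX")
        self.stakes[key] -= amount

    def votes_for(self, proposal_id):
        return sum(v for (_, pid), v in self.stakes.items() if pid == proposal_id)
```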
-
-Upon withdrawal, the voter permits the contract to reclaim allocated `CFV` tokens,
-and will receive the equivalent amount of their staked STX tokens.
-
-**Voting example**
-```Lisp
-(contract-call? .cost-voting-contract "vote-proposal" 123 10000)
-```
-
-The `vote-proposal` function will return a successful response if the STX were staked
-for voting, or an error if the staking failed.
-
-**Withdrawal example**
-```Lisp
-(contract-call? .cost-voting-contract "withdraw-votes" 123 10000)
-```
-
-Like the `vote-proposal` function, the `withdraw-votes` function expects a `proposal-id` and
-an `amount`, letting the voter withdraw some or all of their staked STX. This function
-will return a successful response if the STX were withdrawn, and an error otherwise.
-
-#### Miner Veto
-
-Miners can vote *against* (veto) cost-function proposals by creating a transaction that
-calls the **Clarity Cost Voting Contract's** `veto` function and mining
-a block that includes this transaction. The `veto` function won't count
-the veto if the block wasn't mined by the node that signed that transaction.
-In other words, miners must **commit** their veto with their mining power.
-
-Usage:
-```Lisp
-(contract-call? .cost-voting-contract "veto" 123)
-```
-
-This function will return a successful response if the veto was counted, or
-an error if the veto failed.
-
-### Confirming the Result of a Vote
-
-In order for a cost-function proposal to get successfully voted in, it must be
-**confirmed**. Confirmation is a two-step process, involving calling the `confirm-votes`
-function _before_ the proposal has expired to confirm the vote threshold was met,
-and calling the `confirm-miners` function _after_ to confirm that the proposal wasn't vetoed
-by miners.
-
-#### Confirm Votes
-
-Any Stacks holder can call the `confirm-votes` function in the **Clarity
-Cost Voting Contract** to attempt confirmation.
`confirm-votes` will return a
-success response and the proposal will become **vote confirmed** if the
-following criteria are met:
-
-1. The proposal must receive votes representing 20% of the liquid supply of
-STX. This is calculated as follows:
-
-```Lisp
-(>= (/ (* votes u100) stx-liquid-supply) u20)
-```
-
-2. The proposal must not be expired, meaning its `expiration-height` must
-not have been reached.
-
-Usage:
-```Lisp
-(contract-call? .cost-voting-contract "confirm-votes" 123)
-```
-
-#### Confirm Miners
-
-Like `confirm-votes`, any Stacks holder can call the `confirm-miners`
-function. `confirm-miners` will return a success response and the proposal
-will become **miner confirmed** if the following criteria are met:
-
-1. The number of vetoes is less than 500.
-2. Fewer than 10 proposals have already been confirmed in the current block.
-3. The proposal has expired.
-
-Usage:
-```Lisp
-(contract-call? .cost-voting-contract "confirm-miners" 123)
-```
+This SIP is now located in the [stacksgov/sips repository](https://github.com/stacksgov/sips/blob/main/sips/sip-006/sip-006-runtime-cost-assessment.md) as part of the [Stacks Community Governance organization](https://github.com/stacksgov).
diff --git a/sip/sip-007-stacking-consensus.md b/sip/sip-007-stacking-consensus.md
index 6a8a599df..37afb0723 100644
--- a/sip/sip-007-stacking-consensus.md
+++ b/sip/sip-007-stacking-consensus.md
@@ -1,611 +1,5 @@
-# SIP-007: Stacking Consensus
+# SIP-007 Stacking Consensus
-
-# Preamble
+This document formerly contained SIP-007 before the Stacks 2.0 mainnet launched.
-
-Title: Stacking Consensus
-
-Authors:
-
-    Muneeb Ali ,
-    Aaron Blankstein ,
-    Michael J. Freedman ,
-    Diwaker Gupta ,
-    Jude Nelson ,
-    Jesse Soslow ,
-    Patrick Stanley
-
-Status: Draft
-
-Type: Standard
-
-Created: 01/14/2020
-
-# Abstract
-
-This SIP proposes a new consensus algorithm, called Stacking, that
-uses the proof-of-work cryptocurrency of an established blockchain to
-secure a new blockchain.
An economic benefit of the Stacking consensus -algorithm is that the holders of the new cryptocurrency can earn a -reward in a base cryptocurrency by actively participating in the -consensus algorithm. - -This SIP proposes to change the mining mechanism of the Stacks -blockchain. [SIP-001](./sip-001-burn-election.md) introduced -proof-of-burn (PoB) where a base cryptocurrency is destroyed to -participate in mining of a new cryptocurrency. This proposal argues -that a new mining mechanism called proof-of-transfer (PoX) will be an -improvement over proof-of-burn. - -With proof-of-transfer, instead of destroying the base cryptocurrency, -miners are required to distribute the base cryptocurrency to existing -holders of the new cryptocurrency who participate in the consensus -algorithm. Therefore, existing holders of the new cryptocurrency have -an economic incentive to participate, do useful work for the network, -and receive rewards. - -Proof-of-transfer avoids burning of the base cryptocurrency which -destroys some supply of the base cryptocurrency. Stacking in general -can be viewed as a more "efficient" algorithm where instead of -destroying a valuable resource (like electricity or base -cryptocurrency), the valuable resource is distributed to holders of -the new cryptocurrency. - -The SIP describes one potential implementation of the Stacking -consensus algorithm for the Stacks blockchain using Bitcoin as the -base cryptocurrency. - -# Introduction - -Consensus algorithms for public blockchains require computational or -financial resources to secure the blockchain state. Mining mechanisms -used by these algorithms are broadly divided into proof-of-work (PoW), -in which nodes dedicate computational resources, and proof-of-stake -(PoS), in which nodes dedicate financial resources. 
The intention -behind both proof-of-work and proof-of-stake is to make it practically -infeasible for any single malicious actor to have enough computational -power or ownership stake to attack the network. - -With proof-of-work, a miner does some "work" that consumes electricity -and is rewarded with digital currency. The miner is, theoretically, -converting electricity and computing power into the newly minted -digital currency. Bitcoin is an example of this and is by far the -largest and most secure PoW blockchain. - -With proof-of-stake, miners stake their holdings of a new digital -currency to participate in the consensus algorithm and bad behavior -can be penalized by "slashing" the funds of the miner. PoS requires -less energy/electricity to be consumed and can give holders of the new -cryptocurrency who participate in staking a reward on their holdings -in the new cryptocurrency. - -In this SIP we introduce a new consensus algorithm called -Stacking. The Stacking consensus algorithm uses a new type of mining -mechanism called *proof-of-transfer* (PoX). With PoX, miners are not -converting electricity and computing power to newly minted tokens, nor -are they staking their cryptocurrency. Rather they use an existing PoW -cryptocurrency to secure a new, separate blockchain. - -This SIP is currently a draft and proposes to change the mining -mechanism of the Stacks blockchain from proof-of-burn (SIP-001) to -proof-of-transfer. - -The PoX mining mechanism is a modification of proof-of-burn (PoB) -mining (See -the [Blockstack Technical Whitepaper](https://blockstack.org/papers) -and [SIP-001](./sip-001-burn-election.md)). In -proof-of-burn mining, miners burn a base cryptocurrency to participate -in mining — effectively destroying the base cryptocurrency to mint -units of a new cryptocurrency. **In proof-of-transfer, rather than -destroying the base cryptocurrency, miners transfer the base -cryptocurrency as a reward to owners of the new cryptocurrency**. 
In
-the case of the Stacks blockchain, miners would transfer Bitcoin to
-owners of Stacks tokens in order for miners to receive newly-minted
-Stacks tokens. The security properties of proof-of-transfer are
-comparable to proof-of-burn.
-
-# Stacking with Bitcoin
-
-In the Stacking consensus protocol, we require the base cryptocurrency
-to be a proof-of-work blockchain. In this proposed implementation of
-Stacking, we assume that the PoW cryptocurrency is Bitcoin, given it
-is by far the most secure PoW blockchain. Theoretically, other PoW
-blockchains can be used, but the security properties of Bitcoin are
-currently superior to those of other PoW blockchains.
-
-As with PoB, in PoX, the protocol selects the winning miner (*i.e.*,
-the leader) of a round using a verifiable random function (VRF). The
-leader writes the new block of the Stacks blockchain and mints the
-rewards (newly minted Stacks). However, instead of bitcoins being sent
-to burn addresses, the bitcoins are sent to a set of specific
-addresses corresponding to Stacks (STX) token holders that are adding
-value to the network. Thus, rather than being destroyed, the bitcoins
-consumed in the mining process go to productive Stacks holders as a
-reward based on their holdings of Stacks and participation in the
-Stacking algorithm.
-
-# Stacking Consensus Algorithm
-
-In addition to the normal tasks of PoB mining
-(see [SIP-001](./sip-001-burn-election.md)), the Stacking consensus
-algorithm *must* determine the set of addresses that miners may
-validly transfer funds to. PoB mining does not need to perform these
-steps, because the address is always the same — the burn
-address. However, with Stacking, network participants must be able to
-validate the addresses that funds are sent to.
-
-Progression in Stacking consensus happens over *reward cycles*.
In
-each reward cycle, a set of Bitcoin addresses is iterated over, such
-that each Bitcoin address in the set of reward addresses has exactly
-one Bitcoin block in which miners will transfer funds to the reward
-address.
-
-To qualify for a reward cycle, an STX holder must:
-
-* Control a Stacks wallet with >= 0.02% of the total share of unlocked
-  Stacks tokens (currently, there are ~470m unlocked Stacks tokens,
-  meaning this would require ~94k Stacks). This threshold level
-  adjusts based on the participation levels in the Stacking protocol.
-* Broadcast a signed message before the reward cycle begins that:
-  * Locks the associated Stacks tokens for a protocol-specified
-    lockup period.
-  * Specifies a Bitcoin address to receive the funds.
-  * Votes on a Stacks chain tip.
-
-Miners participating in the Stacks blockchain compete to lead blocks
-by transferring Bitcoin. Leaders for particular Stacks blocks are
-chosen by sortition, weighted by the amount of Bitcoin sent (see
-SIP-001). Before a reward cycle begins, the Stacks network must reach
-consensus on which addresses are valid recipients. Reaching consensus
-on this is non-trivial: the Stacks blockchain itself has many
-properties independent from the Bitcoin blockchain, and may experience
-forks, missing block data, etc., all of which make reaching consensus
-difficult. As an extreme example, consider a miner that forks the
-Stacks chain with a block that claims to hold a large fraction (e.g.,
-100%) of all Stacks holdings, and proceeds to issue block commitments
-that pay all of the fees to themselves. How can other nodes on the
-network detect that this miner’s commitment transfers are invalid?
-
-The Stacking algorithm addresses this with a two-phase cycle. Before
-each reward cycle, Stacks nodes engage in a *prepare* phase, in which
-two items are decided:
-
-1. An **anchor block** — the anchor block is a Stacks chain block.
For
-   the duration of the reward cycle, mining any descendant forks of
-   the anchor block requires transferring mining funds to the
-   appropriate reward addresses.
-2. The **reward set** -- the reward set is the set of Bitcoin
-   addresses which will receive funds in the reward cycle. This set is
-   determined using Stacks chain state from the anchor block.
-
-During the reward cycle, miners contend with one another to become the
-leader of the next Stacks block by broadcasting *block commitments* on
-the Bitcoin chain. These block commitments send Bitcoin funds to
-either a burn address or a PoX reward address.
-
-Address validity is determined according to two different rules:
-
-1. If a miner is building off of any chain tip *which is not a
-   descendant of the anchor block*, all of the miner's commitment
-   funds must be burnt.
-2. If a miner is building off a descendant of the anchor block, the
-   miner must send commitment funds to 2 addresses from the reward
-   set, chosen as follows:
-   * Use the verifiable random function (also used by sortition) to
-     choose 2 addresses from the reward set. These 2 addresses are
-     the reward addresses for this block.
-   * Once addresses have been chosen for a block, these addresses are
-     removed from the reward set, so that future blocks in the reward
-     cycle do not repeat the addresses.
-
-Note that the verifiable random function (VRF) used for address
-selection ensures that the same addresses are chosen by each miner
-selecting reward addresses. If a miner submits a burn commitment which
-*does not* send funds to a valid address, that commitment is
-ignored by the rest of the network (because other network participants
-can deduce that the transfer addresses are invalid).
-
-To reduce the complexity of the consensus algorithm, Stacking reward
-cycles are fixed length --- if fewer addresses participate in the
-Stacking rewards than there are slots in the cycle, then the remaining
-slots are filled with *burn* addresses.
Burn addresses are included
-in miner commitments at fixed intervals (e.g., if there are 1000 burn
-addresses for a reward cycle, then each miner commitment would have
-1 burn address as an output).
-
-## Adjusting Reward Threshold Based on Participation
-
-Each reward cycle may transfer miner funds to up to 4000 Bitcoin
-addresses (2 addresses per block over a 2,000-burn-block cycle). To
-ensure that this number of addresses is sufficient to cover the pool of
-participants (given 100% participation of liquid STX), the threshold
-for participation must be 0.025% (1/4000th) of the liquid supply of
-STX. However, if participation is _lower_ than 100%, the reward pool
-could admit holders with smaller STX balances. The Stacking protocol
-specifies **2 operating levels**:
-
-* **25%** If fewer than `0.25 * STX_LIQUID_SUPPLY` STX participate in
-  a reward cycle, participant wallets controlling `x` STX may include
-  `floor(x / (0.0000625*STX_LIQUID_SUPPLY))` addresses in the reward set.
-  That is, the minimum participation threshold is 1/16,000th of the liquid
-  supply.
-* **25%-100%** If between `0.25 * STX_LIQUID_SUPPLY` and `1.0 *
-  STX_LIQUID_SUPPLY` STX participate in a reward cycle, the reward
-  threshold is optimized in order to maximize the number of slots that
-  are filled. That is, the minimum threshold `T` for participation will be
-  roughly 1/4,000th of the participating STX (adjusted in increments
-  of 10,000 STX). Participant wallets controlling `x` STX may
-  include `floor(x / T)` addresses in the
-  reward set.
-
-In the event that a Stacker signals and locks up enough STX to submit
-multiple reward addresses, but only submits one reward address, that
-reward address will be included in the reward set multiple times.
-
-## Submitting Reward Address and Chain Tip Signaling
-
-Stacking participants must broadcast signed messages for three purposes:
-
-1. Indicating to the network how many STX should be locked up, and for
-   how many reward cycles.
-2. 
Indicating support for a particular chain tip.
-3. Specifying the Bitcoin address for receiving Stacking rewards.
-
-These messages may be broadcast either on the Stacks chain or the
-Bitcoin chain. If broadcast on the Stacks chain, these messages must
-be confirmed on the Stacks chain _before_ the anchor block for the
-reward period. If broadcast on the Bitcoin chain, they may be
-broadcast during the prepare phase, but must be included before
-the prepare phase finishes.
-
-These signed messages are valid for at most 12 reward cycles (25200 Bitcoin
-blocks or ~7 months). If the signed message specifies a lockup period `x` less
-than 25200 blocks, then the signed message is only valid for Stacking
-participation for `floor(x / 2100)` reward cycles (the minimum participation
-length is one cycle: 2100 blocks).
-
-# Anchor Blocks and Reward Consensus
-
-In the **prepare** phase of the Stacking algorithm, miners and network
-participants determine the anchor block and the reward set. The
-prepare phase is a window `w` of Bitcoin blocks *before* the reward
-cycle begins (e.g., the window may be 100 Bitcoin blocks).
-
-At a high level, nodes determine whether any block was confirmed by
-`F*w` blocks during the phase, where `F` is a large fraction (e.g.,
-`0.8`). Once the window `w` closes at time `cur`, Stacks nodes find
-the potential anchor block as described in the following pseudocode:
-
-```python
-def find_anchor_block(cur):
-    blocks_worked_on = get_all_stacks_blocks_between(cur - w, cur)
-
-    # get the highest/latest ancestor before the PREPARE phase for each block worked
-    # on during the PREPARE phase.
-
-    candidate_anchors = {}
-    for block in blocks_worked_on:
-        pre_window_ancestor = last_ancestor_of_block_before(block, cur - w)
-        if pre_window_ancestor is None:
-            continue
-        if pre_window_ancestor in candidate_anchors:
-            candidate_anchors[pre_window_ancestor] += 1
-        else:
-            candidate_anchors[pre_window_ancestor] = 1
-
-    # if any block is confirmed by at least F*w, then it is the anchor block.
-    for candidate, confirmed_by_count in candidate_anchors.items():
-        if confirmed_by_count >= F*w:
-            return candidate
-
-    return None
-```
-
-Note that there can be at most one anchor block (so long as `F >
-0.5`), because:
-
-* Each of the `w` blocks in the prepare phase has at most one
-  candidate ancestor.
-* The total possible number of confirmations for anchor blocks is `w`.
-* If any block is confirmed by more than `0.5*w`, then any other block
-  must have been confirmed by fewer than `0.5*w` blocks.
-
-The prepare phase, and the high threshold for `F`, are necessary to
-protect the Stacking consensus protocol from damage due to natural
-forks, missing block data, and potentially malicious participants. As
-proposed, PoX and the Stacking protocol require that Stacks nodes are
-able to use the anchor block to determine the *reward set*. If, by
-accident or malice, the data associated with the anchor block is
-unavailable to nodes, then the Stacking protocol cannot operate
-normally — nodes cannot know whether or not a miner is submitting
-valid block commitments. A high threshold for `F` ensures that a large
-fraction of the Stacks mining power has confirmed the receipt of the
-data associated with the anchor block.
-
-## Recovery from Missing Data
-
-In the extreme event that a malicious miner *is* able to get a hidden
-or invalid block accepted as an anchor block, Stacks nodes must be
-able to continue operation.
To do so, Stacks nodes treat missing
-anchor block data as if no anchor block was chosen for the reward
-cycle — the only valid election commitments will therefore be *burns*
-(this is essentially a fallback to PoB). If anchor block data that
-was previously missing is revealed to the Stacks node, it must
-reprocess all of the leader elections for that anchor block's
-associated reward cycle, because there may now be many commitments
-which were previously invalid that are now valid.
-
-Reprocessing leader elections is computationally expensive, and
-would likely result in a large reorganization of the Stacks
-chain. However, such an election reprocessing may only occur once per
-reward window (only one valid anchor block may exist for a reward
-cycle, whether it was hidden or not). Crucially, intentionally
-performing such an attack would require collusion amongst a large
-fraction `F` of the Stacks mining power — because such a hidden block
-must have been confirmed by `w*F` subsequent blocks. If collusion
-amongst such a large fraction of the Stacks mining power is possible,
-we contend that the security of the Stacks chain would be compromised
-through other means beyond attacking anchor blocks.
-
-## Anchoring with Stacker Support
-
-The security of anchor block selection is further increased through
-Stacker support transactions. In this protocol, when Stacking
-participants broadcast their signed participation messages, they
-signal support of anchor blocks. This is specified by the chain tip's
-hash, and the support signal is valid as long as the message itself is
-valid.
-
-This places an additional requirement on anchor block selection. In
-addition to an anchor block needing to reach a certain number of miner
-confirmations, it must also pass some threshold `t` of valid Stacker
-support message signals.
This places an additional burden on an anchor -block attack --- not only must the attacker collude amongst a large -fraction of mining power, but they must also collude amongst a -majority of the Stacking participants in their block. - -# Stacker Delegation - -The process of delegation allows a Stacks wallet address (the -represented address) to designate another address (the delegate -address) for participating in the Stacking protocol. This delegate -address, for as long as the delegation is valid, is able to sign and -broadcast Stacking messages (i.e., messages which lock up Stacks, -designate the Bitcoin reward address, and signal support for chain -tips) on behalf of the represented address. This allows the owner of -the represented address to contribute to the security of the network -by having the delegate address signal support for chain tips. This -combats potential attacks on the blockchain stability by miners that -may attempt to mine hidden forks, hide eventually invalid forks, and -other forms of miner misbehavior. - -Supporting delegation adds two new transaction types to the Stacks -blockchain: - -* **Delegate Funds.** This transaction initiates a - represented-delegate relationship. It carries the following data: - * Delegate address - * End Block: the Bitcoin block height at which this relationship - terminates, unless a subsequent delegate funds transaction updates - the relationship. - * Delegated Amount: the total amount of STX from this address - that the delegate address will be able to issue Stacking messages - on behalf of. - * Reward Address (_optional_): a Bitcoin address that must be - designated as the funds recipient in the delegate’s Stacking - messages. If unspecified, the delegate can choose the address. -* **Terminate Delegation.** This transaction terminates a - represented-delegate relationship. 
It carries the following data:
-  * Delegate Address
-
-_Note_: There is only ever one active represented-delegate
-relationship between a given represented address and delegate address
-(i.e., the pair _(represented-address, delegate-address)_ uniquely
-identifies a relationship). If a represented-delegate relationship is
-still active and the represented address signs and broadcasts a new
-"delegate funds" transaction, the information from the new transaction
-replaces the prior relationship.
-
-Both types of delegation transactions must be signed by the
-represented address. These are transactions on the Stacks blockchain,
-and will be implemented via a native smart contract, loaded into the
-blockchain during the Stacks 2.0 genesis block. These transactions,
-therefore, are `contract-call` invocations. The invoked methods are
-guarded by:
-
-```
- (asserts! (is-eq contract-caller tx-sender) (err u0))
-```
-
-This ensures that the methods can only be invoked by direct
-transaction execution.
-
-**Evaluating Stacking messages in the context of delegation.** In
-order to determine which addresses’ STX should be locked by a given
-Stacking message, each Stacking message must name the represented
-address. Therefore, if a single Stacks address is the
-delegate for many represented Stacks addresses, the delegate address
-must broadcast a Stacking message for each of the represented
-addresses.
-
-# Addressing Miner Consolidation in Stacking
-
-PoX, when used for Stacking rewards, could lead to miner
-consolidation. Because miners that _also_ participate as Stackers
-could gain an advantage over miners who do not participate as
-Stackers, miners would be strongly incentivized to buy Stacks and use
-it to crowd out other miners. In the extreme case, this consolidation
-could lead to centralization of mining, which would undermine the
-decentralization goals of the Stacks blockchain.
While we are actively
-investigating additional mechanisms to address this potential
-consolidation, we propose a time-bounded PoX mechanism and a
-Stacker-driven mechanism here.
-
-**Time-Bounded PoX.** Stacking rewards incentivize miner consolidation
-if miners obtain _permanent_ advantages for obtaining the new
-cryptocurrency. However, by limiting the time period of PoX, this
-advantage declines over time. To do this, we define two time periods for PoX:
-
-1. **Initial Phase.** In this phase, Stacking rewards proceed as
-   described above -- commitment funds are sent to Stacking rewards
-   addresses, except if a miner is not mining a descendant of the
-   anchor block, or if the registered reward addresses for a given
-   reward cycle have all been exhausted. This phase will last for
-   approximately 2 years (100,000 Bitcoin blocks).
-
-2. **Sunset Phase.** After the initial phase, a _sunset_ block is
-   determined. This sunset block will be ~8 years (400,000 Bitcoin
-   blocks) after the sunset phase begins. After the sunset block,
-   _all_ miner commitments must be burned, rather than transferred to
-   reward addresses. During the sunset phase, the reward / burn ratio
-   linearly decreases by `0.25%` (1/400) on each reward cycle, such
-   that in the 200th reward cycle, the fraction of committed funds
-   transferred to reward addresses (versus burnt) must be equal to
-   `0.5`. For example, if a miner commits 10 BTC, the miner must send
-   5 BTC to reward addresses and 5 BTC to the burn address.
-
-By time-bounding the PoX mechanism, we allow the Stacking protocol to
-use PoX to help bootstrap support for the new blockchain, providing
-miners and holders with incentives for participating in the network
-early on. Then, as natural use cases for the blockchain develop and
-gain steam, the PoX system could gradually scale down.
-
-**Stacker-driven PoX.** To further discourage miners from consolidating,
-holders of liquid (i.e. 
non-Stacked) STX tokens may vote to disable PoX in the upcoming
-reward cycle. This can be done with any amount of STX, and the act of voting
-to disable PoX does not lock the tokens.
-
-This allows a community of vigilant
-users to guard the chain from bad miner behavior arising from consolidation
-on a case-by-case basis. Specifically, if a fraction _R_ of liquid STX
-tokens vote to disable PoX, it is disabled
-only for the next reward cycle. To continuously deactivate PoX, the STX
-holders must continuously vote to disable it.
-
-Due to the costs of remaining vigilant, this proposal recommends _R = 0.25_.
-At the time of this writing, this is higher than any single STX allocation, but
-not so high that large-scale cooperation is needed to stop a mining cartel.
-
-# Bitcoin Wire Formats
-
-Supporting PoX in the Stacks blockchain requires modifications to the
-wire format for leader block commitments, and the introduction of new
-wire formats for burnchain PoX participation (e.g., performing the STX
-lockup on the burnchain).
-
-## Leader Block Commits
-
-For PoX, leader block commitments are similar to PoB block commits: the constraints on the
-BTC transaction's inputs are the same, and the `OP_RETURN` output is identical. However,
-the _burn output_ is no longer the same. For PoX, the following constraints are applied to
-the second through nth outputs:
-
-1. If the block commitment is in a reward cycle, with a chosen anchor block, and this block
-   commitment builds off a descendant of the PoX anchor block (or the anchor block itself),
-   then the commitment must use the chosen PoX recipients for the current block.
-   a. PoX recipients are chosen as described in "Stacking Consensus Algorithm": addresses
-      are chosen without replacement, by using the previous burn block's sortition hash,
-      mixed with the previous burn block's burn header hash as the seed for the ChaCha12
-      pseudorandom function to select M addresses.
-   b. 
The leader block commit transaction must use the selected M addresses as outputs [1, M].
-      That is, the second through (M+1)th outputs correspond to the selected PoX addresses.
-      The order of these addresses does not matter. Each of these outputs must receive the
-      same amount of BTC.
-   c. If the number of remaining addresses in the reward set N is less than M, then the leader
-      block commit transaction must burn BTC by including (M-N) burn outputs.
-2. Otherwise, the second through (M+1)th outputs must be burn addresses, and the amount burned by
-   these outputs will be counted as the amount committed to by the block commit.
-
-In addition, during the sunset phase (i.e., between the 100,000th and 500,000th burn block in the chain),
-the miner must include a _sunset burn_ output. This is an (M+1)-indexed output that includes the burn amount
-required to fulfill the sunset burn ratio, and must be sent to the burn address:
-
-```
-sunset_burn_amount = (total_block_commit_amount) * (reward_cycle_start_height - 100,000) / (400,000)
-```
-
-Where `total_block_commit_amount` is equal to the sum of outputs [1, M+1].
-
-After the sunset phase _ends_ (i.e., blocks >= 500,000th burn block), block commits are _only_ burns, with
-a single burn output at index 1.
-
-## STX Operations on Bitcoin
-
-As described above, PoX allows stackers to submit `stack-stx`
-operations on Bitcoin as well as on the Stacks blockchain. The Stacks
-chain also allows addresses to submit STX transfers on the Bitcoin
-chain. Such operations are only evaluated by the miner of an anchor block
-elected in the burn block that immediately follows the burn block that included the
-operations. For example, if a `TransferStxOp` occurs in burnchain block 100, then the
-Stacks block elected by burnchain block 101 will process that transfer.
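The sunset-burn formula above can be sketched as a small helper (a Python illustration of the formula as written; the function name and the integer-division rounding are assumptions of this sketch, not part of the SIP):

```python
SUNSET_START = 100_000   # burn height where the sunset phase begins (per this SIP)
SUNSET_LENGTH = 400_000  # burn blocks until commits must be 100% burned

def sunset_burn_amount(total_block_commit_amount, reward_cycle_start_height):
    """Portion of a block commit that must go to the burn address
    during the sunset phase (a linear ramp from 0% to 100%)."""
    if reward_cycle_start_height < SUNSET_START:
        return 0  # initial phase: no sunset burn required
    if reward_cycle_start_height >= SUNSET_START + SUNSET_LENGTH:
        return total_block_commit_amount  # after the sunset block: burn everything
    elapsed = reward_cycle_start_height - SUNSET_START
    return total_block_commit_amount * elapsed // SUNSET_LENGTH
```

Halfway through the sunset phase (burn height 300,000), half of each commitment must be burned.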
-
-In order to submit an operation on the Bitcoin chain, stackers must submit two
-Bitcoin transactions: a `PreStxOp`, followed by either a `StackStxOp` or a
-`TransferStxOp`:
-
-* `PreStxOp`: this operation prepares the Stacks blockchain node to validate the subsequent
-  `StackStxOp` or `TransferStxOp`.
-* `StackStxOp`: this operation executes the `stack-stx` operation.
-* `TransferStxOp`: this operation transfers STX from a sender to a recipient.
-
-The wire formats for the above operations are as follows:
-
-### PreStxOp
-
-This operation includes an `OP_RETURN` output for the first Bitcoin output that looks as follows:
-
-```
-  0      2  3
-  |------|--|
-   magic  op
-```
-
-Where `op = p` (ascii encoded).
-
-Then, the second Bitcoin output _must_ be the Stacker address that will be used in a `StackStxOp`. This
-address must be a standard address type parseable by the stacks-blockchain node.
-
-### StackStxOp
-
-The first input to the Bitcoin operation _must_ consume a UTXO that is
-the second output of a `PreStxOp`. This validates that the `StackStxOp` was signed
-by the appropriate Stacker address.
-
-This operation includes an `OP_RETURN` output for the first Bitcoin output:
-
-```
-  0      2  3                             19        20
-  |------|--|-----------------------------|---------|
-   magic  op      uSTX to lock (u128)      cycles (u8)
-```
-
-Where `op = x` (ascii encoded), and the unsigned integer is big-endian encoded.
-
-The second Bitcoin output will be used as the reward address for any stacking rewards.
-
-### TransferStxOp
-
-The first input to the Bitcoin operation _must_ consume a UTXO that is
-the second output of a `PreStxOp`. This validates that the `TransferStxOp` was signed
-by the appropriate STX address.
-
-This operation includes an `OP_RETURN` output for the first Bitcoin output:
-
-```
-  0      2  3                             19               80
-  |------|--|-----------------------------|----------------|
-   magic  op    uSTX to transfer (u128)    memo (up to 61 bytes)
-```
-
-Where `op = $` (ascii encoded), and the unsigned integer is big-endian encoded.
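The `OP_RETURN` layouts above can be sketched as byte-packing helpers (illustrative Python; the 2-byte `MAGIC` value and the function names are placeholders assumed by this sketch, not taken from this SIP):

```python
MAGIC = b"X2"  # 2-byte burnchain magic; placeholder value assumed for this sketch

def stack_stx_payload(ustx_to_lock: int, cycles: int) -> bytes:
    """OP_RETURN payload for a StackStxOp: magic | 'x' | u128 BE | u8 cycles."""
    return MAGIC + b"x" + ustx_to_lock.to_bytes(16, "big") + bytes([cycles])

def transfer_stx_payload(ustx: int, memo: bytes = b"") -> bytes:
    """OP_RETURN payload for a TransferStxOp: magic | '$' | u128 BE | memo."""
    if len(memo) > 61:
        raise ValueError("memo may be at most 61 bytes")
    return MAGIC + b"$" + ustx.to_bytes(16, "big") + memo
```

Note how the byte offsets in the diagrams fall out of the packing: the op byte sits at offset 2, the big-endian u128 occupies bytes 3 through 19, and the trailing field (cycles or memo) begins at offset 19.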
- -The second Bitcoin output is either a `p2pkh` or `p2sh` output such -that the recipient Stacks address can be derived from the -corresponding 20-byte hash (hash160). +This SIP is now located in the [stacksgov/sips repository](https://github.com/stacksgov/sips/blob/main/sips/sip-007/sip-007-stacking-consensus.md) as part of the [Stacks Community Governance organization](https://github.com/stacksgov). diff --git a/sip/sip-008-analysis-cost-assessment.md b/sip/sip-008-analysis-cost-assessment.md index 1eaaaf261..72813cdc0 100644 --- a/sip/sip-008-analysis-cost-assessment.md +++ b/sip/sip-008-analysis-cost-assessment.md @@ -1,333 +1,5 @@ -# SIP 008 Clarity Parsing and Analysis Cost Assessment - -## Preamble - -Title: Clarity Parsing and Analysis Cost Assessment - -Author: Aaron Blankstein - -Status: Draft - -Type: Standard - -Created: 03/05/2020 - -License: BSD 2-Clause - -# Abstract - -This document describes the measured costs and asymptotic costs -assessed for parsing Clarity code into an abstract syntax tree (AST) -and the static analysis of that Clarity code (type-checking and -read-only enforcement). This will not specify the _constants_ -associated with those asymptotic cost functions. Those constants will -necessarily be measured via benchmark harnesses and regression -analyses. - -# Measurements for Execution Cost - -The cost of analyzing Clarity code is measured using the same 5 categories -described in SIP-006 for the measurement of execution costs: - -1. Runtime cost: captures the number of cycles that a single - processor would require to process the Clarity block. This is a - _unitless_ metric, so it will not correspond directly to cycles, - but rather is meant to provide a basis for comparison between - different Clarity code blocks. -2. Data write count: captures the number of independent writes - performed on the underlying data store (see SIP-004). -3. 
Data read count: captures the number of independent reads
   performed on the underlying data store.
4. Data write length: the number of bytes written to the underlying
   data store.
5. Data read length: the number of bytes read from the underlying
   data store.

Importantly, these costs are used to set a _block limit_ for each
block. When it comes to selecting transactions for inclusion in a
block, miners are free to make their own choices based on transaction
fees; however, blocks may not exceed the _block limit_. If they do,
the block is considered invalid by the network --- none of the block's
transactions will be materialized, and the leader forfeits all rewards
from the block.

Costs for static analysis are assessed during the _type check_ pass.
The read-only and trait-checking passes perform work that is strictly
less than the work performed during type checking, and therefore the
cost assessment can safely fold any costs that would be incurred during
those passes into the type checking pass.

# Common Analysis Metrics and Costs

## AST Parsing

The Clarity parser has a runtime that is linear with respect to the Clarity
program length:

```
a*X+b
```

where a and b are constants, and

X := the program length in bytes

## Dependency cycle detection

Clarity performs cycle detection for intra-contract dependencies (e.g.,
functions that depend on one another). This detection is linear in the
number of dependency edges in the smart contract:

```
a*X+b
```

where a and b are constants, and

X := the total number of dependency edges in the smart contract

Dependency edges are created any time a top-level definition refers
to another top-level definition.

## Type signature size

Types in Clarity may be described using type signatures. For example,
`(tuple (a int) (b int))` describes a tuple with two keys `a` and `b`
of type `int`.
These type descriptions are used by the Clarity analysis
passes to check the type correctness of Clarity code. Clarity type signatures
have varying size, e.g., the signature `int` is smaller than the signature for a
list of integers.

The signature size of a Clarity type is defined as follows:

```
type_signature_size(x) :=
  if x =
     int       => 1
     uint      => 1
     bool      => 1
     principal => 1
     buffer    => 2
     optional  => 1 + type_signature_size(entry_type)
     response  => 1 + type_signature_size(ok_type) + type_signature_size(err_type)
     list      => 2 + type_signature_size(entry_type)
     tuple     => 1 + 2*(count(entries))
                    + sum(type_signature_size for each entry)
                    + sum(len(key_name) for each entry)
```

## Type annotation

Each node in a Clarity contract's AST is annotated with the type value
for that node during the type checking analysis pass.

The runtime cost of type annotation is:

```
a + b*X
```

where a and b are constants, and X is the type signature size of the
type being annotated.

## Variable lookup

Looking up variables during static analysis incurs a non-constant cost -- the stack
depth _and_ the length of the variable name affect this cost. However,
variable names in Clarity have bounded length -- 128 characters. Therefore,
the cost assessed for variable lookups may safely be constant with respect
to name length.

The stack depth affects the lookup cost because the variable must be
checked for in each context on the stack.

Cost Function:

```
a*X+b*Y+c
```

where a, b, and c are constants,

X := stack depth

Y := the type size of the looked up variable

## Function Lookup

Looking up a function incurs a constant cost with respect
to name length (for the same reason as variable lookup). However,
because functions may only be defined in the top-level contract
context, stack depth does not affect function lookup.

Cost Function:

```
a*X + b
```

where a and b are constants, and

X := the sum of the type sizes for the function signature (each argument's type size, as well
     as the function's return type)

## Name Binding

The cost of binding a name in Clarity -- in either a local or the contract
context -- is _constant_ with respect to the length of the name, but linear in
the size of the type signature:

```
binding_cost = a + b*X
```

where a and b are constants, and

X := the size of the bound type signature

## Type check cost

The cost of a static type check is _linear_ in the size of the type signature:

```
type_check_cost(expected, actual) :=
   a + b*X
```

where a and b are constants, and

X := `max(type_signature_size(expected), type_signature_size(actual))`

## Function Application

Static analysis of a function application in Clarity requires
type checking the function's expected arguments against the
supplied types.

The cost of applying a function is:

```
a + sum(type_check_cost(expected, actual) for each argument)
```

where a is a constant.

This is also the _entire_ cost of type analysis for most function calls
(e.g., intra-contract function calls, most simple native functions).

## Iterating the AST

Static analysis iterates over the entire program's AST in the type checker,
the trait checker, and the read-only checker. This cost is assessed
as a constant cost for each node visited in the AST during the type
checking pass.

# Special Function Costs

Some functions require additional work from the static analysis system.

## Functions on sequences (e.g., map, filter, fold)

Functions on sequences need to perform an additional check that the
supplied type is a list or buffer before performing the normal
argument type checking. This cost is assessed as:

```
a
```

where a is a constant.
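As a concrete illustration of how the earlier `type_signature_size` definition and the `type_check_cost` formula compose, here is a sketch in Python. The nested-tuple type encoding and the constants `A` and `B` are illustrative assumptions, not part of the specification:

```python
A, B = 1, 1  # placeholder constants; real values come from benchmark regressions

def type_signature_size(t):
    """t encodes a Clarity type as a nested tuple, e.g. ("int",),
    ("list", ("int",)), or ("tuple", [("a", ("int",)), ("b", ("int",))])."""
    kind = t[0]
    if kind in ("int", "uint", "bool", "principal"):
        return 1
    if kind == "buffer":
        return 2
    if kind == "optional":
        return 1 + type_signature_size(t[1])
    if kind == "response":
        return 1 + type_signature_size(t[1]) + type_signature_size(t[2])
    if kind == "list":
        return 2 + type_signature_size(t[1])
    if kind == "tuple":
        entries = t[1]  # list of (key_name, entry_type) pairs
        return (1 + 2 * len(entries)
                + sum(type_signature_size(et) for _, et in entries)
                + sum(len(k) for k, _ in entries))
    raise ValueError(f"unknown type kind: {kind}")

def type_check_cost(expected, actual):
    # a + b * max(type_signature_size(expected), type_signature_size(actual))
    return A + B * max(type_signature_size(expected),
                       type_signature_size(actual))
```

For instance, `(tuple (a int) (b int))` has signature size 1 + 2*2 + (1 + 1) + (1 + 1) = 9 under this definition.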

## Functions on options/responses

Similarly to the functions on sequences, option/response functions
must perform a simple check to see if the supplied input is an option or
response before performing additional argument type checking. This cost is
assessed as:

```
a
```

where a is a constant.

## Data functions (ft balance checks, nft lookups, map-get?, ...)

Static checks on intra-contract data functions do not require database lookups
(unlike the runtime costs of these functions). Rather, these functions
incur normal type lookup (i.e., fetching the type of an NFT, data map, or data var)
and type checking costs.

## get

Checking a tuple _get_ requires accessing the tuple's signature
for the specific field. This has runtime cost:

```
a*log(N) + b
```

where a and b are constants, and

N := the number of fields in the tuple type

## tuple

Constructing a tuple requires building the tuple's BTree for
accessing fields. This has runtime cost:

```
a*N*log(N) + b
```

where a and b are constants, and

N := the number of fields in the tuple type

## use-trait

Importing a trait imposes two kinds of costs on the analysis.
First, the import requires a database read. Second, the imported
trait is included in the static analysis output -- this increases
the total storage usage and write length of the static analysis.

The costs are defined as:

```
read_count = 1
write_count = 0
runtime = a*X+b
write_length = c*X+d
read_length = c*X+d
```

where a, b, c, and d are constants, and

X := the total type size of the trait (i.e., the sum of the
     type sizes of each function signature)

## contract-call?

Checking a contract call requires a database lookup to inspect
the function signature of a prior smart contract.

The costs are defined as:

```
read_count = 1
read_length = a*X+b
runtime = c*X+d
```

where a, b, c, and d are constants, and

X := the total type size of the function signature

## let

Let bindings require the static analysis system to iterate over
each let binding and ensure that each is syntactically correct.

This imposes a runtime cost:

```
a*X + b
```

where a and b are constants, and

X := the number of entries in the let binding.
+# SIP-008 Clarity Parsing and Analysis Cost Assessment
+This document formerly contained SIP-008 before the Stacks 2.0 mainnet launched.
+This SIP is now located in the [stacksgov/sips repository](https://github.com/stacksgov/sips/blob/main/sips/sip-008/sip-008-analysis-cost-assessment.md) as part of the [Stacks Community Governance organization](https://github.com/stacksgov).