BIP 151 use of HMAC_SHA512


To quote:

> HMAC_SHA512(key=ecdh_secret|cipher-type,msg="encryption key").
> 
>  K_1 must be the left 32bytes of the HMAC_SHA512 hash.
>  K_2 must be the right 32bytes of the HMAC_SHA512 hash.

This seems a weak reason to introduce SHA512 to the mix.  Can we just
make:

K_1 = HMAC_SHA256(key=ecdh_secret|cipher-type,msg="header encryption key")
K_2 = HMAC_SHA256(key=ecdh_secret|cipher-type,msg="body encryption key")
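
For concreteness, here's a rough Python sketch of what I mean, assuming
ecdh_secret and cipher-type are byte strings and "|" is simple concatenation
(names are illustrative only):

    import hmac, hashlib

    def derive_keys(ecdh_secret: bytes, cipher_type: bytes):
        key = ecdh_secret + cipher_type
        k1 = hmac.new(key, b"header encryption key", hashlib.sha256).digest()
        k2 = hmac.new(key, b"body encryption key", hashlib.sha256).digest()
        return k1, k2   # two independent 32-byte keys, no SHA512 needed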

Thanks,
Rusty.

parallel token idea & question

token miners who will work on the new token signal readiness to secure that token by posting a public key to the bitcoin blockchain, along with collateral and possibly a block mined from a side chain, or some other signal proving sufficient participation (this allows for non-blockchain tokens).

coin moved to the new token set is sent to a multisig wallet consisting of miners who have signaled readiness, with nlocktime set to some time in the future

coin sits in that wallet - the new token doesn't even have to be a chain, it could be a DAG, or some other mechanism - following whatever rules it pleases

at any time, a miner of the new system can move coin back to the main chain... trivially and following whatever rules are needed.  also, any time a miner fails to follow the rules of the new system, they lose their collateral

any sufficient consortium of miners/participants in the side chain can, of course, steal that coin...but that is true for all sidechains - and to some extent bitcoin - anyway

does this seem too simplistic or weak in some way? 
Peter Todd via bitcoin-dev | 22 Jun 13:10 2016

Closed Seal Sets and Truth Lists for Better Privacy and Censorship Resistance

At the recent coredev.tech meetup in Zurich I spent much of my time discussing
anti-censorship improvements with Adam Back, building on his idea of blind
symmetric commitments[^bsc], and my own ideas of client-side verification. Our
goal here is to combat censorship by ensuring that miners do not have the
information needed to selectively censor (blacklist) transactions, forcing them
to adopt a whitelist approach of allowed transactions if they choose to censor.

Back's work achieves that by changing the protocol such that users commit to
their transaction in advance, in such a way that the commitment doesn't contain
the information necessary to censor the transaction, although after commitment
all transactional information becomes available. Here we propose a similar
scheme using "smart contract" state machine tooling, with the potential
for an even better Zerocash-like guarantee that only a subset of data ever
becomes public, without requiring "moon math" of uncertain security.

# The Closed Seal Maps

To implement Single-Use Seals we propose that miners attest to the contents of
a series of key:value maps of true expressions, with the keys being the
expressions, and the values being commitments, which along with (discardable)
witnesses make up the argument to the expression. Once an expression is added
to the closed seal map, the value associated with it can't be changed.

Periodically - perhaps once a year - the most recent map is archived, and the
map is started fresh again. Once archived a closed seal map is never changed.
Miners are expected to keep the contents of the current map, as well as the
most recent closed seal map - the contents of older maps are proven on demand
using techniques similar to TXO commitments.

A single-use seal[^sma] implemented with the closed seal maps is then
identified by the expression and a block height. The seal is open if the
expression does not exist in any closed seal maps between the creation block
height and the most recent block height. A witness to the fact that the seal
has been closed is then a proof that the seal was recorded as closed in one of
the closed seal maps, and (if needed) proof that the seal was still open in any
prior maps between its creation and closing.
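
As a rough illustration, a minimal Python sketch of a write-once closed seal
map and the corresponding open-check (all names hypothetical):

    class ClosedSealMap:
        """Write-once map: seal expression digest -> commitment."""
        def __init__(self):
            self.entries = {}

        def close(self, expr_digest: bytes, commitment: bytes) -> bool:
            # Once an expression has been added its value can never change.
            if expr_digest in self.entries:
                return False
            self.entries[expr_digest] = commitment
            return True

    def seal_is_open(maps_since_creation, expr_digest: bytes) -> bool:
        # A seal is open iff its expression appears in none of the maps
        # covering the period from its creation to the current tip.
        return all(expr_digest not in m.entries for m in maps_since_creation)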

Similar to the logic in Bitcoin's segregated witnesses proposal, separating the
commitment and witness arguments to the seal expression ensures that the
witness attesting to the fact that a given seal was closed does not depend on
the exact signature used to actually close it.

Here's a very simple example of such a seal expression, in the author's
Dex[^dex] expression language, for an application that can avoid reusing
pubkeys:

     (checksig <pubkey> <sig> (hash <committed-value>))

This desugars to the following after all named arguments have been replaced by
explicit destructuring of the expression argument, denoted by the arg symbol:

    (and <nonce>
         (checksig <pubkey> (cdr arg) (digest (car arg))))

The arguments to the expression are the closed seal map's commitment and
witness, which are our committed value and signature respectively:

    (<committed-value> . <sig>)

## The Truth List

We implement an expression validity oracle by having miners attest to the
validity of a perpetually growing list of true predicate expressions, whose
evaluation can in turn depend on previously attested expressions in
the truth list. SPV clients who trust miners can use the truth list to skip
validation of old history.

Similar to TXO commitments, we expect miners to have a copy of recent entries
in the truth list, perhaps the previous year. Older history can be proven on an
as-needed basis. Unlike TXO commitments, since this is a pure list of valid
expressions, once an item is added to the list it is never modified.

As the truth list can include expressions that reference previously
evaluated expressions, expressions of arbitrary depth can be evaluated. For
example, suppose we have an extremely long linked list of numbers, represented
as the following sexpr:

    (i_n i_n-1 i_n-2 ... i_1 i_0)

We want to check that every number in the list is even:

    (defun all-even? (l)
        (match l
            (nil true)
            ((n . rest) (if (mod n 2)
                            false
                            (all-even? rest)))))

In any real system this will fail for a sufficiently long list, either due to
stack overflow, or (if tail recursion is supported) due to exceeding the
anti-DoS limits on cycles executed in one expression; expressing the above may
even be impossible in expression systems that don't allow unbounded recursion.

A more subtle issue is that in a merkelized expression language, an expression
that calls itself is impossible to directly represent: doing so creates a cycle
in the call graph, which isn't possible without breaking the hash function. So
instead we'll define the special symbol self, which triggers a lookup in the
truth map instead of actually evaluating directly. Now our expression is:

    (defun all-even? (l)
        (match l
            (nil true)
            ((n . rest) (if (mod n 2)
                            false
                            (self rest)))))

We evaluate it in parts, starting with the end of the list. The truth list only
attests to valid expressions - not arguments - so we curry the argument to form
the following expression:

    (all-even? nil)

The second thing that is appended to the truth list is:

    (all-even? (0 . #<digest of "nil">))

Note how we haven't actually provided the cdr of the cons cell - it's been
pruned and replaced by the digest of nil. With an additional bit of metadata -
the index of that expression within the truth list, and possibly a merkle path
to the tip if the expression has been archived - we can show that the
expression has been previously evaluated and is true.

Subsequent expressions follow the same pattern:

    (all-even? (1 . #<digest of "(0)">))

Until finally we reach the last item:

    (all-even? (i_n . #<digest of "(i_n-1 i_n-2 ... i_1 i_0)">))

Now we can show anyone who trusts that the truth list is valid - like an SPV
client - that evaluating all-even? on that list returns true by extracting a
merkle path from that item to the tip of the list's MMR commitment.
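
A rough Python sketch of how these truth list entries could be built
bottom-up, with the already-attested tail replaced by its digest; the digest
function and entry encoding here are purely illustrative:

    import hashlib

    def digest(obj) -> bytes:
        return hashlib.sha256(repr(obj).encode()).digest()

    def build_truth_entries(numbers):
        """Append one curried (all-even? ...) entry per element, innermost first."""
        truth_list = []
        tail_digest = digest("nil")
        for n in numbers:               # numbers given starting from the end of the list
            if n % 2 != 0:
                raise ValueError("list is not all even")
            entry = ("all-even?", (n, tail_digest))
            truth_list.append(entry)
            tail_digest = digest(entry[1])   # prune: only the digest of the tail survives
        return truth_list

    entries = build_truth_entries([0, 2, 4, 6])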

# Transactions

When we spend an output our goal is to direct the funds spent to a set of
outputs by irrevocably committing single-use seals to that distribution of
outputs. Equally, to validate an output we must show that sufficient funds have
been assigned to it. However, our anti-censorship goals make this
difficult, as we'll often want to reveal some information about where the funds
being spent are going immediately - say to pay fees - while delaying when other
information is revealed as long as possible.

To achieve this we generalize the idea of a transaction slightly. Rather than
simply having a set of inputs spent and outputs created, we have a set of
_input splits_ spent, and outputs created. An input split is then a merkle-sum
map of nonces:values that the particular input has been split into; the
transaction commits to a specific nonce within that split, and is only valid if
the seal for that input is closed over a split actually committing to the
transaction.

Secondly, in a transaction with multiple outputs, we don't want it to be
immediately possible to link outputs together as seals associated with them are
closed, even if the transaction ID is known publicly. So we associate each
output with a unique nonce.

Thus we can uniquely identify a specific transaction output - an outpoint - by
the following data (remember that the tx would usually be pruned, leaving just
the digest):

    (struct outpoint
        (tx     :transaction)
        (nonce  :digest))

A transaction output is defined as:

    (struct txout
        (value     :int)    ; value of output
        (nonce     :digest)
        (authexpr  :func))  ; authorization expression

An input:

    (struct txin
        (prevout :outpoint) ; outpoint of the output being spent
        (split   :digest)   ; split nonce
        (value   :int))     ; claimed value of output spent

And a transaction:

    (struct transaction
        ; fixme: need to define functions to extract sums and keys
        (inputs   :(merkle-sum-map  (:digest :txin)))
        (outputs  :(merkle-sum-map  (:digest :txout)))
        ; and probably more metadata here
        )

## Spending Outputs

Our single-use seal associated with a specific output is the expression:

    (<auth expr> <outpoint> . arg)

When the seal is closed it commits to the merkle-sum split map, which is
indexed by split nonces, one per (tx, value) pair committed to.  This means
that in the general case of a spend authorization expression that just checks
a signature, the actual outpoint can be pruned and what actually gets published
in the closed seal set is just:

    (<auth expr> #<digest of <outpoint>> . arg)

Along with the commitment:

    #<digest of split map>

With the relevant data hidden behind opaque digests, protected from
brute-forcing by nonces, external observers have no information about what
transaction output was spent, or anything about the transaction spending that
output. The nonce in the seal commitment prevents multiple spends for the
same transaction from being linked together.  Yet at the same time, we're still
able to write special-purpose spend auth expressions that do inspect the
contents of the transaction if needed.
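
As a rough illustration in Python of how little is revealed, the published
data amounts to a pair of nonce-protected digests (the serialization
placeholders are hypothetical):

    import hashlib, os

    def H(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    # The spender knows the real outpoint and the split map...
    outpoint_nonce = os.urandom(32)
    outpoint_digest = H(outpoint_nonce + b"<serialized outpoint>")

    split_nonce = os.urandom(32)
    split_map_commitment = H(split_nonce + b"<serialized merkle-sum split map>")

    # ...but what's published in the closed seal set is only the two digests.
    published = (outpoint_digest, split_map_commitment)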

## Validating Transactions

When validating a transaction, we want to validate the least amount of data
possible, allowing the maximum amount of data to be omitted for a given
recipient. Thus when we validate a transaction we _don't_ validate the outputs;
we only validate that the inputs spent by the transaction are valid, and the
sum of (split) inputs spent is correct. We only need to validate outputs when
they're spent - until then an invalid output is of no relevance. We also don't
need to validate any outputs other than the ones we're trying to spend - the
merkle sum tree guarantees that regardless of what's going on with other
outputs, the funds we're spending are uniquely allocated to us.

This means our function to check that a transaction is valid won't check the
outputs of the transaction itself, but will check outputs of previous
transactions:

    (defun valid-tx? (tx)
        (map-reduce tx.inputs
            (lambda (txin)
                (and <input is valid>
                     <witness is valid>
                     <split is valid>
                     (valid-output? txin.prevout)))))

# Censorship Resistant Usage

To make use of the separation between seal closure and validation we need to
pass transaction information from peer to peer. Let's look at what happens when
Alice pays Bob:

1. Alice picks one or more inputs to spend.

2. For each input she constructs a split, paying part of the funds to a
per-input fee transaction with no outputs, and committing part of the funds to
the transaction paying Bob. If she has change left over she'll construct a
third transaction with just that change as an input.

3. She signs each input, creating valid signatures for the corresponding
output's seal's authorization expression.

4. She broadcasts the subset of data corresponding to just the fee paying
transactions and related signatures individually, with a time delay between
each one. All other data is pruned, leaving just opaque digests.

5. Once all inputs are confirmed, she gives Bob the data corresponding to his
transaction, including the relevant parts of the merkle trees, and relevant
closed seal witnesses.

At this point, a whole bunch of seals have been closed, but there's absolutely
nothing on chain that links them together. Now let's suppose Bob pays Charlie,
using the funds Alice gave him, and a different input to pay mining fees:

1. Bob constructs a fee paying transaction, splitting some funds from a
previously revealed output, and depending on the seal for the output Alice gave
him, but without spending any of that output's funds.

2. Bob broadcasts the above publicly. Miners have to add both seals to the
closed seal set to collect the fees.

3. Once confirmed, Bob gives Charlie the corresponding transaction information
for his output, as well as the still-private information it depends on to prove
that the output Alice created for Bob is itself valid.

Again, nearly none of the information related to the transaction is public, yet
the funds have moved twice.

## Pruning Old History

Over time the proofs that a coin is valid will grow as each additional
transaction adds more data. We shorten these proofs by publishing some of the
data in the form of additions to the truth list of valid expressions,
specifically the valid-tx? expressions that determine whether or not a
transaction (and the transactions prior to it) is valid. This allows SPV clients who
trust miners to stop validating once they reach that old history.

Secondly, with transaction history linearization[^sma] we can avoid ever
revealing most of the transaction data, greatly improving privacy. Only one
input per transaction needs to be proven, so all data related to other inputs
can be discarded permanently; in practice this will lead to either one or two
public inputs, including the input made public to pay mining fees.

# "Smart Contracts"

Privacy aside, the combination of single-use seals and the truth list of valid
expressions enables all known "smart contract" applications, such as the ones
Ethereum currently targets. After all, the accounts-based Ethereum architecture
can always be simulated with a series of single-use seals that explicitly keep
track of an account balance based on actions taken.

# Open Questions

1. How does the above architecture interact with scaling proposals, like
sharding? Fraud proofs?

2. How does the statistical inflation protection of transaction history
linearization work in a real economy, e.g. if people use it to gamble with their
funds?

3. PoW isn't a perfect random beacon; how do we take that into account when
designing linearization?

4. How do wallets pass proof data between each other, e.g. offline?

5. How do wallets backup proof data? (similar problem that Lightning has)

# References

[^bsc]: "blind symmetric commitment for stronger byzantine voting resilience",
        Adam Back, May 15th 2013, bitcoin-dev mailing list,
        https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2013-May/002600.html

[^sma]: "Building Blocks of the State Machine Approach to Consensus",
        Peter Todd, Jun 20th 2016,
        https://petertodd.org/2016/state-machine-consensus-building-blocks
        https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2016-June/012773.html

[^dex]: "Dex: Deterministic Predicate Expressions for Smarter Signatures",
        Peter Todd, May 25th 2016,
        https://github.com/WebOfTrustInfo/ID2020DesignWorkshop/blob/master/topics-and-advance-readings/DexPredicatesForSmarterSigs.md

--

-- 
https://petertodd.org 'peter'[:-1] <at> petertodd.org

Merkle Forest Partitioning

Hi Akiva,

   I have also given a little thought to partitioning, in a totally different way: a Merkle Tree Forest. Generally the idea here would be to create new Merkle Trees every so often as currency supply was added. It would partition the mining process and therefore improve the distribution of the verification.

It would work as follows, and NO I haven't really thought this through - it's just an idea!


Imagine it was 2009 and there was a small number of 250 BTC in 'Batch 1'. Once the number of BTC needed to go above 250 BTC, two new batches would be created, each with its own Merkle Tree, until 750 BTC, and so on. Eventually there would be a large number of trees, allowing small-scale pool miners to dominate a single tree or a small number of the trees and their block chains.

This would also create a potential partial payment problem, where you send 3 BTC but only receive 2 BTC since 1 BTC ends up on a bad block and needs to be resent.


Since most of the BTC currency supply is already available it's a bit late for Bitcoin, but this could be used for new cryptocurrencies.


Any thoughts on this idea?


Cheers,

Scott

Peter Todd via bitcoin-dev | 22 Jun 00:42 2016

Re: Building Blocks of the State Machine Approach to Consensus

On Mon, Jun 20, 2016 at 04:21:39PM +0000, zaki--- via bitcoin-dev wrote:
> Hi Peter,
> 
> I didn't entirely understand the process of transaction linearization.
> 
> What I see is a potential process where when the miner assembles the block,
> he strips all but one sigscript per tx. The selection of which  sigscript
> is retained is determined by the random oracle.  Is this is primary benefit
> you are suggesting?
> 
> It appears to me that blocks still need to contain a list of full TX Input
> and Tx Outputs with your approach. Some of the description seems to
> indicate that there are opportunities to elide further data but it's
> unclear to me how.

I think you've misunderstood what I'm proposing. The state machine approach I
described doesn't necessarily require blocks or even miners to exist at all.
Rather, it assumes that a single-use seal primitive is available, and a random
beacon primitive for tx linearization, and then builds a system on top of those
primitives. Transaction data - the proofs that certain states have been reached
in the system - does not need to be broadcast publicly; if Alice wants to
convince Bob that she has given him money, the only person who needs that
transaction (and transactions prior to it in the tx history) is Bob.

So as to your question about miners assembling blocks, and what blocks contain:
there doesn't need to be blocks at all! Transaction history linearization is
something your wallet would do for you.

--

-- 
https://petertodd.org 'peter'[:-1] <at> petertodd.org

Geographic Partitioning

I am a long-time developer and I have some experience in process groups. I am going to try to keep this short. If you are interested in pursuing this idea please reply to me privately so we don't put a burden on the list.

As per Satoshi's paper, the blockchain implements a distributed timestamp service. It defeats double-spending by establishing a "total order" on transactions. The "domain" on which the ordering takes place is the entire coin, the money supply. It's obvious to me that total ordering does not scale well as a use case, it's not a matter of implementation details or design. It's the requirement which is a problem. Therefore when I see mention of the many clever schemes proposed to make Bitcoin scalable I already know that by using that proposal we are going to give up something. And in some cases I see lengthy and complex proposals, and just what the user is giving up is not easy to see.

I think that the user has to give up something in order for electronic cash to really scale, and that something has to be non-locality. At the moment Bitcoin doesn't know whether I am buying a laptop from 3,000 miles away or 300. This is a wonderful property, but this property makes it impossible to partition the users geographically. I think that a simple and effective way to do this is to partition the address using a hash. A convention could be adopted whereby there is a well-known partition number for each geographic location. Most users would use third-party clients and the client could generate Bitcoin addresses until it hits one in the user's geographical area.
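
A toy Python sketch of such a convention, assuming the partition is just the
leading bytes of a hash of the address reduced modulo a well-known partition
count (all details here are hypothetical):

    import hashlib

    NUM_PARTITIONS = 1024  # hypothetical; one well-known number per region

    def partition_of(address: str) -> int:
        h = hashlib.sha256(address.encode()).digest()
        return int.from_bytes(h[:4], "big") % NUM_PARTITIONS

    def generate_address_in_partition(keygen, target_partition: int) -> str:
        # The client keeps generating fresh addresses until one hashes into
        # the user's geographic partition.
        while True:
            addr = keygen()
            if partition_of(addr) == target_partition:
                return addr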

The partitioning scheme could be hierarchical. For example there could be partitions at the city, state, and country level. A good way to see how this works in real life is shopping at Walmart, which is something like 4,000 stores. Walmart could have users pay local addresses, and then move the money "up" to a regional or country level.

The problem is what to do when an address in partition A wants to pay an address in partition B. This should be done by processing the transaction in partition A first, and once the block is made a hash of that block should be included in some block in partition B. After A has made the block the coin has left A, it cannot be spent. Once B has made its block the coin has "arrived" in B and can be spent. It can be seen that some transactions span a longer distance than others, in that they require two or more blocks. These transactions take longer to execute, and I think that that is entirely okay.

Transaction verification benefits because a small merchant can accept payments from local addresses only. Larger merchants can verify transactions across two or more partitions.

Some will be concerned about 51% attacks on partitions. I would point out that nodes could process transactions at random, so that the majority of the computing power is well-balanced across all partitions.

Regards,
Akiva


Even more proposed BIP extensions to BIP 0070

BIP 0070 has been a moderate success; however, IMO:

- protocol buffers are inappropriate since ease of use and extensibility are desired over the minor gains of efficiency in this protocol.  Not too late to support JSON messages as the standard going forward

- problematic reliance on merchant-supplied https (X509) as the sole form of merchant identification.   Alternate schemes (dnssec/netki), PGP and possibly keybase seem like good ideas.   Personally, I like keybase, since there is no reliance on the existing domain-name system (you can sell with a github id, for example)

- missing optional client-supplied identification

- lack of basic subscription support

Proposed for subscriptions:

- BIP0047 payment codes are recommended instead of wallet addresses when establishing subscriptions.  Or, merchants can specify replacement addresses in ACK/NACK responses.   UI confirms are required when there are no replacement addresses or payment codes used.

- Wallets must confirm and store subscriptions, and are responsible for initiating them at the specified interval.  

- Intervals can only be from a preset list: weekly, biweekly, or 1, 2, 3, 4, 6 or 12 months.   Intervals missed by more than 3 days cause suspension until the user re-verifies.

- Wallets may optionally ask the user whether they want to be notified and confirm every interval - or not.   Wallets that do not ask must notify before initiating each payment.   Interval confirmations should begin at least 1 day in advance of the next payment.

Proposed in general:

- JSON should be used instead of protocol buffers going forward.  Easier to use, explain, and extend.

- "Extendible" URI-like scheme to support multi-mode identity mechanisms on both payment and subscription requests.   Support for keybase://, netki:// and others as alternates to https://. 

- Support for client as well as merchant multi-mode verification

- Ideally, the identity verification URI scheme is somewhat orthogonal/independent of the payment request itself

Question:

Should this be a new BIP?  I know netki's BIP75 is out there - but I think it's too specific and too reliant on the domain name system.

Maybe an identity-protocol-agnostic BIP + solid implementation of a couple major protocols without any mention of payment URI's ... just a way of sending and receiving identity verified messages in general?

I would be happy to implement plugins for identity protocols, if anyone thinks this is a good idea.

Does anyone think https:// or keybase, or PGP or netki all by themselves, is enough - or is it always better to have an extensible protocol?

- Erik Aronesty
Peter Todd via bitcoin-dev | 20 Jun 10:56 2016

Building Blocks of the State Machine Approach to Consensus

In light of Ethereum's recent problems with its imperative, account-based,
programming model, I thought I'd do a quick writeup outlining the building
blocks of the state-machine approach to so-called "smart contract" systems, an
extension of Bitcoin's own design that I personally have been developing for a
number of years now as my Proofchains/Dex research work.

# Deterministic Code / Deterministic Expressions

We need to be able to run code on different computers and get identical
results; without this consensus is impossible and we might as well just use a
central authoritative database. Traditional languages and surrounding
frameworks make determinism difficult to achieve, as they tend to be filled
with undefined and underspecified behavior, ranging from signed integer
overflow in C/C++ to non-deterministic behavior in databases. While some
successful systems like Bitcoin are based on such languages, their success is
attributable to heroic efforts by their developers.

Deterministic expression systems such as Bitcoin's scripting system and the
author's Dex project improve on this by allowing expressions to be precisely
specified by hash digest, and executed against an environment with
deterministic results. In the case of Bitcoin's script, the expression is a
Forth-like stack-based program; in Dex the expression takes the form of a
lambda calculus expression.

## Proofs

So far the most common use for deterministic expressions is to specify
conditions upon which funds can be spent, as seen in Bitcoin (particularly
P2SH, and the upcoming Segwit). But we can generalize their use to precisely
defining consensus protocols in terms of state machines, with each state
defined in terms of a deterministic expression that must return true for the
state to have been reached. The data that causes a given expression to return
true is then a "proof", and that proof can be passed from one party to another
to prove desired states in the system have been reached.

An important implication of this model is that we need deterministic, and
efficient, serialization of proof data.

## Pruning

Often the evaluation of an expression against a proof doesn't require all the
data in the proof. For example, to prove to a lite client that a given block
contains a transaction, we only need the merkle path from the transaction to
the block header. Systems like Proofchains and Dex generalize this process -
called "pruning" - with built-in support to both keep track of what data is
accessed by what operations, as well as support in their underlying
serialization schemes for unneeded data to be elided and replaced by the hash
digest of the pruned data.

# Transactions

A common type of state machine is the transaction. A transaction history is a
directed acyclic graph of transactions, with one or more genesis transactions
having no inputs (ancestors), and one or more outputs, and zero or more
non-genesis transactions with one or more inputs, and zero or more outputs. The
edges of the graph connect inputs to outputs, with every input connected to
exactly one output. Outputs with an associated input are known as spent
outputs; outputs without an associated input are unspent.

Outputs have conditions attached to them (e.g. a pubkey for which a valid
signature must be produced), and may also be associated with other values such
as "# of coins". We consider a transaction valid if we have a set of proofs,
one per input, that satisfy the conditions associated with each output.
Secondly, validity may also require additional constraints to be true, such as
requiring the coins spent to be >= the coins created on the outputs. Input
proofs also must uniquely commit to the transaction itself to be secure - if
they don't the proofs can be reused in a replay attack.

A non-genesis transaction is valid if:

1. Any protocol-specific rules such as coins spent >= coins output are
   followed.

2. For every input a valid proof exists.

3. Every input transaction is itself valid.

A practical implementation of the above for value-transfer systems like Bitcoin
could use two merkle-sum trees, one for the inputs, and one for the outputs,
with inputs simply committing to the previous transaction's txid and output #
(outpoint), and outputs committing to a scriptPubKey and output amount.
Witnesses can be provided separately, and would sign a signature committing to
the transaction or, optionally, a subset of inputs and/or outputs (with
merkle trees we can easily avoid the exponential signature validation problems
bitcoin currently has).

So long as all genesis transactions are unique, and our hash function is
secure, all transaction outputs can be uniquely identified (prior to BIP34 the
Bitcoin protocol actually failed at this!).

## Proof Distribution

How does Alice convince Bob that she has done a transaction that puts the
system into the state that Bob wanted? The obvious answer is she gives Bob data
proving that the system is now in the desired state; in a transactional system
that proof is some or all of the transaction history. Systems like Bitcoin
provide a generic flood-fill messaging layer where all participants have the
opportunity to get a copy of all proofs in the system, however we can also
implement more fine grained solutions based on peer-to-peer message passing -
one could imagine Alice proving to Bob that she transferred title to her house
to him by giving him a series of proofs, not unlike the same way that property
title transfer can be demonstrated by providing the buyer with a series of deed
documents (though note the double-spend problem!).

# Uniqueness and Single-Use Seals

In addition to knowing that a given transaction history is valid, we also want
to know if it's unique. By that we mean that every spent output in the
transaction history is associated with exactly one input, and no other valid
spends exist; we want to ensure no output has been double-spent.

Bitcoin (and pretty much every other cryptocurrency like it) achieves this goal
by defining a method of achieving consensus over the set of all (valid)
transactions, and then defining that consensus as valid if and only if no
output is spent more than once.

A more general approach is to introduce the idea of a cryptographic Single-Use
Seal, analogous to the tamper-evidence single-use seals commonly used for
protecting goods during shipment and storage. Each individual seal is
associated with a globally unique identifier, and has two states, open and
closed. A secure seal can be closed exactly once, producing a proof that the
seal was closed.

All practical single-use seals will be associated with some kind of condition,
such as a pubkey, or deterministic expression, that needs to be satisfied for
the seal to be closed. Secondly, the contents of the proof will be able to
commit to new data, such as the transaction spending the output associated with
the seal.

Additionally some implementations of single-use seals may be able to also
generate a proof that a seal was _not_ closed as of a certain
time/block-height/etc.

## Implementations

### Transactional Blockchains

A transaction output on a system like Bitcoin can be used as a single-use seal.
In this implementation, the outpoint (txid:vout #) is the seal's identifier,
the authorization mechanism is the scriptPubKey of the output, and the proof
is the transaction spending the output. The proof can commit to additional
data as needed in a variety of ways, such as an OP_RETURN output, or
unspendable output.

This implementation approach is resistant to miner censorship if the seal's
identifier isn't made public, and the protocol (optionally) allows for the
proof transaction to commit to the sealed contents with unspendable outputs;
unspendable outputs can't be distinguished from transactions that move funds.

### Unbounded Oracles

A trusted oracle P can maintain a set of closed seals, and produce signed
messages attesting to the fact that a seal was closed. Specifically, the seal
is identified by the tuple (P, q), with q being the per-seal authorization
expression that must be satisfied for the seal to be closed. The first time the
oracle is given a valid signature for the seal, it adds that signature and seal
ID to its closed seal set, and makes available a signed message attesting to
the fact that the seal has been closed. The proof is that message (and
possibly the signature, or a second message signed by it).

The oracle can publish the set of all closed seals for transparency/auditing
purposes. A good way to do this is to make a merkelized key:value set, with the
seal identifiers as keys, and the value being the proofs, and in turn create a
signed certificate transparency log of that set over time. Merkle-paths from
this log can also serve as the closed seal proof, and for that matter, as
proof of the fact that a seal has not been closed.
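
A minimal Python sketch of such an unbounded oracle, with the signature scheme
and authorization-expression evaluation injected as opaque functions (all
names hypothetical):

    class UnboundedSealOracle:
        def __init__(self, sign, verify):
            self.sign = sign        # oracle's own signing function
            self.verify = verify    # checks a signature against an authorization expression q
            self.closed = {}        # q -> (signature, attestation)

        def close_seal(self, q, signature, contents_digest):
            # The seal (P, q) can be closed at most once; the first valid signature wins.
            if q in self.closed:
                return self.closed[q][1]
            if not self.verify(q, signature, contents_digest):
                raise ValueError("authorization expression not satisfied")
            attestation = self.sign((b"closed", q, contents_digest))
            self.closed[q] = (signature, attestation)
            return attestation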

### Bounded Oracles

The above has the problem of unbounded storage requirements as the closed seal
set grows without bound. We can fix that problem by requiring users of the
oracle to allocate seals in advance, analogous to the UTXO set in Bitcoin.

To allocate a seal the user provides the oracle P with the authorization
expression q. The oracle then generates a nonce n and adds (q,n) to the set of
unclosed seals, and tells the user that nonce. The seal is then uniquely
identified by (P, q, n).

To close a seal, the user provides the oracle with a valid signature over (P,
q, n). If the open seal set contains that seal, the seal is removed from the
set and the oracle provides the user with a signed message attesting to the
valid close.
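
A rough Python sketch of the bounded variant, again with signing and
verification left abstract (all names hypothetical); note that closed seals
are simply removed, so storage stays bounded by the number of open seals:

    import os

    class BoundedSealOracle:
        def __init__(self, oracle_id, sign, verify):
            self.oracle_id = oracle_id   # P
            self.sign = sign             # oracle's own signing function
            self.verify = verify         # checks a signature against the expression q
            self.open_seals = set()

        def allocate(self, q):
            n = os.urandom(16)                    # per-seal nonce
            self.open_seals.add((q, n))
            return n                              # seal is identified by (P, q, n)

        def close(self, q, n, signature):
            if (q, n) not in self.open_seals:
                raise ValueError("unknown or already closed seal")
            if not self.verify(q, signature, (self.oracle_id, q, n)):
                raise ValueError("authorization expression not satisfied")
            self.open_seals.remove((q, n))        # storage stays bounded
            return self.sign((b"closed", self.oracle_id, q, n))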

A practical implementation would be to have the oracle publish a transparency
log, with each entry in the log committing to the set of all open seals with a
merkle set, as well as any seals closed during that entry. Again, merkle paths
for this log can serve as proofs to the open or closed state of a seal.

Note how with (U)TXO commitments, Bitcoin itself is a bounded oracle
implementation that can produce compact proofs.

### Group Seals

Multiple seals can be combined into one, by having the open seal commit to a
set of sub-seals, and then closing the seal over a second set of closed seal
proofs. Seals that didn't need to be closed can be closed over a special
re-delegation message, re-delegating the seal to a new open seal.

Since the closed sub-seal proof can additionally include a proof of
authorization, we have a protocol where the entity with authorization to close
the master seal has the ability to DoS attack sub-seal owners, but not the
ability to fraudulently close the seals over contents of their choosing. This
may be useful in cases where actions on the master seal are expensive - such as
seals implemented on top of decentralized blockchains - by amortising the cost
over all sub-seals.

## Atomicity

Often protocols will require multiple seals to be closed for a transaction to
be valid. If a single entity controls all seals, this is no problem: the
transaction simply isn't valid until the last seal is closed.

However if multiple parties control the seals, a party could attack another
party by failing to go through with the transaction, after another party has
closed their seal, leaving the victim with an invalid transaction that they
can't reverse.

We have a few options to resolve this problem:

### Use a single oracle

The oracle can additionally guarantee that a seal will be closed iff some other
set of seals are also closed; seals implemented with Bitcoin can provide this
guarantee. If the parties to a transaction aren't already all on the same
oracle, they can add an additional transaction reassigning their outputs to a
common oracle.

Equally, a temporary consensus between multiple mutually trusting oracles can
be created with a consensus protocol they share; this option doesn't need to
change the proof verification implementation.

### Two-phase Timeouts

If a proof to the fact that a seal is open can be generated, even under
adversarial conditions, we can make the seal protocol allow a close to be
undone after a timeout if evidence can be provided that the other seal(s) were
not also closed (in the specified way).

Depending on the implementation - especially in decentralized systems - the
next time the seal is closed, the proof it has been closed may in turn provide
proof that a previous close was in fact invalid.

# Proof-of-Publication and Proof-of-Non-Publication

Often we need to be able to prove that a specified audience was able to receive
a specific message. For example, the author's PayPub protocol[^paypub],
Todd/Taaki's timelock encryption protocol[^timelock], Zero-Knowledge Contingent
Payments[^zkcp], and Lightning, among others work by requiring a secret key to
be published publicly in the Bitcoin blockchain as a condition of collecting a
payment. At a much smaller scale - in terms of audience - in certain FinTech
applications for regulated environments a transaction may be considered invalid
unless it was provably published to a regulatory agency.  Another example is
Certificate Transparency, where we consider an SSL certificate to be invalid
unless it has been provably published to a transparency log maintained by a
third-party.

Secondly, many proof-of-publication schemes also can prove that a message was
_not_ published to a specific audience. With this type of proof single-use
seals can be implemented, by having the proof consist of proof that a specified
message was not published between the time the seal was created, and the time
it was closed (a proof-of-publication of the message).

## Implementations

### Decentralized Blockchains

Here the audience is all participants in the system. However miner censorship
can be a problem, and compact proofs of non-publication aren't yet available
(requires (U)TXO commitments).

The author's treechains proposal is a particularly generic and scalable
implementation, with the ability to make trade offs between the size of
audience (security) and publication cost.

### Centralized Public Logs

Certificate Transparency works this way, with trusted (but auditable) logs run
by well known parties acting as the publication medium, who promise to allow
anyone to obtain copies of the logs.

The logs themselves may be indexed in a variety of ways; CT simply indexes logs
by time, however more efficient schemes are possible by having the operator
commit to a key:value mapping of "topics", to allow publication (and
non-publication) proofs to be created for specified topics or topic prefixes.

Auditing the logs is done by verifying that queries to the state of the log
return the same state at the same time for different requesters.

### Receipt Oracles

Finally publication can be proven by a receipt proof by the oracle, attesting
to the fact that the oracle has successfully received the message. This is
particularly appropriate in cases where the required audience is the oracle
itself, as in the FinTech regulator case.

# Validity Oracles

As transaction histories grow longer, they may become impractical to move from
one party to another. Validity oracles can solve this problem by attesting to
the validity of transactions, allowing history prior to the attested
transactions to be discarded.

A particularly generic validity oracle can be created using deterministic
expressions systems. The user gives the oracle an expression, and the oracle
returns a signed message attesting to the validity of the expression.
Optionally, the expression may be incomplete, with parts of the expression
replaced by previously generated attestations. For example, an expression that
returns true if a transaction is valid could in turn depend on the previous
transaction also being valid - a recursive call of itself - and that recursive
call can be proven with a prior attestation.

## Implementations

### Proof-of-Work Decentralized Consensus

Miners in decentralized consensus systems act as a type of validity oracle, in
that the economic incentives in the system are (supposed to be) designed to
encourage only the mining of valid blocks; a user who trusts the majority of
hashing power can trust that any transaction with a valid merkle path to a
block header in the most-work chain is valid. Existing decentralized consensus
systems like Bitcoin and Ethereum conflate the roles of validity oracle and
single-use seal/anti-replay oracle, however in principle that need not be true.

### Trusted Oracles

As the name suggests. Remote-attestation-capable trusted hardware is a
particularly powerful implementation - a conspiracy theory is that the reason
why essentially zero secure true remote attestation implementations exist is
because they'd immediately make untraceable digital currency systems easy to
implement (Finney's RPOW[^rpow] is a rare counter-example).

Note how a single-use seal oracle that supports a generic deterministic
expressions scheme for seal authorization can be easily extended to provide a
validity oracle service as well. The auditing mechanisms for a single-use seal
oracle can also be applied to validity oracles.

# Fraud Proofs

Protocols specified with deterministic expressions can easily generate "fraud
proofs", showing that claimed states/proof in the system are actually invalid.
Additionally many protocols can be specified with expressions of k*log2(n)
depth, allowing these fraud proofs to be compact.

A simple example is proving fraud in a merkle-sum tree, where the validity
expression would be something like:

    (defun valid? (node)
        (or (== node.type leaf)
            (and (== node.sum (+ node.left.sum node.right.sum))
                 (and (valid? node.left)
                      (valid? node.right)))))

To prove the above expression evaluates to true, we'll need the entire contents
of the tree. However, to prove that it evaluates to false, we only need a
subset of the tree as proving an and expression evaluates to false only
requires one side, and requires log2(n) data. Secondly, with pruning, the
deterministic expressions evaluator can automatically keep track of exactly
what data was needed to prove that result, and prune all other data when
serializing the proof.
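
For illustration, such a fraud proof only needs the offending node and the
(sum, digest) pairs of its two children; a rough Python sketch, assuming a
node's digest commits to its sum and its children's sums and digests (the
exact commitment layout is hypothetical):

    import hashlib

    def node_digest(total, left_sum, left_digest, right_sum, right_digest) -> bytes:
        data = (total.to_bytes(8, "big") + left_sum.to_bytes(8, "big") + left_digest
                + right_sum.to_bytes(8, "big") + right_digest)
        return hashlib.sha256(data).digest()

    def check_fraud_proof(claimed_digest, total,
                          left_sum, left_digest, right_sum, right_digest) -> bool:
        # Fraud is proven if the children really belong to this node
        # but their sums don't add up to the node's claimed total.
        return (node_digest(total, left_sum, left_digest,
                            right_sum, right_digest) == claimed_digest
                and left_sum + right_sum != total)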

## Validity Challenges

However, how do you guarantee it will be possible to prove fraud in the first
place? If pruning is allowed, you may simply not have access to the data
proving fraud - an especially severe problem in transactional systems where a
single fraudulent transaction can counterfeit arbitrary amounts of value out of
thin air.

A possible approach is the validity challenge: a subset of proof data, with
part of the data marked as "potentially fraudulent". The challenge can be
satisfied by providing the marked data and showing that the proof in question
is in fact valid; if the challenge is unmet participants in the system can
choose to take action, such as refusing to accept additional transactions.

Of course, this raises a whole host of so-far unsolved issues, such as DoS
attacks and lost data.

# Probabilistic Validation

Protocols that can tolerate some fraud can make use of probabilistic
verification techniques to prove that the percentage of undetected fraud within
the system is less than a certain amount, with a specified probability.

A common way to do this is the Fiat-Shamir transform, which repeatedly samples
a data structure deterministically, using the data's own hash digest as a seed
for a PRNG. Let's apply this technique to our merkle-sum tree example. We'll
first need a recursive function to check a sample, weighted by value:

    (defun prefix-valid? (node nonce)
        (or (== node.type leaf)
            (and (and (== node.sum (+ node.left.sum node.right.sum))
                      (> node.sum 0)) ; mod by 0 is invalid, just like division by zero
                                      ; also could guarantee this with a type system
                 (and (if (< node.left.sum (mod nonce node.sum))
                          (prefix-valid? node.right (hash nonce))
                          (prefix-valid? node.left (hash nonce)))))))

Now we can combine multiple invocations of the above, in this case 256
invocations:

    (defun prob-valid? (node)
        (and (and (and .... (prefix-valid? node (digest (cons (digest node) 0)))
             (and (and ....
                            (prefix-valid? node (digest (cons (digest node) 255)))

As an exercise for a reader: generalize the above with a macro, or a suitable
types/generics system.

If we assume our attacker can grind up to 128 bits, that leaves us with 128
random samples that they can't control. If the (value weighted) probability
that a given node is fraudulent is q, then the chance of the attacker getting away with
fraud is (1-q)^128 - for q=5% that works out to 0.1%

(Note that the above analysis isn't particularly well done - do a better
analysis before implementing this in production!)
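
A quick numeric sanity check of that figure in Python:

    q = 0.05                    # value-weighted fraction of fraudulent nodes
    samples = 128               # samples beyond the attacker's grinding control
    p_undetected = (1 - q) ** samples
    print(p_undetected)         # ~0.0014, i.e. roughly 0.1%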

## Random Beacons and Transaction History Linearization

The Fiat-Shamir transform requires a significant number of samples to defeat
grinding attacks; if we have a random beacon available we can significantly
reduce the size of our probabilistic proofs. PoW blockchains can themselves act
as random beacons, as it is provably expensive for miners to manipulate the
hash digests of blocks they produce - to do so requires discarding otherwise
valid blocks.

An example where this capability is essential is the author's transaction
history linearization technique. In value transfer systems such as Bitcoin, the
history of any given coin grows quasi-exponentially as coins are mixed across
the entire economy. We can linearize the growth of history proofs by redefining
coin validity to be probabilistic.

Suppose we have a transaction with n inputs. Of those inputs, the total value
of real inputs is p, and the total claimed value of fake inputs is q. The
transaction commits to all inputs in a merkle sum tree, and we define the
transaction as valid if a randomly chosen input - weighted by value - can
itself be proven valid. Finally, we assume that creating a genuine input is an
irrevocable action which irrevocably commits to the set of all inputs, real and
fake.

If all inputs are real, 100% of the time the transaction will be valid; if all
inputs are fake, 100% of the time the transaction will be invalid. In the case
where some inputs are real and some are fake the probability that the fraud
will be detected is:

    q / (q + p)

The expected value of the fake inputs is then the sum of the potential upside -
the fraud goes undetected - and the potential downside - the fraud is detected
and the real inputs are destroyed:

    E = q(1 - q/(q + p)) - p(q/(q + p))
      = q(p/(q + p)) - p(q/(q + p))
      = (qp - pq)/(q + p)
      = 0

Thus so long as the random beacon is truly unpredictable, there's no economic
advantage to creating fake inputs, and it is sufficient for validity to only
require one input to be proven, giving us O(n) scaling for transaction history
proofs.

### Inflationary O(1) History Proofs

We can further improve our transaction history proof scalability by taking
advantage of inflation. We do this by occasionally allowing a transaction proof
to be considered valid without validating _any_ of the inputs; every time a
transaction is allowed without proving any inputs the size of the transaction
history proof is reset. Of course, this can be a source of inflation, but
provided the probability of this happening can be limited we can limit the
maximum rate of inflation to the chosen value.

For example, in Bitcoin as of writing every block inflates the currency supply
by 25BTC, and contains a maximum of 1MB of transaction data, 0.025BTC/KB. If we
check the prior input proof with probability p, then the expected value of a
transaction claiming to spend x BTC is:

    E = x(1-p)

We can rewrite that in terms of the block reward per-byte R, and the transaction size l:

    lR = x(1-p)

And solving for p:

    p = 1 - lR/x

For example, for a 1KB transaction proof claiming to spend 10 BTC we can omit
checking the input 0.25% of the time without allowing more monetary inflation
than the block reward already does. Secondly, this means that after n
transactions, the probability that proof shortening will _not_ happen is p^n,
which reaches 1% after 1840 transactions.
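
Checking those numbers in Python, using the block reward and block size
figures above:

    import math

    R = 25 / 1_000_000   # block reward per byte: 25 BTC / 1 MB = 0.025 BTC/KB
    l = 1_000            # transaction proof size, bytes
    x = 10               # claimed value, BTC

    p = 1 - l * R / x                      # probability we must check the input
    print(1 - p)                           # 0.0025 -> input omitted 0.25% of the time

    n = math.log(0.01) / math.log(p)       # transactions until P(no shortening) = 1%
    print(round(n))                        # ~1840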

In a system like Bitcoin where miners are expected to validate, a transaction
proof could consist of just a single merkle path showing that a single-use seal
was closed in some kind of TXO commitment - probably under 10KB of data. That
gives us a history proof less than 18.4MB in size, 99% of the time, and less
than 9.2MB in size 90% of the time.

An interesting outcome of this kind of design is that we can institutionalize
inflation fraud: the entire block reward can be replaced by miners rolling the
dice, attempting to create valid "fake" transactions. However, such a pure
implementation would put a floor on the lowest transaction fee possible, so
better to allow both transaction fee and subsidy collection at the same time.

# References

[^paypub]: https://github.com/unsystem/paypub
[^timelock]: https://github.com/petertodd/timelock
[^zkcp]: https://bitcoincore.org/en/2016/02/26/zero-knowledge-contingent-payments-announcement/
[^rpow]: https://cryptome.org/rpow.htm

--

-- 
https://petertodd.org 'peter'[:-1] <at> petertodd.org
Bram Cohen via bitcoin-dev | 15 Jun 02:14 2016

Merkle trees and mountain ranges

This is in response to Peter Todd's proposal for Merkle Mountain Range commitments in blocks:


I'm in strong agreement that there's a compelling need to put UTXO commitments in blocks, and that the big barrier to getting it done is performance, particularly latency. But I have strong disagreements (or perhaps the right word is skepticism) about the details.

Peter proposes that there should be both UTXO and STXO commitments, and they should be based on Merkle Mountain Ranges based on Patricia Tries. My first big disagreement is about the need for STXO commitments. I think they're unnecessary and a performance problem. The STXO set is much larger than the utxo set and requires much more memory and horsepower to maintain. Most if not all of its functionality can in practice be done using the utxo set. Almost anything accepting proofs of inclusion and exclusion will have a complete history of block headers, so to prove inclusion in the stxo set you can use a utxo proof of inclusion in the past and a proof of exclusion for the most recent block. In the case of a txo which has never been included at all, it's generally possible to show that an ancestor of the txo in question was at one point included but that an incompatible descendant of it (or the ancestor itself) is part of the current utxo set. Generating these sorts of proofs efficiently can for some applications require a complete STXO set, but that can be done with a non-merkle set, getting the vastly better performance of an ordinary non-cryptographic hashtable.

The fundamental approach to handling the latency problem is to have the utxo commitments trail a bit. Computing utxo commitments takes a certain amount of time, too much to hold up block propagation but still hopefully vastly less than the average amount of time between blocks. Trailing by a single block is probably a bad idea because you sometimes get blocks back to back, but you never get blocks back to back to back to back. Having the utxo set be trailing by a fixed amount - five blocks is probably excessive - would do a perfectly good job of keeping latency from ever becoming an issue. Smaller commitments for the utxos added and removed in each block alone could be added without any significant performance penalty. That way all blocks would have sufficient commitments for completely up to date proofs of inclusion and exclusion. This is not a controversial approach.

Now I'm going to go out on a limb. My thesis is that usage of a mountain range is unnecessary, and that a merkle tree in the raw can be made serviceable by sprinkling magic pixie dust on the performance problem.

There are two causes of performance problems for merkle trees: hashing operations and memory cache misses. For hashing functions, the difference between a mountain range and a straight merkle tree is roughly that in a mountain range there's one operation for each new update times the number of times that thing will get merged into larger hills. If there are fewer levels of hills the number of operations is less but the expense of proof of inclusion will be larger. For raw merkle trees the number of operations per thing added is the number of levels in the tree (the log base 2 of the number of things stored), minus the log base 2 of the number of things added at once since you can do lazy evaluation. For practical Bitcoin there are (very roughly) a million things stored, or 20 levels, and there are (even more roughly) about a thousand things stored per block, so each thing forces about 20 - 10 = 10 operations. If you follow the fairly reasonable guideline of mountain range hills going up by factors of four, you instead have 20/2 = 10 operations per thing added amortized. Depending on details this comparison can go either way but it's roughly a wash and the complexity of a mountain range is clearly not worth it at least from the point of view of CPU costs.
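
Reproducing that back-of-the-envelope comparison in Python (the figures are
the same rough ones used above):

    from math import log2

    n_stored = 1_000_000      # things stored in the tree (~20 levels)
    per_block = 1_000         # things added per block (lazy evaluation)

    # Plain merkle tree: tree levels minus the levels shared within a batch.
    plain_ops = log2(n_stored) - log2(per_block)   # ~10 per item

    # Mountain range with hills growing by factors of four: each item is
    # merged into roughly half as many hills as the tree has levels.
    mmr_ops = log2(n_stored) / 2                   # ~10 per item

    print(round(plain_ops), round(mmr_ops))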

But CPU costs aren't the main performance problem in merkle trees. The biggest issue is cache misses, specifically l1 and l2 cache misses. These tend to take a long time to do, resulting in the CPU spending most of its time sitting around doing nothing. A naive tree implementation is pretty much the worst thing you can possibly build from a cache miss standpoint, and its performance will be completely unacceptable. Mountain ranges do a fabulous job of fixing this problem, because all their updates are merges so the metrics are more like cache misses per block instead of cache misses per transaction.

The magic pixie dust I mentioned earlier involves a bunch of subtle implementation details to keep cache coherence down which should get the number of cache misses per transaction down under one, at which point it probably isn't a bottleneck any more. There is an implementation in the works here:


This implementation isn't finished yet! I'm almost there, and I'm definitely feeling time pressure now. I've spent quite a lot of time on this, mostly because of a bunch of technical reworkings which proved necessary. This is the last time I ever write a database for kicks. But this implementation is good on all important dimensions, including:

Lazy root calculation
Few l1 and l2 cache misses
Small proofs of inclusion/exclusion
Reasonably simple implementation
Reasonably efficient in memory
Reasonable defense against malicious insertion attacks

There is a bit of a false dichotomy with the mountain range approach. Mountain ranges need underlying merkle trees, and mine are semantically nearly identical to Peter's. This is not a coincidence - I adopted patricia tries at his suggestion. There are a bunch of small changes which allow a more efficient implementation. I believe that my underlying merkle tree is unambiguously superior in every way, but the question of whether a mountain range is worth it is one which can only be answered empirically, and that requires a bunch of implementation work to be done, starting with me finishing my merkle tree implementation and then somebody porting it to C and optimizing it. The Python version has details which are ridiculous and only make sense once it gets ported, and even under the best of conditions Python performance is not strongly indicative of C performance.


RFC for BIP: Derivation scheme for P2WPKH-nested-in-P2SH based accounts

Hi List,

Following up to the discussion last month (
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2016-May/012695.html ), I've prepared
a proposal for a BIP here:
	
	https://github.com/DanielWeigl/bips/blob/master/bip-p2sh-accounts.mediawiki

Any comments on it? Does anyone working on a BIP44 compliant wallet implement something different?
If there are no objections, I'd also like to request a number for it.

Thx,
Daniel
Alfie John via bitcoin-dev | 9 Jun 01:47 2016

BIP 151 MITM

Hi folks,

Overall I think BIP 151 is a good idea. However, unless I'm mistaken, what's to
prevent someone between peers from suppressing the initial 'encinit' message during
negotiation, causing both to fall back to plaintext?

Peers should negotiate a secure channel from the outset or back out entirely
with no option of falling back. This can be indicated loudly by the daemon
listening on an entirely new port.

Alfie

--

-- 
Alfie John
https://www.alfie.wtf
