Proposal for a formalism that versions and describes the blockchain protocol

This is a draft specification of the Ğ1 grammar: an attempt to describe the DUniterBlockchainProtocol (DUBP) formally, rather than through loads of forum posts.

I used perl6 because perl is kind of a universal language: you’ll notice there is no imported library, and yet I have Ğ as a character as well as grammar parsing… so yes, language handling is a core principle here. I kept it “not too perlish” for my own and everyone’s readability: you don’t need to learn perl6 to understand this! I do not intend to implement more than the grammar and perhaps basic validation rules (the reason is that perl is designed to give programmers the ability to shoot themselves in the foot)… feel free to do whatever you want with it!

Let’s begin: grammars are, among other things, simple objects in perl, so you can do things like this:

grammar Ğ1Primitives {
    ...
    token pubkey      { <[123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz]> ** 44..45 }
    token hash        { <[0123456789ABCDEFGHJKLMNPQRSTUVWXYZ]> ** 64..64 }
    token signature   { [\S] ** 87..88  }
    ...
    token cond        { <sig> | <xhx> | <csv> | <cltv> | <or> | <and> }
    rule and          { '(' <cond> '&&' <cond> ')' }
    rule or           { '(' <cond> '||' <cond> ')' }
    token sig         { 'SIG(' <pubkey> ')' }
    ...
}
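
To make this concrete, here is a minimal sketch of how such a grammar can be exercised. Ğ1Sketch is a hypothetical, cut-down copy of Ğ1Primitives (only pubkey, sig, and, or), the TOP token is added for the demo, and the key is a placeholder made of 44 'A' characters, not a real public key.

```raku
# Hypothetical cut-down copy of Ğ1Primitives: just enough to parse SIG conditions.
grammar Ğ1Sketch {
    token TOP    { <cond> }     # entry point, added for this demo only
    token pubkey { <[123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz]> ** 44..45 }
    token cond   { <sig> | <or> | <and> }
    rule  and    { '(' <cond> '&&' <cond> ')' }
    rule  or     { '(' <cond> '||' <cond> ')' }
    token sig    { 'SIG(' <pubkey> ')' }
}

my $key = 'A' x 44;   # placeholder, not a real public key

say Ğ1Sketch.parse("SIG($key)").so;                  # True
say Ğ1Sketch.parse("(SIG($key) && SIG($key))").so;   # True
say Ğ1Sketch.parse("SIG(too-short)").so;             # False: pubkey needs 44..45 chars
```

.parse only succeeds when the whole string matches, so trailing garbage is rejected as well.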

and a subclass describing the documents of the DUBP:

grammar Ğ1Theory is Ğ1Primitives {
  ...
  rule document     { <peer> | <membership> | <certification> | <revocation> | <identity> | <transaction> }

  rule transaction  { <Versionn> <Type> <Currency> <Blockstamp> <Locktime> <Issuers> <Inputs> <Unlocks> <Outputs> <Signatures> <Comment> }
  rule identity     { <Versionn> 'Type: Identity' <Currency> <Issuer> <UniqueID> <Timestamp> <signature> }
  rule revocation   { <Versionn> <Type> <Currency> <Issuer> <IdtyUniqueID> <IdtyTimestamp> <IdtySignature> <signature>? }
  rule certification{ <Versionn> <Type> <Currency> <Issuer> <IdtyIssuer> <IdtyUniqueID> <IdtyTimestamp> <IdtySignature> <CertTimestamp> <signature>? }
  rule membership   { <Versionn> <Type> <Currency> <Issuer> <Blockk> <Membership> <UserID> <CertTS> <signature>? }
  rule peer         { <Versionn> <Type> <Currency> <PublicKey> <Blockk> <Endpoints> }
  ...
}

I can have grammars that default to the theoretical DUBP defined by the protocol.md file and extend it.
For instance, I can have a grammar specific to version 10 of the protocol, or to version 11, for which I or anyone else would like to suggest a change. Rather than a mostly human interpretation of an algorithm in a post, one could propose a formal description of the protocol.

grammar Ğ10 is Ğ1Theory  {
  rule TOP          { <document> || <.nopanic: "Ğ10 parsing failed">  }
  token version     { 10 }
}
grammar Ğ11 is Ğ1Theory  {
  rule TOP          { <document> || <.nopanic: "Ğ11 parsing failed"> }
  token version     { 11 }
}
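
A minimal sketch of what pinning the version buys you. TheorySketch, V10 and V11 are hypothetical stand-ins for Ğ1Theory, Ğ10 and Ğ11, and the layout of the Versionn header ('Version: ' followed by the number and a newline) is my assumption, since that rule is not shown above.

```raku
# Hypothetical stand-in for Ğ1Theory: only the version header is modelled.
grammar TheorySketch {
    token TOP      { <Versionn> }
    token Versionn { 'Version: ' <version> \n }   # assumed header layout
    token version  { \d+ }                        # the "theory" accepts any number
}

# Each versioned grammar overrides a single token.
grammar V10 is TheorySketch { token version { 10 } }
grammar V11 is TheorySketch { token version { 11 } }

for V10, V11 -> $g {
    say $g.^name, ' accepts version 10: ', $g.parse("Version: 10\n").so;
}
# V10 accepts version 10: True
# V11 accepts version 10: False
```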

Example: let’s imagine that I find the current grammar not completely safe and want to suggest an improvement to the other developers:

grammar Ğ11Propal is Ğ10  {
    # rule and          { '(' <cond> '&&' <cond> ')' }
    # is equivalent to
    # token and          { '(' <.ws> <cond> <.ws> '&&' <.ws> <cond> <.ws> ')' <.ws> }
    # (<.ws> already matches zero or more whitespace characters)

    token and          { '(' <cond> ' && ' <cond> ')' }
    token or           { '(' <cond> ' || ' <cond> ')' }
}

Here I use the grammar (and the perl6 distinction between token and rule) to say “no more free whitespace inside the brackets”: exactly one space on each side of the operator, and none next to the parentheses.
It uses inheritance to override the definitions of the rules ‘and’ and ‘or’.
Notice how a couple of lines are enough to highlight only what has changed (added or overridden).
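
The difference between the two definitions can be checked with a small sketch. Loose and Strict are hypothetical, cut-down stand-ins for Ğ10 and Ğ11Propal (the TOP token and the placeholder key exist only for this demo).

```raku
# Hypothetical cut-down stand-in for the current grammar (rule = whitespace tolerated).
grammar Loose {
    token TOP    { <cond> }
    token pubkey { <[123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz]> ** 44..45 }
    token cond   { <sig> | <or> | <and> }
    rule  and    { '(' <cond> '&&' <cond> ')' }
    rule  or     { '(' <cond> '||' <cond> ')' }
    token sig    { 'SIG(' <pubkey> ')' }
}

# Stand-in for the proposal: only the two overridden tokens are written down.
grammar Strict is Loose {
    token and    { '(' <cond> ' && ' <cond> ')' }
    token or     { '(' <cond> ' || ' <cond> ')' }
}

my $key = 'A' x 44;   # placeholder, not a real public key

for "(SIG($key) && SIG($key))",       # canonical spacing
    "(SIG($key)&&SIG($key))",         # no spaces
    "( SIG($key)  &&  SIG($key) )"    # extra spaces
-> $input {
    say 'Loose: ', Loose.parse($input).so, '  Strict: ', Strict.parse($input).so;
}
# Loose: True  Strict: True
# Loose: True  Strict: False
# Loose: True  Strict: False
```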

I could also use it to let other developers know the status of my development. Let’s say my node / client API doesn’t implement anything other than SIG for now: I could override the cond token. Or I could describe the fact that my implementation only deals with tx documents.

grammar Ğ1PeuDeRetards is  Ğ1Theory  {
    #token cond { <sig> | <xhx> | <csv> | <cltv> | <or> | <and> }
    
    token cond        { <sig> }

    rule document     {  <transaction> }
}

With such a tool we can keep track of every protocol change in a FORMAL way, in only a few lines: roughly 100 lines including comments, instead of the nearly 3000 lines of protocol.md.

My point is not to advertise the awesome language that is perl6; it is to point out that YES WE CAN have a formal definition of the protocol that is AGNOSTIC to languages & tools. It is perhaps not the most human-readable, but it is certainly the shortest way to do it, and the most precisely defined one for anyone who would like to implement the protocol, which I strongly encourage.

In addition to that, a very important thing to do is to build a test set for every version of the protocol.
Once people agree on at least one grammar definition, their implementations should also pass a shared set of unit tests. Then their life becomes a lot easier, because they don’t have to read half the forum to figure out whether a whitespace is allowed. It should reduce the differences between nodes and help developers track the changes.
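
Such a test set could start as small as the sketch below, which uses the Test module that ships with the language. CondSketch is a hypothetical, cut-down grammar and the test strings are made up for illustration; a real suite would be agreed on per protocol version.

```raku
use Test;

# Hypothetical cut-down grammar under test (conditions only).
grammar CondSketch {
    token TOP    { <cond> }
    token pubkey { <[123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz]> ** 44..45 }
    token cond   { <sig> | <or> | <and> }
    rule  and    { '(' <cond> '&&' <cond> ')' }
    rule  or     { '(' <cond> '||' <cond> ')' }
    token sig    { 'SIG(' <pubkey> ')' }
}

my $key = 'A' x 44;   # placeholder, not a real public key

# Made-up samples: what every implementation must accept, and must reject.
my @valid   = "SIG($key)", "(SIG($key) || SIG($key))";
my @invalid = "SIG()", "(SIG($key) && )", "SIG($key) trailing";

plan @valid + @invalid;
ok  CondSketch.parse($_), "accepts: $_" for @valid;
nok CondSketch.parse($_), "rejects: $_" for @invalid;
```

The same two lists could be fed to any other implementation, whatever its language, which is the point of agreeing on them.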

Once you get there and have realized how powerful that is, please take a minute and think about upgrading the protocol with and without a formal definition…

The full snippet :

I’m not sure you can translate DUP into just a grammar, because some rules (most of them, actually) are discrete, non-continuous rules. If you read the “block interpretation” part of the protocol, you will see what I’m talking about.

I suppose it depends on your definition of grammar:

For clarity we may conceptualize 2 grammars:

  • the one defining what may be parsed and written, used for communication & state: the data
  • the logical grammar used to verify, compute, …: the code

But in the end it’s one and the same thing if you are recursively enumerable.
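
A toy sketch of the two sides living together: the grammar describes the data, and an actions class attached to it carries a check. The document layout and the rule “as many signatures as issuers” are simplified, hypothetical stand-ins chosen for illustration, not the wording of the DUP; DocSketch and DocChecks are made-up names.

```raku
# The data: a toy document with two sections (hypothetical layout).
grammar DocSketch {
    token TOP    { 'Issuers:' \n [ <issuer> \n ]+ 'Signatures:' \n [ <sig> \n ]+ }
    token issuer { <[a..z]>+ }
    token sig    { <[a..z A..Z 0..9]>+ }
}

# The code: a check run on the parsed document.
class DocChecks {
    method TOP($/) {
        # Toy rule: there must be exactly one signature per issuer.
        make $<issuer>.elems == $<sig>.elems;
    }
}

my $doc = "Issuers:\nalice\nbob\nSignatures:\nsigA\n";
my $m   = DocSketch.parse($doc, actions => DocChecks);

say $m.so;     # True: the document is syntactically well-formed
say $m.made;   # False: two issuers but only one signature
```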

Tokens are not scope-less strings; they are part of your language. I’ll try to complete the example to show what I mean.

You mean that it depends on state? I’m not sure how you define your terms; could you be more specific?

Like this? doc/Protocol.md · dev · nodes / typescript / duniter · GitLab
I’m confused.

I meant what you’ve written above: I thought the “grammar” was only a way to describe the data. A communication protocol, in other words.

You claim there also exists a “logical grammar” that can be used to verify and compute: the code. That’s interesting. The part of DUP that this grammar would concern is doc/Protocol.md · dev · nodes / typescript / duniter · GitLab

This is what I’ve called “block interpretation”.

It is divided into 2 parts: local and global validation. I’ve tried to make it as formal as possible (with my own knowledge and skills), but if you know a better way to express these rules, I’m interested in learning about it.