WIP: More protocol flavors #355

Open
wants to merge 1 commit into
base: main

58 changes: 58 additions & 0 deletions docs/flavors
@@ -0,0 +1,58 @@
Existing and proposed flavors of Leios.

> [!NOTE]
> This is currently just a brain dump on understanding the status quo and the Leios flavors encountered by @ch1bo. Maybe some of these names and summaries are helpful to others.

## Status quo / Praos

Let's look at the status quo consensus protocol of Cardano - Ouroboros Praos - and how it is currently deployed. This section can also be seen as my assumptions on how the current system works. See also [Pi's blog post](https://314pool-v2-git-pi-leios-pi-lanninghams-projects.vercel.app/post/leios#review-of-ouroboros-praos):

- Every network participant can submit transactions, which are diffused across the whole network
- Each node validates all received transactions against its latest ledger state built from the current longest Praos chain of blocks
- Nodes pull transactions from their peers, potentially sampling across them
- However, no punishment for "invalid" transactions

Contributor

Are you sure about that? I thought nodes disconnect from a peer that shares invalid transactions after some threshold.

Member Author

Not sure. I asked in our internal AMA and either @nfrisby or @coot said that we don't punish right now.

That's right; the only punishment we'll apply in the new tx submission logic is that we will deprioritise downloading txs from a peer that offered us an invalid tx.


- Block production: A stake-based VRF lottery decides which (stake pool) node will mint the next block
- Tuned to one active slot every ~20 seconds
- Probabilistic, so multiple blocks per slot or in short sequence are possible
- Honest nodes: put as many txs as possible from their already ordered mempool into a block
- Adversarial nodes: may fill a block with txs unknown to the network
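
The lottery described above can be sketched numerically. This is a minimal illustration, not the node's implementation: it assumes 1-second slots, so that "one active slot every ~20 seconds" corresponds to an active slot coefficient `f = 0.05`, and it uses the Praos leader probability `1 - (1 - f)^stake`. The function names and the example stake distribution are hypothetical.

```python
def leader_probability(relative_stake: float, f: float = 0.05) -> float:
    """Chance that a pool holding `relative_stake` of all stake leads a given slot."""
    return 1.0 - (1.0 - f) ** relative_stake


def slot_active_probability(stakes, f: float = 0.05) -> float:
    """Chance that at least one pool leads a given slot (pools draw independently)."""
    nobody = 1.0
    for stake in stakes:
        nobody *= 1.0 - leader_probability(stake, f)
    return 1.0 - nobody


def multi_leader_probability(stakes, f: float = 0.05) -> float:
    """Chance that two or more pools lead the same slot (a slot battle)."""
    probs = [leader_probability(stake, f) for stake in stakes]
    nobody = 1.0
    for p in probs:
        nobody *= 1.0 - p
    exactly_one = 0.0
    for i, p in enumerate(probs):
        others_lose = 1.0
        for j, q in enumerate(probs):
            if j != i:
                others_lose *= 1.0 - q
        exactly_one += p * others_lose
    return 1.0 - nobody - exactly_one


# Hypothetical stake distribution over three pools (sums to 1.0).
stakes = [0.5, 0.3, 0.2]
print(slot_active_probability(stakes))   # close to f, regardless of how stake is split
print(multi_leader_probability(stakes))  # small but non-zero
```

Because the per-pool probabilities multiply out to `(1 - f)^(total stake)`, the chance of an active slot stays at `f` however the stake is split, while the chance of several leaders in one slot is small but never zero.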

- Blocks need to reach the next block producer as fast as possible
- Currently takes about 3 seconds; Target is < 5 seconds

Currently, it takes < 1s for 99.5% of blocks; < 3s for 99.8% of blocks.

- The lower this is, the fewer block height battles / chain re-orgs there are
- To get 3 seconds end-to-end network diffusion we only have fractions of a second to forward (= download, validate and upload) blocks
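
To make the "fractions of a second" claim concrete, here is a rough sketch that splits an end-to-end diffusion target evenly across relay hops. The hop count of 6 is a purely hypothetical assumption for illustration; the source does not state how many hops a block travels.

```python
def per_hop_budget(end_to_end_s: float, hops: int) -> float:
    """Time each hop may spend downloading, validating and uploading a block."""
    if hops <= 0:
        raise ValueError("hops must be positive")
    return end_to_end_s / hops


# Hypothetical: a block crosses 6 hops and must arrive within 3 seconds.
budget = per_hop_budget(3.0, 6)
print(f"{budget:.2f} s per hop")
```

Real paths differ in length and link quality, so an even split only gives a feel for the order of magnitude.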

## Vanilla: Full Leios
- As published in paper TODO link
- 7 stage pipeline with two EB rounds (?)
- Optimal use of available bandwidth
- TODO ...

## Chocolate: Short Leios
- 5 stages with only one round of EB (?)
- TODO ... + link short leios in docs/

## Proposal 1 - Pistachio: IBs do not contain txs
- Only reference txs in IBs
- Tx diffusion is already happening for Praos
- Should reuse already transmitted data

@berewt berewt May 19, 2025

One doubt I have about it: what happens if a malicious actor spams the mempool?
In the vanilla and chocolate versions, the node can easily prioritize IBs to get access to the txs it needs to vote. We would need to find a way to prioritize the right txs in the mempool as well. It sounds non-trivial, but I didn't think too much about it, so I may be missing an obvious solution to the problem.
(you could prioritize the retrieval of the referenced txs, of course, but could it introduce some critical latency?)

Contributor

This might answer your question: #341 (reply in thread)

Basically, if you allow transactions or permissionless IBs you do indeed get a spam problem. Maybe there are clever solutions, but the obvious ones clash with the need for data availability.

Member Author

@ch1bo ch1bo May 19, 2025

One doubt I have about it: what happens if a malicious actor spams the mempool?

@ both: How would such a "spam attack" actually work?

A malicious actor can be an upstream peer to parts of the network and provide them with valid transactions. But that is just business as usual?

In the vanilla and chocolate versions, the node can easily prioritize IBs to get access to the txs it needs to vote.

No it can't? When it's time to vote, it can either validate the EB's sequence of IBs or it doesn't have it. If the data is not available -> no vote.

My understanding is the following:
With chocolate and vanilla, you can prioritise IB propagation over mempool tx propagation. With the other flavours, once you get the IB, you need to rush to resolve the txIds in the IB to be able to vote for it.
So if the mempools are spammed, the propagation of the "right" txs can be impacted and few IBs may be successfully voted on.

@berewt berewt May 19, 2025

I think it's really an SLA question: IBs are a way to prioritize network propagation of some txs. Without IBs, this lack of prioritisation can compromise the propagation. As you said, the result is "data is not available -> no vote", but it means that we now need to consider a worst-case scenario where you need to communicate the IB to most nodes first and then resolve the txs within this IB, while vanilla and chocolate only need to consider the latency needed to communicate a (bigger but standalone) IB. If my understanding is right, it means that the protocol will have to slow down to get high confidence that we'll have the txs on time.

(sorry for the spam, I'm trying to clarify this for myself as well)

Member Author

I don't see the prioritisation aspect: neither why it would make sense nor how you would do it. @coot asked last week about the directionality of the mini protocols and, AFAIU, tx diffusion and block propagation run in opposite directions today, and seemingly nobody has thought about that for Leios yet. That means a single node cannot decide whether txs or blocks are prioritized - neither for itself nor for its peers - because it can only pull transactions or blocks from one direction.

I meant prioritisation in the sense that with vanilla and chocolate, once you have the IB, you also have the txs. It avoids the latency issue that occurs with pistachio (latency meaning: you first get the IB, then need to resolve the txs). It's a bit unclear to me with stracciatella, but I'm still doubtful we could ensure good diffusion without a dedicated transmission.
It could, if needed, lead to a parametrisation where you restrict the bandwidth you allocate to tx diffusion (by limiting the number of transactions you ask for at each call to the tx-submission mini-protocol).
My understanding is that this difference between diffusion without any time constraints (as with tx diffusion) and diffusion with a strong time constraint is what led to the distinction.
AFAIU, it's for much the same reasons that we don't rely on tx-submission and just refer to txs in Praos blocks, but include them directly in the block instead.
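
The latency argument in this thread can be put into a toy model. This is only a sketch under simplifying assumptions: missing txs are fetched in sequential batches, each costing one round trip, and all names and numbers are hypothetical rather than measurements of any Leios flavor.

```python
import math


def ready_to_vote_inline(ib_diffusion_s: float) -> float:
    """Vanilla/chocolate: the IB carries its txs, so arrival means ready to validate."""
    return ib_diffusion_s


def ready_to_vote_by_reference(
    ib_diffusion_s: float,
    missing_txs: int,
    fetch_round_trip_s: float,
    txs_per_request: int,
) -> float:
    """Pistachio: after the IB arrives, referenced txs not yet in the mempool are fetched."""
    requests = math.ceil(missing_txs / txs_per_request)
    return ib_diffusion_s + requests * fetch_round_trip_s


# Hypothetical numbers: a reference-only IB is smaller and diffuses faster,
# but 25 referenced txs are still missing locally when it arrives.
print(ready_to_vote_inline(1.5))
print(ready_to_vote_by_reference(1.0, 25, 0.2, 10))
```

When the mempool already holds every referenced tx, `missing_txs` is 0 and the by-reference variant benefits from the smaller IB; the worst case grows with the number of txs that still have to be resolved.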

Contributor

Here's the most general "easy" attack you can do: You make p*l outputs on the chain, where p is the number of transactions you want to have in flight in parallel at any given time and l is how long you want to run the attack, measured in Leios stage lengths. Also, let N be the number of block producing nodes (specifically EB, RB and vote producers).

In stage s, you send out N*p different valid transactions:

  • the first N spend output s*p of the p*l outputs and one is submitted to each block producer
  • the second N spend output s*p+1 and again, one is submitted to each block producer
  • etc.

So each stage, every node receives p transactions it needs to deliver to other nodes, but it also has (N-1)*p transactions it needs to fetch from other nodes to be able to make EBs and vote.

In an ideal network that forms a complete graph, that means that every node needs to do N*p network IO operations, which is the same as what the attacker needs to do. So everyone with a weaker network connection wouldn't be able to keep up. However, in a p2p network things are going to be worse, because not only do you need to obtain the transactions you haven't seen, you also may need to relay (N-1)*p transactions to multiple of your peers. So a more central node would be hit harder by this attack. And finally, this doesn't consider things like loss and timeouts, which are just going to make this a lot worse. E.g. if node A asks node B for a transaction, and node B is so spammed by requests that it takes a while, node A may ask node B for the same transaction again, causing it to be delivered twice.

I have no statistics and I'm not a network engineer, but my guess would be that with a network connection that's as good as that of 90% of the block producers, you can just spam the network into oblivion.

And crucially, this attack is relatively cheap. You just need to pay for p transactions per stage because of all the conflicts. There might also be secondary effects you can use while executing this attack to lower this price depending on implementation details. For example if the network is already degraded it might be sufficient to continue the attack with a lower p because the network generates so many duplicated messages already that you don't have to introduce as many new ones.

Let's say N=2000 and you want to saturate a 100 Mbit/s internet connection; then that's 50 kbit/s per node, which means roughly one max-size transaction (16*8 kbit) every 2.5s. With a stage length of 10s that gives you p=4. A max-size transaction has fees < 1 ada, so this attack costs you < 4/10 ada/s, or a neat 34560 ada per day. I'm not guaranteeing that I didn't make a mistake in the calculation, but if it's correct then this is nothing for a big player who wants to short ada.
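
The back-of-the-envelope numbers above can be reproduced with a short script. It is a sketch of the same arithmetic, assuming decimal units (1 kbit = 1000 bit, a 16 kB max-size transaction) and a fee of exactly 1 ada as an upper bound; the function and parameter names are made up for illustration.

```python
def attack_cost(
    n_nodes: int = 2000,
    link_bits_per_s: float = 100e6,   # 100 Mbit/s to saturate
    tx_bits: float = 16 * 8 * 1000,   # one max-size tx, 16 kB
    stage_length_s: float = 10.0,
    fee_ada: float = 1.0,             # upper bound on the fee per tx
) -> dict:
    per_node_bits_per_s = link_bits_per_s / n_nodes
    seconds_per_tx = tx_bits / per_node_bits_per_s
    # p: parallel txs per stage, rounded as in the comment above
    p = round(stage_length_s / seconds_per_tx)
    # Only p txs per stage are ever paid for, because the rest conflict.
    ada_per_s = p * fee_ada / stage_length_s
    return {
        "per_node_kbit_per_s": per_node_bits_per_s / 1000,
        "seconds_per_tx": seconds_per_tx,
        "p": p,
        "ada_per_s": ada_per_s,
        "ada_per_day": ada_per_s * 86400,
    }


for key, value in attack_cost().items():
    print(key, value)
```

With these inputs the script gives 50 kbit/s per node, 2.56 s per transaction (rounded to 2.5 s in the comment), p = 4 and an upper bound of about 34560 ada per day, which matches the figures quoted.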

Member Author

So each stage, every node receives p transactions it needs to deliver to other nodes, but it also has (N-1)*p transactions it needs to fetch from other nodes to be able to make EBs and vote.

I'm fairly convinced that this attack is not mountable, for two reasons:

  • any node would only forward one of the N conflicting transactions (whether it is over p independent outputs is irrelevant) because only one of them would be seen as non-conflicting in a single node's mempool
  • the attacker will not consistently be "the first" to provide a conflicting tx, as each node pulls transactions from all its peers and may very well adopt a consistent view of which of the N maliciously created txs is to be included

The network layer is crucial in this and this is clearly a defense-in-depth scenario where we rely on the network protocol to not allow such an asymmetric resource attack (create a lot of work from little work).
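
The first point can be illustrated with a toy mempool that rejects any transaction spending an output already spent by a pending transaction. This is a hypothetical model for illustration only, not the actual cardano-node mempool; `ToyMempool`, `try_add` and the tx/output names are invented here.

```python
class ToyMempool:
    """Keeps at most one pending tx per spent output; conflicting txs are rejected."""

    def __init__(self) -> None:
        self.spent_by = {}   # output reference -> id of the pending tx spending it
        self.tx_ids = []     # accepted txs, in arrival order

    def try_add(self, tx_id: str, inputs: list) -> bool:
        # A tx conflicts if any of its inputs is already spent by a pending tx.
        if any(output in self.spent_by for output in inputs):
            return False
        for output in inputs:
            self.spent_by[output] = tx_id
        self.tx_ids.append(tx_id)
        return True


# Hypothetical attack traffic: five conflicting txs all spending the same output.
mempool = ToyMempool()
results = [mempool.try_add(f"tx-{i}", ["out-0"]) for i in range(5)]
print(results)         # only the first arrival is accepted
print(mempool.tx_ids)  # so only that one would be offered to peers
```

Which of the conflicting txs arrives first may differ from node to node, which is where the second point about pulling from all peers comes in.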

- Same tunable IB rate as in other flavors
- Still 5 stage pipeline which starts with IB propose

## Proposal 2 - Stracciatella: No IBs
- Replace IB propose stage with regular Praos-like tx diffusion
- Effectively removes a stage?
- Lower individual tx latency possible?
- Restrict age of txs (as with IBs)?
- EBs reference txs directly
- Honest EB producers just pick a "likely" set of transactions everyone saw, as they would for IBs
- Schedule multiple EBs per round like in other flavors
- Maybe also sample EB sizes to increase chance of at least one EB certified?
- Like Pi's 0.2, but with EBs onto transactions directly: https://314pool-v2-git-pi-leios-pi-lanninghams-projects.vercel.app/post/leios#leios-02---design

#### Open points
- Why are we often jumping onto the need for IBs?
- Am I missing something?
- Does this avoid the conflicting tx problem as only one would end up in a certified EB?
- Chain EBs like in vanilla Leios or just vote on longer prefixes?