Skip to content

Conversation

jonathanc-n
Copy link
Collaborator

Which issue does this PR close?

Use for discussion for how TonboCloud will work

Copy link

codecov bot commented Jul 29, 2025

Codecov Report

❌ Patch coverage is 0% with 38 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
cloud/src/aws/mod.rs 0.00% 29 Missing ⚠️
cloud/src/metadata.rs 0.00% 6 Missing ⚠️
cloud/src/compaction.rs 0.00% 3 Missing ⚠️

📢 Thoughts on this report? Let us know!

@jonathanc-n jonathanc-n marked this pull request as draft July 29, 2025 02:46
@jonathanc-n jonathanc-n requested a review from ethe July 29, 2025 02:46
@jonathanc-n
Copy link
Collaborator Author

@ethe Are you able to take a look at this today? I'll see if I can fix the rest of the code tomorrow. I would just like to see if this is what we would like so far for the postgresql read structure.

@ethe
Copy link
Member

ethe commented Aug 4, 2025

Thank you @jonathanc-n , I'll take a look

@jonathanc-n jonathanc-n marked this pull request as ready for review August 5, 2025 18:47
@jonathanc-n jonathanc-n changed the title POC: Add Tonbo Cloud feat: Add Tonbo Cloud Aug 5, 2025
@jonathanc-n jonathanc-n changed the title feat: Add Tonbo Cloud feat: Add Tonbo Cloud + gRPC + Flight Aug 10, 2025
}

#[tonic::async_trait]
impl FlightService for TonboFlightSvc {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Arrow Flight IPC make things simple at this stage, we can move forward on relying Arrow Flight.

Copy link
Member

@ethe ethe Aug 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we introduce Flight SQL in Tonbo Cloud? Arrow Flight can not support clauses pushdown, which is really important to our core intent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me take a look into fligtht sql


/// Every table has its own tonbo cloud instance.
#[allow(dead_code)]
pub struct AWSTonbo {
Copy link
Member

@ethe ethe Aug 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to think about how to support multi Tonbo instances at first:

  • Routing: Add db to Flight Ticket and gRPC requests; route via a DbRegistry<DbId, DB<DynRecord, TokioExecutor>>.
  • Isolation: Separate S3/local paths per DB; enforce per‑DB semaphores and global caps to prevent noisy neighbors.
  • Lifecycle: Lazy‑open on first access; provide idle eviction and clean shutdown hooks.

DB‑Instance LRU

  • Policy: Track last_access, active_count, mem_footprint, reopen_cost; choose victims by LRU score adjusted by footprint and reopen cost; skip pinned DBs.
  • Eviction: Quiesce → drain (active_count==0 or timeout) → flush WAL/minor‑flush → close handles → remove from registry.
  • Soft‑evict tier: Optionally drop heavy resources (FDs/compactor) but keep lightweight metadata to speed reopen.
  • Knobs: max_instances, idle_ttl, min_resident, eviction_batch_size, per‑DB concurrency limits.

Cold Start

  • Checkpoints: Persist manifest_version, latest_ts, active_wal_ids; on reopen, replay only WALs newer than checkpoint.
  • Bounded WAL: Small segments (32–64MB) + time‑based rotation (10–30s) to cap replay time.
  • Fast serve: Load manifest/SSTs immediately for reads; parallel WAL replay; gate writes until replay completes (or route to a fresh WAL/memtable).

// Local Tonbo instance
tonbo: DB<DynRecord, TokioExecutor>,
// Remote file system
s3_fs: Arc<dyn DynFs>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to determine which I/O offload to S3, it might not the duty of this crate, we have to make the strategy clear in Tonbo

s3_fs: Arc<dyn DynFs>,
// Endpoint for read requests (scans)
endpoint: String,
buffered_data: Option<RecordBatch>,
Copy link
Member

@ethe ethe Aug 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would be better to build a unified Parquet data block LRU cache under Tonbo instance to replace this?


impl AWSTonbo {
#[cfg(test)]
pub fn tonbo(&self) -> &DB<DynRecord, TokioExecutor> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TokioExecutor is annoying, I think maybe we could use default type to omit it.


// todo: Separate the data
// Returns number of rows and row size
async fn parquet_metadata<'a>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not the ideal implementation of estimator, I think we need to rethink the design, find a way that never really scan data to compute them.
Row size is derived from the first non-empty Arrow batch’s memory footprint, this is biased (projection-dependent, variable-length columns, one-row projection batches). Also It measures Arrow in-memory usage, not Parquet on-disk size; “ParquetMetadata” naming is misleading.

};

// Converts an iterator over `DynRecord` into record batch
pub fn records_to_record_batch<I>(schema: &DynSchema, rows: I) -> RecordBatch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Tonbo should support it natively.

@@ -89,9 +89,15 @@ pub trait Record: 'static + Sized + Decode + Debug + Send + Sync {
self.as_record_ref().key()
}

/// Returns owned value of record (self)
fn as_owned_value(&self) -> Self;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this cloning semantic?

/// Returns a reference to the record.
fn as_record_ref(&self) -> Self::Ref<'_>;

/// Applies projection mask
fn projection(&mut self, projection_mask: &ProjectionMask);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unifying projection/limit/filter behind a small logical expression is a good idea instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants