140 changes: 140 additions & 0 deletions rust/cubestore/CLAUDE.md
@@ -0,0 +1,140 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Repository Overview

CubeStore is the Rust-based distributed OLAP storage engine for Cube.js, designed to store and serve pre-aggregations at scale. It's part of the larger Cube.js monorepo and serves as the materialized cache store for rollup tables.

## Architecture Overview

### Core Components

The codebase is organized as a Rust workspace with multiple crates:

- **`cubestore`**: Main CubeStore implementation with distributed storage, query execution, and API interfaces
- **`cubestore-sql-tests`**: SQL compatibility test suite and benchmarks
- **`cubehll`**: HyperLogLog implementation for approximate distinct counting
- **`cubedatasketches`**: DataSketches integration for advanced approximate algorithms
- **`cubezetasketch`**: Theta Sketch implementation for set operations
- **`cuberpc`**: RPC layer for distributed communication
- **`cuberockstore`**: RocksDB wrapper and storage abstraction

### Key Modules in `cubestore/src/`

- **`metastore/`**: Metadata management, table schemas, partitioning, and distributed coordination
- **`queryplanner/`**: Query planning, optimization, and physical execution planning using DataFusion
- **`store/`**: Core storage layer with compaction and data management
- **`cluster/`**: Distributed cluster management, worker pools, and inter-node communication
- **`table/`**: Table data handling, Parquet integration, and data redistribution
- **`cachestore/`**: Caching layer with eviction policies and queue management
- **`sql/`**: SQL parsing and execution layer
- **`streaming/`**: Kafka streaming support and traffic handling
- **`remotefs/`**: Cloud storage integration (S3, GCS, MinIO)
- **`config/`**: Dependency injection and configuration management

## Development Commands

### Building

```bash
# Build all crates in release mode
cargo build --release

# Build all crates in debug mode
cargo build

# Build specific crate
cargo build -p cubestore

# Check code without building
cargo check
```

### Testing

```bash
# Run all tests
cargo test

# Run tests for specific crate
cargo test -p cubestore
cargo test -p cubestore-sql-tests

# Run single test
cargo test test_name

# Run tests with output
cargo test -- --nocapture

# Run integration tests
cargo test --test '*'

# Run benchmarks
cargo bench
```

### Development

```bash
# Format code
cargo fmt

# Check formatting
cargo fmt -- --check

# Run clippy lints
cargo clippy

# Run with debug logging
RUST_LOG=debug cargo run

# Run specific binary
cargo run --bin cubestore

# Watch for changes (requires cargo-watch)
cargo watch -x check -x test
```

### JavaScript Wrapper Commands

```bash
# Build TypeScript wrapper
npm run build

# Run JavaScript tests
npm test

# Lint JavaScript code
npm run lint

# Fix linting issues
npm run lint:fix
```

## Key Dependencies and Technologies

- **DataFusion**: Apache Arrow-based query engine (using Cube's fork)
- **Apache Arrow/Parquet**: Columnar data format and processing
- **RocksDB**: Embedded key-value store for metadata
- **Tokio**: Async runtime for concurrent operations
- **sqlparser-rs**: SQL parsing (using Cube's fork)

## Configuration via Dependency Injection

The codebase uses a custom dependency injection system defined in `config/injection.rs`. Services are configured through the `Injector` and use `Arc<dyn ServiceTrait>` patterns for abstraction.
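
As an illustration of the `Arc<dyn ServiceTrait>` pattern (the trait and struct names below are hypothetical; the real registration API lives in `config/injection.rs` and may differ), a minimal sketch:

```rust
use std::sync::Arc;

// Hypothetical service trait; the actual service traits are spread
// across the codebase and registered with the Injector.
trait ChunkStore: Send + Sync {
    fn chunk_count(&self) -> u64;
}

struct LocalChunkStore;

impl ChunkStore for LocalChunkStore {
    fn chunk_count(&self) -> u64 {
        0 // placeholder implementation
    }
}

// Consumers hold Arc<dyn ChunkStore>, never the concrete type, so tests
// and different cluster roles can swap in alternative implementations.
struct QueryExecutor {
    chunks: Arc<dyn ChunkStore>,
}

fn main() {
    let executor = QueryExecutor {
        chunks: Arc::new(LocalChunkStore),
    };
    assert_eq!(executor.chunks.chunk_count(), 0);
}
```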

## Testing Approach

- Unit tests are colocated with source files using `#[cfg(test)]` modules (a minimal sketch follows this list)
- Integration tests are in `cubestore-sql-tests/tests/`
- SQL compatibility tests use fixtures in `cubestore-sql-tests/src/tests.rs`
- Benchmarks are in `benches/` directories
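
For reference, a colocated unit-test module follows the standard Rust convention (illustrative code, not taken from the repository):

```rust
// Hypothetical function under test.
pub fn total_rows(chunk_sizes: &[u64]) -> u64 {
    chunk_sizes.iter().sum()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn sums_chunk_sizes() {
        assert_eq!(total_rows(&[10, 20, 12]), 42);
    }
}
```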

## Important Notes

- This is a Rust nightly project (see `rust-toolchain.toml`)
- Uses custom forks of Arrow/DataFusion and sqlparser-rs for Cube-specific features
- Distributed mode involves router and worker nodes communicating via RPC
- Heavy use of async/await patterns with Tokio runtime
- Parquet files are the primary storage format for data
36 changes: 0 additions & 36 deletions rust/cubestore/Cargo.lock

Some generated files are not rendered by default.

1 change: 0 additions & 1 deletion rust/cubestore/cubestore/Cargo.toml
@@ -74,7 +74,6 @@ tokio-stream = { version = "0.1.15", features=["io-util"] }
 scopeguard = "1.1.0"
 async-compression = { version = "0.3.7", features = ["gzip", "tokio"] }
 tempfile = "3.10.1"
-tarpc = { version = "0.24", features = ["tokio1"] }
 pin-project-lite = "0.2.4"
 paste = "1.0.4"
 memchr = "2"
@@ -43,35 +43,29 @@ impl InfoSchemaTableDef for ColumnsInfoSchemaTableDef {
     fn columns(&self) -> Vec<Box<dyn Fn(Arc<Vec<(Column, TablePath)>>) -> ArrayRef>> {
         vec![
             Box::new(|tables| {
-                Arc::new(StringArray::from(
+                Arc::new(StringArray::from_iter_values(
                     tables
                         .iter()
-                        .map(|(_, row)| row.schema.get_row().get_name().as_str())
-                        .collect::<Vec<_>>(),
+                        .map(|(_, row)| row.schema.get_row().get_name()),
                 ))
             }),
             Box::new(|tables| {
-                Arc::new(StringArray::from(
+                Arc::new(StringArray::from_iter_values(
                     tables
                         .iter()
-                        .map(|(_, row)| row.table.get_row().get_table_name().as_str())
-                        .collect::<Vec<_>>(),
+                        .map(|(_, row)| row.table.get_row().get_table_name()),
                 ))
             }),
             Box::new(|tables| {
-                Arc::new(StringArray::from(
-                    tables
-                        .iter()
-                        .map(|(column, _)| column.get_name().as_str())
-                        .collect::<Vec<_>>(),
+                Arc::new(StringArray::from_iter_values(
+                    tables.iter().map(|(column, _)| column.get_name()),
                 ))
             }),
             Box::new(|tables| {
-                Arc::new(StringArray::from(
+                Arc::new(StringArray::from_iter_values(
                     tables
                         .iter()
-                        .map(|(column, _)| column.get_column_type().to_string())
-                        .collect::<Vec<_>>(),
+                        .map(|(column, _)| column.get_column_type().to_string()),
                 ))
             }),
         ]
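
The recurring refactor in this and the following hunks replaces `StringArray::from(... .collect::<Vec<_>>())` with `StringArray::from_iter_values(...)`, building the Arrow array directly from the iterator and skipping the intermediate `Vec` allocation. A standalone sketch of the before/after shapes (assuming the stock `arrow` crate; CubeStore actually uses Cube's fork, whose API may differ in details):

```rust
use arrow::array::{Array, StringArray};

fn main() {
    let names: Vec<String> = vec!["schema_a".into(), "schema_b".into()];

    // Before: materialize a Vec<&str>, then convert it into an array.
    let via_vec =
        StringArray::from(names.iter().map(|s| s.as_str()).collect::<Vec<_>>());

    // After: construct the array straight from the iterator; no temporary
    // Vec, and no .as_str() needed since from_iter_values accepts any
    // AsRef<str> item.
    let via_iter = StringArray::from_iter_values(names.iter());

    assert_eq!(via_vec.len(), via_iter.len());
}
```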
@@ -26,11 +26,8 @@ impl InfoSchemaTableDef for SchemataInfoSchemaTableDef {
 
     fn columns(&self) -> Vec<Box<dyn Fn(Arc<Vec<Self::T>>) -> ArrayRef>> {
         vec![Box::new(|tables| {
-            Arc::new(StringArray::from(
-                tables
-                    .iter()
-                    .map(|row| row.get_row().get_name().as_str())
-                    .collect::<Vec<_>>(),
+            Arc::new(StringArray::from_iter_values(
+                tables.iter().map(|row| row.get_row().get_name()),
             ))
         })]
     }
@@ -40,48 +40,38 @@ impl InfoSchemaTableDef for TablesInfoSchemaTableDef {
     fn columns(&self) -> Vec<Box<dyn Fn(Arc<Vec<Self::T>>) -> ArrayRef>> {
         vec![
             Box::new(|tables| {
-                Arc::new(StringArray::from(
-                    tables
-                        .iter()
-                        .map(|row| row.schema.get_row().get_name().as_str())
-                        .collect::<Vec<_>>(),
+                Arc::new(StringArray::from_iter_values(
+                    tables.iter().map(|row| row.schema.get_row().get_name()),
                 ))
             }),
             Box::new(|tables| {
-                Arc::new(StringArray::from(
+                Arc::new(StringArray::from_iter_values(
                     tables
                         .iter()
-                        .map(|row| row.table.get_row().get_table_name().as_str())
-                        .collect::<Vec<_>>(),
+                        .map(|row| row.table.get_row().get_table_name()),
                 ))
             }),
             Box::new(|tables| {
-                Arc::new(TimestampNanosecondArray::from(
-                    tables
-                        .iter()
-                        .map(|row| {
-                            row.table
-                                .get_row()
-                                .build_range_end()
-                                .as_ref()
-                                .map(|t| t.timestamp_nanos())
-                        })
-                        .collect::<Vec<_>>(),
-                ))
+                Arc::new(TimestampNanosecondArray::from_iter(tables.iter().map(
+                    |row| {
+                        row.table
+                            .get_row()
+                            .build_range_end()
+                            .as_ref()
+                            .map(|t| t.timestamp_nanos())
+                    },
+                )))
             }),
             Box::new(|tables| {
-                Arc::new(TimestampNanosecondArray::from(
-                    tables
-                        .iter()
-                        .map(|row| {
-                            row.table
-                                .get_row()
-                                .seal_at()
-                                .as_ref()
-                                .map(|t| t.timestamp_nanos())
-                        })
-                        .collect::<Vec<_>>(),
-                ))
+                Arc::new(TimestampNanosecondArray::from_iter(tables.iter().map(
+                    |row| {
+                        row.table
+                            .get_row()
+                            .seal_at()
+                            .as_ref()
+                            .map(|t| t.timestamp_nanos())
+                    },
+                )))
             }),
         ]
     }
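
For the timestamp columns the values are `Option<i64>`, so these hunks use `from_iter` rather than `from_iter_values`: `PrimitiveArray` implements `FromIterator` over `Option` items, turning each `None` into a null slot. A small sketch under the same stock-`arrow` assumption:

```rust
use arrow::array::{Array, TimestampNanosecondArray};

fn main() {
    // None becomes a null entry in the resulting array, again with no
    // intermediate Vec<Option<i64>> allocation.
    let ts = TimestampNanosecondArray::from_iter([
        Some(1_700_000_000_000_000_000_i64),
        None,
    ]);

    assert_eq!(ts.len(), 2);
    assert_eq!(ts.null_count(), 1);
}
```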
@@ -47,17 +47,14 @@ impl InfoSchemaTableDef for SystemCacheTableDef {
                 ))
             }),
             Box::new(|items| {
-                Arc::new(TimestampNanosecondArray::from(
-                    items
-                        .iter()
-                        .map(|row| {
-                            row.get_row()
-                                .get_expire()
-                                .as_ref()
-                                .map(|t| t.timestamp_nanos())
-                        })
-                        .collect::<Vec<_>>(),
-                ))
+                Arc::new(TimestampNanosecondArray::from_iter(items.iter().map(
+                    |row| {
+                        row.get_row()
+                            .get_expire()
+                            .as_ref()
+                            .map(|t| t.timestamp_nanos())
+                    },
+                )))
             }),
             Box::new(|items| {
                 Arc::new(StringArray::from_iter(