Explore alternatives to SQLite/Parquet for fast data access #834


**Description**  
Fast read access to experiment data is essential. Alternatives such as LMDB or Hugging Face Datasets may offer better performance in some scenarios. The current Parquet dataset is broken and unlikely to work satisfactorily. Benchmarking is needed to quantify the tradeoffs.

Potential candidates: LMDB, Hugging Face Datasets, memory-mapped .npy arrays (PolarBERT)
 
Some of these formats provide fast random access (like SQLite), while others are read sequentially and therefore require shuffling at write time. As a result, the user experience differs between the two. We should consider whether and how we can support both regimes.
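
To make the two regimes concrete, here is a minimal sketch using only NumPy. The function names (`write_shuffled`, `read_sequential`, `read_random`) and the toy array layout are hypothetical and not part of the current codebase. It assumes events fit in a fixed-width 2D array, which real data may not.

```python
import os
import tempfile

import numpy as np


def write_shuffled(events: np.ndarray, path: str, seed: int = 0) -> np.ndarray:
    """Sequential regime: permute rows once at write time and return the permutation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(events))
    np.save(path, events[perm])
    return perm


def read_sequential(path: str, batch_size: int):
    """Yield contiguous batches in on-disk order (already shuffled by the writer)."""
    data = np.load(path, mmap_mode="r")
    for start in range(0, len(data), batch_size):
        yield np.asarray(data[start:start + batch_size])


def read_random(path: str, indices) -> np.ndarray:
    """Random-access regime: fetch arbitrary rows from a memory-mapped array."""
    data = np.load(path, mmap_mode="r")
    return np.asarray(data[np.asarray(indices)])


# Toy data: 10 "events" with 2 features each (illustrative only).
workdir = tempfile.mkdtemp()
events = np.arange(20).reshape(10, 2)

raw_path = os.path.join(workdir, "raw.npy")
shuffled_path = os.path.join(workdir, "shuffled.npy")

np.save(raw_path, events)
perm = write_shuffled(events, shuffled_path)

batches = list(read_sequential(shuffled_path, batch_size=4))
picked = read_random(raw_path, [7, 2])
```

In the sequential regime the shuffle order is fixed when the file is written, so a new order per epoch needs either a rewrite or a shuffle buffer at read time. In the random-access regime the sampler chooses the order, at the cost of scattered reads.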

**Acceptance Criteria**  
- [ ] Benchmark storage footprint and query speeds
- [ ] Assess feasibility of storing data representations, not just raw data  
- [ ] Document benchmarking results and recommendations  
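
As a starting point for the first criterion, here is a rough sketch of a benchmark harness comparing on-disk size and random-read time for SQLite and a memory-mapped .npy file. It uses only the standard library and NumPy. LMDB and Hugging Face Datasets are left out to keep it dependency-free and would need their own builder and reader functions. The table schema, function names, and synthetic data are assumptions for illustration and do not reflect the project's actual schema.

```python
import os
import sqlite3
import tempfile
import time

import numpy as np


def build_sqlite(path: str, data: np.ndarray) -> None:
    """Store each row as a BLOB keyed by an integer id."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload BLOB)")
    con.executemany(
        "INSERT INTO events VALUES (?, ?)",
        ((i, row.tobytes()) for i, row in enumerate(data)),
    )
    con.commit()
    con.close()


def read_sqlite(path: str, ids, dtype) -> tuple[float, np.ndarray]:
    """Fetch rows one id at a time; return (elapsed seconds, rows)."""
    con = sqlite3.connect(path)
    query = "SELECT payload FROM events WHERE id = ?"
    t0 = time.perf_counter()
    rows = [
        np.frombuffer(con.execute(query, (int(i),)).fetchone()[0], dtype=dtype)
        for i in ids
    ]
    elapsed = time.perf_counter() - t0
    con.close()
    return elapsed, np.stack(rows)


def read_npy(path: str, ids) -> tuple[float, np.ndarray]:
    """Fetch rows by fancy indexing into a memory-mapped array."""
    arr = np.load(path, mmap_mode="r")
    t0 = time.perf_counter()
    rows = np.asarray(arr[np.asarray(ids)])
    elapsed = time.perf_counter() - t0
    return elapsed, rows


def run_benchmark(n_rows: int = 2000, width: int = 64, n_queries: int = 500) -> dict:
    """Build both stores from the same synthetic data and time random reads."""
    rng = np.random.default_rng(0)
    data = rng.standard_normal((n_rows, width)).astype(np.float32)
    ids = rng.integers(0, n_rows, size=n_queries)

    workdir = tempfile.mkdtemp()
    db_path = os.path.join(workdir, "events.db")
    npy_path = os.path.join(workdir, "events.npy")
    build_sqlite(db_path, data)
    np.save(npy_path, data)

    t_sql, rows_sql = read_sqlite(db_path, ids, data.dtype)
    t_npy, rows_npy = read_npy(npy_path, ids)
    return {
        "sqlite_bytes": os.path.getsize(db_path),
        "npy_bytes": os.path.getsize(npy_path),
        "sqlite_seconds": t_sql,
        "npy_seconds": t_npy,
        "rows_match": bool(np.array_equal(rows_sql, rows_npy)),
    }


results = run_benchmark()
print(results)
```

Timings from a run like this mostly reflect the OS page cache, since the files were just written. A meaningful comparison would use datasets larger than RAM (or drop caches between runs), repeat each measurement, and cover batched and sequential reads as well.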

