feat: box/csv backend #2093

Closed
wants to merge 40 commits into from
40 commits
e7a2d88
added scafolding for tests
jjmachan Jun 21, 2025
e43441d
migrated all the tests
jjmachan Jun 21, 2025
18c9ba3
removed unused imports
jjmachan Jun 21, 2025
0a78c80
simplified Makefile
jjmachan Jun 21, 2025
1282bac
fix format
jjmachan Jun 21, 2025
e882b49
simplified the make file
jjmachan Jun 21, 2025
bc54634
added the ci
jjmachan Jun 21, 2025
0799188
fix install
jjmachan Jun 21, 2025
c715555
added dependencies
jjmachan Jun 21, 2025
9f892f5
document how to code with this project
jjmachan Jun 21, 2025
576cca4
update CI
jjmachan Jun 21, 2025
47c8644
added group to uv
jjmachan Jun 21, 2025
dd92f28
removed autogenerated comments
jjmachan Jun 21, 2025
91facf8
refactored off `@patch`
jjmachan Jun 22, 2025
41dc6d5
added a plugin system for backend
jjmachan Jun 23, 2025
32f5b83
added plugin docs
jjmachan Jun 23, 2025
ab4023a
fixed type issue for backend
jjmachan Jun 23, 2025
83c5e98
fixed some issue from PR
jjmachan Jun 23, 2025
0f18152
refactored the rest
jjmachan Jun 23, 2025
dc6d283
removed settings.ini
jjmachan Jun 23, 2025
adc0cf9
formating
jjmachan Jun 23, 2025
a976d89
removed all the old files completely
jjmachan Jun 24, 2025
d56e2f7
fixed CI
jjmachan Jun 24, 2025
5370057
fix formatting
jjmachan Jun 24, 2025
205180f
unified the monorepo configurations to workspace.yaml
jjmachan Jun 24, 2025
735110e
fix circular imports
jjmachan Jun 24, 2025
1cfa066
optimized the CI
jjmachan Jun 24, 2025
ce3a710
fix the CI and comment out type test for experimental
jjmachan Jun 24, 2025
849fffa
fixing some names
jjmachan Jun 24, 2025
881e0ef
added box integration basics
jjmachan Jun 25, 2025
8afc1b9
moved to pydantic config objects
jjmachan Jun 25, 2025
0cdd59c
made box with authenticated client
jjmachan Jun 25, 2025
5796637
added some refactors
jjmachan Jun 26, 2025
eba2ac0
refactored to DataTab
jjmachan Jun 26, 2025
7aafe8f
refactor backend out from project/
jjmachan Jun 26, 2025
5e7a684
added a backend readme
jjmachan Jun 26, 2025
292862f
fixed append
jjmachan Jun 28, 2025
0ca5d00
fixed formating
jjmachan Jul 1, 2025
58fdd3f
Merge branch 'main' into feat/box-csv
jjmachan Jul 1, 2025
15f1751
fixed linting
jjmachan Jul 1, 2025
15 changes: 11 additions & 4 deletions experimental/pyproject.toml
@@ -7,7 +7,8 @@ name = "ragas_experimental"
 description = "Experimental extensions for Ragas"
 requires-python = ">=3.9"
 authors = [
-    {name = "jjmachan", email = "jamesjithin97@gmail.com"}
+    {name = "jjmachan", email = "jithin@explodinggradients.com"},
+    {name = "ikka", email = "shahul@explodinggradients.com"}
 ]
 license = {text = "Apache-2.0"}
 keywords = ["jupyter", "notebook", "python", "evaluation", "llm", "ragas"]
@@ -22,7 +23,7 @@ classifiers = [
 ]
 dependencies = [
     "fastcore",
-    "tqdm",
+    "tqdm",
     "langfuse",
     "instructor",
     "pydantic",
@@ -40,8 +41,9 @@ readme = "README.md"
 all = ["pandas"]
 
 [project.entry-points."ragas.backends"]
-local_csv = "ragas_experimental.project.backends.local_csv:LocalCSVProjectBackend"
-platform = "ragas_experimental.project.backends.platform:PlatformProjectBackend"
+"local/csv" = "ragas_experimental.project.backends.local_csv:LocalCSVProjectBackend"
+"ragas/app" = "ragas_experimental.project.backends.ragas_app:RagasAppProjectBackend"
+"box/csv" = "ragas_experimental.project.backends.box_csv:BoxCSVProjectBackend"
 
 [tool.setuptools.packages.find]
 include = ["ragas_experimental*"]
@@ -58,6 +60,11 @@ dev = [
     "pytest-mock>=3.10.0",
     "black",
     "ruff",
+    "vcrpy",
+    "pytest-vcr",
+]
+box = [
+    "boxsdk[jwt]",
 ]
 test = []

197 changes: 197 additions & 0 deletions experimental/ragas_experimental/backends/README.md
@@ -0,0 +1,197 @@
# Ragas Backends

Backends store your project data (datasets and experiments) in different places, such as local files, databases, or cloud APIs. To add a backend, you implement two classes: `ProjectBackend` (manages projects) and `DataTableBackend` (handles data operations).

```
Project → ProjectBackend → DataTableBackend → Storage
```

## Current State

**Available Backends:**
- `local/csv` - Local CSV files
- `ragas/app` - Ragas cloud platform
- `box/csv` - Box cloud storage

**Import Path:** `ragas_experimental.backends`

**Core Classes:**
- `ProjectBackend` - Project-level operations (create datasets/experiments)
- `DataTableBackend` - Data operations (read/write entries)
- `DataTable` - Base class for `Dataset` and `Experiment`

## Learning Roadmap

Follow this path to build your own backend:

```
□ 1. Understand: Read local_csv.py (simplest example)
□ 2. Explore: Study base.py abstract methods
□ 3. Practice: Modify LocalCSVProjectBackend to add logging
□ 4. Build: Create your own backend following the pattern
□ 5. Advanced: Study ragas_app.py for API/async patterns
□ 6. Package: Create plugin (see Plugin Development)
```

## Quick Usage

**Using existing backends:**
```python
from ragas_experimental.project import Project

# Local CSV
project = Project.create("my_project", "local/csv", root_dir="./data")

# Ragas platform
project = Project.create("my_project", "ragas/app", api_key="your_key")
```

**Basic backend structure:**
```python
from ragas_experimental.backends.base import ProjectBackend, DataTableBackend

class MyProjectBackend(ProjectBackend):
    def create_dataset(self, name, model):
        # Create storage space for dataset
        pass

class MyDataTableBackend(DataTableBackend):
    def load_entries(self, model_class):
        # Load entries from storage
        pass
```

## Essential Methods

**ProjectBackend** (project management):
- `create_dataset()` / `create_experiment()` - Create storage
- `get_dataset_backend()` / `get_experiment_backend()` - Get data handler
- `list_datasets()` / `list_experiments()` - List existing

**DataTableBackend** (data operations):
- `initialize()` - Setup with dataset instance
- `load_entries()` - Load all entries
- `append_entry()` - Add new entry
- `update_entry()` / `delete_entry()` - Modify entries

See `base.py` for complete interface.
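
The method lists above can be made concrete with a dependency-free sketch. The two abstract classes below are hypothetical stand-ins that mirror only some of the method names listed here (`update_entry()` and the experiment methods are left out for brevity). The argument lists, return values, and the `InMemory*` and `QARow` classes are assumptions for illustration, so the real signatures in `base.py` may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass


class SketchDataTableBackend(ABC):
    """Hypothetical stand-in for DataTableBackend (signatures are assumptions)."""

    @abstractmethod
    def initialize(self, dataset): ...

    @abstractmethod
    def load_entries(self, model_class): ...

    @abstractmethod
    def append_entry(self, entry): ...

    @abstractmethod
    def delete_entry(self, entry_id): ...


class SketchProjectBackend(ABC):
    """Hypothetical stand-in for ProjectBackend (signatures are assumptions)."""

    @abstractmethod
    def create_dataset(self, name, model): ...

    @abstractmethod
    def get_dataset_backend(self, dataset_id): ...

    @abstractmethod
    def list_datasets(self): ...


class InMemoryDataTableBackend(SketchDataTableBackend):
    def __init__(self):
        self.dataset = None
        self._rows = {}  # row id -> plain dict of field values
        self._next_id = 0

    def initialize(self, dataset):
        self.dataset = dataset

    def load_entries(self, model_class):
        # Rebuild model instances from the stored dicts.
        return [model_class(**row) for row in self._rows.values()]

    def append_entry(self, entry):
        row_id = str(self._next_id)
        self._next_id += 1
        self._rows[row_id] = asdict(entry)
        return row_id

    def delete_entry(self, entry_id):
        self._rows.pop(entry_id, None)


class InMemoryProjectBackend(SketchProjectBackend):
    def __init__(self):
        self._tables = {}  # dataset id -> (name, table backend)

    def create_dataset(self, name, model):
        dataset_id = f"ds-{len(self._tables)}"
        self._tables[dataset_id] = (name, InMemoryDataTableBackend())
        return dataset_id

    def get_dataset_backend(self, dataset_id):
        return self._tables[dataset_id][1]

    def list_datasets(self):
        return [{"id": i, "name": n} for i, (n, _) in self._tables.items()]


@dataclass
class QARow:
    question: str
    answer: str


project_backend = InMemoryProjectBackend()
dataset_id = project_backend.create_dataset("qa", QARow)
table = project_backend.get_dataset_backend(dataset_id)
table.initialize(dataset=None)  # a real backend would receive the dataset instance
row_id = table.append_entry(QARow("What is 2 + 2?", "4"))
entries = table.load_entries(QARow)
```

The split keeps project-level bookkeeping (which datasets exist) separate from row-level storage, which is the same division of labor the two lists describe.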

## Learn from Examples

**Start here:**
- `local_csv.py` - File-based storage, easiest to understand
- `config.py` - Configuration patterns

**Advanced patterns:**
- `ragas_app.py` - API calls, async, error handling
- `box_csv.py` - Cloud storage, authentication
- `registry.py` - Backend discovery system

## Quick Development

**1. Copy template:**
```bash
cp local_csv.py my_backend.py
```

**2. Replace CSV logic with your storage**

**3. Register backend:**
```python
# In registry.py _register_builtin_backends()
from .my_backend import MyProjectBackend
self.register_backend("my_storage", MyProjectBackend)
```

**4. Test:**
```python
from ragas_experimental.project import Project
project = Project.create("test", "my_storage")
```
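
Step 3 relies on a lookup from backend name to backend class. As a rough illustration of that idea, and not the code in `registry.py`, a registry can be as small as a dictionary keyed by name. `SketchRegistry`, its `create` method, and `DummyBackend` are hypothetical; only the method names `register_backend` and `list_backends` echo names used in this README, and their real signatures may differ.

```python
class SketchRegistry:
    """Toy name -> backend-class registry (not the project's BackendRegistry)."""

    def __init__(self):
        self._backends = {}

    def register_backend(self, name, backend_class):
        if name in self._backends:
            raise ValueError(f"backend {name!r} is already registered")
        self._backends[name] = backend_class

    def list_backends(self):
        return sorted(self._backends)

    def create(self, name, **kwargs):
        try:
            backend_class = self._backends[name]
        except KeyError:
            available = ", ".join(self.list_backends()) or "none"
            raise ValueError(
                f"unknown backend {name!r}; available: {available}"
            ) from None
        # Remaining keyword arguments are passed to the backend constructor.
        return backend_class(**kwargs)


class DummyBackend:
    """Placeholder backend used only to exercise the registry."""

    def __init__(self, root_dir="."):
        self.root_dir = root_dir


registry = SketchRegistry()
registry.register_backend("my_storage", DummyBackend)
backend = registry.create("my_storage", root_dir="./data")
```

Listing the registered names in the error message makes a mistyped backend name easy to spot.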

## Plugin Development

**Create separate package:**
```
my-backend-plugin/
├── pyproject.toml
├── src/my_backend/
│   ├── __init__.py
│   └── backend.py
└── tests/
```

**Entry point in pyproject.toml:**
```toml
[project.entry-points."ragas.backends"]
my_storage = "my_backend.backend:MyProjectBackend"
```

**Install and use:**
```bash
pip install my-backend-plugin
python -c "from ragas_experimental.project import Project; Project.create('test', 'my_storage')"
```
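
Entry points declared this way can be inspected with the standard library, which helps when checking that a plugin was installed correctly. The sketch below lists the entry points in the `ragas.backends` group using `importlib.metadata`. The helper name `discover_backends` is hypothetical, and how the project's own registry consumes these entry points is not shown in this README.

```python
from importlib.metadata import entry_points


def discover_backends(group="ragas.backends"):
    """Return {entry point name: EntryPoint} for one group, without importing plugins."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        selected = eps.select(group=group)
    else:  # Python 3.9: a plain dict of group name -> entry points
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


found = discover_backends()
for name, ep in sorted(found.items()):
    print(f"{name} -> {ep.value}")
# Calling ep.load() imports the module and returns the object the entry point names.
```

If the plugin's name is missing from the output, the package is probably not installed in the active environment or the group name in its `pyproject.toml` does not match.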

## Common Patterns

**ID Generation:**
```python
from .utils import create_nano_id
dataset_id = create_nano_id()
```
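
The implementation of `create_nano_id` is not shown in this README. For a backend developed outside the package, a comparable short random ID can be produced with the standard library. `make_short_id`, its default length, and its alphabet are assumptions for this sketch, not the project's values.

```python
import secrets
import string

# Assumed alphabet: lowercase letters and digits.
ALPHABET = string.ascii_lowercase + string.digits


def make_short_id(size=12):
    """Return a random ID of `size` characters drawn from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(size))


dataset_id = make_short_id()
```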

**Error Handling:**
```python
try:
    ...  # Storage operation goes here
except ConnectionError:
    ...  # Handle gracefully (for example, log and retry, or re-raise)
```
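
One way to handle a `ConnectionError` gracefully is to retry a bounded number of times and re-raise if the last attempt also fails. This is a generic sketch and is not taken from any of the bundled backends; `with_retries`, its parameters, and `flaky_upload` are hypothetical.

```python
import time


def with_retries(operation, attempts=3, delay=0.0):
    """Call operation(); retry on ConnectionError, re-raising after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)  # linear backoff; 0.0 disables waiting


calls = {"count": 0}


def flaky_upload():
    """Simulated storage call that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("storage unreachable")
    return "uploaded"


result = with_retries(flaky_upload)
```

Re-raising after the final attempt keeps persistent outages visible to the caller instead of silently returning nothing.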

**Testing:**
```python
def test_my_backend():
    backend = MyProjectBackend()
    backend.initialize("test_project")
    dataset_id = backend.create_dataset("test", MyModel)
    assert dataset_id
```

## Troubleshooting

**Backend not found?** Check registry with:
```python
from ragas_experimental.backends import list_backends
print(list_backends())
```

**Entries not loading?** Verify:
- `initialize()` called before other methods
- `load_entries()` returns list of model instances
- Entry `_row_id` attributes set correctly
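
The three checks above can be illustrated with a small CSV round trip. This sketch uses only the standard library and is not the code in `local_csv.py`: `CsvTableSketch`, the `Row` model, the `initialize(model_class)` signature, and the choice of a `_row_id` column are all assumptions. It shows `load_entries()` returning model instances with `_row_id` attached, and a guard that fails loudly when `initialize()` was skipped.

```python
import csv
import os
import tempfile
from dataclasses import dataclass, fields


@dataclass
class Row:
    question: str
    answer: str


class CsvTableSketch:
    def __init__(self, path):
        self.path = path
        self._initialized = False

    def initialize(self, model_class):
        """Record the column layout and create the file with a header if missing."""
        self.columns = ["_row_id"] + [f.name for f in fields(model_class)]
        if not os.path.exists(self.path):
            with open(self.path, "w", newline="") as f:
                csv.writer(f).writerow(self.columns)
        self._initialized = True

    def _require_initialized(self):
        if not self._initialized:
            raise RuntimeError("call initialize() before other methods")

    def append_entry(self, entry, row_id):
        self._require_initialized()
        values = [getattr(entry, name) for name in self.columns[1:]]
        with open(self.path, "a", newline="") as f:
            csv.writer(f).writerow([row_id] + values)

    def load_entries(self, model_class):
        self._require_initialized()
        entries = []
        with open(self.path, newline="") as f:
            for record in csv.DictReader(f):
                row_id = record.pop("_row_id")
                entry = model_class(**record)  # build a model instance, not a dict
                entry._row_id = row_id  # attach the storage row id
                entries.append(entry)
        return entries


path = os.path.join(tempfile.mkdtemp(), "dataset.csv")
table = CsvTableSketch(path)
table.initialize(Row)
table.append_entry(Row("capital of France?", "Paris"), row_id="r1")
loaded = table.load_entries(Row)
```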

**Need help?** Study existing backends - they handle most common patterns.

## Configuration Examples

**Local CSV:**
```python
from ragas_experimental.backends import LocalCSVConfig
config = LocalCSVConfig(root_dir="/path/to/data")
```

**Ragas App:**
```python
from ragas_experimental.backends import RagasAppConfig
config = RagasAppConfig(api_key="key", api_url="https://api.ragas.io")
```

**Box CSV:**
```python
from ragas_experimental.backends import BoxCSVConfig
config = BoxCSVConfig(client=authenticated_box_client)
```

---

**Next Steps:** Start by modifying `local_csv.py`, then build your own backend following the same patterns.
59 changes: 59 additions & 0 deletions experimental/ragas_experimental/backends/__init__.py
@@ -0,0 +1,59 @@
"""Backend factory and exports for all backends."""

from .base import DataTableBackend, ProjectBackend

# Import concrete backends
from .local_csv import LocalCSVProjectBackend
from .ragas_app import RagasAppProjectBackend

# Optional backends with dependencies
try:
    from .box_csv import BoxCSVProjectBackend
except ImportError:
    BoxCSVProjectBackend = None

# Import configuration classes
from .config import BackendConfig, LocalCSVConfig, RagasAppConfig

try:
    from .config import BoxCSVConfig
except ImportError:
    BoxCSVConfig = None

from .registry import (
    BackendRegistry,
    create_project_backend,
    get_backend_info,
    get_registry,
    list_backend_info,
    list_backends,
    print_available_backends,
    register_backend,
)

# Import API client
from .ragas_api_client import RagasApiClient

__all__ = [
    "ProjectBackend",
    "DataTableBackend",
    "BackendRegistry",
    "get_registry",
    "register_backend",
    "list_backends",
    "get_backend_info",
    "list_backend_info",
    "print_available_backends",
    "create_project_backend",
    # Configuration classes
    "BackendConfig",
    "LocalCSVConfig",
    "RagasAppConfig",
    "BoxCSVConfig",
    # Concrete backends
    "LocalCSVProjectBackend",
    "RagasAppProjectBackend",
    "BoxCSVProjectBackend",
    # API client
    "RagasApiClient",
]