Commit 1b84ca7

first pass at gdrive integration
Signed-off-by: Derek Anderson <dmikey@users.noreply.github.com>

fix and correct gdrive
Signed-off-by: Derek Anderson <dmikey@users.noreply.github.com>

remove old doc
Signed-off-by: Derek Anderson <dmikey@users.noreply.github.com>

proper pytest tests
Signed-off-by: Derek Anderson <dmikey@users.noreply.github.com>
1 parent 08953bf commit 1b84ca7

11 files changed: +1455 -48 lines changed
Lines changed: 197 additions & 0 deletions
@@ -0,0 +1,197 @@
# Google Drive Backend Setup Guide

This guide will help you set up and use the Google Drive backend for Ragas datasets.

## Prerequisites

### 1. Install Dependencies

```bash
pip install google-api-python-client google-auth google-auth-oauthlib
```

### 2. Set up Google Cloud Project

1. Go to the [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project or select an existing one
3. Enable the following APIs:
   - Google Drive API
   - Google Sheets API

### 3. Create Credentials

You have two options for authentication:

#### Option A: OAuth 2.0 (Recommended for development)

1. In Google Cloud Console, go to "Credentials"
2. Click "Create Credentials" → "OAuth client ID"
3. Choose "Desktop application"
4. Download the JSON file
5. Save it securely (e.g., as `credentials.json`)

#### Option B: Service Account (Recommended for production)

1. In Google Cloud Console, go to "Credentials"
2. Click "Create Credentials" → "Service account"
3. Fill in the details and create the account
4. Generate a key (JSON format)
5. Download and save the JSON file securely
6. Share your Google Drive folder with the service account email

## Setup Instructions

### 1. Create a Google Drive Folder

1. Create a folder in Google Drive where you want to store your datasets
2. Get the folder ID from the URL:
   ```
   https://drive.google.com/drive/folders/FOLDER_ID_HERE
   ```
3. If using a service account, share this folder with the service account email

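The folder ID in step 2 is the path segment that follows `folders/`. A minimal sketch of pulling it out of a pasted URL, using only the standard library. The helper name `extract_folder_id` is hypothetical and not part of Ragas; it also accepts a bare ID and returns it unchanged.

```python
from urllib.parse import urlparse


def extract_folder_id(url_or_id: str) -> str:
    """Return the folder ID from a Drive folder URL, or the input if it is a bare ID."""
    parsed = urlparse(url_or_id)
    if not parsed.scheme:
        # No scheme: assume the caller already passed a bare folder ID.
        return url_or_id.strip()

    # Folder URLs look like .../folders/<FOLDER_ID>, optionally with a query string.
    parts = [p for p in parsed.path.split("/") if p]
    if "folders" in parts:
        idx = parts.index("folders")
        if idx + 1 < len(parts):
            return parts[idx + 1]

    raise ValueError(f"Could not find a folder ID in: {url_or_id!r}")


if __name__ == "__main__":
    print(extract_folder_id("https://drive.google.com/drive/folders/FOLDER_ID_HERE"))
```

The query string (for example `?usp=sharing`) is ignored because only the URL path is inspected.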
### 2. Set Environment Variables (Optional)

```bash
export GDRIVE_FOLDER_ID="your_folder_id_here"
export GDRIVE_CREDENTIALS_PATH="path/to/credentials.json"
# OR for service account:
export GDRIVE_SERVICE_ACCOUNT_PATH="path/to/service_account.json"
```

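If you prefer to resolve these variables yourself and pass them explicitly, the logic can be sketched as below. The helper `gdrive_kwargs_from_env` is hypothetical, and preferring the service account when both paths are set is an assumption of this sketch, not documented backend behavior. The keyword names match the `Project.create` arguments used in this guide.

```python
import os
from typing import Mapping, Optional


def gdrive_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build Google Drive keyword arguments from environment variables."""
    env = os.environ if env is None else env

    folder_id = env.get("GDRIVE_FOLDER_ID")
    if not folder_id:
        raise ValueError("GDRIVE_FOLDER_ID is not set")

    kwargs = {"gdrive_folder_id": folder_id}
    service_account = env.get("GDRIVE_SERVICE_ACCOUNT_PATH")
    oauth_credentials = env.get("GDRIVE_CREDENTIALS_PATH")

    if service_account:
        # Assumption: a service account wins when both variables are set.
        kwargs["gdrive_service_account_path"] = service_account
    elif oauth_credentials:
        kwargs["gdrive_credentials_path"] = oauth_credentials
    else:
        raise ValueError(
            "Set GDRIVE_SERVICE_ACCOUNT_PATH or GDRIVE_CREDENTIALS_PATH"
        )
    return kwargs
```

The result can then be unpacked into the call, for example `Project.create(name="my_project", backend="gdrive", **gdrive_kwargs_from_env())`.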
### 3. Basic Usage

```python
from ragas_experimental.project.core import Project
from pydantic import BaseModel

# Define your data model
class EvaluationEntry(BaseModel):
    question: str
    answer: str
    score: float

# Create project with Google Drive backend
project = Project.create(
    name="my_project",
    backend="gdrive",
    gdrive_folder_id="your_folder_id_here",
    gdrive_credentials_path="path/to/credentials.json",  # OAuth
    # OR
    # gdrive_service_account_path="path/to/service_account.json",  # Service Account
)

# Create a dataset
dataset = project.create_dataset(
    model=EvaluationEntry,
    name="my_dataset",
)

# Add data
entry = EvaluationEntry(
    question="What is AI?",
    answer="Artificial Intelligence",
    score=0.95,
)
dataset.append(entry)

# Load and access data
dataset.load()
print(f"Dataset has {len(dataset)} entries")
for entry in dataset:
    print(f"{entry.question} -> {entry.answer}")
```

## File Structure

When you use the Google Drive backend, it creates the following structure:

```
Your Google Drive Folder/
├── project_name/
│   ├── datasets/
│   │   ├── dataset1.gsheet
│   │   └── dataset2.gsheet
│   └── experiments/
│       └── experiment1.gsheet
```

Each dataset is stored as a Google Sheet with:
- Column headers matching your model fields
- An additional `_row_id` column for internal tracking
- Automatic type conversion when loading data

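The sheet layout described above can be illustrated with a small round trip. This is a sketch under stated assumptions, not the backend's actual code: it uses a standard-library dataclass as a stand-in for the pydantic model so it runs without extra packages, it assumes cells come back as strings, and the helpers `header_row`, `to_row`, and `from_row` are hypothetical.

```python
import uuid
from dataclasses import dataclass, fields


@dataclass
class EvaluationEntry:
    question: str
    answer: str
    score: float


def header_row(model) -> list:
    """Column headers: the tracking column followed by the model fields."""
    return ["_row_id"] + [f.name for f in fields(model)]


def to_row(entry, row_id=None) -> list:
    """Serialize an entry to a list of string cells."""
    row_id = row_id or uuid.uuid4().hex
    return [row_id] + [str(getattr(entry, f.name)) for f in fields(entry)]


def from_row(model, header, row):
    """Rebuild an entry, converting each string cell to the annotated type."""
    cells = dict(zip(header, row))
    # Calling the annotation works for str, int and float; bool and
    # container types would need dedicated parsing.
    kwargs = {f.name: f.type(cells[f.name]) for f in fields(model)}
    return model(**kwargs)


if __name__ == "__main__":
    entry = EvaluationEntry("What is AI?", "Artificial Intelligence", 0.95)
    header = header_row(EvaluationEntry)
    row = to_row(entry, row_id="r1")
    print(header, row, from_row(EvaluationEntry, header, row))
```

Keeping `_row_id` in the first column means a row can be located for update or deletion without relying on its position in the sheet.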
## Authentication Flow

### OAuth (First Time)
1. When you first run your code, a browser window will open
2. Sign in to your Google account
3. Grant permissions to access Google Drive
4. A `token.json` file will be created automatically
5. Subsequent runs will use this token (no browser needed)

### Service Account
1. No interactive authentication required
2. Make sure the service account has access to your folder
3. The JSON key file is used directly

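Both flows above follow the usual pattern from Google's Python client libraries. The sketch below shows that pattern and is not the backend's actual implementation: the function names are hypothetical, and the scope list is an assumption about what a Drive plus Sheets backend would request. The Google packages are imported inside the functions so the module still imports when they are missing.

```python
import os

# Assumed scopes; the backend may request a narrower or different set.
SCOPES = [
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/spreadsheets",
]


def load_oauth_credentials(credentials_path: str, token_path: str = "token.json"):
    """Reuse a cached token if possible, otherwise run the browser flow."""
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow

    creds = None
    if os.path.exists(token_path):
        creds = Credentials.from_authorized_user_file(token_path, SCOPES)
    if creds and creds.valid:
        return creds  # Subsequent runs: no browser needed.

    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())
    else:
        # First run: opens a browser window for sign-in and consent.
        flow = InstalledAppFlow.from_client_secrets_file(credentials_path, SCOPES)
        creds = flow.run_local_server(port=0)

    # Cache the token for later runs.
    with open(token_path, "w") as fh:
        fh.write(creds.to_json())
    return creds


def load_service_account_credentials(key_path: str):
    """Load credentials directly from the JSON key file (no interaction)."""
    from google.oauth2 import service_account

    return service_account.Credentials.from_service_account_file(
        key_path, scopes=SCOPES
    )
```

Deleting the cached token file forces the browser flow to run again, which is why that step appears in the troubleshooting list.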
## Troubleshooting

### Common Issues

1. **"Folder not found" error**
   - Check that the folder ID is correct
   - Ensure the folder is shared with your service account (if using one)

2. **Authentication errors**
   - Verify your credentials file path
   - Check that the required APIs are enabled
   - For OAuth: Delete `token.json` and re-authenticate

3. **Permission errors**
   - Make sure your account has edit access to the folder
   - For service accounts: share the folder with the service account email

4. **Import errors**
   - Install required dependencies: `pip install google-api-python-client google-auth google-auth-oauthlib`

### Getting Help

If you encounter issues:
1. Check the error message carefully
2. Verify your Google Cloud setup
3. Test authentication with a simple Google Drive API call
4. Check that all dependencies are installed

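For step 3, one simple Drive API call is to fetch the metadata of the target folder. The sketch below assumes a service account key and a read-only metadata scope; `check_folder_access` and `is_folder` are hypothetical helper names, not part of Ragas.

```python
FOLDER_MIME_TYPE = "application/vnd.google-apps.folder"


def is_folder(metadata: dict) -> bool:
    """True if Drive file metadata describes a folder."""
    return metadata.get("mimeType") == FOLDER_MIME_TYPE


def check_folder_access(folder_id: str, service_account_path: str) -> dict:
    """Fetch folder metadata; raises if the folder is missing or not shared."""
    # Imported lazily so this module imports without the Google packages.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        service_account_path,
        scopes=["https://www.googleapis.com/auth/drive.metadata.readonly"],
    )
    drive = build("drive", "v3", credentials=creds)
    return (
        drive.files()
        .get(fileId=folder_id, fields="id, name, mimeType", supportsAllDrives=True)
        .execute()
    )


if __name__ == "__main__":
    import os

    meta = check_folder_access(
        os.environ["GDRIVE_FOLDER_ID"],
        os.environ["GDRIVE_SERVICE_ACCOUNT_PATH"],
    )
    print(meta["name"], "is a folder:", is_folder(meta))
```

An HTTP 404 from this call usually means the ID is wrong or the folder has not been shared with the service account, which corresponds to the "Folder not found" case above.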
## Security Best Practices

1. **Never commit credentials to version control**
2. **Use environment variables for sensitive information**
3. **Limit service account permissions to minimum required**
4. **Regularly rotate service account keys**
5. **Use OAuth for development, service accounts for production**

## Advanced Configuration

### Custom Authentication Paths

```python
project = Project.create(
    name="my_project",
    backend="gdrive",
    gdrive_folder_id="folder_id",
    gdrive_credentials_path="/custom/path/to/credentials.json",
    gdrive_token_path="/custom/path/to/token.json"
)
```

### Multiple Projects

You can have multiple projects in the same Google Drive folder:

```python
project1 = Project.create(name="project1", backend="gdrive", ...)
project2 = Project.create(name="project2", backend="gdrive", ...)
```

Each will create its own subfolder structure.
Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
"""
Example usage of the Google Drive backend for Ragas.

This example shows how to:
1. Set up authentication for Google Drive
2. Create a project with Google Drive backend
3. Create and manage datasets stored in Google Sheets

Prerequisites:
1. Install required dependencies:
   pip install google-api-python-client google-auth google-auth-oauthlib

2. Set up Google Drive API credentials:
   - Go to Google Cloud Console
   - Enable Google Drive API and Google Sheets API
   - Create credentials (either OAuth or Service Account)
   - Download the JSON file

3. Set environment variables or provide paths directly
"""

import os
from pydantic import BaseModel
from ragas_experimental.project.core import Project
from ragas_experimental.metric import MetricResult


# Example model for our dataset
class EvaluationEntry(BaseModel):
    question: str
    answer: str
    context: str
    score: float
    feedback: str


def example_oauth_setup():
    """Example using OAuth authentication."""

    # Read settings from the environment instead of hard-coding a personal
    # folder ID or local path; the fallbacks are placeholders to replace.
    folder_id = os.environ.get("GDRIVE_FOLDER_ID", "your_google_drive_folder_id_here")
    credentials_path = os.environ.get(
        "GDRIVE_CREDENTIALS_PATH", "path/to/your/credentials.json"
    )

    # Create project with Google Drive backend
    project = Project.create(
        name="my_ragas_project",
        description="A project using Google Drive for storage",
        backend="gdrive",
        gdrive_folder_id=folder_id,
        gdrive_credentials_path=credentials_path,
        gdrive_token_path="token.json",  # Will be created automatically
    )

    return project


def example_service_account_setup():
    """Example using service account authentication."""

    project = Project.create(
        name="my_ragas_project",
        description="A project using Google Drive for storage",
        backend="gdrive",
        gdrive_folder_id=os.environ.get(
            "GDRIVE_FOLDER_ID", "your_google_drive_folder_id_here"
        ),
        gdrive_service_account_path=os.environ.get(
            "GDRIVE_SERVICE_ACCOUNT_PATH", "path/to/your/service_account.json"
        ),
    )

    return project


def example_usage():
    """Example of using the Google Drive backend."""

    # Create a project (choose one of the authentication methods above)
    project = example_oauth_setup()  # or example_service_account_setup()

    # Create a dataset
    dataset = project.create_dataset(
        model=EvaluationEntry,
        name="evaluation_results"
    )

    # Add some entries
    entry1 = EvaluationEntry(
        question="What is the capital of France?",
        answer="Paris",
        context="France is a country in Europe.",
        score=0.95,
        feedback="Correct answer"
    )

    entry2 = EvaluationEntry(
        question="What is 2+2?",
        answer="4",
        context="Basic arithmetic question.",
        score=1.0,
        feedback="Perfect answer"
    )

    # Append entries to the dataset
    dataset.append(entry1)
    dataset.append(entry2)

    # Load all entries
    dataset.load()
    print(f"Dataset contains {len(dataset)} entries")

    # Access entries
    for i, entry in enumerate(dataset):
        print(f"Entry {i}: {entry.question} -> {entry.answer} (Score: {entry.score})")

    # Update an entry: modify one object, then assign it back so the
    # change is written to the backend.
    first = dataset[0]
    first.score = 0.98
    first.feedback = "Updated feedback"
    dataset[0] = first  # Trigger update

    # Search for entries (note: this reaches into the private `_backend` attribute)
    entry = dataset._backend.get_entry_by_field("question", "What is 2+2?", EvaluationEntry)
    if entry:
        print(f"Found entry: {entry.answer}")

    return dataset


if __name__ == "__main__":
    # Run the example
    try:
        dataset = example_usage()
        print("Google Drive backend example completed successfully!")
    except Exception as e:
        print(f"Error: {e}")
        print("\nMake sure to:")
        print("1. Install required dependencies")
        print("2. Set up Google Drive API credentials")
        print("3. Update the folder ID and credential paths in this example")
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
# Optional imports for backends that require additional dependencies

# Always available backends
from .ragas_api_client import RagasApiClient
from .factory import RagasApiClientFactory

# Conditionally import Google Drive backend
try:
    from .gdrive_backend import GDriveBackend
    __all__ = ["RagasApiClient", "RagasApiClientFactory", "GDriveBackend"]
except ImportError:
    __all__ = ["RagasApiClient", "RagasApiClientFactory"]

# Conditionally import Notion backend if available
try:
    from .notion_backend import NotionBackend
    __all__.append("NotionBackend")
except ImportError:
    pass
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
"""Base classes for dataset backends."""

from abc import ABC, abstractmethod
import typing as t


class DatasetBackend(ABC):
    """Abstract base class for dataset backends.

    All dataset storage backends must implement these methods.
    """

    @abstractmethod
    def initialize(self, dataset):
        """Initialize the backend with dataset information"""
        pass

    @abstractmethod
    def get_column_mapping(self, model):
        """Get mapping between model fields and backend columns"""
        pass

    @abstractmethod
    def load_entries(self, model_class):
        """Load all entries from storage"""
        pass

    @abstractmethod
    def append_entry(self, entry):
        """Add a new entry to storage and return its ID"""
        pass

    @abstractmethod
    def update_entry(self, entry):
        """Update an existing entry in storage"""
        pass

    @abstractmethod
    def delete_entry(self, entry_id):
        """Delete an entry from storage"""
        pass

    @abstractmethod
    def get_entry_by_field(self, field_name: str, field_value: t.Any, model_class):
        """Get an entry by field value"""
        pass
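
To make the contract of `DatasetBackend` concrete, here is a minimal in-memory sketch that provides the same seven methods. It is illustrative only: the class `InMemoryBackend` is hypothetical, it is duck-typed rather than subclassing `DatasetBackend` so that it runs on its own, and the choices to store the ID on the entry as `_row_id` and to return booleans from update and delete are assumptions, not behavior confirmed by this commit.

```python
import typing as t
import uuid
from dataclasses import dataclass, fields


class InMemoryBackend:
    """Keeps entries in a dict keyed by row ID (hypothetical example)."""

    def __init__(self):
        self.dataset = None
        self._rows: t.Dict[str, t.Any] = {}

    def initialize(self, dataset):
        self.dataset = dataset

    def get_column_mapping(self, model):
        # Identity mapping: each field is stored under its own name.
        return {f.name: f.name for f in fields(model)}

    def load_entries(self, model_class):
        return [e for e in self._rows.values() if isinstance(e, model_class)]

    def append_entry(self, entry):
        row_id = uuid.uuid4().hex
        entry._row_id = row_id  # Assumed convention for tracking the row.
        self._rows[row_id] = entry
        return row_id

    def update_entry(self, entry):
        row_id = getattr(entry, "_row_id", None)
        if row_id not in self._rows:
            raise KeyError(f"Unknown entry: {row_id!r}")
        self._rows[row_id] = entry
        return True

    def delete_entry(self, entry_id):
        return self._rows.pop(entry_id, None) is not None

    def get_entry_by_field(self, field_name: str, field_value: t.Any, model_class):
        for entry in self.load_entries(model_class):
            if getattr(entry, field_name, None) == field_value:
                return entry
        return None


@dataclass
class Entry:
    question: str
    answer: str


if __name__ == "__main__":
    backend = InMemoryBackend()
    backend.append_entry(Entry("What is 2+2?", "4"))
    print(backend.get_entry_by_field("question", "What is 2+2?", Entry))
```

A backend like this can be handy in unit tests, since it exercises the same calls without network access.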