# Google Drive Backend Setup Guide

This guide will help you set up and use the Google Drive backend for Ragas datasets.

## Prerequisites

### 1. Install Dependencies

```bash
pip install google-api-python-client google-auth google-auth-oauthlib
```

### 2. Set Up a Google Cloud Project

1. Go to the [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project or select an existing one
3. Enable the following APIs for that project:
   - Google Drive API
   - Google Sheets API

### 3. Create Credentials

You have two options for authentication:

#### Option A: OAuth 2.0 (Recommended for development)

1. In Google Cloud Console, go to "Credentials"
2. Click "Create Credentials" → "OAuth client ID"
3. Choose "Desktop application"
4. Download the JSON file
5. Save it securely (e.g., as `credentials.json`)

#### Option B: Service Account (Recommended for production)

1. In Google Cloud Console, go to "Credentials"
2. Click "Create Credentials" → "Service account"
3. Fill in the details and create the account
4. Generate a key (JSON format)
5. Download and save the JSON file securely
6. Share your Google Drive folder with the service account email

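The address to share the folder with is stored in the downloaded key file itself. A minimal sketch, assuming the standard service account key layout with a `client_email` field; the helper name `service_account_email` is hypothetical and not part of Ragas:

```python
import json
from pathlib import Path


def service_account_email(key_path: str) -> str:
    """Return the email address stored in a service account JSON key file."""
    # Service account keys downloaded from the Cloud Console are JSON
    # documents; the account's address is in the "client_email" field.
    key_data = json.loads(Path(key_path).read_text(encoding="utf-8"))
    return key_data["client_email"]
```

Calling `service_account_email("path/to/service_account.json")` should return the address to enter in the Drive sharing dialog.
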
## Setup Instructions

### 1. Create a Google Drive Folder

1. Create a folder in Google Drive where you want to store your datasets
2. Get the folder ID from the URL:
   ```
   https://drive.google.com/drive/folders/FOLDER_ID_HERE
   ```
3. If using a service account, share this folder with the service account email

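If you would rather copy the whole browser URL than pick out the ID by hand, the extraction can be sketched as follows. The helper name `folder_id_from_url` is hypothetical, and the sketch assumes the URL shape shown above, where the ID is the path segment after `folders`:

```python
from urllib.parse import urlparse


def folder_id_from_url(url: str) -> str:
    """Extract the folder ID from a Google Drive folder URL."""
    # The ID is the path segment that follows "folders"; query strings
    # such as "?usp=sharing" are not part of the path and are ignored.
    segments = [s for s in urlparse(url).path.split("/") if s]
    if "folders" not in segments:
        raise ValueError(f"Not a Drive folder URL: {url}")
    index = segments.index("folders")
    if index + 1 >= len(segments):
        raise ValueError(f"No folder ID found in: {url}")
    return segments[index + 1]
```
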
### 2. Set Environment Variables (Optional)

```bash
export GDRIVE_FOLDER_ID="your_folder_id_here"
export GDRIVE_CREDENTIALS_PATH="path/to/credentials.json"
# OR for service account:
export GDRIVE_SERVICE_ACCOUNT_PATH="path/to/service_account.json"
```

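This guide does not state whether the backend reads these variables on its own, so the sketch below reads them explicitly and turns them into the `gdrive_*` keyword arguments used in this guide. The helper name `gdrive_kwargs_from_env` and the rule of preferring the service account when both paths are set are assumptions made for illustration:

```python
import os
from typing import Mapping, Optional


def gdrive_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build Google Drive keyword arguments from environment variables."""
    env = os.environ if env is None else env
    folder_id = env.get("GDRIVE_FOLDER_ID")
    if not folder_id:
        raise RuntimeError("GDRIVE_FOLDER_ID is not set")

    kwargs = {"gdrive_folder_id": folder_id}
    service_account = env.get("GDRIVE_SERVICE_ACCOUNT_PATH")
    oauth_credentials = env.get("GDRIVE_CREDENTIALS_PATH")
    # Assumption: when both are set, prefer the service account, since it
    # needs no browser interaction.
    if service_account:
        kwargs["gdrive_service_account_path"] = service_account
    elif oauth_credentials:
        kwargs["gdrive_credentials_path"] = oauth_credentials
    else:
        raise RuntimeError(
            "Set GDRIVE_CREDENTIALS_PATH or GDRIVE_SERVICE_ACCOUNT_PATH"
        )
    return kwargs
```

The resulting dictionary can be unpacked into the project call, for example `Project.create(name="my_project", backend="gdrive", **gdrive_kwargs_from_env())`.
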
### 3. Basic Usage

```python
from ragas_experimental.project.core import Project
from pydantic import BaseModel

# Define your data model
class EvaluationEntry(BaseModel):
    question: str
    answer: str
    score: float

# Create project with Google Drive backend
project = Project.create(
    name="my_project",
    backend="gdrive",
    gdrive_folder_id="your_folder_id_here",
    gdrive_credentials_path="path/to/credentials.json"  # OAuth
    # OR
    # gdrive_service_account_path="path/to/service_account.json"  # Service Account
)

# Create a dataset
dataset = project.create_dataset(
    model=EvaluationEntry,
    name="my_dataset"
)

# Add data
entry = EvaluationEntry(
    question="What is AI?",
    answer="Artificial Intelligence",
    score=0.95
)
dataset.append(entry)

# Load and access data
dataset.load()
print(f"Dataset has {len(dataset)} entries")
for entry in dataset:
    print(f"{entry.question} -> {entry.answer}")
```

## File Structure

When you use the Google Drive backend, it creates the following structure:

```
Your Google Drive Folder/
├── project_name/
│   ├── datasets/
│   │   ├── dataset1.gsheet
│   │   └── dataset2.gsheet
│   └── experiments/
│       └── experiment1.gsheet
```

Each dataset is stored as a Google Sheet with:
- Column headers matching your model fields
- An additional `_row_id` column for internal tracking
- Automatic type conversion when loading data

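To make the type conversion concrete: spreadsheet cells are commonly read back as text, so each value has to be parsed into the type the model declares. The sketch below illustrates the idea only and is not the backend's actual implementation. It uses a standard-library dataclass in place of the pydantic model so that it runs without extra packages, and the function name `rows_to_entries` is hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, List, get_type_hints


@dataclass
class EvaluationEntry:
    question: str
    answer: str
    score: float


def rows_to_entries(rows: List[List[str]]) -> List[EvaluationEntry]:
    """Convert sheet rows (header row first) into typed entries."""
    header, *data_rows = rows
    # Map each model field to the type used to parse its cell text.
    converters = get_type_hints(EvaluationEntry)
    entries = []
    for row in data_rows:
        cells: Dict[str, str] = dict(zip(header, row))
        # Only model fields are read, so bookkeeping columns such as
        # `_row_id` are ignored here.
        values = {name: convert(cells[name]) for name, convert in converters.items()}
        entries.append(EvaluationEntry(**values))
    return entries
```

Given a header row of `_row_id`, `question`, `answer`, `score` and a data row whose score cell holds the text `0.95`, the resulting entry carries `score` as a `float`.
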
## Authentication Flow

### OAuth (First Time)

1. When you first run your code, a browser window will open
2. Sign in to your Google account
3. Grant permissions to access Google Drive
4. A `token.json` file will be created automatically
5. Subsequent runs will use this token (no browser needed)

### Service Account

1. No interactive authentication required
2. Make sure the service account has access to your folder
3. The JSON key file is used directly

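The OAuth steps above follow the usual installed-application pattern from Google's Python client libraries. A minimal sketch of that pattern, not the backend's own code: the function name `load_oauth_credentials` is hypothetical, and the scope list is an assumption, since this guide does not say which scopes the backend requests.

```python
import os

# Assumption: full Drive and Sheets scopes; the backend may request others.
SCOPES = [
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/spreadsheets",
]


def load_oauth_credentials(credentials_path, token_path="token.json", scopes=SCOPES):
    """Return user credentials, opening a browser only when no usable token is cached."""
    # Imported here so the module loads even before the Google packages are installed.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow

    creds = None
    if os.path.exists(token_path):
        # Later runs: reuse the cached token.
        creds = Credentials.from_authorized_user_file(token_path, scopes)
    if creds and creds.valid:
        return creds
    if creds and creds.expired and creds.refresh_token:
        # Expired access token: refresh silently, no browser needed.
        creds.refresh(Request())
    else:
        # First run (or unusable token): interactive consent in the browser.
        flow = InstalledAppFlow.from_client_secrets_file(credentials_path, scopes)
        creds = flow.run_local_server(port=0)
    # Cache the token for subsequent runs.
    with open(token_path, "w", encoding="utf-8") as token_file:
        token_file.write(creds.to_json())
    return creds
```

Deleting the cached token file forces the browser step to run again, which is why that is a common first fix for OAuth errors.
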
## Troubleshooting

### Common Issues

1. **"Folder not found" error**
   - Check that the folder ID is correct
   - Ensure the folder is shared with your service account (if using one)

2. **Authentication errors**
   - Verify your credentials file path
   - Check that the required APIs are enabled
   - For OAuth: Delete `token.json` and re-authenticate

3. **Permission errors**
   - Make sure your account has edit access to the folder
   - For service accounts: share the folder with the service account email

4. **Import errors**
   - Install required dependencies: `pip install google-api-python-client google-auth google-auth-oauthlib`

### Getting Help

If you encounter issues:
1. Check the error message carefully
2. Verify your Google Cloud setup
3. Test authentication with a simple Google Drive API call
4. Check that all dependencies are installed

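One way to run such a test call is to ask the Drive API for the folder's metadata, independently of Ragas. The sketch below assumes a service account key and the Drive v3 API; the helper names `check_folder_access` and `build_drive_service` are hypothetical:

```python
FOLDER_MIME_TYPE = "application/vnd.google-apps.folder"


def check_folder_access(service, folder_id):
    """Return the folder's name; raises if the ID is inaccessible or not a folder."""
    # files().get raises googleapiclient.errors.HttpError (for example 404)
    # when the folder ID is wrong or the account cannot see the folder.
    metadata = (
        service.files()
        .get(fileId=folder_id, fields="id, name, mimeType", supportsAllDrives=True)
        .execute()
    )
    if metadata.get("mimeType") != FOLDER_MIME_TYPE:
        raise ValueError(f"{folder_id} is not a folder")
    return metadata["name"]


def build_drive_service(service_account_path):
    """Build a Drive v3 client from a service account key file."""
    # Imported here so the module loads even before the Google packages are installed.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        service_account_path,
        scopes=["https://www.googleapis.com/auth/drive"],
    )
    return build("drive", "v3", credentials=creds)
```

If `check_folder_access(build_drive_service("path/to/service_account.json"), "your_folder_id_here")` returns the folder name, the credentials and sharing are working, and any remaining problem is more likely in the project configuration.
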
## Security Best Practices

1. **Never commit credentials to version control**
2. **Use environment variables for sensitive information**
3. **Limit service account permissions to minimum required**
4. **Regularly rotate service account keys**
5. **Use OAuth for development, service accounts for production**

## Advanced Configuration

### Custom Authentication Paths

```python
project = Project.create(
    name="my_project",
    backend="gdrive",
    gdrive_folder_id="folder_id",
    gdrive_credentials_path="/custom/path/to/credentials.json",
    gdrive_token_path="/custom/path/to/token.json"
)
```

### Multiple Projects

You can have multiple projects in the same Google Drive folder:

```python
project1 = Project.create(name="project1", backend="gdrive", ...)
project2 = Project.create(name="project2", backend="gdrive", ...)
```

Each project creates its own subfolder structure inside that folder.