πŸ“Š Rescue DB

Python FastAPI PostgreSQL

The central database service for the Offseason Shelter for Science climate data rescue system. This service manages the catalog of datasets, resources, and assets from data.gov, providing a comprehensive database for tracking and organizing climate data that needs to be preserved.

🎯 Purpose

Rescue DB serves as the single source of truth for all climate data metadata in the rescue system. It stores information about:

  • πŸ“‹ Datasets: Complete metadata from data.gov (title, description, organization, access statistics)
  • πŸ“ Resources: Individual data sources within datasets (files, APIs, databases)
  • πŸ’Ύ Assets: Actual downloadable files with URLs, sizes, and metadata
  • 🏒 Organizations: Government agencies that publish the data
  • πŸ“„ AssetKinds: Types of files (CSV, JSON, ZIP, etc.)

πŸ—οΈ Architecture

Database Schema

  • PostgreSQL with SQLAlchemy ORM
  • Alembic for database migrations
  • Proper relationships between all entities
  • Indexing for optimal query performance
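
For orientation, here is a minimal sketch of what a model in this stack looks like, assuming SQLAlchemy 2.0-style declarations. Table and field names below are illustrative; the real definitions live in rescue_api/entities/.

# Illustrative only - see rescue_api/entities/ for the actual models
from sqlalchemy import Index, String, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    """Declarative base; Alembic autogenerate diffs its metadata against the live schema."""

class Dataset(Base):
    __tablename__ = "dataset"  # assumed table name

    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(512))
    description: Mapped[str | None] = mapped_column(Text())

    # An index like this keeps catalog lookups fast as the number of datasets grows.
    __table_args__ = (Index("ix_dataset_title", "title"),)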

API Layer

  • FastAPI for RESTful API endpoints
  • Automatic documentation at /docs
  • Pydantic models for request/response validation
  • Database connection pooling
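
A minimal sketch of that pattern, using a hypothetical response schema (the real Pydantic models live in rescue_api/models/ and the routers in rescue_api/routers/):

# Illustrative only - not the project's actual endpoint code
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Rescue DB")  # interactive docs are generated automatically at /docs

class DatasetOut(BaseModel):
    # Hypothetical response schema used only for this illustration
    id: int
    title: str
    organization: str | None = None

@app.get("/datasets", response_model=list[DatasetOut])
def list_datasets() -> list[DatasetOut]:
    # The real endpoint reads from PostgreSQL through a pooled SQLAlchemy session;
    # this stub only shows how response_model drives validation and the OpenAPI schema.
    return []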

πŸš€ Quick Start

Prerequisites

  • Docker and Docker Compose (for the recommended Docker setup)
  • Python with uv and a running PostgreSQL instance (for local development)

🐳 Docker Setup (Recommended)

  1. Clone the repository and navigate to the project:
cd rescue_db
  2. Configure the environment:
cp .env.dist .env
# Edit .env with your database credentials
  3. Start the services:
docker compose up
  4. Access the API: once the containers are up, the interactive documentation is served at /docs on the configured host and port.

πŸ’» Local Development

  1. Install dependencies:
uv sync
  2. Set up environment:
cp .env.dist .env
# Configure your local PostgreSQL connection
  3. Run database migrations:
uv run alembic upgrade head
  4. Start the development server:
uv run fastapi dev rescue_api/main.py

πŸ“‘ API Endpoints

Core Endpoints

  • GET /datasets - List all datasets
  • GET /datasets/{id} - Get specific dataset details
  • GET /resources - List all resources
  • GET /assets - List all downloadable assets
  • GET /organizations - List all organizations

Search & Filter

  • GET /datasets/search - Search datasets by criteria
  • GET /assets/by-type - Filter assets by file type
  • GET /datasets/by-organization - Filter by organization

Statistics

  • GET /stats/overview - System overview statistics
  • GET /stats/organizations - Data distribution by organization
  • GET /stats/file-types - Asset distribution by file type
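
Any HTTP client works against these endpoints. A quick sanity check with httpx, assuming the API is reachable at http://localhost:8000 (adjust to your RESCUE_API_HOST and RESCUE_API_PORT) and that the response shapes follow the Pydantic schemas in rescue_api/models/:

# Quick smoke test of the read endpoints (illustrative)
import httpx

BASE_URL = "http://localhost:8000"  # assumption - match your deployment

with httpx.Client(base_url=BASE_URL, timeout=30) as client:
    datasets = client.get("/datasets").json()
    print(f"{len(datasets)} datasets in the catalog")

    overview = client.get("/stats/overview").json()
    print(overview)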

πŸ—„οΈ Database Management

Migrations

Apply the latest migrations:

uv run alembic upgrade head

Create a new migration:

uv run alembic revision --autogenerate -m "Description of changes"
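
Autogenerated revisions are written under the alembic/ directory (by default in versions/) and should always be reviewed before applying. A revision file typically looks like the following sketch; the revision IDs, table, and column names here are made up for illustration:

"""Add mirrored_at to asset (illustrative example only)"""
from alembic import op
import sqlalchemy as sa

# revision identifiers, used by Alembic
revision = "abc123def456"
down_revision = "0123456789ab"
branch_labels = None
depends_on = None

def upgrade() -> None:
    op.add_column("asset", sa.Column("mirrored_at", sa.DateTime(), nullable=True))

def downgrade() -> None:
    op.drop_column("asset", "mirrored_at")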

Database Models

Key entities in the system:

  • Dataset: Core dataset information from data.gov
  • Resource: Individual data sources within datasets
  • Asset: Actual downloadable files with metadata
  • Organization: Government agencies publishing data
  • AssetKind: File type classifications

Relationships

Organization β†’ Datasets β†’ Resources β†’ Assets
                    ↓
                AssetKinds
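
A sketch of how that chain could be wired with SQLAlchemy relationships. Class, table, and column names are assumptions; the real definitions live in rescue_api/entities/.

# Illustrative relationship wiring only
from sqlalchemy import ForeignKey
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship

class Base(DeclarativeBase):
    pass

class Organization(Base):
    __tablename__ = "organization"
    id: Mapped[int] = mapped_column(primary_key=True)
    datasets: Mapped[list["Dataset"]] = relationship(back_populates="organization")

class Dataset(Base):
    __tablename__ = "dataset"
    id: Mapped[int] = mapped_column(primary_key=True)
    organization_id: Mapped[int] = mapped_column(ForeignKey("organization.id"))
    organization: Mapped[Organization] = relationship(back_populates="datasets")
    resources: Mapped[list["Resource"]] = relationship(back_populates="dataset")

class Resource(Base):
    __tablename__ = "resource"
    id: Mapped[int] = mapped_column(primary_key=True)
    dataset_id: Mapped[int] = mapped_column(ForeignKey("dataset.id"))
    dataset: Mapped[Dataset] = relationship(back_populates="resources")
    assets: Mapped[list["Asset"]] = relationship(back_populates="resource")

class AssetKind(Base):
    __tablename__ = "asset_kind"
    id: Mapped[int] = mapped_column(primary_key=True)

class Asset(Base):
    __tablename__ = "asset"
    id: Mapped[int] = mapped_column(primary_key=True)
    resource_id: Mapped[int] = mapped_column(ForeignKey("resource.id"))
    kind_id: Mapped[int] = mapped_column(ForeignKey("asset_kind.id"))
    resource: Mapped[Resource] = relationship(back_populates="assets")
    kind: Mapped[AssetKind] = relationship()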

πŸ› οΈ Development

Project Structure

rescue_db/
β”œβ”€β”€ πŸ“ rescue_api/           # FastAPI application
β”‚   β”œβ”€β”€ πŸ“ entities/         # SQLAlchemy models
β”‚   β”œβ”€β”€ πŸ“ models/           # Pydantic schemas
β”‚   β”œβ”€β”€ πŸ“ routers/          # API endpoints
β”‚   └── main.py             # Application entry point
β”œβ”€β”€ πŸ“ alembic/             # Database migrations
β”œβ”€β”€ πŸ“ manual_data_update/  # Manual data operations
└── docker-compose.yml      # Docker configuration

Adding New Models

  1. Create the model in rescue_api/entities/
  2. Generate migration:
uv run alembic revision --autogenerate -m "Add new model"
  3. Review and apply migration:
uv run alembic upgrade head
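
For example, a hypothetical new entity would follow the same declarative pattern as the existing ones. Make sure the new module ends up imported where Alembic's env.py can see the shared metadata, otherwise autogenerate will not pick up the table.

# rescue_api/entities/mirror.py - hypothetical example, not part of the current schema
from sqlalchemy.orm import Mapped, mapped_column

from rescue_api.entities import Base  # assumed location of the shared declarative Base

class Mirror(Base):
    __tablename__ = "mirror"

    id: Mapped[int] = mapped_column(primary_key=True)
    url: Mapped[str]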

Testing

Run the test suite:

uv run pytest
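
New tests usually follow the standard FastAPI pattern with TestClient. The sketch below assumes the application object is exposed as app in rescue_api/main.py and that a test database is reachable through the configured environment:

# tests/test_datasets.py - illustrative; adjust imports to the actual package layout
from fastapi.testclient import TestClient

from rescue_api.main import app

client = TestClient(app)

def test_list_datasets_returns_ok():
    response = client.get("/datasets")
    assert response.status_code == 200
    assert isinstance(response.json(), list)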

πŸ”§ Configuration

Environment Variables

Key configuration options in .env:

# Database
RESCUE_API_POSTGRES_USER=user
RESCUE_API_POSTGRES_PASSWORD=password
RESCUE_API_POSTGRES_DB=us_climate_data
RESCUE_API_POSTGRES_HOST=...
RESCUE_API_POSTGRES_PORT=...


# API
RESCUE_API_HOST=0.0.0.0
RESCUE_API_PORT=80

# Development
RESCUE_API_DEBUG=true
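
Inside the service these settings are typically combined into a single SQLAlchemy connection URL. A minimal sketch, assuming the psycopg driver:

# Build the database URL from the RESCUE_API_POSTGRES_* variables (illustrative)
import os

from sqlalchemy import create_engine

url = (
    "postgresql+psycopg://"
    f"{os.environ['RESCUE_API_POSTGRES_USER']}:{os.environ['RESCUE_API_POSTGRES_PASSWORD']}"
    f"@{os.environ['RESCUE_API_POSTGRES_HOST']}:{os.environ['RESCUE_API_POSTGRES_PORT']}"
    f"/{os.environ['RESCUE_API_POSTGRES_DB']}"
)
engine = create_engine(url, pool_pre_ping=True)  # pooled connections, as noted in the API layer section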

Docker Configuration

The docker-compose.yml includes:

  • PostgreSQL database with health checks
  • FastAPI application with hot reload
  • Volume persistence for database data
  • Network isolation for security

πŸ“Š Data Operations

Manual Data Updates

The manual_data_update/ directory contains scripts for:

  • Data deduplication operations
  • Bulk data imports and updates
  • Data validation and cleanup
  • Performance optimization scripts

Backup and Recovery

# Create database backup
docker exec rescue_db_db_1 pg_dump -U user us_climate_data > backup.sql

# Restore from backup
docker exec -i rescue_db_db_1 psql -U user us_climate_data < backup.sql

πŸ› Troubleshooting

Common Issues

Database connection failed:

  • Check PostgreSQL is running
  • Verify credentials in .env
  • Ensure port 5432 is available

Migration errors:

  • Check for conflicting migrations
  • Verify model changes are correct
  • Review migration files before applying

API not responding:

  • Check FastAPI logs
  • Verify port 8000 is available
  • Ensure all dependencies are installed

Logs

View application logs:

# Docker logs
docker compose logs rescue-api

# Database logs
docker compose logs db

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Update documentation
  6. Submit a pull request

πŸ“„ License

This project is part of the Offseason Shelter for Science system and is licensed under the MIT License.


Built with ❀️ by Data For Science, for climate data preservation 🌍
