The central database service for the Offseason Shelter for Science climate data rescue system. This service manages the catalog of datasets, resources, and assets from data.gov, providing a comprehensive database for tracking and organizing climate data that needs to be preserved.
Rescue DB serves as the single source of truth for all climate data metadata in the rescue system. It stores information about:
- Datasets: Complete metadata from data.gov (title, description, organization, access statistics)
- Resources: Individual data sources within datasets (files, APIs, databases)
- Assets: Actual downloadable files with URLs, sizes, and metadata
- Organizations: Government agencies that publish the data
- AssetKinds: Types of files (CSV, JSON, ZIP, etc.)
- PostgreSQL with SQLAlchemy ORM
- Alembic for database migrations
- Proper relationships between all entities
- Indexing for optimal query performance
- FastAPI for RESTful API endpoints
- Automatic interactive documentation at /docs
- Pydantic models for request/response validation
- Database connection pooling
- uv for Python package management
- Docker Compose for containerized deployment
- PostgreSQL (included in Docker setup)
- Clone and navigate to the project:
cd rescue_db
- Configure environment:
cp .env.dist .env
# Edit .env with your database credentials
- Start the services:
docker compose up
- Access the API:
- API Documentation: http://localhost:8000/docs
- Database: localhost:5432
- Install dependencies:
uv sync
- Set up environment:
cp .env.dist .env
# Configure your local PostgreSQL connection
- Run database migrations:
uv run alembic upgrade head
- Start the development server:
uv run fastapi dev rescue_api/main.py
GET /datasets - List all datasets
GET /datasets/{id} - Get specific dataset details
GET /resources - List all resources
GET /assets - List all downloadable assets
GET /organizations - List all organizations
GET /datasets/search - Search datasets by criteria
GET /assets/by-type - Filter assets by file type
GET /datasets/by-organization - Filter by organization
GET /stats/overview - System overview statistics
GET /stats/organizations - Data distribution by organization
GET /stats/file-types - Asset distribution by file type
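These read-only endpoints can be exercised with the standard library alone. A minimal sketch, assuming the service runs on localhost:8000 as in the quick start; the query parameter names (q, organization, kind) are illustrative assumptions, not a documented contract:

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "http://localhost:8000"

def endpoint_url(path: str, **params) -> str:
    """Build a full URL for a Rescue DB API endpoint."""
    url = urljoin(BASE_URL, path)
    if params:
        url = f"{url}?{urlencode(params)}"
    return url

# List every dataset
print(endpoint_url("/datasets"))
# Hypothetical search criteria -- parameter names are assumptions
print(endpoint_url("/datasets/search", q="temperature", organization="noaa-gov"))
# Filter assets by file type
print(endpoint_url("/assets/by-type", kind="CSV"))
```

The resulting URLs can then be fetched with `urllib.request.urlopen` or any HTTP client; responses are JSON, per FastAPI's defaults.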
Apply the latest migrations:
uv run alembic upgrade head
Create a new migration:
uv run alembic revision --autogenerate -m "Description of changes"
Key entities in the system:
- Dataset: Core dataset information from data.gov
- Resource: Individual data sources within datasets
- Asset: Actual downloadable files with metadata
- Organization: Government agencies publishing data
- AssetKind: File type classifications
Organization → Datasets → Resources → Assets
                                        ↓
                                   AssetKinds
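The hierarchy above can be sketched in plain Python. The real models are SQLAlchemy classes in rescue_api/entities/; the field names below are illustrative assumptions, not the actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class AssetKind:
    name: str                      # e.g. "CSV", "JSON", "ZIP"

@dataclass
class Asset:
    url: str
    size_bytes: int
    kind: AssetKind                # Assets -> AssetKinds

@dataclass
class Resource:
    name: str
    assets: list[Asset] = field(default_factory=list)

@dataclass
class Dataset:
    title: str
    resources: list[Resource] = field(default_factory=list)

@dataclass
class Organization:
    name: str
    datasets: list[Dataset] = field(default_factory=list)

# Walk one branch of the hierarchy (sample data, not real records)
csv_kind = AssetKind("CSV")
org = Organization(
    "NOAA",
    [Dataset("Sea Surface Temperatures",
             [Resource("Monthly grids",
                       [Asset("https://example.org/sst.csv", 1024, csv_kind)])])],
)
print(org.datasets[0].resources[0].assets[0].kind.name)  # CSV
```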
rescue_db/
├── rescue_api/            # FastAPI application
│   ├── entities/          # SQLAlchemy models
│   ├── models/            # Pydantic schemas
│   ├── routers/           # API endpoints
│   └── main.py            # Application entry point
├── alembic/               # Database migrations
├── manual_data_update/    # Manual data operations
└── docker-compose.yml     # Docker configuration
- Create the model in rescue_api/entities/
- Generate migration:
uv run alembic revision --autogenerate -m "Add new model"
- Review and apply migration:
uv run alembic upgrade head
Run the test suite:
uv run pytest
Key configuration options in .env:
# Database
RESCUE_API_POSTGRES_USER=user
RESCUE_API_POSTGRES_PASSWORD=password
RESCUE_API_POSTGRES_DB=us_climate_data
RESCUE_API_POSTGRES_HOST=...
RESCUE_API_POSTGRES_PORT=...
# API
RESCUE_API_HOST=0.0.0.0
RESCUE_API_PORT=80
# Development
RESCUE_API_DEBUG=true
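These settings are typically combined into a PostgreSQL connection URL. A sketch of that assembly, assuming defaults matching the sample .env above; whether rescue_api builds the URL exactly this way is an assumption, so check the application's settings module:

```python
import os

# Fallbacks mirror the sample .env values shown above
defaults = {
    "RESCUE_API_POSTGRES_USER": "user",
    "RESCUE_API_POSTGRES_PASSWORD": "password",
    "RESCUE_API_POSTGRES_DB": "us_climate_data",
    "RESCUE_API_POSTGRES_HOST": "localhost",
    "RESCUE_API_POSTGRES_PORT": "5432",
}

def database_url(env=os.environ) -> str:
    """Assemble a postgresql:// URL from RESCUE_API_POSTGRES_* variables."""
    cfg = {k: env.get(k, v) for k, v in defaults.items()}
    return (
        "postgresql://{RESCUE_API_POSTGRES_USER}:{RESCUE_API_POSTGRES_PASSWORD}"
        "@{RESCUE_API_POSTGRES_HOST}:{RESCUE_API_POSTGRES_PORT}"
        "/{RESCUE_API_POSTGRES_DB}"
    ).format(**cfg)

print(database_url({}))  # postgresql://user:password@localhost:5432/us_climate_data
```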
The docker-compose.yml includes:
- PostgreSQL database with health checks
- FastAPI application with hot reload
- Volume persistence for database data
- Network isolation for security
The manual_data_update/ directory contains scripts for:
- Data deduplication operations
- Bulk data imports and updates
- Data validation and cleanup
- Performance optimization scripts
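As a sketch of what a deduplication pass looks like: the real scripts operate on the database, but the same idea can be shown on plain dicts, assuming assets are keyed by URL (an assumption about the dedup criterion):

```python
def dedupe_assets(assets: list[dict]) -> list[dict]:
    """Keep the first occurrence of each asset URL, preserving order."""
    seen: set[str] = set()
    unique = []
    for asset in assets:
        if asset["url"] not in seen:
            seen.add(asset["url"])
            unique.append(asset)
    return unique

rows = [
    {"url": "https://example.org/a.csv", "size": 10},
    {"url": "https://example.org/a.csv", "size": 10},  # duplicate
    {"url": "https://example.org/b.csv", "size": 20},
]
print(len(dedupe_assets(rows)))  # 2
```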
# Create database backup
docker exec rescue_db_db_1 pg_dump -U user us_climate_data > backup.sql
# Restore from backup
docker exec -i rescue_db_db_1 psql -U user us_climate_data < backup.sql
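The same backup can be scripted from Python. A minimal sketch; the container name and credentials mirror the shell example above and may differ in your setup:

```python
import subprocess

def backup_command(container="rescue_db_db_1", user="user", db="us_climate_data"):
    """Build the pg_dump invocation used in the shell example above."""
    return ["docker", "exec", container, "pg_dump", "-U", user, db]

cmd = backup_command()
print(" ".join(cmd))
# To actually run it and capture the dump:
#   with open("backup.sql", "wb") as fh:
#       subprocess.run(cmd, stdout=fh, check=True)
```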
Database connection failed:
- Check PostgreSQL is running
- Verify credentials in .env
- Ensure port 5432 is available
Migration errors:
- Check for conflicting migrations
- Verify model changes are correct
- Review migration files before applying
API not responding:
- Check FastAPI logs
- Verify port 8000 is available
- Ensure all dependencies are installed
View application logs:
# Docker logs
docker compose logs rescue-api
# Database logs
docker compose logs db
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Update documentation
- Submit a pull request
This project is part of the Offseason Shelter for Science system and is licensed under the MIT License.
Built with ❤️ by Data For Science, for climate data preservation