The Data to Science (D2S) platform at Purdue University is an innovative, open-source initiative designed to facilitate data sharing and collaboration among researchers. Developed by Jinha Jung, an associate professor of civil engineering, and his team, the platform primarily focuses on housing data from unmanned aerial vehicles (UAVs) used in agricultural and forestry research.
The D2S platform aims to create a data-driven open science community that promotes sustained innovation. Researchers can upload, manage, and share their UAV data, making it accessible to a broader audience. This collaborative approach helps in advancing research by providing a centralized repository of valuable datasets from various projects worldwide.
Overview of D2S System

The Data to Science (D2S) platform at Purdue University stands out from other data-sharing platforms due to several unique features and approaches:
- Specialization in UAV Data: Unlike many general data-sharing platforms, D2S is specifically designed to manage and share data from unmanned aerial vehicles (UAVs), making it particularly valuable for agricultural and forestry research.
- Open-Source and Free Access: D2S is an open-source platform, ensuring that researchers worldwide can access and contribute to the data repository without any cost barriers.
- Focus on Collaboration: The platform emphasizes building a community of researchers who can collaborate and share insights, fostering a more interactive and cooperative research environment.
- Alignment with Open Science Mandates: D2S aligns with the White House Office of Science and Technology Policy (OSTP) mandates on openness in the scientific enterprise, ensuring that federally funded research and supporting data are disclosed to the public at no cost.
- User-Centric Development: The platform is developed with input from its users, ensuring that the tools and features meet the specific needs of the research community. This user-driven approach helps in creating a more effective and user-friendly platform.
- Training and Support: D2S offers training workshops and support to help researchers get acquainted with the platform's tools and capabilities, ensuring they can make the most of its features.
These aspects make D2S a powerful tool for researchers looking to manage, share, and collaborate on UAV data, particularly in the fields of agriculture and forestry.
Docker Engine and Docker Compose are required to run the application containers with the following instructions. If you can successfully run `docker --version` and `docker compose --version` from a terminal, then you are ready to proceed to the next section.
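If you want to confirm the prerequisites from a terminal first, the two checks look like this; the version strings in the comments are only illustrative and will differ on your machine.

```bash
# Both commands should print a version string rather than an error,
# e.g. "Docker version 24.x.x" and "Docker Compose version v2.x.x".
docker --version
docker compose --version
```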
- Navigate to the root directory of the repository. The next six steps copy the example environment files into place; a consolidated version of these copy commands appears after this list.
- Copy `backend.example.env` to a new file named `backend.env`.

  ```bash
  cp backend.example.env backend.env
  ```

- Copy `db.example.env` to a new file named `db.env`.

  ```bash
  cp db.example.env db.env
  ```

- Copy `.env.example` to a new file named `.env`.

  ```bash
  cp .env.example .env
  ```

- Copy `frontend.example.env` to a new file named `frontend.env`.

  ```bash
  cp frontend.example.env frontend.env
  ```

- Copy `frontend/.env.example` to a new file named `frontend/.env`.

  ```bash
  cp frontend/.env.example frontend/.env
  ```

- Copy `frontend/example.env.development` to a new file named `frontend/.env.development`.

  ```bash
  cp frontend/example.env.development frontend/.env.development
  ```
- Open `.env`. Below is a list of the environment variables that can be set inside `.env`.

  Environment variables:

  - `EXTERNAL_STORAGE`: Location where raw image zips and metadata will be sent for image processing jobs. It could be a mapped network drive or any other directory on the host machine. This should be left empty unless you have set up an image processing backend that works with the D2S image processing Celery task.
  - `TUSD_STORAGE`: Location of a Docker-managed volume or mapped host directory that stores user-uploaded datasets.
  - `TILE_SIGNING_SECRET`: Secret key used for creating a signed URL that the client can use to access raster tiles and MVT tiles.
- Open `frontend.env`. Below is a list of the environment variables that can be set inside `frontend.env`.

  Environment variables:

  - `VITE_MAPBOX_ACCESS_TOKEN`: Mapbox access token for satellite imagery (optional).
  - `VITE_MAPTILER_API_KEY`: MapTiler API key for OSM labels (optional).
- Open `backend.env` in a text editor. Below is a list of the environment variables that can be set inside `backend.env`. You may use the default values or change them as needed.

  You must provide a value for `SECRET_KEY` in your `backend.env` file. Use a cryptographically secure random string of at least 32 characters (one way to generate such a value is shown after this list).

  Environment variables:

  - `API_PROJECT_NAME`: Name that will appear in the FastAPI docs.
  - `API_DOMAIN`: Domain used for accessing the application (e.g., http://localhost or https://customdomain).
  - `CELERY_BROKER_URL`: Address for the local Redis service.
  - `CELERY_RESULT_BACKEND`: Address for the local Redis service.
  - `EXTENSIONS`: Can be used to enable extensions. Should typically be left blank.
  - `EXTERNAL_STORAGE`: Internal mount point for external storage. Should be blank unless you have a bind mount for external storage.
  - `MAIL_ENABLED`: Enable SMTP email by changing the value from 0 to 1.
  - `MAIL_SERVER`: SMTP server address.
  - `MAIL_USERNAME`: Username for the SMTP server.
  - `MAIL_PASSWORD`: Password for the SMTP server.
  - `MAIL_FROM`: Sender email address.
  - `MAIL_FROM_NAME`: Name of the sender.
  - `MAIL_ADMINS`: List of emails that should receive admin mail, separated by commas.
  - `MAIL_PORT`: SMTP server port.
  - `MAPBOX_ACCESS_TOKEN`: Mapbox access token for satellite imagery (optional).
  - `POINT_LIMIT`: Total number of points to be used when generating point cloud preview images.
  - `RABBITMQ_HOST`: RabbitMQ hostname. Leave blank.
  - `RABBITMQ_USERNAME`: RabbitMQ username. Leave blank.
  - `RABBITMQ_PASSWORD`: RabbitMQ password. Leave blank.
  - `SECRET_KEY`: Secret key for signing and verifying JWT tokens.
  - `STAC_API_KEY`: Secret key that can be used for verification by the STAC API.
  - `STAC_API_URL`: URL for a STAC API.
  - `STAC_API_TEST_URL`: URL for a STAC API that can be used for testing.
  - `STAC_BROWSER_URL`: URL for the STAC Browser site connected to the STAC API.
  - `HTTP_COOKIE_SECURE`: Set to 1 to only send cookies over HTTPS, 0 to allow HTTP.
  - `LIMIT_MAX_REQUESTS`: Maximum number of requests a worker will handle before being restarted.
  - `UVICORN_WORKERS`: Number of Uvicorn workers.
- Open `db.env` in a text editor. `POSTGRES_PASSWORD` should be assigned a secure password (one way to generate one is shown after this list). The other environment variables can be left at their default values. `POSTGRES_HOST` should always be set to `db` unless the database service name is changed from `db` to another name in `docker-compose.yml`.

  If you change `POSTGRES_USER` or `POSTGRES_HOST`, you must also update these environment variables with the new values under the `db` service in `docker-compose.yml`.
Open
frontend/.env
in a text editor. You may use the default values or change them as needed.Environment variables
VITE_API_V1_STR
: Path for API endpoints. Do not change from default value unless the path has been changed in the backend.VITE_BRAND_FULL
: Full name of application.VITE_BRAND_SHORT
: Abbreviated name of application.VITE_BRAND_SLOGAN
: Slogan that appears on landing page.VITE_TITLE
: Page title.VITE_META_DESCRIPTION
: Description for search results and browser tabs.VITE_META_OG_TITLE
: Title for social media shares.VITE_META_OG_DESCRIPTION
: Description for social media shares.VITE_META_OG_TYPE
: Content type (e.g., 'website', 'article').VITE_SHOW_CONTACT_FORM
: Boolean (0 or 1) to indicate if Contact Form link should be shown (requires email service).
- Open `frontend/.env.development` in a text editor. You may use the default values or change them as needed.

  Environment variables:

  - `VITE_META_OG_IMAGE`: Preview image URL for social media shares.
  - `VITE_META_OG_URL`: Hostname for the site.
- In the root repository directory, copy `docker-compose.example.yml` to a new file named `docker-compose.yml`.

  ```bash
  cp docker-compose.example.yml docker-compose.yml
  ```
- Build Docker images for the frontend, backend, and proxy services with the following command:

  ```bash
  docker compose build
  ```

- Use the following command to run the service containers in the background:

  ```bash
  docker compose up -d
  ```

- Use the following command to stop the containers:

  ```bash
  docker compose stop
  ```
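If you prefer to create all of the environment files in one pass before editing them, the copy steps above can be chained into a single command. This is simply the individual `cp` commands from the list combined; run it from the root of the repository.

```bash
# Create every env file from its template in one pass
cp backend.example.env backend.env && \
cp db.example.env db.env && \
cp .env.example .env && \
cp frontend.example.env frontend.env && \
cp frontend/.env.example frontend/.env && \
cp frontend/example.env.development frontend/.env.development
```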
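The configuration steps above call for secure values for `SECRET_KEY`, and the same approach works for other secrets such as `TILE_SIGNING_SECRET` and `POSTGRES_PASSWORD`. Below is a minimal sketch of one way to generate them, assuming `openssl` or Python 3 is available on the host; paste each generated value into the corresponding env file.

```bash
# Generate a 32-byte (64 hex character) random string for each secret,
# e.g. SECRET_KEY in backend.env, TILE_SIGNING_SECRET in .env,
# and POSTGRES_PASSWORD in db.env.
openssl rand -hex 32

# Alternative using Python's secrets module:
python3 -c "import secrets; print(secrets.token_hex(32))"
```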
The Data To Science web application can be accessed from http://localhost:8000. Replace `localhost` with the `DOMAIN` environment variable if it was changed to a different value. If port `8000` is already in use, or you want to use a different port, change the port in `docker-compose.yml` under the `proxy` service's `ports` setting.
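As a quick sanity check, not part of the required steps, you can confirm which host port was published and that the proxy responds. `docker compose ps` and `curl` are standard tools, and the URL below assumes the default port 8000.

```bash
# List running services and their published ports
docker compose ps

# Confirm the proxy answers on the expected port (adjust if you changed it)
curl -I http://localhost:8000
```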
The sections above should provide all the necessary steps to get Data To Science up and running. The next sections provide additional information about using `docker-compose-dev.yml` for development, accessing the FastAPI documentation, and running the backend tests.
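For development, a common pattern is to point Docker Compose at the development file explicitly with the `-f` flag. The sketch below assumes `docker-compose-dev.yml` defines the same services as `docker-compose.yml`; adjust the service names if your development file differs.

```bash
# Build and start the development stack defined in docker-compose-dev.yml
docker compose -f docker-compose-dev.yml build
docker compose -f docker-compose-dev.yml up -d

# Stop the development containers
docker compose -f docker-compose-dev.yml stop
```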
After running `docker compose up -d`, you should be able to access the web API documentation at http://localhost:8000/docs or http://localhost:8000/redoc. The first URL displays the Swagger UI documentation for the API, and the second displays the ReDoc documentation. The API endpoints can be tried out interactively from the Swagger UI page.
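If you prefer the command line, you can check the docs page and download the raw OpenAPI schema instead of browsing the HTML documentation. Note that `/openapi.json` is FastAPI's default schema path and is only an assumption here; the exact path may differ if the backend overrides it.

```bash
# Check that the interactive docs page responds (prints the HTTP status code)
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/docs

# Download the OpenAPI schema (FastAPI serves it at /openapi.json by default)
curl -s http://localhost:8000/openapi.json -o openapi.json
```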
The `pytest` library can be used to run tests for the FastAPI backend. Use the following command to run the full test suite:

```bash
docker compose exec backend pytest
```
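When iterating on a single feature, it is often faster to run only part of the suite. The flags below are standard pytest options, and the keyword expression is purely illustrative.

```bash
# Stop at the first failure and show verbose output
docker compose exec backend pytest -x -v

# Run only tests whose names match a keyword expression (illustrative keyword)
docker compose exec backend pytest -k "upload"
```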
If you make any changes to the database models, run the following command to create a new migration:

```bash
docker compose exec backend alembic revision --autogenerate -m "migration comment"
```

After creating the new migration, use the following command to apply the changes to the tables in the database:

```bash
docker compose exec backend alembic upgrade head
```
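To see where the database currently sits in the migration history, or to roll back a migration you just applied, Alembic's standard inspection and downgrade commands can be run the same way. These are stock Alembic subcommands rather than anything specific to D2S.

```bash
# Show the revision currently applied to the database
docker compose exec backend alembic current

# List the migration history
docker compose exec backend alembic history --verbose

# Roll back the most recent migration
docker compose exec backend alembic downgrade -1
```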
For more detailed information, see the project's full documentation.