`shiny-test-generator` is a Python tool that uses LLMs (Anthropic Claude or OpenAI GPT) to automatically generate pytest tests for Shiny for Python apps. It supports both CLI and library usage, and includes a quality evaluation suite built with inspect-ai.
- Automated Test Generation: Create pytest + playwright tests from your Shiny app code or file (see the example sketch below).
- Multi-Provider LLMs: Use Anthropic (Claude) or OpenAI (GPT) models.
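To give a sense of the output, here is a rough sketch of the kind of pytest + Playwright test the generator aims to produce for a simple Shiny app. The actual output depends on your app, provider, and model; the input/output ids below (`n`, `txt`) and the expected text are hypothetical.

```python
# Sketch of a generated test for a hypothetical app with a slider "n" and a
# text output "txt"; real generated tests will differ in structure and assertions.
from playwright.sync_api import Page
from shiny.playwright import controller
from shiny.pytest import create_app_fixture
from shiny.run import ShinyAppProc

app = create_app_fixture("app.py")  # path to the Shiny app under test


def test_slider_drives_text_output(page: Page, app: ShinyAppProc):
    page.goto(app.url)
    slider = controller.InputSlider(page, "n")    # hypothetical input id
    output = controller.OutputText(page, "txt")   # hypothetical output id
    slider.set("25")
    output.expect_value("n*2 is 50")              # hypothetical expected text
```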
Set your API keys as environment variables or in a `.env` file:

```bash
export ANTHROPIC_API_KEY=your_anthropic_api_key
export OPENAI_API_KEY=your_openai_api_key
```

Or in `.env`:

```
ANTHROPIC_API_KEY=your_anthropic_api_key
OPENAI_API_KEY=your_openai_api_key
```
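If you prefer to load the `.env` file explicitly in your own script (the generator may already handle this for you, so treat this as optional), the python-dotenv package is one way to do it:

```python
# Optional: explicitly load ANTHROPIC_API_KEY / OPENAI_API_KEY from .env into
# os.environ before creating the generator. Assumes python-dotenv is installed.
from dotenv import load_dotenv

load_dotenv()
```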
Install the package:

```bash
pip install -e ".[test]"
```
```python
# using openai models
from shiny_test_generator import ShinyTestGenerator

gen = ShinyTestGenerator(provider="openai")
test_code, test_path = gen.generate_test_from_file("app.py", model="gpt-4.1")
```

```python
# using anthropic models
from shiny_test_generator import ShinyTestGenerator

gen = ShinyTestGenerator(provider="anthropic")
test_code, test_path = gen.generate_test_from_file("app.py", model="sonnet")
```
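As a follow-on sketch (assuming pytest, pytest-playwright, and the Playwright browsers are installed, and that `app.py` is a runnable Shiny app), you can run the generated test file immediately from the same script:

```python
# Generate a test and run it in one go; pytest.main returns the usual pytest
# exit code (0 means every test in the generated file passed).
import pytest

from shiny_test_generator import ShinyTestGenerator

gen = ShinyTestGenerator(provider="anthropic")
test_code, test_path = gen.generate_test_from_file("app.py", model="sonnet")

exit_code = pytest.main([str(test_path), "-v"])
print(f"pytest exit code: {exit_code}")
```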
> Tip: For optimal performance, we recommend using the Anthropic `sonnet` model; it consistently outperforms OpenAI's models for generating tests.
Supported models:

- Anthropic: `haiku3.5`, `sonnet`
- OpenAI: `gpt-4.1`, `o3-mini`, `o4-mini`, `gpt-4.1-nano`
Generated tests follow the naming convention `app.py` → `test_app.py`, written to the same directory as the app by default or to a custom directory if specified.
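For illustration only (this mirrors the convention above, not the library's internal code), the default test path can be derived like this:

```python
# Derive the default test path for a given app file:
# examples/app.py -> examples/test_app.py
from pathlib import Path

app_path = Path("examples/app.py")  # hypothetical location
default_test_path = app_path.with_name(f"test_{app_path.name}")
print(default_test_path)            # examples/test_app.py
```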
To run the quality evaluation suite using inspect-ai, you can use the provided GitHub Actions workflow or run it locally:

```bash
# generate test metadata
python evals/create_test_metadata.py

# run the evaluation
inspect eval evals/evaluation.py@shiny_test_evaluation --log-dir results/ --log-format json
```
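After a local run, the JSON logs land in `results/`. A quick, schema-agnostic way to peek at them from Python (without assuming inspect-ai's exact log structure) is:

```python
# List each evaluation log and its top-level keys; drill in further once you
# know which fields your inspect-ai version writes.
import json
from pathlib import Path

for log_file in sorted(Path("results").glob("*.json")):
    with log_file.open() as f:
        log = json.load(f)
    print(log_file.name, "->", list(log.keys()))
```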
The `inspect_ai_evaluation.yml` workflow automates the quality assurance process:
```mermaid
flowchart TD
    A[🚀 Trigger: PR to main] --> B[⚙️ Setup Environment]
    B --> C[📦 Install Dependencies]
    C --> D[🎭 Cache Playwright Browsers]
    D --> E[🔄 Start Loop: 3 Attempts]
    E --> F[🧹 Clean Previous Results]
    F --> G[📋 Generate Test Metadata]
    G --> H[🤖 Run Inspect AI Evaluation]
    H --> I[🧪 Run Generated Tests]
    I --> J{✅ Tests Pass?}
    J -->|❌ Fail > 1 test| K[💥 Exit with Error]
    J -->|✅ Pass or ≤ 1 failure| L{🔢 More Attempts?}
    L -->|Yes| F
    L -->|No| M[📊 Process Results]
    M --> N[🚦 Check Quality Gate]
    N --> O{🎯 Quality Gate Pass?}
    O -->|❌ Fail| P[🔴 Workflow Fails]
    O -->|✅ Pass| Q[💬 Comment PR Results]
    Q --> R[🎉 Workflow Success]
    K --> S[🔴 Workflow Fails]

    %% Styling
    classDef trigger fill:#e1f5fe,stroke:#01579b,stroke-width:3px,color:#000
    classDef setup fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,color:#000
    classDef process fill:#e8f5e8,stroke:#1b5e20,stroke-width:2px,color:#000
    classDef decision fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#000
    classDef success fill:#e8f5e8,stroke:#2e7d32,stroke-width:3px,color:#000
    classDef failure fill:#ffebee,stroke:#c62828,stroke-width:3px,color:#000
    classDef loop fill:#f1f8e9,stroke:#33691e,stroke-width:2px,color:#000

    class A trigger
    class B,C,D setup
    class E,F,G,H,I,M,N,Q loop
    class J,L,O decision
    class R success
    class K,P,S failure
```
The validation process keeps the quality of test generation in check by:
- Running 3 complete evaluation cycles to test consistency
- Allowing up to 1 test failure per attempt (acknowledging LLM non-determinism)
- Failing if more than 1 test fails in any attempt
- Applying quality gate checks to the final results
- Automatically commenting a results summary on the PR
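As a simplified illustration of the acceptance policy above (the failure counts here are made up, and the real logic lives in the CI workflow, not in this snippet):

```python
# Accept a run only if every attempt stayed within the allowed failure budget.
MAX_ATTEMPTS = 3
ALLOWED_FAILURES_PER_ATTEMPT = 1


def attempts_pass(failures_per_attempt: list[int]) -> bool:
    """Return True if no attempt exceeded the per-attempt failure budget."""
    assert len(failures_per_attempt) <= MAX_ATTEMPTS
    return all(f <= ALLOWED_FAILURES_PER_ATTEMPT for f in failures_per_attempt)


print(attempts_pass([0, 1, 0]))  # True: at most one failure per attempt
print(attempts_pass([0, 2, 0]))  # False: second attempt exceeds the budget
```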