Chatbot: Add software tests for workshop notebook #920

amotl · 2025-04-25T12:11:33Z

Accompanying GH-912, GH-916, and GH-919.

coderabbitai · 2025-04-25T12:11:40Z

Walkthrough

This update introduces automated testing infrastructure and supporting files for a chatbot-related project. A new GitHub Actions workflow is added to automate validation and testing, including scheduled and manual triggers, with a matrix covering multiple Python and CrateDB versions. Pytest configuration and supporting fixtures are implemented to enable robust notebook-based testing, including database table resets for test isolation. Additional test dependencies are specified, and a test function is added to execute Jupyter Notebooks as test cases.

Changes

Files/Paths	Change Summary
.github/workflows/ml-chatbot.yml	New GitHub Actions workflow for automated testing/validation, with matrix builds, scheduled/manual triggers.
topic/chatbot/table-augmented-generation/workshop/conftest.py	Adds pytest hooks/fixtures for dynamic notebook test generation and database table reset before each test.
topic/chatbot/table-augmented-generation/workshop/pyproject.toml	Adds pytest configuration: options, environment variables, logging, and test discovery settings.
topic/chatbot/table-augmented-generation/workshop/test.py	Adds test function to execute Jupyter Notebooks using testbook.
topic/chatbot/table-augmented-generation/workshop/requirements-test.txt	Adds requirements file specifying test dependencies: pueblo[notebook,testing], sqlalchemy-cratedb.

Sequence Diagram(s)

sequenceDiagram
    participant GitHub
    participant Workflow
    participant Runner
    participant CrateDB
    participant Pytest

    GitHub->>Workflow: Trigger (push, PR, schedule, manual)
    Workflow->>Runner: Start job (matrix: Python/CrateDB)
    Runner->>CrateDB: Start service container
    Runner->>Runner: Set up Python, install dependencies
    Runner->>Pytest: Run tests (ngr test)
    Pytest->>Runner: Load conftest.py, pyproject.toml
    Pytest->>CrateDB: Reset tables before each test
    Pytest->>Runner: Discover and execute notebook tests
    Pytest->>CrateDB: Interact with database as needed
    Pytest->>Runner: Report results

Suggested reviewers

wierdvanderhaar

Poem

In the warren of code, a new workflow appears,
With tests for our bots, and CrateDB cheers!
Pytest now scurries, resets tables with care,
While notebooks are tested, their outputs laid bare.
Dependencies gathered, the rabbits convene—
Automation in place, our codebase stays clean!
🐇✨

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ceedf8d and 8718e98.

📒 Files selected for processing (5)

.github/workflows/ml-chatbot.yml (1 hunks)
topic/chatbot/table-augmented-generation/workshop/conftest.py (1 hunks)
topic/chatbot/table-augmented-generation/workshop/pyproject.toml (1 hunks)
topic/chatbot/table-augmented-generation/workshop/requirements-test.txt (1 hunks)
topic/chatbot/table-augmented-generation/workshop/test.py (1 hunks)

✅ Files skipped from review due to trivial changes (1)

topic/chatbot/table-augmented-generation/workshop/requirements-test.txt

🚧 Files skipped from review as they are similar to previous changes (4)

topic/chatbot/table-augmented-generation/workshop/test.py
topic/chatbot/table-augmented-generation/workshop/conftest.py
topic/chatbot/table-augmented-generation/workshop/pyproject.toml
.github/workflows/ml-chatbot.yml


coderabbitai

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 5

🔭 Outside diff range comments (1)

topic/chatbot/table-augmented-generation/workshop/requirements-dev.txt (1)
1-2: ⚠️ Potential issue

Missing required dependency for testing infrastructure.

The tests depend on SQLAlchemy as shown in the conftest.py file, but it's not included in the development requirements. This is causing the pipeline failure with ModuleNotFoundError: No module named 'sqlalchemy'.

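Add the CrateDB dialect for SQLAlchemy to the requirements. The `crate://` connection strings need the sqlalchemy-cratedb package, which also pulls in SQLAlchemy itself:
pueblo[notebook,testing]>=0.0.10
+sqlalchemy-cratedb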

🧹 Nitpick comments (4)

topic/chatbot/table-augmented-generation/workshop/pyproject.toml (1)
7-8: Consider making the connection string configurable with a default.

The hardcoded connection string assumes a specific database configuration which may not be available in all environments. This could make the tests brittle.

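Consider letting an externally provided value take precedence. pytest-env, which reads this `env` key, does not perform shell-style `${VAR:-default}` expansion; its `D:` prefix sets a variable only when it is not already defined:
env = [
-    "CRATEDB_CONNECTION_STRING=crate://crate@localhost/?schema=notebook",
+    "D:CRATEDB_CONNECTION_STRING=crate://crate@localhost/?schema=notebook",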
    "PYDEVD_DISABLE_FILE_VALIDATION=1",
]
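The intended behaviour (keep a CRATEDB_CONNECTION_STRING supplied by CI, otherwise fall back to the local default) can be sketched in plain Python. `apply_default` is a hypothetical helper for illustration only, not part of pytest or pytest-env; the default URL is the one from the pyproject.toml snippet above.

```python
from typing import MutableMapping

# Default taken from the pyproject.toml snippet; used only when nothing else is set.
DEFAULT_CONNECTION_STRING = "crate://crate@localhost/?schema=notebook"


def apply_default(environ: MutableMapping[str, str], name: str, default: str) -> str:
    """Set `name` only when it is not already defined, then return its effective value."""
    # setdefault leaves an existing value untouched, so a value exported by CI wins.
    return environ.setdefault(name, default)
```

Called as `apply_default(os.environ, "CRATEDB_CONNECTION_STRING", DEFAULT_CONNECTION_STRING)`, it only writes to the environment when the variable is missing.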
.github/workflows/ml-chatbot.yml (3)
3-15: Refine path filters: remove leading slash in file patterns.
GitHub Actions matches path filters against paths relative to the repository root, so a pattern with a leading slash is unlikely to match anything. For consistency, update:
- - '/requirements.txt'
+ - 'requirements.txt'
23-27: Consider improving concurrency grouping for pull requests.
Using ${{ github.ref }} groups runs by Git reference, which for PRs becomes refs/pull/.../merge. To group runs by the source branch (and avoid conflating PR runs), you may want:
concurrency:
  cancel-in-progress: true
  group: ${{ github.workflow }}-${{ github.head_ref || github.ref }}
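How the suggested group key resolves can be mimicked in Python. `concurrency_group` is a hypothetical helper, and the workflow name "ml-chatbot" below is a placeholder; `github.head_ref` is only populated for pull_request and pull_request_target events, so other events fall back to `github.ref`.

```python
from typing import Optional


def concurrency_group(workflow: str, ref: str, head_ref: Optional[str]) -> str:
    """Mimic `${{ github.workflow }}-${{ github.head_ref || github.ref }}`."""
    # `||` in GitHub expressions yields the right operand when the left one is empty.
    return f"{workflow}-{head_ref or ref}"
```

For this PR, `concurrency_group("ml-chatbot", "refs/pull/920/merge", "chatbot-workshop-tests")` groups by the source branch, while a push to main groups by `refs/heads/main`.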
45-53: Add a health check for the CrateDB service.
Without a health check, the job may attempt to connect before CrateDB is ready. You can leverage Docker health options:
services:
  cratedb:
    image: crate/crate:${{ matrix.cratedb-version }}
    ports:
      - 4200:4200
      - 5432:5432
    options: >-
      --health-cmd "curl -sSf http://localhost:4200/_status || exit 1"
      --health-interval 10s
      --health-retries 30
    env:
      CRATE_HEAP_SIZE: 4g
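The same retry idea as the Docker health options (30 retries, 10 s apart) can be sketched as a Python readiness loop, for example for a step that should not start before CrateDB answers. Both functions are hypothetical helpers; probing the root URL on port 4200 is an assumption, and the probe is injected so the loop can be exercised without a database.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_probe(url: str = "http://localhost:4200/", timeout: float = 2.0) -> bool:
    """Return True if `url` answers with an HTTP 2xx status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts and HTTP error statuses (urllib's
        # HTTPError is an OSError subclass) all count as "not ready yet".
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    retries: int = 30,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` up to `retries` times, sleeping `interval` seconds between failures."""
    for attempt in range(1, retries + 1):
        if probe():
            return True
        if attempt < retries:
            sleep(interval)
    return False
```

`wait_until_ready(http_probe)` would block for at most about five minutes with the defaults.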

🛑 Comments failed to post (5)

topic/chatbot/table-augmented-generation/workshop/pyproject.toml (1)
14-16: 💡 Verification agent
🧩 Analysis chain

Potentially restrictive test path configuration.

The current testpaths setting only includes Python files in the current directory. This works for the current setup but may be too restrictive as the test suite grows.

🏁 Script executed:
#!/bin/bash
# Check if there are any test files in subdirectories that would be missed
find . -type f -name "test_*.py" | grep -v "^\./test_.*\.py$" || echo "No test files in subdirectories found"
Length of output: 384
Fix test discovery in pyproject.toml

Your current testpaths only picks up top-level *.py files, but we have tests living in subdirectories. You should either remove the testpaths setting (so pytest’s default discovery kicks in) or explicitly include your subdirs.

• File: topic/chatbot/table-augmented-generation/workshop/pyproject.toml
Lines: 14–16

Suggested change (choose one):

Option A: Let pytest auto-discover all tests
-[tool.pytest.ini_options]
-testpaths = [
-    "*.py",
-]
+[tool.pytest.ini_options]
+# Remove `testpaths` to allow pytest’s default discovery
Option B: Explicitly include your test directories
-[tool.pytest.ini_options]
-testpaths = [
-    "*.py",
-]
+[tool.pytest.ini_options]
+testpaths = [
+    "application/cratedb-toolkit",
+    "by-dataframe/pandas",
+    "testing",
+]
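The difference between a top-level pattern and a recursive one can be checked with Python's `glob` module, assuming pytest expands `testpaths` entries with comparable shell-style wildcard semantics. `matches` is a hypothetical helper, and the file layout is made up for illustration.

```python
import glob
import os
import tempfile
from typing import List


def matches(root: str, pattern: str, recursive: bool = False) -> List[str]:
    """Return paths under `root` matching `pattern`, relative to `root` and sorted."""
    found = glob.glob(os.path.join(root, pattern), recursive=recursive)
    return sorted(os.path.relpath(path, root) for path in found)


with tempfile.TemporaryDirectory() as root:
    # One test module at the top level and one in a subdirectory.
    os.makedirs(os.path.join(root, "sub"))
    for name in ("test.py", os.path.join("sub", "test_more.py")):
        with open(os.path.join(root, name), "w") as fh:
            fh.write("")
    top_level = matches(root, "*.py")
    everywhere = matches(root, "**/*.py", recursive=True)
```

`top_level` only contains `test.py`, while the recursive `**` pattern also finds the module in `sub/`.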
topic/chatbot/table-augmented-generation/workshop/conftest.py (2)
23-26: 🛠️ Refactor suggestion

Add error handling for missing connection string.

The code assumes CRATEDB_CONNECTION_STRING environment variable is always set, but doesn't handle the case when it's missing. If the variable is not set, this will cause a cryptic error.
-    connection_string = os.environ.get("CRATEDB_CONNECTION_STRING")
+    connection_string = os.environ.get("CRATEDB_CONNECTION_STRING")
+    if not connection_string:
+        pytest.skip("CRATEDB_CONNECTION_STRING environment variable not set")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    connection_string = os.environ.get("CRATEDB_CONNECTION_STRING")
    if not connection_string:
        pytest.skip("CRATEDB_CONNECTION_STRING environment variable not set")

    engine = sa.create_engine(connection_string, echo=os.environ.get("DEBUG"))
    connection = engine.connect()
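The guard can also be factored into a small helper that is easy to test in isolation. This is a sketch with a hypothetical name, `require_connection_string`; it raises `unittest.SkipTest`, which pytest reports as a skipped test, so it needs nothing beyond the standard library.

```python
import os
import unittest
from typing import Mapping, Optional


def require_connection_string(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return CRATEDB_CONNECTION_STRING, or skip the calling test when it is unset or empty."""
    environ = os.environ if environ is None else environ
    connection_string = environ.get("CRATEDB_CONNECTION_STRING")
    if not connection_string:
        # pytest reports unittest.SkipTest as a skip, like pytest.skip().
        raise unittest.SkipTest("CRATEDB_CONNECTION_STRING environment variable not set")
    return connection_string
```

Inside the fixture, `connection_string = require_connection_string()` would replace the inline check.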
28-34: 🛠️ Refactor suggestion

Implement safer table resetting with schema qualification.

The current DROP TABLE statements do not include schema qualification, which could potentially affect tables with the same name in other schemas. Additionally, there's no transaction management or connection closing.
    reset_tables = [
        "machine_manuals",
        "motor_readings",
    ]

-    for table in reset_tables:
-        connection.execute(sa.text(f"DROP TABLE IF EXISTS {table};"))
+    try:
+        # Use the schema from the connection string; CrateDB falls back to "doc".
+        schema = engine.url.query.get("schema", "doc")
+        for table in reset_tables:
+            connection.execute(sa.text(f"DROP TABLE IF EXISTS {schema}.{table};"))
+    finally:
+        connection.close()
+        engine.dispose()
Committable suggestion skipped: line range outside the PR's diff.
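Reading the schema from the connection string, instead of hard-coding it, can be sketched with the standard library alone. `schema_from_url` and `drop_statements` are hypothetical helpers; "doc" is CrateDB's default schema, and the table names are the two from the fixture above.

```python
from typing import Iterable, List
from urllib.parse import parse_qs, urlsplit

# CrateDB's default schema when the connection string does not name one.
DEFAULT_SCHEMA = "doc"


def schema_from_url(connection_string: str, default: str = DEFAULT_SCHEMA) -> str:
    """Read the `schema` query parameter, e.g. `?schema=notebook`."""
    query = parse_qs(urlsplit(connection_string).query)
    return query.get("schema", [default])[0]


def drop_statements(connection_string: str, tables: Iterable[str]) -> List[str]:
    """Build schema-qualified DROP TABLE statements with quoted identifiers."""
    schema = schema_from_url(connection_string)
    return [f'DROP TABLE IF EXISTS "{schema}"."{table}";' for table in tables]
```

Each statement would then be passed to `connection.execute(sa.text(...))`. Quoting makes the identifiers case-sensitive, which is harmless here because all names are lowercase.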
.github/workflows/ml-chatbot.yml (2)
30-33: 🛠️ Refactor suggestion

Consider a folded block scalar for the multi-line job name.
A multi-line double-quoted string is valid YAML (line breaks inside a quoted scalar fold into spaces), so it should parse; a folded block scalar is nevertheless a clearer way to write it and, unlike `|`, does not keep the line breaks in the job name. For example:
name: >-
  Python: ${{ matrix.python-version }}
  CrateDB: ${{ matrix.cratedb-version }}
  OS: ${{ matrix.os }}
73-76: 🛠️ Refactor suggestion

Install workshop dependencies alongside root requirements.
Currently only the root requirements.txt is installed, but notebook tests depend on additional packages in topic/chatbot/.../app/requirements.txt and .../workshop/requirements-dev.txt. For example:
- pip install -r requirements.txt
+ pip install -r requirements.txt \
+     -r topic/chatbot/table-augmented-generation/app/requirements.txt \
+     -r topic/chatbot/table-augmented-generation/workshop/requirements-dev.txt
📝 Committable suggestion

      - name: Install utilities
        run: |
          pip install -r requirements.txt \
              -r topic/chatbot/table-augmented-generation/app/requirements.txt \
              -r topic/chatbot/table-augmented-generation/workshop/requirements-dev.txt

wierdvanderhaar

Looks good!

amotl force-pushed the chatbot-workshop-tests branch from ddedda2 to ceedf8d on April 25, 2025 12:14

coderabbitai bot reviewed Apr 25, 2025

amotl requested a review from wierdvanderhaar April 25, 2025 12:28

amotl marked this pull request as ready for review April 25, 2025 12:28

amotl requested review from kneth and surister April 25, 2025 12:51

Chatbot: Add software tests for DIS2025 workshop notebook

8718e98

amotl force-pushed the chatbot-workshop-tests branch from ceedf8d to 8718e98 on April 25, 2025 12:56

wierdvanderhaar approved these changes Apr 25, 2025

amotl merged commit 67b0f5a into main Apr 25, 2025
3 checks passed

amotl deleted the chatbot-workshop-tests branch April 25, 2025 13:23
