-
Notifications
You must be signed in to change notification settings - Fork 9
Chatbot: Add software tests for workshop notebook #920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis update introduces automated testing infrastructure and supporting files for a chatbot-related project. A new GitHub Actions workflow is added to automate validation and testing, including scheduled and manual triggers, with a matrix covering multiple Python and CrateDB versions. Pytest configuration and supporting fixtures are implemented to enable robust notebook-based testing, including database table resets for test isolation. Additional test dependencies are specified, and a test function is added to execute Jupyter Notebooks as test cases. Changes
Sequence Diagram(s)sequenceDiagram
participant GitHub
participant Workflow
participant Runner
participant CrateDB
participant Pytest
GitHub->>Workflow: Trigger (push, PR, schedule, manual)
Workflow->>Runner: Start job (matrix: Python/CrateDB)
Runner->>CrateDB: Start service container
Runner->>Runner: Set up Python, install dependencies
Runner->>Pytest: Run tests (ngr test)
Pytest->>Runner: Load conftest.py, pyproject.toml
Pytest->>CrateDB: Reset tables before each test
Pytest->>Runner: Discover and execute notebook tests
Pytest->>CrateDB: Interact with database as needed
Pytest->>Runner: Report results
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (5)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (4)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
ddedda2
to
ceedf8d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 5
🔭 Outside diff range comments (1)
topic/chatbot/table-augmented-generation/workshop/requirements-dev.txt (1)
1-2
:⚠️ Potential issueMissing required dependency for testing infrastructure.
The tests depend on SQLAlchemy as shown in the conftest.py file, but it's not included in the development requirements. This is causing the pipeline failure with
ModuleNotFoundError: No module named 'sqlalchemy'
.Add SQLAlchemy to the requirements:
pueblo[notebook,testing]>=0.0.10 +sqlalchemy
🧹 Nitpick comments (4)
topic/chatbot/table-augmented-generation/workshop/pyproject.toml (1)
7-8
: Consider making the connection string configurable with a default.The hardcoded connection string assumes a specific database configuration which may not be available in all environments. This could make the tests brittle.
Consider modifying to allow for configuration override:
env = [ - "CRATEDB_CONNECTION_STRING=crate://crate@localhost/?schema=notebook", + "CRATEDB_CONNECTION_STRING=${CRATEDB_CONNECTION_STRING:-crate://crate@localhost/?schema=notebook}", "PYDEVD_DISABLE_FILE_VALIDATION=1", ].github/workflows/ml-chatbot.yml (3)
3-15
: Refine path filters: remove leading slash in file patterns.
GitHub Actions treats paths as relative to the repository root. Leading slashes can be misleading and may not behave as expected. For consistency, update:- - '/requirements.txt' + - 'requirements.txt'
23-27
: Consider improving concurrency grouping for pull requests.
Using${{ github.ref }}
groups runs by Git reference, which for PRs becomesrefs/pull/.../merge
. To group runs by the source branch (and avoid conflating PR runs), you may want:concurrency: cancel-in-progress: true group: ${{ github.workflow }}-${{ github.head_ref || github.ref }}
45-53
: Add a health check for the CrateDB service.
Without a health check, the job may attempt to connect before CrateDB is ready. You can leverage Docker health options:services: cratedb: image: crate/crate:${{ matrix.cratedb-version }} ports: - 4200:4200 - 5432:5432 options: >- --health-cmd "curl -sSf http://localhost:4200/_status || exit 1" --health-interval 10s --health-retries 30 env: CRATE_HEAP_SIZE: 4g
🛑 Comments failed to post (5)
topic/chatbot/table-augmented-generation/workshop/pyproject.toml (1)
14-16: 💡 Verification agent
🧩 Analysis chain
Potentially restrictive test path configuration.
The current testpaths setting only includes Python files in the current directory. This works for the current setup but may be too restrictive as the test suite grows.
🏁 Script executed:
#!/bin/bash # Check if there are any test files in subdirectories that would be missed find . -type f -name "test_*.py" | grep -v "^\./test_.*\.py$" || echo "No test files in subdirectories found"Length of output: 384
Fix test discovery in pyproject.toml
Your current
testpaths
only picks up top-level*.py
files, but we have tests living in subdirectories. You should either remove thetestpaths
setting (so pytest’s default discovery kicks in) or explicitly include your subdirs.• File:
topic/chatbot/table-augmented-generation/workshop/pyproject.toml
Lines: 14–16Suggested change (choose one):
Option A: Let pytest auto-discover all tests
-[tool.pytest.ini_options] -testpaths = [ - "*.py", -] +[tool.pytest.ini_options] +# Remove `testpaths` to allow pytest’s default discoveryOption B: Explicitly include your test directories
-[tool.pytest.ini_options] -testpaths = [ - "*.py", -] +[tool.pytest.ini_options] +testpaths = [ + "application/cratedb-toolkit", + "by-dataframe/pandas", + "testing", +]topic/chatbot/table-augmented-generation/workshop/conftest.py (2)
23-26: 🛠️ Refactor suggestion
Add error handling for missing connection string.
The code assumes CRATEDB_CONNECTION_STRING environment variable is always set, but doesn't handle the case when it's missing. If the variable is not set, this will cause a cryptic error.
- connection_string = os.environ.get("CRATEDB_CONNECTION_STRING") + connection_string = os.environ.get("CRATEDB_CONNECTION_STRING") + if not connection_string: + pytest.skip("CRATEDB_CONNECTION_STRING environment variable not set")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.connection_string = os.environ.get("CRATEDB_CONNECTION_STRING") if not connection_string: pytest.skip("CRATEDB_CONNECTION_STRING environment variable not set") engine = sa.create_engine(connection_string, echo=os.environ.get("DEBUG")) connection = engine.connect()
28-34: 🛠️ Refactor suggestion
Implement safer table resetting with schema qualification.
The current DROP TABLE statements do not include schema qualification, which could potentially affect tables with the same name in other schemas. Additionally, there's no transaction management or connection closing.
reset_tables = [ "machine_manuals", "motor_readings", ] - for table in reset_tables: - connection.execute(sa.text(f"DROP TABLE IF EXISTS {table};")) + try: + # Extract schema from connection string + schema = "notebook" # Default schema from connection string + for table in reset_tables: + connection.execute(sa.text(f"DROP TABLE IF EXISTS {schema}.{table};")) + finally: + connection.close() + engine.dispose()Committable suggestion skipped: line range outside the PR's diff.
.github/workflows/ml-chatbot.yml (2)
30-33: 🛠️ Refactor suggestion
Fix job
name
multiline quoting.
The current multi-line string with leading quotes is invalid YAML and can cause a parse error. Instead, use a block scalar or a single-line string. For example:name: | Python: ${{ matrix.python-version }} CrateDB: ${{ matrix.cratedb-version }} OS: ${{ matrix.os }}
73-76: 🛠️ Refactor suggestion
Install workshop dependencies alongside root requirements.
Currently only the rootrequirements.txt
is installed, but notebook tests depend on additional packages intopic/chatbot/.../app/requirements.txt
and.../workshop/requirements-dev.txt
. For example:- pip install -r requirements.txt + pip install -r requirements.txt \ + -r topic/chatbot/table-augmented-generation/app/requirements.txt \ + -r topic/chatbot/table-augmented-generation/workshop/requirements-dev.txt📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.- name: Install utilities run: | pip install -r requirements.txt \ -r topic/chatbot/table-augmented-generation/app/requirements.txt \ -r topic/chatbot/table-augmented-generation/workshop/requirements-dev.txt
ceedf8d
to
8718e98
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Accompanying GH-912, GH-916, and GH-919.