LLM Surviving Mutant Analysis (#4)

jungs1 · web-flow · commit effff962521f · 2024-07-15T16:27:24.000-04:00
* feat: add LLM surviving mutant analysis report

* feat: update version
diff --git a/README.md b/README.md
@@ -13,14 +13,12 @@
 
 ## Table of Contents
 
-- [Overview](#overview)
 - [Features](#features)
+- [Recommended Mutation Testing Process](#recommended-mutation-testing-process)
 - [Getting Started](#getting-started)
+- [LLM Surviving Mutant Analysis Report](#mutant-report)
+- [Examples](#examples)
 - [CI/CD Integration](#cicd-integration)
-- [Roadmap](#roadmap)
-- [Cash Bounty Program](#cash-bounty-program)
-
-## Overview
 
 Mutahunter uses LLM models to inject context-aware faults into your codebase. This AI-driven approach produces fewer equivalent mutants, mutants with higher fault detection potential, and those with higher coupling and semantic similarity to real faults, ensuring comprehensive and effective testing.
 
@@ -30,12 +28,14 @@ Mutahunter uses LLM models to inject context-aware faults into your codebase. Th
 - **LLM Context-aware Mutations:** Utilizes LLM models to generate context-aware mutants. [Research](https://arxiv.org/abs/2406.09843) indicates that LLM-generated mutants have higher fault detection potential, fewer equivalent mutants, and higher coupling and semantic similarity to real faults. It uses a map of your entire git repository to generate contextually relevant mutants using [aider's repomap](https://aider.chat/docs/repomap.html). Supports self-hosted LLMs, Anthropic, OpenAI, and any LLM models via [LiteLLM](https://github.com/BerriAI/litellm).
 - **Change-Based Testing:** Runs mutation tests on modified files and lines based on the latest commit or pull request changes, ensuring that only relevant parts of the code are tested.
 - **Language Agnostic:** Compatible with languages that provide coverage reports in Cobertura XML, Jacoco XML, and lcov formats. Extensible to additional languages and testing frameworks.
-- **Detailed Mutant Reports:** Provides comprehensive reports on mutation coverage, killed mutants, and survived mutants.
+- **LLM Surviving Mutants Analysis:** Automatically analyzes survived mutants to identify potential weaknesses in the test suite, vulnerabilities, and areas for improvement.
 
 ## Recommended Mutation Testing Process
 
 ![Workflow](/images/diagram.svg)
 
+We recommend running Mutahunter per test file. This approach ensures that the mutation testing is focused on the test suite's effectiveness and efficiency. Here are some best practices to follow:
+
 1. **Achieve High Line Coverage:** Ensure your test suite has high line coverage, preferably 100%.
 
 2. **Strict Mutation Testing:** Use strict mutation testing during development to improve mutation coverage without additional cost. Utilize the `--only-mutate-file-paths` flag for targeted testing on critical files.
@@ -97,6 +97,11 @@ Check the logs directory to view the report:
 
 - `mutants.json` - Contains the list of mutants generated.
 - `coverage.txt` - Contains information about mutation coverage.
+- `mutant_analysis.md` - Contains the LLM analysis of survived mutants.
+
+### Surviving Mutant Analysis Audit Report
+
+![Report](/images/audit.png)
 
 ## CI/CD Integration
 
@@ -154,26 +159,6 @@ jobs:
           filePath: logs/_latest/coverage.txt
 ```
 
-## Roadmap
-
-- [x] **Fault Injection:** Utilize advanced LLM models to inject context-aware faults into the codebase, ensuring comprehensive mutation testing.
-- [x] **Language Support:** Expand support to include various programming languages.
-- [x] **Support for Other Coverage Report Formats:** Add compatibility for various coverage report formats.
-- [x] **Change-Based Testing:** Implement mutation testing on modified files based on the latest commit or pull request changes.
-- [x] **Extreme Mutation Testing:** Apply mutations to the codebase without using LLMs to detect pseudo-tested methods with significantly lower computational cost.
-- [x] **CI/CD Integration:** Display mutation coverage in pull requests and automate mutation testing using GitHub Actions.
-- [ ] **Mutant Analysis:** Automatically analyze survived mutants to identify potential weaknesses in the test suite.
-
 ## Cash Bounty Program
 
 Help us improve Mutahunter and get rewarded! We have a cash bounty program to incentivize contributions to the project. Check out the [bounty board](https://docs.google.com/spreadsheets/d/1cT2_O55m5txrUgZV81g1gtqE_ZDu9LlzgbpNa_HIisc/edit?gid=0#gid=0) to see the available bounties and claim one today!
-
-## Acknowledgements
-
-Mutahunter makes use of the following open-source libraries:
-
-- [aider](https://github.com/paul-gauthier/aider) by Paul Gauthier, licensed under the Apache-2.0 license.
-- [TreeSitter](https://github.com/tree-sitter/tree-sitter) by TreeSitter, MIT License.
-- [LiteLLM](https://github.com/BerriAI/litellm) by BerriAI, MIT License.
-
-For more details, please refer to the LICENSE file in the repository.
diff --git a/images/audit.png b/images/audit.png
diff --git a/pyproject.toml b/pyproject.toml
@@ -6,7 +6,7 @@ build-backend = "setuptools.build_meta"
 name = 'mutahunter'
 description = "LLM Mutation Testing for any programming language"
 requires-python = ">= 3.11"
-version = "1.0.0"
+version = "1.1.0"
 dependencies = [
     "tree-sitter==0.21.3",
     'tree_sitter_languages==1.10.2',
diff --git a/src/mutahunter/core/mutator.py b/src/mutahunter/core/mutator.py
@@ -4,6 +4,7 @@
 import time
 from typing import Any, Dict, List
 
+from jinja2 import Template
 from tqdm import tqdm
 
 from mutahunter.core.analyzer import Analyzer
@@ -13,6 +14,7 @@
 from mutahunter.core.error_parser import extract_error_message
 from mutahunter.core.llm_mutation_engine import LLMMutationEngine
 from mutahunter.core.logger import logger
+from mutahunter.core.prompts.user import MUTANT_ANALYSIS
 from mutahunter.core.report import MutantReport
 from mutahunter.core.router import LLMRouter
 from mutahunter.core.runner import TestRunner
@@ -149,6 +151,9 @@ def run(self) -> None:
                 total_cost=self.router.total_cost,
                 line_rate=self.coverage_processor.line_coverage_rate,
             )
+            if not self.config.extreme:
+                self.run_mutant_analysis()
+
             self.logger.info(
                 f"Mutation Testing Ended. Took {round(time.time() - start)}s"
             )
@@ -456,3 +461,35 @@ def get_modified_lines(self, file_path: str) -> List[int]:
         except subprocess.CalledProcessError as e:
             self.logger.error(f"Error identifying modified lines in {file_path}: {e}")
             return []
+
+    def run_mutant_analysis(self) -> None:
+        """
+        Asks the LLM to analyze the surviving mutants and writes the report to the logs directory.
+        """
+        survived_mutants = [m for m in self.mutants if m.status == "SURVIVED"]
+
+        source_file_paths = []
+        for mutant in survived_mutants:
+            if mutant.source_path not in source_file_paths:
+                source_file_paths.append(mutant.source_path)
+
+        src_code_list = []
+        for file_path in source_file_paths:
+            with open(file_path, "r", encoding="utf-8") as f:
+                src_code = f.read()
+            src_code_list.append(
+                f"## Source File: {file_path}\n```{filename_to_lang(file_path)}\n{src_code}\n```"
+            )
+
+        prompt = {
+            "system": "",
+            "user": Template(MUTANT_ANALYSIS).render(
+                source_code="\n".join(src_code_list),
+                surviving_mutants=survived_mutants,
+            ),
+        }
+        model_response, _, _ = self.router.generate_response(
+            prompt=prompt, streaming=False
+        )
+        with open("logs/_latest/mutant_analysis.md", "w", encoding="utf-8") as f:
+            f.write(model_response)
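
Review note on the survivor handling in `run_mutant_analysis`: the method passes the mutant objects straight into a template slot that sits inside a `json` fence, and it deduplicates paths with a manual loop. The sketch below shows one possible alternative, not the project's implementation. The `Mutant` dataclass is a hypothetical stand-in (only `source_path` and `status` appear in the diff; `id` and `diff` are assumed fields), and the three helper names are invented for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Mutant:
    """Hypothetical stand-in; only source_path and status appear in the diff."""

    id: str
    source_path: str
    status: str
    diff: str = ""


def select_survivors(mutants: List[Mutant]) -> List[Mutant]:
    """Keep only the mutants that the test suite failed to kill."""
    return [m for m in mutants if m.status == "SURVIVED"]


def unique_source_paths(survivors: List[Mutant]) -> List[str]:
    """Deduplicate source paths while preserving first-seen order."""
    return list(dict.fromkeys(m.source_path for m in survivors))


def survivors_as_json(survivors: List[Mutant]) -> str:
    """Serialize survivors so the prompt's json fence holds valid JSON."""
    return json.dumps([asdict(m) for m in survivors], indent=2)


if __name__ == "__main__":
    mutants = [
        Mutant("1", "src/a.py", "SURVIVED"),
        Mutant("2", "src/a.py", "KILLED"),
        Mutant("3", "src/b.py", "SURVIVED"),
        Mutant("4", "src/a.py", "SURVIVED"),
    ]
    survivors = select_survivors(mutants)
    print(unique_source_paths(survivors))
    print(survivors_as_json(survivors))
```

An empty result from `select_survivors` could also serve as a signal to skip the LLM call; as written, the method sends a request even when no mutant survived.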
diff --git a/src/mutahunter/core/prompts/user.py b/src/mutahunter/core/prompts/user.py
@@ -43,3 +43,45 @@ class Mutants(BaseModel):
 
 Generate 1~{{maximum_num_of_mutants_per_function_block}} mutants for the function block provided to you. Ensure that the mutants are semantically different from the original code. Focus on critical areas such as error handling, boundary conditions, and logical branches.
 """
+
+
+MUTANT_ANALYSIS = """
+## Related Source Code:
+```
+{{source_code}}
+```
+
+## Surviving Mutants:
+```json
+{{surviving_mutants}}
+```
+
+The mutation testing results above show only the surviving mutants. Based on them, analyze the following aspects:
+- Vulnerable Code Areas: Identification of critical areas in the code that are most vulnerable based on the surviving mutants.
+- Test Case Gaps: Analysis of specific reasons why the existing test cases failed to detect these mutants.
+- Improvement Recommendations: Suggestions for new or improved test cases to effectively target and eliminate the surviving mutants.
+
+## Example Output:
+======
+### Vulnerable Code Areas
+**File:** `src/main/java/com/example/BankAccount.java`
+**Location:** Line 45
+**Description:** The method `withdraw` does not properly handle negative inputs, leading to potential incorrect account balances.
+
+### Test Case Gaps
+**File:** `src/test/java/com/example/BankAccountTest.java`
+**Location:** Test method `testWithdraw`
+**Reason:** Existing test cases do not cover edge cases such as negative withdrawals or withdrawals greater than the account balance.
+
+### Improvement Recommendations
+**New Test Cases Needed:**
+1. **Test Method:** `testWithdrawNegativeAmount`
+   - **Description:** Add a test case to check behavior when attempting to withdraw a negative amount.
+2. **Test Method:** `testWithdrawExceedingBalance`
+   - **Description:** Add a test case to ensure proper handling when withdrawing an amount greater than the current account balance.
+3. **Test Method:** `testWithdrawBoundaryConditions`
+   - **Description:** Add boundary condition tests for withdrawal amounts exactly at zero and exactly equal to the account balance.
+======
+
+To reduce cognitive load, focus on quality over quantity so that the mutant analysis is meaningful and provides valuable insights into code quality and test coverage. Output your analysis clearly and concisely, highlighting the key points for each aspect in fewer than 300 words.
+"""
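
Review note on the template's fencing: `{{source_code}}` is wrapped in a plain triple-backtick fence, while `run_mutant_analysis` already fences each file before joining them, so the outer fence would likely close at the first inner one. Below is a minimal sketch of one way around this, assuming CommonMark-style fences in which a longer backtick run can enclose shorter ones. `fence_for` and `wrap_in_fence` are hypothetical helper names, not part of Mutahunter.

```python
import re


def fence_for(content: str, minimum: int = 3) -> str:
    """Return a backtick fence longer than any backtick run in content."""
    longest = max((len(run) for run in re.findall(r"`+", content)), default=0)
    return "`" * max(minimum, longest + 1)


def wrap_in_fence(content: str, lang: str = "") -> str:
    """Wrap content in a fence that its own backticks cannot close early."""
    fence = fence_for(content)
    return f"{fence}{lang}\n{content}\n{fence}"


if __name__ == "__main__":
    # The inner block mimics one per-file section; the outer wrap picks a longer fence.
    per_file = wrap_in_fence("print('hi')", "python")
    print(wrap_in_fence(per_file))
```

A simpler option would be to drop the outer fence from the template, since each file section already carries its own.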