📝 Walkthrough

Activate a Tree-sitter–based C analyzer (`CAnalyzer`), enable discovery of `.c`/`.h` sources, and convert C analyzer tests to unit-level AST checks. The analyzer extracts functions, structs, includes, and basic symbols, and integrates with the existing analysis second-pass flow using `NullLanguageServer` placeholders.
```mermaid
sequenceDiagram
  participant Tester
  participant CAnalyzer
  participant Parser as Tree-sitter Parser
  participant Filesystem
  participant LSP as LanguageServer
  Tester->>CAnalyzer: instantiate & analyze(file)
  CAnalyzer->>Filesystem: read .c/.h file(s)
  Filesystem-->>CAnalyzer: file contents
  CAnalyzer->>Parser: parse contents
  Parser-->>CAnalyzer: AST
  rect rgba(100,150,200,0.5)
    Note over CAnalyzer,Parser: Single-pass AST extraction
    CAnalyzer->>CAnalyzer: extract entities (functions, structs), docstrings, include paths
    CAnalyzer->>CAnalyzer: populate symbols (call, parameters, return_type)
  end
  CAnalyzer->>LSP: use NullLanguageServer or real LSP for resolution
  LSP-->>CAnalyzer: resolution responses (or noop)
  CAnalyzer-->>Tester: entities & symbols returned
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ❌ Failed checks (1 warning) | ✅ Passed checks (4 passed)
Pull request overview
Adds C #include directive handling intended to create INCLUDES edges between files in the graph, along with a unit test update to validate the relationship.
Changes:
- Replaced the (previously commented) C analyzer implementation with a concrete analyzer that attempts to parse C AST and create graph nodes/edges.
- Added `process_include_directive` and wired it into the C analyzer "first pass".
- Updated `tests/test_c_analyzer.py` to assert `INCLUDES` edges for an included header.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| `api/analyzers/c/analyzer.py` | Introduces a C analyzer implementation that attempts to create File/Function/Struct nodes and INCLUDES edges from `#include`. |
| `tests/test_c_analyzer.py` | Adds assertions for INCLUDES neighbors and updates the test to call an `analyze(...)` method. |
api/analyzers/c/analyzer.py (Outdated)

```python
def process_function_definition_node(self, node: Node, path: Path,
                                     source_code: str) -> Optional[Function]:
    """
    Processes a function definition node to extract function details.

    Args:
        node (Node): The AST node representing a function definition.
        path (Path): The file path where the function is defined.

    Returns:
        Optional[Function]: A Function object containing details about the function, or None if the function name cannot be determined.
    """

    # Extract function name
    res = find_child_of_type(node, 'function_declarator')
    if res is None:
        return None

    function_declarator = res[0]

    res = find_child_of_type(function_declarator, 'identifier')
    if res is None:
        return None

    identifier = res[0]
    function_name = identifier.text.decode('utf-8')
    logger.info(f"Function declaration: {function_name}")

    # Extract function return type
    res = find_child_of_type(node, 'primitive_type')
    ret_type = 'Unknown'
    if res is not None:
        ret_type = res[0]
        ret_type = ret_type.text.decode('utf-8')

    # Extract function parameters
    args = []
    res = find_child_of_type(function_declarator, 'parameter_list')
    if res is not None:
        parameters = res[0]

        # Extract arguments and their types
        for child in parameters.children:
            if child.type == 'parameter_declaration':
                arg = self.process_parameter_declaration(child)
                args.append(arg)

    # Extract function definition line numbers
    start_line = node.start_point[0]
    end_line = node.end_point[0]

    # Create Function object
    docs = ''
    src = source_code[node.start_byte:node.end_byte]
    f = Function(str(path), function_name, docs, ret_type, src, start_line, end_line)
```
Fixed in f4bbb1f. The rewrite removed all references to Function/Struct types. The analyzer now uses the generic Entity class consistent with other analyzers.
api/analyzers/c/analyzer.py (Outdated)

```python
# Create file entity
file = File(os.path.dirname(path), path.name, path.suffix)
graph.add_file(file)

# Parse file
source_code = f.read()
tree = self.parser.parse(source_code)
```
Fixed in f4bbb1f. CAnalyzer no longer creates File objects — file creation is handled by SourceAnalyzer.first_pass() using File(path, tree).
tests/test_c_analyzer.py (Outdated)

```python
# Test for include_directive edge creation
included_file = g.get_file('', 'myheader.h', '.h')
self.assertIsNotNone(included_file)

includes = g.get_neighbors([f.id], rel='INCLUDES')
self.assertEqual(len(includes), 3)
included_files = [node['properties']['name'] for node in includes['nodes']]
self.assertIn('myheader.h', included_files)
```
Fixed in f4bbb1f. Tests were completely rewritten as standalone unit tests that don't use Graph or FalkorDB. Include extraction is tested via get_include_paths() which returns a simple list of strings.
api/analyzers/c/analyzer.py (Outdated)

```python
if entity is not None:
    # Add Function object to the graph
    try:
        graph.add_function(entity)
```
Fixed in f4bbb1f. The rewrite removed all add_function and add_struct calls. Entity creation is now handled by SourceAnalyzer using graph.add_entity() + connect_entities('DEFINES', ...), consistent with other analyzers.
api/analyzers/c/analyzer.py (Outdated)

```python
# Create missing function
# Assuming this is a call to a native function e.g. 'printf'
callee_f = Function('/', callee_name, None, None, None, 0, 0)
graph.add_function(callee_f)
```
Fixed in f4bbb1f. All custom process_* methods and fallback Function creation were removed in the rewrite.
api/analyzers/c/analyzer.py (Outdated)

```python
source_code = f.read()
tree = self.parser.parse(source_code)
try:
    source_code = source_code.decode('utf-8')
except Exception as e:
    logger.error(f"Failed decoding source code: {e}")
    source_code = ''
```
Fixed in f4bbb1f. The rewrite delegates file parsing to SourceAnalyzer.first_pass() which uses file_path.read_bytes() → analyzer.parser.parse(source_code), consistent with all other analyzers.
api/analyzers/c/analyzer.py (Outdated)

```python
if len(splitted) < 2:
    logger.warning("Include path has no extension: %s", included_file_path)
    return

# Create file entity for the included file
path = os.path.dirname(normalized_path)
name = os.path.basename(normalized_path)
ext = splitted[1]
included_file = File(path, name, ext)
```
Fixed in f4bbb1f. The old process_include_directive with os.path.splitext logic was removed. Include extraction is now handled by get_include_paths() using tree-sitter queries via self._captures().
api/analyzers/c/analyzer.py (Outdated)

```python
# Create file entity for the included file
path = os.path.dirname(normalized_path)
name = os.path.basename(normalized_path)
ext = splitted[1]
included_file = File(path, name, ext)
graph.add_file(included_file)

# Connect the parent file to the included file
graph.connect_entities('INCLUDES', parent.id, included_file.id)
```
Fixed in f4bbb1f. The rewrite no longer creates File objects for included files. get_include_paths() returns raw path strings from #include directives.
tests/test_c_analyzer.py (Outdated)

```python
g = Graph("c")
analyzer.analyze_local_folder(path, g)
analyzer.analyze(path, g)
```
Fixed in f4bbb1f. Tests were rewritten as standalone unit tests using CAnalyzer directly. No SourceAnalyzer or Graph dependency needed.
api/analyzers/c/analyzer.py (Outdated)

```python
def __init__(self) -> None:
    self.parser = Parser(C_LANGUAGE)
```
Fixed in f4bbb1f. CAnalyzer now calls super().__init__(Language(tsc.language())) and implements all required abstract methods: get_entity_label, get_entity_name, get_entity_docstring, get_entity_types, add_symbols, is_dependency, resolve_path, resolve_type, resolve_method, resolve_symbol, add_dependencies.
Actionable comments posted: 3
♻️ Duplicate comments (3)
api/analyzers/c/analyzer.py (3)
19-20: ⚠️ Potential issue | 🟠 Major — Call the base initializer.

Skipping `AbstractAnalyzer.__init__` leaves `self.language` unset and bypasses the shared analyzer setup.

♻️ Proposed fix

```diff
 class CAnalyzer(AbstractAnalyzer):
     def __init__(self) -> None:
-        self.parser = Parser(C_LANGUAGE)
+        super().__init__(C_LANGUAGE)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/analyzers/c/analyzer.py` around lines 19 - 20, The C analyzer's __init__ is not calling the base class initializer, so shared setup (like self.language) is skipped; update the C analyzer __init__ to call AbstractAnalyzer.__init__ (or super().__init__()) before setting self.parser = Parser(C_LANGUAGE), or explicitly set self.language = C_LANGUAGE if the base init requires parameters—ensure you invoke the base initializer (super().__init__()) in the __init__ method where Parser(C_LANGUAGE) is currently assigned.
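As a runnable sketch of the fix this diff proposes, with hypothetical stand-in classes (the real `AbstractAnalyzer` takes a tree-sitter `Language` object, not a string):

```python
# Minimal stand-ins; only the super().__init__() call pattern mirrors the fix.
class FakeParser:
    def __init__(self, language):
        self.language = language

class AbstractAnalyzer:
    def __init__(self, language):
        # Shared setup the old CAnalyzer.__init__ skipped entirely.
        self.language = language
        self.parser = FakeParser(language)

class CAnalyzer(AbstractAnalyzer):
    def __init__(self):
        # Instead of self.parser = Parser(C_LANGUAGE), delegate to the base class.
        super().__init__("c")

a = CAnalyzer()
print(a.language, a.parser.language)  # → c c
```

With the delegation in place, `self.language` and any other base-class state are guaranteed to exist before the second pass runs.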
394-405: ⚠️ Potential issue | 🔴 Critical — Parse first, then construct `File(path, tree)` from bytes.

`File` is created before the tree exists, and `f` is typed as `io.TextIOWrapper`, so `f.read()` returns `str`. Line 402 then always falls into the exception path because `str.decode()` does not exist, which wipes every extracted function body.

♻️ Proposed fix

```diff
-# Create file entity
-file = File(os.path.dirname(path), path.name, path.suffix)
-graph.add_file(file)
-
-# Parse file
-source_code = f.read()
-tree = self.parser.parse(source_code)
-try:
-    source_code = source_code.decode('utf-8')
-except Exception as e:
-    logger.error(f"Failed decoding source code: {e}")
-    source_code = ''
+source_bytes = f.buffer.read()
+tree = self.parser.parse(source_bytes)
+source_code = source_bytes.decode('utf-8', errors='replace')
+
+# Create file entity
+file = File(path, tree)
+graph.add_file(file)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/analyzers/c/analyzer.py` around lines 394 - 405, The code creates a File object before parsing and incorrectly calls str.decode(), which always raises for io.TextIOWrapper reads and clears source_code; fix by first reading and parsing the source (use self.parser.parse on the string returned by f.read()), then construct and add the File entity (File(...)) afterwards using the original path and parsed tree as needed, and remove the decode() call — handle bytes only if f is opened in binary mode (decode then), otherwise treat f.read() as str and pass it directly to parser.parse and downstream logic (refer to File, graph.add_file, self.parser.parse, and source_code variables).
3-5: ⚠️ Potential issue | 🟠 Major — Replace the wildcard imports.

This file now trips Ruff F403/F405, and the `*` imports hide the actual symbols the analyzer depends on.

♻️ Proposed fix

```diff
-from ..utils import *
+from ..utils import find_child_of_type
 from pathlib import Path
-from ...entities import *
+from ...entities import File, Function, Struct
 from ...graph import Graph
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/analyzers/c/analyzer.py` around lines 3 - 5, Replace the wildcard imports in analyzer.py: remove "from ..utils import *" and "from ...entities import *" and instead import only the specific functions/classes the module uses (inspect usages of symbols in analyzer.py such as any utility helpers or entity classes referenced) and list them explicitly (e.g., from ..utils import func_a, helper_b; from ...entities import EntityX, EntityY). Ensure any renamed or aliased imports match existing references in functions like the analyzer class or helper functions, and update any __all__ or re-exports if present.
🧹 Nitpick comments (1)
tests/test_c_analyzer.py (1)
63-70: Please add the new coverage in pytest style.

This extends a `unittest.TestCase` suite, but backend tests under `tests/` are expected to use pytest.

As per coding guidelines, "Backend tests should use pytest and be organized in tests/ directory".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_c_analyzer.py` around lines 63 - 70, Convert the unittest-style assertions in tests/test_c_analyzer.py into a pytest-style test function: create a standalone function (e.g., test_include_directive_edge_creation) that uses the existing fixture/object g, replace self.assertIsNotNone(included_file) with assert included_file is not None, replace self.assertEqual(len(includes), 3) with assert len(includes) == 3, and replace self.assertIn('myheader.h', included_files) with assert 'myheader.h' in included_files; remove any unittest.TestCase class wrapper and self usage so the test runs under pytest.
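The conversion this prompt describes can be sketched as a standalone pytest-style function; the fixture data below is a hypothetical stand-in for what `g.get_file()` and `g.get_neighbors()` would return in the real test:

```python
def test_include_directive_edge_creation():
    # Stand-in for g.get_file('', 'myheader.h', '.h')
    included_file = {"name": "myheader.h"}
    # Stand-in for g.get_neighbors([f.id], rel='INCLUDES')
    includes = {"nodes": [{"properties": {"name": n}}
                          for n in ("stdio.h", "stdlib.h", "myheader.h")]}

    # unittest -> pytest: assertIsNotNone / assertEqual / assertIn become bare asserts
    assert included_file is not None
    assert len(includes["nodes"]) == 3
    included_files = [node["properties"]["name"] for node in includes["nodes"]]
    assert "myheader.h" in included_files

test_include_directive_edge_creation()
print("ok")
```

No `unittest.TestCase` wrapper and no `self` are needed; pytest collects any top-level `test_*` function.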
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/analyzers/c/analyzer.py`:
- Around line 341-357: The code creates a synthetic File node from the raw
include text (using os.path.dirname(normalized_path)) instead of resolving the
include relative to the including file; change the logic to first resolve the
include against the including file's directory (e.g. resolved =
Path(parent_of_including_file) / normalized_path), then check the graph for an
existing File entity for that resolved path and reuse it (lookup via whatever
graph method exists, e.g. graph.get_file_by_path/find_file), and only create a
new File and call graph.add_file(File(...)) if no real entity is found; update
references to use included_file = existing_or_new_file so INCLUDES point to the
actual analyzed header rather than a top-level synthetic node.
- Around line 408-419: first_pass() and second_pass() incorrectly call
query.captures() on a Query (which doesn't exist) and treat the result as a
tuple; replace those calls with the existing helper or a QueryCursor: use
self._captures(tree, query) (the inherited helper that returns a dict keyed by
capture name) when invoking C_LANGUAGE.query("(function_definition) `@function`")
and iterate captures['function'] to call process_function_definition(file, node,
path, graph, source_code); in second_pass() also remove any indexing like [0] on
capture results and instead access the dict entries returned by _captures() or
use a QueryCursor created from the Query and call
cursor.captures(tree.root_node) before processing nodes.
In `@tests/test_c_analyzer.py`:
- Line 24: The test calls analyzer.analyze(path, g) but SourceAnalyzer does not
implement analyze(); either add a public analyze(self, path, g) wrapper on
SourceAnalyzer that delegates to the appropriate existing method (for example
call analyze_local_folder(path, g) or analyze_sources(path, g) based on input)
or update the test to call an existing method such as analyze_local_folder(path,
g); additionally ensure the analyzers dispatch map enables CAnalyzer by
uncommenting the ".c" and ".h" entries in the analyzers dict so C files are
actually handled when the wrapper or chosen method delegates to CAnalyzer.
---
Duplicate comments:
In `@api/analyzers/c/analyzer.py`:
- Around line 19-20: The C analyzer's __init__ is not calling the base class
initializer, so shared setup (like self.language) is skipped; update the C
analyzer __init__ to call AbstractAnalyzer.__init__ (or super().__init__())
before setting self.parser = Parser(C_LANGUAGE), or explicitly set self.language
= C_LANGUAGE if the base init requires parameters—ensure you invoke the base
initializer (super().__init__()) in the __init__ method where Parser(C_LANGUAGE)
is currently assigned.
- Around line 394-405: The code creates a File object before parsing and
incorrectly calls str.decode(), which always raises for io.TextIOWrapper reads
and clears source_code; fix by first reading and parsing the source (use
self.parser.parse on the string returned by f.read()), then construct and add
the File entity (File(...)) afterwards using the original path and parsed tree
as needed, and remove the decode() call — handle bytes only if f is opened in
binary mode (decode then), otherwise treat f.read() as str and pass it directly
to parser.parse and downstream logic (refer to File, graph.add_file,
self.parser.parse, and source_code variables).
- Around line 3-5: Replace the wildcard imports in analyzer.py: remove "from
..utils import *" and "from ...entities import *" and instead import only the
specific functions/classes the module uses (inspect usages of symbols in
analyzer.py such as any utility helpers or entity classes referenced) and list
them explicitly (e.g., from ..utils import func_a, helper_b; from ...entities
import EntityX, EntityY). Ensure any renamed or aliased imports match existing
references in functions like the analyzer class or helper functions, and update
any __all__ or re-exports if present.
---
Nitpick comments:
In `@tests/test_c_analyzer.py`:
- Around line 63-70: Convert the unittest-style assertions in
tests/test_c_analyzer.py into a pytest-style test function: create a standalone
function (e.g., test_include_directive_edge_creation) that uses the existing
fixture/object g, replace self.assertIsNotNone(included_file) with assert
included_file is not None, replace self.assertEqual(len(includes), 3) with
assert len(includes) == 3, and replace self.assertIn('myheader.h',
included_files) with assert 'myheader.h' in included_files; remove any
unittest.TestCase class wrapper and self usage so the test runs under pytest.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: db7fbc7c-059d-4357-be05-71bacb011725
📒 Files selected for processing (2)
- api/analyzers/c/analyzer.py
- tests/test_c_analyzer.py
996c299 to b8cdeb2
gkorland left a comment
This PR has been substantially rewritten to address all 17 review comments. The original implementation bypassed the AbstractAnalyzer interface entirely (used non-existent Function/Struct types, missing Graph.add_function/add_struct methods, wrong File constructor, text/bytes confusion, etc.).
What changed:
- Complete rewrite of `CAnalyzer`: Now properly extends `AbstractAnalyzer` with `super().__init__()` and implements all required abstract methods (`get_entity_label`, `get_entity_name`, `get_entity_docstring`, `get_entity_types`, `add_symbols`, `is_dependency`, `resolve_path`, `resolve_type`, `resolve_method`, `resolve_symbol`, `add_dependencies`).
- Include directive support: Added `get_include_paths()` helper method for extracting `#include` paths from parsed C files using tree-sitter queries via `self._captures()`.
- `source_analyzer.py` integration: Uncommented the C analyzer import, added `.c`/`.h` to the analyzers dict, added `NullLanguageServer` for C/H extensions in the second pass, and added `.c`/`.h` to the file globbing in `analyze_sources()`.
- Test rewrite: Rewrote `test_c_analyzer.py` as standalone unit tests that don't require FalkorDB. Tests cover entity extraction (functions, structs), labels, call symbols, include extraction, and `is_dependency`. Added `#include` directives to the test fixture `src.c`.
- Removed dead code: Removed all the old custom `process_*` methods, the `utils` import, and `Function`/`Struct` type usage.
All 8 tests pass, lint is clean.
b8cdeb2 to f4bbb1f
Actionable comments posted: 3
♻️ Duplicate comments (2)
api/analyzers/c/analyzer.py (1)
108-119: ⚠️ Potential issue | 🟠 Major — Don't erase whether an include was quoted or system.

`strip('"<>')` collapses `#include "foo.h"` and `#include <foo.h>` into the same `foo.h` token. That removes the information the graph layer needs to resolve quoted includes relative to the current file and to avoid binding system headers to an unrelated project header with the same basename. Return the normalized path together with the include kind, or keep the original node type.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@api/analyzers/c/analyzer.py` around lines 108 - 119, The get_include_paths method currently strips quotes/angle brackets (via path_text = node.text.decode(...).strip('"<>')) which loses whether an include was quoted or system; change get_include_paths to preserve the include kind by returning both the normalized path and its kind (e.g., ("foo.h", "quote") or ("foo.h", "system")) or by returning the original capture node type alongside the path; adjust the captures handling in get_include_paths and consumers to build includes as tuples (path, kind) or objects instead of plain strings so quoted includes can be resolved relative to the current file and system includes remain distinct (refer to get_include_paths, captures, and path_text for where to implement).

tests/test_c_analyzer.py (1)
82-85: ⚠️ Potential issue | 🟠 Major — The new tests still don't cover include relationships end to end.

This only checks raw extraction from `CAnalyzer`. It will still pass when `SourceAnalyzer` never creates any file-to-file include edge, so the PR's main behavior remains untested. Please add one integration test that runs through `SourceAnalyzer` and asserts the include relationship for this fixture.

Based on learnings, "Analyzer orchestration should be centralized in api/analyzers/source_analyzer.py".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_c_analyzer.py` around lines 82 - 85, The test only asserts raw includes from CAnalyzer but not the end-to-end include edge creation; add an integration test that uses SourceAnalyzer (the orchestration entrypoint in api/analyzers/source_analyzer.py) to analyze the fixture and then assert that the resulting graph/analysis contains a file-to-file include relationship (e.g., that the analyzed source file has an include edge to "myheader.h" or "stdio.h"); locate the existing test function test_include_extraction and add a new test (or extend it) that instantiates SourceAnalyzer (or calls its analyze/run method) on the same fixture and inspects SourceAnalyzer's output structure (graph, edges, or returned model) to assert the include edge exists.
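The `(path, kind)` return shape proposed for `get_include_paths` above can be sketched like this; the regex is a hypothetical stand-in for the analyzer's tree-sitter query, and the function name mirrors the review's suggestion rather than the project's actual code:

```python
import re
from typing import List, Tuple

# Matches both quoted and system include directives; group 1 captures
# "..." includes, group 2 captures <...> includes.
_INCLUDE_RE = re.compile(r'^\s*#\s*include\s+(?:"([^"]+)"|<([^>]+)>)', re.MULTILINE)

def get_include_paths(source: str) -> List[Tuple[str, str]]:
    """Return (path, kind) tuples so the include kind is never erased."""
    includes = []
    for match in _INCLUDE_RE.finditer(source):
        quoted, system = match.group(1), match.group(2)
        if quoted is not None:
            includes.append((quoted, "quote"))   # resolve relative to the including file
        else:
            includes.append((system, "system"))  # resolve against system include dirs
    return includes

src = '#include <stdio.h>\n#include "myheader.h"\nint main(void) { return 0; }\n'
print(get_include_paths(src))  # → [('stdio.h', 'system'), ('myheader.h', 'quote')]
```

Consumers can then resolve `"quote"` entries against the including file's directory and leave `"system"` entries unbound, exactly the distinction `strip('"<>')` destroys.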
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/analyzers/c/analyzer.py`:
- Around line 30-35: get_entity_name currently raises ValueError for unnamed
struct_specifier which aborts SourceAnalyzer.create_hierarchy; instead update
get_entity_name to not throw: if node.type == 'struct_specifier' and name_node
is missing, return a stable synthesized anonymous name (e.g.
"<anon_struct_{node.start_byte}>" or similar deterministic token derived from
the node) or return None/empty and let callers skip unnamed structs—do not
raise; locate get_entity_name in analyzer.py and replace the raise with the
synthetic-name logic (or a clear sentinel) and ensure callers like
SourceAnalyzer.create_hierarchy handle that sentinel by skipping anonymous
structs.
In `@api/analyzers/source_analyzer.py`:
- Around line 27-28: The analyzer registers '.c' and '.h' but never materializes
include edges; add a dedicated post-registration non-LSP include pass in
api/analyzers/source_analyzer.py that runs after all File nodes are created
(outside the NullLanguageServer branch) and for each file-type handled by
CAnalyzer call CAnalyzer.get_include_paths(file_path) to resolve headers and
then create graph edges from the source File node to the resolved header File
node(s); ensure this pass is invoked centrally in the analyzer orchestration
(where File nodes are finalized) so the C/header include relationships are added
to the graph.
In `@tests/test_c_analyzer.py`:
- Around line 63-80: Each test currently only asserts inside the loop and
silently passes if the target entity is missing; update test_function_label,
test_struct_label, and test_call_symbols to first check that the target entity
exists and fail if not found (e.g., set a found flag while iterating
self.file.entities and after the loop assert found), then perform the existing
assertions using _entity_name(self.analyzer, entity),
analyzer.get_entity_label(entity.node), and entity.symbols to ensure "add",
"exp", and "main" are actually present before checking labels or call symbols.
---
Duplicate comments:
In `@api/analyzers/c/analyzer.py`:
- Around line 108-119: The get_include_paths method currently strips
quotes/angle brackets (via path_text = node.text.decode(...).strip('"<>')) which
loses whether an include was quoted or system; change get_include_paths to
preserve the include kind by returning both the normalized path and its kind
(e.g., ("foo.h", "quote") or ("foo.h", "system")) or by returning the original
capture node type alongside the path; adjust the captures handling in
get_include_paths and consumers to build includes as tuples (path, kind) or
objects instead of plain strings so quoted includes can be resolved relative to
the current file and system includes remain distinct (refer to
get_include_paths, captures, and path_text for where to implement).
In `@tests/test_c_analyzer.py`:
- Around line 82-85: The test only asserts raw includes from CAnalyzer but not
the end-to-end include edge creation; add an integration test that uses
SourceAnalyzer (the orchestration entrypoint in
api/analyzers/source_analyzer.py) to analyze the fixture and then assert that
the resulting graph/analysis contains a file-to-file include relationship (e.g.,
that the analyzed source file has an include edge to "myheader.h" or "stdio.h");
locate the existing test function test_include_extraction and add a new test (or
extend it) that instantiates SourceAnalyzer (or calls its analyze/run method) on
the same fixture and inspects SourceAnalyzer's output structure (graph, edges,
or returned model) to assert the include edge exists.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8e5bad5a-7034-4f99-bece-6bb8cc5b586c
📒 Files selected for processing (5)
- api/analyzers/c/__init__.py
- api/analyzers/c/analyzer.py
- api/analyzers/source_analyzer.py
- tests/source_files/c/src.c
- tests/test_c_analyzer.py
```python
def get_entity_name(self, node: Node) -> str:
    if node.type == 'struct_specifier':
        name_node = node.child_by_field_name('name')
        if name_node:
            return name_node.text.decode('utf-8')
        raise ValueError("Struct has no name")
```
Anonymous structs will crash analysis of valid C code.
struct_specifier is legal without a name, but this branch raises ValueError in that case. Because SourceAnalyzer.create_hierarchy() does not catch it, a file containing struct { ... } value; will abort the whole analysis. Skip unnamed structs or synthesize a stable anonymous name instead of throwing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/analyzers/c/analyzer.py` around lines 30 - 35, get_entity_name currently
raises ValueError for unnamed struct_specifier which aborts
SourceAnalyzer.create_hierarchy; instead update get_entity_name to not throw: if
node.type == 'struct_specifier' and name_node is missing, return a stable
synthesized anonymous name (e.g. "<anon_struct_{node.start_byte}>" or similar
deterministic token derived from the node) or return None/empty and let callers
skip unnamed structs—do not raise; locate get_entity_name in analyzer.py and
replace the raise with the synthetic-name logic (or a clear sentinel) and ensure
callers like SourceAnalyzer.create_hierarchy handle that sentinel by skipping
anonymous structs.
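The synthesized-name fallback this prompt suggests can be sketched with a hypothetical stand-in node class (the real code would decode `name_node.text`; everything here besides the tree-sitter field names `type`, `start_byte`, and `child_by_field_name` is illustrative):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FakeNode:
    """Stand-in exposing the Node attributes get_entity_name touches."""
    type: str
    start_byte: int
    name: Optional[str] = None

    def child_by_field_name(self, field: str):
        return self.name if field == "name" else None

def get_entity_name(node: FakeNode) -> str:
    if node.type == "struct_specifier":
        name = node.child_by_field_name("name")
        if name is not None:
            return name
        # Deterministic anonymous name keyed on byte offset: stable across
        # re-analysis of the same file, and never raises for `struct { ... } x;`.
        return f"<anon_struct_{node.start_byte}>"
    return node.type  # hypothetical fallback for other entity kinds

print(get_entity_name(FakeNode("struct_specifier", 10, "exp")))  # → exp
print(get_entity_name(FakeNode("struct_specifier", 42)))         # → <anon_struct_42>
```

Using `start_byte` keeps the synthesized name unique per file while avoiding the hard failure that aborts `SourceAnalyzer.create_hierarchy()`.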
```python
'.c': CAnalyzer(),
'.h': CAnalyzer(),
```
C includes are discovered, but never materialized into graph edges.
Registering .c/.h here is not enough for the feature: neither pass calls CAnalyzer.get_include_paths(), and the C/header branch is skipped behind NullLanguageServer, so the graph still ends up with no file-to-file include relationship at all. Add a dedicated non-LSP include pass after all File nodes are registered and connect each source file to the resolved header file there.
Based on learnings, "Analyzer orchestration should be centralized in api/analyzers/source_analyzer.py".
Also applies to: 155-158, 191-191
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/analyzers/source_analyzer.py` around lines 27 - 28, The analyzer
registers '.c' and '.h' but never materializes include edges; add a dedicated
post-registration non-LSP include pass in api/analyzers/source_analyzer.py that
runs after all File nodes are created (outside the NullLanguageServer branch)
and for each file-type handled by CAnalyzer call
CAnalyzer.get_include_paths(file_path) to resolve headers and then create graph
edges from the source File node to the resolved header File node(s); ensure this
pass is invoked centrally in the analyzer orchestration (where File nodes are
finalized) so the C/header include relationships are added to the graph.
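A hedged sketch of that dedicated include pass: the graph is modeled as a plain list of edge tuples, resolution is simplified to quoted includes only, and all names (`include_pass`, `INCLUDES`, the callback) are illustrative rather than the project's actual API:

```python
from pathlib import PurePosixPath

def include_pass(files, get_include_paths):
    """After all File nodes exist, connect each file to the headers it includes.

    files: paths of every analyzed source file.
    get_include_paths: callback returning (path, kind) tuples for one file.
    Returns (source, 'INCLUDES', target) edges for resolvable includes only.
    """
    edges = []
    known = set(files)
    for path in files:
        for inc, kind in get_include_paths(path):
            if kind == "quote":
                # Quoted includes resolve relative to the including file.
                resolved = str(PurePosixPath(path).parent / inc)
            else:
                resolved = inc  # system headers: left unresolved in this sketch
            if resolved in known:  # only link headers that were actually analyzed
                edges.append((path, "INCLUDES", resolved))
    return edges

files = ["src/main.c", "src/myheader.h"]
includes_by_file = {
    "src/main.c": [("stdio.h", "system"), ("myheader.h", "quote")],
    "src/myheader.h": [],
}
print(include_pass(files, lambda p: includes_by_file[p]))
# → [('src/main.c', 'INCLUDES', 'src/myheader.h')]
```

Running this after file registration (rather than inside the LSP-gated second pass) is what lets the `INCLUDES` edges materialize even when C files are served by `NullLanguageServer`.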
```python
def test_function_label(self):
    """Functions should get the 'Function' label."""
    for entity in self.file.entities.values():
        if _entity_name(self.analyzer, entity) == "add":
            self.assertEqual(self.analyzer.get_entity_label(entity.node), "Function")

def test_struct_label(self):
    """Structs should get the 'Struct' label."""
    for entity in self.file.entities.values():
        if _entity_name(self.analyzer, entity) == "exp":
            self.assertEqual(self.analyzer.get_entity_label(entity.node), "Struct")

def test_call_symbols(self):
    """Function 'main' should have call symbols (calls to 'add')."""
    for entity in self.file.entities.values():
        if _entity_name(self.analyzer, entity) == "main":
            call_syms = entity.symbols.get("call", [])
            self.assertTrue(len(call_syms) > 0, "main should have call symbols")
```
These tests pass even if the target entity disappears.
test_function_label, test_struct_label, and test_call_symbols only assert inside the loop. If "add", "exp", or "main" stops being extracted, each test exits without any failure and still passes.
Suggested tightening

```diff
 def test_function_label(self):
     """Functions should get the 'Function' label."""
     for entity in self.file.entities.values():
         if _entity_name(self.analyzer, entity) == "add":
             self.assertEqual(self.analyzer.get_entity_label(entity.node), "Function")
+            return
+    self.fail("Function 'add' not found")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_c_analyzer.py` around lines 63 - 80, Each test currently only
asserts inside the loop and silently passes if the target entity is missing;
update test_function_label, test_struct_label, and test_call_symbols to first
check that the target entity exists and fail if not found (e.g., set a found
flag while iterating self.file.entities and after the loop assert found), then
perform the existing assertions using _entity_name(self.analyzer, entity),
analyzer.get_entity_label(entity.node), and entity.symbols to ensure "add",
"exp", and "main" are actually present before checking labels or call symbols.
Agent Review Summary

Comments Addressed

All 17 original review comments (from github-code-quality, Copilot, and CodeRabbit) were addressed by the author in commit f4bbb1f with a complete CAnalyzer rewrite. Reviewers acknowledged the fixes. No unresolved, non-outdated review threads remain.

Tests Added

Added in commit a5136d6. All 15 C analyzer tests pass.

CI Status

All checks passing ✅ (Build, Docker, CodeQL, Playwright shards 1 & 2, CodeRabbit)
Migrated from FalkorDB/code-graph-backend PR #57. Original issue: FalkorDB/code-graph-backend#46

Resolves #544

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous implementation bypassed the AbstractAnalyzer interface entirely, using non-existent types (Function, Struct), missing methods (graph.add_function, graph.add_struct), wrong File constructor args, and text/bytes confusion. This rewrite:

- Properly extends AbstractAnalyzer with super().__init__()
- Implements all required abstract methods (get_entity_label, get_entity_name, get_entity_docstring, get_entity_types, add_symbols, etc.)
- Uses self._captures() for correct QueryCursor-based tree-sitter queries
- Adds get_include_paths() helper for #include directive extraction
- Enables C/H support in source_analyzer.py (uncommented import, added to analyzers dict, LSP setup, and file globbing)
- Rewrites test_c_analyzer.py as standalone unit tests (no DB required)
- Adds #include directives to test fixture (src.c)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Covers get_entity_docstring (with and without comments), ValueError for unknown entity types in get_entity_label/get_entity_name, and resolve_path passthrough behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add test for empty include extraction and resolve_symbol unknown key error path, consistent with JavaScript analyzer test conventions.
Force-pushed from 97b05ef to c0162e6
♻️ Duplicate comments (2)
tests/test_c_analyzer.py (1)
63-80: ⚠️ Potential issue | 🟡 Minor
Tests pass silently if target entity is missing.

test_function_label, test_struct_label, and test_call_symbols only assert inside the loop. If the target entity ("add", "exp", or "main") isn't extracted, these tests exit without failure. This was noted in a previous review.

The test_docstring_extraction and test_no_docstring methods correctly use the return + self.fail() pattern — apply the same here:

💚 Proposed fix
  def test_function_label(self):
      """Functions should get the 'Function' label."""
      for entity in self.file.entities.values():
          if _entity_name(self.analyzer, entity) == "add":
              self.assertEqual(self.analyzer.get_entity_label(entity.node), "Function")
+             return
+     self.fail("Function 'add' not found")

  def test_struct_label(self):
      """Structs should get the 'Struct' label."""
      for entity in self.file.entities.values():
          if _entity_name(self.analyzer, entity) == "exp":
              self.assertEqual(self.analyzer.get_entity_label(entity.node), "Struct")
+             return
+     self.fail("Struct 'exp' not found")

  def test_call_symbols(self):
      """Function 'main' should have call symbols (calls to 'add')."""
      for entity in self.file.entities.values():
          if _entity_name(self.analyzer, entity) == "main":
              call_syms = entity.symbols.get("call", [])
              self.assertTrue(len(call_syms) > 0, "main should have call symbols")
+             return
+     self.fail("Function 'main' not found")

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_c_analyzer.py` around lines 63-80: Each test currently only asserts inside the loop so it passes silently if the target entity isn't present; update test_function_label, test_struct_label, and test_call_symbols so that when you find the matching entity (using _entity_name(self.analyzer, entity) == "add" / "exp" / "main") you perform the existing assertion and immediately return, and after the loop call self.fail("... not found") with a clear message (e.g., "function 'add' not found", "struct 'exp' not found", "function 'main' not found"); keep using analyzer.get_entity_label(entity.node) and entity.symbols.get("call", []) for the assertions.

api/analyzers/c/analyzer.py (1)
30-35: ⚠️ Potential issue | 🟠 Major
Anonymous structs will crash the analysis pipeline.

This was flagged in a previous review and remains unaddressed. C allows anonymous structs (e.g., struct { int x; } value;), and the current code raises ValueError when name_node is missing. Since SourceAnalyzer.create_hierarchy() doesn't catch this exception, files containing anonymous structs will abort the entire analysis.

Consider returning a synthesized name or skipping unnamed structs:
🛡️ Proposed fix: synthesize a stable anonymous name
  def get_entity_name(self, node: Node) -> str:
      if node.type == 'struct_specifier':
          name_node = node.child_by_field_name('name')
          if name_node:
              return name_node.text.decode('utf-8')
-         raise ValueError("Struct has no name")
+         # Anonymous struct - synthesize stable name from position
+         return f"<anon_struct_{node.start_byte}>"
      elif node.type == 'function_definition':

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/analyzers/c/analyzer.py` around lines 30 - 35, The get_entity_name method currently raises ValueError for anonymous structs which crashes analysis; instead, detect missing name_node in get_entity_name (in analyzer.get_entity_name) and return a stable synthesized name (e.g., "anon_struct_<row>_<col>" using node.start_point or a deterministic counter/hash of node.start_byte) rather than raising; ensure the synthesized name is stable across runs so create_hierarchy and other callers can use it consistently and do not change exception behavior elsewhere.
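The naming rule this prompt asks for can be isolated from tree-sitter entirely. The helper below is hypothetical (real code works on tree-sitter `Node` objects), but it shows the key property: the synthesized name is derived only from the node's byte offset, so it is stable across runs on the same source.

```python
from typing import Optional

def entity_name(declared_name: Optional[str], start_byte: int) -> str:
    """Return the declared struct name, or a stable synthesized one.

    Anonymous structs get a name derived from their byte offset, so the
    same source file always yields the same synthetic name across runs.
    """
    if declared_name:
        return declared_name
    return f"<anon_struct_{start_byte}>"

print(entity_name("exp", 120))  # exp
print(entity_name(None, 120))   # <anon_struct_120>
```

Because the offset is deterministic for unchanged source, create_hierarchy and the second-pass resolution can both key on the same synthetic name without coordination.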
🧹 Nitpick comments (1)
api/analyzers/c/analyzer.py (1)
100-106: Consider returning an empty list instead of raising for unknown keys.

The resolve_symbol method raises ValueError for unknown keys, but the call site in SourceAnalyzer.second_pass() (context snippet 2) has no exception handling. While add_symbols currently only adds the three known keys, a future change adding new symbol types would crash the entire pipeline.

Returning an empty list is a safer fallback that matches how other analyzers handle unknown cases:
♻️ Defensive fallback
  def resolve_symbol(self, files: dict[Path, File], lsp: SyncLanguageServer, file_path: Path, path: Path, key: str, symbol: Node) -> list[Entity]:
      if key in ["parameters", "return_type"]:
          return self.resolve_type(files, lsp, file_path, path, symbol)
      elif key in ["call"]:
          return self.resolve_method(files, lsp, file_path, path, symbol)
      else:
-         raise ValueError(f"Unknown key {key}")
+         logger.warning(f"Unknown symbol key '{key}' in C analyzer, skipping")
+         return []

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/analyzers/c/analyzer.py` around lines 100 - 106, The resolve_symbol method currently raises ValueError for unknown keys causing uncaught exceptions in callers like SourceAnalyzer.second_pass; change its fallback to return an empty list instead of raising so unknown keys are safely ignored. Specifically, in resolve_symbol (near resolve_type and resolve_method usages), replace the final raise ValueError(f"Unknown key {key}") with return [] so callers such as SourceAnalyzer.second_pass and any code that uses add_symbols won’t crash if new symbol kinds appear.
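Stripped of the analyzer's types, the defensive-fallback shape the comment suggests looks like the sketch below. The resolver stubs are stand-ins (the real methods consult the LSP and file map); the point is only the dispatch structure: known keys route to a handler, unknown keys log and return an empty list instead of raising.

```python
import logging

logger = logging.getLogger(__name__)

def resolve_type(symbol):    # stub for the real type resolver
    return [f"type:{symbol}"]

def resolve_method(symbol):  # stub for the real call resolver
    return [f"method:{symbol}"]

def resolve_symbol(key: str, symbol: str) -> list:
    if key in ("parameters", "return_type"):
        return resolve_type(symbol)
    if key == "call":
        return resolve_method(symbol)
    # Unknown keys are skipped rather than crashing the whole second pass.
    logger.warning("Unknown symbol key %r, skipping", key)
    return []

print(resolve_symbol("call", "add"))     # ['method:add']
print(resolve_symbol("mystery", "add"))  # []
```

The trade-off is visibility: a raised ValueError surfaces new symbol kinds immediately, while the empty-list fallback keeps the pipeline running and relies on the warning log to flag them.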
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@api/analyzers/c/analyzer.py`:
- Around line 30-35: The get_entity_name method currently raises ValueError for
anonymous structs which crashes analysis; instead, detect missing name_node in
get_entity_name (in analyzer.get_entity_name) and return a stable synthesized
name (e.g., "anon_struct_<row>_<col>" using node.start_point or a deterministic
counter/hash of node.start_byte) rather than raising; ensure the synthesized
name is stable across runs so create_hierarchy and other callers can use it
consistently and do not change exception behavior elsewhere.
In `@tests/test_c_analyzer.py`:
- Around line 63-80: Each test currently only asserts inside the loop so it
passes silently if the target entity isn't present; update test_function_label,
test_struct_label, and test_call_symbols so that when you find the matching
entity (using _entity_name(self.analyzer, entity) == "add" / "exp" / "main") you
perform the existing assertion and immediately return, and after the loop call
self.fail("... not found") with a clear message (e.g., "function 'add' not
found", "struct 'exp' not found", "function 'main' not found"); keep using
analyzer.get_entity_label(entity.node) and entity.symbols.get("call", []) for
the assertions.
---
Nitpick comments:
In `@api/analyzers/c/analyzer.py`:
- Around line 100-106: The resolve_symbol method currently raises ValueError for
unknown keys causing uncaught exceptions in callers like
SourceAnalyzer.second_pass; change its fallback to return an empty list instead
of raising so unknown keys are safely ignored. Specifically, in resolve_symbol
(near resolve_type and resolve_method usages), replace the final raise
ValueError(f"Unknown key {key}") with return [] so callers such as
SourceAnalyzer.second_pass and any code that uses add_symbols won’t crash if new
symbol kinds appear.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 641c4a68-4a6b-4e02-8ff0-7b486b29d144
📒 Files selected for processing (5)
- api/analyzers/c/__init__.py
- api/analyzers/c/analyzer.py
- api/analyzers/source_analyzer.py
- tests/source_files/c/src.c
- tests/test_c_analyzer.py
🚧 Files skipped from review as they are similar to previous changes (1)
- api/analyzers/source_analyzer.py
Migrated from falkordb/code-graph-backend#57
Summary
Add support for processing include_directive in C files, creating edges between files.

Changes:
- process_include_directive method added to the C analyzer
Originally authored by @gkorland in falkordb/code-graph-backend#57
Summary by CodeRabbit
New Features
Behavior Change
Tests