
Add support for include_directive in C #596

Open
gkorland wants to merge 4 commits into staging from backend/add-include-directive-support

Conversation

@gkorland
Contributor

@gkorland gkorland commented Mar 10, 2026

Migrated from falkordb/code-graph-backend#57

Summary

Add support for processing include_directive in C files, creating edges between files.

Changes:

  • Added process_include_directive method to C analyzer
  • Modified first pass to process include directive nodes
  • Added test case for include directive relationship tracking

Resolves #544


Originally authored by @gkorland in falkordb/code-graph-backend#57

Summary by CodeRabbit

  • New Features

    • Enabled C (.c/.h) analysis: detects functions and structs; extracts names/labels, parameters, return types, call sites, include directives, and adjacent comment docstrings.
  • Behavior Change

    • Dependency/path resolution simplified: external dependency checks are disabled and paths are preserved as provided.
  • Tests

    • Added focused unit tests validating entity discovery, include extraction, docstring handling, symbol resolution, and error cases.

@vercel

vercel bot commented Mar 10, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: code-graph | Deployment: Error | Updated (UTC): Mar 10, 2026 9:01am

@coderabbitai
Contributor

coderabbitai bot commented Mar 10, 2026

📝 Walkthrough

Walkthrough

Activate a Tree-sitter–based C analyzer (CAnalyzer), enable discovery of .c/.h sources, and convert C analyzer tests to unit-level AST checks. The analyzer extracts functions, structs, includes, and basic symbols, and integrates with the existing analysis second-pass flow using NullLanguageServer placeholders.

Changes

Cohort / File(s): Summary

  • C Analyzer Implementation (api/analyzers/c/analyzer.py)
    New active CAnalyzer class using a Tree-sitter Language: entity labeling/naming, docstring extraction from the preceding comment, symbol population (call, parameters, return_type), simplified dependency handling, resolution helpers, and get_include_paths. Removed prior multi-pass call-graph helpers.
  • Analyzer Registration / Source Discovery (api/analyzers/source_analyzer.py)
    Registers CAnalyzer for .c/.h, adds *.c/*.h to discovery, and creates NullLanguageServer placeholders in the second pass for C files while preserving skip behavior for NullLanguageServer.
  • Tests: C analyzer (tests/test_c_analyzer.py)
    Rewrote the test to exercise CAnalyzer with in-memory File/Entity: asserts entity types, names, labels, docstrings, include extraction, dependency check, resolve_path, symbol presence, and error cases for unknown keys. Removed Graph-based integration assertions.
  • Test fixtures (tests/source_files/c/src.c)
    Sample C file updated to include "myheader.h" and <stdio.h>, plus a block comment above add for docstring testing.
  • Tests: housekeeping (tests/...)
    Removed unused filesystem path construction and adjusted test imports/structure to unit-style assertions.

sequenceDiagram
    participant Tester
    participant CAnalyzer
    participant Parser as Tree-sitter Parser
    participant Filesystem
    participant LSP as LanguageServer

    Tester->>CAnalyzer: instantiate & analyze(file)
    CAnalyzer->>Filesystem: read .c/.h file(s)
    Filesystem-->>CAnalyzer: file contents
    CAnalyzer->>Parser: parse contents
    Parser-->>CAnalyzer: AST
    rect rgba(100,150,200,0.5)
    Note over CAnalyzer,Parser: Single-pass AST extraction
    CAnalyzer->>CAnalyzer: extract entities (functions, structs), docstrings, include paths
    CAnalyzer->>CAnalyzer: populate symbols (call, parameters, return_type)
    end
    CAnalyzer->>LSP: use NullLanguageServer or real LSP for resolution
    LSP-->>CAnalyzer: resolution responses (or noop)
    CAnalyzer-->>Tester: entities & symbols returned

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibble the AST with whiskers bright,

I find the funcs and headers hidden in light.
A hop through comments, includes in a row,
Calls and types I’ve gathered — watch them grow.
Hooray — the C-forest ready to show!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 55.88%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Description Check ✅ Passed: Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check ✅ Passed: The title 'Add support for include_directive in C' directly aligns with the PR's main objective of detecting include directives and creating file-to-file edges in the C analyzer.
  • Linked Issues check ✅ Passed: The PR implements include_directive detection via the get_include_paths() method, which extracts #include targets from parsed trees, enabling downstream file-to-file relationship tracking as required by issue #544.
  • Out of Scope Changes check ✅ Passed: All changes are focused on the C analyzer implementation, test updates, and enabling C/H support in SourceAnalyzer, directly supporting the include_directive objective without unrelated modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Contributor

Copilot AI left a comment

Pull request overview

Adds C #include directive handling intended to create INCLUDES edges between files in the graph, along with a unit test update to validate the relationship.

Changes:

  • Replaced the (previously commented) C analyzer implementation with a concrete analyzer that attempts to parse C AST and create graph nodes/edges.
  • Added process_include_directive and wired it into the C analyzer “first pass”.
  • Updated tests/test_c_analyzer.py to assert INCLUDES edges for an included header.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 11 comments.

File Description
api/analyzers/c/analyzer.py Introduces a C analyzer implementation that attempts to create File/Function/Struct nodes and INCLUDES edges from #include.
tests/test_c_analyzer.py Adds assertions for INCLUDES neighbors and updates the test to call an analyze(...) method.


Comment on lines +90 to +145
    def process_function_definition_node(self, node: Node, path: Path,
                                         source_code: str) -> Optional[Function]:
        """
        Processes a function definition node to extract function details.

        Args:
            node (Node): The AST node representing a function definition.
            path (Path): The file path where the function is defined.

        Returns:
            Optional[Function]: A Function object containing details about the function, or None if the function name cannot be determined.
        """

        # Extract function name
        res = find_child_of_type(node, 'function_declarator')
        if res is None:
            return None

        function_declarator = res[0]

        res = find_child_of_type(function_declarator, 'identifier')
        if res is None:
            return None

        identifier = res[0]
        function_name = identifier.text.decode('utf-8')
        logger.info(f"Function declaration: {function_name}")

        # Extract function return type
        res = find_child_of_type(node, 'primitive_type')
        ret_type = 'Unknown'
        if res is not None:
            ret_type = res[0]
            ret_type = ret_type.text.decode('utf-8')

        # Extract function parameters
        args = []
        res = find_child_of_type(function_declarator, 'parameter_list')
        if res is not None:
            parameters = res[0]

            # Extract arguments and their types
            for child in parameters.children:
                if child.type == 'parameter_declaration':
                    arg = self.process_parameter_declaration(child)
                    args.append(arg)

        # Extract function definition line numbers
        start_line = node.start_point[0]
        end_line = node.end_point[0]

        # Create Function object
        docs = ''
        src = source_code[node.start_byte:node.end_byte]
        f = Function(str(path), function_name, docs, ret_type, src, start_line, end_line)

Contributor Author

Fixed in f4bbb1f. The rewrite removed all references to Function/Struct types. The analyzer now uses the generic Entity class consistent with other analyzers.

Comment on lines +394 to +400
        # Create file entity
        file = File(os.path.dirname(path), path.name, path.suffix)
        graph.add_file(file)

        # Parse file
        source_code = f.read()
        tree = self.parser.parse(source_code)
Contributor Author

Fixed in f4bbb1f. CAnalyzer no longer creates File objects — file creation is handled by SourceAnalyzer.first_pass() using File(path, tree).

Comment on lines +63 to +70
        # Test for include_directive edge creation
        included_file = g.get_file('', 'myheader.h', '.h')
        self.assertIsNotNone(included_file)

        includes = g.get_neighbors([f.id], rel='INCLUDES')
        self.assertEqual(len(includes), 3)
        included_files = [node['properties']['name'] for node in includes['nodes']]
        self.assertIn('myheader.h', included_files)
Contributor Author

Fixed in f4bbb1f. Tests were completely rewritten as standalone unit tests that don't use Graph or FalkorDB. Include extraction is tested via get_include_paths() which returns a simple list of strings.

        if entity is not None:
            # Add Function object to the graph
            try:
                graph.add_function(entity)
Contributor Author

Fixed in f4bbb1f. The rewrite removed all add_function and add_struct calls. Entity creation is now handled by SourceAnalyzer using graph.add_entity() + connect_entities('DEFINES', ...), consistent with other analyzers.

Comment on lines +525 to +528
        # Create missing function
        # Assuming this is a call to a native function e.g. 'printf'
        callee_f = Function('/', callee_name, None, None, None, 0, 0)
        graph.add_function(callee_f)
Contributor Author

Fixed in f4bbb1f. All custom process_* methods and fallback Function creation were removed in the rewrite.

Comment on lines +399 to +406
        source_code = f.read()
        tree = self.parser.parse(source_code)
        try:
            source_code = source_code.decode('utf-8')
        except Exception as e:
            logger.error(f"Failed decoding source code: {e}")
            source_code = ''

Contributor Author

Fixed in f4bbb1f. The rewrite delegates file parsing to SourceAnalyzer.first_pass(), which uses file_path.read_bytes() and analyzer.parser.parse(source_code), consistent with all other analyzers.

Comment on lines +348 to +356
        if len(splitted) < 2:
            logger.warning("Include path has no extension: %s", included_file_path)
            return

        # Create file entity for the included file
        path = os.path.dirname(normalized_path)
        name = os.path.basename(normalized_path)
        ext = splitted[1]
        included_file = File(path, name, ext)
Contributor Author

Fixed in f4bbb1f. The old process_include_directive with os.path.splitext logic was removed. Include extraction is now handled by get_include_paths() using tree-sitter queries via self._captures().

Comment on lines +352 to +360
        # Create file entity for the included file
        path = os.path.dirname(normalized_path)
        name = os.path.basename(normalized_path)
        ext = splitted[1]
        included_file = File(path, name, ext)
        graph.add_file(included_file)

        # Connect the parent file to the included file
        graph.connect_entities('INCLUDES', parent.id, included_file.id)
Contributor Author

Fixed in f4bbb1f. The rewrite no longer creates File objects for included files. get_include_paths() returns raw path strings from #include directives.


        g = Graph("c")
        analyzer.analyze_local_folder(path, g)
        analyzer.analyze(path, g)
Contributor Author

Fixed in f4bbb1f. Tests were rewritten as standalone unit tests using CAnalyzer directly. No SourceAnalyzer or Graph dependency needed.

Comment on lines +19 to +21
    def __init__(self) -> None:
        self.parser = Parser(C_LANGUAGE)

Contributor Author

Fixed in f4bbb1f. CAnalyzer now calls super().__init__(Language(tsc.language())) and implements all required abstract methods: get_entity_label, get_entity_name, get_entity_docstring, get_entity_types, add_symbols, is_dependency, resolve_path, resolve_type, resolve_method, resolve_symbol, add_dependencies.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (3)
api/analyzers/c/analyzer.py (3)

19-20: ⚠️ Potential issue | 🟠 Major

Call the base initializer.

Skipping AbstractAnalyzer.__init__ leaves self.language unset and bypasses the shared analyzer setup.

♻️ Proposed fix
 class CAnalyzer(AbstractAnalyzer):
     def __init__(self) -> None:
-        self.parser = Parser(C_LANGUAGE)
+        super().__init__(C_LANGUAGE)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/c/analyzer.py` around lines 19 - 20, The C analyzer's __init__
is not calling the base class initializer, so shared setup (like self.language)
is skipped; update the C analyzer __init__ to call AbstractAnalyzer.__init__ (or
super().__init__()) before setting self.parser = Parser(C_LANGUAGE), or
explicitly set self.language = C_LANGUAGE if the base init requires
parameters—ensure you invoke the base initializer (super().__init__()) in the
__init__ method where Parser(C_LANGUAGE) is currently assigned.

394-405: ⚠️ Potential issue | 🔴 Critical

Parse first, then construct File(path, tree) from bytes.

File is created before the tree exists, and f is typed as io.TextIOWrapper, so f.read() returns str. Line 402 then always falls into the exception path because str.decode() does not exist, which wipes every extracted function body.

♻️ Proposed fix
-        # Create file entity
-        file = File(os.path.dirname(path), path.name, path.suffix)
-        graph.add_file(file)
-
         # Parse file
-        source_code = f.read()
-        tree = self.parser.parse(source_code)
-        try:
-            source_code = source_code.decode('utf-8')
-        except Exception as e:
-            logger.error(f"Failed decoding source code: {e}")
-            source_code = ''
+        source_bytes = f.buffer.read()
+        tree = self.parser.parse(source_bytes)
+        source_code = source_bytes.decode('utf-8', errors='replace')
+
+        # Create file entity
+        file = File(path, tree)
+        graph.add_file(file)
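As a standalone illustration of the bytes-first pattern this fix proposes (parse raw bytes first, then decode with errors="replace" so malformed input degrades gracefully instead of wiping the whole source), here is a minimal sketch; the byte string is a made-up example, not taken from the repository:

```python
# Minimal sketch of the bytes-first read/decode pattern from the proposed fix.
# Parsing consumes raw bytes; decoding with errors="replace" turns invalid
# UTF-8 into U+FFFD instead of raising and discarding the extracted source.
source_bytes = b'int main(void) { return 0; }\n\xff'  # trailing invalid byte

source_code = source_bytes.decode("utf-8", errors="replace")

assert source_code.startswith("int main")
assert "\ufffd" in source_code  # the invalid byte became the replacement char
```

The same idea applies whether the bytes come from `f.buffer.read()` or `file_path.read_bytes()`: the parser sees the exact bytes on disk, and only the human-readable copy is decoded leniently.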
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/c/analyzer.py` around lines 394 - 405, The code creates a File
object before parsing and incorrectly calls str.decode(), which always raises
for io.TextIOWrapper reads and clears source_code; fix by first reading and
parsing the source (use self.parser.parse on the string returned by f.read()),
then construct and add the File entity (File(...)) afterwards using the original
path and parsed tree as needed, and remove the decode() call — handle bytes only
if f is opened in binary mode (decode then), otherwise treat f.read() as str and
pass it directly to parser.parse and downstream logic (refer to File,
graph.add_file, self.parser.parse, and source_code variables).

3-5: ⚠️ Potential issue | 🟠 Major

Replace the wildcard imports.

This file now trips Ruff F403/F405, and the * imports hide the actual symbols the analyzer depends on.

♻️ Proposed fix
-from ..utils import *
+from ..utils import find_child_of_type
 from pathlib import Path
-from ...entities import *
+from ...entities import File, Function, Struct
 from ...graph import Graph
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/c/analyzer.py` around lines 3 - 5, Replace the wildcard imports
in analyzer.py: remove "from ..utils import *" and "from ...entities import *"
and instead import only the specific functions/classes the module uses (inspect
usages of symbols in analyzer.py such as any utility helpers or entity classes
referenced) and list them explicitly (e.g., from ..utils import func_a,
helper_b; from ...entities import EntityX, EntityY). Ensure any renamed or
aliased imports match existing references in functions like the analyzer class
or helper functions, and update any __all__ or re-exports if present.
🧹 Nitpick comments (1)
tests/test_c_analyzer.py (1)

63-70: Please add the new coverage in pytest style.

This extends a unittest.TestCase suite, but backend tests under tests/ are expected to use pytest.

As per coding guidelines, "Backend tests should use pytest and be organized in tests/ directory".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_c_analyzer.py` around lines 63 - 70, Convert the unittest-style
assertions in tests/test_c_analyzer.py into a pytest-style test function: create
a standalone function (e.g., test_include_directive_edge_creation) that uses the
existing fixture/object g, replace self.assertIsNotNone(included_file) with
assert included_file is not None, replace self.assertEqual(len(includes), 3)
with assert len(includes) == 3, and replace self.assertIn('myheader.h',
included_files) with assert 'myheader.h' in included_files; remove any
unittest.TestCase class wrapper and self usage so the test runs under pytest.
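The unittest-to-pytest assertion mapping the suggestion describes can be sketched with stand-in data; the `includes` payload shape below is assumed from the test snippet above, not taken from the real Graph API:

```python
# Hypothetical stand-in for the neighbors payload used in the test snippet;
# the real structure would come from Graph.get_neighbors.
includes = {"nodes": [{"properties": {"name": n}}
                      for n in ("myheader.h", "stdio.h", "math.h")]}

included_files = [node["properties"]["name"] for node in includes["nodes"]]

# pytest-style equivalents of the unittest assertions:
assert included_files is not None          # was: self.assertIsNotNone(...)
assert len(included_files) == 3            # was: self.assertEqual(len(...), 3)
assert "myheader.h" in included_files      # was: self.assertIn(...)
```

Plain `assert` statements also give pytest's assertion rewriting a chance to print the mismatching values on failure, which the unittest helpers do through their own messages.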
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/analyzers/c/analyzer.py`:
- Around line 341-357: The code creates a synthetic File node from the raw
include text (using os.path.dirname(normalized_path)) instead of resolving the
include relative to the including file; change the logic to first resolve the
include against the including file's directory (e.g. resolved =
Path(parent_of_including_file) / normalized_path), then check the graph for an
existing File entity for that resolved path and reuse it (lookup via whatever
graph method exists, e.g. graph.get_file_by_path/find_file), and only create a
new File and call graph.add_file(File(...)) if no real entity is found; update
references to use included_file = existing_or_new_file so INCLUDES point to the
actual analyzed header rather than a top-level synthetic node.
- Around line 408-419: first_pass() and second_pass() incorrectly call
query.captures() on a Query (which doesn't exist) and treat the result as a
tuple; replace those calls with the existing helper or a QueryCursor: use
self._captures(tree, query) (the inherited helper that returns a dict keyed by
capture name) when invoking C_LANGUAGE.query("(function_definition) @function")
and iterate captures['function'] to call process_function_definition(file, node,
path, graph, source_code); in second_pass() also remove any indexing like [0] on
capture results and instead access the dict entries returned by _captures() or
use a QueryCursor created from the Query and call
cursor.captures(tree.root_node) before processing nodes.

In `@tests/test_c_analyzer.py`:
- Line 24: The test calls analyzer.analyze(path, g) but SourceAnalyzer does not
implement analyze(); either add a public analyze(self, path, g) wrapper on
SourceAnalyzer that delegates to the appropriate existing method (for example
call analyze_local_folder(path, g) or analyze_sources(path, g) based on input)
or update the test to call an existing method such as analyze_local_folder(path,
g); additionally ensure the analyzers dispatch map enables CAnalyzer by
uncommenting the ".c" and ".h" entries in the analyzers dict so C files are
actually handled when the wrapper or chosen method delegates to CAnalyzer.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: db7fbc7c-059d-4357-be05-71bacb011725

📥 Commits

Reviewing files that changed from the base of the PR and between 887b82f and 996c299.

📒 Files selected for processing (2)
  • api/analyzers/c/analyzer.py
  • tests/test_c_analyzer.py

@gkorland gkorland force-pushed the backend/add-include-directive-support branch from 996c299 to b8cdeb2 on March 21, 2026 at 11:29
Contributor Author

@gkorland gkorland left a comment

This PR has been substantially rewritten to address all 17 review comments. The original implementation bypassed the AbstractAnalyzer interface entirely (used non-existent Function/Struct types, missing Graph.add_function/add_struct methods, wrong File constructor, text/bytes confusion, etc.).

What changed:

  1. Complete rewrite of CAnalyzer: Now properly extends AbstractAnalyzer with super().__init__() and implements all required abstract methods (get_entity_label, get_entity_name, get_entity_docstring, get_entity_types, add_symbols, is_dependency, resolve_path, resolve_type, resolve_method, resolve_symbol, add_dependencies).

  2. Include directive support: Added get_include_paths() helper method for extracting #include paths from parsed C files using tree-sitter queries via self._captures().

  3. source_analyzer.py integration: Uncommented the C analyzer import, added .c/.h to the analyzers dict, added NullLanguageServer for C/H extensions in the second pass, and added .c/.h to the file globbing in analyze_sources().

  4. Test rewrite: Rewrote test_c_analyzer.py as standalone unit tests that don't require FalkorDB. Tests cover entity extraction (functions, structs), labels, call symbols, include extraction, and is_dependency. Added #include directives to the test fixture src.c.

  5. Removed dead code: Removed all the old custom process_* methods, the utils import, and Function/Struct type usage.

All 8 tests pass, lint is clean.

@gkorland gkorland force-pushed the backend/add-include-directive-support branch from b8cdeb2 to f4bbb1f on March 22, 2026 at 16:31
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (2)
api/analyzers/c/analyzer.py (1)

108-119: ⚠️ Potential issue | 🟠 Major

Don't erase whether an include was quoted or system.

strip('"<>') collapses #include "foo.h" and #include <foo.h> into the same foo.h token. That removes the information the graph layer needs to resolve quoted includes relative to the current file and to avoid binding system headers to an unrelated project header with the same basename. Return the normalized path together with the include kind, or keep the original node type.
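A simplified, tree-sitter-free sketch of "return the path together with its kind": the regex below is illustrative only (it ignores comments and line continuations, which the real grammar handles), and the `(path, kind)` return shape is one possible design, not the analyzer's actual API:

```python
import re

# Matches `#include <...>` and `#include "..."` at the start of a line
# (simplified; a real implementation would read the AST node type instead).
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+(<[^>]+>|"[^"]+")', re.MULTILINE)

def get_include_paths(source: str) -> list[tuple[str, str]]:
    """Return (path, kind) pairs, where kind is 'system' or 'quote'."""
    results = []
    for match in INCLUDE_RE.finditer(source):
        token = match.group(1)
        kind = "system" if token.startswith("<") else "quote"
        results.append((token[1:-1], kind))  # strip delimiters, keep the kind
    return results

src = '#include <stdio.h>\n#include "myheader.h"\nint main(void) { return 0; }\n'
assert get_include_paths(src) == [("stdio.h", "system"), ("myheader.h", "quote")]
```

Keeping the kind alongside the path lets the graph layer resolve quoted includes relative to the including file while leaving system headers unresolved, exactly the distinction `strip('"<>')` throws away.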

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/c/analyzer.py` around lines 108 - 119, The get_include_paths
method currently strips quotes/angle brackets (via path_text =
node.text.decode(...).strip('"<>')) which loses whether an include was quoted or
system; change get_include_paths to preserve the include kind by returning both
the normalized path and its kind (e.g., ("foo.h", "quote") or ("foo.h",
"system")) or by returning the original capture node type alongside the path;
adjust the captures handling in get_include_paths and consumers to build
includes as tuples (path, kind) or objects instead of plain strings so quoted
includes can be resolved relative to the current file and system includes remain
distinct (refer to get_include_paths, captures, and path_text for where to
implement).
tests/test_c_analyzer.py (1)

82-85: ⚠️ Potential issue | 🟠 Major

The new tests still don't cover include relationships end to end.

This only checks raw extraction from CAnalyzer. It will still pass when SourceAnalyzer never creates any file-to-file include edge, so the PR’s main behavior remains untested. Please add one integration test that runs through SourceAnalyzer and asserts the include relationship for this fixture.

Based on learnings, "Analyzer orchestration should be centralized in api/analyzers/source_analyzer.py".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_c_analyzer.py` around lines 82 - 85, The test only asserts raw
includes from CAnalyzer but not the end-to-end include edge creation; add an
integration test that uses SourceAnalyzer (the orchestration entrypoint in
api/analyzers/source_analyzer.py) to analyze the fixture and then assert that
the resulting graph/analysis contains a file-to-file include relationship (e.g.,
that the analyzed source file has an include edge to "myheader.h" or "stdio.h");
locate the existing test function test_include_extraction and add a new test (or
extend it) that instantiates SourceAnalyzer (or calls its analyze/run method) on
the same fixture and inspects SourceAnalyzer's output structure (graph, edges,
or returned model) to assert the include edge exists.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/analyzers/c/analyzer.py`:
- Around line 30-35: get_entity_name currently raises ValueError for unnamed
struct_specifier which aborts SourceAnalyzer.create_hierarchy; instead update
get_entity_name to not throw: if node.type == 'struct_specifier' and name_node
is missing, return a stable synthesized anonymous name (e.g.
"<anon_struct_{node.start_byte}>" or similar deterministic token derived from
the node) or return None/empty and let callers skip unnamed structs—do not
raise; locate get_entity_name in analyzer.py and replace the raise with the
synthetic-name logic (or a clear sentinel) and ensure callers like
SourceAnalyzer.create_hierarchy handle that sentinel by skipping anonymous
structs.

In `@api/analyzers/source_analyzer.py`:
- Around line 27-28: The analyzer registers '.c' and '.h' but never materializes
include edges; add a dedicated post-registration non-LSP include pass in
api/analyzers/source_analyzer.py that runs after all File nodes are created
(outside the NullLanguageServer branch) and for each file-type handled by
CAnalyzer call CAnalyzer.get_include_paths(file_path) to resolve headers and
then create graph edges from the source File node to the resolved header File
node(s); ensure this pass is invoked centrally in the analyzer orchestration
(where File nodes are finalized) so the C/header include relationships are added
to the graph.

In `@tests/test_c_analyzer.py`:
- Around line 63-80: Each test currently only asserts inside the loop and
silently passes if the target entity is missing; update test_function_label,
test_struct_label, and test_call_symbols to first check that the target entity
exists and fail if not found (e.g., set a found flag while iterating
self.file.entities and after the loop assert found), then perform the existing
assertions using _entity_name(self.analyzer, entity),
analyzer.get_entity_label(entity.node), and entity.symbols to ensure "add",
"exp", and "main" are actually present before checking labels or call symbols.

---

Duplicate comments:
In `@api/analyzers/c/analyzer.py`:
- Around line 108-119: The get_include_paths method currently strips
quotes/angle brackets (via path_text = node.text.decode(...).strip('"<>')) which
loses whether an include was quoted or system; change get_include_paths to
preserve the include kind by returning both the normalized path and its kind
(e.g., ("foo.h", "quote") or ("foo.h", "system")) or by returning the original
capture node type alongside the path; adjust the captures handling in
get_include_paths and consumers to build includes as tuples (path, kind) or
objects instead of plain strings so quoted includes can be resolved relative to
the current file and system includes remain distinct (refer to
get_include_paths, captures, and path_text for where to implement).
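A small helper illustrating the suggested `(path, kind)` shape; the quoting conventions are the C preprocessor's own, while the function name `classify_include` is hypothetical:

```python
def classify_include(raw: str) -> tuple[str, str]:
    """Split a raw #include path capture into (path, kind).

    kind is "quote" for "foo.h" (resolved relative to the including file)
    or "system" for <stdio.h>. In the tree-sitter-c grammar the path field
    is a string_literal or a system_lib_string, so the node type could be
    used instead of inspecting the delimiters.
    """
    raw = raw.strip()
    if raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1], "quote"
    if raw.startswith('<') and raw.endswith('>'):
        return raw[1:-1], "system"
    return raw, "unknown"
```

Consumers can then resolve `"quote"` includes relative to the current file and keep `"system"` includes distinct, instead of losing that information to a blanket `strip('"<>')`.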

In `@tests/test_c_analyzer.py`:
- Around line 82-85: The test only asserts raw includes from CAnalyzer but not
the end-to-end include edge creation; add an integration test that uses
SourceAnalyzer (the orchestration entrypoint in
api/analyzers/source_analyzer.py) to analyze the fixture and then assert that
the resulting graph/analysis contains a file-to-file include relationship (e.g.,
that the analyzed source file has an include edge to "myheader.h" or "stdio.h");
locate the existing test function test_include_extraction and add a new test (or
extend it) that instantiates SourceAnalyzer (or calls its analyze/run method) on
the same fixture and inspects SourceAnalyzer's output structure (graph, edges,
or returned model) to assert the include edge exists.
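The shape of that end-to-end assertion can be sketched without the real `SourceAnalyzer`; here a regex-based stand-in plays its role, so only the test structure carries over:

```python
import re
import unittest

FIXTURE = '''#include <stdio.h>
#include "myheader.h"

int main(void) { return 0; }
'''

def extract_edges(source_name: str, code: str) -> set[tuple[str, str]]:
    # Stand-in for SourceAnalyzer: one file-to-file edge per #include.
    includes = re.findall(r'#include\s+[<"]([^>"]+)[>"]', code)
    return {(source_name, inc) for inc in includes}

class TestIncludeEdges(unittest.TestCase):
    def test_include_edge_exists(self):
        edges = extract_edges("src.c", FIXTURE)
        self.assertIn(("src.c", "myheader.h"), edges)
        self.assertIn(("src.c", "stdio.h"), edges)
```

In the real integration test, `extract_edges` would be replaced by running `SourceAnalyzer` on the fixture and inspecting its output graph; run via `python -m unittest`.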

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8e5bad5a-7034-4f99-bece-6bb8cc5b586c

📥 Commits

Reviewing files that changed from the base of the PR and between 996c299 and a5136d6.

📒 Files selected for processing (5)
  • api/analyzers/c/__init__.py
  • api/analyzers/c/analyzer.py
  • api/analyzers/source_analyzer.py
  • tests/source_files/c/src.c
  • tests/test_c_analyzer.py

Comment on lines +30 to +35
def get_entity_name(self, node: Node) -> str:
if node.type == 'struct_specifier':
name_node = node.child_by_field_name('name')
if name_node:
return name_node.text.decode('utf-8')
raise ValueError("Struct has no name")
⚠️ Potential issue | 🟠 Major

Anonymous structs will crash analysis of valid C code.

struct_specifier is legal without a name, but this branch raises ValueError in that case. Because SourceAnalyzer.create_hierarchy() does not catch it, a file containing struct { ... } value; will abort the whole analysis. Skip unnamed structs or synthesize a stable anonymous name instead of throwing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/c/analyzer.py` around lines 30 - 35, get_entity_name currently
raises ValueError for unnamed struct_specifier which aborts
SourceAnalyzer.create_hierarchy; instead update get_entity_name to not throw: if
node.type == 'struct_specifier' and name_node is missing, return a stable
synthesized anonymous name (e.g. "<anon_struct_{node.start_byte}>" or similar
deterministic token derived from the node) or return None/empty and let callers
skip unnamed structs—do not raise; locate get_entity_name in analyzer.py and
replace the raise with the synthetic-name logic (or a clear sentinel) and ensure
callers like SourceAnalyzer.create_hierarchy handle that sentinel by skipping
anonymous structs.

Comment on lines +27 to +28
'.c': CAnalyzer(),
'.h': CAnalyzer(),
⚠️ Potential issue | 🔴 Critical

C includes are discovered, but never materialized into graph edges.

Registering .c/.h here is not enough for the feature: neither pass calls CAnalyzer.get_include_paths(), and the C/header branch is skipped behind NullLanguageServer, so the graph still ends up with no file-to-file include relationship at all. Add a dedicated non-LSP include pass after all File nodes are registered and connect each source file to the resolved header file there.

Based on learnings, "Analyzer orchestration should be centralized in api/analyzers/source_analyzer.py".

Also applies to: 155-158, 191-191

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/source_analyzer.py` around lines 27 - 28, The analyzer
registers '.c' and '.h' but never materializes include edges; add a dedicated
post-registration non-LSP include pass in api/analyzers/source_analyzer.py that
runs after all File nodes are created (outside the NullLanguageServer branch)
and for each file-type handled by CAnalyzer call
CAnalyzer.get_include_paths(file_path) to resolve headers and then create graph
edges from the source File node to the resolved header File node(s); ensure this
pass is invoked centrally in the analyzer orchestration (where File nodes are
finalized) so the C/header include relationships are added to the graph.

Comment on lines +63 to +80
def test_function_label(self):
"""Functions should get the 'Function' label."""
for entity in self.file.entities.values():
if _entity_name(self.analyzer, entity) == "add":
self.assertEqual(self.analyzer.get_entity_label(entity.node), "Function")

def test_struct_label(self):
"""Structs should get the 'Struct' label."""
for entity in self.file.entities.values():
if _entity_name(self.analyzer, entity) == "exp":
self.assertEqual(self.analyzer.get_entity_label(entity.node), "Struct")

def test_call_symbols(self):
"""Function 'main' should have call symbols (calls to 'add')."""
for entity in self.file.entities.values():
if _entity_name(self.analyzer, entity) == "main":
call_syms = entity.symbols.get("call", [])
self.assertTrue(len(call_syms) > 0, "main should have call symbols")
⚠️ Potential issue | 🟠 Major

These tests pass even if the target entity disappears.

test_function_label, test_struct_label, and test_call_symbols only assert inside the loop. If "add", "exp", or "main" stops being extracted, each test exits without any failure and still passes.

Suggested tightening
     def test_function_label(self):
         """Functions should get the 'Function' label."""
         for entity in self.file.entities.values():
             if _entity_name(self.analyzer, entity) == "add":
                 self.assertEqual(self.analyzer.get_entity_label(entity.node), "Function")
+                return
+        self.fail("Function 'add' not found")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_c_analyzer.py` around lines 63 - 80, Each test currently only
asserts inside the loop and silently passes if the target entity is missing;
update test_function_label, test_struct_label, and test_call_symbols to first
check that the target entity exists and fail if not found (e.g., set a found
flag while iterating self.file.entities and after the loop assert found), then
perform the existing assertions using _entity_name(self.analyzer, entity),
analyzer.get_entity_label(entity.node), and entity.symbols to ensure "add",
"exp", and "main" are actually present before checking labels or call symbols.

@gkorland
Contributor Author

Agent Review Summary

Comments Addressed

All 17 original review comments (from github-code-quality, Copilot, and CodeRabbit) were addressed by the author in commit f4bbb1f with a complete CAnalyzer rewrite. Reviewers acknowledged the fixes.

No unresolved, non-outdated review threads remain.

Tests Added

  • test_include_extraction_empty — verifies get_include_paths() returns [] for files without #include directives
  • test_resolve_symbol_unknown_key_raises — verifies error path for unknown symbol keys (follows JS analyzer convention)

Added in commit a5136d6. All 15 C analyzer tests pass.

CI Status

All checks passing ✅ (Build, Docker, CodeQL, Playwright shards 1 & 2, CodeRabbit)

gkorland and others added 4 commits March 23, 2026 16:13
Migrated from FalkorDB/code-graph-backend PR #57.
Original issue: FalkorDB/code-graph-backend#46
Resolves #544

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous implementation bypassed the AbstractAnalyzer interface entirely,
using non-existent types (Function, Struct), missing methods (graph.add_function,
graph.add_struct), wrong File constructor args, and text/bytes confusion.

This rewrite:
- Properly extends AbstractAnalyzer with super().__init__()
- Implements all required abstract methods (get_entity_label, get_entity_name,
  get_entity_docstring, get_entity_types, add_symbols, etc.)
- Uses self._captures() for correct QueryCursor-based tree-sitter queries
- Adds get_include_paths() helper for #include directive extraction
- Enables C/H support in source_analyzer.py (uncommented import, added to
  analyzers dict, LSP setup, and file globbing)
- Rewrites test_c_analyzer.py as standalone unit tests (no DB required)
- Adds #include directives to test fixture (src.c)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Covers get_entity_docstring (with and without comments),
ValueError for unknown entity types in get_entity_label/get_entity_name,
and resolve_path passthrough behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add test for empty include extraction and resolve_symbol unknown key
error path, consistent with JavaScript analyzer test conventions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@gkorland gkorland force-pushed the backend/add-include-directive-support branch from 97b05ef to c0162e6 on March 23, 2026 at 14:14
@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (2)
tests/test_c_analyzer.py (1)

63-80: ⚠️ Potential issue | 🟡 Minor

Tests pass silently if target entity is missing.

test_function_label, test_struct_label, and test_call_symbols only assert inside the loop. If the target entity ("add", "exp", or "main") isn't extracted, these tests exit without failure. This was noted in a previous review.

The test_docstring_extraction and test_no_docstring methods correctly use the return + self.fail() pattern — apply the same here:

💚 Proposed fix
     def test_function_label(self):
         """Functions should get the 'Function' label."""
         for entity in self.file.entities.values():
             if _entity_name(self.analyzer, entity) == "add":
                 self.assertEqual(self.analyzer.get_entity_label(entity.node), "Function")
+                return
+        self.fail("Function 'add' not found")

     def test_struct_label(self):
         """Structs should get the 'Struct' label."""
         for entity in self.file.entities.values():
             if _entity_name(self.analyzer, entity) == "exp":
                 self.assertEqual(self.analyzer.get_entity_label(entity.node), "Struct")
+                return
+        self.fail("Struct 'exp' not found")

     def test_call_symbols(self):
         """Function 'main' should have call symbols (calls to 'add')."""
         for entity in self.file.entities.values():
             if _entity_name(self.analyzer, entity) == "main":
                 call_syms = entity.symbols.get("call", [])
                 self.assertTrue(len(call_syms) > 0, "main should have call symbols")
+                return
+        self.fail("Function 'main' not found")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_c_analyzer.py` around lines 63 - 80, Each test currently only
asserts inside the loop so it passes silently if the target entity isn't
present; update test_function_label, test_struct_label, and test_call_symbols so
that when you find the matching entity (using _entity_name(self.analyzer,
entity) == "add" / "exp" / "main") you perform the existing assertion and
immediately return, and after the loop call self.fail("... not found") with a
clear message (e.g., "function 'add' not found", "struct 'exp' not found",
"function 'main' not found"); keep using analyzer.get_entity_label(entity.node)
and entity.symbols.get("call", []) for the assertions.
api/analyzers/c/analyzer.py (1)

30-35: ⚠️ Potential issue | 🟠 Major

Anonymous structs will crash the analysis pipeline.

This was flagged in a previous review and remains unaddressed. C allows anonymous structs (e.g., struct { int x; } value;), and the current code raises ValueError when name_node is missing. Since SourceAnalyzer.create_hierarchy() doesn't catch this exception, files containing anonymous structs will abort the entire analysis.

Consider returning a synthesized name or skipping unnamed structs:

🛡️ Proposed fix: synthesize a stable anonymous name
     def get_entity_name(self, node: Node) -> str:
         if node.type == 'struct_specifier':
             name_node = node.child_by_field_name('name')
             if name_node:
                 return name_node.text.decode('utf-8')
-            raise ValueError("Struct has no name")
+            # Anonymous struct - synthesize stable name from position
+            return f"<anon_struct_{node.start_byte}>"
         elif node.type == 'function_definition':
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/c/analyzer.py` around lines 30 - 35, The get_entity_name method
currently raises ValueError for anonymous structs which crashes analysis;
instead, detect missing name_node in get_entity_name (in
analyzer.get_entity_name) and return a stable synthesized name (e.g.,
"anon_struct_<row>_<col>" using node.start_point or a deterministic counter/hash
of node.start_byte) rather than raising; ensure the synthesized name is stable
across runs so create_hierarchy and other callers can use it consistently and do
not change exception behavior elsewhere.
🧹 Nitpick comments (1)
api/analyzers/c/analyzer.py (1)

100-106: Consider returning an empty list instead of raising for unknown keys.

The resolve_symbol method raises ValueError for unknown keys, but the call site in SourceAnalyzer.second_pass() (context snippet 2) has no exception handling. While add_symbols currently only adds the three known keys, a future change adding new symbol types would crash the entire pipeline.

Returning an empty list is a safer fallback that matches how other analyzers handle unknown cases:

♻️ Defensive fallback
     def resolve_symbol(self, files: dict[Path, File], lsp: SyncLanguageServer, file_path: Path, path: Path, key: str, symbol: Node) -> list[Entity]:
         if key in ["parameters", "return_type"]:
             return self.resolve_type(files, lsp, file_path, path, symbol)
         elif key in ["call"]:
             return self.resolve_method(files, lsp, file_path, path, symbol)
         else:
-            raise ValueError(f"Unknown key {key}")
+            logger.warning(f"Unknown symbol key '{key}' in C analyzer, skipping")
+            return []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/c/analyzer.py` around lines 100 - 106, The resolve_symbol
method currently raises ValueError for unknown keys causing uncaught exceptions
in callers like SourceAnalyzer.second_pass; change its fallback to return an
empty list instead of raising so unknown keys are safely ignored. Specifically,
in resolve_symbol (near resolve_type and resolve_method usages), replace the
final raise ValueError(f"Unknown key {key}") with return [] so callers such as
SourceAnalyzer.second_pass and any code that uses add_symbols won’t crash if new
symbol kinds appear.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@api/analyzers/c/analyzer.py`:
- Around line 30-35: The get_entity_name method currently raises ValueError for
anonymous structs which crashes analysis; instead, detect missing name_node in
get_entity_name (in analyzer.get_entity_name) and return a stable synthesized
name (e.g., "anon_struct_<row>_<col>" using node.start_point or a deterministic
counter/hash of node.start_byte) rather than raising; ensure the synthesized
name is stable across runs so create_hierarchy and other callers can use it
consistently and do not change exception behavior elsewhere.

In `@tests/test_c_analyzer.py`:
- Around line 63-80: Each test currently only asserts inside the loop so it
passes silently if the target entity isn't present; update test_function_label,
test_struct_label, and test_call_symbols so that when you find the matching
entity (using _entity_name(self.analyzer, entity) == "add" / "exp" / "main") you
perform the existing assertion and immediately return, and after the loop call
self.fail("... not found") with a clear message (e.g., "function 'add' not
found", "struct 'exp' not found", "function 'main' not found"); keep using
analyzer.get_entity_label(entity.node) and entity.symbols.get("call", []) for
the assertions.

---

Nitpick comments:
In `@api/analyzers/c/analyzer.py`:
- Around line 100-106: The resolve_symbol method currently raises ValueError for
unknown keys causing uncaught exceptions in callers like
SourceAnalyzer.second_pass; change its fallback to return an empty list instead
of raising so unknown keys are safely ignored. Specifically, in resolve_symbol
(near resolve_type and resolve_method usages), replace the final raise
ValueError(f"Unknown key {key}") with return [] so callers such as
SourceAnalyzer.second_pass and any code that uses add_symbols won’t crash if new
symbol kinds appear.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 641c4a68-4a6b-4e02-8ff0-7b486b29d144

📥 Commits

Reviewing files that changed from the base of the PR and between a5136d6 and c0162e6.

📒 Files selected for processing (5)
  • api/analyzers/c/__init__.py
  • api/analyzers/c/analyzer.py
  • api/analyzers/source_analyzer.py
  • tests/source_files/c/src.c
  • tests/test_c_analyzer.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/analyzers/source_analyzer.py



Development

Successfully merging this pull request may close these issues.

Add support for include_directive in C

2 participants