Skip to content

Commit 7c03b52

Browse files
shivasuryaclaude
andauthored
fix(python/sast): Fix instance method return type propagation for OOP patterns (#484)
* docs(claude): Update CLAUDE.md to remove deprecated query command references - Remove outdated 'pathfinder query' CLI command (removed pre-v1.0.0) - Replace Query Language section with Python DSL Rules documentation - Update architecture diagram to reflect current MCP server + call graph design - Update Core Packages to include callgraph/resolution modules - Remove pagination references (feature no longer exists) - Update test examples to reference actual test files - Add current CLI commands: serve, scan, ci, resolution-report, diagnose * fix(python/sast): Fix instance method return type propagation for OOP patterns **Problem:** Instance method calls like obj.method() were failing with 80% of Python OOP code being unresolved. Two critical bugs were found: 1. UpdateVariableBindingsWithFunctionReturns() couldn't distinguish between instance.method() and module.function() patterns 2. Return type extraction didn't track class context, so method return types were indexed without class-qualified FQNs **Example of failing code:** ```python manager = UserManager() user = manager.create_user("bob") # Creates placeholder "call:manager.create_user" user.is_valid() # FAILED - couldn't resolve ``` **Root Causes:** Bug 1 (inference.go:136-139): - When funcName contained dots (e.g., "manager.create_user"), code assumed it was a module path and used it as-is - But "manager.create_user" is receiver.method, NOT module.function - Should have checked if "manager" is a variable, extracted its type, and built class-qualified FQN "main.UserManager.create_user" Bug 2 (return_type.go:56-68): - Return type extraction only tracked function_definition nodes, not class_definition - Methods inside classes got FQNs like "main.create_user" instead of "main.UserManager.create_user" - Return type lookup failed because keys didn't match **Fixes:** 1. Enhanced UpdateVariableBindingsWithFunctionReturns() to: - Split dotted funcNames into receiver + method - Check if receiver is a variable in current scope - If yes, use receiver's type to build class-qualified FQN - If no, assume it's a module path (backward compatible) 2. Enhanced return type extraction to: - Track class_definition nodes and build class context - Build class-qualified FQNs for methods (module.ClassName.methodName) - Added extractClassNameFromNode() helper function **Impact:** - Resolution rate improved from 80% → 93.3% (+13.3%) - Type inference usage increased from 33.3% → 42.9% of resolved calls - Instance method calls now work: obj.method() → Class.method ✅ - Chained calls now work: user.get_profile().update() ✅ - Module functions still work: logging.getLogger() ✅ **Test Coverage:** Added 5 comprehensive test functions covering: - Basic instance method calls (receiver.method) - Chained instance methods (multi-pass resolution) - Instance.method vs module.function disambiguation - Unresolved receiver type handling (edge case) - Nested method calls (a.b.c.method pattern) All quality gates pass: ✅ gradle testGo - all tests pass ✅ gradle buildGo - clean build ✅ gradle lintGo - no lint errors Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * test(python/sast): Add comprehensive tests for class context tracking in return type extraction - Add 6 new test functions covering class-qualified FQN generation - Test class methods, nested classes, and extractClassNameFromNode helper - Achieves 100% coverage for new code in return_type.go (lines 60-71, 426-440) - All tests pass: gradle testGo, lintGo clean Tests added: - TestExtractReturnTypes_ClassMethods: Methods get class-qualified FQNs - TestExtractReturnTypes_NestedClasses: Nested class methods work correctly - TestExtractReturnTypes_ModuleLevelAndClassMethod: Distinguish module vs class functions - TestExtractClassNameFromNode: Direct test of helper function - TestExtractReturnTypes_EmptyClass: Edge case with no methods - TestExtractReturnTypes_ClassWithMultipleMethods: Multiple methods in same class Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent e0efe83 commit 7c03b52

File tree

5 files changed

+643
-111
lines changed

5 files changed

+643
-111
lines changed

CLAUDE.md

Lines changed: 99 additions & 105 deletions
Original file line numberDiff line numberDiff line change
@@ -27,60 +27,72 @@ golangci-lint run
2727

2828
### Running the Binary
2929
```bash
30-
# Interactive query mode
31-
./build/go/pathfinder query --project <path> --stdin
30+
# MCP server mode (for AI coding assistants)
31+
./build/go/pathfinder serve --project <path>
32+
./build/go/pathfinder serve --http --address :8080 --project <path>
3233

33-
# CI mode (loads rules from remote/local)
34+
# Scan mode (using Python DSL rules)
35+
./build/go/pathfinder scan --project <path> --ruleset <path_to_rules>
36+
37+
# CI mode (loads rules from remote/local, outputs SARIF/JSON/CSV)
3438
./build/go/pathfinder ci --project <path> --ruleset cpf/java --output sarif
3539

36-
# Scan mode
37-
./build/go/pathfinder scan --project <path> --ruleset <path_to_rules>
40+
# Resolution diagnostics
41+
./build/go/pathfinder resolution-report --project <path>
3842

39-
# With pagination
40-
./build/go/pathfinder query --project <path> --page 1 --size 10
43+
# Taint analysis diagnostics
44+
./build/go/pathfinder diagnose --project <path>
4145
```
4246

4347
### Running a Single Test
4448
```bash
45-
go test -v -run TestPaginationSorting ./cmd/
49+
go test -v -run TestBuildCallGraph ./graph/callgraph/builder/
50+
go test -v -run TestTypeInference ./graph/callgraph/resolution/
4651
```
4752

4853
## High-Level Architecture
4954

50-
Code Pathfinder is a multi-stage security analysis pipeline:
55+
Code Pathfinder is a multi-stage analysis pipeline:
5156

5257
```
53-
Source Files (.java, .py)
58+
Source Files (.py, .java, Dockerfile, docker-compose.yml)
59+
60+
Tree-Sitter AST Parsing (parallel workers)
5461
55-
Tree-Sitter AST Parsing (5 parallel workers)
62+
Code Graph (Functions, Classes, Call Edges)
5663
57-
Code Graph (Nodes + Edges)
64+
Type Inference Engine (bidirectional, return types, variable assignments)
5865
59-
Query Language (ANTLR parser)
66+
Call Graph Builder (5-pass algorithm)
6067
61-
Query Engine (expr-lang evaluation)
68+
MCP Server / Python DSL Rules
6269
63-
Output Formats (JSON, SARIF, Table)
70+
Output Formats (JSON, SARIF, CSV, Text)
6471
```
6572

6673
### Core Packages
6774

6875
**sast-engine/graph/** - Code graph construction and management
69-
- `initialize.go`: Multi-threaded file parsing with 5 workers
76+
- `initialize.go`: Multi-threaded file parsing with parallel workers
7077
- `parser.go`: AST traversal orchestrator (language-agnostic entry point)
7178
- `parser_java.go`: Java-specific node parsing
7279
- `parser_python.go`: Python-specific node parsing
73-
- `query.go`: Query execution engine with Cartesian product optimization
7480
- `utils.go`: SHA256-based ID generation, file operations
7581

76-
**sast-engine/antlr/** - Query language parsing
77-
- `Query.g4`: ANTLR grammar for PathFinder query language
78-
- `listener_impl.go`: Semantic analysis of parsed queries
82+
**sast-engine/graph/callgraph/** - Call graph and type inference
83+
- `builder/builder.go`: 5-pass call graph construction algorithm
84+
- `resolution/inference.go`: Type inference engine with bidirectional inference
85+
- `resolution/return_type.go`: Return type extraction from AST
86+
- `extraction/variables.go`: Variable assignment type tracking
87+
- `registry/attribute.go`: Class attribute registry
88+
- `registry/builtin.go`: Python builtin types registry
7989

8090
**sast-engine/cmd/** - CLI interface
81-
- `query.go`: Interactive/batch query execution with pagination
82-
- `ci.go`: CI/CD integration with rule loading from codepathfinder.dev
91+
- `serve.go`: MCP server for AI coding assistants
8392
- `scan.go`: Scan project against local ruleset
93+
- `ci.go`: CI/CD integration with rule loading from codepathfinder.dev
94+
- `resolution_report.go`: Call resolution diagnostics
95+
- `diagnose.go`: Taint analysis diagnostics
8496

8597
**sast-engine/model/** - AST data models
8698
- `stmt.go`: Statement models (if/while/for/blocks)
@@ -222,73 +234,55 @@ This project **requires CGO** due to `go-tree-sitter` C bindings. Build fails wi
222234
- Variable assignments (no type information)
223235
- Simplified compared to Java (no invocation linking yet)
224236

225-
## Query Language
226-
227-
### Syntax
237+
## Python DSL Rules (v1.0.0+)
238+
239+
Code Pathfinder uses **Python DSL** for writing security rules. Rules are Python functions that query the call graph using the MCP interface.
240+
241+
### Example Rule
242+
```python
243+
from pathfinder import Rule, Severity
244+
245+
@Rule(
246+
id="CUSTOM-001",
247+
name="Detect SQL Injection Risk",
248+
severity=Severity.HIGH,
249+
description="Methods that process payments should not directly execute SQL queries"
250+
)
251+
def detect_sql_injection(graph):
252+
"""Find payment methods calling SQL execution."""
253+
payment_methods = graph.find_symbol(name="processPayment", type="method")
254+
255+
for method in payment_methods:
256+
callees = graph.get_callees(function=method.fqn)
257+
for callee in callees:
258+
if "executeQuery" in callee.target_fqn or "execute" in callee.target_fqn:
259+
yield {
260+
"file": method.file,
261+
"line": callee.call_line,
262+
"message": f"SQL execution in payment method: {method.fqn}"
263+
}
228264
```
229-
FROM <entity_type> AS <alias> [, <entity_type> AS <alias>]
230-
[WHERE <expression>]
231-
SELECT <output_fields>
232-
```
233-
234-
### Entity Types
235-
- `method_declaration` - Java methods, Python functions
236-
- `class_declaration` - Java/Python classes
237-
- `variable_declaration` - Java fields, Python variables
238-
- `method_invocation` - Java method calls
239-
- `*_expression` - Binary operations (add_expression, eq_expression, etc.)
240-
- `*_statement` - Control flow (if_statement, while_statement, etc.)
241-
242-
### Entity Environment Methods
243-
244-
Each entity type exposes specific methods in WHERE/SELECT clauses:
245-
246-
**method_declaration**:
247-
- `getName()`, `getVisibility()`, `getReturnType()`
248-
- `getArgumentTypes()`, `getArgumentName()`
249-
- `getDoc()`, `getAnnotation()`, `hasAccess()`
250-
251-
**class_declaration**:
252-
- `getName()`, `getSuperClass()`, `getInterface()`
253-
- `getVisibility()`, `getDoc()`
254265

255-
**variable_declaration**:
256-
- `getName()`, `getVisibility()`
257-
- `getVariableDataType()`, `getVariableValue()`
266+
### Available MCP Tools in Rules
267+
- `find_symbol(name, type)` - Find symbols by name and type
268+
- `get_callees(function)` - Get functions called by a function
269+
- `get_callers(function)` - Get functions that call a function
270+
- `get_call_details(caller, callee)` - Get specific call site details
271+
- `find_module(name)` - Find modules by name
272+
- `list_modules()` - List all modules
258273

259-
**method_invocation**:
260-
- `getName()`, `getAccessFromClass()`, `getAccessFromMethod()`
274+
### Running Rules
275+
```bash
276+
# Single rule file
277+
pathfinder scan --rules rules/my_rule.py --project /path/to/project
261278

262-
### Query Execution Flow
263-
```
264-
Query String
265-
266-
ANTLR Parse → Query AST
267-
268-
Generate Cartesian Product of Entity Types
269-
270-
Build Environment Map (pooled)
271-
272-
Compile Expression (expr-lang)
273-
274-
Filter Entities (evaluate each combination)
275-
276-
Sort Results (File → LineNumber → ID)
277-
278-
Apply Pagination (if --page/--size specified)
279-
280-
Format Output (json/table/sarif)
281-
```
279+
# Directory of rules
280+
pathfinder scan --rules rules/ --project /path/to/project
282281

283-
### Example Query
284-
```
285-
FROM method_declaration AS md, method_invocation AS mi
286-
WHERE md.getName() == "processPayment" && mi.getName() == "executeQuery"
287-
SELECT md, mi, "SQL injection risk in payment processing"
282+
# Remote ruleset bundle
283+
pathfinder scan --ruleset docker/security --project /path/to/project
288284
```
289285

290-
This finds methods named `processPayment` that invoke methods named `executeQuery`, potentially indicating SQL injection vulnerabilities.
291-
292286
## Testing Patterns
293287

294288
### Table-Driven Tests
@@ -312,10 +306,11 @@ for _, tt := range tests {
312306
```
313307

314308
### Test Organization
315-
- `graph/query_test.go` - Query execution tests
316-
- `graph/initialize_test.go` - Graph initialization tests
317-
- `antlr/listener_impl_test.go` - Query parsing tests
318-
- `cmd/query_test.go` - CLI and pagination tests
309+
- `graph/callgraph/builder/builder_test.go` - Call graph builder tests
310+
- `graph/callgraph/resolution/*_test.go` - Type inference and resolution tests
311+
- `cmd/scan_test.go` - Scan command tests
312+
- `cmd/ci_test.go` - CI command tests
313+
- `cmd/resolution_report_test.go` - Resolution diagnostics tests
319314

320315
### Type Matching in Tests
321316
When creating test nodes, ensure types match the Node struct:
@@ -377,12 +372,11 @@ result := run.CreateResultForRule(ruleID)
377372

378373
SARIF output integrates with GitHub Code Scanning, VSCode, and other security platforms.
379374

380-
### Pagination Determinism
381-
Results are sorted **before** pagination to ensure consistency across runs:
375+
### Result Determinism
376+
Results are sorted to ensure consistency across runs:
382377
```go
383-
// In cmd/query.go
384-
sort.SliceStable(pairs, func(i, j int) bool {
385-
// Sort by File → LineNumber → ID
378+
// Sort by File → LineNumber → ID
379+
sort.SliceStable(results, func(i, j int) bool {
386380
if nodeI.File != nodeJ.File {
387381
return nodeI.File < nodeJ.File
388382
}
@@ -419,22 +413,22 @@ The npm package downloads pre-built binaries from GitHub releases:
419413

420414
Releases must include binaries for linux-amd64, darwin-amd64, darwin-arm64, and windows-amd64.
421415

422-
## Query Performance Considerations
416+
## Performance Considerations
423417

424418
### Memory Usage
425-
- Small codebase (<1k methods): ~100 MB
426-
- Large codebase (27k methods): ~2.18 GB with lazy loading
427-
- Pagination does NOT reduce memory (sorting requires all results in memory)
428-
- Pagination reduces output size (37 MB → 4.7 KB for page size 10)
419+
- Small codebase (<1k functions): ~100 MB
420+
- Large codebase (27k functions): ~2.18 GB with lazy loading
421+
- MCP server keeps index in memory for fast queries
422+
- Call graph includes functions, edges, and type information
429423

430-
### Execution Time
431-
- Graph building: ~5 seconds for 27k methods (5 workers)
432-
- Query execution: <1 second for simple queries
433-
- Multi-entity queries: O(n²) for 2 entities, can be slow for large graphs
424+
### Indexing Time
425+
- Graph building: ~5 seconds for 27k functions (parallel workers)
426+
- Return type extraction: Parallel across all files
427+
- Variable assignment extraction: Parallel across all files
428+
- Class attribute extraction: Parallel across all files
434429

435430
### Optimization Tips
436-
1. Use specific entity types in FROM clause (not wildcards)
437-
2. Add WHERE conditions that filter early
438-
3. Avoid multi-entity queries on unrelated types
439-
4. Use pagination for large result sets (output size, not memory)
440-
5. Run with `--verbose` to debug slow queries
431+
1. Use `--verbose` flag to see indexing statistics
432+
2. Large projects benefit from more CPU cores (PATHFINDER_MAX_WORKERS env var)
433+
3. MCP queries are fast (index pre-built)
434+
4. Python DSL rules run sequentially - keep rules focused

sast-engine/graph/callgraph/resolution/inference.go

Lines changed: 24 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -132,11 +132,31 @@ func (te *TypeInferenceEngine) UpdateVariableBindingsWithFunctionReturns() {
132132
// Build FQN for the function call
133133
var funcFQN string
134134

135-
// Check if funcName already contains dots (e.g., "logging.getLogger", "MySerializer")
135+
// Check if funcName contains dots (could be module.function OR receiver.method)
136136
if strings.Contains(funcName, ".") {
137-
// Already qualified (e.g., imported module.function)
138-
// Try as-is first
139-
funcFQN = funcName
137+
// Split to check if it's an instance method call (receiver.method)
138+
// vs. a module function call (module.function)
139+
parts := strings.SplitN(funcName, ".", 2)
140+
receiver := parts[0]
141+
methodName := parts[1]
142+
143+
// Check if receiver is a variable in current scope (instance method)
144+
if receiverBinding, exists := scope.Variables[receiver]; exists {
145+
// This is an instance method call: obj.method()
146+
if receiverBinding.Type != nil && !strings.HasPrefix(receiverBinding.Type.TypeFQN, "call:") {
147+
// Receiver has a concrete type - build class-qualified FQN
148+
// Example: receiver="manager" with type="main.UserManager", method="create_user"
149+
// Result: "main.UserManager.create_user"
150+
funcFQN = receiverBinding.Type.TypeFQN + "." + methodName
151+
} else {
152+
// Receiver type is unresolved placeholder - skip for now
153+
// This variable will be resolved in a future iteration
154+
continue
155+
}
156+
} else {
157+
// Not a variable - assume it's a module path (e.g., "logging.getLogger")
158+
funcFQN = funcName
159+
}
140160
} else {
141161
// Simple name - need to qualify with current scope
142162
lastDotIndex := strings.LastIndex(scope.FunctionFQN, ".")

0 commit comments

Comments
 (0)