Description
Is your feature request related to a problem or challenge?
The current filter pushdown APIs in DataFusion (FilterPushdownPropagation, PredicateSupports, etc.) have grown organically but now appear convoluted and redundant. The complex layering of abstractions makes the filter pushdown mechanism difficult to understand, maintain, and extend.
Specific issues include:
- Multiple overlapping abstraction layers (PredicateSupport, PredicateSupports, FilterDescription, etc.)
- Redundant helper methods with inconsistent naming patterns (.unsupported(), .transparent(), .with_filters(), .with_updated_node(), .new_with_supported_check(), .collect_supported(), .is_all_supported(), etc.)
- Complex mental model requiring developers to track multiple states and transformations
- Lack of clear documentation about the conceptual model and flow
- Inconsistent naming conventions (e.g., all_supported creates new objects while make_supported transforms existing ones)
These issues steepen the learning curve for new contributors and make maintenance harder for all developers.
Describe the solution you'd like
Redesign the filter pushdown APIs with a focus on simplicity, consistency, and clarity:
- Reduce abstraction layers: Consolidate the multiple wrappers into fewer, more focused data structures.
- Consistent API patterns: Use clear naming conventions:
  - with_* for non-mutating methods that return new objects
  - mark_* for transformations
  - collect_* for extraction methods
- Simplified core data structures:
/// A predicate with its support status for pushdown
enum PredicateWithSupport {
    Supported(Arc<dyn PhysicalExpr>),
    Unsupported(Arc<dyn PhysicalExpr>),
}

/// Collection of predicates with clearly defined operations
struct Predicates {
    // Core operations that are intuitive to use
    // ...
}

/// Clear result type for pushdown operations
struct FilterPushdownResult<T> {
    pushed_predicates: Vec<Arc<dyn PhysicalExpr>>,
    retained_predicates: Vec<Arc<dyn PhysicalExpr>>,
    updated_plan: Option<T>,
}
- More declarative approach: Let execution plan nodes declare which predicates they support rather than relying on complex negotiation.
- Better documentation: Add clear documentation about the mental model, flow, and expected usage patterns.
- Test coverage: Ensure robust test coverage for the new APIs to prevent regressions.
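To make the with_* / mark_* / collect_* conventions from the list above more concrete, here is a rough sketch of what a Predicates collection could look like. It is not a proposed final API. The PhysicalExpr trait below is a tiny stand-in so the snippet compiles without DataFusion, and Column, the inner field, with_predicate, mark_supported_if, collect_supported, collect_unsupported, and example_names are all hypothetical names chosen for illustration.

```rust
use std::sync::Arc;

/// Stand-in for DataFusion's `PhysicalExpr` (assumption: only a name is needed here).
trait PhysicalExpr: std::fmt::Debug {
    fn name(&self) -> &str;
}

/// Hypothetical expression type used only to build example predicates.
#[derive(Debug)]
struct Column(&'static str);

impl PhysicalExpr for Column {
    fn name(&self) -> &str {
        self.0
    }
}

/// A predicate with its support status for pushdown (as in the proposal).
#[derive(Debug, Clone)]
enum PredicateWithSupport {
    Supported(Arc<dyn PhysicalExpr>),
    Unsupported(Arc<dyn PhysicalExpr>),
}

/// Collection of predicates; the `inner` field is an assumption of this sketch.
#[derive(Debug, Clone, Default)]
struct Predicates {
    inner: Vec<PredicateWithSupport>,
}

impl Predicates {
    /// `with_*`: non-mutating, returns a new collection with one more
    /// predicate that starts out as unsupported.
    fn with_predicate(&self, expr: Arc<dyn PhysicalExpr>) -> Self {
        let mut inner = self.inner.clone();
        inner.push(PredicateWithSupport::Unsupported(expr));
        Self { inner }
    }

    /// `mark_*`: transformation, marks every predicate accepted by `accepts`
    /// as supported and leaves the rest unchanged.
    fn mark_supported_if(self, accepts: impl Fn(&dyn PhysicalExpr) -> bool) -> Self {
        let inner = self
            .inner
            .into_iter()
            .map(|p| match p {
                PredicateWithSupport::Unsupported(e) if accepts(e.as_ref()) => {
                    PredicateWithSupport::Supported(e)
                }
                other => other,
            })
            .collect();
        Self { inner }
    }

    /// `collect_*`: extraction of the supported predicates.
    fn collect_supported(&self) -> Vec<Arc<dyn PhysicalExpr>> {
        self.inner
            .iter()
            .filter_map(|p| match p {
                PredicateWithSupport::Supported(e) => Some(Arc::clone(e)),
                PredicateWithSupport::Unsupported(_) => None,
            })
            .collect()
    }

    /// `collect_*`: extraction of the unsupported predicates.
    fn collect_unsupported(&self) -> Vec<Arc<dyn PhysicalExpr>> {
        self.inner
            .iter()
            .filter_map(|p| match p {
                PredicateWithSupport::Unsupported(e) => Some(Arc::clone(e)),
                PredicateWithSupport::Supported(_) => None,
            })
            .collect()
    }
}

/// Usage: build a collection, mark what is supported, then extract both sides.
fn example_names() -> (Vec<String>, Vec<String>) {
    let predicates = Predicates::default()
        .with_predicate(Arc::new(Column("a")))
        .with_predicate(Arc::new(Column("b")))
        .mark_supported_if(|e| e.name() == "a");
    let names = |v: Vec<Arc<dyn PhysicalExpr>>| -> Vec<String> {
        v.iter().map(|e| e.name().to_string()).collect()
    };
    (
        names(predicates.collect_supported()),
        names(predicates.collect_unsupported()),
    )
}
```

The point of the sketch is that each prefix tells the caller what kind of operation to expect: with_* borrows and returns a new value, mark_* consumes and transforms, and collect_* only reads.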
This redesign should aim to reduce cognitive load for developers while maintaining all current functionality. It should also make future extensions to the filter pushdown system more straightforward.
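One possible shape for the "more declarative approach", again only as a sketch under assumptions: a node answers a yes/no question per predicate, and a generic driver builds the FilterPushdownResult from those answers. The PhysicalExpr stand-in, the DeclaresPushdown trait, its two methods, the push_down function, and the IndexedScan node are hypothetical and do not correspond to existing DataFusion APIs.

```rust
use std::sync::Arc;

/// Stand-in for DataFusion's `PhysicalExpr` (assumption: only a name is needed here).
trait PhysicalExpr: std::fmt::Debug {
    fn name(&self) -> &str;
}

/// Hypothetical expression type; a real predicate would be richer than a column name.
#[derive(Debug)]
struct Column(&'static str);

impl PhysicalExpr for Column {
    fn name(&self) -> &str {
        self.0
    }
}

/// Clear result type for pushdown operations (as in the proposal).
#[derive(Debug)]
struct FilterPushdownResult<T> {
    pushed_predicates: Vec<Arc<dyn PhysicalExpr>>,
    retained_predicates: Vec<Arc<dyn PhysicalExpr>>,
    updated_plan: Option<T>,
}

/// Hypothetical trait: a node declares which predicates it can evaluate itself.
trait DeclaresPushdown: Sized {
    /// Pure declaration with no side effects.
    fn supports_predicate(&self, predicate: &dyn PhysicalExpr) -> bool;
    /// Returns a copy of the node that applies `predicates` itself.
    fn with_pushed_predicates(&self, predicates: &[Arc<dyn PhysicalExpr>]) -> Self;
}

/// Generic driver: splits predicates by the node's declaration and rebuilds
/// the node only when at least one predicate was accepted.
fn push_down<T: DeclaresPushdown>(
    node: &T,
    predicates: Vec<Arc<dyn PhysicalExpr>>,
) -> FilterPushdownResult<T> {
    let (pushed, retained): (Vec<_>, Vec<_>) = predicates
        .into_iter()
        .partition(|p| node.supports_predicate(p.as_ref()));
    let updated_plan = if pushed.is_empty() {
        None
    } else {
        Some(node.with_pushed_predicates(&pushed))
    };
    FilterPushdownResult {
        pushed_predicates: pushed,
        retained_predicates: retained,
        updated_plan,
    }
}

/// Hypothetical scan node that can only filter on indexed columns.
#[derive(Debug, Clone)]
struct IndexedScan {
    indexed_columns: Vec<&'static str>,
    applied_filters: Vec<String>,
}

impl DeclaresPushdown for IndexedScan {
    fn supports_predicate(&self, predicate: &dyn PhysicalExpr) -> bool {
        self.indexed_columns.iter().any(|c| *c == predicate.name())
    }

    fn with_pushed_predicates(&self, predicates: &[Arc<dyn PhysicalExpr>]) -> Self {
        let mut scan = self.clone();
        scan.applied_filters
            .extend(predicates.iter().map(|p| p.name().to_string()));
        scan
    }
}

/// Usage: one predicate is accepted by the scan, the other stays with the parent.
fn example() -> FilterPushdownResult<IndexedScan> {
    let scan = IndexedScan {
        indexed_columns: vec!["id"],
        applied_filters: vec![],
    };
    let predicates: Vec<Arc<dyn PhysicalExpr>> =
        vec![Arc::new(Column("id")), Arc::new(Column("payload"))];
    push_down(&scan, predicates)
}
```

With this shape a node implementer only writes the declaration and the rebuild step. Whether that is expressive enough for every existing operator (for example, nodes that rewrite predicates before forwarding them to children) would need to be checked against the current implementations.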
Describe alternatives you've considered
No response
Additional context
The current implementation reflects the complexity of the problem space, but I believe it could be made more approachable with a clearer design focused on the essential operations and better documentation of the conceptual model.