Simplify Filter Pushdown APIs for Better Maintainability and Developer Experience #16188

Open
@kosiew

Is your feature request related to a problem or challenge?

The current filter pushdown APIs in DataFusion (FilterPushdownPropagation, PredicateSupports, etc.) have grown organically and now appear convoluted and redundant. The layered abstractions make the filter pushdown mechanism difficult to understand, maintain, and extend.

Specific issues include:

  • Multiple overlapping abstraction layers (PredicateSupport, PredicateSupports, FilterDescription, etc.)
  • Redundant helper methods with inconsistent naming patterns (.unsupported(), .transparent(), .with_filters(), .with_updated_node(), .new_with_supported_check(), .collect_supported(), .is_all_supported(), etc.)
  • Complex mental model requiring developers to track multiple states and transformations
  • Lack of clear documentation about the conceptual model and flow
  • Inconsistent naming conventions (e.g., all_supported creates new objects while make_supported transforms existing ones)

These issues increase the learning curve for new contributors and make maintenance more challenging for all developers.

Describe the solution you'd like

Redesign the filter pushdown APIs with a focus on simplicity, consistency, and clarity:

  1. Reduce abstraction layers: Consolidate the multiple wrappers into fewer, more focused data structures.

  2. Consistent API patterns: Use clear naming conventions:

  • with_* for non-mutating methods that return new objects
  • mark_* for methods that transform existing objects
  • collect_* for extraction methods

  3. Simplified core data structures:

```rust
use std::sync::Arc;

use datafusion_physical_expr::PhysicalExpr;

/// A predicate with its support status for pushdown
enum PredicateWithSupport {
    Supported(Arc<dyn PhysicalExpr>),
    Unsupported(Arc<dyn PhysicalExpr>),
}

/// Collection of predicates with clearly defined operations
struct Predicates {
    // Core operations that are intuitive to use
    // ...
}

/// Clear result type for pushdown operations
struct FilterPushdownResult<T> {
    pushed_predicates: Vec<Arc<dyn PhysicalExpr>>,
    retained_predicates: Vec<Arc<dyn PhysicalExpr>>,
    updated_plan: Option<T>,
}
```

  4. More declarative approach: Let execution plan nodes declare which predicates they support rather than relying on complex negotiation.

  5. Better documentation: Add clear documentation about the mental model, flow, and expected usage patterns.

  6. Test coverage: Ensure robust test coverage for the new APIs to prevent regressions.
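To make the with_* / mark_* / collect_* naming convention concrete, here is a minimal sketch of how a Predicates collection could expose one method of each kind. Everything in it is an assumption for illustration: the local PhysicalExpr trait is a stand-in so the snippet compiles without DataFusion, Column is a toy expression, and the method names with_predicate, mark_all_unsupported, and collect_supported are hypothetical rather than a proposed final API.

```rust
use std::fmt::Debug;
use std::sync::Arc;

/// Stand-in for DataFusion's `PhysicalExpr` (assumption: lets the sketch compile alone).
trait PhysicalExpr: Debug {}

/// Hypothetical toy expression used only for illustration.
#[derive(Debug)]
struct Column(&'static str);
impl PhysicalExpr for Column {}

/// A predicate with its support status for pushdown.
#[derive(Debug, Clone)]
enum PredicateWithSupport {
    Supported(Arc<dyn PhysicalExpr>),
    Unsupported(Arc<dyn PhysicalExpr>),
}

/// Collection of predicates; the `inner` field is an assumption of this sketch.
#[derive(Debug, Clone, Default)]
struct Predicates {
    inner: Vec<PredicateWithSupport>,
}

impl Predicates {
    /// `with_*`: leaves `self` untouched and returns a new collection.
    fn with_predicate(&self, predicate: PredicateWithSupport) -> Self {
        let mut inner = self.inner.clone();
        inner.push(predicate);
        Self { inner }
    }

    /// `mark_*`: consumes the collection and changes the support status.
    fn mark_all_unsupported(self) -> Self {
        let inner = self
            .inner
            .into_iter()
            .map(|p| match p {
                PredicateWithSupport::Supported(e)
                | PredicateWithSupport::Unsupported(e) => {
                    PredicateWithSupport::Unsupported(e)
                }
            })
            .collect();
        Self { inner }
    }

    /// `collect_*`: extracts the supported expressions without modifying `self`.
    fn collect_supported(&self) -> Vec<Arc<dyn PhysicalExpr>> {
        self.inner
            .iter()
            .filter_map(|p| match p {
                PredicateWithSupport::Supported(e) => Some(Arc::clone(e)),
                PredicateWithSupport::Unsupported(_) => None,
            })
            .collect()
    }
}
```

With this shape, the method prefix alone tells a caller whether a call copies, transforms, or extracts, which is the consistency the naming convention is meant to provide.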

This redesign should aim to reduce cognitive load for developers while maintaining all current functionality. It should also make future extensions to the filter pushdown system more straightforward.
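As a rough illustration of the declarative approach, the sketch below lets a node answer a single yes/no question per predicate, and a generic helper splits the predicates into pushed and retained sets. All names here are assumptions made for the example: the local PhysicalExpr trait with its columns method is a stand-in for the real DataFusion trait, and DeclaresPushdownSupport, IndexedScan, ColumnPredicate, and push_down are hypothetical.

```rust
use std::fmt::Debug;
use std::sync::Arc;

/// Stand-in for DataFusion's `PhysicalExpr` (assumption for a self-contained sketch).
trait PhysicalExpr: Debug {
    /// Hypothetical helper: names of the columns this expression reads.
    fn columns(&self) -> Vec<String>;
}

/// Hypothetical predicate that reads exactly one column.
#[derive(Debug)]
struct ColumnPredicate {
    column: String,
}

impl PhysicalExpr for ColumnPredicate {
    fn columns(&self) -> Vec<String> {
        vec![self.column.clone()]
    }
}

/// Result type from the proposal above.
struct FilterPushdownResult<T> {
    pushed_predicates: Vec<Arc<dyn PhysicalExpr>>,
    retained_predicates: Vec<Arc<dyn PhysicalExpr>>,
    updated_plan: Option<T>,
}

/// Hypothetical trait: a node declares, per predicate, whether it can evaluate it.
trait DeclaresPushdownSupport {
    fn supports_predicate(&self, predicate: &dyn PhysicalExpr) -> bool;
}

/// Hypothetical scan node that can only filter on its indexed columns.
struct IndexedScan {
    indexed_columns: Vec<String>,
}

impl DeclaresPushdownSupport for IndexedScan {
    fn supports_predicate(&self, predicate: &dyn PhysicalExpr) -> bool {
        predicate
            .columns()
            .iter()
            .all(|c| self.indexed_columns.contains(c))
    }
}

/// Splits predicates into pushed and retained sets based on the node's declaration.
/// This sketch never rewrites the node, so `updated_plan` is always `None`.
fn push_down<N: DeclaresPushdownSupport>(
    node: &N,
    predicates: Vec<Arc<dyn PhysicalExpr>>,
) -> FilterPushdownResult<N> {
    let (pushed, retained): (Vec<_>, Vec<_>) = predicates
        .into_iter()
        .partition(|p| node.supports_predicate(p.as_ref()));
    FilterPushdownResult {
        pushed_predicates: pushed,
        retained_predicates: retained,
        updated_plan: None,
    }
}
```

In this design the node carries no pushdown state of its own; whether a real implementation could stay this simple would depend on how parent and child nodes need to coordinate, which this sketch does not attempt to model.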

Describe alternatives you've considered

No response

Additional context

The current implementation reflects the complexity of the problem space, but I believe it could be made more approachable with a clearer design focused on the essential operations and better documentation of the conceptual model.

Labels: enhancement (New feature or request)
