Fast parquet order inversion

### Is your feature request related to a problem or challenge?

Often parquet files are written in time-ascending order, i.e. new values are appended at the end. However many users would like to know the last value(s) of a given context. Currently this means that DataFusion would scan the entire parquet file and just keep the relevant bit at the end. However that seems wasteful and I think we can do better.

### Describe the solution you'd like

The solution here overlaps with things that we would need to implement in the arrow-rs repository, but I am using an umbrella ticket to illustrate the rough idea.

## Part 1 -- Order Inversion
If a parquet file is order `col1 ASC, col2 DESC, col3 ASC`, then reading it "back to front" should result in data ordered `col1 DESC, col2 ASC, col3 DESC`, i.e. the sequence of expressions is the same, but the polarity (ascending or descending) of each expression is flipped. So the DataFusion planner should (after all of this is implemented) that it is better to inverse the order and potentially apply a top-k on that than reading an entire file and perform a simple `Sort` operation.

## Part 2 -- Decode Parquet Pages in Reverse Order
Ideally we would be able to decode the parquet files backwards, i.e. "last value first". However this only works for a subset of [encodings](https://parquet.apache.org/docs/file-format/data-pages/encodings/), while others are clearly order-dependent. But if we look at a parquet file, this order only matters on a page level and technically pages (of a column) can be decoded in reverse order. That is true for ALL encodings.

## Part 3 -- Flip a Page
Reading the pages in reverse order however is still not enough. Imagine the following layout:

```yaml
pages:
 - values:
   - A
   - B
   - C
 - values:
   - D
   - E
```

Decoding pages in reverse order would yield `D, E, A, B C`. That's not right and we need to reverse the order within a page. Now my hypothesis is that "flipping" a page would only require inverting the order of an arrow buffer and in contrast to the generic [`take`](https://docs.rs/arrow/latest/arrow/compute/kernels/take/fn.take.html) kernel, this can be implemented in-place and without random access. This highly simplified code for an integer array shows the principle:

```rust
fn flip_u64_array(array: &mut [u64]) {
    // if you do this iterator instead of index-based,
    // you can even get rid of the bounds-checking
    for idx in 0..(array.len() / 2) {
        array.swap(idx, array.len() - idx);
    }
}
```

One can imagine how this even works for more complex types:
- dictionary encoded data: flip the array that contains the per-row dict keys, but NOT the actual dict
- nested data (like list): flip top-level array, but not the children (i.e. the lists themselves are NOT flipped)
- run-offset encoded data: flip both the offset and the value array
- string views: flip the offset array but NOT the values

A `flip` kernel would even be useful for DataFusion in general (outside the parquet context), i.e. when we know that we can use this optimization instead of a full `sort`.

### Describe alternatives you've considered

\-

### Additional context

\-

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fast parquet order inversion #17172

Is your feature request related to a problem or challenge?

Describe the solution you'd like

Part 1 -- Order Inversion

Part 2 -- Decode Parquet Pages in Reverse Order

Part 3 -- Flip a Page

Describe alternatives you've considered

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Fast parquet order inversion #17172

Description

Is your feature request related to a problem or challenge?

Describe the solution you'd like

Part 1 -- Order Inversion

Part 2 -- Decode Parquet Pages in Reverse Order

Part 3 -- Flip a Page

Describe alternatives you've considered

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions