Add a way to show the contents of the ListFilesCache in datafusion-cli #19055

@alamb

Is your feature request related to a problem or challenge?

As we roll out the ListFilesCache from @BlakeOrth in #18855, it would be very helpful to be able to see its contents so we can debug any issues we run into.

@nuno-faria made a really nice feature to view the contents of the file metadata cache: metadata_cache()

For example:

> select * from metadata_cache();
+---------------------------------------------------+---------------------+-----------------+--------------------------------------+---------+---------------------+------+------------------+
| path                                              | file_modified       | file_size_bytes | e_tag                                | version | metadata_size_bytes | hits | extra            |
+---------------------------------------------------+---------------------+-----------------+--------------------------------------+---------+---------------------+------+------------------+
| hits_compatible/athena_partitioned/hits_1.parquet | 2022-07-03T15:33:57 | 174965044       | "1f5da68e097309811a675c849491ac48-9" | NULL    | 165128              | 0    | page_index=false |
+---------------------------------------------------+---------------------+-----------------+--------------------------------------+---------+---------------------+------+------------------+
1 row(s) fetched.
Elapsed 0.005 seconds.

Describe the solution you'd like

I would like a table function similar to metadata_cache() for the list files cache. Since each entry is a Vec<ObjectMeta>, one option would be to flatten the entries so there is one row per ObjectMeta stored.

Something like:

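> select * from list_files_cache();
+----------+---------------------+-----------------+-------+---------+---------------------+---------------------+
| path     | file_modified       | file_size_bytes | e_tag | version | metadata_size_bytes | expires             |
+----------+---------------------+-----------------+-------+---------+---------------------+---------------------+
| /foo/bar | 2022-07-03T15:33:57 | 1234            | ...   | ...     | 132                 | NULL                |
| /foo/baz | 2022-07-03T15:33:57 | 5678            | ...   | ...     | 3112                | 2026-07-03T15:33:57 |
| ...      | ...                 | ...             | ...   | ...     | ...                 | ...                 |
+----------+---------------------+-----------------+-------+---------+---------------------+---------------------+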

Here, metadata_size_bytes shows the size of the statistics in bytes, and expires shows when the entry expires.

This would mean that a single ListFilesEntry object is displayed as multiple rows.

It would also mean we would have to find some way to represent a ListFilesEntry that has no entries (i.e. metas is an empty Vec). Perhaps it could be shown as a single row that is NULL in every column except expires:

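+------+---------------+-----------------+-------+---------+---------------------+---------------------+
| path | file_modified | file_size_bytes | e_tag | version | metadata_size_bytes | expires             |
+------+---------------+-----------------+-------+---------+---------------------+---------------------+
| NULL | NULL          | NULL            | NULL  | NULL    | NULL                | 2026-07-03T15:33:57 |
+------+---------------+-----------------+-------+---------+---------------------+---------------------+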
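
To make the flattening concrete, here is a rough sketch of how entries could be turned into rows, including the empty-entry case. It is only an illustration under some assumptions: `ObjectMeta` and `ListFilesEntry` below are simplified, hypothetical stand-ins (not the real `object_store` or DataFusion types), `Row` and `flatten` are made-up names, and the file_modified and metadata_size_bytes columns are left out for brevity.

```rust
use std::time::SystemTime;

/// Hypothetical, simplified stand-in for `object_store::ObjectMeta`.
#[derive(Debug)]
struct ObjectMeta {
    location: String,
    size: u64,
    e_tag: Option<String>,
    version: Option<String>,
}

/// Hypothetical cache entry: a list of metas plus an optional expiry.
struct ListFilesEntry {
    metas: Vec<ObjectMeta>,
    expires: Option<SystemTime>,
}

/// One output row. Every column is nullable so that an empty entry
/// can still be represented.
#[derive(Debug, PartialEq)]
struct Row {
    path: Option<String>,
    file_size_bytes: Option<u64>,
    e_tag: Option<String>,
    version: Option<String>,
    expires: Option<SystemTime>,
}

/// Emit one row per `ObjectMeta`. An entry whose `metas` is empty
/// yields a single row that is NULL everywhere except `expires`.
fn flatten(entries: &[ListFilesEntry]) -> Vec<Row> {
    let mut rows = Vec::new();
    for entry in entries {
        if entry.metas.is_empty() {
            rows.push(Row {
                path: None,
                file_size_bytes: None,
                e_tag: None,
                version: None,
                expires: entry.expires,
            });
            continue;
        }
        for meta in &entry.metas {
            rows.push(Row {
                path: Some(meta.location.clone()),
                file_size_bytes: Some(meta.size),
                e_tag: meta.e_tag.clone(),
                version: meta.version.clone(),
                expires: entry.expires,
            });
        }
    }
    rows
}
```

With this shape, an entry holding two files produces two rows that share the same expires value, and an empty entry still shows up as one row, so it stays visible when debugging. A real implementation would presumably build Arrow arrays rather than a Vec of structs.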

Describe alternatives you've considered

No response

Additional context

No response
