@cj-zhukov
Contributor

Which issue does this PR close?

Rationale for this change

This PR consolidates all the dataframe examples (dataframe, default_column_values, deserialize_to_struct) into a single example binary. We have agreed on the pattern, and it can be applied to the remaining examples.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@cj-zhukov
Contributor Author

High-Level Overview

This PR consolidates all dataframe examples (dataframe, default_column_values, deserialize_to_struct) into a single example binary.
Previously, each example had its own file, but now they can be executed via subcommands using:

cargo run --example dataframe -- [dataframe|default_column_values|deserialize_to_struct]
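
The subcommand pattern described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code from this PR: the `ALL` constant, the `name` method, the `FromStr` implementation, and the `run` stand-in are all hypothetical, and the real examples are async functions returning a DataFusion `Result`.

```rust
use std::str::FromStr;

/// Hypothetical enum naming the examples bundled into one binary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExampleKind {
    Dataframe,
    DefaultColumnValues,
    DeserializeToStruct,
}

impl ExampleKind {
    /// Every selectable example, in the order shown to the user (assumed helper).
    const ALL: [ExampleKind; 3] = [
        ExampleKind::Dataframe,
        ExampleKind::DefaultColumnValues,
        ExampleKind::DeserializeToStruct,
    ];

    /// The subcommand string that selects this example.
    fn name(self) -> &'static str {
        match self {
            ExampleKind::Dataframe => "dataframe",
            ExampleKind::DefaultColumnValues => "default_column_values",
            ExampleKind::DeserializeToStruct => "deserialize_to_struct",
        }
    }
}

impl FromStr for ExampleKind {
    type Err = String;

    /// Map a command-line argument to an example, or report it as unknown.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Self::ALL
            .iter()
            .copied()
            .find(|kind| kind.name() == s)
            .ok_or_else(|| format!("Unknown example: {s}"))
    }
}

/// Stand-in for calling the selected example function.
fn run(kind: ExampleKind) -> String {
    format!("running {}", kind.name())
}

fn main() {
    // The first argument after `--` selects the example to run.
    match std::env::args().nth(1) {
        Some(arg) => match arg.parse::<ExampleKind>() {
            Ok(kind) => println!("{}", run(kind)),
            Err(err) => eprintln!("{err}"),
        },
        None => eprintln!("Missing argument"),
    }
}
```

Keeping the name-to-variant mapping in one place means the list of valid subcommands and the dispatch logic cannot drift apart when an example is added or moved.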

Contributor

@alamb left a comment

Thank you @cj-zhukov and @martin-g

I have a suggestion about moving one of these examples to another file -- let me know if it makes sense

///
/// The metadata-based approach provides a flexible way to store default values as strings
/// and cast them to the appropriate types at query time.
pub async fn default_column_values() -> Result<()> {
Contributor

I am not quite sure that this is a "dataframe" example -- it does use the DataFrame API but I think it is really illustrating how to provide a custom data source that fills in default column values

I suggest we move it to https://github.com/apache/datafusion/tree/main/datafusion-examples/examples/custom_data_source instead

Contributor Author

I agree, let's move the example.

let usage = format!(
    "Usage: cargo run --example {} -- [{}]",
    ExampleKind::EXAMPLE_NAME,
    ExampleKind::variants().join("|")
Contributor

BTW for some reason

cargo run --example dataframe

doesn't show me the valid options, it only tells me the argument is missing 🤔

Usage: cargo run --example dataframe -- [dataframe|default_column_values|deserialize_to_struct]
Error: Execution("Missing argument")

Member

The options are listed here: '-- [dataframe|default_column_values|deserialize_to_struct]'

Maybe another format should be used...

Contributor Author

Thanks! I see what you mean - while the valid options are included in the usage line, they are not very visible, especially since they appear only after the “Missing argument” error.
I agree the formatting could be improved to make the valid options clearer. We could print them separately (as a list) or improve the usage string so it’s easier to read.
I'm happy to adjust the formatting if you have a preferred style, or I can propose one:

Usage:
    cargo run --example dataframe <EXAMPLE>
Examples:
    dataframe
    default_column_values
    deserialize_to_struct
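
One way the proposed layout could be produced is sketched below. The `usage` function and its parameters are hypothetical and only illustrate the formatting; in the PR the example name and variant list come from `ExampleKind::EXAMPLE_NAME` and `ExampleKind::variants()`, whose definitions are not shown here.

```rust
/// Build a multi-line usage message that lists each valid example on its own line.
/// (Hypothetical helper, written to match the layout proposed above.)
fn usage(example_name: &str, variants: &[&str]) -> String {
    let mut out = String::from("Usage:\n");
    out.push_str(&format!("    cargo run --example {example_name} <EXAMPLE>\n"));
    out.push_str("Examples:\n");
    for variant in variants {
        out.push_str(&format!("    {variant}\n"));
    }
    out
}

fn main() {
    let variants = ["dataframe", "default_column_values", "deserialize_to_struct"];
    // Print to stderr, since the message accompanies an error about a missing argument.
    eprint!("{}", usage("dataframe", &variants));
}
```

Putting each option on its own line keeps the list readable as more examples are consolidated into the same binary.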

Contributor

I think I just missed the options in my haste. Maybe we can improve the formatting of these errors in a follow-on PR.
