Skip to content

Conversation

@klion26
Copy link
Member

@klion26 klion26 commented Oct 27, 2025

Which issue does this PR close?

What changes are included in this PR?

  • Add try_value/value function for VariantArray
  • Add test for VariantArray::try_value

Are these changes tested?

Covered by existing tests and added new tests

Are there any user-facing changes?

Yes, add a new function for VariantArray::try_value, and the VariantArray::value changed to panic from returning Variant::Null if there is some cast error.

The `try_value` will return Result<Variant, ArrowError> and `value` unwrap from `try_value`
@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Oct 27, 2025
}

fn deserialize_metadata(_metadata: Option<&str>) -> Result<Self::Metadata, ArrowError> {
fn deserialize_metadata(_metadata: Option<&str>) -> Result<Self::Metadata> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The modifications here were made because arrow::error::Result was introduced.

Copy link
Member Author

@klion26 klion26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb @scovich Please help review this when you're free, thanks.

let result = variant_get(&variant_array, options).unwrap();
assert_eq!(3, result.len());

for i in 0..3 {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result is all null instead of an Array of Variant::Null because

  • builder.append_value(target.try_value(i)).unwrap_or(Variant::Null) in shredded_get_path will call builder.append_value with Variant::Null
  • the builder is VariantToPrimitiveArrowRowBuilder, and will call T::from_variant(value) in builder.append_value
  • then will call PrimitiveFromVariant for Time64Microsecond
  • This will Variant::as_time_utc currently, and will return None as the input is not Variant::Time(_)

Not sure if we need to change the return value of variant_get here from result.is_null(i) to Variant::Null

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we need to return an arrow null instead of Variant::Null here, as the doc of CastOptions, /// If true, return error on conversion failure. If false, insert null for failed conversions., seems the return

@scovich
Copy link
Contributor

scovich commented Oct 27, 2025

@alamb @scovich Please help review this when you're free, thanks.

Sorry, last week and this week are crazy, I probably won't get to this until next week. But thanks for tackling it!

builder.append_value(value)?;
} else {
builder.append_value(target.value(i))?;
builder.append_value(target.try_value(i).unwrap_or(Variant::Null))?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
builder.append_value(target.try_value(i).unwrap_or(Variant::Null))?;
match target.try_value(i) {
Ok(v) => builder.append_value(v)?,
Err(_) => builder.append_null()?,
}

uses append_null() because it is a bit smarter

let v = arr.value($index);
match ($cast_fn)(v) {
Ok(var) => Ok(Variant::from(var)),
Err(e) => Err(ArrowError::CastError(format!("Cast failed: {e}"))),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Err(e) => Err(ArrowError::CastError(format!("Cast failed: {e}"))),
Err(e) => Err(ArrowError::CastError(format!(
"Cast failed at index {idx} (array type: {ty}): {e}",
idx = $index,
ty = <$t as ArrowPrimitiveType>::DATA_TYPE
))),

to give some more details in the error message

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Member Author

@klion26 klion26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-g Thanks for the review, I've updated the code. Please take another look.

let v = arr.value($index);
match ($cast_fn)(v) {
Ok(var) => Ok(Variant::from(var)),
Err(e) => Err(ArrowError::CastError(format!("Cast failed: {e}"))),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

let result = variant_get(&variant_array, options).unwrap();
assert_eq!(3, result.len());

for i in 0..3 {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we need to return an arrow null instead of Variant::Null here, as the doc of CastOptions, /// If true, return error on conversion failure. If false, insert null for failed conversions., seems the return

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @klion26 and @martin-g -- this looks good to me

cc @scovich or @liamzwbao in case you would like to review as well

@alamb
Copy link
Contributor

alamb commented Oct 29, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing add_variant_try_value/value (d6eb7f8) to a7572eb diff
BENCH_NAME=variant_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench variant_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=add_variant_try_value_value
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Oct 29, 2025

🤖: Benchmark completed

Details

group                                                                add_variant_try_value_value            main
-----                                                                ---------------------------            ----
batch_json_string_to_variant json_list 8k string                     1.02     24.6±0.15ms        ? ?/sec    1.00     24.0±0.11ms        ? ?/sec
batch_json_string_to_variant random_json(2633 bytes per document)    1.00    314.7±2.70ms        ? ?/sec    1.00    314.3±4.66ms        ? ?/sec
batch_json_string_to_variant repeated_struct 8k string               1.00      7.4±0.03ms        ? ?/sec    1.05      7.7±0.02ms        ? ?/sec
variant_get_primitive                                                1.00    931.5±4.01ns        ? ?/sec    1.01    938.6±4.83ns        ? ?/sec

@alamb
Copy link
Contributor

alamb commented Oct 29, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing add_variant_try_value/value (d6eb7f8) to a7572eb diff
BENCH_NAME=variant_builder
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench variant_builder
BENCH_FILTER=
BENCH_BRANCH_NAME=add_variant_try_value_value
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Oct 29, 2025

🤖: Benchmark completed

Details

group                                       add_variant_try_value_value            main
-----                                       ---------------------------            ----
bench_extend_metadata_builder               1.00     59.3±2.21ms        ? ?/sec    1.01     59.6±1.66ms        ? ?/sec
bench_object_field_names_reverse_order      1.00     19.4±0.96ms        ? ?/sec    1.03     20.1±0.42ms        ? ?/sec
bench_object_list_partially_same_schema     1.00  1253.7±14.88µs        ? ?/sec    1.00  1251.7±14.93µs        ? ?/sec
bench_object_list_same_schema               1.00     24.9±0.17ms        ? ?/sec    1.00     25.0±0.17ms        ? ?/sec
bench_object_list_unknown_schema            1.00     13.4±0.08ms        ? ?/sec    1.00     13.4±0.11ms        ? ?/sec
bench_object_partially_same_schema          1.00      3.3±0.01ms        ? ?/sec    1.00      3.3±0.01ms        ? ?/sec
bench_object_same_schema                    1.00     38.4±0.24ms        ? ?/sec    1.00     38.5±0.07ms        ? ?/sec
bench_object_unknown_schema                 1.00     16.2±0.04ms        ? ?/sec    1.00     16.1±0.04ms        ? ?/sec
iteration/unvalidated_fallible_iteration    1.00      2.5±0.01ms        ? ?/sec    1.00      2.5±0.01ms        ? ?/sec
iteration/validated_iteration               1.00     47.6±0.08µs        ? ?/sec    1.00     47.4±0.08µs        ? ?/sec
validation/unvalidated_construction         1.00      6.5±0.01µs        ? ?/sec    1.00      6.5±0.01µs        ? ?/sec
validation/validated_construction           1.00     60.4±0.32µs        ? ?/sec    1.00     60.5±0.23µs        ? ?/sec
validation/validation_cost                  1.00     53.7±0.06µs        ? ?/sec    1.00     53.4±0.09µs        ? ?/sec

@alamb
Copy link
Contributor

alamb commented Oct 29, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing add_variant_try_value/value (d6eb7f8) to a7572eb diff
BENCH_NAME=variant_validation
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench variant_validation
BENCH_FILTER=
BENCH_BRANCH_NAME=add_variant_try_value_value
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Oct 29, 2025

🤖: Benchmark completed

Details

group                               add_variant_try_value_value            main
-----                               ---------------------------            ----
bench_validate_complex_object       1.00    228.6±0.52µs        ? ?/sec    1.01    229.9±0.27µs        ? ?/sec
bench_validate_large_nested_list    1.17     22.4±0.04ms        ? ?/sec    1.00     19.2±0.30ms        ? ?/sec
bench_validate_large_object         1.00     55.4±0.07ms        ? ?/sec    1.00     55.2±0.12ms        ? ?/sec

@klion26
Copy link
Member Author

klion26 commented Oct 29, 2025

It's strange that the bench_validate_large_nested_list benchmark regressed. I tried to change the benchmark code Variant::try_new(&metadata, &value).unwrap() to ut and debug (with stop point at VariantArray::try_value/value and variant_get), seems there is no code path to VariantArray::try_value/value and variant_get.

@klion26
Copy link
Member Author

klion26 commented Oct 29, 2025

Tried to run the benchmark on my laptap with the following steps, the results were roughly the same

  • run cargo bench --features=arrow,async,test_common,experimental --bench variant_validation -- --save-baseline add_variant_try_value on current branch
  • run cargo bench --features=arrow,async,test_common,experimental --bench variant_validation -- --save-baseline main on main branch
  • run critcmp main add_variant_try_value get the result

result

group                               add_variant_try_value                  main
-----                               ---------------------                  ----
bench_validate_complex_object       1.01    235.6±7.32µs        ? ?/sec    1.00    232.6±8.15µs        ? ?/sec
bench_validate_large_nested_list    1.01     18.7±0.57ms        ? ?/sec    1.00     18.6±0.70ms        ? ?/sec
bench_validate_large_object         1.00     53.2±2.31ms        ? ?/sec    1.01     53.9±1.72ms        ? ?/sec
---------------------------------------------------------------------------------------------------------------

@alamb
Copy link
Contributor

alamb commented Oct 29, 2025

Tried to run the benchmark on my laptap with the following steps, the results were roughly the same

I agree -- it seems like maybe measurement error -- I have queued up another run

Copy link
Contributor

@liamzwbao liamzwbao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! thanks for the improvement

Comment on lines +149 to +155
let _ = match target.try_value(i) {
Ok(v) => builder.append_value(v)?,
Err(_) => {
builder.append_null()?;
false // add this to make match arms have the same return type
}
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a bit better as it drops the value early

Suggested change
let _ = match target.try_value(i) {
Ok(v) => builder.append_value(v)?,
Err(_) => {
builder.append_null()?;
false // add this to make match arms have the same return type
}
};
match target.try_value(i) {
Ok(v) => {
let _ = builder.append_value(v)?;
}
Err(_) => builder.append_null()?,
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this, but changed to the current solution.
We need to change the return type for builder.append_null(), (we need both match arms to return the same type), and we need to return Ok(false) for builder.append_null() to satisfy the semantics here, but the builder.append_null always succeeds(builder.append_value() will return Ok(true) if succeed). It seems weird to have different semantics for the return result of builder.append_value and builder.append_null.

@alamb
Copy link
Contributor

alamb commented Oct 30, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing add_variant_try_value/value (d6eb7f8) to a7572eb diff
BENCH_NAME=variant_validation
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench variant_validation
BENCH_FILTER=
BENCH_BRANCH_NAME=add_variant_try_value_value
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Oct 30, 2025

🤖: Benchmark completed

Details

group                               add_variant_try_value_value            main
-----                               ---------------------------            ----
bench_validate_complex_object       1.00    229.3±0.33µs        ? ?/sec    1.00    228.5±0.35µs        ? ?/sec
bench_validate_large_nested_list    1.01     19.4±0.04ms        ? ?/sec    1.00     19.2±0.04ms        ? ?/sec
bench_validate_large_object         1.00     54.3±0.07ms        ? ?/sec    1.02     55.2±0.11ms        ? ?/sec

@alamb
Copy link
Contributor

alamb commented Oct 30, 2025

On rerun there appears to be no performance difference, so merging this one in

@alamb
Copy link
Contributor

alamb commented Oct 30, 2025

Thank you @klion26 @martin-g and @liamzwbao

@alamb alamb merged commit 1b18582 into apache:main Oct 30, 2025
17 checks passed
@klion26 klion26 deleted the add_variant_try_value/value branch October 31, 2025 09:41
@klion26
Copy link
Member Author

klion26 commented Oct 31, 2025

@alamb @martin-g @liamzwbao Thanks for the review and merging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Varaint] Support VariantArray::value to return a Result<Variant>

5 participants