[Variant] Add try_value/value for VariantArray #8719

klion26 · 2025-10-27T11:08:40Z

Which issue does this PR close?

Closes [Varaint] Support VariantArray::value to return a Result<Variant> #8672 .

What changes are included in this PR?

Add try_value/value function for VariantArray
Add test for VariantArray::try_value

Are these changes tested?

Covered by existing tests and added new tests

Are there any user-facing changes?

Yes. This adds a new function, VariantArray::try_value, and VariantArray::value now panics on a cast error instead of returning Variant::Null.

`try_value` returns Result<Variant, ArrowError>, and `value` unwraps the result of `try_value`.
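
The relationship between the two accessors can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `ToyArray`, `ToyValue`, and `ToyError` are hypothetical stand-ins for `VariantArray`, `Variant`, and `ArrowError`, and the narrowing conversion is only an assumed example of a cast that can fail.

```rust
use std::fmt;

/// Hypothetical stand-in for `ArrowError` (illustration only).
#[derive(Debug, PartialEq)]
pub enum ToyError {
    CastError(String),
}

impl fmt::Display for ToyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ToyError::CastError(msg) => write!(f, "Cast error: {msg}"),
        }
    }
}

/// Hypothetical stand-in for `Variant`.
#[derive(Debug, PartialEq)]
pub enum ToyValue {
    Int32(i32),
}

/// Hypothetical stand-in for `VariantArray`: it stores wide integers that
/// may not fit the narrower type handed back to callers.
pub struct ToyArray {
    raw: Vec<i64>,
}

impl ToyArray {
    pub fn new(raw: Vec<i64>) -> Self {
        Self { raw }
    }

    /// Fallible accessor: a failed conversion is reported as an error.
    pub fn try_value(&self, index: usize) -> Result<ToyValue, ToyError> {
        i32::try_from(self.raw[index])
            .map(ToyValue::Int32)
            .map_err(|e| ToyError::CastError(format!("index {index}: {e}")))
    }

    /// Convenience accessor: unwraps `try_value`, so a failed conversion
    /// panics instead of being silently replaced by a null value.
    pub fn value(&self, index: usize) -> ToyValue {
        self.try_value(index).unwrap()
    }
}
```

With this shape, callers that can tolerate bad data use `try_value` and decide what to do with the error, while `value` stays convenient for callers that treat a failed conversion as a bug.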

klion26 · 2025-10-27T11:25:07Z

parquet-variant-compute/src/variant_array.rs

    }

-    fn deserialize_metadata(_metadata: Option<&str>) -> Result<Self::Metadata, ArrowError> {
+    fn deserialize_metadata(_metadata: Option<&str>) -> Result<Self::Metadata> {


The modifications here were made because arrow::error::Result was introduced.

parquet-variant-compute/src/variant_array.rs

klion26

@alamb @scovich Please help review this when you're free, thanks.

klion26 · 2025-10-27T12:47:00Z

parquet-variant-compute/src/variant_get.rs

+        let result = variant_get(&variant_array, options).unwrap();
+        assert_eq!(3, result.len());
+
+        for i in 0..3 {


The result is all null instead of an array of Variant::Null because:

builder.append_value(target.try_value(i).unwrap_or(Variant::Null)) in shredded_get_path calls builder.append_value with Variant::Null.

The builder is a VariantToPrimitiveArrowRowBuilder, whose append_value calls T::from_variant(value).

That in turn calls the PrimitiveFromVariant implementation for Time64Microsecond.

This currently calls Variant::as_time_utc, which returns None because the input is not Variant::Time(_).

I'm not sure whether we need to change the return value of variant_get here from result.is_null(i) to Variant::Null.

It seems we need to return an Arrow null instead of Variant::Null here, per the doc comment on CastOptions: /// If true, return error on conversion failure. If false, insert null for failed conversions.
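
The two modes described by that doc comment can be illustrated with a small sketch. Everything here is hypothetical: `ToyCastOptions`, its `error_on_failure` field, and `cast_all` are illustrative names and not the crate's API. `None` in the output plays the role of an Arrow null slot, as opposed to a present value that happens to be `Variant::Null`.

```rust
/// Hypothetical options struct; the field name is an assumption.
pub struct ToyCastOptions {
    /// If true, return an error on conversion failure.
    /// If false, insert a null for failed conversions.
    pub error_on_failure: bool,
}

/// Narrows each input to `i32`. A `None` entry in the output stands in for
/// an Arrow null slot.
pub fn cast_all(
    inputs: &[i64],
    options: &ToyCastOptions,
) -> Result<Vec<Option<i32>>, String> {
    let mut out = Vec::with_capacity(inputs.len());
    for (i, &v) in inputs.iter().enumerate() {
        match i32::try_from(v) {
            Ok(x) => out.push(Some(x)),
            // Strict mode: the first failure aborts the whole cast.
            Err(e) if options.error_on_failure => {
                return Err(format!("cast failed at index {i}: {e}"));
            }
            // Lenient mode: the failed row becomes a null slot.
            Err(_) => out.push(None),
        }
    }
    Ok(out)
}
```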

scovich · 2025-10-27T13:48:29Z

@alamb @scovich Please help review this when you're free, thanks.

Sorry, last week and this week are crazy, I probably won't get to this until next week. But thanks for tackling it!

martin-g · 2025-10-27T14:04:45Z

parquet-variant-compute/src/variant_get.rs

+                    builder.append_value(value)?;
                } else {
-                    builder.append_value(target.value(i))?;
+                    builder.append_value(target.try_value(i).unwrap_or(Variant::Null))?;


Suggested change

-                    builder.append_value(target.try_value(i).unwrap_or(Variant::Null))?;
+                    match target.try_value(i) {
+                        Ok(v) => builder.append_value(v)?,
+                        Err(_) => builder.append_null()?,
+                    }

This uses append_null() because it is a bit smarter.

martin-g · 2025-10-27T14:07:50Z

parquet-variant-compute/src/type_conversion.rs

+        let v = arr.value($index);
+        match ($cast_fn)(v) {
+            Ok(var) => Ok(Variant::from(var)),
+            Err(e) => Err(ArrowError::CastError(format!("Cast failed: {e}"))),


Suggested change

-            Err(e) => Err(ArrowError::CastError(format!("Cast failed: {e}"))),
+            Err(e) => Err(ArrowError::CastError(format!(
+                "Cast failed at index {idx} (array type: {ty}): {e}",
+                idx = $index,
+                ty = <$t as ArrowPrimitiveType>::DATA_TYPE
+            ))),

This gives some more detail in the error message.
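
As a rough illustration of the message format in this suggestion, the sketch below builds the same kind of string outside the macro. `cast_with_context` and its `type_name` parameter are hypothetical; the real code gets the type from `ArrowPrimitiveType::DATA_TYPE`, which is replaced here by a plain string so the example needs only the standard library.

```rust
/// Hypothetical helper: narrows `values[index]` to `i32` and, on failure,
/// reports the row index and the source type alongside the underlying error.
pub fn cast_with_context(
    values: &[i64],
    index: usize,
    type_name: &str,
) -> Result<i32, String> {
    i32::try_from(values[index]).map_err(|e| {
        // Named arguments (`idx`, `ty`) are mixed with the captured `e`.
        format!(
            "Cast failed at index {idx} (array type: {ty}): {e}",
            idx = index,
            ty = type_name
        )
    })
}
```

Including the index and type lets a reader of the error locate the offending row without re-running the cast.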

klion26

@martin-g Thanks for the review, I've updated the code. Please take another look.

klion26 · 2025-10-28T06:24:06Z

parquet-variant-compute/src/type_conversion.rs

+        let v = arr.value($index);
+        match ($cast_fn)(v) {
+            Ok(var) => Ok(Variant::from(var)),
+            Err(e) => Err(ArrowError::CastError(format!("Cast failed: {e}"))),


klion26 · 2025-10-28T06:57:55Z

parquet-variant-compute/src/variant_get.rs

+        let result = variant_get(&variant_array, options).unwrap();
+        assert_eq!(3, result.len());
+
+        for i in 0..3 {


It seems we need to return an Arrow null instead of Variant::Null here, per the doc comment on CastOptions: /// If true, return error on conversion failure. If false, insert null for failed conversions.

alamb

Thank you @klion26 and @martin-g -- this looks good to me

cc @scovich or @liamzwbao in case you would like to review as well

alamb · 2025-10-29T00:08:00Z

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing add_variant_try_value/value (d6eb7f8) to a7572eb diff
BENCH_NAME=variant_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench variant_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=add_variant_try_value_value
Results will be posted here when complete

alamb · 2025-10-29T00:12:33Z

🤖: Benchmark completed

Details

group                                                                add_variant_try_value_value            main
-----                                                                ---------------------------            ----
batch_json_string_to_variant json_list 8k string                     1.02     24.6±0.15ms        ? ?/sec    1.00     24.0±0.11ms        ? ?/sec
batch_json_string_to_variant random_json(2633 bytes per document)    1.00    314.7±2.70ms        ? ?/sec    1.00    314.3±4.66ms        ? ?/sec
batch_json_string_to_variant repeated_struct 8k string               1.00      7.4±0.03ms        ? ?/sec    1.05      7.7±0.02ms        ? ?/sec
variant_get_primitive                                                1.00    931.5±4.01ns        ? ?/sec    1.01    938.6±4.83ns        ? ?/sec

alamb · 2025-10-29T00:12:36Z

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing add_variant_try_value/value (d6eb7f8) to a7572eb diff
BENCH_NAME=variant_builder
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench variant_builder
BENCH_FILTER=
BENCH_BRANCH_NAME=add_variant_try_value_value
Results will be posted here when complete

alamb · 2025-10-29T00:16:56Z

🤖: Benchmark completed

Details

group                                       add_variant_try_value_value            main
-----                                       ---------------------------            ----
bench_extend_metadata_builder               1.00     59.3±2.21ms        ? ?/sec    1.01     59.6±1.66ms        ? ?/sec
bench_object_field_names_reverse_order      1.00     19.4±0.96ms        ? ?/sec    1.03     20.1±0.42ms        ? ?/sec
bench_object_list_partially_same_schema     1.00  1253.7±14.88µs        ? ?/sec    1.00  1251.7±14.93µs        ? ?/sec
bench_object_list_same_schema               1.00     24.9±0.17ms        ? ?/sec    1.00     25.0±0.17ms        ? ?/sec
bench_object_list_unknown_schema            1.00     13.4±0.08ms        ? ?/sec    1.00     13.4±0.11ms        ? ?/sec
bench_object_partially_same_schema          1.00      3.3±0.01ms        ? ?/sec    1.00      3.3±0.01ms        ? ?/sec
bench_object_same_schema                    1.00     38.4±0.24ms        ? ?/sec    1.00     38.5±0.07ms        ? ?/sec
bench_object_unknown_schema                 1.00     16.2±0.04ms        ? ?/sec    1.00     16.1±0.04ms        ? ?/sec
iteration/unvalidated_fallible_iteration    1.00      2.5±0.01ms        ? ?/sec    1.00      2.5±0.01ms        ? ?/sec
iteration/validated_iteration               1.00     47.6±0.08µs        ? ?/sec    1.00     47.4±0.08µs        ? ?/sec
validation/unvalidated_construction         1.00      6.5±0.01µs        ? ?/sec    1.00      6.5±0.01µs        ? ?/sec
validation/validated_construction           1.00     60.4±0.32µs        ? ?/sec    1.00     60.5±0.23µs        ? ?/sec
validation/validation_cost                  1.00     53.7±0.06µs        ? ?/sec    1.00     53.4±0.09µs        ? ?/sec

alamb · 2025-10-29T00:16:59Z

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing add_variant_try_value/value (d6eb7f8) to a7572eb diff
BENCH_NAME=variant_validation
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench variant_validation
BENCH_FILTER=
BENCH_BRANCH_NAME=add_variant_try_value_value
Results will be posted here when complete

alamb · 2025-10-29T00:18:04Z

🤖: Benchmark completed

Details

group                               add_variant_try_value_value            main
-----                               ---------------------------            ----
bench_validate_complex_object       1.00    228.6±0.52µs        ? ?/sec    1.01    229.9±0.27µs        ? ?/sec
bench_validate_large_nested_list    1.17     22.4±0.04ms        ? ?/sec    1.00     19.2±0.30ms        ? ?/sec
bench_validate_large_object         1.00     55.4±0.07ms        ? ?/sec    1.00     55.2±0.12ms        ? ?/sec

klion26 · 2025-10-29T11:13:18Z

It's strange that the bench_validate_large_nested_list benchmark regressed. I copied the benchmark code, Variant::try_new(&metadata, &value).unwrap(), into a unit test and stepped through it in a debugger (with breakpoints at VariantArray::try_value/value and variant_get); there seems to be no code path from it to VariantArray::try_value/value or variant_get.

klion26 · 2025-10-29T11:48:34Z

Tried to run the benchmark on my laptop with the following steps; the results were roughly the same.

Run cargo bench --features=arrow,async,test_common,experimental --bench variant_validation -- --save-baseline add_variant_try_value on the current branch.
Run cargo bench --features=arrow,async,test_common,experimental --bench variant_validation -- --save-baseline main on the main branch.
Run critcmp main add_variant_try_value to compare the two baselines.

Result:

group                               add_variant_try_value                  main
-----                               ---------------------                  ----
bench_validate_complex_object       1.01    235.6±7.32µs        ? ?/sec    1.00    232.6±8.15µs        ? ?/sec
bench_validate_large_nested_list    1.01     18.7±0.57ms        ? ?/sec    1.00     18.6±0.70ms        ? ?/sec
bench_validate_large_object         1.00     53.2±2.31ms        ? ?/sec    1.01     53.9±1.72ms        ? ?/sec
---------------------------------------------------------------------------------------------------------------

alamb · 2025-10-29T20:47:57Z

Tried to run the benchmark on my laptop with the following steps; the results were roughly the same.

I agree -- it seems like maybe measurement error -- I have queued up another run

liamzwbao

LGTM! thanks for the improvement

liamzwbao · 2025-10-30T01:20:29Z

parquet-variant-compute/src/variant_get.rs

+                    let _ = match target.try_value(i) {
+                        Ok(v) => builder.append_value(v)?,
+                        Err(_) => {
+                            builder.append_null()?;
+                            false // add this to make match arms have the same return type
+                        }
+                    };


This might be a bit better as it drops the value early

Suggested change

-                    let _ = match target.try_value(i) {
-                        Ok(v) => builder.append_value(v)?,
-                        Err(_) => {
-                            builder.append_null()?;
-                            false // add this to make match arms have the same return type
-                        }
-                    };
+                    match target.try_value(i) {
+                        Ok(v) => {
+                            let _ = builder.append_value(v)?;
+                        }
+                        Err(_) => builder.append_null()?,
+                    }

I tried this, but switched to the current solution.
Both match arms must return the same type, so we would need to change the return type of builder.append_null(). To satisfy the semantics here it would have to return Ok(false), but builder.append_null() always succeeds (builder.append_value() returns Ok(true) on success). It seems odd for the return values of builder.append_value and builder.append_null to have different semantics.
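
The type question discussed in this thread can be sketched with a toy builder. The signatures below are assumptions modeled on the comments above (`append_value` yielding a `bool`, `append_null` yielding nothing); `ToyBuilder` and `append_all` are hypothetical names, and the meaning given to `Ok(false)` is only a guess for illustration.

```rust
/// Hypothetical builder; signatures are assumptions, not the crate's API.
#[derive(Default)]
pub struct ToyBuilder {
    pub slots: Vec<Option<i32>>,
}

impl ToyBuilder {
    /// Assumed semantics: Ok(true) if the value was stored, Ok(false) if it
    /// did not fit and a null slot was stored instead.
    pub fn append_value(&mut self, v: i64) -> Result<bool, String> {
        match i32::try_from(v) {
            Ok(x) => {
                self.slots.push(Some(x));
                Ok(true)
            }
            Err(_) => {
                self.slots.push(None);
                Ok(false)
            }
        }
    }

    /// Appending a null cannot fail in this sketch.
    pub fn append_null(&mut self) -> Result<(), String> {
        self.slots.push(None);
        Ok(())
    }
}

/// Appends each input, turning upstream errors into null slots.
pub fn append_all(
    builder: &mut ToyBuilder,
    inputs: &[Result<i64, String>],
) -> Result<(), String> {
    for input in inputs {
        match input {
            // Discard the bool inside a block so this arm evaluates to `()`,
            // matching the `()` produced by `append_null()?`.
            Ok(v) => {
                let _ = builder.append_value(*v)?;
            }
            Err(_) => builder.append_null()?,
        }
    }
    Ok(())
}
```

Under these assumed signatures, both arms evaluate to `()` without changing what `append_null` returns; whether that holds for the real builder depends on its actual signatures.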

alamb · 2025-10-30T09:27:43Z

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing add_variant_try_value/value (d6eb7f8) to a7572eb diff
BENCH_NAME=variant_validation
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench variant_validation
BENCH_FILTER=
BENCH_BRANCH_NAME=add_variant_try_value_value
Results will be posted here when complete

alamb · 2025-10-30T09:29:03Z

🤖: Benchmark completed

Details

group                               add_variant_try_value_value            main
-----                               ---------------------------            ----
bench_validate_complex_object       1.00    229.3±0.33µs        ? ?/sec    1.00    228.5±0.35µs        ? ?/sec
bench_validate_large_nested_list    1.01     19.4±0.04ms        ? ?/sec    1.00     19.2±0.04ms        ? ?/sec
bench_validate_large_object         1.00     54.3±0.07ms        ? ?/sec    1.02     55.2±0.11ms        ? ?/sec

alamb · 2025-10-30T11:00:40Z

On rerun there appears to be no performance difference, so merging this one in

alamb · 2025-10-30T11:01:24Z

Thank you @klion26 @martin-g and @liamzwbao

klion26 · 2025-10-31T09:41:57Z

@alamb @martin-g @liamzwbao Thanks for the review and merging!

[Variant] Add try_value/value for VariantArray

1adf768

The `try_value` will return Result<Variant, ArrowError> and `value` unwrap from `try_value`

github-actions bot added the parquet-variant parquet-variant* crates label Oct 27, 2025

klion26 commented Oct 27, 2025

variant_get use VariantArray::try_value

a6a362a

klion26 commented Oct 27, 2025

martin-g reviewed Oct 27, 2025

address comment

d6eb7f8

klion26 commented Oct 28, 2025

martin-g approved these changes Oct 28, 2025

alamb approved these changes Oct 28, 2025

liamzwbao approved these changes Oct 30, 2025

alamb merged commit 1b18582 into apache:main Oct 30, 2025
17 checks passed

klion26 deleted the add_variant_try_value/value branch October 31, 2025 09:41

-            Err(e) => Err(ArrowError::CastError(format!("Cast failed: {e}"))),
+            Err(e) => Err(ArrowError::CastError(format!(
+                "Cast failed at index {idx} (array type: {ty}): {e}",
+                idx = $index,
+                ty = <$t as ArrowPrimitiveType>::DATA_TYPE
+            ))),
