Skip to content

Feat/require target state keys to fit into type stable key#1625

Merged
georgeh0 merged 13 commits intococoindex-io:v1from
Geoff-Robin:feat/Require_target_state_keys_to_fit_into_type_StableKey
Feb 12, 2026
Merged

Feat/require target state keys to fit into type stable key#1625
georgeh0 merged 13 commits intococoindex-io:v1from
Geoff-Robin:feat/Require_target_state_keys_to_fit_into_type_StableKey

Conversation

@Geoff-Robin
Copy link
Copy Markdown
Contributor

@Geoff-Robin Geoff-Robin commented Feb 5, 2026

Key Changes

  • Removed PyKey

    • The custom PyKey struct and its serialization logic were entirely deleted from rust/py/src/value.rs.
    • This type previously allowed wrapping arbitrary Python objects as keys by serializing them.
  • Adoption of StableKey

    • The code was updated to use StableKey instead.
    • This enforces a stricter, more deterministic type system for keys used in target states, likely improving reliability and cross-connector compatibility.
  • Rust Implementation Update

    • Target State: Updated TargetState in rust/py/src/target_state.rs to interact with this new key type.
    • Path Bridging: Added support code in rust/core/src/state/stable_path.rs and rust/py/src/stable_path.rs to bridge the new key references.

Resolves this particular issue

@badmonster0 badmonster0 requested a review from georgeh0 February 9, 2026 02:42
@badmonster0
Copy link
Copy Markdown
Member

thanks a lot @Geoff-Robin will get back on this shortly! thanks a lot for your contributions!

Comment on lines +189 to +216
impl crate::engine::profile::Persist for StableKey {
fn to_bytes(&self) -> Result<bytes::Bytes> {
let buf = storekey::encode_vec(self)
.map_err(|e| internal_error!("Failed to encode StableKey: {e}"))?;
Ok(bytes::Bytes::from(buf))
}

fn from_bytes(data: &[u8]) -> Result<Self> {
storekey::decode(std::io::Cursor::new(data))
.map_err(|e| internal_error!("Failed to decode StableKey: {e}"))
}
}

impl crate::engine::profile::StableFingerprint for StableKey {
fn stable_fingerprint(&self) -> utils::fingerprint::Fingerprint {
match self {
StableKey::Fingerprint(fp) => *fp,
_ => {
let mut fingerprinter = utils::fingerprint::Fingerprinter::default();
let bytes = crate::engine::profile::Persist::to_bytes(self)
.expect("StableKey encoding for fingerprint");
fingerprinter.write_raw_bytes(&bytes);
fingerprinter.into_fingerprint()
}
}
}
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to add these. StableKey itself is already serializable and deserializable.

Since StableKey is a type defined in the core crate, we can remove TargetStateKey and directly use StableKey. So we can directly serialize/deserialize it in the engine code.


use cocoindex_core::state::stable_path::{StableKey, StablePath};

impl PyStableKey {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to implement IntoPyObject for PyStableKey instead (which matches FromPyObject below)

#[pyfunction]
pub fn declare_target_state<'py>(
py: Python<'py>,
_py: Python<'py>,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We no longer need this argument.

):
if isinstance(key, tuple) and not isinstance(key, _TableKey):
key = _TableKey(*key)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type annotation key: _TableKey is not right now. I think we want to use key: StableKey instead, since it's the actual type being passed around. We can remove KeyT / KeyT_contra here, and directly use StableKey.

This will also force each reconcile() implementation explicitly convert the type, so implementers won't forget.

Copy link
Copy Markdown
Contributor Author

@Geoff-Robin Geoff-Robin Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@georgeh0 I've tried doing this but I've encountered a data loss bug in SQLite caused by unstable instance-bound sinks and fixed it by using stable global sinks. Is this fine or maybe I could take a different approach
The tests that i started to fail were:

  • test_sqlite_target.py::test_regular_table_vs_vec0_switch
    started failing around line 1232 > assert len(data) == 0
  • test_sqlite_target.py::test_vec0_virtual_table_basic
    start failing around line 660 > assert len(data) == 1

I will be pushing the code for this to get the whole view

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a data loss bug in SQLite caused by unstable instance-bound sinks and fixed it by using stable global sinks.

Thanks for trying out. We want to understand what happens here.

Why the instance-bound sinks were unstable? They're instance-bound but they're stable. And I don't know why a key type change will affect the logic there.

As I commented above, I think this PR is changing more than we need: we still want to use Fingerprint as TargetStatePath. I'm not sure if the failed tests are related to some unexpected changes like this. Please fix that and try again if the sqlite tests will work with the previous code (with necessary changes for key types). Thanks!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like using StableKey directly in TargetStatePath caused the errors, I will be reverting back to fingerprint.
This will undo the changes i've made just now

+ Persist
+ StableFingerprint
+ 'static;
// TargetStateKey is removed in favor of using StableKey directly.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to keep this line of comment?


#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash, Serialize, Deserialize)]
pub struct TargetStatePath(Arc<[utils::fingerprint::Fingerprint]>);
pub struct TargetStatePath(pub Arc<[StableKey]>);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to change this part. We still want to use fingerprint here. (We only change whatever was TargetStateKey to StableKey, but Fingerprint is a different thing)

For the TargetStatePath, the only thing we want to change is when we create/concat the path with a new key part, i.e. we compute fingerprint from StableKey instead of the old key.

.register(
TargetStatePath::new(
utils::fingerprint::Fingerprint::from(&name).into_py_result()?,
cocoindex_core::state::stable_path::StableKey::Str(Arc::from(name)),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to change this. It's OK to still get fingerprint from the name directly.

Comment on lines +451 to +455
# Should be tuple for row key
if not isinstance(key, tuple):
# This might happen if single-value PK is passed as scalar
key = (key,)
key = cast(_RowKey, key)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the key type is not a expected, raise an exception.

Then we won't need the cast (when the isinstance() pass, it's always a tuple that satisfies _RowKey). cast itself doesn't turn value in an unexpected type to the expected type.

Please update reconcile() for other targets accordingly.

self._table_name = table_name
self._table_schema = table_schema
self._is_virtual_table = is_virtual_table
self._sink = coco.TargetActionSink[_RowAction, None].from_fn(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we revert this change, and only keep necessary changes related to StableKey?

};
let effect_key = Prof::TargetStateKey::from_bytes(item.key.as_ref())?;
let effect_key: StableKey = storekey::decode(item.key.as_ref())
.map_err(|e| internal_error!("Failed to decode StableKey: {e}"))?;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid this map_err() here. It'll strip structured information (like backtrace) from the original error. A simple ? will just work.

for (target_state_path, target_state) in declared_target_states_to_process {
let effect_key_bytes = target_state.key.to_bytes()?;
let effect_key_bytes = storekey::encode_vec(&target_state.item_key)
.map_err(|e| internal_error!("Failed to encode StableKey: {e}"))?;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, map_err is not needed.

@georgeh0
Copy link
Copy Markdown
Member

Hi @Geoff-Robin, thanks for contributing! I think the PR is almost there.

Since this is a breaking change, we want to make this PR merged sooner. Let me know if you have any pending changes and want to commit soon. Otherwise, I can also merge the PR first and do a quick cleanup on my side for these comments. Thanks again!

@Geoff-Robin
Copy link
Copy Markdown
Contributor Author

@georgeh0 Yeah, I've just got started. I'll push them in around an hour.

@Geoff-Robin
Copy link
Copy Markdown
Contributor Author

@georgeh0 I have made the changes completely however
this review couldn't be adressed because because its trait-object error type doesn't satisfy the Sized bound required for automatic conversion to the internal error type.

I'm not sure why the tests are failing on merging with v1 and before. Any idea why it's causing? To my knowledge python tests are failing due to circular imports

@georgeh0
Copy link
Copy Markdown
Member

@georgeh0 I have made the changes completely however this review couldn't be adressed because because its trait-object error type doesn't satisfy the Sized bound required for automatic conversion to the internal error type.

I'm not sure why the tests are failing on merging with v1 and before. Any idea why it's causing? To my knowledge python tests are failing due to circular imports

The actual problem I can see from the failed test is an enum variant (StableKey::Symbol, which just added yesterday) is not handled in your new function:

Screenshot 2026-02-12 at 11 06 59 AM

@Geoff-Robin
Copy link
Copy Markdown
Contributor Author

Yeah just realised I'll commit it right now

@georgeh0
Copy link
Copy Markdown
Member

Thanks for the contribution! Merging.

@georgeh0 georgeh0 merged commit 90b1f52 into cocoindex-io:v1 Feb 12, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants