Feat/require target state keys to fit into type stable key#1625
Conversation
|
thanks a lot @Geoff-Robin will get back on this shortly! thanks a lot for your contributions! |
rust/core/src/state/stable_path.rs
Outdated
| impl crate::engine::profile::Persist for StableKey { | ||
| fn to_bytes(&self) -> Result<bytes::Bytes> { | ||
| let buf = storekey::encode_vec(self) | ||
| .map_err(|e| internal_error!("Failed to encode StableKey: {e}"))?; | ||
| Ok(bytes::Bytes::from(buf)) | ||
| } | ||
|
|
||
| fn from_bytes(data: &[u8]) -> Result<Self> { | ||
| storekey::decode(std::io::Cursor::new(data)) | ||
| .map_err(|e| internal_error!("Failed to decode StableKey: {e}")) | ||
| } | ||
| } | ||
|
|
||
| impl crate::engine::profile::StableFingerprint for StableKey { | ||
| fn stable_fingerprint(&self) -> utils::fingerprint::Fingerprint { | ||
| match self { | ||
| StableKey::Fingerprint(fp) => *fp, | ||
| _ => { | ||
| let mut fingerprinter = utils::fingerprint::Fingerprinter::default(); | ||
| let bytes = crate::engine::profile::Persist::to_bytes(self) | ||
| .expect("StableKey encoding for fingerprint"); | ||
| fingerprinter.write_raw_bytes(&bytes); | ||
| fingerprinter.into_fingerprint() | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
We don't need to add these. StableKey itself is already serializable and deserializable.
Since StableKey is a type defined in the core crate, we can remove TargetStateKey and directly use StableKey. So we can directly serialize/deserialize it in the engine code.
rust/py/src/stable_path.rs
Outdated
|
|
||
| use cocoindex_core::state::stable_path::{StableKey, StablePath}; | ||
|
|
||
| impl PyStableKey { |
There was a problem hiding this comment.
I think we want to implement IntoPyObject for PyStableKey instead (which matches FromPyObject below)
rust/py/src/target_state.rs
Outdated
| #[pyfunction] | ||
| pub fn declare_target_state<'py>( | ||
| py: Python<'py>, | ||
| _py: Python<'py>, |
There was a problem hiding this comment.
We no longer need this argument.
| ): | ||
| if isinstance(key, tuple) and not isinstance(key, _TableKey): | ||
| key = _TableKey(*key) | ||
|
|
There was a problem hiding this comment.
The type annotation key: _TableKey is not right now. I think we want to use key: StableKey instead, since it's the actual type being passed around. We can remove KeyT / KeyT_contra here, and directly use StableKey.
This will also force each reconcile() implementation explicitly convert the type, so implementers won't forget.
There was a problem hiding this comment.
@georgeh0 I've tried doing this but I've encountered a data loss bug in SQLite caused by unstable instance-bound sinks and fixed it by using stable global sinks. Is this fine or maybe I could take a different approach
The tests that i started to fail were:
- test_sqlite_target.py::test_regular_table_vs_vec0_switch
started failing around line 1232 >assert len(data) == 0 - test_sqlite_target.py::test_vec0_virtual_table_basic
start failing around line 660 >assert len(data) == 1
I will be pushing the code for this to get the whole view
There was a problem hiding this comment.
a data loss bug in SQLite caused by unstable instance-bound sinks and fixed it by using stable global sinks.
Thanks for trying out. We want to understand what happens here.
Why the instance-bound sinks were unstable? They're instance-bound but they're stable. And I don't know why a key type change will affect the logic there.
As I commented above, I think this PR is changing more than we need: we still want to use Fingerprint as TargetStatePath. I'm not sure if the failed tests are related to some unexpected changes like this. Please fix that and try again if the sqlite tests will work with the previous code (with necessary changes for key types). Thanks!
There was a problem hiding this comment.
It seems like using StableKey directly in TargetStatePath caused the errors, I will be reverting back to fingerprint.
This will undo the changes i've made just now
rust/core/src/engine/profile.rs
Outdated
| + Persist | ||
| + StableFingerprint | ||
| + 'static; | ||
| // TargetStateKey is removed in favor of using StableKey directly. |
There was a problem hiding this comment.
Why do we need to keep this line of comment?
|
|
||
| #[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash, Serialize, Deserialize)] | ||
| pub struct TargetStatePath(Arc<[utils::fingerprint::Fingerprint]>); | ||
| pub struct TargetStatePath(pub Arc<[StableKey]>); |
There was a problem hiding this comment.
We don't want to change this part. We still want to use fingerprint here. (We only change whatever was TargetStateKey to StableKey, but Fingerprint is a different thing)
For the TargetStatePath, the only thing we want to change is when we create/concat the path with a new key part, i.e. we compute fingerprint from StableKey instead of the old key.
rust/py/src/target_state.rs
Outdated
| .register( | ||
| TargetStatePath::new( | ||
| utils::fingerprint::Fingerprint::from(&name).into_py_result()?, | ||
| cocoindex_core::state::stable_path::StableKey::Str(Arc::from(name)), |
There was a problem hiding this comment.
We don't need to change this. It's OK to still get fingerprint from the name directly.
| # Should be tuple for row key | ||
| if not isinstance(key, tuple): | ||
| # This might happen if single-value PK is passed as scalar | ||
| key = (key,) | ||
| key = cast(_RowKey, key) |
There was a problem hiding this comment.
If the key type is not a expected, raise an exception.
Then we won't need the cast (when the isinstance() pass, it's always a tuple that satisfies _RowKey). cast itself doesn't turn value in an unexpected type to the expected type.
Please update reconcile() for other targets accordingly.
| self._table_name = table_name | ||
| self._table_schema = table_schema | ||
| self._is_virtual_table = is_virtual_table | ||
| self._sink = coco.TargetActionSink[_RowAction, None].from_fn( |
There was a problem hiding this comment.
Can we revert this change, and only keep necessary changes related to StableKey?
rust/core/src/engine/execution.rs
Outdated
| }; | ||
| let effect_key = Prof::TargetStateKey::from_bytes(item.key.as_ref())?; | ||
| let effect_key: StableKey = storekey::decode(item.key.as_ref()) | ||
| .map_err(|e| internal_error!("Failed to decode StableKey: {e}"))?; |
There was a problem hiding this comment.
Let's avoid this map_err() here. It'll strip structured information (like backtrace) from the original error. A simple ? will just work.
| for (target_state_path, target_state) in declared_target_states_to_process { | ||
| let effect_key_bytes = target_state.key.to_bytes()?; | ||
| let effect_key_bytes = storekey::encode_vec(&target_state.item_key) | ||
| .map_err(|e| internal_error!("Failed to encode StableKey: {e}"))?; |
There was a problem hiding this comment.
Same here, map_err is not needed.
|
Hi @Geoff-Robin, thanks for contributing! I think the PR is almost there. Since this is a breaking change, we want to make this PR merged sooner. Let me know if you have any pending changes and want to commit soon. Otherwise, I can also merge the PR first and do a quick cleanup on my side for these comments. Thanks again! |
|
@georgeh0 Yeah, I've just got started. I'll push them in around an hour. |
|
@georgeh0 I have made the changes completely however I'm not sure why the tests are failing on merging with v1 and before. Any idea why it's causing? To my knowledge python tests are failing due to circular imports |
The actual problem I can see from the failed test is an enum variant (
|
|
Yeah just realised I'll commit it right now |
|
Thanks for the contribution! Merging. |

Key Changes
Removed
PyKeyPyKeystruct and its serialization logic were entirely deleted fromrust/py/src/value.rs.Adoption of
StableKeyStableKeyinstead.Rust Implementation Update
TargetStateinrust/py/src/target_state.rsto interact with this new key type.rust/core/src/state/stable_path.rsandrust/py/src/stable_path.rsto bridge the new key references.Resolves this particular issue