[POP-2992] Implement construct-graph-ptxt binary #1711
This functionality looks very similar to generate_benchmark_data. I know there are some differences, but I feel we should reconcile the abstractions and have a single implementation. init_test_dbs also has overlapping functionality, which adds to the confusion.
Of course, it can be rectified later if data analysis needs this specific implementation now.
}

info!("Opening iris codes input stream");
let file = File::open(args.iris_codes_path.as_path()).unwrap();
You could repurpose the read_json function from py_bindings/io.rs.
let mut graph = GraphMem::new();
let prf_seed = (args.hnsw_prf_key as u128).to_le_bytes();

for json_ptxt in stream {
If you initialize PlaintextStore, then you can simply call PlaintextStore::generate_graph to do the rest.
You might need to change it slightly to emit info! events.
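For reference, the seed derivation in the hunk above widens the key to 128 bits and takes its little-endian bytes. A minimal standalone sketch of that one step, assuming hnsw_prf_key is a u64 (the hunk does not show its type); prf_seed_from_key is a hypothetical name used only for illustration:

```rust
/// Widen a 64-bit key to 128 bits and return its 16 little-endian bytes.
/// Assumption: the key is a u64; the type of `hnsw_prf_key` is not shown in the PR hunk.
/// `prf_seed_from_key` is a hypothetical helper, not a function from the repository.
fn prf_seed_from_key(key: u64) -> [u8; 16] {
    // The cast zero-extends, so bytes 8..16 of the result are always zero.
    (key as u128).to_le_bytes()
}
```

Under that assumption only the low 8 bytes of the seed carry key material, so two binaries given the same key value should derive the same 16-byte seed.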
Thanks for the input! @mcalancea definitely this is intended as a one-off binary that we need for short-term usage. Agreed that we should eventually have a utility that combines this with the functionality provided by the |
478c09b to 934b04f
Thanks! Good idea to use the parallel insert algorithm.
This PR implements a small binary that constructs an HNSW graph from plaintext iris code input and serializes the output to a file in the standard "single-graph" binary output format. The binary will be used primarily for upcoming data analysis tasks.