Add Elasticsearch #215
base: main
Hey Evan, this is looking pretty good. A few comments:
Thanks for the feedback! I'll need the weekend, but I'll make the changes and some more progress soon.
@kosmikdc I think this is what you're after? I've added a test script that was helpful for me (I'll move it to a test folder or similar before this is ready for merge). I'd like to review the dense vector and script score documentation a bit to make sure I didn't screw that up, and obviously I still need to implement and test some things (LTR, semantic search, etc.). From the perspective of schema management, though, is this better? I'm committed to finishing this PR and making the book accessible to ES users as well (I simply cannot convince my org to switch products, but I'd love to teach members of my team how to use these concepts and tools). It'll be a weekend thing, but I can spend enough weekends on this to get ES support for the book. Please keep the feedback coming :)
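For anyone reviewing the dense vector and script score pieces mentioned above, here is a minimal sketch of the request bodies involved. It is not the code in this PR: the helper names (`dense_vector_mapping`, `script_score_query`) and the field name `embedding` are assumptions made up for illustration. It only builds plain dictionaries in the shape Elasticsearch documents for a `dense_vector` mapping and a `script_score` query using `cosineSimilarity`.

```python
from typing import List


def dense_vector_mapping(field: str, dims: int) -> dict:
    """Index-creation body declaring one dense_vector field (hypothetical helper)."""
    return {
        "mappings": {
            "properties": {
                field: {"type": "dense_vector", "dims": dims},
            }
        }
    }


def script_score_query(field: str, query_vector: List[float], size: int = 10) -> dict:
    """Search body that ranks every document by cosine similarity to query_vector."""
    return {
        "size": size,
        "query": {
            "script_score": {
                # Score all documents; swap in a filter query to narrow candidates.
                "query": {"match_all": {}},
                "script": {
                    # + 1.0 keeps the score non-negative, which script_score requires.
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


# "embedding" is an assumed field name, not one taken from the book's datasets.
mapping = dense_vector_mapping("embedding", dims=3)
query = script_score_query("embedding", [0.1, 0.2, 0.3], size=5)
```

Either dictionary can be passed as the JSON body of the corresponding index-creation or search request; keeping them as plain data also makes them easy to check in a test script before hitting a live cluster.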
Mornin' @evanvolgas, love your enthusiasm and contributions here. The test script is an excellent idea that serves many purposes in the ES integration :) The schema management is better in the sense that it will render the correct format, but perhaps the OpenSearch config should be generalized (a simple rename and directory move) and then used by ElasticsearchEngine, as opposed to duplicated. It seems functional for now, at least. Another point: the AIPS system is set up to use Spark for data processing, which simplifies and standardizes many data operations, mainly batch reading and writing ( I'll be contributing how I can in the coming weeks.
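To make the "generalize instead of duplicate" suggestion concrete, here is a rough sketch of one way it could look. Everything named here is hypothetical: `SHARED_SCHEMAS`, `index_body`, the two `*Sketch` classes, and the `products` fields are placeholders, not the repository's actual modules or schemas. The only point is that both engines read from a single schema definition.

```python
from typing import Dict

# Hypothetical shared schema definitions; collection and field names are illustrative.
SHARED_SCHEMAS: Dict[str, dict] = {
    "products": {
        "properties": {
            "name": {"type": "text"},
            "upc": {"type": "keyword"},
        }
    }
}


def index_body(collection: str) -> dict:
    """Return an index-creation body for a named collection from the shared config."""
    return {"mappings": SHARED_SCHEMAS[collection]}


class OpenSearchEngineSketch:
    """Stand-in for an OpenSearch engine class (assumed name)."""

    def schema_for(self, collection: str) -> dict:
        return index_body(collection)


class ElasticsearchEngineSketch:
    """Stand-in for ElasticsearchEngine; reuses the same config rather than copying it."""

    def schema_for(self, collection: str) -> dict:
        return index_body(collection)


os_schema = OpenSearchEngineSketch().schema_for("products")
es_schema = ElasticsearchEngineSketch().schema_for("products")
```

With this layout a schema change is made once in the shared config, and any engine-specific differences can be layered on inside the individual `schema_for` methods.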
I'm still working on the LTR, SemanticSearch, etc. classes, but I've got the basics of an Elasticsearch engine up and running and have tested chapter 4 (further tests are coming).
I'm still working on this and will comment below when I believe it is ready for testing / merge.
I'd like a quick set of eyes on this, if possible, to make sure I'm not wasting my time. I know the LTR and semantic search classes aren't there yet, and I'm still working on them. The Collection and Engine should both be in good shape, I believe.
Tested from the notebook server using existing Jupyter notebooks and the following:
Also, would you be open to a PR to format everything with Black?