FlyVec-10 nearest neighbor words in the hash code space #1

SizhaoXu opened this issue Sep 2, 2022 · 3 comments

Comments

SizhaoXu commented Sep 2, 2022

I'm also running the FlyVec evaluation experiment, but I don't know how to get the 10 nearest-neighbor words in the hash code space. After reading your code, I think the top_k_similar_words function in your experiment does not compute the 10 nearest-neighbor words described in the paper. Do you have any ideas about this experiment?

Flowshu (Owner) commented Sep 2, 2022

Hi.
Are you referring to the sim function?
That should calculate the similarity between two embeddings. Since the sparse embeddings are binary, we can just compare two embeddings across all dimensions with equality (==) and sum them up.
This should be equivalent to calculating a proper distance (e.g. L2) since the (squared) differences between 0 and 1 can only be 0 or 1. I think this is also how it is described in the paper in section 3.1.
Does that make sense or did I misunderstand your question?
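To illustrate the equivalence described above, here is a minimal sketch (not the repository's actual code; the vocabulary, dimensions, and function names are hypothetical) of ranking nearest-neighbor words by counting matching dimensions between binary hash codes:

```python
import numpy as np

# Hypothetical toy data: binary hash codes for a small vocabulary.
# In the real experiment these would be the FlyVec sparse embeddings.
rng = np.random.default_rng(0)
vocab = ["apple", "banana", "cherry", "date", "elderberry"]
codes = rng.integers(0, 2, size=(len(vocab), 16))  # (vocab_size, hash_dim)

def sim(a, b):
    """Similarity of two binary codes: number of matching dimensions.
    For 0/1 vectors this ranks pairs identically to negative L2 distance,
    since each squared difference is either 0 or 1."""
    return int(np.sum(a == b))

def top_k_similar_words(query_idx, k=3):
    """Return the k nearest-neighbor words of vocab[query_idx] under sim()."""
    sims = [sim(codes[query_idx], codes[i]) for i in range(len(vocab))]
    order = np.argsort(sims)[::-1]                # most similar first
    order = [i for i in order if i != query_idx]  # drop the query word itself
    return [vocab[i] for i in order[:k]]

print(top_k_similar_words(0))
```

With k=10 and the full vocabulary, the same ranking procedure would yield the 10 nearest-neighbor words in the hash code space.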

SizhaoXu (Author) commented Sep 4, 2022


Thank you, I do the same for the sim function. My question is about Figure 4 in the paper; I don't know how to reproduce it. Do you know how to use FlyVec to get context-dependent word embeddings?

Flowshu (Owner) commented Sep 5, 2022

The functions get_sparse_embeddings and get_dense_embeddings provided in the library only produce static embeddings.
I am not sure about the context-dependent case.
The authors of the original paper can probably help you here, and I saw you already reached out to them in their repo.
