Question on the vector graph search query #1393

@adelanl

    OPTIONAL MATCH (chunk)-[:HAS_ENTITY]->(e)
    WITH e, count(*) AS numChunks 
    ORDER BY numChunks DESC 
    LIMIT {no_of_entites}

    WITH 
    CASE 
        WHEN e.embedding IS NULL OR ({embedding_match_min} <= vector.similarity.cosine($query_vector, e.embedding) AND vector.similarity.cosine($query_vector, e.embedding) <= {embedding_match_max}) THEN 
            collect {{
                OPTIONAL MATCH path=(e)(()-[rels:!HAS_ENTITY&!PART_OF]-()){{0,1}}(:!Chunk&!Document&!__Community__) 
                RETURN path LIMIT {entity_limit_minmax_case}
            }}
        WHEN e.embedding IS NOT NULL AND vector.similarity.cosine($query_vector, e.embedding) >  {embedding_match_max} THEN
            collect {{
                OPTIONAL MATCH path=(e)(()-[rels:!HAS_ENTITY&!PART_OF]-()){{0,2}}(:!Chunk&!Document&!__Community__) 
                RETURN path LIMIT {entity_limit_max_case} 
            }} 
        ELSE 
            collect {{ 
                MATCH path=(e) 
                RETURN path 
            }}
    END AS paths, e

This is a snippet from VECTOR_GRAPH_SEARCH_QUERY, used in data source context retrieval.
From what I am able to gather, this is where we take not just the entities, but also the relationships between them.
Hence the ELSE branch, which returns just the entity without any relationships when its embedding's similarity to the query falls below {embedding_match_min}. It also looks like we fetch more when an entity scores above {embedding_match_max} (by allowing up to two hops, and increasing the limit?).
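
To make my reading of the CASE expression concrete, here is a minimal Python sketch of the branching as I understand it. The function name `hop_range` and the threshold values in the usage lines are hypothetical and only for illustration; `None` stands in for a NULL embedding.

```python
from typing import Optional, Tuple


def hop_range(similarity: Optional[float],
              match_min: float,
              match_max: float) -> Tuple[int, int]:
    """Return the (min, max) hop quantifier the CASE expression would pick.

    `similarity` is None when the entity has no embedding.
    """
    # First WHEN: no embedding, or similarity within [match_min, match_max].
    if similarity is None or match_min <= similarity <= match_max:
        return (0, 1)
    # Second WHEN: similarity above match_max, so expand up to two hops.
    if similarity > match_max:
        return (0, 2)
    # ELSE: similarity below match_min, so keep only the entity itself.
    return (0, 0)


# Hypothetical thresholds, not taken from the project configuration.
print(hop_range(None, 0.3, 0.9))
print(hop_range(0.5, 0.3, 0.9))
print(hop_range(0.95, 0.3, 0.9))
print(hop_range(0.1, 0.3, 0.9))
```

The sketch covers only which hop range is selected; the per-branch LIMIT values are left out.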

If that understanding is correct, then this is probably more of a general Cypher question for my own learning: why is the path match written as
path = (e)(()-[rels:!HAS_ENTITY&!PART_OF]-()){{0,2}}(:!Chunk&!Document&!__Community__)
?
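
A note on the doubled braces, in case they are confusing: I am assuming the query text is a Python format-string template, so `{{0,2}}` is an escaped literal and the Cypher that actually runs contains `{0,2}`. A minimal sketch of that assumption, using one line of the snippet and a hypothetical limit value:

```python
# One line of the query, written as a Python format-string template.
template = (
    "OPTIONAL MATCH path=(e)(()-[rels:!HAS_ENTITY&!PART_OF]-()){{0,2}}"
    "(:!Chunk&!Document&!__Community__) "
    "RETURN path LIMIT {entity_limit_max_case}"
)

# The value 40 is hypothetical, chosen only for illustration.
rendered = template.format(entity_limit_max_case=40)

# The doubled braces collapse to single braces; the named placeholder is filled in.
print(rendered)
```

So the quantifier in question is really `{0,2}` on the parenthesized pattern fragment.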

What would be wrong with rewriting this query as:
path = (e) -[]-{{0,2}} (:__Entity__)
(We could rely on the landing-node restriction alone, since the incoming nodes e are always entities from chunks and we want every relationship between entities, but that is not what I am asking about.)

I know the current path definition applies the quantifier to the whole pattern fragment, whereas the other applies it to the individual relationship, but either way we hop at most two entities ahead, and the pattern fragment just defines the restriction. I can't figure out a more concise way to ask the question: is there a particular reason for the current path definition here? Both produce the same output on my toy entity graph.

Thank you
