
hahahahbenny (Contributor) commented Aug 20, 2025

Purpose of the PR

  • Add the vector index type and validation of the related fields

Main Changes

  1. Add the vector index type to the index label.
  2. Verify that the requested fields are of type float list.
  3. Check that dimension and metric in user_data are not null.
  4. Add vector-index-related logic to the CreateIndex RESTful API.

The index label creation API

POST http://localhost:8080/graphs/hugegraph/schema/indexlabels

now supports the following request JSON body:

{
  "name": "personBy",
  "base_type": "VERTEX_LABEL",
  "base_value": "person",
  "index_type": "VECTOR",
  "fields": ["embedding"],
  "user_data": {
      "dimension" : 1526,
      "metric" : "cosine"
  }
}
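
For illustration, the request above can be built with the JDK's own `java.net.http.HttpClient` API (Java 11+). This is a minimal sketch, not part of the PR: it assumes a local server at `localhost:8080` with a graph named `hugegraph`, as in the URL above, and leaves the actual `send` call commented out so the snippet runs without a server. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateVectorIndexLabel {

    // Request body from the PR description: a VECTOR index on the
    // float-list property "embedding" of vertex label "person".
    static final String BODY =
            "{"
            + "\"name\": \"personBy\","
            + "\"base_type\": \"VERTEX_LABEL\","
            + "\"base_value\": \"person\","
            + "\"index_type\": \"VECTOR\","
            + "\"fields\": [\"embedding\"],"
            + "\"user_data\": {\"dimension\": 1526, \"metric\": \"cosine\"}"
            + "}";

    // Builds (but does not send) the POST request for the given server and graph.
    static HttpRequest buildRequest(String baseUrl, String graph) {
        URI uri = URI.create(baseUrl + "/graphs/" + graph + "/schema/indexlabels");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(BODY))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildRequest("http://localhost:8080", "hugegraph");
        System.out.println(request.method() + " " + request.uri());
        // Uncomment to send it to a running HugeGraph server:
        // HttpResponse<String> resp = HttpClient.newHttpClient()
        //         .send(request, HttpResponse.BodyHandlers.ofString());
        // System.out.println(resp.statusCode() + " " + resp.body());
    }
}
```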

Attention

  1. The user_data object must explicitly declare both dimension and metric.
  2. For the VECTOR index type, the API only accepts fields whose property keys are of type float list (a list of floats).
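
The two rules above can be sketched as a standalone check. This is not the PR's `IndexLabelBuilder` code: the `DataType`, `Cardinality` and `PropertyKey` types and the `validateVectorIndex` helper are hypothetical stand-ins, and the sketch assumes "float list" means data type `FLOAT` with cardinality `LIST`, mirroring the error messages quoted in the review comments.

```java
import java.util.List;
import java.util.Map;

public class VectorIndexCheck {

    // Hypothetical stand-ins for HugeGraph's DataType and Cardinality enums.
    enum DataType { INT, FLOAT, DOUBLE, TEXT }
    enum Cardinality { SINGLE, LIST, SET }

    // Hypothetical minimal view of a property key.
    static final class PropertyKey {
        final String name;
        final DataType dataType;
        final Cardinality cardinality;

        PropertyKey(String name, DataType dataType, Cardinality cardinality) {
            this.name = name;
            this.dataType = dataType;
            this.cardinality = cardinality;
        }
    }

    // Throws IllegalArgumentException unless every field is a float list
    // and user_data declares both "dimension" and "metric".
    static void validateVectorIndex(String labelName, List<PropertyKey> fields,
                                    Map<String, Object> userData) {
        for (PropertyKey pk : fields) {
            if (pk.dataType != DataType.FLOAT || pk.cardinality != Cardinality.LIST) {
                throw new IllegalArgumentException(String.format(
                        "vector index can only build on Float List, but got %s(%s)",
                        pk.dataType, pk.cardinality));
            }
        }
        if (userData == null || userData.get("dimension") == null
                || userData.get("metric") == null) {
            throw new IllegalArgumentException(String.format(
                    "The user_data (dimension and metric) of vector index label '%s' " +
                    "can't be null", labelName));
        }
    }

    public static void main(String[] args) {
        PropertyKey embedding = new PropertyKey("embedding", DataType.FLOAT, Cardinality.LIST);
        // Accepted: float-list field and both user_data keys present.
        validateVectorIndex("personBy", List.of(embedding),
                            Map.of("dimension", 1526, "metric", "cosine"));
        try {
            // Rejected: "metric" is missing from user_data.
            validateVectorIndex("personBy", List.of(embedding), Map.of("dimension", 1526));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```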


Verifying these changes

  • Trivial rework / code cleanup without any test coverage. (No Need)
  • Already covered by existing tests, such as (please modify tests here).
  • Need tests and can be verified as follows:
    • xxx

Does this PR potentially affect the following parts?

Documentation Status

  • Doc - TODO
  • Doc - Done
  • Doc - No Need

dosubot bot added the size:M (This PR changes 30-99 lines, ignoring generated files.) and api (Changes of API) labels Aug 20, 2025
"The user_data(dimension and metric) of vector index " +
"label '%s' " +
"can't be null", this.name);
}
Contributor

code format

Contributor Author

ok, I will fix it

(cardinality == Cardinality.LIST),
"vector index can only build on Float List, " +
"but got %s(%s)", dataType, cardinality);
}
Contributor

code format

Contributor Author

ok, I will fix it

dosubot bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) and removed size:M (This PR changes 30-99 lines, ignoring generated files.) Aug 27, 2025
dosubot bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) and removed size:L (This PR changes 100-499 lines, ignoring generated files.) Sep 25, 2025
codecov bot commented Sep 25, 2025

Codecov Report

❌ Patch coverage is 8.62069% with 212 lines in your changes missing coverage. Please review.
✅ Project coverage is 33.67%. Comparing base (0946d5d) to head (e1183dd).
⚠️ Report is 7 commits behind head on master.

Files with missing lines (patch coverage %, missing/partial lines):

  • ...hugegraph/backend/serializer/VectorSerializer.java: 0.00%, 85 missing
  • ...gegraph/backend/serializer/VectorBackendEntry.java: 0.00%, 53 missing
  • ...java/org/apache/hugegraph/api/graph/VertexAPI.java: 0.00%, 37 missing
  • ...he/hugegraph/schema/builder/IndexLabelBuilder.java: 0.00%, 12 missing and 1 partial
  • ...he/hugegraph/backend/tx/GraphIndexTransaction.java: 0.00%, 11 missing
  • ...org/apache/hugegraph/api/schema/IndexLabelAPI.java: 0.00%, 2 missing and 2 partials
  • ...ain/java/org/apache/hugegraph/schema/Userdata.java: 0.00%, 3 missing
  • ...va/org/apache/hugegraph/type/define/IndexType.java: 50.00%, 1 missing and 1 partial
  • ...ugegraph/backend/serializer/SerializerFactory.java: 0.00%, 1 missing
  • ...pache/hugegraph/backend/store/BackendFeatures.java: 0.00%, 1 missing
  • ... and 2 more

❗ There is a different number of reports uploaded between BASE (0946d5d) and HEAD (e1183dd).

HEAD has 2 fewer uploads than BASE: BASE (0946d5d) has 3 uploads and HEAD (e1183dd) has 1.
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2856      +/-   ##
============================================
- Coverage     40.91%   33.67%   -7.24%     
+ Complexity      333      264      -69     
============================================
  Files           747      749       +2     
  Lines         60168    60436     +268     
  Branches       7683     7729      +46     
============================================
- Hits          24615    20351    -4264     
- Misses        32975    37778    +4803     
+ Partials       2578     2307     -271     


VGalaxies (Contributor) commented Sep 25, 2025

@hahahahbenny please run the script to fix this check; you could also update the licenses for the newly added dependencies at the same time.

@VGalaxies VGalaxies requested a review from Copilot September 25, 2025 12:03
hahahahbenny changed the title from "feat(server): Add the vector index type and the detection of related fields" to "feat(server): Add vector index" Sep 25, 2025
Copilot AI left a comment

Pull Request Overview

This PR adds vector index support to Apache HugeGraph, enabling the storage and querying of vector data for similarity search operations. The implementation introduces a new vector backend module using the JVector library and integrates vector index functionality into the existing schema and API layers.

  • Adds a new VECTOR index type to support vector similarity search
  • Implements vector backend store using JVector for vector index operations
  • Extends REST API with vector index creation and ANN search endpoints
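
To make "ANN search" concrete: an approximate nearest-neighbour (ANN) index, such as the JVector-based one this PR adds, returns vectors that are likely, but not guaranteed, to be the k most similar to a query. The sketch below is an exact brute-force top-k scan using cosine similarity, shown only as the reference result an ANN index approximates. It is not the PR's search code, and `topK` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class ExactTopK {

    // Cosine similarity; vectors must share one dimension (zero vectors yield NaN).
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                    "dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Returns the ids of the k stored vectors most similar to the query, best first.
    static List<String> topK(Map<String, float[]> vectors, float[] query, int k) {
        List<Map.Entry<String, float[]>> entries = new ArrayList<>(vectors.entrySet());
        entries.sort(Comparator.comparingDouble(
                (Map.Entry<String, float[]> e) -> cosine(e.getValue(), query)).reversed());
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            ids.add(entries.get(i).getKey());
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, float[]> vectors = Map.of(
                "v1", new float[]{1f, 0f},
                "v2", new float[]{0.7f, 0.7f},
                "v3", new float[]{0f, 1f});
        System.out.println(topK(vectors, new float[]{1f, 0.1f}, 2)); // [v1, v2]
    }
}
```

An exact scan costs O(n·d) per query for n stored vectors of dimension d, which is why an ANN index is used instead at scale.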

Reviewed Changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 8 comments.

Summary per file:

  • pom.xml: Adds JVector dependency for vector operations
  • hugegraph-server/pom.xml: Includes new vector module
  • hugegraph-vector/*.java: New vector backend implementation with JVector integration
  • hugegraph-core/src/main/java/org/apache/hugegraph/type/define/IndexType.java: Adds VECTOR enum value
  • hugegraph-core/src/main/java/org/apache/hugegraph/type/HugeType.java: Adds VECTOR_INDEX type
  • hugegraph-core/src/main/java/org/apache/hugegraph/schema/builder/IndexLabelBuilder.java: Adds vector index validation
  • hugegraph-core/src/main/java/org/apache/hugegraph/schema/Userdata.java: Adds validation for vector index metadata
  • hugegraph-core/src/main/java/org/apache/hugegraph/backend/tx/GraphIndexTransaction.java: Handles vector index transactions
  • hugegraph-core/src/main/java/org/apache/hugegraph/backend/store/BackendFeatures.java: Adds vector index feature flag
  • hugegraph-core/src/main/java/org/apache/hugegraph/backend/serializer/*.java: New serializers for vector data
  • hugegraph-api/src/main/java/org/apache/hugegraph/api/schema/IndexLabelAPI.java: REST API for vector index creation
  • hugegraph-api/src/main/java/org/apache/hugegraph/api/graph/VertexAPI.java: REST API for ANN search


Comment on lines 129 to 133
private VectorSerializer getVectorSerializer() {
// Create VectorSerializer using the same pattern as the main serializer
HugeConfig config = this.params().configuration();
return new VectorSerializer(config);
}
Copilot AI Sep 25, 2025

Creating a new VectorSerializer instance on every call is inefficient. Consider caching the VectorSerializer instance as a class member to avoid repeated object creation.
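
The caching suggestion can be sketched as lazy, thread-safe initialization. `Config`, `VectorSerializer` and `Store` below are hypothetical stand-ins for `HugeConfig`, the PR's `VectorSerializer` and the class that owns `getVectorSerializer()`; the sketch assumes the configuration does not change after the store is created, so a single serializer can be reused.

```java
public class SerializerCacheDemo {

    // Hypothetical stand-in for HugeConfig.
    static final class Config {
        final String backend;
        Config(String backend) { this.backend = backend; }
    }

    // Hypothetical stand-in for the PR's VectorSerializer.
    static final class VectorSerializer {
        final Config config;
        VectorSerializer(Config config) { this.config = config; }
    }

    // Hypothetical owner of getVectorSerializer().
    static final class Store {
        private final Config config;
        // Created on first use, then reused; volatile for safe publication.
        private volatile VectorSerializer vectorSerializer;

        Store(Config config) { this.config = config; }

        VectorSerializer getVectorSerializer() {
            VectorSerializer serializer = this.vectorSerializer;
            if (serializer == null) {
                synchronized (this) {
                    serializer = this.vectorSerializer;
                    if (serializer == null) {
                        serializer = new VectorSerializer(this.config);
                        this.vectorSerializer = serializer;
                    }
                }
            }
            return serializer;
        }
    }

    public static void main(String[] args) {
        Store store = new Store(new Config("rocksdb"));
        // Both calls return the same cached instance.
        System.out.println(store.getVectorSerializer() == store.getVectorSerializer()); // true
    }
}
```

Double-checked locking on a `volatile` field is one option; if the configuration is already available in the constructor, creating the serializer there and storing it in a `final` field is simpler.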

}

// ANN search request class
private static class AnnSearchRequest {
Copilot AI Sep 25, 2025

The AnnSearchRequest class should be public or moved to a separate public class to allow proper API documentation generation and client code usage.

Suggested change
private static class AnnSearchRequest {
public static class AnnSearchRequest {

Comment on lines 49 to 52
public VectorSerializer() {
super();
}

Copilot AI Sep 25, 2025

The no-argument constructor calls super() unnecessarily since it's implicitly called. Consider removing this constructor if it's not needed or documenting its purpose.

Suggested change
public VectorSerializer() {
super();
}

}
}
this.tables.clear();
return false;
Copilot AI Sep 25, 2025

The method always returns false despite successfully closing tables. Consider returning true on successful closure or changing the return type to void if the boolean return value is not needed.

Suggested change
return false;
return true;


@Override
protected boolean opened() {
return false;
Copilot AI Sep 25, 2025

The opened() method always returns false, which contradicts the opened field being set to true in the Session.open() method. This should return the actual state of the session.

Suggested change
return false;
return !this.closed;
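
Both session comments above point to the same fix: keep the session state in a field and report it instead of returning a constant. A minimal sketch, assuming a hypothetical `VectorSession` with a `closed` flag (the field name used in the suggested change) and a plain map standing in for the real tables; it is not the PR's `Session` class and is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

public class VectorSessionSketch {

    // Hypothetical session whose state is tracked rather than hard-coded.
    static final class VectorSession {
        private final Map<String, Object> tables = new HashMap<>();
        private boolean closed = true;

        void open() {
            this.tables.put("vector_index", new Object()); // placeholder table handle
            this.closed = false;
        }

        // Returns true if this call closed an open session, false if it was already closed.
        boolean close() {
            if (this.closed) {
                return false;
            }
            this.tables.clear();
            this.closed = true;
            return true;
        }

        boolean opened() {
            return !this.closed;
        }
    }

    public static void main(String[] args) {
        VectorSession session = new VectorSession();
        session.open();
        System.out.println(session.opened()); // true
        System.out.println(session.close());  // true
        System.out.println(session.opened()); // false
        System.out.println(session.close());  // false
    }
}
```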

Comment on lines +570 to +571
return String.format("AnnSearchRequest{vertex_label=%s, properties=%s, user_vector=%s, metric=%s, dimension=%s}",
vertex_label, properties, Arrays.toString(user_vector), metric, dimension);
Copilot AI Sep 25, 2025
Missing import for Arrays class. Add 'import java.util.Arrays;' to the imports section.
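
Taken together, the two `AnnSearchRequest` comments amount to a public static request class whose `toString` has `java.util.Arrays` imported. A minimal sketch: the field names come from the format string above, but their types (`String`, `List<String>`, `float[]`, `String`, `int`) are assumptions, and the real class may declare them differently.

```java
import java.util.Arrays;
import java.util.List;

public class AnnSearchApiSketch {

    // Public and static, so documentation tools and client code can reference it.
    // Snake_case names mirror the JSON keys used in the format string.
    public static class AnnSearchRequest {
        public String vertex_label;
        public List<String> properties;
        public float[] user_vector;
        public String metric;
        public int dimension;

        @Override
        public String toString() {
            // Arrays.toString prints the vector contents, not the array reference.
            return String.format("AnnSearchRequest{vertex_label=%s, properties=%s, " +
                                 "user_vector=%s, metric=%s, dimension=%s}",
                                 vertex_label, properties, Arrays.toString(user_vector),
                                 metric, dimension);
        }
    }

    public static void main(String[] args) {
        AnnSearchRequest req = new AnnSearchRequest();
        req.vertex_label = "person";
        req.properties = List.of("embedding");
        req.user_vector = new float[]{0.5f, 0.25f};
        req.metric = "cosine";
        req.dimension = 2;
        // AnnSearchRequest{vertex_label=person, properties=[embedding], user_vector=[0.5, 0.25], metric=cosine, dimension=2}
        System.out.println(req);
    }
}
```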

}

// Vector index must build on float list
if(this.indexType.isVector()){
Copilot AI Sep 25, 2025
Missing space after 'if' keyword. Should be 'if (this.indexType.isVector()) {' to follow Java coding conventions.

Suggested change
if(this.indexType.isVector()){
if (this.indexType.isVector()) {

Comment on lines 36 to 45
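// 基础字段(实现BackendEntry接口) / base fields (implementing the BackendEntry interface)
private final HugeType type; // VECTOR_INDEX
private final Id id; // 索引ID / index ID
private final Id subId; // 顶点ID / vertex ID

// 向量核心字段 / core vector fields
private final String vectorId; // 向量唯一标识 / unique vector identifier
private final float[] vector; // 向量数据 / vector data
private final String metricType; // 度量类型 / metric type (L2, COSINE, DOT)
private final Integer dimension; // 向量维度 / vector dimension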
Copilot AI Sep 25, 2025

Comments should be in English to maintain consistency with the rest of the codebase. Consider translating these Chinese comments to English.
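
As a sketch of what these field comments describe, here is a hypothetical immutable holder for the core vector fields, commented in English. It assumes (not confirmed by the PR) that the metric can be modelled as an enum of the three listed types and that a vector's length must equal the declared dimension.

```java
import java.util.Objects;

public class VectorEntrySketch {

    // Metric types listed in the original field comment.
    enum MetricType { L2, COSINE, DOT }

    // Hypothetical immutable holder for the core vector fields.
    static final class VectorEntry {
        private final String vectorId;       // unique vector identifier
        private final float[] vector;        // vector data, defensively copied
        private final MetricType metricType; // metric type
        private final int dimension;         // vector dimension

        VectorEntry(String vectorId, float[] vector, MetricType metricType, int dimension) {
            this.vectorId = Objects.requireNonNull(vectorId, "vectorId");
            this.metricType = Objects.requireNonNull(metricType, "metricType");
            Objects.requireNonNull(vector, "vector");
            // Reject vectors whose length does not match the declared dimension.
            if (vector.length != dimension) {
                throw new IllegalArgumentException(String.format(
                        "expected %d dimensions, got %d", dimension, vector.length));
            }
            this.vector = vector.clone();
            this.dimension = dimension;
        }

        String vectorId() { return this.vectorId; }
        float[] vector() { return this.vector.clone(); }
        MetricType metricType() { return this.metricType; }
        int dimension() { return this.dimension; }
    }

    public static void main(String[] args) {
        VectorEntry entry = new VectorEntry("v1", new float[]{0.1f, 0.2f, 0.3f},
                                            MetricType.COSINE, 3);
        System.out.println(entry.vectorId() + " " + entry.metricType() + " " + entry.dimension());
    }
}
```

Using `int` and an enum instead of the original `Integer` and `String` rules out a null dimension or a misspelled metric at construction time; whether that fits the PR's serializers is not checked here.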

hahahahbenny (Contributor Author), replying to VGalaxies:

ok, I have run this script

dosubot bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) and removed size:XXL (This PR changes 1000+ lines, ignoring generated files.) Oct 29, 2025
hahahahbenny changed the base branch from master to vector-index October 31, 2025 02:17
dosubot bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) and removed size:L (This PR changes 100-499 lines, ignoring generated files.) Oct 31, 2025
imbajin (Member) left a comment

Merging this PR first; it needs further enhancement in future work before being merged into master.

dosubot bot added the lgtm label (This PR has been approved by a maintainer) Oct 31, 2025
imbajin merged commit c92710c into apache:vector-index Oct 31, 2025
4 of 13 checks passed
hahahahbenny added a commit to hahahahbenny/incubator-hugegraph that referenced this pull request Nov 23, 2025
# This is the 1st commit message:

add Licensed to files

# This is the commit message #2:

feat(server): support vector index in graphdb  (apache#2856)

* feat(server): Add the vector index type and the detection of related fields to the index label.

* fix code format

* add annsearch API

* add doc to explain the plan

delete redundancy in VertexAPI

Labels

  • api: Changes of API
  • lgtm: This PR has been approved by a maintainer
  • size:XXL: This PR changes 1000+ lines, ignoring generated files.


4 participants