[BUG] Can not load graph from S3 on k8s #4314

@atberium

Describe the bug
We run GraphScope on a k8s cluster. Loading a graph from data stored in S3 fails with an error.

To Reproduce
Steps to reproduce the behavior:

  1. Set up and run a k8s cluster
  2. Make sure the following Python script runs without raising an error and starts all required GS pods on the k8s cluster:
import graphscope
from graphscope.framework.loader import Loader

session = graphscope.session() # depends on your setup, you could have some parameters set

#<placeholder>

session.close()
  3. Make sure all S3 settings are correct and that the files in the bucket are directly accessible from every GS pod (using curl, s3cmd, etc.)
  4. Then add the following code (replacing <placeholder>):
graph = session.g()
graph = graph.add_vertices(Loader('s3://bucket/vertices.csv', key='{{ s3_access_key }}', secret='{{ s3_secret_key }}', endpoint_url='{{ s3_endpoint_url }}', delimiter='|'), label='vertex')
graph = graph.add_edges(Loader('s3://bucket/edges.csv', key='{{ s3_access_key }}', secret='{{ s3_secret_key }}', endpoint_url='{{ s3_endpoint_url }}', delimiter='|'), src_label='vertex', dst_label='vertex', label='knows')
  5. See the error:
E1106 17:01:31.000000   481 /tmp/gs-local-deps/v6d-0.24.2/modules/graph/loader/arrow_fragment_loader.cc:432] Failed to read from stream o04c02d48a740008a: Object not exists: failed to get metadata for 'o04c02d48a740008a': failed to read get_data reply: {"content":null,"type":"get_data_reply"}
E1106 17:01:31.000000   435 /tmp/gs-local-deps/v6d-0.24.2/modules/graph/loader/arrow_fragment_loader.cc:432] Failed to read from stream o04c02d48a740008a: Object not exists: failed to get metadata for 'o04c02d48a740008a': failed to read get_data reply: {"content":null,"type":"get_data_reply"}
E1106 17:01:31.000000   114 /home/graphscope/GraphScope/analytical_engine/core/server/dispatcher.cc:153] Worker 0: VineyardError occurred on worker 0: VineyardError occurred on worker 0: /tmp/gs-local-deps/v6d-0.24.2/modules/graph/loader/fragment_loader_utils.cc:218: SyncSchema -> Assertion failed: field_num > 0: Empty table list cannot be used for normalizing schema
vineyard::SyncSchema(std::shared_ptr<arrow::Table> const&, grape::CommSpec const&) + 0x7BC
vineyard::sync_gs_error<gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda(std::shared_ptr<arrow::Table> const&)#2}&, std::shared_ptr<arrow::Table> const&>(grape::CommSpec const&, gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda(std::shared_ptr<arrow::Table> const&)#2}&, std::shared_ptr<arrow::Table> const&)::{lambda()#2}::operator()() const + 0x49
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int) + 0x1845
vineyard::sync_gs_error<gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables()::{lambda()#2}&>(grape::CommSpec const&, gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables()::{lambda()#2}&)::{lambda()#2}::operator()() const + 0x52
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables() + 0x35D
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexEdgeTables() + 0x2D1
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::AddLabelsToFragment(unsigned long) + 0x47
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::AddLabelsToFragmentAsFragmentGroup(unsigned long) + 0x3B
AddLabelsToGraph + 0x485
gs::GrapeInstance::addLabelsToGraph(gs::rpc::GSParams const&) + 0x83B
gs::GrapeInstance::OnReceive(std::shared_ptr<gs::CommandDetail>) + 0x1357
gs::Dispatcher::processCmd(std::shared_ptr<gs::CommandDetail>) + 0xEA
gs::Dispatcher::publisherLoop() + 0x246
std::error_code::default_error_condition() const + 0x33
pthread_condattr_setpshared + 0x513
2024-11-06 09:01:31,956 [ERROR][rpc:189]: Runstep failed with code: ANALYTICAL_ENGINE_INTERNAL_ERROR, message: Error occurred during RunStep, The traceback is: Traceback (most recent call last):
  File "/home/graphscope/.local/lib/python3.10/site-packages/gscoordinator/op_executor.py", line 106, in run_step
    for response in responses:
  File "/home/graphscope/.local/lib/python3.10/site-packages/grpc/_channel.py", line 543, in __next__
    return self._next()
  File "/home/graphscope/.local/lib/python3.10/site-packages/grpc/_channel.py", line 969, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "VineyardError occurred on worker 0: VineyardError occurred on worker 0: /tmp/gs-local-deps/v6d-0.24.2/modules/graph/loader/fragment_loader_utils.cc:218: SyncSchema -> Assertion failed: field_num > 0: Empty table list cannot be used for normalizing schema
vineyard::SyncSchema(std::shared_ptr<arrow::Table> const&, grape::CommSpec const&) + 0x7BC
vineyard::sync_gs_error<gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda(std::shared_ptr<arrow::Table> const&)#2}&, std::shared_ptr<arrow::Table> const&>(grape::CommSpec const&, gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda(std::shared_ptr<arrow::Table> const&)#2}&, std::shared_ptr<arrow::Table> const&)::{lambda()#2}::operator()() const + 0x49
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int) + 0x1845
...

In short, the error is:
Failed to read from stream o04c02d48a740008a: Object not exists: failed to get metadata for 'o04c02d48a740008a': failed to read get_data reply: {"content":null,"type":"get_data_reply"}
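When triaging longer logs, it may help to pull out just the failing stream IDs and the reason reported for each. The following is a minimal sketch, assuming the log lines keep the shape quoted above. The names `STREAM_ERROR` and `failed_streams` are hypothetical helpers and not part of GraphScope or vineyard.

```python
import re

# Matches the stream-read failure lines quoted in this report.
# Assumption: stream IDs look like 'o' followed by hex digits.
STREAM_ERROR = re.compile(
    r"Failed to read from stream (?P<stream_id>o[0-9a-f]+): (?P<reason>[^:]+):"
)


def failed_streams(log_text):
    """Map each failing stream ID to the set of reasons reported for it."""
    found = {}
    for match in STREAM_ERROR.finditer(log_text):
        reason = match.group("reason").strip()
        found.setdefault(match.group("stream_id"), set()).add(reason)
    return found


# First error line from the report, copied verbatim.
sample = (
    "E1106 17:01:31.000000   481 /tmp/gs-local-deps/v6d-0.24.2/modules/graph/loader/"
    "arrow_fragment_loader.cc:432] Failed to read from stream o04c02d48a740008a: "
    "Object not exists: failed to get metadata for 'o04c02d48a740008a': "
    'failed to read get_data reply: {"content":null,"type":"get_data_reply"}'
)

print(failed_streams(sample))
```

Both worker lines in the log above refer to the same stream ID, so the helper would report a single entry for them.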

Expected behavior
We expect the graph to be loaded with its vertices and edges and without errors, which we could then verify, for example, with the interactive engine.

Environment:

  • GraphScope version: v0.29.0
  • OS: Ubuntu 24.04
  • Kubernetes version: 1.28.14
  • Python version: 3.11.10 (with the following dependencies: graphscope==0.29.0, graphscope-client==0.29.0, pandas==2.0.3, aiohttp, async_timeout)

Additional context
We also tried loading the same data (vertices and edges) from local files mounted into the pods:

import os  # needed for os.path.expanduser below

session = graphscope.session(
    k8s_volumes={
        "data": {
            "type": "hostPath",
            "field": {
                "path": os.path.expanduser("~/examples/"),
                "type": "Directory"
            },
            "mounts": {
                "mountPath": "/examples/"
            }
        }
    }
)

graph = session.g()
graph = graph.add_vertices(Loader('/examples/vertices.csv', delimiter='|'), label='vertex')
graph = graph.add_edges(Loader('/examples/edges.csv', delimiter='|'), src_label='vertex', dst_label='vertex', label='knows')

This works as expected, with no errors.
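To keep the S3 run and the local-file run identical apart from the data source, the loader arguments could be generated from one place. This is only a sketch: `loader_specs` is a hypothetical helper, the file names and the `key`, `secret` and `endpoint_url` keyword arguments are copied from the snippets in this report, and the block does not import GraphScope itself.

```python
def loader_specs(base_uri, storage_options=None, delimiter="|"):
    """Build (source, kwargs) pairs for the vertex and edge files under base_uri."""
    options = dict(storage_options or {})
    options["delimiter"] = delimiter
    base = base_uri.rstrip("/")
    return {
        "vertices": (f"{base}/vertices.csv", dict(options)),
        "edges": (f"{base}/edges.csv", dict(options)),
    }


# Local-file variant (the one that works in this report).
local = loader_specs("/examples/")

# S3 variant (the one that fails); credentials stay as the report's placeholders.
remote = loader_specs(
    "s3://bucket",
    storage_options={
        "key": "{{ s3_access_key }}",
        "secret": "{{ s3_secret_key }}",
        "endpoint_url": "{{ s3_endpoint_url }}",
    },
)

# With GraphScope installed, each pair would feed the calls shown in the report:
#   source, kwargs = remote["vertices"]
#   graph = graph.add_vertices(Loader(source, **kwargs), label='vertex')
```

With this layout the two runs differ only in `base_uri` and the storage options, which narrows the comparison to the S3 access path.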
