Describe the bug
We have a k8s cluster. When we try to load a graph from data stored in S3, we get an error.
To Reproduce
Steps to reproduce the behavior:
- Set up and run a k8s cluster.
- Make sure that the following Python script runs properly and raises no error, and that it starts all required GraphScope (GS) pods on the k8s cluster:
import graphscope
from graphscope.framework.loader import Loader
session = graphscope.session() # pass parameters as your setup requires
#<placeholder>
session.close()
- Make sure that all S3 settings are correct and that you can access the files in the bucket directly from every GS pod (using curl, s3cmd, etc.).
- Then add the following code (replacing #<placeholder>):
graph = session.g()
graph = graph.add_vertices(Loader('s3://bucket/vertices.csv', key='{{ s3_access_key }}', secret='{{ s3_secret_key }}', endpoint_url='{{ s3_endpoint_url }}', delimiter='|'), label='vertex')
graph = graph.add_edges(Loader('s3://bucket/edges.csv', key='{{ s3_access_key }}', secret='{{ s3_secret_key }}', endpoint_url='{{ s3_endpoint_url }}', delimiter='|'), src_label='vertex', dst_label='vertex', label='knows')
- See error:
E1106 17:01:31.000000 481 /tmp/gs-local-deps/v6d-0.24.2/modules/graph/loader/arrow_fragment_loader.cc:432] Failed to read from stream o04c02d48a740008a: Object not exists: failed to get metadata for 'o04c02d48a740008a': failed to read get_data reply: {"content":null,"type":"get_data_reply"}
E1106 17:01:31.000000 435 /tmp/gs-local-deps/v6d-0.24.2/modules/graph/loader/arrow_fragment_loader.cc:432] Failed to read from stream o04c02d48a740008a: Object not exists: failed to get metadata for 'o04c02d48a740008a': failed to read get_data reply: {"content":null,"type":"get_data_reply"}
E1106 17:01:31.000000 114 /home/graphscope/GraphScope/analytical_engine/core/server/dispatcher.cc:153] Worker 0: VineyardError occurred on worker 0: VineyardError occurred on worker 0: /tmp/gs-local-deps/v6d-0.24.2/modules/graph/loader/fragment_loader_utils.cc:218: SyncSchema -> Assertion failed: field_num > 0: Empty table list cannot be used for normalizing schema
vineyard::SyncSchema(std::shared_ptr<arrow::Table> const&, grape::CommSpec const&) + 0x7BC
vineyard::sync_gs_error<gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda(std::shared_ptr<arrow::Table> const&)#2}&, std::shared_ptr<arrow::Table> const&>(grape::CommSpec const&, gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda(std::shared_ptr<arrow::Table> const&)#2}&, std::shared_ptr<arrow::Table> const&)::{lambda()#2}::operator()() const + 0x49
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int) + 0x1845
vineyard::sync_gs_error<gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables()::{lambda()#2}&>(grape::CommSpec const&, gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables()::{lambda()#2}&)::{lambda()#2}::operator()() const + 0x52
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables() + 0x35D
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexEdgeTables() + 0x2D1
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::AddLabelsToFragment(unsigned long) + 0x47
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::AddLabelsToFragmentAsFragmentGroup(unsigned long) + 0x3B
AddLabelsToGraph + 0x485
gs::GrapeInstance::addLabelsToGraph(gs::rpc::GSParams const&) + 0x83B
gs::GrapeInstance::OnReceive(std::shared_ptr<gs::CommandDetail>) + 0x1357
gs::Dispatcher::processCmd(std::shared_ptr<gs::CommandDetail>) + 0xEA
gs::Dispatcher::publisherLoop() + 0x246
std::error_code::default_error_condition() const + 0x33
pthread_condattr_setpshared + 0x513
2024-11-06 09:01:31,956 [ERROR][rpc:189]: Runstep failed with code: ANALYTICAL_ENGINE_INTERNAL_ERROR, message: Error occurred during RunStep, The traceback is: Traceback (most recent call last):
File "/home/graphscope/.local/lib/python3.10/site-packages/gscoordinator/op_executor.py", line 106, in run_step
for response in responses:
File "/home/graphscope/.local/lib/python3.10/site-packages/grpc/_channel.py", line 543, in __next__
return self._next()
File "/home/graphscope/.local/lib/python3.10/site-packages/grpc/_channel.py", line 969, in _next
raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.INTERNAL
details = "VineyardError occurred on worker 0: VineyardError occurred on worker 0: /tmp/gs-local-deps/v6d-0.24.2/modules/graph/loader/fragment_loader_utils.cc:218: SyncSchema -> Assertion failed: field_num > 0: Empty table list cannot be used for normalizing schema
vineyard::SyncSchema(std::shared_ptr<arrow::Table> const&, grape::CommSpec const&) + 0x7BC
vineyard::sync_gs_error<gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda(std::shared_ptr<arrow::Table> const&)#2}&, std::shared_ptr<arrow::Table> const&>(grape::CommSpec const&, gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda(std::shared_ptr<arrow::Table> const&)#2}&, std::shared_ptr<arrow::Table> const&)::{lambda()#2}::operator()() const + 0x49
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int) + 0x1845
...
In short, the error is:
Failed to read from stream o04c02d48a740008a: Object not exists: failed to get metadata for 'o04c02d48a740008a': failed to read get_data reply: {"content":null,"type":"get_data_reply"}
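For the S3 access check in the reproduction steps, testing the exact objects the Loader reads rules out a bucket or key mismatch. A minimal sketch, assuming the S3-compatible endpoint accepts path-style addressing (endpoint/bucket/key); s3_path_style_url and the endpoint hostname are hypothetical. Plain curl only succeeds on such a URL if the object is readable without request signing; for a private bucket, s3cmd (or another client that signs requests) is the better check.

```python
from urllib.parse import quote, urlparse


def s3_path_style_url(s3_uri: str, endpoint_url: str) -> str:
    """Map s3://bucket/key to endpoint/bucket/key (path-style addressing)."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {s3_uri!r}")
    key = parsed.path.lstrip("/")
    if not key:
        raise ValueError(f"missing object key in {s3_uri!r}")
    # quote() leaves "/" intact, so nested keys keep their prefix structure.
    return f"{endpoint_url.rstrip('/')}/{parsed.netloc}/{quote(key)}"


# Hypothetical endpoint; substitute the value used for endpoint_url.
ENDPOINT = "http://s3.example.internal:9000"
for uri in ("s3://bucket/vertices.csv", "s3://bucket/edges.csv"):
    print(s3_path_style_url(uri, ENDPOINT))
```

Each printed URL can then be fetched from inside every GS pod, for example through kubectl exec.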
Expected behavior
We expect the graph to be loaded with all its vertices and edges and no errors, which we could verify using the interactive engine, for example.
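To know what to expect when checking the loaded graph, the row counts of the source files can be computed up front and compared with the vertex and edge counts reported by the interactive engine. A minimal sketch, assuming each file starts with a header row, uses '|' as the delimiter (as in the Loader calls in the reproduction), and contains no duplicate vertex IDs; count_data_rows is a hypothetical helper, not a GraphScope API.

```python
import csv


def count_data_rows(path: str, delimiter: str = "|", header: bool = True) -> int:
    """Count non-blank records in a delimited file, excluding a header row if present."""
    with open(path, newline="") as f:
        # Blank lines come back from csv.reader as empty lists; skip them.
        records = [row for row in csv.reader(f, delimiter=delimiter) if row]
    if header and records:
        return len(records) - 1
    return len(records)
```

For the local copies mounted at /examples/, count_data_rows('/examples/vertices.csv') and count_data_rows('/examples/edges.csv') give the expected vertex and edge counts.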
Environment:
- GraphScope version: v0.29.0
- OS: Ubuntu 24.04
- Kubernetes Version 1.28.14
- Python version: 3.11.10 (with the following dependencies: graphscope==0.29.0, graphscope-client==0.29.0, pandas==2.0.3, aiohttp, async_timeout)
Additional context
We also tried loading the same data (vertices and edges) from local files mounted into the pods through a hostPath volume:
import os

session = graphscope.session(
k8s_volumes={
"data": {
"type": "hostPath",
"field": {
"path": os.path.expanduser("~/examples/"),
"type": "Directory"
},
"mounts": {
"mountPath": "/examples/"
}
}
}
)
graph = session.g()
graph = graph.add_vertices(Loader('/examples/vertices.csv', delimiter='|'), label='vertex')
graph = graph.add_edges(Loader('/examples/edges.csv', delimiter='|'), src_label='vertex', dst_label='vertex', label='knows')
This works as expected, with no errors.
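To switch a single script between the failing S3 setup and the working hostPath setup, the Loader arguments can be chosen in one place. A minimal sketch: loader_source is a hypothetical helper, and the environment variable names S3_ACCESS_KEY, S3_SECRET_KEY and S3_ENDPOINT_URL are assumptions standing in for the {{ ... }} template values in the reproduction. It also refuses values that still contain an unrendered {{ ... }} placeholder, which would otherwise be sent to S3 literally.

```python
import os
import re

# An unrendered template value such as "{{ s3_access_key }}".
_PLACEHOLDER = re.compile(r"\{\{.*?\}\}")


def loader_source(name, mode, bucket="bucket", env=None):
    """Return (source, extra Loader kwargs) for one data file.

    mode="local": path under the hostPath mount (/examples/) from the working setup.
    mode="s3": s3:// URI plus key/secret/endpoint_url read from environment
    variables whose names (S3_ACCESS_KEY, S3_SECRET_KEY, S3_ENDPOINT_URL)
    are assumptions, not GraphScope conventions.
    """
    if mode == "local":
        return f"/examples/{name}", {}
    if mode != "s3":
        raise ValueError(f"unknown mode: {mode!r}")
    env = os.environ if env is None else env
    options = {
        "key": env.get("S3_ACCESS_KEY", ""),
        "secret": env.get("S3_SECRET_KEY", ""),
        "endpoint_url": env.get("S3_ENDPOINT_URL", ""),
    }
    for option, value in options.items():
        # Reject empty values and leftover template placeholders.
        if not value or _PLACEHOLDER.search(value):
            raise ValueError(f"S3 option {option!r} is empty or an unrendered template")
    return f"s3://{bucket}/{name}", options
```

With Loader imported as in the reproduction, a call might look like src, kw = loader_source('vertices.csv', 'local') followed by graph.add_vertices(Loader(src, delimiter='|', **kw), label='vertex'). Only the source path and the three S3 keyword arguments differ between the two modes, mirroring the two sets of Loader calls in this report.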