Describe the bug
I tried the AutoTuner feature locally in single-machine mode and it works fine.
After a recent update, I tried to run AutoTuner in distributed mode with the following command:

```
python3.7 distributed.py --design fuserisc_v1 --platform sky130hd --config ../designs/sky130hd/fuserisc_v1/autotuner.json --jobs 2000 --server localhost tune --samples 200
```

But the flow failed to complete.
Log:
```
(run pid=825) ... 180 more trials not shown (180 TERMINATED)
(run pid=825)
(run pid=825)
Log channel is reconnecting. Logs produced while the connection was down can be found on the head node of the cluster in `ray_client_server_[port].out`
2022-05-03 13:13:37,973 WARNING dataclient.py:221 -- Encountered connection issues in the data channel. Attempting to reconnect.
2022-05-03 13:14:08,189 WARNING dataclient.py:226 -- Failed to reconnect the data channel
Traceback (most recent call last):
  File "distributed.py", line 947, in <module>
    analysis = tune.run(TrainClass, **tune_args)
  File "/home/vijayan/.local/lib/python3.7/site-packages/ray/tune/tune.py", line 363, in run
    while ray.wait([remote_future], timeout=0.2)[1]:
  File "/home/vijayan/.local/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return getattr(ray, func.__name__)(*args, **kwargs)
  File "/home/vijayan/.local/lib/python3.7/site-packages/ray/util/client/api.py", line 61, in wait
    return self.worker.wait(*args, **kwargs)
  File "/home/vijayan/.local/lib/python3.7/site-packages/ray/util/client/worker.py", line 435, in wait
    resp = self._call_stub("WaitObject", req, metadata=self.metadata)
  File "/home/vijayan/.local/lib/python3.7/site-packages/ray/util/client/worker.py", line 291, in _call_stub
    raise ConnectionError("Client is shutting down.")
ConnectionError: Client is shutting down.
```
Expected behavior
The flow should complete successfully in distributed mode.
@vijayank88 Is this still an issue? If so, could you share the files needed to reproduce it?
Edit: After trying it out, it appears the issue might be `--server localhost`. Ray only needs the `--server`/`--port` switches when we are using a Ray Cluster [1].
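To illustrate, here is a sketch of the two invocation modes. This assumes the standard Ray setup (head node's Ray Client server listening on its default port 10001); `HEAD_IP` is a placeholder for the head node's address, and the design/config paths are taken from the original report.

```shell
# Single-machine run: omit --server/--port so AutoTuner uses a local Ray
# instance instead of trying to connect through the Ray Client.
python3.7 distributed.py --design fuserisc_v1 --platform sky130hd \
    --config ../designs/sky130hd/fuserisc_v1/autotuner.json \
    --jobs 2000 tune --samples 200

# Distributed run: start a Ray cluster first, then point AutoTuner at it.
# On the head node:
ray start --head --port=6379
# On each worker node (HEAD_IP is the head node's address):
ray start --address=HEAD_IP:6379
# Then launch AutoTuner against the cluster's Ray Client server:
python3.7 distributed.py --design fuserisc_v1 --platform sky130hd \
    --config ../designs/sky130hd/fuserisc_v1/autotuner.json \
    --jobs 2000 --server HEAD_IP --port 10001 tune --samples 200
```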
@dralabeing FYI