Commit 701e972

yikaiMeta authored and facebook-github-bot committed
run par as an entrypoint if there is no patch or jetter patch. (meta-pytorch#994)
Summary:
Pull Request resolved: meta-pytorch#994

# Context
Currently, when running a torchx local job, we use penv_python as the entrypoint. That means we hand the actual .par or .xar file to penv_python, and penv_python then executes the par/xar as a new process.

# Old way to run a torchx local job
For example, if the local job is running "jetter --help", torchx runs it like:

    PENV_PAR='/data/users/yikai/fbsource/buck-out/v2/gen/fbcode/a6cb9616985b22b0/jetter/__jetter-bin__/jetter-bin-inplace.par' penv_python -m jetter.main --help

The par file is passed in an environment variable called "PENV_PAR". (There is another way to pass it to penv_python: set 'PENV_PARNAME' as an env variable and resolve the par file's path from it. That path is very rare, only 0.1% of total usage.)

# New way to run a torchx local job
After migration, we will run it like:

    PAR_MAIN_OVERRIDE=jetter.main /data/users/yikai/fbsource/buck-out/v2/gen/fbcode/a6cb9616985b22b0/jetter/__jetter-bin__/jetter-bin-inplace.par --help

NOTE: This diff only migrates one of the most common use cases, in which:
1. There is no patch or jetter patch.
2. It is a par, not a xar.
3. The par file is passed via the "PENV_PAR" env variable.

For other use cases, we still run penv_python as the entrypoint.

Reviewed By: Sanjay-Ganeshan
Differential Revision: D66621649
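
The old-to-new command translation described in the summary can be sketched roughly as follows. This is an illustration only, not the code from this diff: the function name `rewrite_penv_invocation` and the `has_patch` flag are hypothetical, and dropping `PENV_PAR` from the new environment is an assumption (the summary only shows `PAR_MAIN_OVERRIDE` being set).

```python
import os
from typing import Dict, List, Tuple


def rewrite_penv_invocation(
    args: List[str],
    env: Dict[str, str],
    has_patch: bool = False,  # hypothetical flag: True if a patch or jetter patch is present
) -> Tuple[List[str], Dict[str, str]]:
    """Return (args, env) that launch the par directly when the three
    conditions from the summary hold; otherwise return copies of the inputs."""
    par_path = env.get("PENV_PAR", "")
    eligible = (
        not has_patch                     # 1. no patch or jetter patch
        and par_path.endswith(".par")     # 2. a par, not a xar; 3. passed via PENV_PAR
        and len(args) >= 3
        and os.path.basename(args[0]) == "penv_python"
        and args[1] == "-m"
    )
    if not eligible:
        # Fall back to the old behaviour: keep penv_python as the entrypoint.
        return list(args), dict(env)

    # Assumption: PENV_PAR is no longer needed once the par is the entrypoint.
    new_env = {k: v for k, v in env.items() if k != "PENV_PAR"}
    new_env["PAR_MAIN_OVERRIDE"] = args[2]  # module that followed "-m"
    return [par_path] + list(args[3:]), new_env
```

With a made-up path, `rewrite_penv_invocation(["penv_python", "-m", "jetter.main", "--help"], {"PENV_PAR": "/tmp/example.par"})` yields the argument list `["/tmp/example.par", "--help"]` and an environment containing `PAR_MAIN_OVERRIDE=jetter.main`. A xar path or `has_patch=True` leaves the invocation unchanged.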
1 parent ff22758 commit 701e972

1 file changed: +19 -2 lines changed

torchx/schedulers/local_scheduler.py

Lines changed: 19 additions & 2 deletions
@@ -28,6 +28,7 @@
 import warnings
 from dataclasses import asdict, dataclass
 from datetime import datetime
+from subprocess import Popen
 from types import FrameType
 from typing import (
     Any,
@@ -696,12 +697,11 @@ def _popen(
         log.debug(f"Running {role_name} (replica {replica_id}):\n {args_pfmt}")
         env = self._get_replica_env(replica_params)
 
-        proc = subprocess.Popen(
+        proc = self.run_local_job(
             args=replica_params.args,
             env=env,
             stdout=stdout_,
             stderr=stderr_,
-            start_new_session=True,
             cwd=replica_params.cwd,
         )
         return _LocalReplica(
@@ -714,6 +714,23 @@ def _popen(
             error_file=env.get("TORCHELASTIC_ERROR_FILE", "<N/A>"),
         )
 
+    def run_local_job(
+        self,
+        args: List[str],
+        env: Dict[str, str],
+        stdout: Optional[io.FileIO],
+        stderr: Optional[io.FileIO],
+        cwd: Optional[str] = None,
+    ) -> Popen[bytes]:
+        return subprocess.Popen(
+            args=args,
+            env=env,
+            stdout=stdout,
+            stderr=stderr,
+            start_new_session=True,
+            cwd=cwd,
+        )
+
     def _get_replica_output_handles(
         self,
         replica_params: ReplicaParam,
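
The diff moves the `subprocess.Popen` call out of `_popen` into a separate `run_local_job` method with the same behavior. One plausible use of such a method is as an override point for subclasses, though the commit does not state this. The sketch below illustrates the pattern with stand-in classes: `BaseLauncher`, `TaggingLauncher`, the `LAUNCHED_BY` variable, and `demo` are all hypothetical and are not part of torchx.

```python
import os
import subprocess
import sys
from typing import IO, Dict, List, Optional


class BaseLauncher:
    """Hypothetical stand-in for the scheduler; mirrors only run_local_job."""

    def run_local_job(
        self,
        args: List[str],
        env: Dict[str, str],
        stdout: Optional[IO[bytes]] = None,
        stderr: Optional[IO[bytes]] = None,
        cwd: Optional[str] = None,
    ) -> "subprocess.Popen[bytes]":
        # Same call shape as in the diff: the process starts in a new session.
        return subprocess.Popen(
            args=args,
            env=env,
            stdout=stdout,
            stderr=stderr,
            start_new_session=True,
            cwd=cwd,
        )


class TaggingLauncher(BaseLauncher):
    """Overrides the hook to adjust the environment before launching."""

    def run_local_job(self, args, env, stdout=None, stderr=None, cwd=None):
        env = {**env, "LAUNCHED_BY": "TaggingLauncher"}  # illustrative variable
        return super().run_local_job(args, env, stdout, stderr, cwd)


def demo() -> str:
    """Launch a child Python process and return what it prints."""
    child_code = "import os; print(os.environ['LAUNCHED_BY'])"
    with open(os.devnull, "wb") as devnull:
        proc = subprocess.Popen  # placeholder to keep type checkers quiet
        proc = TaggingLauncher().run_local_job(
            [sys.executable, "-c", child_code],
            env=dict(os.environ),
            stdout=subprocess.PIPE,
            stderr=devnull,
        )
        out, _ = proc.communicate()
    return out.decode().strip()
```

Because `_popen` now calls `self.run_local_job(...)`, a subclass in this style could rewrite `args` or `env` (for example, to launch a par directly) without touching the rest of the replica setup.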
