fix(py_venv): work in terms of bytes when patching shebang lines #606

37 changes: 30 additions & 7 deletions py/tools/py/src/venv.rs
@@ -8,7 +8,7 @@ use sha256::try_digest;
 use std::{
     env::current_dir,
     fs::{self, File},
-    io::{BufRead, BufReader, BufWriter, Write},
+    io::{self, BufRead, BufReader, BufWriter, Read, Write, Seek},
     os::unix::fs::{MetadataExt, PermissionsExt},
     path::{Path, PathBuf},
 };
@@ -172,7 +172,34 @@ fn copy(original: &PathBuf, link: &PathBuf) -> miette::Result<()> {
));
}

-const RELOCATABLE_SHEBANG: &str = "\
+fn copy_and_patch_shebang(original: &PathBuf, link: &PathBuf) -> miette::Result<()> {
+    let mut src = File::open(original.as_path()).into_diagnostic()?;
+
+    let mut buf = [0u8; PLACEHOLDER_SHEBANG.len()];
+    let found_shebang = match src.read_exact(&mut buf) {
+        Ok(()) => buf == PLACEHOLDER_SHEBANG,
+        Err(error) => match error.kind() {
+            io::ErrorKind::UnexpectedEof => false, // File too short to contain shebang.
+            _ => Err(error).into_diagnostic()?,
+        },
+    };
+
+    if found_shebang {
+        let mut dst = File::create(link.as_path()).into_diagnostic()?;
+        dst.write_all(RELOCATABLE_SHEBANG).into_diagnostic()?;
+        src.rewind().into_diagnostic()?;
+        io::copy(&mut src, &mut dst).into_diagnostic()?;
+    }
+    else {
+        copy(original, link)?;
+    }
+
+    Ok(())
+}
+
+const PLACEHOLDER_SHEBANG: &[u8] = b"#!/dev/null";
Author

@plobsing plobsing Jun 28, 2025

BTW, I would love to know where this #!/dev/null placeholder comes from (so I can document it with a comment). FWICT, PEP 427 only recommends recognizing #!python and #!pythonw. @arrdem, you seem to have added this; do you recall?

Collaborator

This shebang is unspecified behavior of rules_python. As part of how rules_python installs packages, an interpreter path must be specified. But because that path is specified at module/workspace setup time, there's no way to know the Bazel label, the relative path, or anything else about the interpreter with which the script may eventually be invoked. So rules_python does the "reasonable" (insane) thing and uses /dev/null as the shebang. It could use /bin/false or any other value.

I don't think it's reasonable or future-proof to hardcode this or to use the read_exact strategy here. The protocol should be: read the first 512 bytes, check whether they start with #! and contain a \n, and if so, replace that first line.

I think your rewind() machinery fails to strip the shebang from the copy source as this PR stands.
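
For illustration, a minimal sketch of the protocol described above: read up to 512 bytes, and if they start with #! and contain a \n, write the replacement in place of that first line before streaming the rest as raw bytes. The names copy_replacing_shebang and SHEBANG_SCAN_LIMIT are hypothetical and not part of this PR or of rules_py; the caller is assumed to pass a replacement that already ends in a newline. Unlike a write-then-rewind approach, this drops the old shebang line from the output.

```rust
use std::io::{self, Read, Write};

/// Hypothetical scan window (the 512 bytes suggested in the review comment).
const SHEBANG_SCAN_LIMIT: usize = 512;

/// Copies `src` to `dst` as raw bytes. If the input starts with `#!` and a
/// newline occurs within the scan window, the first line is replaced by
/// `new_shebang` (assumed to end with its own newline).
/// Returns `true` when a shebang line was replaced.
fn copy_replacing_shebang<R: Read, W: Write>(
    mut src: R,
    mut dst: W,
    new_shebang: &[u8],
) -> io::Result<bool> {
    // Read at most SHEBANG_SCAN_LIMIT bytes; a shorter file yields fewer.
    let mut head = Vec::with_capacity(SHEBANG_SCAN_LIMIT);
    src.by_ref()
        .take(SHEBANG_SCAN_LIMIT as u64)
        .read_to_end(&mut head)?;

    // Only look for the end of the line when the file starts with `#!`.
    let newline = if head.starts_with(b"#!") {
        head.iter().position(|&b| b == b'\n')
    } else {
        None
    };

    let patched = match newline {
        Some(end) => {
            dst.write_all(new_shebang)?;
            // Skip the old shebang line, including its trailing newline.
            dst.write_all(&head[end + 1..])?;
            true
        }
        None => {
            // No shebang (or no newline in the window): copy bytes verbatim.
            dst.write_all(&head)?;
            false
        }
    };

    // Stream whatever is left of the file without interpreting it.
    io::copy(&mut src, &mut dst)?;
    Ok(patched)
}
```

Because nothing here assumes an encoding, a binary in the scripts directory passes through unchanged, and any shebang value (/dev/null, /bin/false, ...) is handled the same way.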

Author

Thanks for the info. I've added it to the PR. It is a bit wild; I wouldn't have expected rules_python to be the source.

The two behaviours identified as defects were exactly preserved from the prior implementation. This PR only fixes the defect it claims to — the venv builder choking on binary files in the scripts directory.

I do agree that it's a little odd to do things this way, and I'd be happy to work with you towards getting more correct shebang logic in place (BTW, do you know of any packages that trigger the shebang substitution logic, so that we can cover all of this with a test?). But those behaviours are not what causes the issue I am seeking to address, so I do not think they should be part of this PR.

Author

If this patch set is unacceptable, the issue of binaries triggering "stream did not contain valid UTF-8" could alternatively be addressed by handling that error specifically. That looks like this: main...plobsing:rules_py:ignore_invalid_utf8.

I like that solution less because, while it handles more cases than are handled today, including the one I care about, it just feels less correct. In principle, a Python source file is not required to be UTF-8 (PEP 263 is still current and documented for recent Pythons, even if the feature is little used); the encoding assumption/assertion made by using read_to_string to process bin files, even only Python sources, just isn't great in general.
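
To make the trade-off concrete, here is a small sketch of the two ways to check for the placeholder. It is not the code on the ignore_invalid_utf8 branch; both helper names are hypothetical. The first compares bytes and so makes no encoding assumption. The second keeps read_to_string and treats io::ErrorKind::InvalidData (the error kind behind "stream did not contain valid UTF-8") as "no shebang".

```rust
use std::{fs, io, path::Path};

/// Byte-oriented check: works for any file contents, text or binary.
/// (Reads the whole file for brevity; a real implementation could read
/// only the first few bytes.)
fn starts_with_bytes(path: &Path, prefix: &[u8]) -> io::Result<bool> {
    Ok(fs::read(path)?.starts_with(prefix))
}

/// String-oriented check that tolerates non-UTF-8 input by treating the
/// `InvalidData` error from `read_to_string` as "prefix not present".
fn starts_with_str_lenient(path: &Path, prefix: &str) -> io::Result<bool> {
    match fs::read_to_string(path) {
        Ok(content) => Ok(content.starts_with(prefix)),
        Err(e) if e.kind() == io::ErrorKind::InvalidData => Ok(false),
        // Any other I/O failure is still reported to the caller.
        Err(e) => Err(e),
    }
}
```

The lenient variant would silently skip a script that does start with the placeholder but contains non-UTF-8 bytes further down (for example, a source file using a PEP 263 encoding declaration), which is the case the byte-oriented check handles.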


+const RELOCATABLE_SHEBANG: &[u8] = b"\
 #!/bin/sh
 '''exec' \"$(dirname -- \"$(realpath -- \"$0\")\")\"/'python3' \"$0\" \"$@\"
 ' '''
@@ -619,11 +646,7 @@ pub fn create_tree(

 // In the case of copying bin entries, we need to patch them. Yay.
 if link_dir.file_name() == Some(OsStr::new("bin")) {
-    let mut content = fs::read_to_string(original_entry).into_diagnostic()?;
-    if content.starts_with("#!/dev/null") {
-        content.replace_range(..0, &RELOCATABLE_SHEBANG);
-    }
-    fs::write(&link_entry, content).into_diagnostic()?;
+    copy_and_patch_shebang(&original_entry, &link_entry)?;
 }
 // Normal case of needing to link a file :smile:
 else {