Logging to directory + log file management; default to log directory for hf_xet #502

hoytak · 2025-09-22T18:38:16Z

This PR switches the default logging to log events to a file in '~/.cache/huggingface/xet/logs' (or 'xet/logs' under the specified cache directory if not ~/.cache/huggingface/).

In this directory, log files older than 2 weeks are cleaned up on process start, and if the total size of files in the directory is larger than 1gb, then log files are deleted by age to get the directory size under 1gb. Log files are named with a timestamp and PID; by default, logs newer than 1 day or logs with an active associated PID are never deleted. All of these are user configurable constants.

xet_runtime/src/log_file_management.rs

assafvayner · 2025-09-22T20:24:03Z

xet_runtime/src/lib.rs

why does this need to be in xet_runtime? Can we spin this into a 3rd crate?

I 👍 this to be default behavior for hf_xet users, 👎 for all users of xet_runtime, specifically the repo-scanner use of xet-core (needs to be updated but here: https://github.com/huggingface-internal/repository-scanner/blob/main/apps/Cargo.toml#L52). I don't think this functionality needs to be imported in that use case, so shouldn't live in this crate.

I'm happy to move this into another crate and make it more explicit. Note currently that a user has to explicitly call the init_logging function with a LoggingConfig instance to set it up; nothing is set up without that explicit call and config instance, so there isn't really default behavior.

I'm a little confused how this relates to the repo scanner, as it looks like that sets up its own logging and doesn't import the xet_runtime part of xet-core directly (it's there through the data crate, but only used parts get compiled in).

Updated the description to clarify this.

xet_runtime/src/logging.rs

jsulz

This is sweet!!

Just took it for a spin. This will be a much better experience for everyone when we're trying to do a first pass of troubleshooting now. Few questions/thoughts:

A 1GB max size on this directory is quite large. Outside of the size it takes up on disk it means there will be many log files for users to dig through when forwarding reports. Any reason for it to be this large?
From a troubleshooter's perspective, the value is almost always in the DEBUG log statements. Asking a user to rerun a download/upload with the log level set to something new can be painful depending on the user. Any way we can get these by default?
Having this on by default is good for us (and for the user, IMO), but some people may want to turn these off for yet-to-be-known reasons. Might be good to include that as a config option.

I can take on documenting this and updating our issue intake to include mention of this as well once it's released. Great stuff @hoytak

seanses · 2025-09-23T21:46:24Z

From a troubleshooter's perspective, the value is almost always in the DEBUG log statements. Asking a user to rerun a download/upload with the log level set to something new can be painful depending on the user. Any way we can get these by default?

Agree with @jsulz that default to DEBUG when logging to file

xet_runtime/src/logging.rs

xet_logging/src/logging.rs

xet_runtime/src/logging.rs

xet_logging/src/logging.rs

…base-logging

hoytak · 2025-09-24T23:30:16Z

Replying to your thoughts @jsulz :

A 1GB max size on this directory is quite large. Outside of the size it takes up on disk it means there will be many log files for users to dig through when forwarding reports. Any reason for it to be this large?

I suspect that for most people, the two week maximum age will be what actually causes the files to be deleted, so most directories will not be anywhere close to that large. However, I'm happy to decrease this.

From a troubleshooter's perspective, the value is almost always in the DEBUG log statements. Asking a user to rerun a download/upload with the log level set to something new can be painful depending on the user. Any way we can get these by default?

The problem here is that debug logs a LOT of things that aren't relevant for many things, and it's still a bit expensive to write to a file. Enabling debug logging will noticeably slow down things like dedup lookup, cache querying, etc. quite significantly. We could mitigate this by going through and eliminating the debug statements in lower loops, or changing some to be enabled only for debug builds, but I'd rather go through the current debug statements that are part of the upload or download paths and change them to info statements.

Having this on by default is good for us (and for the user, IMO), but some people may want to turn these off for yet-to-be-known reasons. Might be good to include that as a config option.

It is all controlled by config variables. HF_XET_LOG_DEST="" changes it to log to console, HF_XET_LOG_DEST=<file name> logs it to a file, and HF_XET_LOG_DEST=<dir name> logs it to a directory without cleanup work.

I can take on documenting this and updating our issue intake to include mention of this as well once it's released. Great stuff @hoytak

jsulz · 2025-09-25T01:43:24Z

Thanks @hoytak!

I suspect that for most people, the two week maximum age will be what actually causes the files to be deleted

🤦 if only I had read your description or the code more thoroughly. Agreed! Happy to strike my comment on that front.

but I'd rather go through the current debug statements that are part of the upload or download paths and change them to info statements.

Sounds good to me and you make fair points about your concerns here. I think we should ensure the logs print information that helps us adequately debug a customer's issue. I'm not sure the current INFO logs are sufficient on that front.

It is all controlled by config variables. HF_XET_LOG_DEST="" changes it to log to console, HF_XET_LOG_DEST= logs it to a file, and HF_XET_LOG_DEST=
logs it to a directory without cleanup work.

Apologies, I might be misunderstanding your comment/the PR. I don't see a way to turn off writing to a log file. Is there a way to do that? If not, it might be good to consider it.

hoytak · 2025-09-29T18:28:15Z

It is all controlled by config variables. HF_XET_LOG_DEST="" changes it to log to console, HF_XET_LOG_DEST= logs it to a file, and HF_XET_LOG_DEST= logs it to a directory without cleanup work.

Apologies, I might be misunderstanding your comment/the PR. I don't see a way to turn off writing to a log file. Is there a way to do that? If not, it might be good to consider it.

When HF_XET_LOG_DEST="" is set to an empty string, it logs straight to the console, i.e. not a file. Happy to consider something clearer if you think this is an issue.

Also, RUST_LOG=off turns off logging completely.

…gging

assafvayner · 2025-10-02T21:43:34Z

xet_logging/src/constants.rs

+
+use utils::ByteSize;
+
+utils::configurable_constants! {


Is there a way at all to disable logging altogether? I think we should have that option but have logging enabled by default.

This setting should also skip the cleanup and creating/verifying the log directory.

Yes, already built in -- RUST_LOGS=off (see above).

I could detect this and skip the log directory stuff completely in this case :-).

assafvayner · 2025-10-02T21:45:16Z

xet_logging/src/config.rs

+                    }
+                }
+            } else {
+                LoggingMode::Directory(log_directory.as_ref().to_path_buf())


not sure where to comment, I noticed the log directory is not separated by the CAS environment. Is there a way to do that? To have a different logs directory per environment. If it's not really possible (since we don't know the endpoint at the start, then don't worry about it, it doesn't really matter a lot, especially since we have less endpoints now.

We could, but I'm not sure what the advantage of this would be above having the endpoint logged at the beginning of a file? What's the use case?

If I have on the same day talked to 2 different cas endpoints, then the logs for the interactions with those 2 endpoints are separated so I can easily tell which logs came from which interactions

I think CAS endpoint is not important because that's not explicit to users, rather, they care more about the hub endpoint? Users in us and eu access the same hf.co but get different CAS server names, and if we deploy more CAS servers in us-west in the future users in us also get different CAS server names. Placing the logs under different folders and asking them to find the correct folder can be a bit confusing.

seanses

LGTM!

hoytak requested review from assafvayner, jsulz, rajatarya and seanses September 22, 2025 18:38

Switch logging to directory by default.

d077c2c

hoytak force-pushed the hoytak/250908-base-logging branch from 87743d1 to d077c2c Compare September 22, 2025 18:39

Fixed clippy issues.

1da3a56

assafvayner reviewed Sep 22, 2025

View reviewed changes

jsulz reviewed Sep 23, 2025

View reviewed changes