Skip to content

Conversation

hoytak
Copy link
Collaborator

@hoytak hoytak commented Sep 22, 2025

This PR switches the default logging to log events to a file in '~/.cache/huggingface/xet/logs' (or 'xet/logs' under the specified cache directory if not ~/.cache/huggingface/).

In this directory, log files older than 2 weeks are cleaned up on process start, and if the total size of files in the directory is larger than 1gb, then log files are deleted by age to get the directory size under 1gb. Log files are named with a timestamp and PID; by default, logs newer than 1 day or logs with an active associated PID are never deleted. All of these are user configurable constants.

@hoytak hoytak force-pushed the hoytak/250908-base-logging branch from 87743d1 to d077c2c Compare September 22, 2025 18:39
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this need to be in xet_runtime? Can we spin this into a 3rd crate?

I 👍 this to be default behavior for hf_xet users, 👎 for all users of xet_runtime, specifically the repo-scanner use of xet-core (needs to be updated but here: https://github.com/huggingface-internal/repository-scanner/blob/main/apps/Cargo.toml#L52). I don't think this functionality needs to be imported in that use case, so shouldn't live in this crate.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to move this into another crate and make it more explicit. Note currently that a user has to explicitly call the init_logging function with a LoggingConfig instance to set it up; nothing is set up without that explicit call and config instance, so there isn't really default behavior.

I'm a little confused how this relates to the repo scanner, as it looks like that sets up its own logging and doesn't import the xet_runtime part of xet-core directly (it's there through the data crate, but only used parts get compiled in).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the description to clarify this.

Copy link
Collaborator

@jsulz jsulz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sweet!!

Just took it for a spin. This will be a much better experience for everyone when we're trying to do a first pass of troubleshooting now. Few questions/thoughts:

  1. A 1GB max size on this directory is quite large. Outside of the size it takes up on disk it means there will be many log files for users to dig through when forwarding reports. Any reason for it to be this large?
  2. From a troubleshooter's perspective, the value is almost always in the DEBUG log statements. Asking a user to rerun a download/upload with the log level set to something new can be painful depending on the user. Any way we can get these by default?
  3. Having this on by default is good for us (and for the user, IMO), but some people may want to turn these off for yet-to-be-known reasons. Might be good to include that as a config option.

I can take on documenting this and updating our issue intake to include mention of this as well once it's released. Great stuff @hoytak

@seanses
Copy link
Collaborator

seanses commented Sep 23, 2025

From a troubleshooter's perspective, the value is almost always in the DEBUG log statements. Asking a user to rerun a download/upload with the log level set to something new can be painful depending on the user. Any way we can get these by default?

Agree with @jsulz that default to DEBUG when logging to file

@hoytak hoytak changed the title Default to file based logging Logging to directory + log file management; default to log directory for hf_xet Sep 24, 2025
@hoytak
Copy link
Collaborator Author

hoytak commented Sep 24, 2025

Replying to your thoughts @jsulz :

  1. A 1GB max size on this directory is quite large. Outside of the size it takes up on disk it means there will be many log files for users to dig through when forwarding reports. Any reason for it to be this large?

I suspect that for most people, the two week maximum age will be what actually causes the files to be deleted, so most directories will not be anywhere close to that large. However, I'm happy to decrease this.

  1. From a troubleshooter's perspective, the value is almost always in the DEBUG log statements. Asking a user to rerun a download/upload with the log level set to something new can be painful depending on the user. Any way we can get these by default?

The problem here is that debug logs a LOT of things that aren't relevant for many things, and it's still a bit expensive to write to a file. Enabling debug logging will noticeably slow down things like dedup lookup, cache querying, etc. quite significantly. We could mitigate this by going through and eliminating the debug statements in lower loops, or changing some to be enabled only for debug builds, but I'd rather go through the current debug statements that are part of the upload or download paths and change them to info statements.

  1. Having this on by default is good for us (and for the user, IMO), but some people may want to turn these off for yet-to-be-known reasons. Might be good to include that as a config option.

It is all controlled by config variables. HF_XET_LOG_DEST="" changes it to log to console, HF_XET_LOG_DEST=<file name> logs it to a file, and HF_XET_LOG_DEST=<dir name> logs it to a directory without cleanup work.

I can take on documenting this and updating our issue intake to include mention of this as well once it's released. Great stuff @hoytak

@jsulz
Copy link
Collaborator

jsulz commented Sep 25, 2025

Thanks @hoytak!

I suspect that for most people, the two week maximum age will be what actually causes the files to be deleted

🤦 if only I had read your description or the code more thoroughly. Agreed! Happy to strike my comment on that front.

but I'd rather go through the current debug statements that are part of the upload or download paths and change them to info statements.

Sounds good to me and you make fair points about your concerns here. I think we should ensure the logs print information that helps us adequately debug a customer's issue. I'm not sure the current INFO logs are sufficient on that front.

It is all controlled by config variables. HF_XET_LOG_DEST="" changes it to log to console, HF_XET_LOG_DEST= logs it to a file, and HF_XET_LOG_DEST=

logs it to a directory without cleanup work.

Apologies, I might be misunderstanding your comment/the PR. I don't see a way to turn off writing to a log file. Is there a way to do that? If not, it might be good to consider it.

@hoytak
Copy link
Collaborator Author

hoytak commented Sep 29, 2025

It is all controlled by config variables. HF_XET_LOG_DEST="" changes it to log to console, HF_XET_LOG_DEST= logs it to a file, and HF_XET_LOG_DEST= logs it to a directory without cleanup work.

Apologies, I might be misunderstanding your comment/the PR. I don't see a way to turn off writing to a log file. Is there a way to do that? If not, it might be good to consider it.

When HF_XET_LOG_DEST="" is set to an empty string, it logs straight to the console, i.e. not a file. Happy to consider something clearer if you think this is an issue.

Also, RUST_LOG=off turns off logging completely.


use utils::ByteSize;

utils::configurable_constants! {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way at all to disable logging altogether? I think we should have that option but have logging enabled by default.

This setting should also skip the cleanup and creating/verifying the log directory.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, already built in -- RUST_LOGS=off (see above).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could detect this and skip the log directory stuff completely in this case :-).

}
}
} else {
LoggingMode::Directory(log_directory.as_ref().to_path_buf())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure where to comment, I noticed the log directory is not separated by the CAS environment. Is there a way to do that? To have a different logs directory per environment. If it's not really possible (since we don't know the endpoint at the start, then don't worry about it, it doesn't really matter a lot, especially since we have less endpoints now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but I'm not sure what the advantage of this would be above having the endpoint logged at the beginning of a file? What's the use case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I have on the same day talked to 2 different cas endpoints, then the logs for the interactions with those 2 endpoints are separated so I can easily tell which logs came from which interactions

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think CAS endpoint is not important because that's not explicit to users, rather, they care more about the hub endpoint? Users in us and eu access the same hf.co but get different CAS server names, and if we deploy more CAS servers in us-west in the future users in us also get different CAS server names. Placing the logs under different folders and asking them to find the correct folder can be a bit confusing.

Copy link
Collaborator

@seanses seanses left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants