-
Notifications
You must be signed in to change notification settings - Fork 34
Logging to directory + log file management; default to log directory for hf_xet #502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
87743d1
to
d077c2c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this need to be in xet_runtime? Can we spin this into a 3rd crate?
I 👍 this to be default behavior for hf_xet users, 👎 for all users of xet_runtime, specifically the repo-scanner use of xet-core (needs to be updated but here: https://github.com/huggingface-internal/repository-scanner/blob/main/apps/Cargo.toml#L52). I don't think this functionality needs to be imported in that use case, so shouldn't live in this crate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to move this into another crate and make it more explicit. Note currently that a user has to explicitly call the init_logging function with a LoggingConfig instance to set it up; nothing is set up without that explicit call and config instance, so there isn't really default behavior.
I'm a little confused how this relates to the repo scanner, as it looks like that sets up its own logging and doesn't import the xet_runtime part of xet-core directly (it's there through the data crate, but only used parts get compiled in).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the description to clarify this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is sweet!!
Just took it for a spin. This will be a much better experience for everyone when we're trying to do a first pass of troubleshooting now. Few questions/thoughts:
- A
1GB
max size on this directory is quite large. Outside of the size it takes up on disk it means there will be many log files for users to dig through when forwarding reports. Any reason for it to be this large? - From a troubleshooter's perspective, the value is almost always in the
DEBUG
log statements. Asking a user to rerun a download/upload with the log level set to something new can be painful depending on the user. Any way we can get these by default? - Having this on by default is good for us (and for the user, IMO), but some people may want to turn these off for yet-to-be-known reasons. Might be good to include that as a config option.
I can take on documenting this and updating our issue intake to include mention of this as well once it's released. Great stuff @hoytak
Agree with @jsulz that default to |
Replying to your thoughts @jsulz :
I suspect that for most people, the two week maximum age will be what actually causes the files to be deleted, so most directories will not be anywhere close to that large. However, I'm happy to decrease this.
The problem here is that debug logs a LOT of things that aren't relevant for many things, and it's still a bit expensive to write to a file. Enabling debug logging will noticeably slow down things like dedup lookup, cache querying, etc. quite significantly. We could mitigate this by going through and eliminating the debug statements in lower loops, or changing some to be enabled only for debug builds, but I'd rather go through the current debug statements that are part of the upload or download paths and change them to info statements.
It is all controlled by config variables.
|
Thanks @hoytak!
🤦 if only I had read your description or the code more thoroughly. Agreed! Happy to strike my comment on that front.
Sounds good to me and you make fair points about your concerns here. I think we should ensure the logs print information that helps us adequately debug a customer's issue. I'm not sure the current INFO logs are sufficient on that front.
Apologies, I might be misunderstanding your comment/the PR. I don't see a way to turn off writing to a log file. Is there a way to do that? If not, it might be good to consider it. |
When HF_XET_LOG_DEST="" is set to an empty string, it logs straight to the console, i.e. not a file. Happy to consider something clearer if you think this is an issue. Also, RUST_LOG=off turns off logging completely. |
|
||
use utils::ByteSize; | ||
|
||
utils::configurable_constants! { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way at all to disable logging altogether? I think we should have that option but have logging enabled by default.
This setting should also skip the cleanup and creating/verifying the log directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, already built in -- RUST_LOGS=off
(see above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could detect this and skip the log directory stuff completely in this case :-).
} | ||
} | ||
} else { | ||
LoggingMode::Directory(log_directory.as_ref().to_path_buf()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure where to comment, I noticed the log directory is not separated by the CAS environment. Is there a way to do that? To have a different logs directory per environment. If it's not really possible (since we don't know the endpoint at the start, then don't worry about it, it doesn't really matter a lot, especially since we have less endpoints now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, but I'm not sure what the advantage of this would be above having the endpoint logged at the beginning of a file? What's the use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I have on the same day talked to 2 different cas endpoints, then the logs for the interactions with those 2 endpoints are separated so I can easily tell which logs came from which interactions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think CAS endpoint is not important because that's not explicit to users, rather, they care more about the hub endpoint? Users in us and eu access the same hf.co but get different CAS server names, and if we deploy more CAS servers in us-west in the future users in us also get different CAS server names. Placing the logs under different folders and asking them to find the correct folder can be a bit confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This PR switches the default logging to log events to a file in '~/.cache/huggingface/xet/logs' (or 'xet/logs' under the specified cache directory if not
~/.cache/huggingface/
).In this directory, log files older than 2 weeks are cleaned up on process start, and if the total size of files in the directory is larger than 1gb, then log files are deleted by age to get the directory size under 1gb. Log files are named with a timestamp and PID; by default, logs newer than 1 day or logs with an active associated PID are never deleted. All of these are user configurable constants.