-
-
Notifications
You must be signed in to change notification settings - Fork 268
Replace atos backtrace generation with a dSYM loader. #4930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Replace atos backtrace generation with a dSYM loader. #4930
Conversation
Feedback that I gave on Discord, replicating here as requested. The reasons for using |
location.file = info.dli_fname[0..strlen(info.dli_fname)]; | ||
location.procedure = info.dli_sname[0..strlen(info.dli_sname)]; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more or less already handled earlier:
ldc/runtime/druntime/src/core/runtime.d
Lines 925 to 948 in 53ffdf1
// https://code.woboq.org/userspace/glibc/debug/backtracesyms.c.html | |
// The logic that glibc's backtrace use is to check for for `dli_fname`, | |
// the file name, and error if not present, then check for `dli_sname`. | |
// In case `dli_fname` is present but not `dli_sname`, the address is | |
// printed related to the file. We just print the file. | |
static const(char)[] getFrameName (const(void)* ptr) | |
{ | |
import core.sys.posix.dlfcn; | |
Dl_info info = void; | |
// Note: See the module documentation about `-L--export-dynamic` | |
if (dladdr(ptr, &info)) | |
{ | |
// Return symbol name if possible | |
if (info.dli_sname !is null && info.dli_sname[0] != '\0') | |
return info.dli_sname[0 .. strlen(info.dli_sname)]; | |
// Fall back to file name | |
if (info.dli_fname !is null && info.dli_fname[0] != '\0') | |
return info.dli_fname[0 .. strlen(info.dli_fname)]; | |
} | |
// `dladdr` failed | |
return "<ERROR: Unable to retrieve function name>"; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good to know, then this can just be a no-op stub. I am calling it for the day (spent 6 hours getting ldc2 compiling due to fighting with llvm and cmake for some time. Did get it working eventually)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. The hook's job is to populate the file
and line
fields of the Location
s, if possible. So the name could reflect that (resolveSourceLocs()
or so).
Edit: Well, that's the primary job at least. There's nothing standing in the way of improving/adding the symbol names too. E.g., to make it work without -L--export-dynamic
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've called it symbolicate since it may, on macOS, also end up handling symbolication of Objective-C which requires another kind of roundtrip not related to DWARF. So in general it's supposed to be a hook that pre-symbolicates stuff (including the name of the procedure if relevant). Then letting the default implementation handle anything left over.
Needs a cleanup function in case you need to allocate and free. Do we need to have a function that processes the entire array at once? Why not a function that works on a single stack frame? |
That gives greatest flexibility and reduces the need for some state/cache somewhere. The implementation will most likely have to read files etc., so processing the whole batch at once makes sense. Edit: See e.g. the default implementation, processing the ldc/runtime/druntime/src/core/internal/backtrace/dwarf.d Lines 405 to 512 in 53ffdf1
|
I did not test this, but it looks like a regression to me (no more line info). I'm OK with making the |
dladdr has been removed and a cleanup hook added. Also moved them be run before and after the rest of the symbol resolution. |
As said in #4895, having good stack traces is important, but those stack traces should not come at the cost of it being practically infeasible to use D to release stuff on the AppStore. At this point D is already capable of being used for that, and at least in my business it's a part of my future strategy to release i(Pad)OS versions of my software. While yes, we could make the runtime only compile in atos stuff in debug mode, that does also mean there's extra friction for developers using DLang to develop apps; and also that it's likely to be useless anyways on Darwin derived OSes that don't have atos. |
What is that extra friction? Do people put debug builds in the app store?
This is not a valid argument to make things worse for macOS. How about providing a separate small library with |
This is not only about app store, it also affects things such as making builds with the hardened runtime. Which can end up making your life a headache if you want to sign and notarize your apps as well (including for macOS). Over all, I don't agree that this is the correct approach here. And yeah, some may want to put release builds with debug info, for example, onto the app store. and also may not know to pass in extra LDC specific flags to link to the non-debug version of druntime, etc. While those can be somewhat solved by making dub more intelligent on the matter, it's just overall a massive ugly hack and your application freezing for a couple of seconds while atos runs every time you generate a stack trace is a little excessive. Also if you only care about macOS, just use lldb, it'll do the symbolication for you when something crashes. It comes with the dev tools. |
Also as a side-note, it's the default behaviour to generate debug symbols for releases with i(Pad)OS apps and the like, xcode does it for you on compile when using swift or Objective-C; so it is over-all expected that you will in fact have the debug symbols there even if it's technically in release mode. So if we for example, just detected whether the auto-generated dSYMs were there (which apple also forcefully does for you once you publish an app) instead of tying it to release-debug; then atos would still possibly be run. Additionally having strings in druntime relating to atos might be enough to trigger a review rejection. |
My opinion is that the easy experience for D should have stack traces with file/line. You need them when you are learning D, so that is what should be the default. I'm OK with making D easier to release on the app store, and I'm hoping there's a way we can make it easy to do (it's OK to require some extra effort for this, with documentation). But it would be bad for the first experience with D to be unreadable stack traces. |
As I said in the other thread. Besides providing these hooks. Making a new utility that embeds dSYMs into the dwarf section might be the way to go. Means LDC avoids ugly hacks in its source tree that just makes it more difficult for existing D users to get their work done. My main goal right now is to ensure we get the ugly hacks rooted out that breaks production software for businesses like mine or Auburn Sounds who rely on D and LDC to make a living. Once that's done it'll be easier to take a wholistic look at how best to approach the shortcomings removing these hacks create, through better tooling or writing custom implementations where needed. |
Yeah my main concern is exactly that - the default experience on a dev box at least should be stack traces with resolved source Locs. Ideally with an acceptable runtime overhead etc., but that's secondary. So a solution that depends on libatos (dynamically, i.e., copes with it not being available) would be fine to me as well - we depend on poorly documented stuff on Darwin already, that whole TLS disaster with macOS 15.4 was caused by Apple removing an API in macOS 10.15, and Jacob having to re-implement those finicky details in upstream druntime again. This stuff breaking was just a question of time I guess, and might happen again anytime. |
One option here might be to add a dedicated object file that is linked automatically for executables on Posix. Pass a switch and it won't do this. That object file can contain the atos stuff, giving the desired default. However, this is unnecessary if |
I'm not sure it can, Apple also scans strings. They'd easily find out that you are trying to load atos or that you are at some point in the application's life cycle. If you try to hide this kind of stuff from them, they might revoke your dev and signing access. |
On a second thought. Im currently down with a bit of a cold; I think I should have a harder think about what kind of tooling could be made to make things work out once I'm feeling better. So do hold off on merging this. |
Note that Using private frameworks directy is another issue, but we are not doing that. Of course, a more elegant solution could always be nice, but might be quite a bit of engineering effort. |
compiler-rt and the ASAN are not compiled in when you make release versions, so there it's irrelevant. The problem is that private APIs shouldn't make their way into production software; while you're developing and testing locally with self signed certs, it doesn't matter as much, even if you may end up with your application crashing due to using these things with no backtrace. At least you know what you're getting into there. |
v1.41.0 final won't take too long anymore, I wanna release it in the next 2 weeks or so. How about making the existing |
I'll get back to this soonish; have some other stuff happening which means I need to focus on some other things. |
I've rewritten this PR to now include a subsystem that can mmap dSYM files when the line sections aren't in the file. If someone more experienced with the DWARF format could give me some hints on how to make this properly work, that'd be great. The file is read and I get no crashes, but so far it seems I don't get any line info. |
Update: I got it working!
|
static SharedObject fromIndex(uint idx) { | ||
return SharedObject( | ||
cast(mach_header_64*)_dyld_get_image_header(idx), | ||
_dyld_get_image_vmaddr_slide(idx), | ||
_dyld_get_image_name(idx) | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the _dyld_get_image_
functions are discouraged from being used. Can we use dladdr
to get the mach_header instead? (not sure how to get the slide).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These specific functions are still documented as usable by Apple in their man pages in Tahoe; so I don't think that's a problem? Alternatively yeah; we'd have to calculate the slide ourselves to account for address space randomisation; which would require a lot more work.
The functions which Apple does not support are the get_image functions that take a mach header, to be specific. The ones that take an image index are still available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I know. I was thinking about this comment: https://github.com/apple-oss-distributions/dyld/blob/93bd81f9d7fcf004fcebcb66ec78983882b41e71/include/mach-o/dyld.h#L46-L52. We already calculate some form of slide: https://github.com/LunaTheFoxgirl/ldc/blob/4e1fa51ef35d77606d86686376a2332ceee93a73/runtime/druntime/src/rt/sections_darwin_64.d#L139.
Perhaps we could get some tests? |
I want to, but I have no idea where to slot in |
The closest example I can think of is ldc/runtime/druntime/test/exceptions/Makefile Lines 125 to 141 in 46bbe8b
If it helps you your example can be roughly translated as: diff --git a/runtime/druntime/test/dsymutil/Makefile b/runtime/druntime/test/dsymutil/Makefile
new file mode 100644
index 0000000000..9059819c2b
--- /dev/null
+++ b/runtime/druntime/test/dsymutil/Makefile
@@ -0,0 +1,49 @@
+ifdef IN_LDC
+# Include this for OS
+include ../../../../dmd/osmodel.mak
+endif
+
+ifeq ($(OS),osx)
+# Only define the tests on macos
+endif
+TESTS := my_example
+
+
+# common.mak picks up TESTS and defines default build rules
+include ../common.mak
+
+# Adds -g to all built executables
+$(OBJDIR)/%$(DOTEXE): private extra_dflags += -g
+
+# Tell make how to produce the .dSYM from an object file
+$(OBJDIR)/%$(DOTEXE).dSYM: $(OBJDIR)/%$(DOTEXE)
+ dsymutil $<
+
+# Defines how to run the tests, they require the executable to be build and dsymutil to have been run.
+# And force use this rule instead of the one created by common.mak
+$(TESTS:%=$(OBJDIR)/%.done): $(OBJDIR)/%.done: $(OBJDIR)/%$(DOTEXE) $(OBJDIR)/%$(DOTEXE).dSYM
+ @echo Testing $*
+
+# Run the program and capture stderr
+# Print an error or something if the program didn't fail
+ if $(TIMELIMIT)$< 2>$(OBJDIR)/$*.stderr; then \
+ echo "Program completed unexpectedly. It should have failed" ; \
+ exit 1 ; \
+ fi
+
+# Check the stderr for a pattern
+ if ! grep -q "^object.Exception@src/my_example.d(10): Test$$" $(OBJDIR)/$*.stderr; then \
+ echo "The stderr is not alright" ; \
+ cat $(OBJDIR)/$*.stderr ; \
+ exit 1 ; \
+ fi
+
+# Or check for an exact file match, using sed to remove unportable address and other stuff
+# If you do this add `%.expected` to the target for make to properly track the dependency
+ if ! sed \
+ "s|^.*/src/|src/|g; s/\[0x[0-9a-f]*\]/\[ADDR\]/g; s/scope //g; s/Nl//g" $(OBJDIR)/$*.stderr \
+ | diff -q $*.expected -; then \
+ echo "The stderr did not match $*.expected exactly" ; \
+ cat $(OBJDIR)/$*.stderr ; \
+ exit 1 ; \
+ fi
diff --git a/runtime/druntime/test/dsymutil/my_example.expected b/runtime/druntime/test/dsymutil/my_example.expected
new file mode 100644
index 0000000000..acc96ac820
--- /dev/null
+++ b/runtime/druntime/test/dsymutil/my_example.expected
@@ -0,0 +1,4 @@
+object.Exception@src/my_example.d(10): Test
+----------------
+src/my_example.d:10 void test.myFunction() [ADDR]
+src/my_example.d:5 _Dmain [ADDR]
\ No newline at end of file
diff --git a/runtime/druntime/test/dsymutil/src/my_example.d b/runtime/druntime/test/dsymutil/src/my_example.d
new file mode 100644
index 0000000000..30e8600295
--- /dev/null
+++ b/runtime/druntime/test/dsymutil/src/my_example.d
@@ -0,0 +1,11 @@
+import std.stdio;
+import core.sys.darwin.dlfcn;
+
+void main() {
+ myFunction();
+}
+
+
+void myFunction() {
+ throw new Exception("Test");
+}
I'm not on a mac so I couldn't test it at all but you can try using this template. |
b662829
to
c3e7ec5
Compare
Tried adapting the suggestion to the exception tests; seems it just broke them instead. |
Can you try:
edit: |
9d4dc12
to
93bc61e
Compare
ok the failing tests there are unrelated to the PR and related to dynamic compile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The style guide is not entirely followed, but I'll let someone else worry about that.
Are there any plans to support reading the debug info from object files as well? |
This would be far more involved work. For now I think just using dSYM is the most sensible way. |
Does this detect stale dSYM files? (I don't see where it does that check) |
I still think we should let other tools do all the heavy lifting, rather than implement something (far inferior?) ourselves and pray it won't break with Apple's next OS release... https://wiki.dwarfstd.org/Apple%27s_%22Lazy%22_DWARF_Scheme.md is a nice read by the way, it mentions the many things that are needed to make this work well. |
It does not; but would be relatively trivial to add. |
Those tools are only available on macOS. This is the only portable way we can do this. Additionally due to the sanity checks I've put in place; if Apple breaks something it shouldn't lead to crashes. This is far more preferable than backtraces only working on macOS, and the backtraces causing the application to freeze for a few seconds; which in debugging situations like on iPadOS can cause the application to be killed by the os for being unresponsive. |
please do. It is very easy to forget to run dsymutil and run into weird issues. Another thing to add is to copy clang's behavior of automatically running dsymutil when building full executable (non-separate compilation); that would achieve (almost, not for separate compilation case) parity with the current state. |
It's not the only way, alternatives debated before.
Be aware of what you are signing up to. If things break, who is going to fix it? Will we again have years (!) of no debug info on macOS?
Preference is subjective, and I expect the vast majority of Apple LDC users to use it for macOS, not on iPadOS etc. I expect there to again be reports of "debug info not working", time will tell. |
Mainly me, my business is gearing up to release software for macOS, iOS, etc. |
This pull-request replaces the atos based backtrace generation with a small dSYM loader that maps the default dSYM location of a executable into the address space of the application.
This loader looks for a .dSYM bundle in the executable's directory, maps it into memory and passes it on to the DWARF state machine. It will not, however, map any dSYMs of loaded shared libraries currently; this is planned to be addressed in a future PR. Symbolicating OS libraries and symbolication by UUID are non-goals.
Outdated Info