diff --git a/docs/StackTraceSymbolicResolution.md b/docs/StackTraceSymbolicResolution.md new file mode 100644 index 0000000..1583ad6 --- /dev/null +++ b/docs/StackTraceSymbolicResolution.md @@ -0,0 +1,283 @@ +# Resolving Stack Traces With `procmon-parser` + +## Limitations and Constraints + +Symbolic stack trace resolution has the following limitations: + +* It is based on Windows libraries, thus it is only available on **Windows** systems. +* It requires an Internet connection. +* It connects to a Microsoft server and downloads files from it. + - By doing so, it requires you to accept Microsoft License terms. + +## Basics + +All events in a ProcMon trace have a [stack trace](https://en.wikipedia.org/wiki/Stack_trace). + +Below is an example of an event in a ProcMon capture: + +![Event](./pictures/event.png) + +The event captured is a thread creation in `explorer.exe`. + +If you double-click the event, you are brought to a new window with 3 tabs, the interesting one in our case being the +stack trace tab. Below is an example of the aforementioned event with an **unresolved** stack trace: + +![Stack Trace No Symbols](./pictures/stack_trace_no_symbols.png) + +And the same event with a **resolved** stack trace: + +![Stack Trace With Symbols](./pictures/stack_trace_with_symbols.png) + +In the second picture, symbolic resolution has turned the raw addresses into function names and offsets and, +sometimes, into the source code position where the call happens (frames 12 and 13). + +A stack trace is composed of frames (there are 25 frames in the above example) and is read from bottom to top: the +oldest call is at the bottom and the most recent one at the top, with all the intermediate calls in-between. Once the +top function returns, the frames are unwound one by one and the code flow eventually goes back to the first frame (at the bottom). 
+ +## Resolving a Stack Trace with `procmon-parser` + +Resolving a stack trace in `procmon-parser` can be as simple as follows: + +```python +#!/usr/bin/env python3 +# -*- coding:utf-8 -*- +import pathlib +import sys + +from procmon_parser import ProcmonLogsReader, SymbolResolver, StackTraceInformation + +def main(): + log_file = pathlib.Path(r"c:\temp\Logfile.PML") + + with log_file.open("rb") as f: + procmon_reader = ProcmonLogsReader(f) + symbol_resolver = SymbolResolver(procmon_reader) + for idx, event in enumerate(procmon_reader): + if idx == 213: + frames = list(symbol_resolver.resolve_stack_trace(event)) + print(StackTraceInformation.prettify(frames)) + +if __name__ == "__main__": + sys.exit(main()) +``` + +## Setting Up Stack Trace Resolution In `procmon-parser` + +### Obtaining Required Windows Libraries + +Stack trace resolution uses 2 Windows DLLs: + +* `dbghelp.dll` ([official documentation](https://learn.microsoft.com/en-us/windows/win32/debug/debug-help-library)) + - Provide symbol resolution functionalities. +* `symsrv.dll` ([official documentation](https://learn.microsoft.com/en-us/windows/win32/debug/using-symsrv)) + - Symbol file management (mostly downloading symbolic information from a symbol store). + +While `dbghelp.dll` is provided with Windows systems (it's located in `%WINDIR%\system32`) this DLL might be out of date +on some systems and thus missing various functionalities (as [explained here](https://learn.microsoft.com/en-us/windows/win32/debug/dbghelp-versions)). +`symsrv.dll`, on the other hand, does not ship with Windows systems. 
+ +Both DLLs can be acquired from various Microsoft products, notably: + +* [Debugging Tools for Windows](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/) +* [Windbg Preview](https://apps.microsoft.com/store/detail/windbg-preview/9PGJGD53TN86) +* [Visual Studio](https://visualstudio.microsoft.com/downloads/) + +The official and [**recommended way**](https://learn.microsoft.com/en-us/windows/win32/debug/dbghelp-versions) is to +install the `Debugging Tools For Windows` from the Windows SDK (please note that the SDK installer lets you install +only the *Debugging Tools for Windows* rather than the whole SDK). + +**Important**: `procmon-parser` will try to find the correct path to the Debugging Tools For Windows and Windbg Preview +and then automatically provide the path to the DLLs matching the Python interpreter architecture. It does not, however, +try to find the DLLs from a Visual Studio installation. + +Be sure to use the DLLs that match your interpreter architecture. For example, the *Debugging Tools For Windows* ships +builds for 4 different architectures: x86, x64, arm(32) and arm64: + +``` +neitsa@lab:c/Program Files (x86)/Windows Kits/10/Debuggers$ tree -L 1 +. +├── Redist +├── arm +├── arm64 +├── ddk +├── inc +├── lib +├── x64 +└── x86 +``` + +You can get your Python interpreter architecture by using the `platform` module for example: + +``` +>>> import platform +>>> platform.architecture() +('64bit', 'WindowsPE') +``` + +Thus, the directory in the *Debugging Tools For Windows* would be the `x64` one since the Python interpreter is a 64-bit +one. + +### Symsrv and Microsoft License Terms + +Microsoft's symbol server (located at https://msdl.microsoft.com/download/symbols/) provides access to +symbols for the operating system itself. 
The `symsrv.dll` library requires agreement to Microsoft's +*"Terms of Use for Microsoft Symbols and Binaries."* ([visible here](https://learn.microsoft.com/en-us/legal/windows-sdk/microsoft-symbol-server-license-terms)). + +On your first usage of the symbolic resolution, `symsrv.dll` may display a prompt requiring you to accept the +aforementioned *Terms of Use* if you wish to continue further. + +To automatically indicate agreement to the terms, you may create a file called `symsrv.yes` (the file can be empty) +in the same directory as the `symsrv.dll` library. Note that `symsrv.dll` will also recognize a +`symsrv.no` file as indicating that you do not accept the terms; the `.yes` file takes priority over the `.no` file. + +It is also possible to view the terms from within the WinDbg debugger (included in the *Debugging Tools for Windows*) +by removing any `symsrv.yes` and `symsrv.no` files from WinDbg's directory, setting the symbol path to include +Microsoft's symbol server (using the `.sympath` command), and attempting to load symbols from their server (`.reload` +command). + +## Advanced Usage + +### Symbol Download Location + +The [_NT_SYMBOL_PATH](https://learn.microsoft.com/en-us/windows/win32/debug/using-symsrv#setting-the-symbol-path) +environment variable is the official way to set the location where the symbols are going to be stored. + +Symbol files may need to be downloaded from a symbol store, in which case the following algorithm takes place in the +`SymbolResolver` class: + +* if `_NT_SYMBOL_PATH` environment variable is set: + - Use `_NT_SYMBOL_PATH` location to put symbol files. + - if `symbol_path` constructor argument is set: + - Do not use `_NT_SYMBOL_PATH` but use the provided symbol path instead. +* else + - Use `%TEMP%` directory. + +Note that using the `%TEMP%` directory may require downloading the symbols again after each computer reboot.
This is most of +the time a lengthy operation, even with a fast internet connection. + +The basic syntax of the `_NT_SYMBOL_PATH` environment variable (and therefore the `symbol_path` constructor argument) is +as follows: + +``` +srv*<local_directory>*https://msdl.microsoft.com/download/symbols/ +``` + +Where `<local_directory>` must be an **existing directory** which is **writable** by any user. For example: + +``` +srv*c:\symbols*https://msdl.microsoft.com/download/symbols/ +``` + +For more information on the various possibilities for setting up the environment variable, please refer to the +[official documentation](https://learn.microsoft.com/en-us/windows/win32/debug/using-symsrv). + +### Copying DLLs + +If, for any reason, you do not wish to install the *Debugging Tools For Windows* on a particular machine (e.g. a virtual +machine) but already have it installed on another machine, you can copy both DLLs (`dbghelp.dll` and +`symsrv.dll`) from the *Debugging Tools For Windows* onto the target machine, preferably into their own (writable) +directory but **not** a system one (never overwrite the default `dbghelp.dll` in `%WINDIR%\System32`). Both DLLs must reside alongside each other. + +If you want to provide a different path for the DLLs, you can use the `dll_dir_path` parameter of the +`SymbolResolver` class: + +```python +#!/usr/bin/env python3 +# -*- coding:utf-8 -*- +import pathlib +import sys + +from procmon_parser import ProcmonLogsReader, SymbolResolver + +def main(): + log_file = pathlib.Path(r"c:\temp\Logfile.PML") + dll_dir_path = r"c:\tmp\my_debug_dll_dir" + + with log_file.open("rb") as f: + procmon_reader = ProcmonLogsReader(f) + # disable automatic retrieval of dbghelp.dll and symsrv.dll, and use the provided path instead. + # It must contain at least both DLLs: + # - from the same provider (e.g. Debugging tools for Windows) + # - and the same architecture (e.g. Debugging tools for Windows '\x64' directory). 
+ symbol_resolver = SymbolResolver(procmon_reader, dll_dir_path=dll_dir_path) + # ... + +if __name__ == "__main__": + sys.exit(main()) +``` + +### Skipping `symsrv.dll` Check + +The `SymbolResolver` class in `procmon-parser` checks if both `dbghelp` and `symsrv` DLLs are present in the provided +directory (if you pass it through the `dll_dir_path` parameter as explained above). + +If you have offline symbols already available (for example, by having previously used the `symchk` +([documentation](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/using-symchk)) tool from the +*Debugging Tools For Windows*), and do not want to connect your machine to the Internet, you can skip the `symsrv.dll` +automatic check by using the `skip_symsrv` parameter of the `SymbolResolver` class: + +```python +#!/usr/bin/env python3 +# -*- coding:utf-8 -*- +import pathlib +import sys + +from procmon_parser import ProcmonLogsReader, SymbolResolver + +def main(): + log_file = pathlib.Path(r"c:\temp\Logfile.PML") + # does not contain symsrv.dll + dll_dir_path = r"c:\tmp\my_debug_dll_dir" + + with log_file.open("rb") as f: + procmon_reader = ProcmonLogsReader(f) + # disable automatic retrieval of dbghelp.dll and symsrv.dll, and use the provided path instead. + # skip entirely the check for symsrv.dll. + # Use **only** if you know that you already have the necessary symbols! + symbol_resolver = SymbolResolver(procmon_reader, dll_dir_path=dll_dir_path, skip_symsrv=True) + # ... 
+ +if __name__ == "__main__": + sys.exit(main()) +``` + +## Debugging Symbol Resolution Problems + +If symbol resolution is not working as expected, you can pass a callback function - using the `debug_callback` named +parameter - to the `SymbolResolver` constructor, as follows: + +```python +#!/usr/bin/env python3 +# -*- coding:utf-8 -*- +import pathlib +import sys + +from procmon_parser import ProcmonLogsReader, SymbolResolver, CBA + +def symbol_debug_callback(handle: int, action_code: CBA | int, callback_data: str, user_context: int): + if action_code == CBA.CBA_DEBUG_INFO: + print(f"[DEBUG MESSAGE DBGHELP: CBA_DEBUG_INFO] {callback_data}") + return 1 + return 0 + +def main(): + log_file = pathlib.Path(r"c:\temp\Logfile.PML") + + with log_file.open("rb") as f: + procmon_reader = ProcmonLogsReader(f) + symbol_resolver = SymbolResolver(procmon_reader, debug_callback=symbol_debug_callback) + # ... + +if __name__ == "__main__": + sys.exit(main()) +``` + +The callback function mimics the [PSYMBOL_REGISTERED_CALLBACK64](https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/nc-dbghelp-psymbol_registered_callback64) +Windows API callback function. As of now only the `CBA.CBA_DEBUG_INFO` action code is handled internally. + +To indicate that the `CBA` code was handled successfully, the function **must** return 1. To indicate a failure to +handle the code, return 0. If your code does not handle a particular code, you should also return 0. (Returning 1 in +this case may have unintended consequences.) + +The callback above prints a lot of information that is helpful when debugging symbol retrieval problems. 
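Printing every message can be noisy; a variation is to collect them for later inspection. The sketch below follows the return-value convention described above (1 when the code is handled, 0 otherwise). It is an illustration only: `make_collecting_callback` is a hypothetical helper, and `CBA_DEBUG_INFO` is a stand-in constant (the value defined in `dbghelp.h`) so that the snippet runs without `procmon-parser`; in real use, compare against `CBA.CBA_DEBUG_INFO` instead.

```python
# Stand-in for CBA.CBA_DEBUG_INFO (value from dbghelp.h); an assumption made
# so that this sketch does not depend on procmon-parser being importable.
CBA_DEBUG_INFO = 0x10000000


def make_collecting_callback(messages):
    """Return a debug callback that appends dbghelp debug text to `messages`.

    Hypothetical helper: the returned function has the same four parameters
    as the callback shown above.
    """
    def callback(handle, action_code, callback_data, user_context):
        if action_code == CBA_DEBUG_INFO:
            messages.append(callback_data.rstrip())
            return 1  # the code was handled
        return 0  # any other code is left unhandled
    return callback
```

The returned function could then be passed as `debug_callback=make_collecting_callback(my_list)`, and `my_list` inspected once resolution has finished.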
diff --git a/docs/pictures/event.png b/docs/pictures/event.png new file mode 100644 index 0000000..55e9f5c Binary files /dev/null and b/docs/pictures/event.png differ diff --git a/docs/pictures/stack_trace_no_symbols.png b/docs/pictures/stack_trace_no_symbols.png new file mode 100644 index 0000000..b6232cf Binary files /dev/null and b/docs/pictures/stack_trace_no_symbols.png differ diff --git a/docs/pictures/stack_trace_with_symbols.png b/docs/pictures/stack_trace_with_symbols.png new file mode 100644 index 0000000..18d98bb Binary files /dev/null and b/docs/pictures/stack_trace_with_symbols.png differ diff --git a/procmon_parser/__init__.py b/procmon_parser/__init__.py index 390bc44..8c1e90c 100644 --- a/procmon_parser/__init__.py +++ b/procmon_parser/__init__.py @@ -1,3 +1,5 @@ +import sys + from six import PY2 from procmon_parser.configuration import * @@ -11,6 +13,12 @@ 'Rule', 'Column', 'RuleAction', 'RuleRelation', 'PMLError' ] +if sys.platform == "win32": + from procmon_parser.symbol_resolver.symbol_resolver import ( + SymbolResolver, StackTraceFrameInformation, StackTraceInformation, CBA) + + __all__.extend(['SymbolResolver', 'StackTraceFrameInformation', 'StackTraceInformation', 'CBA']) + class ProcmonLogsReader(object): """Reads procmon logs from a stream which in the PML format @@ -44,6 +52,12 @@ def __getitem__(self, index): def __len__(self): return self._struct_readear.number_of_events + @property + def maximum_application_address(self): + """Return the highest possible user land address. 
+ """ + return self._struct_readear.maximum_application_address + def processes(self): """Return a list of all the known processes in the log file """ diff --git a/procmon_parser/logs.py b/procmon_parser/logs.py index 113c10b..06d6c93 100644 --- a/procmon_parser/logs.py +++ b/procmon_parser/logs.py @@ -309,6 +309,12 @@ def get_event_at_offset(self, offset): def number_of_events(self): return self.header.number_of_events + @property + def maximum_application_address(self): + """Return the highest possible user land address. + """ + return self.header.maximum_application_address + def processes(self): """Return a list of all the known processes in the log file """ diff --git a/procmon_parser/stream_logs_format.py b/procmon_parser/stream_logs_format.py index 83f318f..c756554 100644 --- a/procmon_parser/stream_logs_format.py +++ b/procmon_parser/stream_logs_format.py @@ -36,7 +36,9 @@ def __init__(self, io): # Docs of this table's layout are in "docs\PML Format.md" self.icon_table_offset = read_u64(stream) - stream.seek(12, 1) # Unknown fields + self.maximum_application_address = read_u64(stream) + + self.os_version_info_size = read_u32(stream) self.windows_major_number = read_u32(stream) self.windows_minor_number = read_u32(stream) self.windows_build_number = read_u32(stream) diff --git a/procmon_parser/symbol_resolver/__init__.py b/procmon_parser/symbol_resolver/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/procmon_parser/symbol_resolver/symbol_resolver.py b/procmon_parser/symbol_resolver/symbol_resolver.py new file mode 100644 index 0000000..a913533 --- /dev/null +++ b/procmon_parser/symbol_resolver/symbol_resolver.py @@ -0,0 +1,776 @@ +#!/usr/bin/env python3 +# -*- coding:utf-8 -*- +"""Module used to resolve symbolic information for a given a stack trace. 
+""" +import contextlib +import ctypes +import enum +import logging +import os +import platform +import re +import sys + +import procmon_parser + +if sys.platform != "win32": + raise RuntimeError("Symbol Resolver can only be used on Windows Operating Systems.") + +_ver = sys.version_info[:3] +if _ver >= (3, 5, 0): + import winreg + import typing + import pathlib +elif _ver <= (2, 7, 18): + import pathlib2 as pathlib + import _winreg as winreg + +from procmon_parser.symbol_resolver.win.dbghelp import ( + DbgHelp, PFINDFILEINPATHCALLBACK, SYMBOL_INFOW, IMAGEHLP_LINEW64, SYMOPT, SSRVOPT, PSYMBOL_REGISTERED_CALLBACK64, + CBA, PIMAGEHLP_CBA_EVENTW) +from procmon_parser.symbol_resolver.win.win_types import PVOID, HANDLE, DWORD64, DWORD, ULONG, ULONG64, BOOL +from procmon_parser.symbol_resolver.win.win_consts import MAX_PATH, ERROR_FILE_NOT_FOUND + +logger = logging.getLogger(__name__) + + +@enum.unique +class FrameType(enum.Enum): + """Type of frame in a stack trace. Either User (the frame lies in user mode) or Kernel (the frame lies in kernel + mode). + """ + KERNEL = enum.auto() + USER = enum.auto() + + @staticmethod + def from_address(address, max_user_address): + # type: (int, int) -> "FrameType" + """Get the type of frame given an address and the maximum possible user address. + + Args: + address: The address for which the FrameType should be obtained. + max_user_address: The maximum possible user address. + + Returns: + A `FrameType` which corresponds to the given address. 
+ """ + return FrameType.KERNEL if address > max_user_address else FrameType.USER + + +class StackTraceFrameInformation(object): + def __init__(self, + frame_type, # type: FrameType + frame_number, # type: int + address, # type: int + module=None, # type: procmon_parser.Module | None + symbol_info=None, # type: SYMBOL_INFOW | None + displacement=None, # type: int | None + line_info=None, # type: IMAGEHLP_LINEW64 | None + line_displacement=None, # type: int | None + source_file_path=None # type: str | None + ): + # type: (...) -> None + """Contain various symbolic information about a frame in a stacktrace. + """ + # Type of the frame, either Kernel or User. + self.frame_type = frame_type + # The frame number (its position in the stack trace). + self.frame_number = frame_number + # Address of the symbol, at which the frame happens. + self.address = address + # The module inside which the frame happens. + self.module = module + # Symbolic information about the frame. + self.symbol_info = symbol_info + # The displacement in regard to the symbol. + # For example if the symbol name is 'foo' and the displacement is 0x10, then the frame happened at 'foo + 0x10'. + self.displacement = displacement + # Line information in regard to the symbol (available only if symbolic source information is present). + self.line_info = line_info + # Displacement from the source code line (that is, the column in the source code line). + self.line_displacement = line_displacement + # The source code full file path at which the frame happened. + self.source_file_path = source_file_path + + @property + def frame(self): + # type: () -> str + """Return a string representation of a frame (its `FrameType` and frame number). + """ + return "{frame_type.name[0]} {frame_number}".format( + frame_type=self.frame_type, frame_number=self.frame_number) + + @property + def location(self): + # type: () -> str + """Return a string representation of the symbolic location at which the frame happens. 
+ """ + if self.symbol_info is None: + return "{address:#x}".format(address=self.address) + + # symbolic information (symbol + asm offset) + sym_str = "{symbol_info.Name} + {displacement:#x}".format( + symbol_info=self.symbol_info, displacement=self.displacement) + if self.line_info is None: + return sym_str + + # line information + path = self.source_file_path if self.source_file_path else self.line_info.FileName + line_str = "{path} ({line_info.LineNumber}; col: {line_displacement})".format( + path=path, line_info=self.line_info, line_displacement=self.line_displacement + ) + + return "{sym_str}, {line_str}".format(sym_str=sym_str, line_str=line_str) + + @property + def module_name(self): + # type: () -> str + """Return a string representation of the frame main module name. + """ + if self.module is None or not self.module.path: + return "" + + return pathlib.Path(self.module.path).name + + @property + def module_path(self): + # type: () -> str + """Return a string representation of the frame main module fully qualified path. + """ + if self.module is None or not self.module.path: + return "" + + return self.module.path + + def __repr__(self): + # type: () -> str + return "{frame} {module_name} {location} {address:#x} {module_path}".format( + frame=self.frame, module_name=self.module_name, location=self.location, address=self.address, + module_path=self.module_path) + + +class StackTraceInformation(object): + """Class used to prettify a whole stack trace so its output if similar to ProcMon's stack trace window tab for an + event. + """ + + @staticmethod + def prettify(resolved_stack_trace): + # type: (list[StackTraceFrameInformation]) -> str + """Prettify a list of `StackTraceFrameInformation` so its output is similar to the one given by ProcMon. + + Args: + resolved_stack_trace: A list of stack trace frame information. + + Returns: + A string that match closely the output of a stack trace from ProcMon. 
+ """ + if not resolved_stack_trace: + return "" + + max_frame = max(len(stfi.frame) for stfi in resolved_stack_trace) + max_module = max(len(stfi.module_name) for stfi in resolved_stack_trace) + max_location = max(len(stfi.location) for stfi in resolved_stack_trace) + max_address = max(len("{stfi.address:#x}".format(stfi=stfi)) for stfi in resolved_stack_trace) + + output = list() + for stfi in resolved_stack_trace: + output.append("{stfi.frame:<{max_frame}} {stfi.module_name:<{max_module}} {stfi.location:<{max_location}} " + "0x{stfi.address:<{max_address}x} {stfi.module_path}".format( + stfi=stfi, max_frame=max_frame, max_module=max_module, max_location=max_location, + max_address=max_address) + ) + + return '\n'.join(output) + + +class SymbolResolver(object): + """Main workhorse class for resolving symbolic information from a stack trace. + """ + + def __init__(self, + procmon_logs_reader, # type: procmon_parser.ProcmonLogsReader + dll_dir_path=None, # type: str | pathlib.Path | None + skip_symsrv=False, # type: bool + symbol_path=None, # type: str + debug_callback=None # type: typing.Callable[[int, CBA | int, str, int], int] + ): + # type: (...) -> None + """Class Initialisation. + + Args: + procmon_logs_reader: An instance of the `ProcmonLogsReader` class. + dll_dir_path: Path to a directory containing at least `dbghelp.dll`, and optionally `symsrv.dll`. + skip_symsrv: Set to True if symbols are available locally on the machine and `_NT_SYMBOL_PATH` environment + variable is correctly set. This skips the check for `symsrv.dll` presence altogether. + symbol_path: Replace the `_NT_SYMBOL_PATH` environment variable if it exists, or prevent using %TEMP% as + the download location of the symbol files. This must be a string compatible with the `_NT_SYMBOL_PATH` + syntax. + debug_callback: A callback which can be used to understand and debug problems with symbol downloading and + resolution. 
+ + Notes: + If `dll_dir_path` is None, then the code does its best to find matching installations of the Debugging Tools + for Windows (can be installed from the Windows SDK) and Windbg Preview (installed from the Windows Store). + If neither can be found, the function raises. + + Raises: + ValueError: + The provided DLL path is not a valid directory, does not contain the required DLL(s) or the automatic + finder could not find the required DLL. Also raises if `skip_symsrv` is True but _NT_SYMBOL_PATH env. + var. is not set. + RuntimeError: + The initialisation couldn't get the system modules. + """ + # Check if we can find the needed DLLs if not path has been provided. + # Both DLLs are needed to resolve symbolic information. + # * 'dbghelp.dll' contains the functionalities to resolve symbols. + # * 'symsrv.dll' downloads symbol from the symbol store. + if dll_dir_path is None: + dll_dir_path = next( + (v for v in [DbgHelpUtils.find_debugging_tools(), DbgHelpUtils.find_windbg_preview()] if v is not None), + None) + if not dll_dir_path: + raise ValueError("You need to provide a valid path to 'dbghelp.dll' and 'symsrv.dll' or install either " + "debugging tools or windbg preview.") + else: + if isinstance(dll_dir_path, str): + dll_dir_path = pathlib.Path(dll_dir_path) + # just check that the given dir contains dbghelp and symsrv. + if not dll_dir_path.is_dir(): + raise ValueError("The given path '{dll_dir_path}' is not a directory.".format( + dll_dir_path=dll_dir_path)) + files_to_check = ["dbghelp.dll"] + if not skip_symsrv: + files_to_check.append("symsrv.dll") + if not all((dll_dir_path / file_name).is_file() for file_name in files_to_check): + raise ValueError("The given path must be a path to a directory containing: {files_to_check!r}.".format( + files_to_check=files_to_check)) + self.dll_dir_path = dll_dir_path + + # _NT_SYMBOL_PATH is needed to store symbols locally. If it's not set, we need to set it. 
+ nt_symbol_path = os.environ.get("_NT_SYMBOL_PATH", None) + if nt_symbol_path is None: + if skip_symsrv: + raise ValueError("_NT_SYMBOL_PATH env. var. is not set: you can't skip the symsrv.dll check.") + if symbol_path is None: + # resolve TEMP folder and set it at the symbol path. + symbol_path = "srv*{environ_tmp}*https://msdl.microsoft.com/download/symbols".format( + environ_tmp=os.environ['TEMP']) + # set symbol path + os.environ["_NT_SYMBOL_PATH"] = symbol_path + logger.debug("NT_SYMBOL_PATH: {environ_nt_symbol_path}".format( + environ_nt_symbol_path=os.environ['_NT_SYMBOL_PATH'])) + + # DbgHelp wrapper instance initialisation and symbolic option setting. + self._dbghelp = DbgHelp(self.dll_dir_path / "dbghelp.dll") + + self._debug_callback = debug_callback + dbghelp_options = [ + SYMOPT.SYMOPT_CASE_INSENSITIVE | SYMOPT.SYMOPT_UNDNAME | SYMOPT.SYMOPT_DEFERRED_LOADS | + SYMOPT.SYMOPT_LOAD_LINES | SYMOPT.SYMOPT_OMAP_FIND_NEAREST | SYMOPT.SYMOPT_FAIL_CRITICAL_ERRORS | + SYMOPT.SYMOPT_INCLUDE_32BIT_MODULES | SYMOPT.SYMOPT_AUTO_PUBLICS] + + if self._debug_callback is not None: + dbghelp_options.append(SYMOPT.SYMOPT_DEBUG) + + self._dbghelp.SymSetOptions(sum(dbghelp_options)) # 0x12237 (if not SYMOPT_DEBUG). + + # maximum user-address, used to discern between user and kernel modules (which don't change between processes). + self._max_user_address = procmon_logs_reader.maximum_application_address + + # Keep track of all system modules. + for process in procmon_logs_reader.processes(): + # Can't remember if System pid has always been 4. + # Just check its name (doesn't end with .exe) and company is MS. That should be foolproof enough. + if process.process_name in ["System"] and process.user.lower() == "nt authority\\system": + self.system_modules = process.modules + break + else: + # Couldn't find system modules. Log possible candidates. 
+ sys_pid = next((p for p in procmon_logs_reader.processes() if p.pid == 4), None) + sys_name = next((p for p in procmon_logs_reader.processes() if p.process_name.lower() == "system"), None) + if sys_pid is not None: + logger.debug("Process w/ PID = 4: {sys_pid!r}".format(sys_pid=sys_pid)) + if sys_name is not None: + logger.debug("Process w/ Name = 'System': {sys_name!r}".format(sys_name=sys_name)) + raise RuntimeError("Could not get system modules.") + + def find_module(self, event, address): + # type: (procmon_parser.Event, int) -> procmon_parser.Module | None + """Try to find the corresponding module given an address from an event stack trace. + + Args: + event: The event from which the address belongs to. + address: The address to be resolved to its containing module. + + Returns: + If the address lies inside a known module, the module is returned, otherwise the function returns None. + """ + + def is_kernel(addr): + # type: (int) -> bool + """[Internal] Return whether an address is kernel (True) or not (user mode address: False).""" + return addr > self._max_user_address + + def find_module_from_list(addr, modules): + # type: (int, list[procmon_parser.Module]) -> procmon_parser.Module | None + """[Internal] Return an instance of a Module given an address (if the address lies inside the module). + """ + for m in modules: + base = m.base_address + end = m.base_address + m.size + if base <= addr < end: + return m + return None + + # get the right modules depending on the address type. + # kernel address: check modules in the system process. + # user land address: check modules in the process itself. + module_source = self.system_modules if is_kernel(address) else event.process.modules + module = find_module_from_list(address, module_source) + return module # may be None. 
+ + def resolve_stack_trace(self, event): + # type: (procmon_parser.Event) -> typing.Iterator[StackTraceFrameInformation] + """Resolve the stack trace of an event to include symbolic information. + + Args: + event: The event for which the stack trace should be resolved. + + Notes: + The `ProcmonLogsReader` instance must be instantiated with `should_get_stacktrace` set to True (default). + + Raises: + RuntimeError: + - The given event does not contain any stack trace information (be sure to call `ProcmonLogsReader` with + the `should_get_stacktrace` parameter set to True). + - The symbol engine could not be correctly initialized. + + Examples: + ```python + p = pathlib.Path(r"C:\temp\Logfile.PML") + + with p.open("rb") as f: + log_reader = ProcmonLogsReader(f, should_get_stacktrace=True) + sym_resolver = SymbolResolver(log_reader) + for i, event in enumerate(log_reader): + print(f"{i:04x} {event!r}") + frames = list(sym_resolver.resolve_stack_trace(event)) + print(StackTraceInformation.prettify(frames)) + ``` + + Yields: + An instance of `StackTraceFrameInformation` for each frame in the stack trace. 
+ """ + with self._dbghelp_init(event) as sym_pid: + # yield from self._resolve_stack_trace(event, sym_pid) + for sti in self._resolve_stack_trace(event, sym_pid): + yield sti + + def _resolve_stack_trace(self, event, pid): + # type: (procmon_parser.Event, int) -> typing.Iterator[StackTraceFrameInformation] + + if not event.stacktrace or event.stacktrace is None: + raise RuntimeError("Trying to resolve a stack trace while there is no stack trace.") + + # set up callback if we are in debug mode + if self._debug_callback: + callback = PSYMBOL_REGISTERED_CALLBACK64(self._symbol_registered_callback) + self._dbghelp.SymRegisterCallbackW64(pid, callback, PVOID(pid)) + + logger.debug("# Stack Trace frames: {len_event_stack_trace}".format( + len_event_stack_trace=len(event.stacktrace))) + logger.debug("PID: {pid:#08x}".format(pid=pid)) + + # Resolve each of the addresses in the stack trace, frame by frame. + for frame_number, address in enumerate(event.stacktrace): + frame_type = FrameType.from_address(address, self._max_user_address) + logger.debug("{sep}\nStack Frame: {frame_number:04} type: {frame_type}".format( + sep='-' * 79, frame_number=frame_number, frame_type=frame_type)) + + # find the module that contains the given address. It might not be found. + logger.debug("Address: {address:#016x}".format(address=address)) + module = self.find_module(event, address) + if not module: + yield StackTraceFrameInformation(frame_type, frame_number, address) + continue + + logger.debug("Address: {address:#016x} --> Module: {module!r}".format(address=address, module=module)) + + # We have the address and the module name. Get the corresponding file from the Symbol store! + # Once we have the file, we'll be able to query the symbol for the address. + module_id = PVOID(module.timestamp) + search_path = None # use the default search path provided to SymInitialize. + + # We give it two tries: + # 1. 
The module is an MS module, in which case it's going to be resolved pretty much automatically. + # 1.a if it's not an MS module it's going to fail. + # 2. If it's not an MS module, then we indicate to SymFindFileInPath where to find the binary in SearchPath. + ret_val, found_file = self._find_file(pid, search_path, module, module_id) + if not ret_val: + last_err = ctypes.get_last_error() + logger.debug("SymFindFileInPathW failed at attempt 0 (error: {last_err:#08x}).".format( + last_err=last_err)) + if last_err == ERROR_FILE_NOT_FOUND: + # check if the directory exists. If it is, use it as the search path. + dir_path = pathlib.Path(module.path).parent + search_path = str(dir_path) if dir_path.is_dir() else search_path + ret_val, found_file = self._find_file(pid, search_path, module, module_id) + if ret_val == 0: + logger.error("SymFindFileInPathW: ({last_err:#08x}) {formatted_last_err}".format( + last_err=last_err, formatted_last_err=ctypes.FormatError(last_err))) + yield StackTraceFrameInformation(frame_type, frame_number, address, module) + continue + + logger.debug("Found file: {found_file.value}".format(found_file=found_file)) + + # We have the file from the symbol store, we now 'load' the symbolic module (it does not load it inside + # the process address space) to be able to query the symbol right after that. + module_base = self._dbghelp.SymLoadModuleExW( + pid, # hProcess + None, # hFile + found_file, # ImageName + None, # ModuleName + module.base_address, # BaseOfDll + module.size, # DllSize + None, # Data (nullptr) + 0 # Flags + ) + if module_base == 0: + # the function return 0 (FALSE) and GetLastError will also return 0 if there was no error, but the + # module was already loaded. This is not an error in this case. + last_err = ctypes.get_last_error() + if last_err != 0: # if it's not 0, then it's really an error. 
+ logger.error("SymLoadModuleExW: ({last_err:#08x}) {formatted_last_err}".format( + last_err=last_err, formatted_last_err=ctypes.FormatError(last_err))) + yield StackTraceFrameInformation(frame_type, frame_number, address, module) + continue + + logger.debug("Module Base: {module_base:#x}".format(module_base=module_base)) + + # Now that we have loaded the symbolic module, we query it with the address (lying inside it) to get the + # name of the symbol and the displacement from the symbol (if any). + displacement = DWORD64(0) + symbol_info = SYMBOL_INFOW() + symbol_info.MaxNameLen = SYMBOL_INFOW.BUFFER_NUM_ELEMENTS + symbol_info.SizeOfStruct = 0x58 + ret_val = self._dbghelp.SymFromAddr( + pid, # hProcess + address, # Address of the symbol + ctypes.byref(displacement), # [out] displacement from the base of the symbol. e.g. 'foo + 0x10' + ctypes.byref(symbol_info) # [in, out] symbol information. + ) + if ret_val == 0: + last_err = ctypes.get_last_error() + logger.error("SymFromAddr: ({last_err:#08x}) {formatted_last_err}".format( + last_err=last_err, formatted_last_err=ctypes.FormatError(last_err))) + yield StackTraceFrameInformation(frame_type, frame_number, address, module) + continue + + logger.debug("Symbol Name: {symbol_info.Name}; Displacement: {displacement.value:#08x}".format( + symbol_info=symbol_info, displacement=displacement)) + + # In case we have source information, we need to continue to query the symbol to get source information such + # as the source file name and the line number. This obviously fails if there are no symbolic source code + # information. + line_displacement = DWORD(0) + line = IMAGEHLP_LINEW64() + line.SizeOfStruct = ctypes.sizeof(IMAGEHLP_LINEW64) + ret_val = self._dbghelp.SymGetLineFromAddrW64( + pid, # hProcess + address, # Address + ctypes.byref(line_displacement), # Displacement + ctypes.byref(line) # [out] Line + ) + # The above call fails if there are no source code information. This is the default for Windows binaries. 
+ if ret_val == 0: + last_err = ctypes.get_last_error() + logger.debug( + "SymGetLineFromAddrW64 [no source line]: ({last_err:#08x}) {formatted_last_err}".format( + last_err=last_err, formatted_last_err=ctypes.FormatError(last_err))) + yield StackTraceFrameInformation(frame_type, frame_number, address, module, symbol_info, + displacement.value) + continue + + # FIX: If you don't copy the line.FileName buffer, it gets overwritten in the next call to + # SymGetLineFromAddrW64(). After much debugging, the solution is in fact currently written in the + # SymGetLineFromAddrW64() documentation: + # This function returns a pointer to a buffer that may be reused by another function. Therefore, be + # sure to copy the data returned to another buffer immediately. + # The following 2 lines just do that. + file_name = ctypes.create_unicode_buffer(line.FileName) # noqa + line.FileName = ctypes.cast(file_name, ctypes.c_wchar_p) + + logger.debug("File Name: '{line.FileName}'; Line Number: {line.LineNumber}; " + "Line Displacement (col): {line_displacement.value}".format( + line=line, line_displacement=line_displacement)) # noqa + + # It's possible that the returned line.Filename is already a fully qualified path, in which case there's no + # need to call SymGetSourceFileW, as the latter would be only used to retrieve the fully qualified path. + # We just check that we already have fully qualified path. If it is, then we bail out, otherwise we call + # SymGetSourceFileW. + if pathlib.Path(line.FileName).is_absolute(): # noqa + # we have a fully qualified source file path. + logger.debug("source file path [from line.Filename]: {line.FileName}".format(line=line)) + fully_qualified_source_path = line.FileName + else: + # we don't have a fully qualified source file path. 
+ source_file_path_size = DWORD(MAX_PATH) + source_file_path = ctypes.create_unicode_buffer(source_file_path_size.value) + ret_val = self._dbghelp.SymGetSourceFileW( + pid, # hProcess + module.base_address, # Base + None, # Params (never used) + line.FileName, # FileSpec (name of source file) [PCWSTR] + source_file_path, # [out] FilePath: fully qualified path of source file + source_file_path_size # FilePath size (num chars) + ) + if ret_val == 0: + last_err = ctypes.get_last_error() + logger.debug("SymGetSourceFileW: ({last_err:#08x}) {formatted_last_err}".format( + last_err=last_err, formatted_last_err=ctypes.FormatError(last_err))) + logger.debug("SymGetSourceFileW failed: using '{line.FileName}' as fallback.".format( + line=line)) + # use line.FileName as fallback + fully_qualified_source_path = line.FileName + else: + logger.debug("source file path [from SymGetSourceFileW]: {source_file_path.value}".format( + source_file_path=source_file_path)) + fully_qualified_source_path = source_file_path.value + + yield StackTraceFrameInformation(frame_type, frame_number, address, module, symbol_info, displacement.value, + line, line_displacement.value, fully_qualified_source_path) + + def _find_file(self, pid, search_path, module, module_id): + # type: (int, str | None, procmon_parser.Module, ctypes.c_void_p[int]) -> tuple[int, ctypes.Array[ctypes.c_wchar]] + """[Internal] Locates a symbol file or executable image. + + Args: + pid: process ID; must be the same as the one passed to SymInitialize. + search_path: Either None (use the default search path) or a specific search path. + module: The module for which we're trying to find the symbol information. + module_id: pointer to id, see SymFindFileInPathW documentation for more information. + + Returns: + A tuple: the return code of the SymFindFileInPathW function and the path to the symbolic file on success. 
+ """ + found_file = ctypes.create_unicode_buffer(MAX_PATH * ctypes.sizeof(ctypes.c_wchar)) + ret_val = self._dbghelp.SymFindFileInPathW( + HANDLE(pid), # hProcess + search_path, # SearchPath + module.path, # FileName (PCWSTR: it's fine to pass a python string) + ctypes.byref(module_id), # id + module.size, # two + 0, # three + SSRVOPT.SSRVOPT_GUIDPTR, # flags: ProcMon uses 'SSRVOPT_GUIDPTR' but it's not a GUID??? still works + found_file, # [out] FoundFile + PFINDFILEINPATHCALLBACK(0), # callback (nullptr) + None # context + ) + return ret_val, found_file + + def _symbol_registered_callback(self, handle, action_code, callback_data, user_context): + # type: (HANDLE, ULONG, ULONG64, ULONG64) -> BOOL + """[Internal] Callback passed to SymRegisterCallbackW64. Translates arguments to python types before calling the + user-defined callback. + + Notes: + Only called if a callback is passed to SymbolResolver constructor. + + Args: + handle: handle passed initially to SymInitialize (actually the pid of the process). + action_code: One of the constant defined in the `CBA` enumeration. + callback_data: data passed from the symbol engine. Interpretation of the data depends on the action code. + user_context: user defined data; The pid of the process. + + Returns: + The value returned by the user-defined callback. + """ + param_callback_data = callback_data + try: + param_action_code = CBA(action_code) + if param_action_code == CBA.CBA_DEBUG_INFO: + param_callback_data = ctypes.cast(callback_data, ctypes.c_wchar_p).value + elif param_action_code == CBA.CBA_EVENT: + param_callback_data = ctypes.cast(callback_data, PIMAGEHLP_CBA_EVENTW).contents + except ValueError: + # can't convert from int to CBA. this happens for internal messages that surfaces. 
+ param_action_code = action_code + + ret = self._debug_callback(handle, param_action_code, param_callback_data, user_context) + return ret + + @contextlib.contextmanager + def _dbghelp_init(self, event): + # type: (procmon_parser.Event) -> typing.Generator[int, None, None] + """[internal] context manager used to make sure we clean up symbol information on exit from the + `resolve_stack_trace` function. Takes care of initializing and performing cleanup of the symbol engine. + + Args: + event: The event for which the stack trace is to be resolved. + + Returns: + The process pid used by the symbolic functions. + """ + pid = event.process.pid + if self._dbghelp.SymInitialize(pid, None, False) == 0: + last_err = ctypes.get_last_error() + raise RuntimeError("SymInitialize failed: {last_err:#08x}; msg: {err_msg}".format( + last_err=last_err, err_msg=ctypes.FormatError(last_err))) + else: + logger.debug("SymInitialize OK.") + try: + yield pid + finally: + if self._dbghelp.SymCleanup(pid) == 0: + last_err = ctypes.get_last_error() + raise RuntimeError("SymCleanup failed: {last_err:#08x}; msg: {err_msg}".format( + last_err=last_err, err_msg=ctypes.FormatError(last_err))) + else: + logger.debug("SymCleanup OK.") + + +class DbgHelpUtils(object): + """Utility functions to automatically find DbgHelp.dll and Symsrv.dll if Debugging Tools For Windows or Windbg + preview are installed on the current system. + """ + + @staticmethod + def find_debugging_tools(): + # type: () -> pathlib.Path | None + """Find the path of the directory containing DbgHelp.dll and Symsrv.dll from the Debugging Tools For Windows + (installed from the Windows SDK). + + Returns: + The path to the DLLs directory (that corresponds to the interpreter architecture) , or None if the Debugging + Tools for Windows are not installed. 
+ """ + sdk_key = r"SOFTWARE\WOW6432Node\Microsoft\Windows Kits\Installed Roots" + debugger_roots = list() + try: + with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, sdk_key) as top_key: + _, num_values, _ = winreg.QueryInfoKey(top_key) + for i in range(num_values): + value_name, _, _ = winreg.EnumValue(top_key, i) + if value_name.startswith("{"): # skip GUIDs + continue + if "windowsdebuggersroot" in value_name.lower(): + # 'WindowsDebuggerRoot' key is followed by the SDK major number, i.e. 'WindowsDebuggersRoot10'. + debugger_roots.append(value_name) + except OSError: + return None + + if not debugger_roots: + return None + + # we have the debugger roots, we need to find the latest version. 11 > 10 > 81 > 80 > 7 ... + versions = {} + for debugger_root in debugger_roots: + match = re.search(r"WindowsDebuggersRoot(\d+)", debugger_root) + if not match: + return None + version = float(match.group(1)) + if version > 20.0: + version = version / 10.0 # e.g. 81 -> 8.1 + versions.update({version: match.group(1)}) + + if not versions: + return None + + max_ver = max(versions.keys()) + max_ver_str = versions[max_ver] + + debugger_path = None # type: pathlib.Path | None + try: + with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, sdk_key) as top_key: + value, value_type = winreg.QueryValueEx(top_key, "WindowsDebuggersRoot{max_ver_str}".format( + max_ver_str=max_ver_str)) + if value_type == winreg.REG_SZ: + debugger_path = pathlib.Path(value) + except OSError: + return None + + if not debugger_path or not debugger_path.is_dir(): + return None + + # we have found Windbg installation path; we need to get the correct architecture directory. 
+ lookup = { + "amd64": "x64", + "win32": "x86", + "arm64": "arm64", + "arm32": "arm" + } + + return DbgHelpUtils._arch_dir(debugger_path, lookup) + + @staticmethod + def find_windbg_preview(): + # type: () -> pathlib.Path | None + """Find the directory path of the DbgHelp.dll and Symsrv.dll from the Windbg preview installation (installed + from the Windows Store). + + Returns: + The path to the DLLs directory (that corresponds to the interpreter architecture), or None if Windbg Preview + directory couldn't be found. + """ + package_key = (r"SOFTWARE\Classes\Local Settings\Software\Microsoft\Windows\CurrentVersion\AppModel" + r"\Repository\Packages") + + windbg_location = None # type: pathlib.Path | None + try: + with winreg.OpenKey(winreg.HKEY_CURRENT_USER, package_key) as top_key: + num_keys, _, _ = winreg.QueryInfoKey(top_key) + for i in range(num_keys): + key_name = winreg.EnumKey(top_key, i) + if "microsoft.windbg" in key_name.lower(): + # found Windbg Preview. Get its installation location. + install_key = "{package_key}\\{key_name}".format(package_key=package_key, key_name=key_name) + with winreg.OpenKey(winreg.HKEY_CURRENT_USER, install_key) as windbg_key: + install_location, key_type = winreg.QueryValueEx(windbg_key, "PackageRootFolder") + if key_type == winreg.REG_SZ: + windbg_location = pathlib.Path(install_location) + break + except OSError: + return None + + if windbg_location is None or not windbg_location.is_dir(): + return None + + # we have found the installation path; we need to get the correct architecture directory. + # One of: 'x86', 'amd64' or 'arm64' (there's no 'arm32' support in Windbg Preview). + lookup = { + "amd64": "amd64", + "win32": "x86", + "arm64": "arm64", + # note: Windbg preview doesn't support arm32. 
+ } + + return DbgHelpUtils._arch_dir(windbg_location, lookup) + + @staticmethod + def _arch_dir(debugger_dir, arch_lookup): + # type: (pathlib.Path, dict[str, str]) -> pathlib.Path | None + """[internal] Get the path to the right DLLs (depending on the architecture used by the python interpreter). + + Args: + debugger_dir: The top level directory of the debugger (Windbg / Windbg Preview). + arch_lookup: A dictionary which translate the system architecture to a folder name in the Windbg + installation. + + Returns: + The correct path to the DLLs (dbghelp & symsrv) directory. None if the directory couldn't be found. + """ + machine = platform.machine().lower() + bitness = arch_lookup.get(machine, None) + if bitness is None: + return None + arch_dir = debugger_dir / bitness + if not arch_dir.is_dir(): + return None + + # check that there are both 'dbghelp.dll' and 'symsrv.dll' in the given directory. + if not all((arch_dir / file_name).is_file() for file_name in ("symsrv.dll", "dbghelp.dll")): + return None + + return arch_dir diff --git a/procmon_parser/symbol_resolver/win/__init__.py b/procmon_parser/symbol_resolver/win/__init__.py new file mode 100644 index 0000000..8781c8a --- /dev/null +++ b/procmon_parser/symbol_resolver/win/__init__.py @@ -0,0 +1 @@ +from .dbghelp import CBA diff --git a/procmon_parser/symbol_resolver/win/dbghelp.py b/procmon_parser/symbol_resolver/win/dbghelp.py new file mode 100644 index 0000000..c9b29e4 --- /dev/null +++ b/procmon_parser/symbol_resolver/win/dbghelp.py @@ -0,0 +1,446 @@ +#!/usr/bin/env python3 +# -*- coding:utf-8 -*- +"""Module wrapper around DbgHelp.dll Windows library. + +DbgHelp.dll is the main library for resolving symbols. 
+""" +import ctypes +import logging +import sys + + +from procmon_parser.symbol_resolver.win.win_types import ( + HANDLE, PCSTR, BOOL, DWORD, PCWSTR, PVOID, PWSTR, DWORD64, ULONG, ULONG64, WCHAR, PDWORD64, PDWORD, BOOLEAN) +from procmon_parser.symbol_resolver.win.win_consts import MAX_PATH + +_ver = sys.version_info[:3] +if _ver >= (3, 5, 0): + import enum + import pathlib + import typing + if typing.TYPE_CHECKING: + import _ctypes # only used for typing as ctypes doesn't export inner types. +elif _ver <= (2, 7, 18): + import aenum as enum + import pathlib2 as pathlib + +logger = logging.getLogger(__name__) + + +# +# Callback Functions needed by some DbgHelp APIs. +# + +# PFINDFILEINPATHCALLBACK; used with the SymFindFileInPath function. +# https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/nc-dbghelp-pfindfileinpathcallback +PFINDFILEINPATHCALLBACK = ctypes.WINFUNCTYPE(BOOL, PCSTR, PVOID) + +# PSYMBOL_REGISTERED_CALLBACK64 ; passed to SymRegisterCallback64 +# Used if SymSetOptions is passed the 'SYMOPT_DEBUG' flag. +# https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/nc-dbghelp-psymbol_registered_callback64 +PSYMBOL_REGISTERED_CALLBACK64 = ctypes.WINFUNCTYPE(BOOL, HANDLE, ULONG, ULONG64, ULONG64) + + +# +# Structures used by DbgHelp APIs. +# + + +class MODLOAD_DATA(ctypes.Structure): # noqa + """Contains module data. Used by SymLoadModuleExW. + + See Also: + https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/ns-dbghelp-modload_data + """ + _fields_ = ( + ("ssize", DWORD), + ("ssig", DWORD), + ("data", PVOID), + ("size", DWORD), + ("flags", DWORD), + ) + + +PMODLOAD_DATA = ctypes.POINTER(MODLOAD_DATA) + + +class SYMBOL_INFOW(ctypes.Structure): # noqa + """Contains symbol information. Used by SymFromAddrW. 
+ + See Also: + https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/ns-dbghelp-symbol_infow + """ + BUFFER_NUM_ELEMENTS = 468 + + _fields_ = ( + ("SizeOfStruct", ULONG), + ("TypeIndex", ULONG), + ("Reserved", ULONG64 * 2), + ("Index", ULONG), + ("Size", ULONG), + ("ModBase", ULONG64), + ("Flags", ULONG), + ("Value", ULONG64), + ("Address", ULONG64), + ("Register", ULONG), + ("Scope", ULONG), + ("Tag", ULONG), + ("NameLen", ULONG), + ("MaxNameLen", ULONG), + ("Name", WCHAR * BUFFER_NUM_ELEMENTS) + ) + + +PSYMBOL_INFOW = ctypes.POINTER(SYMBOL_INFOW) + + +class IMAGEHLP_LINEW64(ctypes.Structure): # noqa + """Represents a source file line. Used by SymGetLineFromAddrW64. + + See Also: + https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/ns-dbghelp-imagehlp_linew64 + """ + _fields_ = ( + ("SizeOfStruct", DWORD), + ("Key", PVOID), + ("LineNumber", DWORD), + ("FileName", PWSTR), + ("Address", DWORD64) + ) + + +PIMAGEHLP_LINEW64 = ctypes.POINTER(IMAGEHLP_LINEW64) + + +class IMAGEHLP_DEFERRED_SYMBOL_LOADW64(ctypes.Structure): # noqa + """Contains information about a deferred symbol load. + + See Also: + https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/ns-dbghelp-imagehlp_deferred_symbol_loadw64 + """ + _fields_ = ( + ("SizeOfStruct", DWORD), + ("BaseOfImage", DWORD64), + ("Checksum", DWORD), + ("TimeDateStamp", DWORD), + ("FileName", WCHAR * (MAX_PATH + 1)), + ("Reparse", BOOLEAN), + ("hFile", HANDLE), + ("Flags", DWORD) + ) + + +PIMAGEHLP_DEFERRED_SYMBOL_LOADW64 = ctypes.POINTER(IMAGEHLP_DEFERRED_SYMBOL_LOADW64) + + +class IMAGEHLP_DUPLICATE_SYMBOL64(ctypes.Structure): # noqa + """Contains duplicate symbol information. 
+ + See Also: + https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/ns-dbghelp-imagehlp_duplicate_symbol64 + + """ + _fields_ = ( + ("SizeOfStruct", DWORD), + ("NumberOfDups", DWORD), + ("Symbol", PVOID), # should be POINTER(IMAGEHLP_SYMBOLW64) + ("SelectedSymbol", DWORD) + ) + + +PIMAGEHLP_DUPLICATE_SYMBOL64 = ctypes.POINTER(IMAGEHLP_DUPLICATE_SYMBOL64) + + +class IMAGEHLP_CBA_EVENTW(ctypes.Structure): # noqa + """Contains information about a debugging event. + + See Also: + https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/ns-dbghelp-imagehlp_cba_eventw + """ + _fields_ = ( + ("severity", DWORD), + ("code", DWORD), + ("desc", PCWSTR), + ("object", PVOID) + ) + + +PIMAGEHLP_CBA_EVENTW = ctypes.POINTER(IMAGEHLP_CBA_EVENTW) + +# +# Functions descriptors +# + + +class _FunctionDescriptor(object): + __slots__ = ["name", "parameter_types", "return_type", "aliases"] + + def __init__(self, name, parameter_types=None, return_type=None, aliases=None): + # type: (str, tuple[_ctypes._SimpleCData] | None, _ctypes._SimpleCData | None, list[str] | None) -> None + """Class used to describe a Windows API function wrt its ctypes bindings.""" + self.name = name + self.parameter_types = parameter_types + self.return_type = return_type + self.aliases = aliases + + +# list of function (descriptors) from DbgHelp.dll +# type: list[_FunctionDescriptor] +_functions_descriptors = [ + # SymInitializeW + # https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/nf-dbghelp-syminitializew + _FunctionDescriptor("SymInitializeW", + (HANDLE, PCSTR, BOOL), + BOOL, + ["SymInitialize"]), + # SymCleanup + # https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/nf-dbghelp-symcleanup + _FunctionDescriptor("SymCleanup", + (HANDLE,), + BOOL), + # SymSetOptions + # https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/nf-dbghelp-symsetoptions + _FunctionDescriptor("SymSetOptions", + (DWORD,), + DWORD), + # SymSetSearchPathW + # 
https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/nf-dbghelp-symsetsearchpathw + _FunctionDescriptor("SymSetSearchPathW", + (HANDLE, PCWSTR), + BOOL, + ["SymSetSearchPath"]), + # SymFindFileInPathW + # https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/nf-dbghelp-symfindfileinpathw + _FunctionDescriptor("SymFindFileInPathW", + (HANDLE, PCWSTR, PCWSTR, PVOID, DWORD, DWORD, DWORD, PWSTR, PFINDFILEINPATHCALLBACK, PVOID), + BOOL, + ["SymFindFileInPath"]), + # SymLoadModuleExW + # https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/nf-dbghelp-symloadmoduleexw + _FunctionDescriptor("SymLoadModuleExW", + (HANDLE, HANDLE, PCWSTR, PCWSTR, DWORD64, DWORD, PMODLOAD_DATA, DWORD), + DWORD64, + ["SymLoadModuleEx"]), + + # SymFromAddrW + # https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/nf-dbghelp-symfromaddrw + _FunctionDescriptor("SymFromAddrW", + (HANDLE, DWORD64, PDWORD64, PSYMBOL_INFOW), + BOOL, + ["SymFromAddr"]), + + # SymGetLineFromAddrW64 + # https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/nf-dbghelp-symgetlinefromaddrw64 + _FunctionDescriptor("SymGetLineFromAddrW64", + (HANDLE, DWORD64, PDWORD, PIMAGEHLP_LINEW64), + BOOL), + + # SymGetLinePrevW64 + # https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/nf-dbghelp-symgetlineprevw64 + _FunctionDescriptor("SymGetLinePrevW64", + (HANDLE, PIMAGEHLP_LINEW64), + BOOL), + + # SymGetSourceFileW + # https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/nf-dbghelp-symgetsourcefilew + _FunctionDescriptor("SymGetSourceFileW", + (HANDLE, ULONG64, PCWSTR, PCWSTR, PWSTR, DWORD), + BOOL, + ["SymGetSourceFile"] + ), + + # SymRegisterCallback + # https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/nf-dbghelp-symregistercallback + _FunctionDescriptor("SymRegisterCallbackW64", + (HANDLE, PSYMBOL_REGISTERED_CALLBACK64, PVOID), + BOOL), +] + + +# +# Constants +# + +class SYMOPT(enum.IntFlag): + """Options that are set/returned by SymSetOptions() & SymGetOptions(); these 
are used as a mask. + + Notes: + This is a made up enum since constants are just `#define` in dbghelp.h. This prevents to have to import all + constants though. + """ + SYMOPT_CASE_INSENSITIVE = 0x00000001 + SYMOPT_UNDNAME = 0x00000002 + SYMOPT_DEFERRED_LOADS = 0x00000004 + SYMOPT_NO_CPP = 0x00000008 + SYMOPT_LOAD_LINES = 0x00000010 + SYMOPT_OMAP_FIND_NEAREST = 0x00000020 + SYMOPT_LOAD_ANYTHING = 0x00000040 + SYMOPT_IGNORE_CVREC = 0x00000080 + SYMOPT_NO_UNQUALIFIED_LOADS = 0x00000100 + SYMOPT_FAIL_CRITICAL_ERRORS = 0x00000200 + SYMOPT_EXACT_SYMBOLS = 0x00000400 + SYMOPT_ALLOW_ABSOLUTE_SYMBOLS = 0x00000800 + SYMOPT_IGNORE_NT_SYMPATH = 0x00001000 + SYMOPT_INCLUDE_32BIT_MODULES = 0x00002000 + SYMOPT_PUBLICS_ONLY = 0x00004000 + SYMOPT_NO_PUBLICS = 0x00008000 + SYMOPT_AUTO_PUBLICS = 0x00010000 + SYMOPT_NO_IMAGE_SEARCH = 0x00020000 + SYMOPT_SECURE = 0x00040000 + SYMOPT_NO_PROMPTS = 0x00080000 + SYMOPT_OVERWRITE = 0x00100000 + SYMOPT_IGNORE_IMAGEDIR = 0x00200000 + SYMOPT_FLAT_DIRECTORY = 0x00400000 + SYMOPT_FAVOR_COMPRESSED = 0x00800000 + SYMOPT_ALLOW_ZERO_ADDRESS = 0x01000000 + SYMOPT_DISABLE_SYMSRV_AUTODETECT = 0x02000000 + SYMOPT_READONLY_CACHE = 0x04000000 + SYMOPT_SYMPATH_LAST = 0x08000000 + SYMOPT_DISABLE_FAST_SYMBOLS = 0x10000000 + SYMOPT_DISABLE_SYMSRV_TIMEOUT = 0x20000000 + SYMOPT_DISABLE_SRVSTAR_ON_STARTUP = 0x40000000 + SYMOPT_DEBUG = 0x80000000 + + +class SSRVOPT(enum.IntFlag): + """Symbol Server Options; used by functions such as SymFindFileInPathW. + + Notes: + This is a made up enum since constants are just `#define` in dbghelp.h. This prevents to have to import all + constants though. 
+ """ + SSRVOPT_CALLBACK = 0x00000001 + SSRVOPT_DWORD = 0x00000002 + SSRVOPT_DWORDPTR = 0x00000004 + SSRVOPT_GUIDPTR = 0x00000008 + SSRVOPT_OLDGUIDPTR = 0x00000010 + SSRVOPT_UNATTENDED = 0x00000020 + SSRVOPT_NOCOPY = 0x00000040 + SSRVOPT_GETPATH = 0x00000040 + SSRVOPT_PARENTWIN = 0x00000080 + SSRVOPT_PARAMTYPE = 0x00000100 + SSRVOPT_SECURE = 0x00000200 + SSRVOPT_TRACE = 0x00000400 + SSRVOPT_SETCONTEXT = 0x00000800 + SSRVOPT_PROXY = 0x00001000 + SSRVOPT_DOWNSTREAM_STORE = 0x00002000 + SSRVOPT_OVERWRITE = 0x00004000 + SSRVOPT_RESETTOU = 0x00008000 + SSRVOPT_CALLBACKW = 0x00010000 + SSRVOPT_FLAT_DEFAULT_STORE = 0x00020000 + SSRVOPT_PROXYW = 0x00040000 + SSRVOPT_MESSAGE = 0x00080000 + SSRVOPT_SERVICE = 0x00100000 # deprecated + SSRVOPT_FAVOR_COMPRESSED = 0x00200000 + SSRVOPT_STRING = 0x00400000 + SSRVOPT_WINHTTP = 0x00800000 + SSRVOPT_WININET = 0x01000000 + SSRVOPT_DONT_UNCOMPRESS = 0x02000000 + SSRVOPT_DISABLE_PING_HOST = 0x04000000 + SSRVOPT_DISABLE_TIMEOUT = 0x08000000 + SSRVOPT_ENABLE_COMM_MSG = 0x10000000 + + +class CBA(enum.IntEnum): + """Values passed to various callbacks used by dbghelp. + + Only one value can be used at a time. + """ + CBA_DEFERRED_SYMBOL_LOAD_START = 0x00000001 + CBA_DEFERRED_SYMBOL_LOAD_COMPLETE = 0x00000002 + CBA_DEFERRED_SYMBOL_LOAD_FAILURE = 0x00000003 + CBA_SYMBOLS_UNLOADED = 0x00000004 + CBA_DUPLICATE_SYMBOL = 0x00000005 + CBA_READ_MEMORY = 0x00000006 + CBA_DEFERRED_SYMBOL_LOAD_CANCEL = 0x00000007 + CBA_SET_OPTIONS = 0x00000008 + CBA_EVENT = 0x00000010 + CBA_DEFERRED_SYMBOL_LOAD_PARTIAL = 0x00000020 + CBA_DEBUG_INFO = 0x10000000 + CBA_SRCSRV_INFO = 0x20000000 + CBA_SRCSRV_EVENT = 0x40000000 + CBA_UPDATE_STATUS_BAR = 0x50000000 + CBA_ENGINE_PRESENT = 0x60000000 + CBA_CHECK_ENGOPT_DISALLOW_NETWORK_PATHS = 0x70000000 + CBA_CHECK_ARM_MACHINE_THUMB_TYPE_OVERRIDE = 0x80000000 + + +class DbgHelp: + """Main wrapper around DbgHelp.dll library functions. 
+ + Examples: + ``` + # functions can be called as attributes from the class instance, as long as they have a function descriptor. + ret_val = DbgHelp.SymInitialize(0xdeadbeef, None, False) + if ret_val == 0: + # log error + pass + ``` + """ + + def __init__(self, dbghelp_path): + # type: (pathlib.Path) -> None + """Class init. + + Args: + dbghelp_path: Path to the dbghelp.dll library. + """ + if not dbghelp_path.is_file(): + raise ValueError("The given path '{dbghelp_path}' is not a file.".format(dbghelp_path=dbghelp_path)) + + self._dll_path = dbghelp_path + + # Dictionary of functions; key is str (function name), value is ctypes function pointer. + self._functions = dict() # type: dict[str: _ctypes.CFuncPtr] + + # DLL instance + self._dbghelp = ctypes.WinDLL(str(dbghelp_path), use_last_error=True) + + # resolve all needed functions. + self._resolve_functions(_functions_descriptors) + + def __getitem__(self, item): + # type: (str) -> _ctypes.CFuncPtr + return self._functions[item] + + def __getattr__(self, item): + # type: (str) -> _ctypes.CFuncPtr + return self[item] + + def _resolve_functions(self, function_descriptors): + # type: (list[_FunctionDescriptor]) -> None + """[internal] Resolve functions, for the given DLL, from the list of `_FunctionsDescriptor`. + + Raises: + AttributeError: A given function was not found. + """ + for function_descriptor in function_descriptors: + self._register_function(function_descriptor) + + def _register_function(self, function_descriptor): + # type: (_FunctionDescriptor) -> None + """[internal] Build a function ctypes wrapping from its function descriptor. + + Args: + function_descriptor: An instance of a _FunctionDescriptor that describes a function ctypes wrapping. + + Raises: + AttributeError: A given function was not found. + """ + try: + function_pointer = getattr(self._dbghelp, function_descriptor.name) + except AttributeError: + # We land here if the function can't be found in the given DLL. 
+ # note: it raises from quite deep inside ctypes if the function can't be resolved, which might be confusing. + # Log it now and re-raise. + logger.error("The function {function_descriptor.name} was not found in the DLL: " + "'{dll_path!r}'.".format(function_descriptor=function_descriptor, dll_path=self._dll_path)) + raise + if function_descriptor.parameter_types: + function_pointer.argtypes = function_descriptor.parameter_types + if function_descriptor.return_type: + function_pointer.restype = function_descriptor.return_type + self._functions.update({function_descriptor.name: function_pointer}) + if function_descriptor.aliases: + for alias in function_descriptor.aliases: + self._functions.update({alias: function_pointer}) diff --git a/procmon_parser/symbol_resolver/win/win_consts.py b/procmon_parser/symbol_resolver/win/win_consts.py new file mode 100644 index 0000000..9a064b1 --- /dev/null +++ b/procmon_parser/symbol_resolver/win/win_consts.py @@ -0,0 +1,7 @@ +"""Windows Constants +""" +# Windows (default) max path length. +MAX_PATH = 260 + +# Windows standard error codes. +ERROR_FILE_NOT_FOUND = 2 diff --git a/procmon_parser/symbol_resolver/win/win_types.py b/procmon_parser/symbol_resolver/win/win_types.py new file mode 100644 index 0000000..c74772b --- /dev/null +++ b/procmon_parser/symbol_resolver/win/win_types.py @@ -0,0 +1,21 @@ +#!/usr/bin/env python3 +# -*- coding:utf-8 -*- +"""Module that declares various Windows types for ctypes. 
+""" +import ctypes + +BYTE = BOOLEAN = ctypes.c_uint8 +HANDLE = ctypes.c_void_p +BOOL = ctypes.c_long +PCSTR = PCWSTR = PSTR = PWSTR = LPWSTR = ctypes.c_wchar_p +DWORD = ctypes.c_uint32 +DWORD64 = ctypes.c_uint64 +ULONG = ctypes.c_uint32 +ULONG64 = ctypes.c_uint64 +CHAR = ctypes.c_char +WCHAR = ctypes.c_wchar + +# pointer types +PVOID = ctypes.c_void_p +PDWORD = ctypes.POINTER(DWORD) +PDWORD64 = ctypes.POINTER(DWORD64) diff --git a/setup.py b/setup.py index 68dd765..54a22df 100644 --- a/setup.py +++ b/setup.py @@ -15,10 +15,12 @@ download_url="https://github.com/eronnen/procmon-parser/archive/v0.3.0.tar.gz", packages=["procmon_parser"], install_requires=[ - "enum34;python_version<'3.4'", + # "enum34;python_version<'3.4'", + "aenum;python_version<'3.4'", "construct>=2.10.54", "six", "ipaddress;python_version<'3'", + "pathlib2;python_version<'3'", ], classifiers=[ "Intended Audience :: Developers",
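The version-selection step in `DbgHelpUtils.find_debugging_tools` (the `11 > 10 > 81 > 80 > 7` ordering mentioned in its comment) can be sketched in isolation as follows. This is a minimal sketch under stated assumptions, not the code in this change: `pick_latest_debugger_root` is a hypothetical helper name, the registry value names are passed in as a plain list instead of being read through `winreg`, and, unlike the diff, a name that does not match the pattern is skipped rather than aborting the whole lookup.

```python
import re


def pick_latest_debugger_root(value_names):
    """Return the 'WindowsDebuggersRoot<N>' value name with the highest SDK version.

    Hypothetical helper for illustration only; returns None if no name matches.
    """
    versions = {}
    for name in value_names:
        match = re.search(r"WindowsDebuggersRoot(\d+)", name)
        if not match:
            # Assumption: unrelated values are ignored instead of returning None.
            continue
        version = float(match.group(1))
        if version > 20.0:
            # Two-digit legacy suffixes encode major.minor, e.g. 81 -> 8.1.
            version /= 10.0
        versions[version] = name
    if not versions:
        return None
    # The highest normalized version wins: 11 > 10 > 8.1 > 8.0 > 7.
    return versions[max(versions)]


if __name__ == "__main__":
    names = ["WindowsDebuggersRoot81", "WindowsDebuggersRoot10", "KitsRoot10"]
    print(pick_latest_debugger_root(names))
```

Keeping this step free of registry access makes the ordering rule easy to check on any platform, which may be useful since the rest of the resolver only runs on Windows.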