Open
Description
Describe the bug
A plain-text body stored with 001E encoding is not decoded correctly. In technical terms, a stream whose data encoding is 001E is read as 001F instead. This happens because PROPS_ID_MAP maps the stream 1000 (the body tag) to 001F: code here
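To illustrate why the mix-up matters, here is a minimal sketch, not taken from msg_parser. It assumes 001E means an 8-bit string in the message's code page and 001F means a UTF-16LE string, as in the MS-OXMSG property types. The payload and the cp1252 code page are made up for the example.

```python
# Hypothetical payload: bytes that a 001E body stream might contain.
# cp1252 is an assumed code page; a real message declares its own.
raw = "Plain body".encode("cp1252")

# Decoding as 001E (8-bit string) recovers the text.
as_string8 = raw.decode("cp1252")

# Decoding the same bytes as 001F (UTF-16LE) pairs bytes up into unrelated characters.
as_unicode = raw.decode("utf-16-le", errors="replace")

print(repr(as_string8))
print(repr(as_unicode))
```

Each pair of ASCII bytes is interpreted as a single UTF-16 code unit, so the second result is half as long and unreadable.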
Expected behavior
A stream with 001E encoding should be read as 001E, not as 001F.
Screenshots
Showing how my stream name (directory_name) contains both the property name and the data encoding:
Additional context
Using PROPS_ID_MAP as the reference for property details doesn't work well in practice. It should be used only as a last resort, when directory_entry_name doesn't carry that information (or the encoding is not recognized).
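The lookup order suggested above could be sketched roughly as follows. This is not msg_parser's code: `resolve_property`, `decode_string`, and `FALLBACK_TYPES` are hypothetical names, the fallback table only stands in for PROPS_ID_MAP (whose real shape may differ), and the `__substg1.0_<id><type>` stream naming is assumed from the MS-OXMSG format. Only the two string types are handled.

```python
import re

PTYP_STRING8 = "001E"  # 8-bit string in the message code page
PTYP_STRING = "001F"   # UTF-16LE string
KNOWN_STRING_TYPES = {PTYP_STRING8, PTYP_STRING}

# Hypothetical stand-in for PROPS_ID_MAP: property id -> default data type.
FALLBACK_TYPES = {"1000": PTYP_STRING}

# Assumed stream name layout: "__substg1.0_" + 4 hex digits (id) + 4 hex digits (type).
STREAM_NAME_RE = re.compile(r"^__substg1\.0_([0-9A-Fa-f]{4})([0-9A-Fa-f]{4})")


def resolve_property(directory_entry_name):
    """Return (property_id, data_type), preferring the type in the stream name."""
    match = STREAM_NAME_RE.match(directory_entry_name)
    if not match:
        return None, None
    prop_id = match.group(1).upper()
    data_type = match.group(2).upper()
    if data_type not in KNOWN_STRING_TYPES:
        # Last resort: the type in the name is not recognized, so use the map.
        data_type = FALLBACK_TYPES.get(prop_id, data_type)
    return prop_id, data_type


def decode_string(raw, data_type, codepage="cp1252"):
    """Decode a string stream; the cp1252 default is an assumption."""
    if data_type == PTYP_STRING:
        return raw.decode("utf-16-le")
    if data_type == PTYP_STRING8:
        return raw.decode(codepage)
    raise ValueError("unsupported data type: " + data_type)


prop_id, data_type = resolve_property("__substg1.0_1000001E")
print(prop_id, data_type, decode_string(b"Plain body", data_type))
```

With this ordering, the body stream keeps the 001E type found in its own name, and the map is consulted only when the name gives no usable type.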
For more information, this is how I am reading the output:
import email
from io import StringIO
from typing import Optional

from msg_parser import MsOxMessage
from msg_parser.email_builder import EmailFormatter


def get_email_text(msg) -> Optional[str]:
    # Return the decoded payload of the last text/plain part, or None if there is none.
    text = None
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            text = part.get_payload(decode=True).decode('utf-8')
    return text


textfile = "<path_to_msg_file>"  # placeholder: path to the .msg file
msg_obj = MsOxMessage(textfile)
email_obj = EmailFormatter(msg_obj)
eml_content = email_obj.build_email()
text = get_email_text(email.message_from_file(StringIO(eml_content)))