Open
Description
The integration of sanexml with Pymaven and SCTK is failing.
Here is the log.
I think a lot of these are failing because the XML files are not well-formed and I didn't implement any logic corresponding to the recover=True option of the XMLParser, which parses XML leniently.
I see two possibilities to solve this issue:
- Use BeautifulSoup4 with its html.parser for lenient parsing to get a correct XML string, and then parse that string normally using Python's default XML parsers. (BS4 Docs)
- Create a custom parser :(
If we go with the first approach, we can either use the process html.parser -> get correct xml -> DefaultXMLParser as the default, or we can do:
try: DefaultXMLParser
except:
    try: html.parser -> get correct xml -> DefaultXMLParser  # Slower #Lenient
    except:
        try: html5lib -> get correct xml -> DefaultXMLParser  # Slowest #MoreLenient
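To make the fallback chain above concrete, here is a rough Python sketch. It is not how sanexml is implemented: `parse_with_fallbacks` and `bs4_repairer` are hypothetical names, and `xml.etree.ElementTree` stands in for "DefaultXMLParser". It assumes beautifulsoup4 (and html5lib for the last step) are installed, but imports them only when the strict parse has already failed.

```python
import xml.etree.ElementTree as ET


def bs4_repairer(builder):
    """Return a function that re-serializes markup through BeautifulSoup.

    `builder` is a BeautifulSoup tree builder name such as "html.parser"
    or "html5lib". bs4 is imported lazily so it is only required on the
    slow path.
    """
    def repair(text):
        from bs4 import BeautifulSoup  # third-party: beautifulsoup4
        return str(BeautifulSoup(text, builder))
    return repair


def parse_with_fallbacks(text, repairers=None):
    """Try a strict parse first, then each repairer in order (hypothetical helper)."""
    if repairers is None:
        # Ordered from faster/stricter to slower/more lenient.
        repairers = [bs4_repairer("html.parser"), bs4_repairer("html5lib")]
    try:
        return ET.fromstring(text)  # fast path: well-formed XML
    except ET.ParseError as error:
        last_error = error
    for repair in repairers:
        try:
            return ET.fromstring(repair(text))
        except Exception as error:  # repair or re-parse failed; try the next one
            last_error = error
    raise last_error
```

One caveat with this approach: Python's html.parser lowercases tag names, so case-sensitive XML such as a Maven POM (`groupId`, `artifactId`) would come back altered on the fallback path. The strict-first ordering at least keeps well-formed files untouched.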
@JonoYang @pombredanne @AyanSinhaMahapatra please share your opinions on this.