Integration of sanexml fails. #3483

Open
@35C4n0r

Description

The integration of sanexml with Pymaven and SCTK is failing.
Here is the log.
I think a lot of these are failing because the XML files are not well-formed and I didn't implement any logic corresponding to recover=True for the XMLParser, which parses XML leniently.
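
To separate the failures caused by malformed input from other integration bugs, a quick check with a strict parser may help. This is only a sketch: it uses Python's standard-library xml.etree.ElementTree as the strict parser, and `is_well_formed` and the sample strings are hypothetical names for illustration, not part of sanexml.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


good = "<project><name>demo</name></project>"
bad = "<project><name>demo</project>"  # <name> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```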

I see two possibilities to solve this issue:

  • Use BeautifulSoup4 with its html.parser for lenient parsing to get a well-formed XML string, then parse that string normally with Python's default XML parsers. (BS4 Docs)
  • Create a custom parser :(

If we go with the first approach, we can either use the pipeline html.parser -> get correct xml -> DefaultXMLParser as the default, or we can do:

try: DefaultXMLParser
except:
    try: html.parser -> get correct xml -> DefaultXMLParser  # slower, lenient
    except:
        try: html5lib -> get correct xml -> DefaultXMLParser  # slowest, more lenient

@JonoYang @pombredanne @AyanSinhaMahapatra please share your opinions on this.
