Integration of sanexml fails.

The integration of [sanexml](https://github.com/nexB/sanexml) with Pymaven and SCTK is failing.
Here is the [log](https://gist.github.com/35C4n0r/c44ec67e80e7bb276847bdb9809d1172).
I think a lot of these are failing because the XML files are not well-formed and I didn't implement any logic corresponding to `recover=True` for the XMLParser. That option makes the parser handle XML leniently instead of rejecting malformed input.

I see two possibilities to solve this issue:

- Use `BeautifulSoup4` with its `html.parser` for lenient parsing to get a correct XML string, and then parse that string normally using Python's default XML parsers. (BS4 [Docs](https://www.crummy.com/software/BeautifulSoup/bs4/doc/))
- Create a custom parser :(


If we go with the first approach, we can either use the process `html.parser -> get correct xml -> DefaultXMLParser` as the default, or we can fall back step by step:

- try: `DefaultXMLParser`
- except, try: `html.parser` -> get correct xml -> `DefaultXMLParser` (slower, lenient)
- except, try: `html5lib` -> get correct xml -> `DefaultXMLParser` (slowest, more lenient)
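
The fallback chain above could look roughly like the sketch below. It is only an illustration under some assumptions: `parse_xml_leniently` and `repair_with_bs4` are hypothetical names, the standard library's `xml.etree.ElementTree` stands in for `DefaultXMLParser`, and `bs4` (plus `html5lib` for the last step) is assumed to be installed.

```python
import xml.etree.ElementTree as ET


def repair_with_bs4(text, builder):
    """Re-serialize `text` through a lenient BeautifulSoup tree builder."""
    # Imported lazily so well-formed input never needs bs4 at all.
    from bs4 import BeautifulSoup

    return str(BeautifulSoup(text, builder))


def parse_xml_leniently(text, builders=("html.parser", "html5lib")):
    """Hypothetical helper: strict parse first, then increasingly lenient repairs."""
    try:
        # Fast path: the strict parser handles well-formed XML.
        return ET.fromstring(text)
    except ET.ParseError as error:
        # Keep the strict error; it is re-raised if every repair fails.
        first_error = error

    for builder in builders:
        try:
            # Slower path: repair the markup, then parse the result strictly.
            return ET.fromstring(repair_with_bs4(text, builder))
        except Exception:
            # Missing library or still-unparseable output: try the next builder.
            continue

    raise first_error
```

One caveat worth checking before choosing this route: HTML tree builders lowercase tag names, and `html5lib` wraps the content in `html`/`head`/`body` elements, so the repaired tree will not match the original XML exactly (for example, `artifactId` would come back as `artifactid`).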
                     
@JonoYang @pombredanne @AyanSinhaMahapatra please share your opinions on this.


Integration of sanexml fails. #3483
