mediawiki reader: improve strong/emph conformance #10766
                
Draft (+193 −10)
    
cf. #10761 and #3044.
I made some progress on this today without completely blowing up the existing strong and emph parsers, but weird edge cases remain. For example, consider `''foo''''bar''`. Pandoc today gives you `Emph [ Str "foo" , Str "bar" ]`, which has an obvious appeal. My work in progress gives `Emph [ Str "foo''" ] , Str "bar''"`, which is odder but defensible given other requirements for emphasized quote marks. The actual correct answer, according to MediaWiki, is `Emph [ Str "foo'" , Strong [ Str "bar" ] ]`, i.e. *foo'**bar***, which is basically a koan.

Parsoid has a lot of code just for processing quotes, presumably aiming to maintain bug-for-bug compatibility with whatever MediaWiki's first parser did. So what a string of single quotes means varies depending on what comes after it in the line, in a more context-sensitive way than I expected.
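To make the koan a bit less mysterious, here is a rough sketch of only the first step of the quote handling as I understand it from MediaWiki's legacy parser: a run of four apostrophes is read as one literal apostrophe followed by bold markup, and a run longer than five keeps the last five as bold-italic markup and treats the rest as text. The names `Tok`, `classifyRun`, and `tokenize` are made up for this illustration; they are not part of pandoc or Parsoid, and the later balancing pass (which depends on the rest of the line) is deliberately left out.

```haskell
import Data.List (groupBy)

-- | Hypothetical token type for this sketch; not pandoc's or Parsoid's.
data Tok
  = Text String   -- literal text, including leftover apostrophes
  | Italic        -- a run read as ''
  | Bold          -- a run read as '''
  | BoldItalic    -- a run read as '''''
  deriving (Eq, Show)

-- | Classify a run of n >= 2 apostrophes (assumed rule, see lead-in).
classifyRun :: Int -> [Tok]
classifyRun n
  | n == 2    = [Italic]
  | n == 3    = [Bold]
  | n == 4    = [Text "'", Bold]          -- first apostrophe is literal
  | n == 5    = [BoldItalic]
  | otherwise = [Text (replicate (n - 5) '\''), BoldItalic]

-- | Split a line into apostrophe runs and other text, then classify.
-- A lone apostrophe is ordinary text.
tokenize :: String -> [Tok]
tokenize = concatMap step . groupBy sameKind
  where
    sameKind a b = (a == '\'') == (b == '\'')
    step s@('\'' : _ : _) = classifyRun (length s)
    step s                = [Text s]
```

With this, `tokenize "''foo''''bar''"` yields `[Italic, Text "foo", Text "'", Bold, Text "bar", Italic]`: italics open, a literal apostrophe lands after "foo", bold opens, and the final run closes the italics while the bold is left to be closed at end of line. That is consistent with the `Emph [ Str "foo'" , Strong [ Str "bar" ] ]` result above, although the real algorithm has a further pass for unbalanced counts that this sketch does not model.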
Would it be better to merge code that makes us more conformant with MediaWiki for some cases and "wrong in a different way" for others, or to try to reach perfection?