Parsing bytes doesn't work

```python
In [1]: from tatsu import compile
   ...: 
   ...: parser = compile(r"""
   ...:     start       = text ;
   ...:     text        = { sentence }* ;
   ...:     sentence    = { word }+ { punctuation }* ;
   ...:     word        = /\w+/ ;
   ...:     punctuation = /[?!.,;\"']/ ;
   ...: """)
   ...: 

In [2]: parser.parse("Hello world!")
Out[2]: [(['Hello', 'world'], ['!'])]

In [3]: parser.parse(b"Hello world!")
Out[3]: [(['b'], ["'"]), (['Hello', 'world'], ['!', "'"])]
```

I hoped to parse `bytes` instead of `str` because I am trying to use TatSu to parse a file format that uses ASCII for its data structure but can also contain strings in other encodings. The encoding is specified in the content of the file, so I can only `bytes.decode()` those parts after parsing which encoding is used.
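One possible workaround for this kind of format, sketched here with the standard library only: decode the whole input as `latin-1`, which maps every byte to the code point with the same value, parse the resulting `str`, and then turn the relevant pieces back into bytes before decoding them with the declared encoding. The file layout (`encoding=...;` header followed by a payload) is made up for illustration, and `str.partition` stands in for the real parse step.

```python
# Hypothetical file content: an ASCII header naming the encoding,
# followed by a payload stored in that encoding.
raw = b"encoding=utf-16-le;" + "Zoë".encode("utf-16-le")

# latin-1 maps each byte 0..255 to the code point with the same value,
# so this decode never fails and loses no information.
text = raw.decode("latin-1")

# Stand-in for the real parse step: split the ASCII structure from the payload.
header, _, payload = text.partition(";")
encoding = header.split("=", 1)[1]

# Recover the payload's original bytes, then decode with the declared encoding.
name = payload.encode("latin-1").decode(encoding)
print(encoding, name)
```

With this approach, patterns such as `\w` would also match the non-ASCII characters that high bytes decode to, so the grammar's regular expressions would probably need to be restricted to ASCII ranges.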

I kinda expected to get a `TypeError` or similar here. But the actual ~~error~~ **behaviour** is more interesting. It seems the TatSu parser casts the input to `str`, which for `bytes` yields the `bytes.__repr__()` return value instead of the contents.
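The effect of such a cast can be reproduced in plain Python. This only illustrates what `str()` does to a `bytes` object; whether TatSu calls `str()` internally in exactly this way is my guess, not something I have confirmed in its source.

```python
data = b"Hello world!"

# str() on bytes without an encoding argument returns the repr, not the contents.
as_text = str(data)
print(as_text)  # b'Hello world!'

# Decoding is what actually yields the contents.
decoded = data.decode("ascii")
print(decoded)  # Hello world!
```

That would match the output above: the leading `b` is parsed as a word and the two quote characters as punctuation.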

Would it be possible to accept `bytes` as input for parsing? If not, I guess there should be some kind of type check on the input instead of a blind cast to `str`, which leads to parsing the Python representation of an incompatible type instead of the data itself.
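As a rough sketch of the kind of check I mean, done on the caller's side: `parse_text` is a hypothetical helper, not part of TatSu, and it only assumes that the parser object has a `parse()` method like the one used above.

```python
def parse_text(parser, text, **kwargs):
    """Hypothetical wrapper: reject anything that is not str before parsing."""
    if not isinstance(text, str):
        raise TypeError(
            f"expected str, got {type(text).__name__}; decode bytes first"
        )
    # Delegate to the parser's own parse() only once the type is known to be str.
    return parser.parse(text, **kwargs)
```

With this, `parse_text(parser, b"Hello world!")` would raise `TypeError` instead of silently parsing `b'Hello world!'`.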



Parsing bytes doesn't work #346
