Exception when clean=True in search_for_connected_sentences

**Describe the bug**
Segmenter raises "exception: bad escape (end of pattern) at position" when it is initialized with clean=True and encounters text like "etc.Png,Jpg,.\\" (a word/token that contains a backslash).

The exception is raised in:
module: 
```cleaner.py```
class: 
```class Cleaner```
method name: 
```search_for_connected_sentences```
line:
```
txt = re.sub(re.escape(word), new_word, txt)
```
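
A likely cause can be shown with the standard library alone: `re.escape` protects the pattern, but the replacement argument of `re.sub` is itself parsed as a template in which a backslash starts an escape. In the sketch below, the value of `new_word` and the helper `try_sub` are assumptions made for illustration, not taken from pysbd's code.

```python
import re

# The token from the report: it ends with a single backslash.
word = "etc.Png,Jpg,.\\"
# Assumed cleaner output (hypothetical): a space inserted after "etc."
new_word = "etc. Png,Jpg,.\\"

def try_sub(pattern, repl, text):
    """Run re.sub and return either the result or the error message."""
    try:
        return re.sub(pattern, repl, text)
    except re.error as exc:
        return f"re.error: {exc}"

# The pattern is escaped, so it matches the literal word, but the
# replacement template ends in a lone backslash and is rejected.
result = try_sub(re.escape(word), new_word, word)
print(result)
```

If this assumption holds, escaping `word` cannot help, because the failing argument is the replacement string.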

**To Reproduce**
Steps to reproduce the behavior:
```
import pysbd

# This is a simplified example; the original text contained names, so I changed them to image formats.
# The word is an abbreviation with a dot, followed by an upper-case letter and a backslash.
sentencer = pysbd.Segmenter(language="en", clean=True)
txt = "etc.Png,Jpg,.\\"
sentences = sentencer.segment(txt)
```

**Expected behavior**
The output should be the same as it is now, but it should not throw an exception.
A workaround to see the output is to escape the backslash:
```
sentencer = pysbd.Segmenter(language="en", clean=True)
txt = "etc.Png,Jpg,.\\\\"
sentences = sentencer.segment(txt)
```
Expected output:
```
['etc.', 'Png,Jpg,.', '\\']
```

**Possible solution**
Replace ```txt = re.sub(re.escape(word), new_word, txt)```
with ```txt = txt.replace(word, new_word)```.
This avoids the pitfalls of regular expressions (such as escaping) and is generally faster.
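
To illustrate the proposal, here is a minimal sketch outside pysbd. The helper names `replace_literal` and `replace_via_regex` are hypothetical, and `new_word` is an assumed value. The second helper shows an alternative that keeps `re.sub` by passing a function, whose return value is inserted literally instead of being parsed as a template.

```python
import re

def replace_literal(txt, word, new_word):
    # str.replace treats both arguments as plain text.
    return txt.replace(word, new_word)

def replace_via_regex(txt, word, new_word):
    # A callable replacement is used verbatim, so backslashes in
    # new_word are not interpreted as escapes.
    return re.sub(re.escape(word), lambda _match: new_word, txt)

word = "etc.Png,Jpg,.\\"
new_word = "etc. Png,Jpg,.\\"  # assumed cleaner output, for illustration
txt = "See " + word

print(replace_literal(txt, word, new_word))
print(replace_via_regex(txt, word, new_word))
```

Both helpers replace every occurrence of `word`, as the original `re.sub` call does.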

**Additional context**
Originally, we parse small text files (in Slovak) without any special treatment to build a huge corpus of segmented sentences. The example was crafted specifically to reproduce the behavior with the English parser. I know that this backslash combination is rare in English, but it does occur in Slovak articles when you process vast amounts of text.

