Description
Describe the bug
Hi team, thanks for your great work! I think I found a small bug that could lead to a denial of service (DoS) in the system.
At line 77 in response_cleaner.py
the regex r"\*\*Code Summary:\*\*\s*(.*?)\s*provides functions to", is vulnerable to ReDoS when it is used in
text = re.sub(
How to Reproduce
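I have a test file that simulates the LLM's response:
import re
import time

# Same pattern as in response_cleaner.py
_regex = re.compile(r'\*\*Code Summary:\*\*\s*(.*?)\s*provides functions to')

# 8001 so that i = 8000 is included, matching the results below
for i in range(0, 8001, 2000):
    # The prefix followed by i tabs, with no "provides functions to", so the match must fail
    attack_string = "**Code Summary:**" + "\t" * i
    start_time = time.time()
    match = _regex.match(attack_string)
    end_time = time.time()
    print(f"i: {i}, Time taken: {end_time - start_time} seconds")

The result is like this: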
i: 0, Time taken: 0.0014803409576416016 seconds
i: 2000, Time taken: 10.000782012939453 seconds
i: 4000, Time taken: 73.86766386032104 seconds
i: 6000, Time taken: 231.146071434021 seconds
i: 8000, Time taken: 547.5873472690582 seconds
As we can see, a string of around 6,000 characters makes the call hang for almost 4 minutes, and the running time grows roughly cubically with the string length (going from 2,000 to 4,000 characters takes about 7 times longer).
If readme-ai is used in a server setup (e.g. readme-ai.streamlit), this bug may lead to high CPU usage or DoS risks if users submit malicious or resource-intensive repositories.
Expected behavior
I think we could add a limit, for example by replacing .*? with .{0,200}?. That may help to contain the backtracking problem.
The core of the problem lies within \s*(.*?)\s*. The three adjacent quantifiers \s*, .*? and \s* can all match the same run of whitespace, so when the trailing literal "provides functions to" is missing, the engine backtracks through every way of splitting that run among them.
I tested a modification of the regex that bounds the first \s*, and the performance improved significantly, although 8,000 characters still take about 40 seconds:
# After changing it to r'\*\*Code Summary:\*\*\s{0,200}(.*?)\s*provides functions to'
i: 100, Time taken: 0.001453399658203125 seconds
i: 1100, Time taken: 0.6828515529632568 seconds
i: 2100, Time taken: 2.861234188079834 seconds
i: 3100, Time taken: 5.831449031829834 seconds
i: 4100, Time taken: 10.390498399734497 seconds
i: 6000, Time taken: 23.510437726974487 seconds
i: 8000, Time taken: 40.42485165596008 seconds
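As a possible alternative to bounding the quantifiers, the same text could be located with plain string searches, which run in linear time and do not backtrack. The following is only a minimal sketch: the function name extract_summary_subject is hypothetical and not part of readme-ai, and the code only approximates the regex (for example, it does not reproduce the rule that . cannot match a newline). How the result would be plugged into the existing re.sub call is left open.

```python
from typing import Optional

PREFIX = "**Code Summary:**"
SUFFIX = "provides functions to"


def extract_summary_subject(text: str) -> Optional[str]:
    """Return the text between PREFIX and SUFFIX, or None if either is missing.

    Hypothetical helper: approximates group 1 of the original regex
    using str.find, so there is no regex backtracking at all.
    """
    start = text.find(PREFIX)
    if start == -1:
        return None
    body_start = start + len(PREFIX)
    # Search for the suffix only after the prefix, mirroring the regex order.
    end = text.find(SUFFIX, body_start)
    if end == -1:
        return None
    # strip() plays the role of the two \s* parts around the capture group.
    return text[body_start:end].strip()


if __name__ == "__main__":
    print(extract_summary_subject("**Code Summary:** The module provides functions to parse files"))
    # The attack string from the report returns None without hanging.
    print(extract_summary_subject("**Code Summary:**" + "\t" * 8000))
```

With this approach the attack string from the report is rejected after two linear scans, because the missing suffix is detected by a single str.find call.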