Crawler example RuntimeError: maximum recursion depth exceeded #179

@ilyaglow

Description

Hey there,

I'm trying to run MRQ version 0.2.1 from PyPI in a completely distributed manner, with a little docker-compose setup I made using the crawler example provided here (hope my work helps someone!), and I'm a bit stuck on a couple of things.

Somehow, after ~55 jobs, jobs start to fail, and the gigantic traceback says the maximum recursion depth was exceeded. It seems related to mrq/monkey.py and its _mrq_patched_method:

worker_1     | 2017-08-15 18:33:50.156263 [DEBUG] Starting crawler.Fetch({u'url': u'http://docs.python-requests.org/en/latest/user/quickstart/', u'from': u'http://docs.python-requests.org/en/latest/'})
worker_1     | Monkey-patching MongoDB methods...
worker_1     | 2017-08-15 18:33:50.160419 [ERROR] Job failed
worker_1     | 2017-08-15 18:33:50.164313 [ERROR] Traceback (most recent call last):
worker_1     |   File "/usr/lib/python2.7/site-packages/mrq/worker.py", line 632, in perform_job
worker_1     |     job.perform()
worker_1     |   File "/usr/lib/python2.7/site-packages/mrq/job.py", line 304, in perform
worker_1     |     result = self.task.run_wrapped(self.data["params"])
worker_1     |   File "/usr/lib/python2.7/site-packages/mrq/task.py", line 19, in run_wrapped
worker_1     |     return self.run(params)
worker_1     |   File "/app/crawler.py", line 21, in run
worker_1     |     response = requests.get(params["url"])
worker_1     |   File "/usr/lib/python2.7/site-packages/requests/api.py", line 72, in get
worker_1     |     return request('get', url, params=params, **kwargs)
worker_1     |   File "/usr/lib/python2.7/site-packages/requests/api.py", line 58, in request
worker_1     |     return session.request(method=method, url=url, **kwargs)
worker_1     |   File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 508, in request
worker_1     |     resp = self.send(prep, **send_kwargs)
worker_1     |   File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 618, in send
worker_1     |     r = adapter.send(request, **kwargs)
worker_1     |   File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 440, in send
worker_1     |     timeout=timeout
worker_1     |   File "/usr/lib/python2.7/site-packages/urllib3/connectionpool.py", line 601, in urlopen
worker_1     |     chunked=chunked)
worker_1     |   File "/usr/lib/python2.7/site-packages/urllib3/connectionpool.py", line 357, in _make_request
worker_1     |     conn.request(method, url, **httplib_request_kw)
worker_1     |   File "/usr/lib/python2.7/site-packages/mrq/monkey.py", line 16, in _mrq_patched_method
worker_1     |     return method(old_method, *args, **kwargs)
...snip...
worker_1     |   File "/usr/lib/python2.7/site-packages/mrq/monkey.py", line 16, in _mrq_patched_method
worker_1     |     return method(old_method, *args, **kwargs)
worker_1     |   File "/usr/lib/python2.7/site-packages/mrq/monkey.py", line 328, in request
worker_1     |     res = old_method(self, method, url, body=body, headers=headers)
worker_1     |   File "/usr/lib/python2.7/httplib.py", line 1042, in request
worker_1     |     self._send_request(method, url, body, headers)
worker_1     |   File "/usr/lib/python2.7/httplib.py", line 1082, in _send_request
worker_1     |     self.endheaders(body)
worker_1     |   File "/usr/lib/python2.7/httplib.py", line 1038, in endheaders
worker_1     |     self._send_output(message_body)
worker_1     |   File "/usr/lib/python2.7/httplib.py", line 882, in _send_output
worker_1     |     self.send(msg)
worker_1     |   File "/usr/lib/python2.7/httplib.py", line 844, in send
worker_1     |     self.connect()
worker_1     |   File "/usr/lib/python2.7/site-packages/mrq/monkey.py", line 16, in _mrq_patched_method
worker_1     |     return method(old_method, *args, **kwargs)
...snip...
worker_1     | RuntimeError: maximum recursion depth exceeded
worker_1     | Traceback (most recent call last):
worker_1     |   File "/usr/lib/python2.7/site-packages/gevent/greenlet.py", line 536, in run
worker_1     |     result = self._run(*self.args, **self.kwargs)
worker_1     |   File "/usr/lib/python2.7/site-packages/mrq/worker.py", line 242, in greenlet_paused_queues
worker_1     |     time.sleep(self.config["paused_queues_refresh_interval"])
worker_1     | KeyError: 'paused_queues_refresh_interval'
worker_1     | Tue Aug 15 18:33:52 2017 <Greenlet at 0x7f04cd3aba50: <bound method Worker.greenlet_paused_queues of <mrq.worker.Worker object at 0x7f04cdba9190>>> failed with KeyError
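My rough understanding of the recursion, as a minimal sketch (this is not MRQ's actual code, just the mechanism I suspect): "Monkey-patching MongoDB methods..." is printed in the middle of a job above, so it looks like the patch in mrq/monkey.py can be applied more than once. If each application wraps the already-patched method, every call has to unwind the whole chain of wrappers, and after enough jobs the chain gets deeper than sys.getrecursionlimit():

# A minimal sketch of the mechanism I suspect (NOT MRQ's actual code):
# each application of the patch closes over the previous, possibly
# already-patched, method, so one call must unwind every wrapper layer.

def patch_method(klass, name, wrapper):
    old_method = getattr(klass, name)  # may already be a wrapper

    def _mrq_patched_method(*args, **kwargs):
        return wrapper(old_method, *args, **kwargs)

    setattr(klass, name, _mrq_patched_method)

class Conn(object):
    def request(self):
        return "real request"

def traced(old_method, *args, **kwargs):
    # stand-in for whatever tracing MRQ's wrapper actually does
    return old_method(*args, **kwargs)

# Re-patching once per "job" without checking for an existing patch
# stacks wrapper upon wrapper:
for _ in range(2000):
    patch_method(Conn, "request", traced)

Conn().request()  # RuntimeError: maximum recursion depth exceeded
                  # (RecursionError on Python 3)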

In addition, as you can see, I also hit KeyError: 'paused_queues_refresh_interval', which I don't know how to set properly.

I tried both a config file and an environment variable. To be honest, things are a little messy here: I can set most of the variables through the environment, but some of them, for example DEQUEUE_STRATEGY and a few others, can only be set in mrq-config.py.
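For reference, this is the kind of thing I tried in mrq-config.py. The uppercase variable names follow MRQ's example config; PAUSED_QUEUES_REFRESH_INTERVAL is only my guess at the setting behind the missing 'paused_queues_refresh_interval' key, since I couldn't find it documented:

# mrq-config.py -- settings are plain uppercase Python variables
DEQUEUE_STRATEGY = "sequential"  # settable here, but apparently not via env

# Guessed name and units (seconds) for the missing
# 'paused_queues_refresh_interval' config key:
PAUSED_QUEUES_REFRESH_INTERVAL = 10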

Probably I did something completely wrong or stupid here; please point it out if so.
