
File system storage client performance optimizations #1246

@vdusek

Description

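With the file system storage client, Crawlee for Python is significantly slower than Crawlee for JavaScript. In the benchmark below it takes ~13.6s versus ~3s (roughly 4.5× slower), and about 9× longer than Crawlee for Python's own memory storage client.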

Time to process 1,000 requests against a local HTTP server (Pydantic models loaded in advance):

  • Crawlee TS - memory: ~1.8s
  • Crawlee TS - FS: ~3s
  • Scrapy - memory*: 1.9s
  • Crawlee Py - new memory: ~1.5s
  • Crawlee Py - new FS: ~13.6s
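Per-write filesystem overhead is one plausible contributor to the FS gap. The following is a minimal stdlib-only micro-benchmark sketch, not Crawlee code. It compares a plain write with an atomic temp-file-plus-`os.replace` write for 1,000 small JSON files. It assumes one file per request, which is an illustrative assumption about the storage layout. The helper names `plain_write`, `atomic_write`, and `bench` are hypothetical. Absolute numbers will vary by OS and filesystem.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def plain_write(path: Path, data: str) -> None:
    # A single open/write/close. Fast, but a crash mid-write can leave a truncated file.
    path.write_text(data, encoding="utf-8")


def atomic_write(path: Path, data: str) -> None:
    # Hypothetical helper: write to a temp file in the same directory, then swap it in.
    # Each logical write costs extra FS operations (create temp, write, close, replace).
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(data)
        os.replace(tmp_name, path)  # the rename is atomic on POSIX filesystems
    except BaseException:
        # Remove the leftover temp file if anything failed before the replace.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def bench(writer, directory: Path, n: int = 1000) -> float:
    # Write n small JSON records, one file per record, and return elapsed seconds.
    payload = {"url": "http://localhost:8080/", "handled": False}
    start = time.perf_counter()
    for i in range(n):
        writer(directory / f"{i}.json", json.dumps({**payload, "id": i}))
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name, writer in [("plain", plain_write), ("atomic", atomic_write)]:
            sub = Path(d) / name
            sub.mkdir()
            print(f"{name}: {bench(writer, sub):.3f}s")
```

Adding an `os.fsync` before the replace would make the atomic variant durable as well, at a further cost per write. Whether a storage client needs that is a separate trade-off.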

Goal: optimize the file system storage client to close this gap.

Ideas:

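  • Atomic writes: each logical write currently costs several FS operations (an atomic write typically means writing a temp file and then renaming it).
  • Index file: track the handled / sequence / forefront state of requests in a single index file.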
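The index-file idea can be sketched roughly as follows. This is a hypothetical design, not Crawlee's implementation. `RequestIndex` and `IndexEntry` are made-up names. It assumes request bodies live elsewhere (e.g. one file per request). A single JSON file holds each request's handled flag, sequence number, and forefront flag, so choosing the next request does not require reading every request file. The ordering is also an assumption: forefront requests come first (newest first), then the rest in FIFO order.

```python
from __future__ import annotations

import json
import os
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class IndexEntry:
    handled: bool
    sequence: int  # insertion order, assigned by the index
    forefront: bool


class RequestIndex:
    """Hypothetical request-state index persisted to one JSON file."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.entries: dict[str, IndexEntry] = {}
        self.next_sequence = 0
        if path.exists():
            raw = json.loads(path.read_text(encoding="utf-8"))
            self.entries = {k: IndexEntry(**v) for k, v in raw["entries"].items()}
            self.next_sequence = raw["next_sequence"]

    def add(self, request_id: str, forefront: bool = False) -> None:
        if request_id in self.entries:
            return  # already known: keep its original position
        self.entries[request_id] = IndexEntry(False, self.next_sequence, forefront)
        self.next_sequence += 1

    def mark_handled(self, request_id: str) -> None:
        self.entries[request_id].handled = True

    def next_pending(self) -> str | None:
        # Assumed ordering: forefront entries first (highest sequence first),
        # then regular entries (lowest sequence first).
        def order(item: tuple[str, IndexEntry]) -> tuple[bool, int]:
            entry = item[1]
            return (not entry.forefront, -entry.sequence if entry.forefront else entry.sequence)

        pending = [(k, e) for k, e in self.entries.items() if not e.handled]
        return min(pending, key=order)[0] if pending else None

    def flush(self) -> None:
        # One atomic write for the whole index instead of one write per request.
        data = json.dumps({
            "next_sequence": self.next_sequence,
            "entries": {k: asdict(e) for k, e in self.entries.items()},
        })
        fd, tmp = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(data)
        os.replace(tmp, self.path)
```

The trade-off is that state changes are batched in memory and only persisted on `flush()`. How often to flush (per request, on a timer, or on shutdown) decides how much progress could be lost on a crash.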

Metadata

Labels

  • enhancement: New feature or request.
  • t-tooling: Issues with this label are in the ownership of the tooling team.