Open
Description
Currently all pages are stored in memory (each resource's content is stored in Resource.text), which causes high memory consumption.
It would be nice to avoid storing Resource.text and save resources directly to the FS right after they are received.
We can probably use streams for that:
- for html, css:
  Request -> update links/images/styles/etc. -> saveResource
- all other types:
  Request -> saveResource, when content modification is not needed
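A rough sketch of the two flows above using Node.js streams. The names `createRewriteStream`, `saveResourceStream` and `demo` are hypothetical and not part of the current code base, and the rewrite step here is a plain string replacement standing in for the real link-updating logic. The rewrite stream buffers one text resource at a time so that a URL split across chunks is still matched; binary resources go straight to disk.

```javascript
const { Readable, Transform } = require('node:stream');
const { pipeline } = require('node:stream/promises');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: collects a single text resource and rewrites it once
// the whole body has arrived. Only this one resource is held in memory.
function createRewriteStream(rewrite) {
  const chunks = [];
  return new Transform({
    transform(chunk, _encoding, callback) {
      chunks.push(chunk); // string chunks arrive here already converted to Buffers
      callback();
    },
    flush(callback) {
      const text = Buffer.concat(chunks).toString('utf8');
      this.push(rewrite(text));
      callback();
    },
  });
}

// Hypothetical save step: Request -> (optional rewrite) -> file on disk.
async function saveResourceStream(body, filePath, { isText, rewrite }) {
  const stages = [body];
  if (isText) {
    stages.push(createRewriteStream(rewrite));
  }
  stages.push(fs.createWriteStream(filePath));
  await pipeline(...stages);
}

// Small demo with in-memory "responses" instead of real HTTP requests.
async function demo() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'scrape-'));
  const htmlPath = path.join(dir, 'index.html');
  const imgPath = path.join(dir, 'logo.png');

  // html: the URL is deliberately split across two chunks.
  await saveResourceStream(
    Readable.from(['<img src="http://exa', 'mple.com/logo.png">']),
    htmlPath,
    {
      isText: true,
      rewrite: (text) => text.split('http://example.com/logo.png').join('logo.png'),
    }
  );

  // other types: no modification needed, bytes are piped directly to the file.
  await saveResourceStream(
    Readable.from([Buffer.from([0x89, 0x50, 0x4e, 0x47])]),
    imgPath,
    { isText: false }
  );

  return { htmlPath, imgPath };
}
```

With this shape, memory use is bounded by the largest single html/css resource rather than by the sum of all resources.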
To do:
- Update Resource class - get rid of text property and related functionality. Probably store reference to stream for resource
- Update scraper mechanism: rework request/save functionality in scraper - replace requestQueue property with streamsQueue, replace requestedResourcePromises with requestResourceStreams or remove it, use streams instead of promises in request file
- Check and update all actions that use Resource class objects - at least afterResponse, saveResource
- Measure memory consumption of current implementation and streams implementation
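For the last item, one possible way to compare the two implementations is to sample `process.memoryUsage().rss` while a scrape runs and record the peak. This is only a sketch: `measurePeakRss` is a hypothetical helper, and the task passed to it would be whatever runs the scraper in each variant.

```javascript
// Samples the resident set size while `task` runs and returns the peak in bytes.
async function measurePeakRss(task, intervalMs = 10) {
  let peak = process.memoryUsage().rss;
  const timer = setInterval(() => {
    peak = Math.max(peak, process.memoryUsage().rss);
  }, intervalMs);
  try {
    await task();
  } finally {
    clearInterval(timer); // stop sampling even if the task throws
  }
  // Take one final sample in case the task finished between ticks.
  return Math.max(peak, process.memoryUsage().rss);
}
```

Running each implementation in a fresh process against the same site would keep the numbers comparable, since memory already allocated by one run can skew the next.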
Questions:
- How to handle links to pages which are not downloaded yet? Can we set the reference in the parent before the child is loaded? (see getReference action)