Description
Describe the bug
I have a series of large dfsu files. To keep memory requirements low, I'm reading them bit by bit using the time slicing functionality like so:
from itertools import batched  # Python 3.12+

import mikeio

def get_data_for_each_timestep(file: str, item: str, time_batch_n: int = 1):
    file_info = mikeio.open(file)
    times = file_info.time
    time_batches = list(batched(times, time_batch_n))
    for time_batch in time_batches:
        # read only the time steps in this batch
        ds = mikeio.read(file, items=[item], time=time_batch)
This is a very stripped-down version of the code, and nothing is done with the data. Even so, memory usage keeps growing, easily reaching 10+ GB after just a few iterations. The problem is worse when fewer time steps are read per call, which suggests a per-call overhead in mikeio.read(). Calling gc.collect() helps, but doesn't completely solve the problem.
However, running the data fetch in a separate process solves the problem. Here, the memory associated with that process is freed when the process exits:
import multiprocessing
from itertools import batched  # Python 3.12+
from typing import Any

import mikeio

def get_ds_multiprocess(file: str, item: str, time_batch: list[Any], return_dict: dict):
    # runs in a child process; results are passed back through return_dict
    ds = mikeio.read(file, items=[item], time=time_batch)
    da = ds[item]
    return_dict['time'] = ds.time.copy() # not sure if copy() is necessary
    return_dict['data'] = da.values.copy() # not sure if copy() is necessary

def get_data_for_each_timestep(file: str, item: str, time_batch_n: int = 1):
    file_info = mikeio.open(file)
    times = file_info.time
    time_batches = list(batched(times, time_batch_n))
    for time_batch in time_batches:
        manager = multiprocessing.Manager()
        return_dict = manager.dict()
        p = multiprocessing.Process(target=get_ds_multiprocess, args=(file, item, time_batch, return_dict))
        p.start()
        p.join()  # child exits here, releasing its memory
        t = return_dict['time']
        d = return_dict['data']
Using this example, memory usage does not grow.
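For anyone who wants the same isolation with less boilerplate, here is a minimal sketch of the idea using a standard-library pool whose worker is replaced after every task. The names `iter_batches_in_fresh_processes` and `read_batch` are hypothetical, not part of MIKE IO; `read_batch` stands in for a module-level function that calls mikeio.read() and returns picklable results such as NumPy arrays.

```python
import multiprocessing
from typing import Any, Callable, Iterable, Iterator


def iter_batches_in_fresh_processes(
    read_batch: Callable[[Any], Any],
    batches: Iterable[Any],
    ctx=None,
) -> Iterator[Any]:
    """Yield read_batch(batch) for each batch, one worker process per batch.

    maxtasksperchild=1 makes the pool retire its worker after a single task,
    so memory held by that worker is returned to the OS before the next read.
    read_batch must be picklable (e.g. defined at module level).
    """
    ctx = ctx or multiprocessing.get_context()
    with ctx.Pool(processes=1, maxtasksperchild=1) as pool:
        for batch in batches:
            # apply() blocks until the worker has returned the result
            yield pool.apply(read_batch, (batch,))
```

On platforms that start processes with spawn (Windows, macOS), the calling code would need to sit under an `if __name__ == "__main__":` guard. This works around the growth rather than explaining it.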
To Reproduce
See above
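To put numbers on the growth, something like the following sketch could be wrapped around the read loop. It is an assumption about how one might measure this, not what was used to obtain the figures in this report; `traced_memory_per_batch`, `chunks` and `read_batch` are hypothetical names, and `read_batch` stands in for a function that calls mikeio.read() for one batch of time steps.

```python
import gc
import tracemalloc
from itertools import islice
from typing import Any, Callable, Iterable, Iterator


def chunks(items: Iterable[Any], n: int) -> Iterator[tuple]:
    """Yield tuples of up to n items (stand-in for itertools.batched)."""
    it = iter(items)
    while chunk := tuple(islice(it, n)):
        yield chunk


def traced_memory_per_batch(
    times: Iterable[Any],
    batch_size: int,
    read_batch: Callable[[tuple], Any],
) -> list[int]:
    """Return the traced memory (bytes) still allocated after each batch."""
    tracemalloc.start()
    usage = []
    try:
        for batch in chunks(times, batch_size):
            data = read_batch(batch)
            del data       # drop the only reference to the result
            gc.collect()   # collect any reference cycles left behind
            current, _peak = tracemalloc.get_traced_memory()
            usage.append(current)
    finally:
        tracemalloc.stop()
    return usage
```

Note that tracemalloc only sees allocations made through Python's allocators (NumPy reports to it as well). If the returned numbers stay flat while the process's resident memory keeps climbing, that would point at memory held outside Python objects.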
Expected behavior
To be able to read data iteratively from very large dfsu files without running out of memory.
System information:
- Python 3.12
- MIKE IO version 2.5.0