multiprocessing Pool and generators

First, look at the following code:

    pool = multiprocessing.Pool(processes=N)
    batch = []
    for item in generator():
        batch.append(item)
        if len(batch) == 10:
            pool.apply_async(my_fun, args=(batch,))
            batch = []
    # leftovers
    pool.apply_async(my_fun, args=(batch,))

Essentially I'm retrieving data from a generator, collecting it into a list, and then spawning a process that consumes the batch of data. This may look fine, but when the consumers (aka the pool processes) are slower than the producer (aka the generator), the memory usage of the main process grows until the generator stops or... the system runs out of memory. How can I avoid this problem?

Have you tried to build a list of lists and use pool.map_async()? Or maybe starmap_async? – wwii Jun 27 at 20:09
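
One way to bound the memory usage (a sketch of a possible fix, not necessarily the approach the post settles on) is to throttle the producer with a semaphore that gets released from apply_async's callback, so only a fixed number of batches is ever queued at once. The names generator, my_fun, N, and the batch size of 10 mirror the snippet above; the 2*N in-flight limit and the stand-in bodies of generator and my_fun are arbitrary choices made just to keep the example runnable.

    import multiprocessing
    import threading

    def generator():
        # Stand-in for the post's data source; yields many items cheaply.
        for i in range(1000):
            yield i

    def my_fun(batch):
        # Stand-in for the post's worker; here it just sums the batch.
        return sum(batch)

    if __name__ == "__main__":
        N = 4
        pool = multiprocessing.Pool(processes=N)

        # Allow at most 2*N batches to be "in flight" at any moment.
        # The producer blocks on acquire() until a worker finishes and
        # the callback releases a slot, so memory stays bounded.
        slots = threading.BoundedSemaphore(2 * N)

        def release_slot(_):
            slots.release()

        batch = []
        for item in generator():
            batch.append(item)
            if len(batch) == 10:
                slots.acquire()
                pool.apply_async(my_fun, args=(batch,),
                                 callback=release_slot,
                                 error_callback=release_slot)
                batch = []

        if batch:  # leftovers
            slots.acquire()
            pool.apply_async(my_fun, args=(batch,),
                             callback=release_slot,
                             error_callback=release_slot)

        pool.close()
        pool.join()

The callback runs in the pool's result-handler thread inside the main process, which is why a plain threading.BoundedSemaphore is enough here; the error_callback is registered too so a failing batch still frees its slot.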