I have some Python that creates multiple processes to complete a task much more quickly. When I create these processes I pass in a queue, and inside the processes I use queue.put(data) so I can retrieve the data outside of the processes. It works fantastically on my local machine, but when I upload the zip to an AWS Lambda function (Python 3.8) it says the Queue() function has not been implemented. The project runs great in the Lambda when I simply take out the queue functionality, so I know this is the only hang-up I currently have.
I made sure to install the multiprocessing package directly into my Python project using "pip install multiprocess -t ./" as well as "pip install boto3 -t ./".
I am new to Python as well as AWS, but the research I have come across recently potentially points me to SQS.
Reading over these SQS docs I am not sure if this is exactly what I am looking for.
Here is the code I am running in the Lambda that works locally but not on AWS. See the *'s for important pieces:
from multiprocessing import Process, Queue
from craigslist import CraigslistForSale
import time
import math

sitesHold = ["sfbay", "seattle", "newyork", "(many more)..."]
results = []

def f(sites, category, search_keys, queue):
    local_results = []
    for site in sites:
        cl_fs = CraigslistForSale(site=site, category=category, filters={'query': search_keys})
        for result in cl_fs.get_results(sort_by='newest'):
            local_results.append(result)
    if len(local_results) > 0:
        print(local_results)
        queue.put(local_results)  # Putting data *********************************

def scan_handler(event, context):
    started_at = time.monotonic()
    queue = Queue()
    print("Running...")
    amount_of_lists = int(event['amountOfLists'])
    list_length = int(len(sitesHold) / amount_of_lists)
    extra_lists = math.ceil((len(sitesHold) - (amount_of_lists * list_length)) / list_length)
    site_list = []
    list_creator_counter = 0
    site_counter = 0
    for i in range(amount_of_lists + extra_lists):
        site_list.append(sitesHold[list_creator_counter:list_creator_counter + list_length])
        list_creator_counter += list_length
    processes = []
    for i in range(len(site_list)):
        site_counter = site_counter + len(site_list[i])
        processes.append(Process(target=f, args=(site_list[i], event['category'], event['searchQuery'], queue,)))  # Creating processes and creating queues ***************************
    for process in processes:
        process.start()  # Starting processes ***********************
    for process in processes:
        listings = queue.get()  # Getting from queue ****************************
        if len(listings) > 0:
            for listing in listings:
                results.append(listing)
    print(f"Results: {results}")
    for process in processes:
        process.join()
    total_time_took = time.monotonic() - started_at
    print(f"Sites processed: {site_counter}")
    print(f'Took {total_time_took} seconds long')
This is the error the Lambda function is giving me:
{
"errorMessage": "[Errno 38] Function not implemented",
"errorType": "OSError",
"stackTrace": [
" File \"/var/task/main.py\", line 90, in scan_handler\n queue = Queue()\n",
" File \"/var/lang/lib/python3.8/multiprocessing/context.py\", line 103, in Queue\n return Queue(maxsize, ctx=self.get_context())\n",
" File \"/var/lang/lib/python3.8/multiprocessing/queues.py\", line 42, in __init__\n self._rlock = ctx.Lock()\n",
" File \"/var/lang/lib/python3.8/multiprocessing/context.py\", line 68, in Lock\n return Lock(ctx=self.get_context())\n",
" File \"/var/lang/lib/python3.8/multiprocessing/synchronize.py\", line 162, in __init__\n SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)\n",
" File \"/var/lang/lib/python3.8/multiprocessing/synchronize.py\", line 57, in __init__\n sl = self._semlock = _multiprocessing.SemLock(\n"
]
}
Does Queue() work in an AWS Lambda? What is the best way to accomplish my goal?
It doesn't look like it's supported:
https://blog.ruanbekker.com/blog/2019/02/19/parallel-processing-on-aws-lambda-with-python-using-multiprocessing/
From the AWS docs
If you develop a Lambda function with Python, parallelism doesn’t come
by default. Lambda supports Python 2.7 and Python 3.6, both of which
have multiprocessing and threading modules.
The multiprocessing module that comes with Python lets you run
multiple processes in parallel. Due to the Lambda execution
environment not having /dev/shm (shared memory for processes) support,
you can’t use multiprocessing.Queue or multiprocessing.Pool.
On the other hand, you can use multiprocessing.Pipe instead of
multiprocessing.Queue to accomplish what you need without getting any
errors during the execution of the Lambda function.
Related
I am trying to:
share a dataframe between processes
update a shared dict based on calculations performed on (but not changing) that dataframe
I am using a multiprocessing.Manager() to create a dict in shared memory (to store results) and a Namespace to store/share my dataframe that I want to read from.
import multiprocessing
import pandas as pd
import numpy as np

def add_empty_dfs_to_shared_dict(shared_dict, key):
    shared_dict[key] = pd.DataFrame()

def edit_df_in_shared_dict(shared_dict, namespace, ind):
    row_to_insert = namespace.df.loc[ind]
    df = shared_dict[ind]
    df[ind] = row_to_insert
    shared_dict[ind] = df

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    shared_dict = manager.dict()
    namespace = manager.Namespace()

    n = 100
    dataframe_to_be_shared = pd.DataFrame({
        'player_id': list(range(n)),
        'data': np.random.random(n),
    }).set_index('player_id')

    namespace.df = dataframe_to_be_shared

    for i in range(n):
        add_empty_dfs_to_shared_dict(shared_dict, i)

    jobs = []
    for i in range(n):
        p = multiprocessing.Process(
            target=edit_df_in_shared_dict,
            args=(shared_dict, namespace, i)
        )
        jobs.append(p)
        p.start()

    for p in jobs:
        p.join()

    print(shared_dict[1])
When running the above, it writes to shared_dict correctly (my print statement executes with some data), but I also get an error regarding the manager:
Process Process-88:
Traceback (most recent call last):
File "/Users/henrysorsky/.pyenv/versions/3.7.3/lib/python3.7/multiprocessing/managers.py", line 788, in _callmethod
conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/henrysorsky/.pyenv/versions/3.7.3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/Users/henrysorsky/.pyenv/versions/3.7.3/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/Users/henrysorsky/Library/Preferences/PyCharm2019.2/scratches/scratch_13.py", line 34, in edit_df_in_shared_dict
row_to_insert = namespace.df.loc[ind]
File "/Users/henrysorsky/.pyenv/versions/3.7.3/lib/python3.7/multiprocessing/managers.py", line 1099, in __getattr__
return callmethod('__getattribute__', (key,))
File "/Users/henrysorsky/.pyenv/versions/3.7.3/lib/python3.7/multiprocessing/managers.py", line 792, in _callmethod
self._connect()
File "/Users/henrysorsky/.pyenv/versions/3.7.3/lib/python3.7/multiprocessing/managers.py", line 779, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/Users/henrysorsky/.pyenv/versions/3.7.3/lib/python3.7/multiprocessing/connection.py", line 492, in Client
c = SocketClient(address)
File "/Users/henrysorsky/.pyenv/versions/3.7.3/lib/python3.7/multiprocessing/connection.py", line 619, in SocketClient
s.connect(address)
ConnectionRefusedError: [Errno 61] Connection refused
I understand this is coming from the manager and seems to be due to it not shutting down properly. The only similar issue I can find online:
Share list between process in python server
suggests joining all the child processes, which I am already doing.
So after a full night's sleep I realised it was actually the reading of the dataframe in shared memory that was causing issues, and that at around the 20th child process some of them were failing this read. Adding a maximum number of processes to run at once solved it.
For anyone wondering, the code I used is:
import multiprocessing
import pandas as pd
import numpy as np

def add_empty_dfs_to_shared_dict(shared_dict, key):
    shared_dict[key] = pd.DataFrame()

def edit_df_in_shared_dict(shared_dict, namespace, ind):
    row_to_insert = namespace.df.loc[ind]
    df = shared_dict[ind]
    df[ind] = row_to_insert
    shared_dict[ind] = df

if __name__ == '__main__':
    # region define inputs
    max_jobs_running = 4
    n = 100
    # endregion

    manager = multiprocessing.Manager()
    shared_dict = manager.dict()
    namespace = manager.Namespace()

    dataframe_to_be_shared = pd.DataFrame({
        'player_id': list(range(n)),
        'data': np.random.random(n),
    }).set_index('player_id')

    namespace.df = dataframe_to_be_shared

    for i in range(n):
        add_empty_dfs_to_shared_dict(shared_dict, i)

    jobs = []
    jobs_running = 0
    for i in range(n):
        p = multiprocessing.Process(
            target=edit_df_in_shared_dict,
            args=(shared_dict, namespace, i)
        )
        jobs.append(p)
        p.start()
        jobs_running += 1

        if jobs_running >= max_jobs_running:
            while jobs_running >= max_jobs_running:
                jobs_running = 0
                for p in jobs:
                    jobs_running += p.is_alive()

    for p in jobs:
        p.join()

    for key, value in shared_dict.items():
        print(f"key: {key}")
        print(f"value: {value}")
        print("-" * 50)
This would probably be better handled by a Queue and Pool setup rather than my hacky fix.
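For reference, here is a minimal sketch of what a Pool-based version could look like (my own assumption of the suggested rewrite, not code from this answer): a bounded Pool replaces the manual jobs_running throttling, the dataframe is handed to each worker once through an initializer, and results come back to the parent via map() instead of a Manager dict.

import multiprocessing
import pandas as pd
import numpy as np

_df = None  # set once per worker by the pool initializer

def init_pool(df):
    global _df
    _df = df

def extract_row(ind):
    # read-only lookup against the dataframe shared via the initializer
    return ind, _df.loc[[ind]]

if __name__ == '__main__':
    n = 100
    dataframe_to_be_shared = pd.DataFrame({
        'player_id': list(range(n)),
        'data': np.random.random(n),
    }).set_index('player_id')

    with multiprocessing.Pool(processes=4, initializer=init_pool,
                              initargs=(dataframe_to_be_shared,)) as pool:
        results = dict(pool.map(extract_row, range(n)))

    print(results[1])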
The problem is probably in your main process, which created the shared dict. If you forgot to use process.join() (or an infinite loop) in your main process, then the main process may finish before the other processes using the dict. This way the dict gets destroyed, and the processes cannot connect to it.
The number of processes should not be a problem. You should be able to use the dict with as many as you wish.
TL;DR This error might happen if you initiate too many new connections to multiprocessing.Manager() objects in parallel, due to a hard-coded backlog limit (16 at the time of writing) in multiprocessing/managers.py:
# do authentication later
self.listener = Listener(address=address, backlog=16)
self.address = self.listener.address
Details: I was starting a few hundred subprocesses trying to get a value from a multiprocessing.Manager().dict object at the very start of my program (basically all in parallel at once). The first few worked fine, but then they started to fail sporadically.
Interestingly, in my case, this only happened under VSCode debugger. I have found a mailing list discussion mentioning this issue more than 10 years ago. Looking at the source code of multiprocessing I found out that the backlog limit is still hard-coded (seems to get increased from 5 to 16 in modern versions). I increased it to 64 and all errors were gone.
So if the pending connections queue reaches the limit, all new connections will be refused. Especially when you run your code under debugger, connections are getting served a tick slower and the backlog buffer may get full when hundreds of them are flowing fast in parallel.
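If editing the installed multiprocessing source is not an option, a possible (untested) workaround is to monkeypatch the Listener name that managers.py uses before any Manager() is created. This is only a sketch, and it assumes managers.py imports Listener as a module-level name, which is what the snippet above suggests:

import multiprocessing.managers as managers
from multiprocessing.connection import Listener

class _WideBacklogListener(Listener):
    # same signature as multiprocessing.connection.Listener, but with a
    # larger minimum backlog so bursts of new connections are not refused
    def __init__(self, address=None, family=None, backlog=1, authkey=None):
        super().__init__(address=address, family=family,
                         backlog=max(backlog, 64), authkey=authkey)

managers.Listener = _WideBacklogListener  # must run before Manager() is instantiated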
I tried to use Python to get the docker stats, by using Python's docker module.
the code is:
import docker

cli = docker.from_env()
for container in cli.containers.list():
    stream = container.stats()
    print(next(stream))
I run 6 Docker containers, but when I run the code it takes a few seconds to get all of the containers' stats. Is there a good way to get the stats immediately?
Docker stats inherently takes a little while; a large part of this is waiting for the next value to come through the stream:
$ time docker stats 1339f13154aa --no-stream
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
...
real 0m1.556s
user 0m0.020s
sys 0m0.015s
You could reduce the time it takes to execute by running the commands in parallel, as opposed to one at a time.
To achieve this, you could use the wonderful threading or multiprocessing library.
Digital Ocean provides a good tutorial on how to accomplish this with a ThreadPoolExecutor:
import requests
import concurrent.futures

def get_wiki_page_existence(wiki_page_url, timeout=10):
    response = requests.get(url=wiki_page_url, timeout=timeout)
    page_status = "unknown"
    if response.status_code == 200:
        page_status = "exists"
    elif response.status_code == 404:
        page_status = "does not exist"
    return wiki_page_url + " - " + page_status

wiki_page_urls = [
    "https://en.wikipedia.org/wiki/Ocean",
    "https://en.wikipedia.org/wiki/Island",
    "https://en.wikipedia.org/wiki/this_page_does_not_exist",
    "https://en.wikipedia.org/wiki/Shark",
]

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = []
    for url in wiki_page_urls:
        futures.append(executor.submit(get_wiki_page_existence, wiki_page_url=url))
    for future in concurrent.futures.as_completed(futures):
        print(future.result())
This is what I use to get the stats directly for each running container (it still takes around 1 second per container, so I don't think that can be helped). Besides this, the docker-py documentation of the arguments may help: https://docker-py.readthedocs.io/en/stable/containers.html. Hope it helps.
import docker

client = docker.from_env()
containers = client.containers.list()
for x in containers:
    print(x.stats(decode=None, stream=False))
As TheQueenIsDead suggested, it might require threading if you want to get the stats faster.
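Here is a minimal sketch combining the two ideas above (my own combination, not from either answer; it assumes docker-py and some running containers): fetch each container's one-shot stats in a thread pool so the roughly one-second waits overlap instead of adding up.

import concurrent.futures
import docker

client = docker.from_env()

def one_shot_stats(container):
    # stream=False returns a single stats snapshot instead of a generator
    return container.name, container.stats(decode=None, stream=False)

containers = client.containers.list()
with concurrent.futures.ThreadPoolExecutor(max_workers=len(containers) or 1) as executor:
    for name, stats in executor.map(one_shot_stats, containers):
        print(name, stats)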
I'm trying to run a PyTorch model in a Django app. As it is not recommended to execute the models (or any long-running task) in the views, I decided to run it in a Celery task. My model is quite big: it takes about 12 seconds to load and about 3 seconds to infer, so I decided that I couldn't afford to load it on every request. Instead, I tried to load it in settings and keep it there for the app to use. So my final scheme is:
When the Django app starts, in the settings the PyTorch model is loaded and it's accessible from the app.
When views.py receives a request, it delays a celery task
The celery task uses the settings.model to infer the result
The problem here is that the celery task throws the following error when trying to use the model
[2020-08-29 09:03:04,015: ERROR/ForkPoolWorker-1] Task app.tasks.task[458934d4-ea03-4bc9-8dcd-77e4c3a9caec] raised unexpected: RuntimeError("Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method")
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/tensor/lib/python3.7/site-packages/celery/app/trace.py", line 412, in trace_task
R = retval = fun(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/tensor/lib/python3.7/site-packages/celery/app/trace.py", line 704, in __protected_call__
return self.run(*args, **kwargs)
/*...*/
File "/home/ubuntu/anaconda3/envs/tensor/lib/python3.7/site-packages/torch/cuda/__init__.py", line 191, in _lazy_init
"Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Here's the code in my settings.py loading the model:
if sys.argv and sys.argv[0].endswith('celery') and 'worker' in sys.argv:  # In order to load only for the celery worker
    import torch
    torch.cuda.init()
    torch.backends.cudnn.benchmark = True
    load_model_file()
And the task code
@task
def getResult(name):
    print("Executing on GPU:", torch.cuda.is_available())
    if os.path.isfile(name):
        try:
            outpath = model_inference(name)
            os.remove(name)
            return outpath
        except OSError as e:
            print("Error", name, "doesn't exist")
            return ""
The print in the task shows "Executing on GPU: true"
I've tried setting torch.multiprocessing.set_start_method('spawn') in the settings.py before and after the torch.cuda.init() but it gives the same error.
Setting this method works as long as you're also using Process from the same library.
from torch.multiprocessing import Pool, Process
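To illustrate the pattern outside of Celery, here is a minimal sketch (my own example; the model loading and inference are placeholders, not from the question) of running CUDA work in a child process started with the 'spawn' method via torch.multiprocessing:

import torch
import torch.multiprocessing as mp

def infer(queue, name):
    # CUDA is initialised inside the spawned child, not inherited from the parent
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # ... load the model and run the real inference here ...
    queue.put(f"processed {name} on {device}")

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    queue = mp.Queue()
    p = mp.Process(target=infer, args=(queue, "input.bin"))
    p.start()
    print(queue.get())
    p.join()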
Celery uses "regular" multiprocessing library, thus this error.
If I were you I'd try either:
run it single threaded to see if that helps
run it with eventlet to see if that helps
read this
A quick fix is to make things single-threaded. To do that set the worker pool type of celery to solo while starting the celery worker
celery -A your_proj worker -P solo -l info
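The same thing can also be set through Celery configuration instead of the command line. A small sketch, assuming your Celery app object is called app and your project is named your_proj:

from celery import Celery

app = Celery('your_proj')
# equivalent to passing -P solo on the command line
app.conf.worker_pool = 'solo'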
This is due to the fact that the Celery worker itself is using forking. This appears to be a currently known issue with Celery >=4.0
You used to be able to configure celery to spawn, rather than fork, but that feature (CELERYD_FORCE_EXECV) was removed in 4.0.
There are no built-in options to get around this. Some custom monkeypatching to do this is probably possible, but YMMV.
Some potentially viable options might be:
Use celery <4.0 with CELERYD_FORCE_EXECV enabled.
Launch celery workers on Windows (where forking is not possible anyhow)
Repeated serial calls to subprocess.Popen() in a script parallelized with mpi4py eventually cause what seems to be data corruption during communication, manifesting as a pickle.unpickling error of various types (I have seen the unpickling errors: EOF, invalid unicode character, invalid load key, unpickling stack underflow). It seems to only happen when the data being communicated is large, the number of serial calls to subprocess is large, or the number of mpi processes is large.
I can reproduce the error with python>=2.7, mpi4py>=3.0.1, and openmpi>=3.0.0. I would ultimately like to communicate python objects so I am using the lowercase mpi4py methods. Here is a minimum code which reproduces the error:
#!/usr/bin/env python
from mpi4py import MPI
from copy import deepcopy
import subprocess

nr_calcs = 4
tasks_per_calc = 44
data_size = 55000

# --------------------------------------------------------------------
def run_test(nr_calcs, tasks_per_calc, data_size):

    # Init MPI
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    comm_size = comm.Get_size()

    # Run Moc Calcs
    icalc = 0
    while True:
        if icalc > nr_calcs - 1: break
        index = icalc
        icalc += 1

        # Init Moc Tasks
        task_list = []
        moc_task = data_size*"x"
        if rank==0:
            task_list = [deepcopy(moc_task) for i in range(tasks_per_calc)]
        task_list = comm.bcast(task_list)

        # Moc Run Tasks
        itmp = rank
        while True:
            if itmp > len(task_list)-1: break
            itmp += comm_size
            proc = subprocess.Popen(["echo", "TEST CALL TO SUBPROCESS"],
                                    stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=False)
            out, err = proc.communicate()

        print("Rank {:3d} Finished Calc {:3d}".format(rank, index))

# --------------------------------------------------------------------
if __name__ == '__main__':
    run_test(nr_calcs, tasks_per_calc, data_size)
Running this on one 44 core node with 44 mpi processes completes the first 3 "calculations" successfully, but on the final loop some processes raise:
Traceback (most recent call last):
File "./run_test.py", line 54, in <module>
run_test(nr_calcs, tasks_per_calc, data_size)
File "./run_test.py", line 39, in run_test
task_list = comm.bcast(task_list)
File "mpi4py/MPI/Comm.pyx", line 1257, in mpi4py.MPI.Comm.bcast
File "mpi4py/MPI/msgpickle.pxi", line 639, in mpi4py.MPI.PyMPI_bcast
File "mpi4py/MPI/msgpickle.pxi", line 111, in mpi4py.MPI.Pickle.load
File "mpi4py/MPI/msgpickle.pxi", line 101, in mpi4py.MPI.Pickle.cloads
_pickle.UnpicklingError
Sometimes the UnpicklingError has a descriptor, such as invalid load key "x", or EOF Error, invalid unicode character, or unpickling stack underflow.
Edit: It appears the problem disappears with openmpi<3.0.0 or when using mvapich2, but it would still be good to understand what's going on.
I had the same problem. In my case, I got my code to work by installing mpi4py in a Python virtual environment, and by setting mpi4py.rc.recv_mprobe = False as suggested by Intel:
https://software.intel.com/en-us/articles/python-mpi4py-on-intel-true-scale-and-omni-path-clusters
However, in the end I just switched to using the capital letter methods Recv and Send with NumPy arrays. They work fine with subprocess and they don't need any additional tricks.
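For reference, here is a minimal sketch of the buffer-based approach described above (my own example adapted to the broadcast in the question, not the answerer's code), including the mpi4py.rc workaround, which has to be set before importing the MPI submodule:

import mpi4py
mpi4py.rc.recv_mprobe = False  # workaround suggested in the Intel article above

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

data_size = 55000
buf = np.empty(data_size, dtype='uint8')
if rank == 0:
    buf[:] = np.frombuffer(b"x" * data_size, dtype='uint8')

# The capital-letter Bcast sends the raw buffer, so no pickling is involved
comm.Bcast(buf, root=0)
print("Rank {:3d} received {:d} bytes".format(rank, buf.nbytes))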
How can I apply multiprocessing to my function? I tried it like this, but it did not work.
def steric_clashes_parallel(system):
    rna_st = system[MolWithResID("G")].molecule()
    for i in system.molNums():
        peg_st = system[i].molecule()
        if rna_st != peg_st:
            print(peg_st)
            for i in rna_st.atoms(AtomIdx()):
                for j in peg_st.atoms(AtomIdx()):
                    # print(Vector.distance(i.evaluate().center(), j.evaluate().center()))
                    dist = Vector.distance(i.evaluate().center(), j.evaluate().center())
                    if dist<2:
                        return print("there is a steric clash")
    return print("there is no steric clashes")

mix = PDB().read("clash_1.pdb")
system = System()
system.add(mix)

from multiprocessing import Pool
p = Pool(4)
p.map(steric_clashes_parallel, system)
I have thousands of pdb or system files to test with this function. It took 2 hours for one file on a single core without the multiprocessing module. Any suggestion would be a great help.
My traceback looks something like this:
self.run()
  File "/home/sajid/sire.app/bundled/lib/python3.3/threading.py", line 858, in run
    self._target(*self._args, **self._kwargs)
  File "/home/sajid/sire.app/bundled/lib/python3.3/multiprocessing/pool.py", line 351, in _handle_tasks
    put(task)
  File "/home/sajid/sire.app/bundled/lib/python3.3/multiprocessing/connection.py", line 206, in send
    ForkingPickler(buf, pickle.HIGHEST_PROTOCOL).dump(obj)
RuntimeError: Pickling of "Sire.System._System.System" instances is not enabled
(boost.org/libs/python/doc/v2/pickle.html)
The problem is that Sire.System._System.System can't be serialized so it can't be sent to the child process. Multiprocessing uses the pickle module for serialization and you can frequently do a sanity check in the main program with pickle.dumps(my_mp_object) to verify.
You have another problem, though (or I think you do, based on variable names). The map method takes an iterable and fans its iterated objects out to pool members, but it appears that you want to process system itself, not something that it iterates over.
One trick to multiprocessing is to keep the payload that you send from the parent to the child simple and let the child do the heavy lifting of creating its objects. Here, you might be better off just sending down filenames and letting the children do most of the work.
def steric_clashes_from_file(filename):
    mix = PDB().read(filename)
    system = System()
    system.add(mix)
    steric_clashes_parallel(system)

def steric_clashes_parallel(system):
    rna_st = system[MolWithResID("G")].molecule()
    for i in system.molNums():
        peg_st = system[i].molecule()
        if rna_st != peg_st:
            print(peg_st)
            for i in rna_st.atoms(AtomIdx()):
                for j in peg_st.atoms(AtomIdx()):
                    # print(Vector.distance(i.evaluate().center(), j.evaluate().center()))
                    dist = Vector.distance(i.evaluate().center(), j.evaluate().center())
                    if dist<2:
                        return print("there is a steric clash")
    return print("there is no steric clashes")

filenames = ["clash_1.pdb",]

from multiprocessing import Pool
p = Pool(4)
p.map(steric_clashes_from_file, filenames, chunksize=1)  # chunksize is a map() argument, not a Pool() argument
# martineau:
I tested pickle command and it gave me;
----> 1 pickle.dumps(clash_1.pdb)
RuntimeError: Pickling of "Sire.Mol._Mol.MoleculeGroup" instances is not enabled (http://www.boost.org/libs/python/doc/v2/pickle.html)
----> 1 pickle.dumps(system)
RuntimeError: Pickling of "Sire.System._System.System" instances is not enabled (http://www.boost.org/libs/python/doc/v2/pickle.html)
With your script it took the same time, using a single core only. The dist line is iterable though. Can I run this single line over multiple cores? I modified the loop as:
for i in rna_st.atoms(AtomIdx()):
    icent = i.evaluate().center()
    for j in peg_st.atoms(AtomIdx()):
        dist = Vector.distance(icent, j.evaluate().center())
There is one trick you can do to get a faster computation for each file -- processing each file sequentially, but processing the contents of the file in parallel. This relies on a number of caveats:
Your are running on a system that can fork processes (such as Linux).
The computations you are doing do not have side effects that affect the results of future computations.
It seems like this is the case in your situation, but I can't be 100% sure.
When a process is forked, all the memory in the child process is duplicated from the parent process (what's more it is duplicated in an efficient manner -- bits of memory that are only read from aren't duplicated). This makes it easy to share big, complex initial states between processes. However, once the child processes have started they will not see any changes to objects made in the parent process though (and vice versa).
Sample code:
import multiprocessing

system = None
rna_st = None

class StericClash(Exception):
    """Exception used to halt processing of a file. Could be modified to
    include information about what caused the clash if this is useful."""
    pass

def steric_clashes_parallel(system_index):
    peg_st = system[system_index].molecule()
    if rna_st != peg_st:
        for i in rna_st.atoms(AtomIdx()):
            for j in peg_st.atoms(AtomIdx()):
                dist = Vector.distance(i.evaluate().center(),
                                       j.evaluate().center())
                if dist < 2:
                    raise StericClash()

def process_file(filename):
    global system, rna_st

    # initialise global values before creating pool
    mix = PDB().read(filename)
    system = System()
    system.add(mix)
    rna_st = system[MolWithResID("G")].molecule()

    with multiprocessing.Pool() as pool:
        # contents of file processed in parallel
        try:
            pool.map(steric_clashes_parallel, range(system.molNums()))
        except StericClash:
            # terminate called to halt current jobs and further processing
            # of file
            pool.terminate()
            # wait for pool processes to terminate before returning
            pool.join()
            return False
        else:
            pool.close()
            pool.join()
            return True
        finally:
            # reset globals
            system = rna_st = None

if __name__ == "__main__":
    for filename in get_files_to_be_processed():
        # files are being processed in serial
        result = process_file(filename)
        save_result_to_disk(filename, result)