I tried to read https://hackernoon.com/asynchronous-python-45df84b82434.
It's about asynchronous Python, and I tried the code from the article, but I'm getting a weird error.
The code is:
import asyncio
import aiohttp

urls = ['http://www.google.com', 'http://www.yandex.ru', 'http://www.python.org']

async def call_url(url):
    print('Starting {}'.format(url))
    response = await aiohttp.ClientSession().get(url)
    data = await response.text()
    print('{}: {} bytes: {}'.format(url, len(data), data))
    return data

futures = [call_url(url) for url in urls]
asyncio.run(asyncio.wait(futures))
When I try to run it, it says:
Traceback (most recent call last):
File "test.py", line 15, in <module>
asyncio.run(asyncio.wait(futures))
AttributeError: module 'asyncio' has no attribute 'run'
sys:1: RuntimeWarning: coroutine 'call_url' was never awaited
I don't have any files named asyncio, and I have proof:
>>> asyncio
<module 'asyncio' from '/usr/lib/python3.6/asyncio/__init__.py'>
asyncio.run is a Python 3.7 addition. In 3.5-3.6, your example is roughly equivalent to:
import asyncio
futures = [...]
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(futures))
The asyncio.run() function was added in Python 3.7. From the asyncio.run() function documentation:
New in version 3.7: Important: this function has been added to asyncio in Python 3.7 on a provisional basis.
Note the provisional part; the Python maintainers foresee that the function may need further tweaking and updating, so the API may change in future Python versions.
At any rate, you can't use it on Python 3.6. You'll have to upgrade or implement your own.
A very simple approximation would be to use loop.run_until_complete():
loop = asyncio.get_event_loop()
result = loop.run_until_complete(coro)
although this ignores handling remaining tasks that may still be running. See the asyncio.runners source code for the complete asyncio.run() implementation.
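If you need to stay on Python 3.6, a minimal stand-in could look like the sketch below. This is only a rough approximation, not the real implementation: it skips the cancellation of leftover tasks and the async-generator shutdown that asyncio.runners performs.

import asyncio

def run(coro):
    # Rough 3.5/3.6 stand-in for asyncio.run(): create a fresh loop,
    # run the coroutine to completion, then close the loop again.
    # Leftover tasks are not cancelled, unlike in the real 3.7 version.
    loop = asyncio.new_event_loop()
    try:
        asyncio.set_event_loop(loop)
        return loop.run_until_complete(coro)
    finally:
        asyncio.set_event_loop(None)
        loop.close()

With that in place, the example at the top of the question would call run(asyncio.wait(futures)) instead of asyncio.run(asyncio.wait(futures)).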
Just in case this is useful to someone else: for me, the issue was that my file was called asyncio.py. I renamed it to asyncio_example.py and it started to work again (it was failing at the import statement for asyncio).
This issue helped me realize this: https://github.com/tornadoweb/tornado/issues/2868
If anyone is having a problem with no current event loop, try:
loop = asyncio.new_event_loop()
result = loop.run_until_complete(coro)
Related
Why am I getting a missing value error when I try to run two websockets simultaneously using ProcessPoolExecutor? It's an issue I've been banging my head against for a week, and I figured someone smarter than I am may have an answer or insight readily available.
Courtesy of Alpaca's API, I am trying to stream market data and trade updates simultaneously using the alpaca-py SDK. Both streams run on asyncio event loops via a websocket. The code to initialize each stream is similar; here is what it looks like for live market data.
from alpaca.data.live.stock import StockDataStream

ticker = "SPY"
stock_data_stream = StockDataStream('API_KEY', 'API_SECRET')

# Data handler where each update will arrive
async def stock_data_handler(data_stream):
    # Do something with data_stream here
    pass

stock_data_stream.subscribe_bars(stock_data_handler, "SPY")  # Subscribe to data type
stock_data_stream.run()  # Starts asyncio event loop
The above stream runs in an infinite loop. I could not figure out a way to run another stream in parallel to receive live trade updates. I found that the market stream blocks the trade updates stream and vice versa.
I experimented a lot with running each stream on its own new event loop. I also tried to run/schedule each streaming coroutine in its own thread using to_thread() and run_coroutine_threadsafe(), with no success. That is when I decided to turn to a more parallel approach.
So, I wrapped each stream implementation in its own function and then created a concurrent future process for each function using the ProcessPoolExecutor. A trading update should fire every time an order is canceled, so I cooked up place_trade_loop() to provide an infinite loop and ran that as a process as well.
def place_trade_loop(order_request):
    while True:
        order_response = trading_client.submit_order(order_request)
        time.sleep(2)
        cancel_orders_response = trading_client.cancel_order_by_id(order_id=order_response.id)

def stock_stream_wrapper():
    async def stock_data_handler(data_stream):
        print(data_stream)
    stock_data_stream.subscribe_quotes(stock_data_handler, ticker)
    stock_data_stream.run()

def trading_stream_wrapper():
    async def trading_stream_handler(trade_stream):
        print(trade_stream)
    trading_stream.subscribe_trade_updates(trading_stream_handler)
    trading_stream.run()

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor() as executor:
        f1 = executor.submit(stock_stream_wrapper)
        f2 = executor.submit(trading_stream_wrapper)
        f3 = executor.submit(place_trade_loop, order_request)
The place trade loop and the market data stream play perfectly well together. However, the following error results when an order is canceled. Again, a canceled order should result in the trading_stream_handler receiving a trade_stream.
error during websocket communication: 1 validation error for TradeUpdate
execution_id
field required (type=value_error.missing)
Traceback (most recent call last):
File "C:\Users\zachm\AppData\Local\Programs\Python\Python310\lib\site-packages\alpaca\trading\stream.py", line 172, in _run_forever
await self._consume()
File "C:\Users\zachm\AppData\Local\Programs\Python\Python310\lib\site-packages\alpaca\trading\stream.py", line 145, in _consume
await self._dispatch(msg)
File "C:\Users\zachm\AppData\Local\Programs\Python\Python310\lib\site-packages\alpaca\trading\stream.py", line 89, in _dispatch
await self._trade_updates_handler(self._cast(msg))
File "C:\Users\zachm\AppData\Local\Programs\Python\Python310\lib\site-packages\alpaca\trading\stream.py", line 103, in _cast
result = TradeUpdate(**msg.get("data"))
File "pydantic\main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for TradeUpdate
execution_id
field required (type=value_error.missing)
For reference:
alpaca-py/alpaca/trading/stream.py - Line 172
alpaca-py/alpaca/trading/stream.py - Line 145
alpaca-py/alpaca/trading/stream.py - Line 89
alpaca-py/alpaca/trading/stream.py - Line 103
alpaca-py/alpaca/trading/models.py - Line 510
pydantic/pydantic/main.py - Line 341
execution_id is part of the TradeUpdate model as shown in link #5. I believe the error is arising when **msg.get("data") is passed to TradeUpdate (link #4). There could also be something going on with the Alpaca platform because I noticed many of my canceled orders are listed as 'pending_cancel,' which seemed to be an issue others are dealing with. Alpaca gave a response near the end of that thread.
Finally, the following is from the ProcessPoolExecutor documentation, which may have something to do with the error:
The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.
I am sorry for such a long post. Thank you in advance for any help or encouragement you can provide!
This question already has answers here:
Importing installed package from script with the same name raises "AttributeError: module has no attribute" or an ImportError or NameError
(2 answers)
Closed 3 years ago.
I've been trying to send requests to a local server built using Flask.
The requests are sent using the requests module of Python.
I don't know whether the requests.post function has been deprecated and a new one introduced, or whether there is really something wrong with my code. I've done everything exactly as described in this article.
Here's my code:
import requests
host_url = "http://127.0.0.1:5000"
# get the data
data_for_prediction = [int(input()) for _ in range(10)]
r = requests.post(url=host_url,json={data_for_prediction})
print(r.json())
The error I'm getting for above code is:
Traceback (most recent call last):
File "C:/Users/--/requests.py", line 1, in <module>
import requests
File "C:\Users\--\requests.py", line 8, in <module>
r = requests.post(url=host_url,json={data_for_prediction})
AttributeError: module 'requests' has no attribute 'post'
Process finished with exit code 1
My server code is:
from flask import Flask, request, jsonify

flask_server_app = Flask(__name__)

# let's make the server now
@flask_server_app.route("/api/predict", methods=["GET", "POST"])
def predict():
    # my prediction function goes here
    # Get the data from the POST request & read the received json
    json_data = request.get_json(force=True)
    # Make prediction using the model loaded from disk as per the data
    prediction = ml_model.predict([[json_data]])
    # return json version of the prediction
    return jsonify(prediction[0])

# run the app now
if __name__ == '__main__':
    flask_server_app.run(port=5000, debug=True)
I've tried checking the documentation, read many articles online, and also re-wrote the whole code, but none of that helped.
So, is the requests.post function deprecated and a new one introduced, or is there something wrong with my code?
It seems like you are writing your code in a file called requests.py, so when you try to import the requests module, Python imports your own file instead. Try renaming your file.
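A quick way to confirm this kind of shadowing is to check where the imported module actually comes from, for example from a REPL started in a different directory. This is just a small diagnostic sketch; the suggested new filename is only an example.

import requests
print(requests.__file__)
# A healthy install prints something like .../site-packages/requests/__init__.py.
# If it prints the path of your own script instead, rename that file (for example
# to prediction_client.py) and delete any stale requests.pyc or __pycache__ entry
# sitting next to it.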
I have functions in a local_code.py file that I would like to pass to workers through dask. I've seen answers to questions on here saying that this can be done using the upload_file() function, but I can't seem to get it working because I'm still getting a ModuleNotFoundError.
The relevant part of the code is as follows.
from dask.distributed import Client
from dask_jobqueue import SLURMCluster
from local_code import *

helper_file = '/absolute/path/to/local_code.py'

def main():
    with SLURMCluster(**slurm_params) as cluster:
        cluster.scale(n_workers)
        with Client(cluster) as client:
            client.upload_file(helper_file)
            mapping = client.map(myfunc, data)
            client.gather(mapping)

if __name__ == '__main__':
    main()
Note that myfunc is imported from local_code, and there is no error when importing it and passing it to map. The function myfunc also depends on other functions that are defined in local_code.
With this code, I'm still getting this error
distributed.protocol.pickle - INFO - Failed to deserialize b'\x80\x04\x95+\x00\x00\x00\x00\x00\x00\x00\x8c\x11local_code\x94\x8c\x$
Traceback (most recent call last):
File "/home/gallagher.r/.local/lib/python3.7/site-packages/distributed/protocol/pickle.py", line 61, in loads
return pickle.loads(x)
ModuleNotFoundError: No module named 'local_code'
Using upload_file() seems so straightforward that I'm not sure what I'm doing wrong. I must have it in the wrong place or not be understanding correctly what is passed to it.
I'd appreciate any help with this. Please let me know if you need any other information or if there's anything else I can supply from the error file.
The upload_file method only uploads the file to the currently available workers. If a worker arrives after you call upload_file then that worker won't have the provided file.
In your situation, the easiest thing to do is probably to wait until all of the workers have arrived before you call upload_file:
cluster.scale(n)

with Client(cluster) as client:
    client.wait_for_workers(n)
    client.upload_file(...)
Another option, when you have workers going in and out, is to use Client.register_worker_callbacks to hook in whenever a new worker is registered/added. The one caveat is that you will need to serialize your file(s) into the callback partial:
import functools

def _worker_upload(dask_worker, *, data, fname):
    dask_worker.loop.add_callback(
        callback=dask_worker.upload_file,
        comm=None,  # not used
        filename=fname,
        data=data,
        load=True)

fname = ...
with open(fname, 'rb') as f:
    data = f.read()

client.register_worker_callbacks(
    setup=functools.partial(
        _worker_upload, data=data, fname=fname,
    )
)
This will also upload the file the first time the callback is registered so you can avoid calling client.upload_file entirely.
Why does the code below work only with multiprocessing.dummy, but not with plain multiprocessing?
import urllib.request
#from multiprocessing.dummy import Pool  # this works
from multiprocessing import Pool

urls = ['http://www.python.org', 'http://www.yahoo.com', 'http://www.scala.org', 'http://www.google.com']

if __name__ == '__main__':
    with Pool(5) as p:
        results = p.map(urllib.request.urlopen, urls)
Error:
Traceback (most recent call last):
File "urlthreads.py", line 31, in <module>
results = p.map(urllib.request.urlopen, urls)
File "C:\Users\patri\Anaconda3\lib\multiprocessing\pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Users\patri\Anaconda3\lib\multiprocessing\pool.py", line 657, in get
raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[<http.client.HTTPResponse object at 0x0000016AEF204198>]'. Reason: 'TypeError("cannot serialize '_io.BufferedReader' object")'
What's missing so that it works without "dummy"?
The http.client.HTTPResponse object you get back from urlopen() has an _io.BufferedReader object attached, and this object cannot be pickled.
pickle.dumps(urllib.request.urlopen('http://www.python.org').fp)
Traceback (most recent call last):
...
pickle.dumps(urllib.request.urlopen('http://www.python.org').fp)
TypeError: cannot serialize '_io.BufferedReader' object
multiprocessing.Pool needs to pickle (serialize) the results to send them back to the parent process, and this is what fails here. Since dummy uses threads instead of processes, there is no pickling, because threads in the same process share their memory naturally.
A general solution to this TypeError is:
read out the buffer and save the content (if needed)
remove the reference to '_io.BufferedReader' from the object you are trying to pickle
In your case, calling .read() on the http.client.HTTPResponse will empty and remove the buffer, so a function for converting the response into something pickleable could simply do this:
def read_buffer(response):
    response.text = response.read()
    return response
Example:
r = urllib.request.urlopen('http://www.python.org')
r = read_buffer(r)
pickle.dumps(r)
# Out: b'\x80\x03chttp.client\nHTTPResponse\...
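Putting that together with the original Pool example, one simple variant is to return only the body bytes, which pickle without trouble. This is just a sketch; the helper name fetch_content is mine.

import urllib.request
from multiprocessing import Pool

urls = ['http://www.python.org', 'http://www.yahoo.com',
        'http://www.scala.org', 'http://www.google.com']

def fetch_content(url):
    # .read() consumes the buffer, so only plain bytes cross the process boundary
    with urllib.request.urlopen(url) as response:
        return response.read()

if __name__ == '__main__':
    with Pool(5) as p:
        results = p.map(fetch_content, urls)
        print([len(body) for body in results])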
Before you consider this approach, make sure you really want to use multiprocessing instead of multithreading. For I/O-bound tasks like you have here, multithreading would be sufficient, since most of the time is spent waiting for the response anyway (no need for CPU time). Multiprocessing and the IPC involved also introduce substantial overhead.
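If you do go the multithreading route instead, concurrent.futures avoids the pickling issue entirely, because the responses never leave the process. A minimal sketch, not part of the original question:

import urllib.request
from concurrent.futures import ThreadPoolExecutor

urls = ['http://www.python.org', 'http://www.yahoo.com', 'http://www.google.com']

def fetch(url):
    # the response stays in this process, so nothing has to be serialized
    with urllib.request.urlopen(url) as response:
        return response.read()

if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=5) as executor:
        results = list(executor.map(fetch, urls))
        print([len(body) for body in results])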
I am trying to use urllib3 in a simple thread pool to fetch several wiki pages.
The script will create one connection for every thread (I don't understand why) and hang forever.
Any tip, advice, or simple example of urllib3 and threading would be appreciated.
import threadpool
from urllib3 import connection_from_url

HTTP_POOL = connection_from_url(url, timeout=10.0, maxsize=10, block=True)

def fetch(url, fields):
    kwargs = {'retries': 6}
    return HTTP_POOL.get_url(url, fields, **kwargs)

pool = threadpool.ThreadPool(5)
requests = threadpool.makeRequests(fetch, iterable)
[pool.putRequest(req) for req in requests]
@Lennart's script got this error:
http://en.wikipedia.org/wiki/2010-11_Premier_LeagueTraceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/threadpool.py", line 156, in run
http://en.wikipedia.org/wiki/List_of_MythBusters_episodeshttp://en.wikipedia.org/wiki/List_of_Top_Gear_episodes http://en.wikipedia.org/wiki/List_of_Unicode_characters result = request.callable(*request.args, **request.kwds)
File "crawler.py", line 9, in fetch
print url, conn.get_url(url)
AttributeError: 'HTTPConnectionPool' object has no attribute 'get_url'
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/threadpool.py", line 156, in run
result = request.callable(*request.args, **request.kwds)
File "crawler.py", line 9, in fetch
print url, conn.get_url(url)
AttributeError: 'HTTPConnectionPool' object has no attribute 'get_url'
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/threadpool.py", line 156, in run
result = request.callable(*request.args, **request.kwds)
File "crawler.py", line 9, in fetch
print url, conn.get_url(url)
AttributeError: 'HTTPConnectionPool' object has no attribute 'get_url'
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/threadpool.py", line 156, in run
result = request.callable(*request.args, **request.kwds)
File "crawler.py", line 9, in fetch
print url, conn.get_url(url)
AttributeError: 'HTTPConnectionPool' object has no attribute 'get_url'
After adding import threadpool; import urllib3 and tpool = threadpool.ThreadPool(4), @user318904's code got this error:
Traceback (most recent call last):
File "crawler.py", line 21, in <module>
tpool.map_async(fetch, urls)
AttributeError: ThreadPool instance has no attribute 'map_async'
Here is my take, a more current solution using Python 3 and concurrent.futures.ThreadPoolExecutor.
import urllib3
from concurrent.futures import ThreadPoolExecutor

urls = ['http://en.wikipedia.org/wiki/2010-11_Premier_League',
        'http://en.wikipedia.org/wiki/List_of_MythBusters_episodes',
        'http://en.wikipedia.org/wiki/List_of_Top_Gear_episodes',
        'http://en.wikipedia.org/wiki/List_of_Unicode_characters',
        ]

def download(url, cmanager):
    response = cmanager.request('GET', url)
    if response and response.status == 200:
        print("+++++++++ url: " + url)
        print(response.data[:1024])

connection_mgr = urllib3.PoolManager(maxsize=5)
thread_pool = ThreadPoolExecutor(5)

for url in urls:
    thread_pool.submit(download, url, connection_mgr)
Some remarks
My code is based on a similar example from the Python Cookbook by Beazley and Jones.
I particularly like the fact that you only need a standard module besides urllib3.
The setup is extremely simple, and if you are only going for side-effects in download (like printing, saving to a file, etc.), there is no additional effort in joining the threads.
If you want something different, ThreadPoolExecutor.submit actually returns whatever download would return, wrapped in a Future (see the sketch after these remarks).
I found it helpful to align the number of threads in the thread pool with the number of HTTPConnection objects in the connection pool (via maxsize). Otherwise you might encounter (harmless) warnings when all threads try to access the same server (as in the example).
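For instance, if download returned the body instead of just printing it, the results could be collected from the futures like this. This is a small sketch built on the example above, with download reduced to return response.data.

import urllib3
from concurrent.futures import ThreadPoolExecutor, as_completed

urls = ['http://en.wikipedia.org/wiki/2010-11_Premier_League',
        'http://en.wikipedia.org/wiki/List_of_Unicode_characters']

def download(url, cmanager):
    # same handler as above, but returning the body so the Future carries a result
    response = cmanager.request('GET', url)
    return response.data if response.status == 200 else None

connection_mgr = urllib3.PoolManager(maxsize=2)
with ThreadPoolExecutor(2) as thread_pool:
    futures = {thread_pool.submit(download, url, connection_mgr): url for url in urls}
    for future in as_completed(futures):
        data = future.result()  # re-raises any exception from download
        print(futures[future], '->', 'no data' if data is None else '%d bytes' % len(data))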
Obviously it will create one connection per thread; how else should each thread be able to fetch a page? And you try to use the same connection, made from one URL, for all URLs. That can hardly be what you intended.
This code worked just fine:
import threadpool
from urllib3 import connection_from_url

def fetch(url):
    kwargs = {'retries': 6}
    conn = connection_from_url(url, timeout=10.0, maxsize=10, block=True)
    print url, conn.get_url(url)
    print "Done!"

pool = threadpool.ThreadPool(4)

urls = ['http://en.wikipedia.org/wiki/2010-11_Premier_League',
        'http://en.wikipedia.org/wiki/List_of_MythBusters_episodes',
        'http://en.wikipedia.org/wiki/List_of_Top_Gear_episodes',
        'http://en.wikipedia.org/wiki/List_of_Unicode_characters',
        ]

requests = threadpool.makeRequests(fetch, urls)
[pool.putRequest(req) for req in requests]
pool.wait()
Thread programming is hard, so I wrote workerpool to make exactly what you're doing easier.
More specifically, see the Mass Downloader example.
To do the same thing with urllib3, it looks something like this:
import urllib3
import workerpool

# Connection pool for the target host
http_pool = urllib3.connection_from_url("foo", maxsize=3)

def download(url):
    r = http_pool.get_url(url)
    # TODO: Do something with r.data
    print "Downloaded %s" % url

# Initialize a pool, 5 threads in this case
pool = workerpool.WorkerPool(size=5)

# The ``download`` method will be called with a line from the second
# parameter for each job.
pool.map(download, open("urls.txt").readlines())

# Send shutdown jobs to all threads, and wait until all the jobs have been completed
pool.shutdown()
pool.wait()
For more sophisticated code, have a look at workerpool.EquippedWorker (and the tests here for example usage). You can make the pool be the toolbox you pass in.
I use something like this:
# excluding setup for threadpool etc.
upool = urllib3.HTTPConnectionPool('en.wikipedia.org', block=True)

urls = ['/wiki/2010-11_Premier_League',
        '/wiki/List_of_MythBusters_episodes',
        '/wiki/List_of_Top_Gear_episodes',
        '/wiki/List_of_Unicode_characters',
        ]

def fetch(path):
    # add error checking
    return upool.get_url(path).data

tpool = ThreadPool()
tpool.map_async(fetch, urls)
# either wait on the result object or give map_async a callback function for the results