Hi, I have written a multithreaded request/response handler using the requests-futures library.
However, it seems to be very slow and not asynchronous as I would imagine. The output is slow and in order, not interleaved as I would expect if it were threading properly.
My question is: why is my code slow, and what can I do to speed it up? An example would be great.
Here is the code:
#!/usr/bin/python
import requests
import time
from concurrent.futures import ThreadPoolExecutor
from requests_futures.sessions import FuturesSession
session = FuturesSession(executor=ThreadPoolExecutor(max_workers=12))
def responseCallback( sess, resp ):
    response = resp.text
    if "things are invalid" not in response:
        resp.data = "SUCCESS %s" % resp.headers['content-length']
    else:
        resp.data = "FAIL %s" % resp.headers['content-length']
proxies = {
    "http":"http://localhost:8080",
    "https":"https://localhost:8080"
}
url = 'https://www.examplehere.com/blah/etc/'
headers = {
    'Host':'www.examplehere.com',
    'Connection':'close',
    'Cache-Control':'max-age=0',
    'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Origin':'https://www.examplehere.com',
    'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/533.32 (KHTML, like Gecko) Ubuntu Chromium/34.0.1847.123 Chrome/34.0.1847.123 Safari/337.12',
    'Content-Type':'application/x-www-form-urlencoded',
    'Referer':'https://www.exampleblah.etc/',
    'Accept-Encoding':'gzip,deflate,sdch',
    'Accept-Language':'en-US,en;q=0.8,de;q=0.6',
    'Cookie':'blah=123; etc=456;',
}
for n in range( 0, 9999 ):
    #wibble = n.zfill( 4 )
    wibble = "%04d" % n
    payload = {
        'name':'test',
        'genNum':wibble,
        'Button1':'Push+Now'
    }
    #print payload
    #r = requests.post( url, data=payload, headers=headers, proxies=proxies, verify=False )
    future = session.post( url, data=payload, headers=headers, verify=False, background_callback=responseCallback )
    response = future.result()
    print( "%s : %s" % ( wibble, response.data ) )
Ideally I'd like to fix my actual code while still using the library I have already utilised, but if it's bad for some reason I'm open to suggestions...
edit: I am currently using Python 2 with the concurrent.futures backport.
edit: slow means approximately one request per second, and not concurrent but one after the other, so request1, response1, request2, response2 - I would expect them to be interleaved as the requests go out and come in on multiple threads?
Your loop is serialized because you call future.result() immediately after each session.post(); that call blocks until the request completes, so only one request is ever in flight. The following code is another way to submit multiple requests, work on several of them at a time, then print out the results. The results are printed as they become ready, not necessarily in the same order as they were submitted.
It also uses extensive logging to help debug issues, and it captures each payload for logging. Multithreaded code is hard, so more logs is more better!
source
import logging, sys
import concurrent.futures as cf
from requests_futures.sessions import FuturesSession

URL = 'http://localhost'
NUM = 3

logging.basicConfig(
    stream=sys.stderr, level=logging.INFO,
    format='%(relativeCreated)s %(message)s',
)

session = FuturesSession()
futures = {}

logging.info('start')
for n in range(NUM):
    wibble = "%04d" % n
    payload = {
        'name':'test',
        'genNum':wibble,
        'Button1':'Push+Now'
    }
    future = session.get( URL, data=payload )
    futures[future] = payload

logging.info('requests done, waiting for responses')
for future in cf.as_completed(futures, timeout=5):
    res = future.result()
    logging.info(
        "wibble=%s, %s, %s bytes",
        futures[future]['genNum'],
        res,
        len(res.text),
    )

logging.info('done!')
output
69.3101882935 start
77.9430866241 Starting new HTTP connection (1): localhost
78.3731937408 requests done, waiting for responses
79.4050693512 Starting new HTTP connection (2): localhost
84.498167038 wibble=0000, <Response [200]>, 612 bytes
85.0481987 wibble=0001, <Response [200]>, 612 bytes
85.1981639862 wibble=0002, <Response [200]>, 612 bytes
85.2642059326 done!
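The key change for your own code is to submit every post() first and only then consume the futures; calling future.result() inside the submission loop blocks until that request finishes. A minimal sketch using the URL and payload from your question (headers, proxies, and the background callback omitted for brevity, and the range shortened):

import concurrent.futures as cf
from requests_futures.sessions import FuturesSession

session = FuturesSession()  # pass executor=ThreadPoolExecutor(max_workers=N) to tune
futures = {}
for n in range( 0, 10 ):
    wibble = "%04d" % n
    payload = {
        'name':'test',
        'genNum':wibble,
        'Button1':'Push+Now'
    }
    # submit without blocking; do NOT call .result() here
    future = session.post( 'https://www.examplehere.com/blah/etc/', data=payload, verify=False )
    futures[future] = wibble

# consume results in completion order, not submission order
for future in cf.as_completed(futures):
    resp = future.result()
    print( "%s : %s" % ( futures[future], resp.headers.get('content-length') ) )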
Related
I am using StormProxies to access Etsy data, but despite using proxies and implementing retries I am getting a 429 Too Many Requests error most of the time (~80%+). Here is my code to access the data:
import time
import traceback

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_request(url, logging, headers={}, is_proxy=True):
    r = None
    try:
        proxies = {
            'http': 'http://{}'.format(PROXY_GATEWAY_IP),
            'https': 'http://{}'.format(PROXY_GATEWAY_IP),
        }
        with requests.Session() as s:
            retries = Retry(total=5, backoff_factor=1, status_forcelist=[502, 503, 504, 429])
            # mount the retry adapter for both schemes, otherwise https
            # requests bypass the retry logic entirely
            s.mount('http://', HTTPAdapter(max_retries=retries))
            s.mount('https://', HTTPAdapter(max_retries=retries))
            if is_proxy:
                r = s.get(url, proxies=proxies, timeout=30, headers=headers)
            else:
                r = s.get(url, headers=headers, timeout=30)
            r.raise_for_status()
            if r.status_code != 200:
                print('Status Code = ', r.status_code)
                if logging is not None:
                    logging.info('Status Code = ' + str(r.status_code))
    except Exception as ex:
        print('Exception occurred in create_request for the url: {url}'.format(url=url))
        crash_date = time.strftime("%Y-%m-%d %H:%M:%S")
        crash_string = "".join(traceback.format_exception(type(ex), ex, ex.__traceback__))
        exception_string = '[' + crash_date + '] - ' + crash_string + '\n'
        print('Could not connect. Proxy issue or something else')
        print('==========================================================')
        print(exception_string)
    finally:
        return r
The StormProxies support team told me to implement retries, which is what I have done here, but it is not working for me.
I am using Python multiprocessing and spawning 30+ threads at a time.
My recommendation is to remove the huge overhead of managing all those threads in one process (30+ is really a lot).
It is more efficient to use more processes with only a few threads each (2-4 threads, depending on how long the I/O waits are), because all the threads in one process have to contend for the GIL (Global Interpreter Lock). After that, it is all about tuning the configuration for your workload; see the sketch below.
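A minimal sketch of that layout, assuming the create_request function from the question and a hypothetical URLS list of pages to fetch (both are placeholders for your own code; tune the counts for your machine and proxy limits):

import logging
from multiprocessing import Pool
from concurrent.futures import ThreadPoolExecutor

THREADS_PER_PROCESS = 4   # keep the per-process thread count small
NUM_PROCESSES = 8         # scale with CPU cores and proxy capacity

def fetch_batch(urls):
    # each worker process runs its own small thread pool to overlap I/O
    with ThreadPoolExecutor(max_workers=THREADS_PER_PROCESS) as ex:
        # create_request is the function from the question
        return list(ex.map(lambda u: create_request(u, logging), urls))

def chunks(seq, size):
    # split the URL list into batches, one batch per pool task
    return [seq[i:i + size] for i in range(0, len(seq), size)]

if __name__ == '__main__':
    # URLS is a hypothetical list of Etsy URLs to scrape
    batch_size = max(1, len(URLS) // NUM_PROCESSES)
    with Pool(NUM_PROCESSES) as pool:
        results = pool.map(fetch_batch, chunks(URLS, batch_size))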
I am trying to use a cryptocurrency API to get some information from a remote server in Python. The API's example of how to do it is here: https://developers.cryptoapis.io/technical-documentation/blockchain-data/unified-endpoints/get-transaction-details-by-transaction-id
But when I try to run it I get an exception:
Exception has occurred: InvalidURL
nonnumeric port: '//rest.cryptoapis.io/v2'
I am not sure what is wrong here (I'm new to Python). Can someone please point it out? I thought that at least the official example from the API provider must work?
My code is:
import http.client
conn = http.client.HTTPConnection("https://rest.cryptoapis.io/v2")
headers = {
    'Content-Type': "application/json",
    'X-API-Key': "API key provided by the software provider"
}
conn.request("GET", "blockchain-data,bitcoin,testnet,transactions,4b66461bf88b61e1e4326356534c135129defb504c7acb2fd6c92697d79eb250", headers=headers )
res = conn.getresponse()
data = res.read()
print(data.decode("utf-8"))
Looks like you misunderstood the documentation. http.client.HTTPConnection expects just a host name (and optional port), not a full URL: your string gets split at the first colon, so 'https' becomes the host and '//rest.cryptoapis.io/v2' is parsed as a nonnumeric port, which is exactly the error you see. You may find this helpful:
import requests
import json
APIKEY = 'YourAPIKeyGoesHere' # <-----
BASE = 'https://rest.cryptoapis.io/v2'
BLOCKCHAIN = 'bitcoin'
NETWORK = 'testnet'
TID = '4b66461bf88b61e1e4326356534c135129defb504c7acb2fd6c92697d79eb250'
with requests.Session() as session:
    h = {'Content-Type': 'application/json',
         'X-API-KEY': APIKEY}
    r = session.get(
        f'{BASE}/blockchain-data/{BLOCKCHAIN}/{NETWORK}/transactions/{TID}', headers=h)
    r.raise_for_status()
    print(json.dumps(r.json(), indent=4, sort_keys=True))
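For completeness, a sketch of the same call with http.client, the module the question used: pass only the host to HTTPSConnection and put the path (starting with a slash, with segments separated by '/' rather than ',') in conn.request(). The API key is a placeholder.

import http.client
import json

conn = http.client.HTTPSConnection("rest.cryptoapis.io")
headers = {
    'Content-Type': "application/json",
    'X-API-Key': "YourAPIKeyGoesHere"  # placeholder
}
conn.request("GET",
             "/v2/blockchain-data/bitcoin/testnet/transactions/"
             "4b66461bf88b61e1e4326356534c135129defb504c7acb2fd6c92697d79eb250",
             headers=headers)
res = conn.getresponse()
print(json.loads(res.read().decode("utf-8")))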
I am trying to connect to Splunk via its API using Python. I can connect and get a 200 status code, but when I read it, it doesn't contain the content of the page.
Here is my code:
import json
import requests
import re
baseurl = 'https://my_splunk_url:8888'
username = 'my_username'
password = 'my_password'
headers={"Content-Type": "application/json"}
s = requests.Session()
s.proxies = {"http": "my_proxy"}
r = s.get(baseurl, auth=(username, password), verify=False, headers=None, data=None)
print(r.status_code)
print(r.text)
I am new to Splunk and Python, so any ideas or suggestions as to why this is happening would help.
You need to authenticate first to get a token; then you'll be able to hit the rest of the REST endpoints. The auth endpoint is at /servicesNS/admin/search/auth/login, and it returns a session_key, which you then provide to subsequent requests.
Here is some code that uses requests to authenticate to a Splunk instance, then starts a search. It then checks whether the search is complete; if not, it waits a second and checks again, sleeping and re-checking until the search is done, then prints out the results.
import time  # need for sleep
from xml.dom import minidom
import json, pprint

import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

base_url = 'https://localhost:8089'
username = 'admin'
password = 'changeme'
search_query = "search=search index=*"

r = requests.get(base_url+"/servicesNS/admin/search/auth/login",
                 data={'username':username,'password':password}, verify=False)

session_key = minidom.parseString(r.text).getElementsByTagName('sessionKey')[0].firstChild.nodeValue
print ("Session Key:", session_key)

r = requests.post(base_url + '/services/search/jobs/', data=search_query,
                  headers = { 'Authorization': ('Splunk %s' %session_key)},
                  verify = False)

sid = minidom.parseString(r.text).getElementsByTagName('sid')[0].firstChild.nodeValue
print ("Search ID", sid)

done = False
while not done:
    r = requests.get(base_url + '/services/search/jobs/' + sid,
                     headers = { 'Authorization': ('Splunk %s' %session_key)},
                     verify = False)
    response = minidom.parseString(r.text)
    for node in response.getElementsByTagName("s:key"):
        if node.hasAttribute("name") and node.getAttribute("name") == "dispatchState":
            dispatchState = node.firstChild.nodeValue
            print ("Search Status: ", dispatchState)
            if dispatchState == "DONE":
                done = True
            else:
                time.sleep(1)

r = requests.get(base_url + '/services/search/jobs/' + sid + '/results/',
                 headers = { 'Authorization': ('Splunk %s' %session_key)},
                 data={'output_mode': 'json'},
                 verify = False)

pprint.pprint(json.loads(r.text))
Many of the requests calls used here include the flag verify=False to avoid issues with the default self-signed SSL certs, but you can drop that if you have legitimate certificates.
Published a while ago at https://gist.github.com/sduff/aca550a8df636fdc07326225de380a91
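If your instance uses a certificate signed by a private CA, a middle ground is to point verify at the CA bundle instead of disabling verification; the bundle path below is a placeholder:

# requests accepts a CA bundle path for verify instead of a boolean
r = requests.get(base_url + '/services/search/jobs/' + sid,
                 headers={'Authorization': 'Splunk %s' % session_key},
                 verify='/path/to/ca-bundle.pem')  # placeholder path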
Nice piece of coding. One of the wonderful aspects of Python is the ability to use other people's well-written packages. In this case, why not use Splunk's own Python package to do all of that work, with a lot less coding around it.
pip install splunklib.
Then add the following to your import block
import splunklib.client as client
import splunklib.results as results
pypi.org has documentation on some of the usage, and Splunk has an excellent set of how-to documents. Remember: be lazy, use someone else's work to make your work look better.
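A minimal sketch of that approach, assuming the same localhost instance and credentials as the answer above (treat it as a starting point, as the SDK's API details can vary between versions):

import splunklib.client as client
import splunklib.results as results

# connect to the management port with the same credentials as above
service = client.connect(host='localhost', port=8089,
                         username='admin', password='changeme')

# run a blocking one-shot search and iterate over the result rows
stream = service.jobs.oneshot('search index=* | head 5')
for item in results.ResultsReader(stream):
    if isinstance(item, dict):  # skip diagnostic messages
        print(item)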
I am trying to read the response after making a call to an API using the Python Twisted web client. I have made a POST call to the endpoint, passing in a JSON structure; it should then return a status with either a message (if it failed) or a JSON structure (if successful).
Using the code below I can see the status code being printed, but I am not seeing the message/JSON structure.
The 'BeginningPrinter' is never getting called and I don't understand why.
Example of output:
$ python sample.py
Response version: (b'HTTP', 1, 0)
Response code: 401 | phrase : b'UNAUTHORIZED'
Response headers:
Response length: 28
Apologies that the code is so long, but I wanted to make sure it contains everything I used to run it.
from io import BytesIO
import json
from pprint import pformat

from twisted.internet import reactor
from twisted.web.client import Agent
from twisted.web.http_headers import Headers
from twisted.internet.defer import Deferred
from twisted.internet.protocol import Protocol
from twisted.web.client import FileBodyProducer

agent = Agent(reactor)

class BeginningPrinter(Protocol):
    def __init__(self, finished):
        self.finished = finished
        self.remaining = 1024 * 10
        print('begin')

    def dataReceived(self, bytes):
        print('bytes')
        if self.remaining:
            display = bytes[:self.remaining]
            print('Some data received:')
            print(display)
            self.remaining -= len(display)

    def connectionLost(self, reason):
        print('Finished receiving body:', reason.getErrorMessage())
        self.finished.callback(None)

TESTDATA = { "keySequence": "2019-07-14" }
jsonData = json.dumps(TESTDATA)
body = BytesIO(jsonData.encode('utf-8'))
body = FileBodyProducer(body)

headerDict = \
    {
        'User-Agent': ['test'],
        'Content-Type': ['application/json'],
        'APIGUID' : ['ForTesting']
    }
header = Headers(headerDict)

d = agent.request(b'POST', b'http://127.0.0.1:5000/receiveKeyCode', header, body)

def cbRequest(response):
    print(f'Response version: {response.version}')
    print(f'Response code: {response.code} | phrase : {response.phrase}')
    print('Response headers:')
    print('Response length:', response.length)
    print(pformat(list(response.headers.getAllRawHeaders())))
    print(response.deliverBody)
    finished = Deferred()
    response.deliverBody(BeginningPrinter(finished))
    return finished

d.addCallback(cbRequest)

def cbShutdown(ignored):
    #reactor.stop()
    pass

d.addBoth(cbShutdown)

reactor.run()
You don't need all of that fluff code. If your endpoint is already a Flask API, you can post to it and get the values back in a few lines with requests; if you don't have requests installed, it makes sense to pip install it, as it makes life a lot easier.
import json
import requests
headers = {
    'content-type': 'application/json',
    'APIGUID' : 'ForTesting'
}
conv = {"keySequence": "2019-07-14"}
s = json.dumps(conv)
res = requests.post("http://127.0.0.1:5000/receiveKeyCode", data=s, headers=headers)
print(res.text)
Reference: See this Stackoverflow link
I'm trying to create an HTTPSConnection to this address, "android-review.googlesource.com", and send a JSON request.
This address belongs to the Gerrit code review system, which exposes a REST API. You can find more information about the Gerrit REST API here:
https://gerrit-review.googlesource.com/Documentation/rest-api.html
Each review in the Gerrit code review system is related to a change request, and I tried to get the change request information with a JSON request. This is the URL and request:
url = "/gerrit_ui/rpc/ChangeDetailService"
req = {"jsonrpc" : "2.0",
       "method": "changeDetail",
       "params": [{"id": id}],
       "id": 44
      }
You can find the complete code here:
import socket, sys
import httplib
import pyodbc
import json
import types
import datetime
import urllib2
import os
import logging
import re, time

def GetRequestOrCached( url, method, data, filename):
    path = os.path.join("json", filename)
    if not os.path.exists(path):
        data = MakeRequest(url, method, data)
        time.sleep(1)
        data = data.replace(")]}'", "")
        f = open(path, "w")
        f.write(data)
        f.close()
    return open(path).read()

def MakeRequest(url, method, data, port=443):
    successful = False
    while not successful:
        try:
            conn = httplib.HTTPSConnection("android-review.googlesource.com", port)
            headers = {"Accept": "application/json,application/jsonrequest",
                       "Content-Type": "application/json; charset=UTF-8",
                       "Content-Length": len(data)}
            conn.request(method, url, data, headers)
            conn.set_debuglevel(1)
            successful = True
        except socket.error as err:
            # this means a socket timeout
            if err.errno != 10060:
                raise(err)
            else:
                print err.errno, str(err)
                print "sleep for 1 minute before retrying"
                time.sleep(60)
    resp = conn.getresponse()
    if resp.status != 200:
        raise GerritDataException("Got status code %d for request to %s" % (resp.status, url))
    return resp.read()

#-------------------------------------------------
id = 51750
filename = "%d-ChangeDetails.json" % id
url = "/gerrit_ui/rpc/ChangeDetailService"
req = {"jsonrpc" : "2.0",
       "method": "changeDetail",
       "params": [{"id": id}],
       "id": 44
      }
data = GetRequestOrCached(url, "POST", json.dumps(req), filename)
print json.loads(data)
In the code, id means the review id, which can be a number between 1 and 51750, but not necessarily all of these ids exist in the system, so different numbers can be tried to see which ones respond. For example, these three ids definitely exist: 51750, 51743, 51742. I tried these numbers, but for all of them I got the same error:
"{"jsonrpc":"2.0","id":44,"error":{"code":-32603,"message":"No such service method"}}"
So I guess there is something wrong with the code.
Why are you using url = "/gerrit_ui/rpc/ChangeDetailService"? That isn't in the REST documentation you linked at all. I believe it is an older internal API which is no longer supported. I'm also not sure why your method is POST.
Instead, something like this works just fine for me:
curl "https://android-review.googlesource.com/changes/?q=51750"
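And a sketch of the same query with requests, if you want to stay in Python. Gerrit prefixes its JSON responses with the magic string )]}' on its own line to defeat XSSI (your own code strips it too), so drop the first line before parsing:

import json
import requests

r = requests.get('https://android-review.googlesource.com/changes/',
                 params={'q': '51750'})
r.raise_for_status()
# drop the )]}' anti-XSSI prefix line, then parse the JSON body
print(json.loads(r.text.split("\n", 1)[1]))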